Introduction
A hidden Markov model (HMM) models a sequence where the observed data is generated by hidden states that evolve over time.
Examples:
- Speech recognition.
- Part-of-speech tagging.
- Biological sequence analysis.
- User behavior states.
- Machine health monitoring.
The key assumption is Markovian: the next hidden state depends on the current hidden state, not the full history.
Components
An HMM has:
- Hidden states: the unobserved state at each time step.
- Observations: the visible data emitted at each time step.
- Initial distribution: probability of starting in each state.
- Transition matrix: probability of moving from one hidden state to another.
- Emission model: probability of observing data given a hidden state.
Notation:
- Hidden state: $z_t$
- Observation: $x_t$
- Transition: $P(z_t \mid z_{t-1})$
- Emission: $P(x_t \mid z_t)$
Three Core Questions
Evaluation
Given model parameters and observations, what is the probability of the observed sequence?
This is solved with the forward algorithm.
Decoding
Given model parameters and observations, what is the most likely hidden state sequence?
This is solved with the Viterbi algorithm.
Learning
Given observations but unknown parameters, how do we estimate transition and emission probabilities?
This is solved with Baum-Welch, an expectation-maximization algorithm for HMMs.
Forward Algorithm
The forward algorithm computes the probability of observations up to time $t$ while ending in each hidden state.
It avoids enumerating every possible hidden state sequence, which would be exponential in sequence length.
At each step:
- Combine previous state probabilities with transition probabilities.
- Multiply by the emission probability of the current observation.
- Continue forward through the sequence.
Viterbi Algorithm
The Viterbi algorithm finds the most likely hidden state path.
It is similar to the forward algorithm, but instead of summing over possible previous states, it takes the maximum and stores backpointers.
Use Viterbi when you need a single best explanation of the observed sequence.
Baum-Welch
Baum-Welch estimates HMM parameters when the hidden states are unknown.
It alternates:
- E-step: estimate expected state occupancies and transitions using current parameters.
- M-step: update transition and emission probabilities using those expectations.
Like other EM methods, it can converge to local optima, so initialization matters.
Practical Considerations
HMMs are useful when:
- The process is sequential.
- Hidden states are meaningful.
- State transitions are relatively simple.
- The Markov assumption is reasonable.
They are less suitable when long-range context dominates. In those cases, recurrent networks, transformers, or state-space models may be better.
Closing
Hidden Markov models are a clean framework for sequence data with latent states. Their main value is the combination of interpretable structure and efficient dynamic programming algorithms.