A hidden Markov model (HMM) is a statistical model of a system that moves through a chain of hidden states, where each state emits an observable output. You see the outputs but not the states, and the model lets you infer the most likely sequence of hidden states that produced what you observed. The “Markov” part means the next state depends only on the current state, an idea that traces back to Andrey Markov’s 1913 analysis of letter sequences in Pushkin’s Eugene Onegin.
In speech recognition, the hidden states are the words or sounds a person intended, and the observations are the acoustic signal. The recognizer’s job is to recover the intended words from the noisy audio. Frederick Jelinek’s IBM group, as he recounts in his 2009 ACL Lifetime Achievement account “The Dawn of Statistical ASR and MT,” cast this as a noisy-channel problem and used HMMs to estimate the probabilities involved, displacing the rule-based phonetic systems that came before.
The same machinery carried into text. HMMs became a workhorse for tasks such as part-of-speech tagging, where the hidden states are grammatical categories and the observations are words, and for early statistical alignment in machine translation. Two efficient algorithms made these models practical on the hardware of the 1970s and 1980s: the Viterbi algorithm, which finds the best state sequence, and the Baum-Welch procedure, which trains the probabilities from data.
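To make the tagging case concrete, here is a minimal sketch of the Viterbi algorithm in Python, applied to a toy part-of-speech tagger. The two tags, the three-word vocabulary, and every probability are invented for illustration, and the function name viterbi is only a label for this example. A real tagger would learn the numbers from data.

```python
# Toy HMM for part-of-speech tagging. All probabilities below are
# made-up illustrative values, not estimates from any real corpus.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}          # P(first tag)
trans_p = {                                    # P(next tag | current tag)
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
emit_p = {                                     # P(word | tag)
    "NOUN": {"dogs": 0.4, "chase": 0.1, "cats": 0.5},
    "VERB": {"dogs": 0.1, "chase": 0.8, "cats": 0.1},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, its joint probability)."""
    # best[t][s]: probability of the best path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the previous state on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path probability into s
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev

    # Start from the best final state and follow the back-pointers
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


path, prob = viterbi(["dogs", "chase", "cats"], states, start_p, trans_p, emit_p)
print(path)  # ['NOUN', 'VERB', 'NOUN']
```

The sketch multiplies raw probabilities, which is fine for three words. On long sequences those products shrink toward zero, so implementations commonly add log-probabilities instead.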
Why business readers should care: HMMs were the first widely deployed example of a now-standard pattern, which is to treat language as a probability problem and learn the numbers from data rather than code rules by hand. That pattern is the conceptual ancestor of modern language models.