Markov analyzes Eugene Onegin in the first application of Markov chains

On January 23, 1913, the Russian mathematician Andrey Markov presented to the Imperial Academy of Sciences in St. Petersburg a study with an unglamorous title: “An Example of Statistical Investigation of the Text Eugene Onegin Concerning the Connection of Samples in Chains.” He had taken the first 20,000 letters of Alexander Pushkin’s novel in verse, stripped out punctuation and spaces, and classified each letter as a vowel or a consonant.

Markov then counted not just how many vowels and consonants there were, but how often each kind of letter followed each kind: vowel after vowel, consonant after vowel, vowel after consonant, and consonant after consonant. By measuring these transition frequencies, he showed that a sequence of dependent events, in which each outcome’s probability depends on the one before it, could be analyzed rigorously. This was the first real-world application of what are now called Markov chains, a tool he had introduced in more abstract form a few years earlier to extend the law of large numbers to dependent variables.
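The counting procedure described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not Markov's own method or notation: the function names are hypothetical, the sample string is an arbitrary English phrase rather than Pushkin's Russian text, and the vowel set (a, e, i, o, u, with y treated as a consonant) is an assumption made for English.

```python
from collections import Counter

# Assumption: English vowels only. Markov classified Russian letters.
VOWELS = set("aeiou")


def classify(text):
    """Drop everything except ASCII letters, then label each letter
    'V' (vowel) or 'C' (consonant)."""
    return [
        "V" if ch in VOWELS else "C"
        for ch in text.lower()
        if ch.isascii() and ch.isalpha()
    ]


def transition_counts(states):
    """Count every adjacent pair (previous state, next state)."""
    return Counter(zip(states, states[1:]))


def transition_probabilities(counts):
    """Turn pair counts into estimates of P(next | previous) by dividing
    each count by the total number of transitions leaving 'previous'."""
    totals = Counter()
    for (prev, _nxt), n in counts.items():
        totals[prev] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample text, chosen only for illustration.
    sample = "My uncle, a man of firm convictions"
    counts = transition_counts(classify(sample))
    for (prev, nxt), p in sorted(transition_probabilities(counts).items()):
        print(f"P({nxt} | {prev}) = {p:.3f}")
```

If letters were independent, the estimated probability of a vowel would be about the same whether the previous letter was a vowel or a consonant. A clear gap between those two conditional probabilities is the kind of dependence Markov was measuring.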

The paper was little cited in Markov’s lifetime and was not translated into English until 2006. Yet the idea proved foundational. Markov chains figure in Claude Shannon’s information theory, modern speech recognition, PageRank, reinforcement learning, and the n-gram language models that were direct ancestors of today’s large language models. All of these treat a process as a chain of states linked by transition probabilities.