Chomsky Hierarchy

The Chomsky hierarchy is a classification of formal grammars, and the languages they generate, into a sequence of nested levels of increasing expressive power. It grew out of Noam Chomsky’s 1956 paper “Three Models for the Description of Language,” in which he compared grammars of different strength and showed that a finite-state model is too weak to capture the structure of natural language, motivating the more powerful levels above it.

As usually presented, the hierarchy has four types. Type 3, the regular grammars, are the weakest and are recognized by finite automata. Type 2, the context-free grammars, can describe nested and recursive structure and are recognized by pushdown automata. Type 1, the context-sensitive grammars, are more powerful still, and Type 0, the recursively enumerable grammars, correspond to full Turing machines. Each level strictly contains the one below it.

The reason this hierarchy matters to programming is that it pairs each kind of grammar with a kind of machine. The Stanford Encyclopedia’s discussion of computation describes the same span of models, from finite automata up to Turing machines, that the grammar types map onto. Knowing which level a language belongs to tells you what sort of parser you need to recognize it.

In practice, the lexical structure of a programming language, its identifiers and numbers and operators, is regular, so it can be tokenized by a finite automaton. The nested syntax of expressions, statements, and blocks is context-free, so it is parsed with techniques built for context-free grammars. This two-level split, regular for tokens and context-free for structure, is a direct application of Chomsky’s hierarchy and is the standard architecture of compilers.