Regular Expressions

A regular expression is a short pattern that describes a set of strings: a way to say “any line containing this shape of text” rather than one exact string. The idea began in mathematics. The notation grew out of Stephen Kleene’s 1951 study of events in nerve nets and finite automata, which gave a formal account of the patterns a simple machine can recognize.
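As a small illustration of a pattern standing for a set of strings, the sketch below uses Python's `re` module, a modern descendant of the notation rather than Kleene's original formalism. The sample lines are invented for the example.

```python
import re

# Hypothetical sample lines, invented for illustration.
lines = [
    "error: disk full",
    "warning: low memory",
    "error: timeout after 30s",
    "all good",
]

# One pattern describes a whole set of strings: any line that starts
# with "error:" and has at least one digit somewhere after it.
pattern = re.compile(r"^error:.*[0-9]")

matching = [line for line in lines if pattern.search(line)]
print(matching)
```

Only the third line is selected: the first starts with "error:" but has no digit, and the other two do not start with "error:" at all.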

Ken Thompson turned this theory into a working tool. In his 1968 Communications of the ACM paper “Regular Expression Search Algorithm,” he describes a method that compiles a regular expression into machine instructions that then scan text and signal each time the pattern matches. This construction, still taught today, is what made fast regular expression search practical on real computers.
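The idea of compiling a pattern into instructions that scan text can be sketched in a few lines. This is not Thompson's code: his compiler emitted real machine code, whereas the tiny instruction set here (`char`, `split`, `jmp`, `match`) and the hand-assembled program are assumptions made for illustration. What the sketch keeps is the core technique of following every possible match in parallel while reading the text once.

```python
# Hand-assembled program for the pattern a(b|c)*d.
# The instruction names are invented for this sketch.
PROGRAM = [
    ("char", "a"),    # 0: consume 'a'
    ("split", 2, 7),  # 1: either enter the loop body or leave the loop
    ("split", 3, 5),  # 2: choose the 'b' branch or the 'c' branch
    ("char", "b"),    # 3
    ("jmp", 6),       # 4
    ("char", "c"),    # 5
    ("jmp", 1),       # 6: back to the loop test
    ("char", "d"),    # 7: consume 'd'
    ("match",),       # 8: the pattern has matched
]


def add_thread(program, pc, threads):
    """Add pc to the thread set, following split and jmp instructions."""
    if pc in threads:
        return
    threads.add(pc)
    op = program[pc]
    if op[0] == "jmp":
        add_thread(program, op[1], threads)
    elif op[0] == "split":
        add_thread(program, op[1], threads)
        add_thread(program, op[2], threads)


def search(program, text):
    """Return the end index of every match found in one pass over text."""
    ends = []
    current = set()
    for i, ch in enumerate(text):
        add_thread(program, 0, current)  # a match may start at any position
        nxt = set()
        for pc in current:
            op = program[pc]
            if op[0] == "char" and op[1] == ch:
                add_thread(program, pc + 1, nxt)
        if any(program[pc][0] == "match" for pc in nxt):
            ends.append(i + 1)  # signal: a match ends just after index i
        current = nxt
    return ends


print(search(PROGRAM, "xabcd ad abx"))  # prints [5, 8]
```

Because the set of active threads can never exceed the number of instructions, the work done per input character is bounded by the size of the program, no matter how the alternatives in the pattern are nested.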

Within Unix, regular expressions became part of the everyday toolkit. The Version 7 Unix manual page for grep explains that its patterns are “limited regular expressions in the style of ed,” while its companion egrep accepts “full regular expressions” with operators for alternation, repetition, and grouping. The text editors ed and vi, and later Emacs, used closely related notations for search and replace.
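The three operators named above can be seen together in one short pattern. The sketch uses Python's `re` syntax, which is assumed here as a stand-in and differs in detail from the grep and egrep dialects; the test strings are invented.

```python
import re

# Grouping ( ), alternation |, and repetition + in a single pattern:
# one or more copies of "ab" or "cd", and nothing else.
operators_pattern = re.compile(r"^(ab|cd)+$")

candidates = ["ab", "abcd", "cdcdab", "abc", ""]
results = {s: bool(operators_pattern.search(s)) for s in candidates}

for s, ok in results.items():
    print(repr(s), ok)
```

The first three strings match; "abc" fails because the trailing "c" is not a complete "ab" or "cd", and the empty string fails because `+` requires at least one repetition.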

Because so many tools shared one pattern language, learning regular expressions once paid off everywhere on the system. That common notation began as theory in 1951, was made practical by Thompson, and spread through the Unix utilities. It remains a standard feature of programming languages and command-line tools to this day.
