In 1948 Claude Shannon published “A Mathematical Theory of Communication” in the Bell System Technical Journal (volume 27, pages 379 to 423 and 623 to 656). It is one of the most influential scientific papers of the twentieth century and the founding document of information theory. The version cited here is the corrected reprint hosted by the Harvard mathematics department.
Shannon’s key abstraction was to separate the engineering problem of communication from the meaning of any particular message. As he put it, the semantic aspects of communication are irrelevant to the engineering problem; what matters is that the actual message is one selected from a set of possible messages. By measuring information in terms of how much uncertainty a message removes, he could treat any source, whether text, speech, or images, in the same mathematical terms.
The paper introduced the bit as the basic unit of information, noting that the name was suggested by J. W. Tukey, and it defined entropy as a measure of the average information produced by a source. From these foundations Shannon derived two limits: how far data can be compressed without loss, and how fast information can be sent through a noisy channel with an arbitrarily small probability of error. These results still shape the design of modern communication and storage systems.
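As a rough illustration of the entropy measure described above, the following minimal Python sketch computes the entropy of a discrete source in bits per symbol, using the standard formula H = sum of p * log2(1/p) over the symbol probabilities. The function name `entropy_bits` and the example distributions are assumptions chosen for illustration; they do not come from the paper.

```python
import math

def entropy_bits(probabilities):
    """Entropy of a discrete source in bits per symbol.

    `probabilities` is assumed to be a sequence of non-negative
    numbers summing to 1. Zero-probability symbols contribute nothing.
    """
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# Hypothetical example sources (not taken from the paper).
fair_coin = entropy_bits([0.5, 0.5])    # maximal uncertainty for two outcomes
biased_coin = entropy_bits([0.9, 0.1])  # more predictable, so less information
certain = entropy_bits([1.0])           # no uncertainty at all

print(f"fair coin:   {fair_coin:.3f} bits")
print(f"biased coin: {biased_coin:.3f} bits")
print(f"certain:     {certain:.3f} bits")
```

A fair coin yields exactly one bit per toss, while the biased coin yields roughly 0.47 bits. Read as a compression limit, this means a long sequence of biased tosses can in principle be encoded in fewer than half as many bits as there are tosses, but not fewer than the entropy allows on average.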
Information theory underpins much of the technology that artificial intelligence depends on, from data compression and error correction to the loss functions and probability models used to train machine learning systems. Shannon’s measure of information remains a basic tool wherever uncertainty must be quantified.