Markup Language

A markup language is a system for adding annotations to a text so that the annotations are distinguishable from the text itself. The annotations, usually called tags, tell a program how to treat the words around them: where a paragraph begins, which words are a heading, where a table cell ends. The text plus its tags travels together as plain characters, which is part of why markup has been so durable and portable across machines.

The term comes from publishing, where an editor would “mark up” a manuscript with handwritten instructions for the typesetter. Early computer markup carried the same idea: codes embedded in a stream of text told a printer or formatter what to do. Over time the field split into two broad styles. Procedural markup says what action to perform, such as “skip a line and indent.” Descriptive markup says what a thing is, such as “this is a list item,” and leaves the formatting decision to the software that renders it.
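The contrast between the two styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the command names (skip_line, indent, text), the list_item label, and the two style tables are hypothetical and are not taken from any real formatter.

```python
# Procedural markup: a stream of instructions that the formatter simply obeys.
procedural = [("skip_line",), ("indent", 4), ("text", "Buy milk")]

def run_procedural(commands):
    """Execute each formatting command in order (hypothetical command set)."""
    lines = []
    indent = 0
    for command in commands:
        if command[0] == "skip_line":
            lines.append("")
        elif command[0] == "indent":
            indent = command[1]
        elif command[0] == "text":
            lines.append(" " * indent + command[1])
    return "\n".join(lines)

# Descriptive markup: the source only says what each piece of text is.
descriptive = [("list_item", "Buy milk")]

def render_descriptive(items, style):
    """Let the renderer's style table decide how each kind of item looks."""
    return "\n".join(style[kind](text) for kind, text in items)

# Two interchangeable presentation choices for the same descriptive source.
indented_style = {"list_item": lambda text: "\n" + " " * 4 + text}
bulleted_style = {"list_item": lambda text: "* " + text}

print(repr(run_procedural(procedural)))
print(repr(render_descriptive(descriptive, indented_style)))
print(repr(render_descriptive(descriptive, bulleted_style)))
```

The procedural stream can only ever produce the one layout its commands spell out, while the descriptive source yields either layout depending on which style table the renderer is given.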

The W3C’s Extensible Markup Language (XML) 1.0 specification gives a precise account of what markup is in the descriptive tradition. It states that “Markup encodes a description of the document’s storage layout and logical structure,” and it enumerates the exact constructs that count as markup: “start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and any white space that is at the top level of the document entity.” Everything that is not markup, in this model, is the document’s character data.
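The split between markup and character data can be observed with a parser. The sketch below uses Python's standard xml.parsers.expat module; the sample document and the split_markup helper are made up for illustration. It is an approximation of the specification's categories rather than an exact mirror of them: Expat reports an empty-element tag as a start event followed by an end event, and it hands the resolved text of a reference such as &amp;amp; to the character-data handler, even though the reference itself counts as markup in the specification's list.

```python
import xml.parsers.expat

def split_markup(xml_text):
    """Return (markup_events, character_data) for a small XML string."""
    events = []   # markup constructs reported by the parser, in order
    chunks = []   # character data, which may arrive in several pieces

    def on_start(name, attrs):
        events.append(("start-tag", name))

    def on_end(name):
        events.append(("end-tag", name))

    def on_comment(data):
        events.append(("comment", data))

    def on_chars(data):
        chunks.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CommentHandler = on_comment
    parser.CharacterDataHandler = on_chars
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return events, "".join(chunks)

# Hypothetical sample document.
SAMPLE = "<note><!-- draft --><to>Ada</to><body>Hello &amp; welcome</body></note>"

events, text = split_markup(SAMPLE)
for event in events:
    print(event)
print(text)  # AdaHello & welcome
```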

This descriptive style traces back to SGML, the Standard Generalized Markup Language, which established the idea of tags drawn from a declared vocabulary. HTML and XML both descend from that lineage, and the pages of the World Wide Web are written in HTML, a largely descriptive markup language. Even a deliberately lightweight format like Markdown is a markup language: it uses a small set of plain-text conventions to mark headings, emphasis, and links.
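To make the Markdown point concrete, here is a toy converter for three of those conventions: a leading # for a heading, asterisks for emphasis, and [label](url) for a link. The tiny_markdown function is a hypothetical helper that handles one line at a time. It does not escape HTML and it ignores most of the rules real Markdown processors follow, so it should be read as a sketch and not as a conforming implementation.

```python
import re

def tiny_markdown(line):
    """Convert one line using three Markdown-style conventions (toy subset)."""
    # *text* marks emphasis.
    line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
    # [label](url) marks a link.
    line = re.sub(r"\[(.+?)\]\((.+?)\)", r'<a href="\2">\1</a>', line)
    # One to six leading '#' characters followed by a space mark a heading.
    match = re.match(r"(#{1,6}) (.*)", line)
    if match:
        level = len(match.group(1))
        return f"<h{level}>{match.group(2)}</h{level}>"
    return f"<p>{line}</p>"

print(tiny_markdown("# Markup *matters*"))
# <h1>Markup <em>matters</em></h1>
print(tiny_markdown("See [the spec](https://example.org/spec)."))
# <p>See <a href="https://example.org/spec">the spec</a>.</p>
```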

Markup languages matter because they separate content from the program that consumes it. A document marked up descriptively can be displayed on a screen, printed, read aloud by assistive software, or indexed by a search engine, all from the same source. That neutrality, where the tags say what the content is rather than locking it to one presentation, is the central insight that the markup tradition contributed to computing.
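A small sketch of that idea: one descriptively tagged source, three consumers. The element names (doc, heading, para) and the three functions are hypothetical, chosen only for this example, and the source is parsed with Python's standard xml.etree.ElementTree module.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical source document with a made-up descriptive vocabulary.
SOURCE = (
    "<doc><heading>Tea</heading>"
    "<para>Boil water.</para><para>Steep leaves.</para></doc>"
)

def to_plain_text(root):
    """Screen or print view: headings in capitals, one element per line."""
    lines = []
    for element in root:
        if element.tag == "heading":
            lines.append(element.text.upper())
        else:
            lines.append(element.text)
    return "\n".join(lines)

def to_html(root):
    """Web view: map each descriptive tag to an HTML element."""
    tag_map = {"heading": "h1", "para": "p"}
    parts = []
    for element in root:
        tag = tag_map[element.tag]
        parts.append(f"<{tag}>{html.escape(element.text)}</{tag}>")
    return "".join(parts)

def to_index(root):
    """Search view: a sorted list of the distinct lowercase words."""
    words = set()
    for element in root:
        for word in element.text.split():
            words.add(word.strip(".").lower())
    return sorted(words)

root = ET.fromstring(SOURCE)
print(to_plain_text(root))
print(to_html(root))   # <h1>Tea</h1><p>Boil water.</p><p>Steep leaves.</p>
print(to_index(root))  # ['boil', 'leaves', 'steep', 'tea', 'water']
```

None of the three functions needs the source to change, because the tags record only what each piece of content is.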

Sources

Last verified June 8, 2026