XML, the Extensible Markup Language, is a text format for representing structured information. The W3C Recommendation describes XML as “a subset of SGML that is completely described in this document,” whose goal “is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.” Where HTML offers one fixed vocabulary of elements, XML lets each application define its own element and attribute names, so the same simple, bracketed syntax can carry configuration files, web service messages, office documents, and arbitrary data records.
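As a small illustration of an application choosing its own vocabulary, the following sketch builds and re-reads a tiny configuration document with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`config`, `server`, `host`, `port`, `note`) are invented for this example and do not come from any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: "config", "server", "host", "port", and "note"
# are names made up for this example; XML itself does not define them.
config = ET.Element("config")
ET.SubElement(config, "server", {"host": "example.org", "port": "8080"})
note = ET.SubElement(config, "note")
note.text = "Names are chosen by the application, not by XML."

# Serialize the tree to a string of XML text.
xml_text = ET.tostring(config, encoding="unicode")

# Parse the text back to show that the same generic syntax round-trips.
parsed = ET.fromstring(xml_text)
server = parsed.find("server")
print(xml_text)
print(server.get("host"), server.get("port"))
```

A different application could use entirely different names with the same parser, which is the sense in which the vocabulary is extensible.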
XML was developed under the W3C beginning in 1996 by a working group whose specification was edited by Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen. The aim was to keep SGML’s power to define custom markup while removing the parts of SGML that made it hard to implement, so that lightweight, conforming parsers could be written quickly and would behave consistently. The first edition, “Extensible Markup Language (XML) 1.0,” was published as a W3C Recommendation on 10 February 1998. The current Fifth Edition, dated 26 November 2008, keeps the same abstract and design goals.
A central contribution of the specification is a precise, two-level notion of conformance. A document is well-formed if it obeys XML’s syntactic rules: a single root element, properly nested and closed tags, quoted attribute values, and correct character usage. A well-formed document is additionally valid if it declares a grammar, such as a Document Type Definition, and conforms to it. Well-formedness is mandatory: a conforming parser must treat any violation as a fatal error, and it can make that check without a schema or any other external grammar. This strict but simple baseline helped make XML parsers small, interoperable, and ubiquitous.
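The well-formedness level can be sketched with Python's standard `xml.etree.ElementTree` module, which raises `ParseError` on any syntactic violation. The helper name `is_well_formed` and the sample documents are assumptions made for this illustration. The sketch does not cover validity: the standard library parser does not validate against a DTD, so that second level would need a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Any well-formedness violation is reported as a parse error.
        return False


# Hypothetical sample documents, one per rule named in the text.
samples = {
    "ok": '<note id="1"><to>Ana</to></note>',
    "bad_nesting": "<a><b></a></b>",   # tags closed in the wrong order
    "two_roots": "<a/><b/>",           # more than one root element
    "unquoted_attr": "<a x=1/>",       # attribute value lacks quotes
}

results = {name: is_well_formed(doc) for name, doc in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'not well-formed'}")
```

Only the first sample passes; the other three each break one of the syntactic rules, and no grammar or schema is consulted to detect that.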
Because XML separates a strict, simple syntax from any particular vocabulary, it became the substrate for an entire family of standards. XSLT handles transformation and XPath handles addressing of parts of a document; richer schema languages such as XML Schema have largely succeeded DTDs for describing and typing document structure; and namespaces let documents combine vocabularies from different sources without name collisions. Higher-level formats defined as XML vocabularies include SOAP for web service messaging, DocBook (in its XML form) for technical documentation, and Office Open XML for office documents.
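The way namespaces keep same-named elements apart, and the way path expressions address parts of a document, can be sketched as follows. The namespace URIs (`urn:example:shop`, `urn:example:shipping`) and the element names are hypothetical. Note that `xml.etree.ElementTree` implements only a limited subset of XPath, so this is an approximation of XPath addressing rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies that both define "title".
doc = """
<order xmlns="urn:example:shop" xmlns:ship="urn:example:shipping">
  <item sku="A1"><title>Widget</title></item>
  <item sku="B2"><title>Gadget</title></item>
  <ship:title>Express</ship:title>
</order>
"""

root = ET.fromstring(doc.strip())

# Prefixes used in queries are local to this mapping; only the URIs matter.
ns = {"shop": "urn:example:shop", "ship": "urn:example:shipping"}

# Titles of items in the shop vocabulary.
item_titles = [t.text for t in root.findall("shop:item/shop:title", ns)]

# The shipping vocabulary's "title" is a different element despite the name.
shipping_title = root.find("ship:title", ns).text

# Attribute predicate: every shop item that carries a "sku" attribute.
skus = [i.get("sku") for i in root.findall(".//shop:item[@sku]", ns)]

print(item_titles, shipping_title, skus)
```

Both vocabularies use the local name `title`, but because each is qualified by a different namespace URI, the two queries never return each other's elements.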
XML’s influence has been broad: for roughly a decade it was a default choice for data interchange, configuration, and document storage on most major platforms. Even as lighter formats such as JSON took over many data-interchange roles, XML remains central wherever validated structure, mixed content, rich schema typing, or document-oriented markup matter, and it remains the standardized realization of the specification’s original goal of bringing generic SGML to the web.