===GenCode===
The first well-known public presentation of markup languages in computer text processing was made by
William W. Tunnicliffe at a conference in 1967, although he preferred to call it
generic coding. It can be seen as a response to the emergence of processing programs such as
RUNOFF that each used their own control notation, often specific to the target typesetting device. In the 1970s, Tunnicliffe led the development of a standard called GenCode for the publishing industry.
Book designer Stanley Rice published speculation along similar lines in 1970.
Brian Reid, in his 1980 dissertation at
Carnegie Mellon University, developed a theory and working implementation of descriptive markup in actual use. However,
IBM researcher
Charles Goldfarb is more commonly considered the inventor of markup languages. Goldfarb developed the basic idea while working on a primitive
document management system intended for law firms in 1969, and helped invent IBM's
Generalized Markup Language (GML) later that same year. GML was first publicly disclosed in 1973. In 1975, Goldfarb moved from
Cambridge, Massachusetts to
Silicon Valley and became a product planner at the
IBM Almaden Research Center. There, he convinced IBM's executives to deploy GML commercially in 1978 as part of IBM's
Document Composition Facility product, and it was widely used in business within a few years.
Standard Generalized Markup Language (SGML), the first standard descriptive markup language, was based on both GML and GenCode. It was the result of an
International Organization for Standardization (ISO) committee that was first chaired by Tunnicliffe, and which Goldfarb also worked on beginning in 1974. The availability of WYSIWYG publishing software supplanted much use of these languages among casual users, though professional publishing work still uses markup to specify the non-visual structure of texts, and WYSIWYG editors now usually save documents in a markup-language-based format.
===TeX===
Another major publishing standard is TeX, created and refined by
Donald Knuth in the 1970s and 1980s. TeX concentrated on the detailed layout of text and font descriptions to typeset mathematical books. This required Knuth to spend considerable time investigating the art of typesetting. TeX is mainly used in academia, where it is a
de facto standard in many scientific disciplines. A TeX macro package known as LaTeX provides a descriptive markup system on top of TeX, and is widely used in both the scientific community and the publishing industry.
===Scribe, GML, and SGML===
The first language to make a clear distinction between structure and presentation was Scribe, developed by Brian Reid and described in his doctoral thesis in 1980. Scribe was revolutionary in a number of ways, introducing the idea of styles separated from the marked-up document, and a
grammar that controlled the usage of descriptive elements. Scribe influenced the development of GML and later SGML, and is a direct ancestor to HTML and LaTeX. In the early 1980s, the idea that markup should focus on the structural aspects of a document and leave the visual presentation of that structure to the interpreter led to the creation of SGML. The language was developed by a committee chaired by Goldfarb. It incorporated ideas from many different sources, including Tunnicliffe's project, GenCode. Sharon Adler, Anders Berglund, and James A. Marke were also key members of the SGML committee. SGML specifies a
syntax for including the markup in documents, as well as one for separately describing what tags are allowed, and where (the
document type definition (DTD), later known as a
schema). This allows authors to create and use any markup they want, selecting tags that make the most sense to them and are named in their own
natural languages, while also allowing automated verification. Thus, SGML is properly a
metalanguage, and many markup languages are derived from it. From the late 1980s onward, most substantial new markup languages have been based on SGML, including the
Text Encoding Initiative (TEI) guidelines and
DocBook. SGML was promulgated as the ISO 8879 standard in 1986. SGML found wide acceptance and use in fields with very large-scale
documentation requirements. However, many found it cumbersome and difficult to learn—a side effect of its design attempting to do too much and being too flexible. For example, SGML made end tags (or start tags, or both) optional in certain contexts, because its developers thought markup would be done manually by overworked support staff who would appreciate saving keystrokes.
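The idea of describing separately which tags are allowed, and where, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (`memo`, `to`, `from`, `body`, `para`) and the `ALLOWED_CHILDREN` table are hypothetical, the sample documents use XML syntax so that the standard-library parser can read them, and a real DTD also constrains the order and number of children, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: for each element, the child elements it may contain.
# A real DTD is richer (ordering, repetition, attributes); this is a simplification.
ALLOWED_CHILDREN = {
    "memo": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": {"para"},
    "para": set(),
}


def validate(element):
    """Return a list of violations of ALLOWED_CHILDREN found at or below element."""
    problems = []
    if element.tag not in ALLOWED_CHILDREN:
        problems.append(f"unknown element <{element.tag}>")
        return problems
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        # Recurse so that nested violations are reported too.
        problems.extend(validate(child))
    return problems
```

Calling `validate` on a parsed document returns an empty list when every element appears only where the table permits it, and a list of human-readable messages otherwise. Keeping the rules in a table separate from the documents is what makes this kind of automated verification possible.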
===HTML===
In 1989, computer scientist
Tim Berners-Lee wrote a memo proposing an
Internet-based
hypertext system, then specified HTML and wrote the browser and server software in late 1990. The first publicly available description of HTML was a document called "HTML Tags", first mentioned on the Internet by Berners-Lee in late 1991. It describes 18 elements comprising the initial, relatively simple design of HTML. Except for the
hyperlink tag, these were strongly influenced by
SGMLguid, an in-house SGML-based documentation format at
CERN, and very similar to the sample schema in the SGML standard. Eleven of these elements still exist in HTML 4. Berners-Lee considered HTML an SGML application. The
Internet Engineering Task Force (IETF) formally defined it as such with the mid-1993 publication of the first proposal for an HTML
specification: "Hypertext Markup Language (HTML)" by Berners-Lee and
Dan Connolly, which included an SGML DTD to define the grammar. Many of the HTML text elements are found in the 1988 ISO technical report
TR 9537 Techniques for using SGML, which in turn covers the features of early text formatting languages, such as that used by the
RUNOFF command developed in the early 1960s for the
Compatible Time-Sharing System operating system. These formatting commands were derived from those used by typesetters to manually format documents. Steven DeRose argues that HTML's use of descriptive markup (and the influence of SGML in particular) was a major factor in the success of the Web, because of the flexibility and
extensibility that it enabled. HTML became the main markup language for creating web pages and other information that can be displayed in a web browser and is likely the most used markup language in the world in the 21st century.
===XML===
XML (Extensible Markup Language) is a widely used meta markup language. It was developed by the
World Wide Web Consortium (W3C) in a committee created and chaired by
Jon Bosak. The main purpose of XML was to simplify SGML by focusing on a particular use case—documents on the Internet. XML remains a metalanguage like SGML, allowing users to create any tags needed (hence
extensible) and then describe those tags and their permitted uses. XML adoption was hastened by the fact that every XML document can be written so that it is also an SGML document, allowing existing SGML users and software to switch to XML fairly easily. At the same time, XML eliminates many complex features of SGML to simplify implementation in environments such as documents and publications. It appears to balance simplicity and flexibility while supporting very robust schema definitions and validation tools, and it was rapidly adopted for many uses. XML is now widely used for communicating data between applications, serializing program data, hardware communication protocols, vector graphics, and other uses besides documents.
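As a small illustration of XML used for serializing program data, the following Python sketch writes a flat dictionary to XML and reads it back with the standard library's `xml.etree.ElementTree`. The function names `to_xml` and `from_xml`, the root element name `record`, and the assumption that every key is a legal XML element name are all choices made for this example, not part of any standard.

```python
import xml.etree.ElementTree as ET


def to_xml(record, root_name="record"):
    """Serialize a flat dict to an XML string, one child element per key.

    Assumes every key is a legal XML element name (illustrative simplification).
    """
    root = ET.Element(root_name)
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML produced by to_xml back into a dict of strings."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}
```

Note that the round trip is not type-preserving: all values come back as strings, because plain XML text carries no type information. Recovering types is one of the jobs that schema languages and application conventions take on.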
===XHTML===
From January 2000 until
HTML 5 was released, all
W3C recommendations for HTML were based on XML, using
XHTML (Extensible HyperText Markup Language). The language specification requires that XHTML documents be
well-formed XML documents. This allows for more rigorous and robust documents, by avoiding many syntax errors which historically led to unwanted browser behavior, while still using document components familiar to HTML users. One of the most noticeable differences between HTML and XHTML is the latter's rule that
all tags must be closed: empty HTML tags such as <br> must either be
closed with a regular end-tag, or replaced by a special form: <br /> (the space before the slash on the end tag is optional but frequently used, because it enables some pre-XML web browsers and SGML parsers to accept the tag). Another difference is that all
attribute values in tags must be quoted. Both these differences are commonly criticized as verbose but also praised because they make it far easier to detect, localize, and repair errors. Finally, all tag and attribute names within the XHTML namespace must be lowercase to be valid. HTML, on the other hand, was case-insensitive.
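The closing and quoting rules are ordinary XML well-formedness rules, so a generic XML parser can check them. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample fragments are assumptions made for illustration, and the check covers well-formedness only, not validity against the XHTML schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if fragment parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False
```

With this helper, `<p>one<br>two</p>` is rejected because `<br>` is never closed, while `<p>one<br />two</p>` and `<p>one<br></br>two</p>` are both accepted. Likewise `<a href=x>link</a>` is rejected for its unquoted attribute value, and `<a href="x">link</a>` is accepted. The lowercase rule is different in kind: `<P>text</P>` is well-formed XML and passes this check, so detecting it requires validating against the XHTML element names rather than merely parsing.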
===Other XML-based applications===
Many XML-based applications exist, including the
Resource Description Framework as
RDF/XML,
XForms,
DocBook,
SOAP, and the
Web Ontology Language (OWL). For a partial list of these, see
list of XML markup languages.
==Features==