The TeX software incorporates several aspects that were not available, or were of lower quality, in other typesetting programs at the time when TeX was released. Some of the innovations are based on interesting algorithms, and have led to several theses for Knuth's students. While some of these discoveries have now been incorporated into other typesetting programs, others, such as the rules for mathematical spacing, are still unique.
Mathematical spacing font Since the primary goal of the TeX language is high-quality typesetting for publishers of books, Knuth gave a lot of attention to the spacing rules for mathematical formulae. He took three bodies of work that he considered to be standards of excellence for mathematical typography: the books typeset by the
Addison-Wesley Publishing house (the publisher of
The Art of Computer Programming) under the supervision of Hans Wolf; editions of the mathematical journal
Acta Mathematica from around 1910; and a copy of
Indagationes Mathematicae, a
Dutch mathematics journal. Knuth looked closely at these printed papers to sort out a set of rules for spacing. The typesetting of math in TeX is not without criticism, particularly with respect to technical details of the font metrics, which were designed in an era when significant attention was paid to storage requirements. This resulted in some "hacks" overloading some fields, which in turn required other "hacks". On an aesthetics level, the rendering of radicals has also been criticized. The
OpenType math font specification largely borrows from TeX, but has some new features/enhancements.
Hyphenation and justification In comparison with manual typesetting, the problem of
justification is easy to solve with a digital system such as TeX, which, provided that good points for line breaking have been defined, can automatically spread the spaces between words to fill in the line. The problem is thus to find the set of breakpoints that will give the most visually pleasing result. Many line-breaking algorithms use a
greedy algorithm, where the breakpoints for each line are determined one after the other, and no breakpoint is changed after it has been chosen. Such a system is not able to define a breakpoint depending on the effect that it will have on the following lines. In comparison, TeX utilizes a globally optimizing
line-breaking algorithm developed by Donald Knuth and Michael Plass that considers all the possible breakpoints in a paragraph. Formally, the algorithm defines a value called
badness associated with each possible line break; the badness is increased if the spaces on the line must stretch or shrink too much to make the line the correct width. Penalties are added if a breakpoint is particularly undesirable: for example, if a word must be
hyphenated, if two lines in a row are hyphenated, or if a very loose line is immediately followed by a very tight line. The algorithm will then find the breakpoints that will minimize the sum of squares of the badness (including penalties) of the resulting lines. If the paragraph contains n possible breakpoints, the number of situations that must be evaluated naively is 2^n. However, by using the method of
dynamic programming, the complexity of the algorithm can be brought down to O(n^2) (see
Big O notation). Further simplifications (for example, not testing extremely unlikely breakpoints such as a hyphenation in the first word of a paragraph, or very overfull lines) lead to an efficient algorithm whose running time is O(n w), where w is the width of a line. A similar algorithm is used to determine the best way to break paragraphs across two pages, in order to avoid
widows or
orphans (lines that appear alone on a page while the rest of the paragraph is on the following or preceding page). However, in general, a thesis by Michael Plass shows how the page-breaking problem can be
NP-complete because of the added complication of placing figures. TeX's line-breaking algorithm has been adopted by several other programs, such as
Adobe InDesign (a
desktop publishing application) and the
GNU fmt Unix command line utility. If no suitable line break can be found for a line, the system will try to hyphenate a word. The original version of TeX used a hyphenation algorithm based on a set of rules for the removal of prefixes and suffixes of words, and for deciding if it should insert a break between the two consonants in a pattern of the form
vowel–
consonant–
consonant–
vowel (which is possible most of the time). TeX82 introduced a new hyphenation algorithm, designed by
Frank Liang in 1983, to assign priorities to breakpoints in letter groups. A list of hyphenation patterns is first generated automatically from a corpus of hyphenated words (a list of 50,000 words). If TeX must find the acceptable hyphenation positions in the word
encyclopedia, for example, it will consider all the subwords of the extended word
.encyclopedia., where
. is a special marker to indicate the beginning or end of the word. The list of subwords includes all the subwords of length 1 (
.,
e,
n,
c,
y, etc.), of length 2 (
.e,
en,
nc, etc.), etc., up to the subword of length 14, which is the word itself, including the markers. TeX will then look into its list of hyphenation patterns, and find subwords for which it has calculated the desirability of hyphenation at each position. In the case of our word, 11 such patterns can be matched, namely 1c4l4, 1cy, 1d4i3a, 4edi, e3dia, 2i1a, ope5d, 2p2ed, 3pedi, pedia4, y1c. For each position in the word, TeX will calculate the
maximum value obtained among all matching patterns, yielding en1cy1c4l4o3p4e5d4i3a4. Finally, the acceptable positions are those indicated by an
odd number, yielding the acceptable hyphenations
en-cy-clo-pe-di-a. This system based on subwords allows the definition of very general patterns (such as 2i1a), with low indicative numbers (either odd or even), which can then be superseded by more specific patterns (such as 1d4i3a) if necessary. These patterns find about 90% of the hyphens in the original dictionary; more importantly, they do not insert any spurious hyphen. In addition, a list of exceptions (words for which the patterns do not predict the correct hyphenation) are included with the Plain TeX format; additional ones can be specified by the user.
Metafont Metafont, not strictly part of TeX, is a font description system which allows the designer to describe characters algorithmically. It uses
Bézier curves in a fairly standard way to generate the actual characters to be displayed, but Knuth devotes substantial attention to the
rasterizing problem on
bitmapped displays. Another thesis, by
John Hobby, further explores this problem of digitizing "brush trajectories". This term derives from the fact that Metafont describes characters as having been drawn by abstract brushes (and erasers). It is commonly believed that TeX is based on bitmap fonts but, in fact, these programs "know" nothing about the fonts that they are using other than their dimensions. It is the responsibility of the device driver to appropriately handle fonts of other types, including PostScript Type 1 and TrueType. Computer Modern (commonly known as "the TeX font") is freely available in Type 1 format, as are the AMS math fonts. Users of TeX systems that output directly to PDF, such as pdfTeX, XeTeX, or LuaTeX, generally never use Metafont output at all.
Macro language TeX documents are written and programmed using an unusual macro language. Broadly speaking, the running of this macro language involves expansion and execution stages which do not interact directly. Expansion includes both literal expansion of macro definitions as well as conditional branching, and execution involves such tasks as setting variables/registers and the actual typesetting process of adding glyphs to boxes. The definition of a macro not only includes a list of commands but also the syntax of the call. It differs with most widely used
lexical preprocessors like
M4, in that the body of a macro gets tokenized at definition time. The TeX macro language has been used to write larger document production systems, most notably including LaTeX and ConTeXt. == Operation ==