A hyphenation algorithm is a set of rules, especially one codified for implementation in a computer program, that decides at which points a word can be broken over two lines with a hyphen. For example, a hyphenation algorithm might decide that impeachment can be broken as impeach-ment or im-peachment but not impe-achment.

Word-breaking rules are complex partly because different dialects of English tend to differ on hyphenation: American English tends to work on sound, but British English tends to look to the origins of the word and then to sound. There are also a large number of exceptions, which further complicates matters. Some guidelines can be found in Major Keary's article "On Hyphenation – Anarchy of Pedantry."

Among the
algorithmic approaches to hyphenation, the one implemented in the TeX typesetting system is widely used. It is thoroughly documented in the first two volumes of Computers and Typesetting by Donald Knuth and in Franklin Mark Liang's dissertation. The aim of Liang's work was to make the algorithm as accurate as possible and to keep exceptions to a minimum.

In addition to rule-based patterns, algorithmic approaches often include hardcoded, word-by-word exceptions for sufficiently important words whose hyphenation the patterns do not produce. The original TeX hyphenation algorithm used an exception list of approximately 300 words. In TeX82, over 1000 words were added to this list. In the hyphenation patterns for American English that ship with Knuth's Plain TeX, the exception list contains only 14 words.
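
The interplay of patterns and exceptions described above can be illustrated with a minimal Python sketch of Liang-style pattern matching. It is not TeX's implementation: the function names, the data layout, and the two-entry exception table are assumptions made for illustration, and the nine patterns are only the handful needed for the word "hyphenation", not a usable pattern set. Digits inside a pattern rate the gap between letters, the highest digit wins at each gap, and an odd result permits a break. The minimum fragment lengths of 2 and 3 letters mirror TeX's usual defaults.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and per-gap levels."""
    letters = ""
    levels = [0]  # levels[i] rates the gap just before letters[i]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)
        else:
            letters += ch
            levels.append(0)
    return letters, levels


# Illustrative subset only: just enough patterns for the demo word.
RAW_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
PATTERNS = dict(parse_pattern(p) for p in RAW_PATTERNS)

# Hypothetical exception table: word -> pieces, consulted before the patterns.
EXCEPTIONS = {"table": ["ta", "ble"], "project": ["project"]}


def hyphenate(word, patterns=PATTERNS, exceptions=EXCEPTIONS, left_min=2, right_min=3):
    """Return the word split into pieces at the permitted break points."""
    lower = word.lower()
    if lower in exceptions:
        return list(exceptions[lower])

    padded = "." + lower + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)
    # Try every substring; at each gap keep the highest level seen.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = patterns.get(padded[start:end])
            if levels is None:
                continue
            for offset, level in enumerate(levels):
                scores[start + offset] = max(scores[start + offset], level)

    pieces, last = [], 0
    # The gap before word[i] is scores[i + 1] because of the leading dot.
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:  # odd level: a break is allowed here
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print("-".join(hyphenate("hyphenation")))  # hy-phen-ation
print("-".join(hyphenate("table")))        # ta-ble (from the exception table)
```

Checking the exception table first keeps the pattern set small: a word the patterns would get wrong is corrected by one table entry rather than by additional, more specific patterns.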

== In TeX ==
Ports of the TeX hyphenation algorithm are available as libraries for several programming languages, including Haskell, JavaScript, Perl, PostScript, Python, Ruby, and C#. TeX can be made to show the hyphenation points it finds for a word in the log with the \showhyphens command. In
LaTeX, users can correct hyphenation with:

\hyphenation{words}

The \hyphenation command declares allowed hyphenation points. Here, words is a list of words, separated by spaces, in which each hyphenation point is indicated by a - character. For example, \hyphenation{fortran er-go-no-mic} declares that in the current job "fortran" should not be hyphenated and that if "ergonomic" must be hyphenated, it will be at one of the indicated points.

The command has several limits, however. For example, the stock \hyphenation command accepts only ASCII letters by default, so it cannot be used to correct hyphenation for words with non-ASCII characters (such as ä, é, ç), which are very common in many languages. Simple workarounds exist, however.

== See also ==