According to the
Chen–Fox–Lyndon theorem, every string may be formed in a unique way by concatenating a sequence of Lyndon words, in such a way that the words in the sequence are nonincreasing lexicographically. The final Lyndon word in this sequence is the lexicographically smallest suffix of the given string. A factorization into a nonincreasing sequence of Lyndon words (the so-called Lyndon factorization) can be constructed in
linear time. and in algorithms for
digital geometry. Such factorizations can be written (uniquely) as finite binary trees, with the leaves labelled by the alphabet, with each rightward branch given by the final Lyndon word in the sequence. Such trees are sometimes called
standard bracketings and can be taken as factorization of the elements of a
free group or as the basis elements for a
free Lie algebra. These trees are a special case of
Hall trees (as Lyndon words are a special case of Hall words), and so likewise, the Hall words provide a standard order, called the
commutator collecting process for groups, and basis for Lie algebras. Indeed, they provide an explicit construction for the commutators appearing in the
Poincaré–Birkhoff–Witt theorem needed for the construction of
universal enveloping algebras. Every Lyndon word can be understood as a
permutation, the
suffix standard permutation.
Duval algorithm developed an algorithm for finding the standard factorization that runs in linear time and constant space. It iterates over a string trying to find as long a Lyndon word as possible. When it finds one, it adds it to the result list and proceeds to search the remaining part of the string. The resulting list of strings is the standard factorization of the given string. More formal description of the algorithm follows. Given a string
S of length
N, one should proceed with the following steps: • Let
m be the index of the symbol-candidate to be appended to the already collected symbols. Initially,
m = 1 (indices of symbols in a string start from zero). • Let
k be the index of the symbol we would compare others to. Initially,
k = 0. • While
k and
m are less than
N, compare
S[
k] (the
k-th symbol of the string
S) to
S[
m]. There are three possible outcomes: •
S[
k] is equal to
S[
m]: append
S[
m] to the current collected symbols. Increment
k and
m. •
S[
k] is less than
S[
m]: if we append
S[
m] to the current collected symbols, we'll get a Lyndon word. But we can't add it to the result list yet because it may be just a part of a larger Lyndon word. Thus, just increment
m and set
k to 0 so the next symbol would be compared to the first one in the string. •
S[
k] is greater than
S[
m]: if we append
S[
m] to the current collected symbols, it will be neither a Lyndon word nor a possible beginning of one. Thus, add the first
m−
k collected symbols to the result list, remove them from the string, set
m to 1 and
k to 0 so that they point to the second and the first symbol of the string respectively. • When
m >
N, it is essentially the same as encountering minus infinity, thus, add the first
m−
k collected symbols to the result list after removing them from the string, set
m to 1 and
k to 0, and return to the previous step. • Add
S to the result list. ==Connection to de Bruijn sequences==