An early topic model was described by Papadimitriou, Raghavan, Tamaki and Vempala in 1998. Another one, called
probabilistic latent semantic analysis (PLSA), was created by Thomas Hofmann in 1999.
Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of PLSA. Developed by David Blei, Andrew Ng, and Michael I. Jordan in 2002, LDA introduces sparse Dirichlet prior distributions over document-topic and topic-word distributions, encoding the intuition that documents cover a small number of topics and that topics often use a small number of words. Other topic models are generally extensions of LDA, such as
Pachinko allocation, which improves on LDA by modeling correlations between topics in addition to the word correlations which constitute topics. Hierarchical latent tree analysis (HLTA) is an alternative to LDA; it models word co-occurrence using a tree of latent variables, and the states of the latent variables, which correspond to soft clusters of documents, are interpreted as topics.

Topic detection can be illustrated with a document-word matrix in which every column corresponds to a document and every row to a word. A cell stores the frequency of a word in a document, with dark cells indicating high word frequencies. Grouping the rows and columns of such a matrix groups documents that use similar words and, likewise, words that occur in a similar set of documents; such groups of words are then called topics. Most topic models, such as LDA, only group documents, based on a more sophisticated probabilistic mechanism.

==Topic models for context information==