Distributional semantics favors the use of linear algebra as a computational tool and representational framework. The basic approach is to collect distributional information in high-dimensional vectors, and to define distributional/semantic similarity in terms of vector similarity. Different kinds of similarities can be extracted depending on which type of distributional information is used to build the vectors:

* topical similarities can be extracted by populating the vectors with information on which text regions the linguistic items occur in;
* paradigmatic similarities can be extracted by populating the vectors with information on which other linguistic items the items co-occur with.

Note that the latter type of vectors can also be used to extract syntagmatic similarities by looking at the individual vector components.
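For concreteness, the sketch below builds such vectors by hand; all words, contexts and counts are invented for illustration, and NumPy is assumed. Cosine similarity between whole vectors yields paradigmatic similarity, while the individual components of a single vector yield syntagmatic associations:

<syntaxhighlight lang="python">
import numpy as np

# Toy word-by-context co-occurrence counts; every word and count here is
# invented for illustration.
contexts = ["drink", "pour", "eat", "bark"]
vectors = {
    "coffee": np.array([5.0, 4.0, 1.0, 0.0]),
    "tea":    np.array([6.0, 3.0, 0.0, 0.0]),
    "dog":    np.array([0.0, 0.0, 2.0, 7.0]),
}

def cosine(u, v):
    # Vector similarity stands in for semantic similarity.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Paradigmatic similarity: whole vectors are compared, so words that occur
# with similar contexts come out similar.
print(cosine(vectors["coffee"], vectors["tea"]))   # high, close to 1
print(cosine(vectors["coffee"], vectors["dog"]))   # low, close to 0

# Syntagmatic similarity: individual components are inspected; the largest
# entries name the contexts a word most often co-occurs with.
print(max(zip(contexts, vectors["coffee"]), key=lambda p: p[1]))  # ('drink', 5.0)
</syntaxhighlight>

Any vector-similarity measure could replace cosine here; the choice is one of the model parameters discussed below.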
The basic idea of a correlation between distributional and semantic similarity can be operationalized in many different ways. There is a rich variety of computational models implementing distributional semantics, including latent semantic analysis (LSA), Hyperspace Analogue to Language (HAL), syntax- or dependency-based models, random indexing, semantic folding and various variants of the topic model.
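To make one of these models concrete: LSA applies a truncated singular value decomposition to a word-by-document matrix, so that topical similarity becomes proximity in a low-dimensional latent space. The following is only an illustrative sketch with invented counts, again assuming NumPy:

<syntaxhighlight lang="python">
import numpy as np

# Toy word-by-document count matrix; rows are words, columns are documents,
# and every count is invented for illustration.
words = ["coffee", "tea", "cup", "dog"]
X = np.array([
    [3.0, 2.0, 0.0, 0.0],   # coffee
    [2.0, 3.0, 1.0, 0.0],   # tea
    [2.0, 2.0, 1.0, 0.0],   # cup
    [0.0, 0.0, 0.0, 4.0],   # dog
])

# LSA's core step: a truncated singular value decomposition. Keeping only
# the top-k singular dimensions yields dense word vectors whose proximity
# reflects topical similarity.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]   # words in the k-dimensional latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(word_vecs[0], word_vecs[1]))  # coffee vs. tea: high
print(cosine(word_vecs[0], word_vecs[3]))  # coffee vs. dog: near 0
</syntaxhighlight>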
Distributional semantic models differ primarily with respect to the following parameters (a sketch combining several of them follows the list):

* Context type (text regions vs. linguistic items)
* Context window (size, extension, etc.)
* Frequency weighting (e.g. entropy, pointwise mutual information)
* Dimension reduction (e.g. random indexing, singular value decomposition)
* Similarity measure (e.g. cosine similarity, Minkowski distance)
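The sketch below shows, under the same toy assumptions as above (invented counts, NumPy), how several of these parameters compose into one pipeline: linguistic items as context type, positive pointwise mutual information as frequency weighting, truncated singular value decomposition as dimension reduction, and cosine as similarity measure:

<syntaxhighlight lang="python">
import numpy as np

# Invented word-by-word co-occurrence counts (context type: linguistic
# items); rows and columns share one hypothetical four-word vocabulary.
counts = np.array([
    [0.0, 8.0, 4.0, 1.0],
    [8.0, 0.0, 5.0, 1.0],
    [4.0, 5.0, 0.0, 6.0],
    [1.0, 1.0, 6.0, 0.0],
])

# Frequency weighting: positive pointwise mutual information,
# PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ), with negatives clipped to 0.
total = counts.sum()
p_wc = counts / total
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))   # zero counts give log(0) = -inf ...
ppmi = np.maximum(pmi, 0.0)            # ... which the clipping maps to 0

# Dimension reduction: truncated SVD down to k latent dimensions.
U, s, Vt = np.linalg.svd(ppmi, full_matrices=False)
k = 2
reduced = U[:, :k] * s[:k]

# Similarity measure: cosine similarity between the reduced vectors.
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(reduced[0], reduced[1]))
</syntaxhighlight>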
Distributional semantic models that use linguistic items as context have also been referred to as word space or vector space models.

==Beyond Lexical Semantics==