The cosine of two non-zero vectors can be derived by using the Euclidean dot product formula:
:\mathbf{A}\cdot\mathbf{B} = \left\|\mathbf{A}\right\| \left\|\mathbf{B}\right\| \cos\theta
Given two n-dimensional vectors of attributes, A and B, the cosine similarity, \cos(\theta), is represented using a dot product and magnitude as
:\text{cosine similarity} = S_C(A,B) := \cos(\theta) = {\mathbf{A} \cdot \mathbf{B} \over \|\mathbf{A}\| \|\mathbf{B}\|} = \frac{ \sum\limits_{i=1}^{n}{A_i B_i} }{ \sqrt{\sum\limits_{i=1}^{n}{A_i^2}} \cdot \sqrt{\sum\limits_{i=1}^{n}{B_i^2}} },
where A_i and B_i are the ith components of vectors \mathbf{A} and \mathbf{B}, respectively.

The resulting similarity ranges from −1, meaning exactly opposite, to +1, meaning exactly the same, with 0 indicating orthogonality (no correlation), while in-between values indicate intermediate similarity or dissimilarity.
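As a minimal sketch of the formula above (assuming NumPy is available; the helper name cosine_similarity is chosen here only for illustration), the similarity can be computed directly from the dot product and the vector magnitudes:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # S_C(A, B) = (A . B) / (||A|| ||B||), defined for non-zero vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

A = np.array([3.0, 4.0])
B = np.array([6.0, 8.0])     # parallel to A, so the similarity is +1
C = np.array([-3.0, -4.0])   # opposite to A, so the similarity is -1

print(cosine_similarity(A, B))   # 1.0
print(cosine_similarity(A, C))   # -1.0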
For text matching, the attribute vectors A and B are usually the term frequency vectors of the documents. Cosine similarity can be seen as a method of normalizing document length during comparison. In the case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since the term frequencies cannot be negative. This remains true when using TF-IDF weights. The angle between two term frequency vectors cannot be greater than 90°.

If the attribute vectors are normalized by subtracting the vector means (e.g., A - \bar{A}), the measure is called the centered cosine similarity and is equivalent to the Pearson correlation coefficient. For an example of centering,
: \text{if}\, A = [A_1, A_2]^T, \text{ then } \bar{A} = \left[\frac{(A_1+A_2)}{2},\frac{(A_1+A_2)}{2}\right]^T,
: \text{ so } A-\bar{A}= \left[\frac{(A_1-A_2)}{2},\frac{(-A_1+A_2)}{2}\right]^T.
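The following Python sketch (sample data chosen only for illustration, and assuming NumPy) checks this equivalence numerically: the cosine similarity of the mean-centered vectors agrees with the Pearson correlation coefficient.

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

A = np.array([2.0, 5.0, 7.0, 10.0])
B = np.array([1.0, 4.0, 4.0, 9.0])

# Centering: subtract each vector's mean from its components.
centered = cosine_similarity(A - A.mean(), B - B.mean())
pearson = np.corrcoef(A, B)[0, 1]

print(centered, pearson)  # the two values agree (up to floating-point rounding)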
=== Cosine distance ===
When the distance between two unit-length vectors is defined to be the length of their vector difference, then
:\operatorname{dist}(\mathbf A, \mathbf B) = \sqrt{(\mathbf A - \mathbf B) \cdot (\mathbf A - \mathbf B)} = \sqrt{\mathbf A \cdot \mathbf A - 2(\mathbf A \cdot \mathbf B) + \mathbf B \cdot \mathbf B} = \sqrt{2(1-S_C(\mathbf A, \mathbf B))}\,.
Nonetheless, the cosine distance is often defined without the square root or factor of 2:
:\text{cosine distance} = D_C(A,B) := 1 - S_C(A,B)\,.
By virtue of being proportional to the squared Euclidean distance between unit vectors, the cosine distance is not a true distance metric: it does not exhibit the triangle inequality property (or, more formally, the Schwarz inequality), and it violates the coincidence axiom. To repair the triangle inequality property while maintaining the same ordering, one can convert to the Euclidean distance \sqrt{2(1- S_C(A,B))} or to the angular distance described below. Alternatively, the triangle inequality that does work for angular distances can be expressed directly in terms of the cosines; see below.
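As a small sketch (the vectors are chosen here purely for illustration, under the same NumPy assumption as above), the snippet below exhibits a triple for which the cosine distance D_C violates the triangle inequality, while the repaired Euclidean form \sqrt{2(1-S_C)} satisfies it:

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_distance(a, b):
    return 1.0 - cosine_similarity(a, b)

def repaired_distance(a, b):
    # Euclidean distance between the unit-length versions of a and b.
    return float(np.sqrt(2.0 * (1.0 - cosine_similarity(a, b))))

A = np.array([1.0, 0.0])
B = np.array([1.0, 1.0])   # 45 degrees from both A and C
C = np.array([0.0, 1.0])   # 90 degrees from A

# Cosine distance: D_C(A, C) = 1 > D_C(A, B) + D_C(B, C) ~= 0.586,
# so the triangle inequality fails.
print(cosine_distance(A, C), cosine_distance(A, B) + cosine_distance(B, C))

# Repaired form: sqrt(2) ~= 1.414 <= 0.765 + 0.765,
# so the triangle inequality holds for this triple.
print(repaired_distance(A, C), repaired_distance(A, B) + repaired_distance(B, C))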
=== Angular distance and similarity ===
The normalized angle, referred to as angular distance, between any two vectors A and B is a formal distance metric and can be calculated from the cosine similarity. The complement of the angular distance metric can then be used to define an angular similarity function bounded between 0 and 1, inclusive.

When the vector elements may be positive or negative:
:\text{angular distance} = D_{\theta} := \frac{ \arccos( \text{cosine similarity} ) }{ \pi } = \frac{\theta}{\pi}
:\text{angular similarity} = S_{\theta} := 1 - \text{angular distance} = 1 - \frac{\theta}{\pi}
Or, if the vector elements are always positive:
:\text{angular distance} = D_{\theta} := \frac{ 2 \cdot \arccos( \text{cosine similarity} ) }{ \pi } = \frac{2\theta}{\pi}
:\text{angular similarity} = S_{\theta} := 1 - \text{angular distance} = 1 - \frac{2\theta}{\pi}
Unfortunately, computing the inverse cosine (\arccos) function is slow, making the use of the angular distance more computationally expensive than using the more common (but not metric) cosine distance above.
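A rough Python sketch of the signed-element case (reusing the illustrative vectors from the previous snippet; not a reference implementation) computes the angular distance via \arccos and checks the triangle inequality for the same triple that defeated the plain cosine distance:

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def angular_distance(a, b):
    # Clipping guards against tiny floating-point excursions outside [-1, 1].
    return float(np.arccos(np.clip(cosine_similarity(a, b), -1.0, 1.0)) / np.pi)

A = np.array([1.0, 0.0])
B = np.array([1.0, 1.0])
C = np.array([0.0, 1.0])

# D_theta(A, C) = 0.5 and D_theta(A, B) + D_theta(B, C) = 0.25 + 0.25,
# so the triangle inequality holds (here with equality, up to rounding).
print(angular_distance(A, C), angular_distance(A, B) + angular_distance(B, C))

# Angular similarity is the complement of the angular distance.
print(1.0 - angular_distance(A, B))  # 0.75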
=== L2-normalized Euclidean distance ===
Another effective proxy for cosine distance can be obtained by L_2 normalisation of the vectors, followed by the application of normal Euclidean distance. Using this technique, each term in each vector is first divided by the magnitude of the vector, yielding a vector of unit length. Then the Euclidean distance over the end-points of any two vectors is a proper metric which gives the same ordering as the cosine distance for any comparison of vectors (the cosine distance is a monotonic transformation of this Euclidean distance; see below), and which furthermore avoids the potentially expensive trigonometric operations required to yield a proper metric. Once the normalisation has occurred, the vector space can be used with the full range of techniques available to any Euclidean space, notably standard dimensionality reduction techniques. This normalised form of the distance is often used within many deep learning algorithms.
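A small sketch under the same assumptions as the earlier snippets (NumPy, vectors chosen only for illustration): the Euclidean distance between the L_2-normalised vectors coincides with \sqrt{2(1-S_C(A,B))}, the metric form given in the cosine distance section.

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def l2_normalized_distance(a, b):
    # Divide each vector by its magnitude, then take the ordinary Euclidean distance.
    a_hat = a / np.linalg.norm(a)
    b_hat = b / np.linalg.norm(b)
    return float(np.linalg.norm(a_hat - b_hat))

A = np.array([1.0, 2.0, 3.0])
B = np.array([3.0, 1.0, 0.0])

# Both lines print the same value.
print(l2_normalized_distance(A, B))
print(np.sqrt(2.0 * (1.0 - cosine_similarity(A, B))))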
=== Otsuka–Ochiai coefficient ===
In biology, there is a similar concept known as the Otsuka–Ochiai coefficient, named after Yanosuke Otsuka (also spelled as Ōtsuka, Ootsuka or Otuka) and Akira Ochiai, also known as the Ochiai–Barkman or Ochiai coefficient, which can be represented as:
:K = \frac{|A \cap B|}{\sqrt{|A| \times |B|}}
Here, A and B are sets, and |A| is the number of elements in A. If sets are represented as bit vectors, the Otsuka–Ochiai coefficient can be seen to be the same as the cosine similarity.
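As a brief sketch (the example sets and the NumPy usage are illustrative assumptions, not from the source), the set form of the coefficient and the cosine similarity of the corresponding bit vectors give the same value:

import numpy as np

def otsuka_ochiai(a: set, b: set) -> float:
    return float(len(a & b) / np.sqrt(len(a) * len(b)))

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

A = {"cat", "dog", "fish"}
B = {"dog", "fish", "bird", "ant"}

# Bit-vector representation over the combined vocabulary.
vocab = sorted(A | B)
A_bits = np.array([1.0 if w in A else 0.0 for w in vocab])
B_bits = np.array([1.0 if w in B else 0.0 for w in vocab])

print(otsuka_ochiai(A, B))                # 2 / sqrt(3 * 4) ~= 0.577
print(cosine_similarity(A_bits, B_bits))  # same value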
The coefficient is identical to the score introduced by Godfrey Thomson. In a recent book, the coefficient is tentatively misattributed to another Japanese researcher with the family name Otsuka. The confusion arises because in 1957 Akira Ochiai attributed the coefficient only to Otsuka (no first name mentioned), who in turn cited the original 1936 article by Yanosuke Otsuka.

== Properties ==