Let k\text{-distance}(A) be the distance of the object
A to the
k-th nearest neighbor. Note that the set of the
k nearest neighbors includes all objects at this distance, which can in the case of a "tie" be more than
k objects. We denote the set of
k nearest neighbors as N_k(A). This distance is used to define what is called the
reachability distance: \text{reachability-distance}_k(A,B)=\max\{k\text{-distance}(B), d(A,B)\}. In words, the
reachability distance of an object
A from B is the true distance between the two objects, but at least the k\text{-distance} of
B. Objects that belong to the
k nearest neighbors of
B (the "core" of
B, see
DBSCAN cluster analysis) are considered to be equally distant. The reason for this is to reduce the statistical fluctuations between all points
A close to
B, where increasing the value for
k increases the smoothing effect. Note that this is not a
distance in the mathematical sense, since it is not symmetric. (It is a common mistake to always use k\text{-distance}(A) instead; this yields a slightly different method, referred to as Simplified-LOF.) The
local reachability density of an object
A is defined by :\text{lrd}_k(A) := \frac{|N_k(A)|}{\sum_{B \in N_k(A)} \text{reachability-distance}_k(A, B)} which is the inverse of the average reachability distance of the object
A from its neighbors. Note that it is not the average reachability of the neighbors from
A (which by definition would be the k\text{-distance}(A)), but the distance at which
A can be "reached"
from its neighbors. With duplicate points, this value can become infinite. The local reachability densities are then compared with those of the neighbors using : \text{LOF}_k(A) := \frac{1}{|N_k(A)|} \sum_{B \in N_k(A)} \frac{\text{lrd}_k(B)}{\text{lrd}_k(A)} = \frac{1}{|N_k(A)| \cdot \text{lrd}_k(A)} \sum_{B \in N_k(A)} \text{lrd}_k(B) which is the
average local reachability density of the neighbors divided by the object's own local reachability density. A value of approximately 1 indicates that the object is comparable to its neighbors (and thus not an outlier). A value below 1 indicates a denser region (which would be an inlier), while values significantly larger than 1 indicate outliers.
\text{LOF}_k(A) \sim 1 means similar density as neighbors,
\text{LOF}_k(A) < 1 means higher density than neighbors (inlier),
\text{LOF}_k(A) > 1 means lower density than neighbors (outlier).

== Advantages ==
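As a concrete illustration of the definitions in the preceding section, the following is a minimal sketch in Python that computes the k-distance, the reachability distance, the local reachability density and the LOF score by brute force. The function names (`euclidean`, `k_distance_and_neighbors`, `lof_scores`) and the sample points are assumptions made for this illustration only; the sketch uses the Euclidean distance, keeps ties in the neighbor set as described above, and does not give special treatment to data sets with many duplicate points.

```python
import math


def euclidean(p, q):
    """Euclidean distance d(p, q) between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_distance_and_neighbors(points, i, k):
    """Return (k-distance of point i, indices in N_k(i)).

    All points at exactly the k-distance are kept, so the neighbor
    set can contain more than k points when there is a tie.
    """
    dists = sorted(
        (euclidean(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    k_dist = dists[k - 1][0]
    neighbors = [j for d, j in dists if d <= k_dist]
    return k_dist, neighbors


def lof_scores(points, k):
    """Brute-force LOF score for every point (illustrative sketch only)."""
    n = len(points)
    info = [k_distance_and_neighbors(points, i, k) for i in range(n)]
    k_dist = [kd for kd, _ in info]
    neigh = [nb for _, nb in info]

    def reach_dist(a, b):
        # Reachability distance of a from b: the true distance,
        # but at least the k-distance of b (not symmetric).
        return max(k_dist[b], euclidean(points[a], points[b]))

    # Local reachability density: inverse of the average reachability
    # distance of a from its neighbors. It is infinite when that
    # average is zero, which can happen with duplicate points.
    lrd = []
    for a in range(n):
        total = sum(reach_dist(a, b) for b in neigh[a])
        lrd.append(len(neigh[a]) / total if total > 0 else math.inf)

    # LOF: average lrd of the neighbors divided by the point's own lrd.
    return [
        sum(lrd[b] / lrd[a] for b in neigh[a]) / len(neigh[a])
        for a in range(n)
    ]


# Hypothetical sample data: four corners of a unit square and one far point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

With this sample and k = 2, the four corners of the square each receive a score of 1, while the isolated point (5, 5) receives a score of roughly 6.15. The isolated point also shows the tie rule: two points lie at exactly its 2-distance, so its neighbor set has three members.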