The Kolmogorov distribution is the distribution of the
random variable K=\sup_{t\in[0,1]}|B(t)| where
B(t) is the
Brownian bridge. The
cumulative distribution function of
K is given by \begin{align} \operatorname{Pr}(K\leq x) &= 1-2\sum_{k=1}^\infty (-1)^{k-1} e^{-2k^2 x^2} \\ &=\frac{\sqrt{2\pi}}{x}\sum_{k=1}^\infty e^{-(2k-1)^2\pi^2/(8x^2)}, \end{align} which can also be expressed by the
Jacobi theta function \vartheta_{01}(z=0;\tau=2ix^2/\pi). Both the form of the Kolmogorov–Smirnov test statistic and its asymptotic distribution under the null hypothesis were published by
Andrey Kolmogorov, while a table of the distribution was published by
Nikolai Smirnov. Recurrence relations for the distribution of the test statistic in finite samples are available. The
goodness-of-fit test or the Kolmogorov–Smirnov test can be constructed by using the critical values of the Kolmogorov distribution. This test is asymptotically valid when n \to\infty. It rejects the null hypothesis at level \alpha if \sqrt{n}D_n>K_\alpha,\, where
K_\alpha is found from \operatorname{Pr}(K\leq K_\alpha)=1-\alpha.\, The asymptotic
power of this test is 1. Fast and accurate algorithms to compute the cdf \operatorname{Pr}(D_n \leq x) or its complement for arbitrary n and x are available: for continuous null distributions, with code in C and Java; and for purely discrete, mixed or continuous null distributions, implemented in the KSgeneral package of the
R project for statistical computing, which for a given sample also computes the KS test statistic and its p-value. An alternative C++ implementation is also available. […] and later publications also include the
Gumbel distribution. The
Lilliefors test represents a special case of this for the normal distribution. A logarithmic transformation may help when the data do not appear to fit the assumption that they came from a normal distribution. When parameters are estimated, the question arises which estimation method should be used. Usually this would be the
maximum likelihood method, but for the normal distribution, for example, the maximum likelihood estimate (MLE) of sigma is biased. Using a moment fit or KS minimization instead has a large impact on the critical values, and also some impact on test power. Suppose we need to decide via the KS test whether Student's t data with df = 2 could be normal or not. An ML estimate based on H0 (the data are normal, so the standard deviation is used for scale) would give a much larger KS distance than a fit that minimizes the KS distance. With MLE we would then reject H0, which is often the case because the sample standard deviation can be very large for such data, whereas with KS minimization the KS distance may still be too small to reject H0. In the Student's t case, a modified KS test that uses the KS-minimizing estimate instead of the MLE therefore makes the KS test slightly worse. In other cases, however, such a modified KS test leads to slightly better test power.
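The pieces described in this section can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names (`kolmogorov_cdf`, `kolmogorov_critical_value`, `ks_statistic`, `rejection_rates`) are hypothetical, the series are simply truncated at 100 terms, and the choice to switch between the two series at x = 1 is an assumption made for numerical convenience. The sketch evaluates both series for \operatorname{Pr}(K\leq x), finds K_\alpha by bisection, and then runs a small simulation under a normal H0 that compares how often \sqrt{n}D_n>K_\alpha occurs when the parameters are known versus when they are replaced by ML estimates.

```python
import math
import random


def kolmogorov_cdf_alternating(x, terms=100):
    """Pr(K <= x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2)."""
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


def kolmogorov_cdf_theta(x, terms=100):
    """Pr(K <= x) = sqrt(2 pi)/x * sum_{k>=1} exp(-(2k-1)^2 pi^2 / (8 x^2))."""
    s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * x * x))
            for k in range(1, terms + 1))
    return math.sqrt(2.0 * math.pi) / x * s


def kolmogorov_cdf(x):
    """Use the series that converges faster (switch point x = 1 is an assumption)."""
    if x < 1e-3:  # the CDF is numerically zero here; also avoids dividing by ~0
        return 0.0
    return kolmogorov_cdf_theta(x) if x < 1.0 else kolmogorov_cdf_alternating(x)


def kolmogorov_critical_value(alpha, lo=0.0, hi=5.0, iters=80):
    """Solve Pr(K <= K_alpha) = 1 - alpha by bisection on the monotone CDF."""
    target = 1.0 - alpha
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous null CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def rejection_rates(n=50, trials=1000, alpha=0.05, seed=0):
    """Fraction of normal samples rejected with known vs. ML-estimated parameters."""
    rng = random.Random(seed)
    k_alpha = kolmogorov_critical_value(alpha)
    known = estimated = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        d_known = ks_statistic(xs, lambda x: normal_cdf(x, 0.0, 1.0))
        mu = sum(xs) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # ML estimate (divides by n)
        d_est = ks_statistic(xs, lambda x: normal_cdf(x, mu, sigma))
        known += math.sqrt(n) * d_known > k_alpha
        estimated += math.sqrt(n) * d_est > k_alpha
    return known / trials, estimated / trials


if __name__ == "__main__":
    print("K_0.05 ~", round(kolmogorov_critical_value(0.05), 4))
    print("rejection rates (known, estimated):", rejection_rates())
```

In such a simulation the rejection rate with known parameters should land near \alpha (only approximately, since the asymptotic critical value is applied to a finite n), while the rate with estimated parameters is expected to be far lower. This illustrates why tests with estimated parameters, such as the Lilliefors test, need their own critical values.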
===Discrete and mixed null distribution===
Under the assumption that F is non-decreasing and right-continuous, with a countable (possibly infinite) number of jumps, the KS test statistic can be expressed as: D_n= \sup_x |F_n(x)-F(x)| = \sup_{0 \leq t \leq 1} |F_n(F^{-1}(t)) - F(F^{-1}(t))|. From the right-continuity of F, it follows that F(F^{-1}(t)) \geq t and F^{-1}(F(x)) \leq x, and hence the distribution of D_{n} depends on the null distribution F, i.e., it is no longer distribution-free as in the continuous case. Therefore, a fast and accurate method has been developed to compute the exact and asymptotic distribution of D_{n} when F is purely discrete or mixed, implemented in the KSgeneral package of the R language; the KS test for discrete null distributions is also available as part of the dgof package of the R language. Major statistical packages, among them
SAS PROC NPAR1WAY and
Stata ksmirnov, implement the KS test under the assumption that F(x) is continuous, which is more conservative if the null distribution is actually not continuous.
==Two-sample Kolmogorov–Smirnov test==