The null distribution of the Pearson statistic with j rows and k columns is approximated by the chi-squared distribution with (k − 1)(j − 1) degrees of freedom. This approximation arises as the true distribution, under the null hypothesis, when the cell counts follow a multinomial distribution. For large sample sizes, the central limit theorem says this distribution tends toward a certain multivariate normal distribution.
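The approximation can be checked numerically. The following is an illustrative sketch (the table size, probabilities, and sample sizes are our own choices, not from the text): it simulates many contingency tables under independence and verifies that the average of Pearson's statistic is close to (k − 1)(j − 1), the mean of the approximating chi-squared distribution.

```python
import random

def pearson_statistic(table):
    """Pearson's cumulative statistic, with expected counts from the margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

def simulate_table(n, row_p, col_p, rng):
    """Draw n observations from the independent joint distribution p_ij = r_i c_j."""
    cells = [(i, j) for i in range(len(row_p)) for j in range(len(col_p))]
    weights = [row_p[i] * col_p[j] for i, j in cells]
    table = [[0] * len(col_p) for _ in row_p]
    for i, j in rng.choices(cells, weights=weights, k=n):
        table[i][j] += 1
    return table

# 2 rows, 3 columns under the null (independence), so (j-1)(k-1) = 2.
rng = random.Random(0)
stats = [pearson_statistic(simulate_table(500, (0.4, 0.6), (0.2, 0.3, 0.5), rng))
         for _ in range(2000)]
mean = sum(stats) / len(stats)
print(round(mean, 2))  # close to 2, the degrees of freedom (2-1)(3-1)
```

With 2,000 simulated tables of 500 observations each, the sample mean of the statistic lands within a few hundredths of the theoretical value 2.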
===Two cells===
In the special case where there are only two cells in the table, the expected values follow a binomial distribution, O \sim \operatorname{Bin}(n, p), where
: p = probability, under the null hypothesis,
: n = number of observations in the sample.
In the above example the hypothesised probability of a male observation is 0.5, with 100 samples. Thus we expect to observe 50 males. If
n is sufficiently large, the above binomial distribution may be approximated by a Gaussian (normal) distribution, and thus the Pearson test statistic approximates a chi-squared distribution: \operatorname{Bin}(n,p) \approx \mathcal{N}\big(np, np(1 - p)\big). Let O_1 be the number of observations from the sample that are in the first cell. The Pearson test statistic can be expressed as \frac{(O_1 - np)^2}{np} + \frac{\big(n - O_1 - n(1 - p)\big)^2}{n(1 - p)}, which can in turn be expressed as \left(\frac{O_1 - np}{\sqrt{np(1 - p)}}\right)^2. By the normal approximation to the binomial, this is the square of one standard normal variate and hence is distributed as chi-squared with 1 degree of freedom. Note that the denominator is one standard deviation of the Gaussian approximation, so the statistic can be written \frac{(O_1 - \mu)^2}{\sigma^2}. Consistent with the meaning of the chi-squared distribution, we are thus measuring how probable it is, under the Gaussian approximation (which is good for large n), to observe a value this many standard deviations away from the mean. The chi-squared distribution is then integrated on the right of the statistic value to obtain the p-value, which equals the probability of getting a statistic at least as large as the observed one, assuming the null hypothesis.
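A short worked sketch of the two-cell case above, with the hypothesised p = 0.5 and n = 100 from the text; the observed count of 60 males is our own illustrative choice. It checks that the two-cell statistic equals the square of one standard normal variate and computes the right-tail p-value, using the identity P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2}).

```python
import math

n, p = 100, 0.5
o1 = 60                      # observed count in the first cell (hypothetical)
expected = n * p             # 50 males expected under the null hypothesis

# Pearson's statistic summed over both cells ...
stat = (o1 - expected) ** 2 / expected \
     + ((n - o1) - n * (1 - p)) ** 2 / (n * (1 - p))

# ... equals the square of one standard normal variate:
z = (o1 - n * p) / math.sqrt(n * p * (1 - p))
assert math.isclose(stat, z ** 2)

# Right-tail p-value of a chi-squared variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(stat / 2))
print(round(stat, 2), round(p_value, 4))  # 4.0 0.0455
```

Observing 60 males when 50 are expected gives a statistic of 4 (two standard deviations), for a p-value of about 0.046.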
===Two-by-two contingency tables===
When the test is applied to a contingency table containing two rows and two columns, the test is equivalent to a Z-test of proportions.
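This equivalence can be verified numerically; the 2 × 2 table below contains our own illustrative counts. The sketch computes Pearson's statistic with expected counts from the margins and the pooled two-proportion z statistic, and checks that the former equals the square of the latter.

```python
import math

table = [[30, 20],   # group 1: 30 "successes" out of 50
         [15, 35]]   # group 2: 15 "successes" out of 50

# Pearson's statistic with expected counts from the margins.
rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]
n = sum(rows)
chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
           for i in range(2) for j in range(2))

# Two-proportion z statistic with the pooled estimate of the common proportion.
p1, p2 = table[0][0] / rows[0], table[1][0] / rows[1]
pooled = (table[0][0] + table[1][0]) / n
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / rows[0] + 1 / rows[1]))

assert math.isclose(chi2, z ** 2)
print(round(chi2, 4))  # 9.0909
```

The identity \chi^2 = z^2 holds exactly for any 2 × 2 table, not just for this one, which is why the two tests give identical p-values.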
===Many cells===
Broadly similar arguments as above lead to the desired result, though the details are more involved. One may apply an orthogonal change of variables to turn the limiting summands in the test statistic into one fewer squares of i.i.d. standard normal random variables. Let us now prove that the distribution indeed asymptotically approaches the \chi^2 distribution as the number of observations approaches infinity. Let n be the number of observations, m the number of cells, and p_i the probability of an observation falling in the
i-th cell, for 1 \le i \le m. We denote by \{k_i\} the configuration where for each
i there are k_i observations in the
i-th cell. Note that \sum_{i=1}^m k_i = n \quad \text{and} \quad \sum_{i=1}^m p_i = 1. Let \chi^2_P(\{k_i\}, \{p_i\}) be Pearson's cumulative test statistic for such a configuration, and let \chi^2_P(\{p_i\}) be the distribution of this statistic. We will show that the latter probability approaches the \chi^2 distribution with m - 1 degrees of freedom, as n \to \infty. For any arbitrary value T: P\big(\chi^2_P(\{p_i\}) > T\big) = \sum_{\{k_i \mid \chi^2_P(\{k_i\}, \{p_i\}) > T\}} \frac{n!}{k_1! \cdots k_m!} \prod_{i=1}^m {p_i}^{k_i}. We will use a procedure similar to the approximation in
de Moivre–Laplace theorem. Contributions from small k_i are of subleading order in n and thus for large n we may use
Stirling's formula for both n! and k_i! to get the following: P\big(\chi^2_P(\{p_i\}) > T\big) \sim \sum_{\{k_i \mid \chi^2_P(\{k_i\},\{p_i\}) > T \}} \prod_{i=1}^m \left(\frac{np_i}{k_i}\right)^{k_i} \sqrt{\frac{2\pi n}{\prod_{i=1}^m 2\pi k_i}}. By substituting for x_i = \frac{k_i - np_i}{\sqrt{n}}, \quad i = 1, \cdots, m - 1, we may approximate for large n the sum over the k_i by an integral over the x_i. Noting that k_m = np_m - \sqrt{n} \sum_{i=1}^{m-1} x_i, we arrive at \begin{align} P\big(\chi^2_P(\{p_i\}) > T\big) &\sim \sqrt{\frac{2\pi n}{\prod_{i=1}^m 2\pi k_i}} \int_\Omega \left[ \prod_{i=1}^{m-1} \sqrt{n}\, dx_i \right] \times \\ & \qquad \times \left\{\prod_{i=1}^{m-1} \left(1 + \frac{x_i}{\sqrt{n} p_i}\right)^{-(n p_i + \sqrt{n} x_i)} \left(1 - \frac{\sum_{i=1}^{m-1} x_i}{\sqrt{n} p_m}\right)^{-\left(n p_m - \sqrt{n} \sum_{i=1}^{m-1} x_i\right)} \right\} \\ &= \sqrt{\frac{2\pi n}{\prod_{i=1}^m \left(2\pi n p_i + 2\pi \sqrt{n} x_i\right)}} \int_\Omega \left\{\prod_{i=1}^{m-1} \sqrt{n}\, dx_i\right\} \times \\ & \qquad \times \left\{ \prod_{i=1}^{m-1} \exp\left[-\left(n p_i + \sqrt{n} x_i \right) \ln \left(1 + \frac{x_i}{\sqrt{n} p_i}\right)\right] \exp \left[ -\left(n p_m - \sqrt{n} \sum_{i=1}^{m-1} x_i\right) \ln \left(1 - \frac{\sum_{i=1}^{m-1} x_i}{\sqrt{n} p_m}\right) \right] \right\}, \end{align} where \Omega is the set defined through \chi^2_P(\{k_i\}, \{p_i\}) = \chi^2_P(\{\sqrt{n} x_i + n p_i\}, \{p_i\}) > T. By
expanding the logarithm and taking the leading terms in n, we get P\big(\chi^2_P(\{p_i\}) > T\big) \sim \frac{1}{\sqrt{(2\pi)^{m-1} \prod_{i=1}^m p_i}} \int_\Omega \left[ \prod_{i=1}^{m-1} dx_i\right] \exp\left[-\frac{1}{2} \sum_{i=1}^{m-1} \frac{x_i^2}{p_i} - \frac{1}{2p_m} \left(\sum_{i=1}^{m-1} x_i\right)^2 \right]. Pearson's chi, \chi^2_P(\{k_i\}, \{p_i\}) = \chi^2_P(\{\sqrt{n} x_i + n p_i\}, \{p_i\}), is precisely the argument of the exponent (up to the factor of −1/2; note that the final term in the exponent's argument equals (k_m - n p_m)^2/(n p_m)). This argument can be written as -\frac{1}{2} \sum_{i,j=1}^{m-1} x_i A_{ij} x_j, \quad A_{ij} = \frac{\delta_{ij}}{p_i} + \frac{1}{p_m}, \quad i, j = 1, \cdots, m - 1. A is a regular symmetric (m - 1) \times (m - 1) matrix, and hence
diagonalizable. It is therefore possible to make a linear change of variables in \{x_i\} so as to get m - 1 new variables \{y_i\} so that \sum_{i,j=1}^{m-1} x_i A_{ij} x_j = \sum_{i=1}^{m-1} y_i^2. This linear change of variables merely multiplies the integral by a constant
Jacobian, so we get P\big(\chi^2_P(\{p_i\}) > T\big) \sim C \int_{\sum_{i=1}^{m-1} y_i^2 > T} \left\{\prod_{i=1}^{m-1} dy_i \right\} \exp\left[-\frac{1}{2} \sum_{i=1}^{m-1} y_i^2 \right], where
C is a constant. This is the probability that the sum of squares of m - 1 independent normally distributed variables with zero mean and unit variance will be greater than T, namely that a \chi^2 variable with m - 1 degrees of freedom is larger than
T. We have thus shown that in the limit n \to \infty, the distribution of Pearson's chi approaches the chi-squared distribution with m - 1 degrees of freedom. An alternative derivation is on the multinomial distribution page.

==Examples==