In the 19th century, statistical analytical methods were mainly applied in biological data analysis, and it was customary for researchers to assume that observations followed a normal distribution; notable examples are the works of Sir George Airy and Mansfield Merriman, which were criticized by Karl Pearson in his 1900 paper.

At the end of the 19th century, Pearson noticed the existence of significant skewness within some biological observations. In order to model the observations regardless of whether they were normal or skewed, Pearson, in a series of articles published from 1893 to 1916, devised the Pearson distribution, a family of continuous probability distributions that includes the normal distribution and many skewed distributions, and proposed a method of statistical analysis: use the Pearson distribution to model the observations, then perform a test of goodness of fit to determine how well the model actually fits the observations.
== Pearson's chi-squared test ==

In 1900, Pearson published a paper on the chi-squared test. In this paper, Pearson investigated a test of goodness of fit.

Suppose that n observations in a random sample from a population are classified into k mutually exclusive classes with respective observed numbers of observations x_i (for i = 1, 2, ..., k), and a null hypothesis gives the probability p_i that an observation falls into the ith class. So we have the expected numbers m_i = np_i for all i, where

:\begin{align} & \sum^k_{i=1}{p_i} = 1 \\[8pt] & \sum^k_{i=1}{m_i} = n\sum^k_{i=1}{p_i} = n \end{align}

Pearson proposed that, under the circumstance of the null hypothesis being correct, the limiting distribution of the quantity given below, as n approaches infinity, is the chi-squared distribution:

:X^2=\sum^k_{i=1}{\frac{(x_i-m_i)^2}{m_i}}=\sum^k_{i=1}{\frac{x_i^2}{m_i}}-n

Pearson dealt first with the case in which the expected numbers m_i are large enough known numbers in all cells, assuming every observation x_i may be taken as normally distributed, and reached the result that, in the limit as n becomes large, X^2 follows the chi-squared distribution with k − 1 degrees of freedom.

However, Pearson next considered the case in which the expected numbers depended on parameters that had to be estimated from the sample, and suggested that, with m_i being the true expected numbers and m'_i being the estimated expected numbers, the difference

:X^2-{X'}^2=\sum^k_{i=1}{\frac{x_i^2}{m_i}}-\sum^k_{i=1}{\frac{x_i^2}{m'_i}}

will usually be positive and small enough to be omitted. In conclusion, Pearson argued that if we regarded {X'}^2 as also distributed as a chi-squared distribution with k − 1 degrees of freedom, the error in this approximation would not affect practical decisions. This conclusion caused some controversy in practical applications and was not settled for 20 years, until Fisher's 1922 and 1924 papers.

== Other examples of chi-squared tests ==
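Pearson's statistic defined above is the prototype for the tests discussed here. A minimal Python sketch of the computation, using hypothetical die-roll counts (the data and the 5% critical value 11.070 for 5 degrees of freedom are illustrative, not from Pearson's paper):

```python
# Pearson's chi-squared goodness-of-fit statistic for a hypothetical
# fair-die experiment: n = 60 rolls classified into k = 6 faces.
def pearson_chi_squared(observed, expected):
    """X^2 = sum over classes of (x_i - m_i)^2 / m_i."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))

observed = [5, 8, 9, 8, 10, 20]   # x_i: hypothetical counts per face
n = sum(observed)                 # 60 observations in total
expected = [n / 6] * 6            # m_i = n * p_i with p_i = 1/6

x2 = pearson_chi_squared(observed, expected)

# Algebraic identity from the text: X^2 = sum(x_i^2 / m_i) - n
x2_alt = sum(x * x / m for x, m in zip(observed, expected)) - n
assert abs(x2 - x2_alt) < 1e-9

# k - 1 = 5 degrees of freedom; 11.070 is the 5% critical value.
print(f"X^2 = {x2:.1f}, reject fairness at 5% level: {x2 > 11.070}")
# → X^2 = 13.4, reject fairness at 5% level: True
```

Fisher's resolution of the controversy mentioned above was that estimating s parameters from the sample reduces the degrees of freedom to k − 1 − s, rather than leaving them at k − 1 as Pearson had argued.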