Suppose each observation is <math>y_{xi}</math>, where <math>x</math> indicates the category that the observation is in and <math>i</math> is the label of the particular observation. Let <math>n_x</math> be the number of observations in category <math>x</math>, and let
:<math>\overline{y}_x=\frac{\sum_i y_{xi}}{n_x}</math> and <math>\overline{y}=\frac{\sum_x n_x \overline{y}_x}{\sum_x n_x},</math>
where <math>\overline{y}_x</math> is the mean of category <math>x</math> and <math>\overline{y}</math> is the mean of the whole population. The correlation ratio η (eta) is defined so as to satisfy
:<math>\eta^2 = \frac{\sum_x n_x (\overline{y}_x-\overline{y})^2}{\sum_{x,i} (y_{xi}-\overline{y})^2},</math>
which can be written as
:<math>\eta^2 = \frac{{\sigma_{\overline{y}}}^2}{{\sigma_{y}}^2}, \text{ where }{\sigma_{\overline{y}}}^2 = \frac{\sum_x n_x (\overline{y}_x-\overline{y})^2}{\sum_x n_x} \text{ and } {\sigma_{y}}^2 = \frac{\sum_{x,i} (y_{xi}-\overline{y})^2}{n},</math>
with <math>n = \sum_x n_x</math> the total number of observations; that is, η<sup>2</sup> is the weighted variance of the category means divided by the variance of all samples.

If the relationship between values of <math>x</math> and values of <math>\overline{y}_x</math> is linear (which is certainly true when there are only two possibilities for <math>x</math>), this will give the same result as the square of Pearson's correlation coefficient; otherwise the correlation ratio will be larger in magnitude. It can therefore be used for judging non-linear relationships.

==Range==
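The definition of η<sup>2</sup> given above can be checked numerically with a short sketch. The function names <code>eta_squared</code> and <code>pearson_r_squared</code>, the category labels, and the sample values are all illustrative assumptions, not part of any particular library. Because the numerator (the between-category sum of squares) is one part of the total sum of squares in the denominator, the result cannot fall below 0 or rise above 1.

```python
from math import isclose


def eta_squared(groups):
    """Squared correlation ratio.

    `groups` maps each category x to the list of its observations y_xi
    (hypothetical input layout chosen for this sketch).
    """
    n = sum(len(ys) for ys in groups.values())
    grand_mean = sum(sum(ys) for ys in groups.values()) / n
    # Numerator: sum over categories of n_x * (mean_x - grand_mean)^2
    between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )
    # Denominator: sum over all observations of (y_xi - grand_mean)^2
    total = sum((y - grand_mean) ** 2 for ys in groups.values() for y in ys)
    return between / total


def pearson_r_squared(xs, ys):
    """Square of Pearson's correlation coefficient for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Two categories, so eta^2 should agree with Pearson's r^2 (x coded 0/1).
two_groups = {"a": [1, 2, 3], "b": [4, 5, 6]}
eta2 = eta_squared(two_groups)
r2 = pearson_r_squared([0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 5, 6])
print(round(eta2, 3), isclose(eta2, r2))  # 0.771 True

# Equal category means: the numerator vanishes.
print(eta_squared({"a": [1, 3], "b": [2, 2]}))  # 0.0

# No variation inside any category: numerator equals denominator.
print(eta_squared({"a": [1, 1], "b": [5, 5]}))  # 1.0
```

The sketch assumes at least one observation per category and at least some variation overall; if every observation were identical, the denominator would be zero and the ratio undefined.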