About 50 to 100 different measures of effect size are known. Many effect sizes can be converted to other types, because many of them estimate the separation of two distributions and are therefore mathematically related. For example, a correlation coefficient can be converted to a Cohen's d and vice versa.
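As an illustration of such a conversion, the following minimal sketch uses the commonly cited formulas d = 2r/√(1 − r²) and r = d/√(d² + 4). These assume two groups of equal size, and the function names are illustrative rather than taken from any library.

```python
import math

def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d (assumes equal group sizes)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d: float) -> float:
    """Convert Cohen's d to a correlation r (assumes equal group sizes)."""
    return d / math.sqrt(d ** 2 + 4)

d = r_to_d(0.3)
print(round(d, 3))          # 0.629
print(round(d_to_r(d), 3))  # 0.3, the round trip recovers r
```

With unequal group sizes the constant 4 in the second formula no longer applies, so the sketch should not be used unchanged in that case.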
=== Correlation family: Effect sizes based on "variance explained" ===
These effect sizes estimate the amount of the variance within an experiment that is "explained" or "accounted for" by the experiment's model (explained variation).
==== Pearson r or correlation coefficient ====
Pearson's correlation, often denoted r and introduced by Karl Pearson, is widely used as an effect size when paired quantitative data are available; for instance, if one were studying the relationship between birth weight and longevity. The correlation coefficient can also be used when the data are binary. Pearson's r ranges from −1 to 1, with −1 indicating a perfect negative linear relation, 1 indicating a perfect positive linear relation, and 0 indicating no linear relation between two variables.
==== Coefficient of determination (r² or R²) ====
A related effect size is r², the coefficient of determination (also referred to as R² or "r-squared"), calculated as the square of the Pearson correlation r. In the case of paired data, this is a measure of the proportion of variance shared by the two variables, and varies from 0 to 1. For example, with an r of 0.21 the coefficient of determination is 0.0441, meaning that 4.4% of the variance of either variable is shared with the other variable. The r² is never negative, so it does not convey the direction of the correlation between the two variables.
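The relationship between r and r² can be sketched as follows. The paired observations are made up for illustration, and `pearson_r` is a hypothetical helper written out from the textbook definition rather than a library call.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of paired values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
print(round(r, 2), round(r ** 2, 2))  # 0.8 0.64
```

Reversing the order of `y` flips the sign of r but leaves r² unchanged, which is why r² alone cannot convey direction.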
==== Eta-squared (η²) ====
Eta-squared describes the ratio of variance explained in the dependent variable by a predictor while controlling for other predictors, making it analogous to r²:
\eta^2 = \frac{\text{SS}_\text{treatment}}{\text{SS}_\text{total}}.
Eta-squared is a biased estimator of the variance explained by the model in the population, because it measures the variance explained in the sample rather than in the population. On average it therefore overestimates the effect size, although the bias grows smaller as the sample grows larger. It also shares with r² the weakness that each additional variable will automatically increase the value of η².
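As a rough sketch, η² can be computed from the sums of squares of a one-way between-subjects design. The function below also returns ω², the less biased estimator defined in the next subsection, so that the two can be compared. The data and the function name are hypothetical, and equal cell sizes are assumed.

```python
def one_way_effect_sizes(groups):
    """Return (eta_squared, omega_squared) for a one-way design.

    Assumes a between-subjects design with equal sample sizes per group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_treatment = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    ss_error = ss_total - ss_treatment

    df_treatment = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_error = ss_error / df_error

    eta_sq = ss_treatment / ss_total
    omega_sq = (ss_treatment - df_treatment * ms_error) / (ss_total + ms_error)
    return eta_sq, omega_sq

# Hypothetical scores for three equally sized groups.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
eta_sq, omega_sq = one_way_effect_sizes(groups)
print(round(eta_sq, 3), round(omega_sq, 3))  # 0.8 0.71
```

For these data ω² is smaller than η², in line with η² overestimating the population effect.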
==== Omega-squared (ω²) ====
A less biased estimator of the variance explained in the population is ω²:
\omega^2 = \frac{\text{SS}_\text{treatment} - df_\text{treatment} \cdot \text{MS}_\text{error}}{\text{SS}_\text{total} + \text{MS}_\text{error}}.
This form of the formula is limited to between-subjects analysis with equal sample sizes in all cells. In addition, methods to calculate partial ω² for individual factors and combined factors in designs with up to three independent variables have been published.
==== Cohen's f² ====
The f² effect size measure for sequential multiple regression, which is also common in PLS modeling, is defined as:
f^2 = \frac{R^2_{AB} - R^2_A}{1 - R^2_{AB}}
where R^2_A is the variance accounted for by a set of one or more independent variables A, and R^2_{AB} is the combined variance accounted for by A and another set of one or more independent variables of interest B. By convention, f² effect sizes of 0.1² (0.01), 0.25² (0.0625), and 0.4² (0.16) are termed small, medium, and large, respectively.
=== Difference family: Effect sizes based on differences between means ===
A population effect size θ based on means is the standardized mean difference (SMD) between two populations:
\theta = \frac{\mu_1 - \mu_2}{\sigma}, where
μ₁ is the mean for one population, μ₂ is the mean for the other population, and σ is a standard deviation based on either or both populations. In the practical setting the population values are typically not known and must be estimated from sample statistics. The several versions of effect sizes based on means differ with respect to which statistics are used. This form for the effect size resembles the computation for a t-test statistic, with the critical difference that the t-test statistic includes a factor of \sqrt{n}. This means that for a given effect size, the t-test statistic, and with it the statistical significance, increases with the sample size. Unlike the t-test statistic, the effect size aims to estimate a population parameter and is not affected by the sample size. SMD values of 0.2 to 0.5 are considered small, 0.5 to 0.8 are considered medium, and greater than 0.8 are considered large.
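The contrast between the effect size and the t-test statistic can be illustrated with a small sketch. It assumes two groups of equal size n with a common (pooled) standard deviation, in which case t = d·√(n/2); the summary statistics and function names are hypothetical.

```python
import math

def smd_from_summary(mean1, mean2, sd):
    """Standardized mean difference from summary statistics."""
    return (mean1 - mean2) / sd

def t_from_summary(mean1, mean2, sd, n_per_group):
    """Two-sample t statistic for equal group sizes and a pooled SD."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return (mean1 - mean2) / standard_error

# Hypothetical summary statistics: means 105 and 100, SD 10.
for n in (10, 40, 160):
    d = smd_from_summary(105, 100, 10)
    t = t_from_summary(105, 100, 10, n)
    print(n, round(d, 2), round(t, 2))
```

The effect size stays at 0.5 for every n, while the t statistic rises from about 1.12 to 2.24 to 4.47, doubling each time the sample size is quadrupled.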
==== Cohen's d ====
Cohen's d is defined as the difference between two means divided by a standard deviation for the data, i.e.
d = \frac{\bar{x}_1 - \bar{x}_2}{s}.
Jacob Cohen defined s, the pooled standard deviation, as (for two independent samples):
s = \sqrt{\frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{n_1+n_2 - 2}}
where the variance for one of the groups is defined as
s_1^2 = \frac{1}{n_1-1} \sum_{i=1}^{n_1} (x_{1,i} - \bar{x}_1)^2,
and similarly for the other group. Other authors choose a slightly different computation of the standard deviation when referring to "Cohen's d", in which the denominator omits the "−2":
s = \sqrt{\frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{n_1+n_2}}
This definition of "Cohen's d" is termed the maximum likelihood estimator by Hedges and Olkin.
Cohen's d is frequently used in estimating sample sizes for statistical testing. A lower Cohen's d indicates the necessity of larger sample sizes, and vice versa, as can subsequently be determined together with the additional parameters of desired significance level and statistical power.
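The two pooled standard deviations above, and the link between d and sample size, can be sketched as follows. The data are hypothetical, the function names are illustrative, and the sample-size function uses a normal approximation for a two-sided two-sample test, which is an assumption made here rather than a method stated in the text; exact t-based calculations give slightly larger values.

```python
import math
from statistics import NormalDist, mean, variance

def pooled_sd(x1, x2, maximum_likelihood=False):
    """Pooled SD; the denominator is n1+n2-2 (Cohen) or n1+n2 (ML variant)."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance is the sample variance with an n-1 denominator.
    weighted = (n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)
    denominator = n1 + n2 if maximum_likelihood else n1 + n2 - 2
    return math.sqrt(weighted / denominator)

def cohens_d(x1, x2, maximum_likelihood=False):
    """Difference between the two sample means divided by the pooled SD."""
    return (mean(x1) - mean(x2)) / pooled_sd(x1, x2, maximum_likelihood)

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size (normal approximation, two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Hypothetical scores for two independent groups.
treatment = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, 8]

print(round(cohens_d(treatment, control), 3))                           # 1.265
print(round(cohens_d(treatment, control, maximum_likelihood=True), 3))  # 1.414
print(n_per_group(0.5), n_per_group(0.2))                               # 63 393
```

The last line shows the inverse relationship described above: under these assumptions, a medium effect of 0.5 needs roughly 63 participants per group, whereas a small effect of 0.2 needs roughly 393.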
==== Glass' Δ ====
In 1976, Gene V. Glass proposed an estimator of the effect size that uses only the standard deviation of the second group:
\Delta = \frac{\bar{x}_1 - \bar{x}_2}{s_2}.
[…] is like the other measures based on a standardized difference […]
Cluster randomised trials (CRTs) involve randomising clusters, such as schools or classrooms, to different conditions and are frequently used in education research.
==== Ψ, root-mean-square standardized effect ====
A similar effect size estimator for multiple comparisons (e.g., ANOVA) is the Ψ root-mean-square standardized effect.
Another standardized measure, β, divides the difference between the two group means by the standard deviation of the difference between the two groups, where \sigma_{12} is the covariance between them:
\beta = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}}.
If the two groups are independent,
\beta = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}.
If the two independent groups have equal variances \sigma^2,
\beta = \frac{\mu_1 - \mu_2}{\sqrt{2}\sigma}.
==== Other metrics ====
Mahalanobis distance (D) is a multivariate generalization of Cohen's d, which takes into account the relationships between the variables.
=== Categorical family: Effect sizes for associations among categorical variables ===
Commonly used measures of association for the chi-squared test are the phi coefficient and Cramér's V (sometimes referred to as Cramér's phi and denoted as φc). Phi is related to the point-biserial correlation coefficient and Cohen's d and estimates the extent of the relationship between two variables (2 × 2). Cramér's V may be used with variables having more than two levels. Phi can be computed by finding the square root of the chi-squared statistic divided by the sample size N:
\varphi = \sqrt{\frac{\chi^2}{N}}
Similarly, Cramér's V is computed by taking the square root of the chi-squared statistic divided by the product of the sample size and k − 1, where k is the smaller of the number of rows r or columns c:
\varphi_c = \sqrt{\frac{\chi^2}{N(k-1)}}
φc is the intercorrelation of the two discrete variables and may be computed for any value of r or c. However, as chi-squared values tend to increase with the number of cells, the greater the difference between r and c, the more likely V will tend to 1 without strong evidence of a meaningful correlation.
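A minimal sketch of both computations from a table of observed counts is given below. The chi-squared statistic is computed without a continuity correction, the contingency tables are invented for illustration, and the function names are hypothetical.

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic and total count for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi(table):
    """Phi coefficient, intended for 2 x 2 tables."""
    chi2, n = chi_squared(table)
    return math.sqrt(chi2 / n)

def cramers_v(table):
    """Cramér's V; k is the smaller of the number of rows and columns."""
    chi2, n = chi_squared(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical counts.
two_by_two = [[30, 10], [10, 30]]
three_by_two = [[20, 10], [15, 15], [10, 20]]

print(round(phi(two_by_two), 3))          # 0.5
print(round(cramers_v(three_by_two), 3))  # 0.272
```

For a 2 × 2 table k − 1 equals 1, so Cramér's V and phi coincide.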
==== Cohen's omega (ω) ====
Another measure of effect size used for chi-squared tests is Cohen's omega (ω). This is defined as
\omega = \sqrt{\sum_{i=1}^m \frac{(p_{1i} - p_{0i})^2}{p_{0i}}}
where p_{0i} is the proportion of the ith cell under H0, p_{1i} is the proportion of the ith cell under H1 and m is the number of cells.
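The definition translates directly into code. In this sketch the cell proportions are hypothetical and `cohens_omega` is an illustrative name.

```python
import math

def cohens_omega(p0, p1):
    """Cohen's omega from cell proportions under H0 (p0) and under H1 (p1)."""
    if len(p0) != len(p1):
        raise ValueError("p0 and p1 must have the same number of cells")
    return math.sqrt(sum((b - a) ** 2 / a for a, b in zip(p0, p1)))

# Hypothetical example: four equally likely cells under H0.
p0 = [0.25, 0.25, 0.25, 0.25]
p1 = [0.40, 0.30, 0.20, 0.10]
print(round(cohens_omega(p0, p1), 3))  # 0.447
```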
==== Odds ratio ====
The odds ratio (OR) is another useful effect size. It is appropriate when the research question focuses on the degree of association between two binary variables. For example, consider a study of spelling ability. In a control group, two students pass the class for every one who fails, so the odds of passing are two to one (or 2/1 = 2). In the treatment group, six students pass for every one who fails, so the odds of passing are six to one (or 6/1 = 6). The effect size can be computed by noting that the odds of passing in the treatment group are three times higher than in the control group (because 6 divided by 2 is 3). Therefore, the odds ratio is 3. Odds ratio statistics are on a different scale than Cohen's d, so this '3' is not comparable to a Cohen's d of 3.
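The spelling example can be reproduced with exact fractions. The function names below are illustrative only.

```python
from fractions import Fraction

def odds(successes, failures):
    """Odds of success: successes per failure."""
    return Fraction(successes, failures)

def odds_ratio(treat_pass, treat_fail, ctrl_pass, ctrl_fail):
    """Odds of passing in the treatment group relative to the control group."""
    return odds(treat_pass, treat_fail) / odds(ctrl_pass, ctrl_fail)

# Treatment: 6 pass per 1 fail. Control: 2 pass per 1 fail.
print(odds_ratio(6, 1, 2, 1))  # 3
```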
==== Relative risk ====
The relative risk (RR), also called risk ratio, is the ratio of the risk (probability) of an event in one group to the risk in another group. This measure of effect size differs from the odds ratio in that it compares probabilities instead of odds, but asymptotically approaches the latter for small probabilities. Using the example above, the probabilities of passing for those in the control group and treatment group are 2/3 (or 0.67) and 6/7 (or 0.86), respectively. The effect size can be computed the same as above, but using the probabilities instead. Therefore, the relative risk is (6/7)/(2/3) = 9/7 ≈ 1.29. Since rather large probabilities of passing were used, there is a large difference between relative risk and odds ratio. Had failure (a smaller probability) been used as the event (rather than passing), the difference between the two measures of effect size would not be so great. While both measures are useful, they have different statistical uses. In medical research, the odds ratio is commonly used for case-control studies, as odds, but not probabilities, are usually estimated. Relative risk is commonly used in randomized controlled trials and cohort studies, but relative risk contributes to overestimations of the effectiveness of interventions.
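The comparison between relative risk and odds ratio for the same example can be sketched with exact fractions, once with passing and once with failure as the event. The function names are illustrative.

```python
from fractions import Fraction

def risk(events, non_events):
    """Probability of the event within one group."""
    return Fraction(events, events + non_events)

def relative_risk(t_events, t_non_events, c_events, c_non_events):
    """Risk in the treatment group divided by risk in the control group."""
    return risk(t_events, t_non_events) / risk(c_events, c_non_events)

def odds_ratio(t_events, t_non_events, c_events, c_non_events):
    """Odds in the treatment group divided by odds in the control group."""
    return Fraction(t_events, t_non_events) / Fraction(c_events, c_non_events)

# Event = passing: treatment 6 pass / 1 fail, control 2 pass / 1 fail.
print(relative_risk(6, 1, 2, 1), odds_ratio(6, 1, 2, 1))  # 9/7 3
# Event = failing: treatment 1 fail / 6 pass, control 1 fail / 2 pass.
print(relative_risk(1, 6, 1, 2), odds_ratio(1, 6, 1, 2))  # 3/7 1/3
```

For passing, the odds ratio (3) is more than twice the relative risk (9/7, about 1.29). For failure, the two values (1/3 and 3/7) are much closer, as expected when the event is less probable.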
==== Risk difference ====
The risk difference (RD), sometimes called absolute risk reduction, is simply the difference in risk (probability) of an event between two groups. It is a useful measure in experimental research, since RD tells you the extent to which an experimental intervention changes the probability of an event or outcome. Using the example above, the probabilities of passing for those in the control group and treatment group are 2/3 (or 0.67) and 6/7 (or 0.86), respectively, and so the RD effect size is 0.86 − 0.67 = 0.19 (or 19%). RD is the superior measure for assessing effectiveness of interventions.
[…] They used the following example (about heights of men and women): "in any random pairing of young adult males and females, the probability of the male being taller than the female is .92, or in simpler terms yet, in 92 out of 100 blind dates among young adults, the male will be taller than the female".
The statistic d is a measure of how often the values in one distribution are larger than the values in a second distribution. Crucially, it does not require any assumptions about the shape or spread of the two distributions. The sample estimate d is given by:
d = \frac{\sum_{i,j} \left( [x_i > x_j] - [x_i < x_j] \right)}{mn}
where the two distributions are of size n and m with items x_i and x_j, respectively, and [\cdot] is the Iverson bracket, which is 1 when the contents are true and 0 when false. d is linearly related to the Mann–Whitney U statistic; however, it captures the direction of the difference in its sign. Given the Mann–Whitney U, d is:
d = \frac{2U}{mn} - 1
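Both routes to d can be sketched with a brute-force comparison of all pairs. The samples are hypothetical, the function names are illustrative, and U is taken here to be the number of pairs in which the first sample's value is larger, with ties counted as one half, which is the convention under which the linear relation holds.

```python
def pairwise_d(xs, ys):
    """d from all pairwise comparisons, using Iverson-bracket style counts."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

def mann_whitney_u(xs, ys):
    """U for xs: pairs with x > y count 1, ties count 1/2 (assumed convention)."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )

# Hypothetical samples.
xs = [5, 6, 7, 8]
ys = [3, 4, 5, 6]

d_direct = pairwise_d(xs, ys)
u = mann_whitney_u(xs, ys)
d_from_u = 2 * u / (len(xs) * len(ys)) - 1
print(d_direct, u, d_from_u)  # 0.75 14.0 0.75
```

Swapping the two samples changes the sign of d, which is the directional information the text refers to.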
==== Cohen's g ====
One of the simplest effect sizes is Cohen's g, which measures how much a proportion differs from 50%. For example, if 85.2% of arrests for car theft are males, then the effect size of sex on arrest when measured with Cohen's g is g = 0.852 − 0.5 = 0.352. In general:
g = P - 0.50 \text{ or } 0.50 - P \quad (\text{directional}),
g = |P - 0.50| \quad (\text{nondirectional}).
Units of Cohen's g are more intuitive (proportion) than in some other effect sizes. It is sometimes used in combination with the binomial test.
== Confidence intervals by means of noncentrality parameters ==