Suppose we have a random variable that produces either a success or a failure. We want to compare a model M_1 where the probability of success is q = 1/2, and another model M_2 where q is unknown and we take a prior distribution for q that is uniform on [0, 1]. We take a sample of 200 and find 115 successes and 85 failures. The likelihood can be calculated according to the binomial distribution:

:<math>{200 \choose 115}q^{115}(1-q)^{85}.</math>

Thus we have for M_1

:<math>P(X=115 \mid M_1)={200 \choose 115}\left({1 \over 2}\right)^{200} \approx 0.006,</math>

whereas for M_2 we have

:<math>P(X=115 \mid M_2) = \int_{0}^{1}{200 \choose 115}q^{115}(1-q)^{85}\,dq = {1 \over 201} \approx 0.005.</math>

(The integrand is, up to the binomial coefficient, a Beta(116, 86) density, so the integral evaluates exactly to 1/201.) The ratio is then 1.2, which is "barely worth mentioning", even if it points very slightly towards M_1.
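The two likelihoods and the resulting Bayes factor can be checked numerically. The following sketch (variable names are illustrative) uses only the Python standard library:

```python
from math import comb

n, k = 200, 115

# P(X = 115 | M_1): binomial probability with q fixed at 1/2
p_m1 = comb(n, k) * 0.5**n

# P(X = 115 | M_2): marginal likelihood under a uniform prior on q.
# The integral of C(200,115) q^115 (1-q)^85 over [0,1] equals 1/(n+1):
# under a uniform prior, every outcome 0..n is equally likely a priori.
p_m2 = 1 / (n + 1)

print(f"{p_m1:.6f}")        # ~0.005956
print(f"{p_m2:.6f}")        # ~0.004975
print(f"{p_m1 / p_m2:.2f}") # ~1.20
```

The exact integral spares us numerical quadrature here; for a non-uniform Beta prior the marginal likelihood has a similar closed form via the Beta function.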
A frequentist hypothesis test of M_1 (here considered as a null hypothesis) would have produced a very different result. Such a test says that M_1 should be rejected at the 5% significance level, since the probability of getting 115 or more successes from a sample of 200 if q = 1/2 is 0.02; as a two-tailed test, the probability of getting a figure as extreme as, or more extreme than, 115 is 0.04. Note that 115 is more than two standard deviations away from 100. Thus, whereas a frequentist hypothesis test would yield significant results at the 5% significance level, the Bayes factor hardly considers this to be an extreme result. Note, however, that a non-uniform prior (for example, one that reflects the expectation that the numbers of successes and failures are of the same order of magnitude) could result in a Bayes factor that is more in agreement with the frequentist hypothesis test.
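The one- and two-tailed p-values quoted here can be reproduced with a short standard-library Python sketch (names are illustrative):

```python
from math import comb

n, q = 200, 0.5

def pmf(i):
    # Binomial probability of exactly i successes in n trials
    return comb(n, i) * q**i * (1 - q)**(n - i)

# One-tailed: probability of 115 or more successes when q = 1/2
p_one = sum(pmf(i) for i in range(115, n + 1))

# Two-tailed: results at least as extreme in either direction,
# i.e. also 85 or fewer successes (symmetric because q = 1/2)
p_two = p_one + sum(pmf(i) for i in range(0, 86))

print(f"{p_one:.3f}")  # ~0.020
print(f"{p_two:.3f}")  # ~0.040
```

Because q = 1/2 makes the distribution symmetric about 100, the two-tailed p-value is exactly twice the one-tailed value.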
A classical likelihood-ratio test would have found the maximum likelihood estimate for q, namely

:<math>\hat q = \frac{115}{200} = 0.575,</math>

whence

:<math>P(X=115 \mid M_2) = {200 \choose 115}\hat q^{115}(1-\hat q)^{85} \approx 0.06</math>

(rather than averaging over all possible q). That gives a likelihood ratio of about 0.1 and points towards M_2.

M_2 is a more complex model than M_1 because it has a free parameter which allows it to model the data more closely. The ability of Bayes factors to take this into account is a reason why
Bayesian inference has been put forward as a theoretical justification for and generalisation of
Occam's razor, reducing
Type I errors. On the other hand, the modern method of
relative likelihood takes into account the number of free parameters in the models, unlike the classical likelihood ratio. The relative likelihood method could be applied as follows. Model M_1 has 0 parameters, and so its Akaike information criterion (AIC) value is

:<math>2 \cdot 0 - 2\ln(0.005956) \approx 10.2467.</math>

Model M_2 has 1 parameter, and so its AIC value is

:<math>2 \cdot 1 - 2\ln(0.056991) \approx 7.7297.</math>

Hence M_1 is about <math>\exp\left(\frac{7.7297 - 10.2467}{2}\right) \approx 0.284</math> times as probable as M_2 to minimize the information loss.
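Both the maximised likelihoods and the AIC comparison can be reproduced numerically; the following Python sketch (variable names are illustrative) uses only the standard library:

```python
from math import comb, exp, log

n, k = 200, 115

q_hat = k / n  # maximum likelihood estimate of q under M_2: 0.575

# Maximised likelihoods of the two models
L1 = comb(n, k) * 0.5**n                           # M_1: q fixed at 1/2
L2 = comb(n, k) * q_hat**k * (1 - q_hat)**(n - k)  # M_2: q at its MLE

print(f"{L1 / L2:.2f}")  # classical likelihood ratio, ~0.10

# AIC = 2 * (number of free parameters) - 2 * ln(maximised likelihood)
aic1 = 2 * 0 - 2 * log(L1)  # ~10.2467
aic2 = 2 * 1 - 2 * log(L2)  # ~7.7297

# Relative likelihood of M_1 with respect to M_2
print(f"{exp((aic2 - aic1) / 2):.3f}")  # ~0.284
```

Note that the likelihood-ratio numerator and denominator share the binomial coefficient, so it cancels in the ratio but must be kept when taking logarithms for the AIC values quoted above.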
Thus M_2 is slightly preferred, but M_1 cannot be excluded.

== See also ==