In statistics, logistic regression is a type of
regression analysis used for predicting the outcome of a categorical dependent variable (with a limited number of categories) or a dichotomous dependent variable based on one or more predictor variables. The probabilities describing the possible outcomes of a single trial are modeled, as a function of explanatory (independent) variables, using a
logistic function or
multinomial distribution. Logistic regression measures the relationship between a categorical or dichotomous dependent variable and one or more independent variables (usually, but not necessarily, continuous), by converting the dependent variable to probability scores. The probabilities can be retrieved using the logistic function or the multinomial distribution; as in probability theory, these probabilities take on values between zero and one:

P(y_i)= \frac{e^{\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}}}{1+e^{\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}}} =\frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{i1} + \dots+ \beta_k x_{ik})}}

So the model tested can be defined by:

f(y_i) = \ln \frac {P(y_i)}{1-P(y_i)} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik},

where y_i is the category of the dependent variable for the i-th observation, x_{ij} is the j-th independent variable (j = 1, 2, ..., k) for that observation, and β_j is the coefficient of x_{ij}, indicating its influence on the fitted model.
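The probability formula above can be sketched directly in code; this is a minimal illustration, and the coefficient values are made up for the example rather than taken from any fitted model.

```python
import math

def logistic_probability(beta, x):
    """P(y_i) = 1 / (1 + exp(-(beta_0 + beta_1*x_1 + ... + beta_k*x_k))).

    beta holds [beta_0, beta_1, ..., beta_k]; x holds [x_1, ..., x_k].
    """
    linear = beta[0] + sum(b_j * x_j for b_j, x_j in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients: intercept -1.5, one predictor with slope 0.8.
p = logistic_probability([-1.5, 0.8], [2.0])
# As the text notes, the result always lies strictly between zero and one.
```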
The omnibus test relates to the hypotheses:
H0: β1 = β2 = ... = βk = 0
H1: at least one βj ≠ 0
Model fitting: maximum likelihood method The omnibus test, among the other parts of the logistic regression procedure, is a likelihood-ratio test based on the maximum likelihood method. Unlike the linear regression procedure, in which the regression coefficients can be estimated analytically by least squares (minimizing the sum of squared residuals), in logistic regression there is no such analytical solution or set of equations from which the coefficient estimates can be derived. So logistic regression uses the maximum likelihood procedure to estimate the coefficients that maximize the likelihood of the regression coefficients given the predictors and criterion. The maximum likelihood solution is an iterative process that begins with a tentative solution, revises it slightly to see if it can be improved, and repeats this process until no further improvement can be made, at which point the model is said to have converged. Applying the procedure is conditioned on convergence (see also "remarks and other considerations" below). In general, regarding simple hypotheses on a parameter θ, for example H0: θ = θ0 vs. H1: θ = θ1, the likelihood-ratio
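The iterative maximum likelihood search described above can be sketched with a Newton-Raphson update, one common choice (actual packages vary in the exact algorithm used); the data in the usage line are made up and not from the text.

```python
import numpy as np

def fit_logistic_newton(x, y, tol=1e-8, max_iter=25):
    """Maximize the logistic log-likelihood by Newton-Raphson iteration:
    start from a tentative solution (all zeros), revise it, and stop once
    the revision no longer improves the estimates (convergence)."""
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])                  # tentative starting solution
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))    # current fitted probabilities
        gradient = X.T @ (y - p)                 # score vector
        weights = p * (1.0 - p)
        hessian = X.T @ (X * weights[:, None])   # observed information
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:           # negligible revision: converged
            return beta
    raise RuntimeError("model did not converge") # see "Other considerations"

# Made-up data with one predictor and no complete separation:
coef = fit_logistic_newton([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1])
```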
test statistic can be written as: \lambda(y_i)= \frac {L(y_i|\theta_0)}{L(y_i|\theta_1)} , where L(yi|θ) is the
likelihood function, which refers to the specific θ. The numerator corresponds to the maximum likelihood of an observed outcome under the null hypothesis. The denominator corresponds to the maximum likelihood of an observed outcome varying parameters over the whole
parameter space. The numerator of this ratio is never greater than the denominator, so the likelihood ratio lies between 0 and 1. Lower values of the likelihood ratio mean that the observed result was much less likely to occur under the null hypothesis than under the alternative. Higher values mean that the observed outcome was nearly as likely, equally likely, or more likely to occur under the null hypothesis than under the alternative, and the null hypothesis cannot be rejected. The likelihood-ratio test provides the following decision rule: if \lambda(y_i) > C, do not reject H0; if \lambda(y_i) < C, reject H0; and if \lambda(y_i) = C, reject H0 with probability q, where the critical values C and q are usually chosen to obtain a specified significance level α through q \cdot P(\lambda(y_i)=C|H_0) + P(\lambda(y_i)<C|H_0) = \alpha. Thus, the likelihood-ratio test rejects the null hypothesis if the value of this statistic is too small. How small is too small depends on the significance level of the test, i.e., on what probability of Type I error is considered tolerable. The Neyman-Pearson lemma states that this likelihood-ratio test is the most powerful among all level-α tests for this problem.
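The likelihood ratio can be made concrete with a small made-up example on Bernoulli trials with simple hypotheses about the success probability θ:

```python
# Likelihood ratio for a simple-vs-simple test on Bernoulli trials.
def bernoulli_likelihood(successes, n, theta):
    """L(y | theta) for a given number of successes in n trials."""
    return theta ** successes * (1.0 - theta) ** (n - successes)

# Hypothetical data: 8 successes in 10 trials; H0: theta = 0.5, H1: theta = 0.7.
lam = bernoulli_likelihood(8, 10, 0.5) / bernoulli_likelihood(8, 10, 0.7)
# lam is well below 1 here, so the data favor H1; whether H0 is rejected
# depends on comparing lam to the critical value C for the chosen alpha.
```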
Test's statistic and distribution: Wilks' theorem First we define the test statistic as the deviance

D = -2\ln\lambda(y_i) = -2 \ln \frac{\text{likelihood under the fitted model if the null hypothesis is true}}{\text{likelihood under the saturated model}}

where the saturated model is a model with a theoretically perfect fit. Given that deviance is a measure of the difference between a given model and the saturated model, smaller values indicate better fit, as the fitted model deviates less from the saturated model. When assessed upon a chi-square distribution, non-significant chi-square values indicate very little unexplained variance and thus good model fit. Conversely, a significant chi-square value indicates that a significant amount of the variance is unexplained. Two measures of deviance D are particularly important in logistic regression: null deviance and model deviance. The null deviance represents the difference between a model with only the intercept (no predictors) and the saturated model, and the model deviance represents the difference between a model with at least one predictor and the saturated model. In this respect, the null model provides a baseline upon which to compare predictor models. Therefore, to assess the contribution of a predictor or set of predictors, one can subtract the model deviance from the null deviance and assess the difference on a chi-square distribution with degrees of freedom equal to the number of predictors added. If the model deviance is significantly smaller than the null deviance, one can conclude that the predictor or set of predictors significantly improved model fit. This is analogous to the F-test used in linear regression analysis to assess the significance of prediction. In most cases, the exact distribution of the likelihood ratio corresponding to specific hypotheses is very difficult to determine. A convenient result, attributed to Samuel S. Wilks, says that as the sample size n approaches infinity, the test statistic D has asymptotically a chi-squared distribution with degrees of freedom equal to the difference in dimensionality between the parameter spaces of the two models, that is, the number of β coefficients constrained under the null of the omnibus test. For example, if n is large enough and the fitted model assuming the null hypothesis consists of 3 predictors while the saturated (full) model consists of 5 predictors, the Wilks statistic is approximately chi-squared distributed with 2 degrees of freedom. This means that we can retrieve the critical value C from the chi-squared distribution with 2 degrees of freedom under a specific significance level.
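The deviance comparison can be sketched numerically, assuming SciPy is available; the deviance values below are invented for illustration and are not taken from the text.

```python
from scipy.stats import chi2

# Invented deviances: intercept-only model and a model with 2 predictors.
null_deviance = 41.18
model_deviance = 25.78

# The drop in deviance is the omnibus likelihood-ratio statistic,
# asymptotically chi-squared with df = number of predictors added.
lr_stat = null_deviance - model_deviance        # 15.40
p_value = chi2.sf(lr_stat, df=2)

# Equivalent decision via the critical value C at alpha = .05:
critical_value = chi2.ppf(0.95, df=2)           # about 5.99
reject_null = lr_stat > critical_value
```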
Other considerations

• In some instances the model may not reach convergence. Lack of convergence indicates that the coefficients are not reliable, as the model never reached a final solution. It may result from a number of problems: a large ratio of predictors to cases, multicollinearity, sparseness, or complete separation. Although not a precise number, as a rule of thumb logistic regression models require a minimum of 10 cases per variable. A large proportion of variables to cases results in an overly conservative Wald statistic and can lead to non-convergence.

• Multicollinearity refers to unacceptably high correlations between predictors. As multicollinearity increases, coefficients remain unbiased but standard errors increase and the likelihood of model convergence decreases. To detect multicollinearity among the predictors, one can conduct a linear regression analysis with the predictors of interest, for the sole purpose of examining the tolerance statistic used to assess whether multicollinearity is unacceptably high.

• Sparseness in the data refers to having a large proportion of empty cells (cells with zero counts). Zero cell counts are particularly problematic with categorical predictors. With continuous predictors, the model can infer values for the zero cell counts, but this is not the case with categorical predictors: the natural logarithm of zero is undefined, so final solutions to the model cannot be reached. To remedy this problem, researchers may collapse categories in a theoretically meaningful way or may consider adding a constant to all cells.

• Complete separation is another numerical problem that may lead to a lack of convergence; it refers to the instance in which the predictors perfectly predict the criterion, so that all cases are accurately classified. In such instances, one should reexamine the data, as there is likely some kind of error.

• The Wald statistic is defined by W_j = \frac{\hat{\beta}_j^2}{SE_{\hat{\beta}_j}^2}, where \hat{\beta}_j is the sample estimate of β_j and SE_{\hat{\beta}_j} is its standard error. When assessing the contribution of individual predictors in a given model, one may examine the significance of the Wald statistic. The Wald statistic, analogous to the t-test in linear regression, is used to assess the significance of coefficients: it is the ratio of the square of the regression coefficient to the square of the standard error of the coefficient, and it is asymptotically distributed as chi-square. Although several statistical packages (e.g., SPSS, SAS) report the Wald statistic to assess the contribution of individual predictors, it has limitations. First, when the regression coefficient is large, the standard error of the regression coefficient also tends to be large, increasing the probability of Type II error. Second, the Wald statistic tends to be biased when data are sparse.

• Model fit involving categorical predictors may be assessed using log-linear modeling.
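The complete-separation problem described above can be demonstrated with a small sketch (the data are made up): when a predictor classifies every case perfectly, the log-likelihood keeps improving as the coefficient grows, so no finite maximum exists and the iteration cannot converge.

```python
import math

# Made-up completely separated data: y = 0 whenever x < 3, y = 1 otherwise.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1]

def log_likelihood(slope, cutpoint=2.5):
    """Logistic log-likelihood for a model with the given slope."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-slope * (xi - cutpoint)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Each larger slope strictly improves the fit; the log-likelihood only
# approaches its supremum of 0 as the slope goes to infinity.
lls = [log_likelihood(s) for s in (1.0, 5.0, 10.0)]
```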
Example 1 of logistic regression Spector and Mazzeo examined the effect of a teaching method known as PSI on the performance of students in a course, intermediate macro economics. The question was whether students exposed to the method scored higher on exams in the class. They collected data from students in two classes, one in which PSI was used and another in which a traditional teaching method was employed. For each of 32 students, they gathered data on
Independent variables
• GPA: grade point average before taking the class.
• TUCE: the score on an exam given at the beginning of the term to test entering knowledge of the material.
• PSI: a dummy variable indicating the teaching method used (1 = PSI used, 0 = other method).
Dependent variable
• GRADE: coded 1 if the final grade was an A, 0 if the final grade was a B or C.
The particular interest of the research was whether PSI had a significant effect on GRADE; TUCE and GPA were included as control variables. Statistical analysis, using logistic regression of GRADE on GPA, TUCE and PSI, was conducted in SPSS using stepwise logistic regression. In the output, the "block" line relates to the chi-square test on the set of independent variables that are tested and included in the model fitting, while the "step" line relates to the chi-square test at the step level as variables are included in the model step by step. Note that in the output the step chi-square is the same as the block chi-square, since both test the same hypothesis that the variables entered on this step have non-zero coefficients. If you were doing stepwise regression, however, the results would be different. Using forward stepwise selection, the researchers divided the variables into two blocks (see METHOD in the syntax below).

LOGISTIC REGRESSION VAR=grade
 /METHOD=fstep psi / fstep gpa tuce
 /CRITERIA PIN(.50) POUT(.10) ITERATE(20) CUT(.5).

The default PIN value is .05; the researchers changed it to .5 so that the non-significant TUCE would also be entered. In the first block, PSI alone is entered, so the block and step chi-square tests relate to the hypothesis H0: βPSI = 0. The results of the omnibus chi-square test imply that PSI is significant for predicting that GRADE is more likely to be a final grade of A.
=====Block 1: method = forward stepwise (conditional)=====
Omnibus tests of model coefficients Then, in the next block, the forward selection procedure causes GPA to be entered first, then TUCE (see the METHOD command in the syntax above). =====Block 2: method = forward stepwise (conditional)=====
Omnibus tests of model coefficients The first step in block 2 indicates that GPA is significant (P-value = 0.003), while the chi-square for the step in which TUCE enters tests H0: βGPA = βTUCE = 0. • The model chi-square, 15.404, tells you whether any of the three independent variables has a significant effect. It is the equivalent of a global F-test, i.e. it tests H0: βGPA = βTUCE = βPSI = 0. Tests of individual parameters are shown in the "variables in the equation" table, which reports the Wald test (W = (b/s_b)^2, where b is the estimate of β and s_b is its standard error estimate), testing whether any individual parameter equals zero. You can, if you want, do an incremental LR chi-square test; that, in fact, is the best way to do it, since the Wald test is biased under certain situations. When parameters are tested separately, controlling for the other parameters, we see that the effects of GPA and PSI are statistically significant, but the effect of TUCE is not. Both have Exp(β) greater than 1, implying that the probability of getting an "A" grade, rather than another grade, is higher with the PSI teaching method and with a higher prior grade point average GPA.
Variables in the equation a. Variable(s) entered on step 1: PSI
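The Wald test and the Exp(B) interpretation used above can be sketched numerically, assuming SciPy is available; the coefficient and standard error below are invented for illustration and are not the study's estimates.

```python
import math
from scipy.stats import chi2

def wald_statistic(b, se):
    """W = (b / s_b)^2: the squared ratio of a coefficient estimate to its
    standard error, asymptotically chi-squared with 1 degree of freedom."""
    return (b / se) ** 2

# Invented estimate: b = 2.4 with standard error 1.0.
b, se = 2.4, 1.0
w = wald_statistic(b, se)     # 5.76
p = chi2.sf(w, df=1)          # below .05, so this coefficient would be significant

# Exp(B) converts the coefficient to an odds ratio: positive b gives
# Exp(B) > 1 (odds increase), negative b gives Exp(B) < 1 (odds decrease).
odds_ratio = math.exp(b)
```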
Example 2 of logistic regression Research subject: "The Effects of Employment, Education, Rehabilitation and Seriousness of Offense on Re-Arrest". A social worker in a criminal justice probation agency wants to examine whether some of these factors lead to re-arrest of those managed by the agency over the past five years who were convicted and then released. The data consist of 1,000 clients with the following variables:
Dependent variable (coded as a dummy variable) • Re-arrested vs. not re-arrested (0 = not re-arrested; 1 = re-arrested) – categorical, nominal
Independent variables (coded as dummy variables)
• Whether or not the client was adjudicated for a second criminal offense (1 = adjudicated; 0 = not).
• Seriousness of first offense (1 = felony; 0 = misdemeanor) - categorical, nominal
• High school graduate vs. not (0 = not graduated; 1 = graduated) - categorical, nominal
• Whether or not the client completed a rehabilitation program after the first offense (0 = no rehab completed; 1 = rehab completed) - categorical, nominal
• Employment status after first offense (0 = not employed; 1 = employed)
Note: continuous independent variables were not measured in this scenario. The null hypothesis for the overall model fit: the overall model does not predict re-arrest; that is, the independent variables as a group are not related to being re-arrested. (And for the independent variables: each separate independent variable is not related to the likelihood of re-arrest.) The alternative hypothesis for the overall model fit: the overall model predicts the likelihood of re-arrest. (Respectively, for the independent variables: having committed a felony (vs. a misdemeanor), not completing high school, not completing a rehab program, and being unemployed are related to the likelihood of being re-arrested.) Logistic regression was applied to the data in SPSS, since the dependent variable is categorical (dichotomous) and the researcher examines the odds of being re-arrested vs. not being re-arrested.
Omnibus tests of model coefficients The table shows the "Omnibus Test of Model Coefficients" based on the chi-square test, which tests the null hypothesis that the model, that is, the group of independent variables taken together, does not predict the likelihood of being re-arrested. The result (focus is on row three, "Model") is chi-square (4 degrees of freedom) = 41.15, p < .001, so the null can be rejected: the overall model is predictive of re-arrest, meaning that the fitted model is a significantly better fit to the data than the null model.
Variables in the equation One can also reject the null hypotheses that the B coefficients for having committed a felony, completing a rehab program, and being employed are equal to zero: they are statistically significant and predictive of re-arrest. Education level, however, was not found to be predictive of re-arrest. Controlling for the other variables, having committed a felony for the first offense increases the odds of being re-arrested by 33% (p = .046), compared to having committed a misdemeanor. Completing a rehab program and being employed after the first offense each decrease the odds of re-arrest by more than 50% (p < .001). The last column, Exp(B) (obtained by taking the inverse natural log, i.e. the exponential, of B), indicates the odds ratio: the ratio of the odds of the event occurring (the probability of the event occurring divided by the probability of it not occurring) between the two groups being compared. An Exp(B) value over 1.0 signifies that the independent variable increases the odds of the dependent variable occurring; an Exp(B) under 1.0 signifies that it decreases those odds, subject to the variable coding described above. A negative B coefficient yields an Exp(B) less than 1.0, and a positive B coefficient yields an Exp(B) greater than 1.0. The statistical significance of each B is tested by the Wald chi-square, testing the null hypothesis that the B coefficient = 0 (the alternative hypothesis being that it is not 0). P-values lower than alpha are significant, leading to rejection of the null; here, only the independent variables felony, rehab, and employment are significant (P-value < 0.05). Examining the odds ratio of being re-arrested vs. not re-arrested means comparing two groups (re-arrested = 1 in the numerator, re-arrested = 0 in the denominator), here the felony group against the baseline misdemeanor group. Exp(B) = 1.327 for "felony" indicates that having committed a felony vs. a misdemeanor increases the odds of re-arrest by 33%. For "rehab", one can say that having completed rehab reduces the likelihood (or odds) of being re-arrested by almost 51%.
==See also==