In statistics and regression analysis, an independent variable that can take on only two possible values is called a dummy variable. For example, it may take on the value 0 if an observation is of a white subject or 1 if the observation is of a black subject. The two categories associated with the two values are mutually exclusive, so that no observation falls into more than one category, and exhaustive, so that every observation falls into some category. Sometimes there are three or more possible categories that are pairwise mutually exclusive and collectively exhaustive, for example under 18 years of age, 18 to 64 years of age, and 65 or older. In this case a set of dummy variables is constructed, each of which splits the observations into two mutually exclusive and jointly exhaustive groups. In this example, one dummy variable (called D1) equals 1 if age is less than 18 and 0 otherwise; a second dummy variable (called D2) equals 1 if age is in the range 18–64 and 0 otherwise. In this set-up, the pair (D1, D2) can take the values (1,0) (under 18), (0,1) (18 to 64), or (0,0) (65 or older), but not (1,1), which would nonsensically imply that a subject is both under 18 and between 18 and 64. The dummy variables can then be included as independent (explanatory) variables in a regression. When the regression includes an intercept, the number of dummy variables is one less than the number of categories: with the two categories black and white a single dummy variable distinguishes them, while the three age categories need two. The omitted category (here, 65 or older) is the baseline against which the others are compared; adding a dummy for every category alongside the intercept would make the dummies sum to the intercept column, producing perfect multicollinearity (the "dummy variable trap").
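The age example can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function names (`age_group`, `age_dummies`, `design_matrix`, `all_three_dummies`) are hypothetical, ages are assumed to be numbers in years, and "65 or older" is assumed to be the baseline coded (0, 0). The last function shows the redundancy described above: with a third dummy, D1 + D2 + D3 equals 1 for every observation, exactly the intercept column.

```python
# Minimal sketch of dummy coding for the three age categories in the text.
# Function names are hypothetical; cut points follow the text
# (under 18, 18 to 64, 65 or older), with "65 or older" as the baseline.


def age_group(age: float) -> str:
    """Assign an age in years to exactly one of three categories."""
    if age < 18:
        return "under 18"
    if age < 65:
        return "18-64"
    return "65 or older"


def age_dummies(age: float) -> tuple[int, int]:
    """Return (D1, D2); the baseline category is coded (0, 0)."""
    group = age_group(age)
    d1 = 1 if group == "under 18" else 0
    d2 = 1 if group == "18-64" else 0
    return d1, d2


def design_matrix(ages: list[float]) -> list[list[int]]:
    """One regression row per observation: [intercept, D1, D2]."""
    return [[1, *age_dummies(a)] for a in ages]


def all_three_dummies(age: float) -> tuple[int, int, int]:
    """Hypothetical over-complete coding with a dummy for every category.

    D1 + D2 + D3 is 1 for every observation, i.e. identical to the
    intercept column, which is why one category must be left out.
    """
    d1, d2 = age_dummies(age)
    return d1, d2, 1 - d1 - d2


if __name__ == "__main__":
    ages = [12, 30, 70, 64, 18]
    for age, row in zip(ages, design_matrix(ages)):
        print(age, age_group(age), row)
    # The three-dummy sum reproduces the intercept for each observation.
    print([sum(all_three_dummies(a)) for a in ages])
```

Running the script gives the row [1, 1, 0] for age 12, [1, 0, 1] for ages 30, 64 and 18, and [1, 0, 0] for age 70. Which category is omitted only changes how the dummy coefficients are interpreted (as differences from that baseline); with an intercept present, the fitted values are the same for any choice of baseline.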
Such qualitative data can also be used for dependent variables. For example, a researcher might want to predict whether someone gets arrested, using family income or race as explanatory variables. Here the variable to be explained is a dummy variable that equals 0 if the observed subject does not get arrested and 1 if the subject does get arrested. In such a situation, ordinary least squares (the basic regression technique) is widely seen as inadequate, partly because its fitted values can fall outside the range 0 to 1 and so cannot be read as probabilities; instead, probit regression or logistic regression is used. Further, the dependent variable sometimes has three or more categories, for example no charges, charges, and death sentences. In this case, the multinomial probit or multinomial logit technique is used.

==See also==