Dummy Variables
Dummy variables are tools that allow us to represent nominal-level independent
variables in statistical techniques like regression analysis. Without them, these
methods could not include nominal-level variables, which would be a severe
limitation.
• Use a set of n-1 dummy variables to represent an n-category variable. The
remaining category needs no dummy of its own because it serves as the
reference category.
• Choose one of the categories to serve as the “reference” category, the category to
which you compare the other categories.
• Create dummy (0/1) variables to represent each of the other categories. Each
dummy is coded so that it has the value 1 if a case is in that category, and 0 if
not.
• Interpret the regression coefficient for each dummy variable as how that category
compares to the reference category.
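The coding steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name dummy_code, the "is_" column prefix, and the five sample cases are assumptions made for this example, while the smoking categories (never, past, current) follow the example discussed in this section.

```python
def dummy_code(values, reference):
    """Build n-1 dummy (0/1) columns for an n-category nominal variable.

    The reference category gets no column: a case in the reference
    category is simply 0 on every dummy.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in data")
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Hypothetical sample of five cases (made up for illustration).
smoking = ["never", "past", "current", "never", "current"]
dummies = dummy_code(smoking, reference="never")

for name, column in dummies.items():
    print(name, column)
```

A three-category variable yields two dummy columns here, and the cases coded "never" have 0 in both.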
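Then say we estimate a regression of blood pressure on smoking status and other
independent variables, using "never smoked" as the reference category, and get a
coefficient of 6 for the past-smoker dummy and 14 for the current-smoker dummy.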
Interpreting these results for the dummy variables involves a straightforward
comparison with the reference category. Past smokers, compared to people who
never smoked, have blood pressure 6 points higher, controlling for the other
independent variables. Current smokers, compared to people who never smoked,
have blood pressure 14 points higher, controlling for the other independent
variables. Two non-reference categories can be compared by subtracting their
coefficients: current smokers have blood pressure 8 points higher than past
smokers (14 - 6), controlling for the other independent variables.
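The same arithmetic can be written out as a short Python sketch. The coefficients 6 and 14 come from the example above; the dictionary layout and the function name adjusted_difference are assumptions made for illustration. The reference category is given an implicit coefficient of 0, which is what lets any pair of categories be compared by subtraction.

```python
# Dummy-variable coefficients from the example in the text. The reference
# category ("never") has no dummy, so its implicit coefficient is 0.
coefficients = {"never": 0.0, "past": 6.0, "current": 14.0}


def adjusted_difference(category, baseline):
    """Difference in predicted blood pressure between two categories,
    holding the other independent variables fixed."""
    return coefficients[category] - coefficients[baseline]


print(adjusted_difference("past", "never"))     # 6.0
print(adjusted_difference("current", "never"))  # 14.0
print(adjusted_difference("current", "past"))   # 8.0
```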