
Dummy Variables

Dummy variables allow nominal independent variables to be included in statistical techniques like regression analysis by representing categorical data numerically. To represent an n-category variable, n-1 dummy variables are used, with one category as the reference. Dummy variables are coded as 1 if an observation is in that category and 0 otherwise. The regression coefficients for dummy variables represent differences compared to the reference category.

Uploaded by

Elena Kirtcheva
Copyright
© Attribution Non-Commercial (BY-NC)

Dummy Variables

Dummy variables are tools that allow us to represent nominal-level independent
variables in statistical techniques like regression analysis. Without them, these
statistical methods could not include nominal-level variables, which would be a
severe limitation.

How to use dummy variables to represent an n-category variable:

• Use a set of n-1 dummy variables to represent an n-category variable. The
remaining category is identified by all n-1 dummies being 0.
• Choose one of the categories to serve as the “reference” category, the category to
which you compare the other categories.
• Create dummy (0/1) variables to represent each of the other categories. Each
dummy is coded so that it has the value 1 if a case is in that category, and 0 if
not.
• Interpret the regression coefficient for each dummy variable as how that category
compares to the reference category.
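
The coding steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name make_dummies and the category labels "A", "B", "C" are hypothetical and do not come from the handout.

```python
def make_dummies(values, reference):
    """Code a categorical variable as n-1 dummy (0/1) variables.

    `values` is a list of category labels and `reference` is the
    category left out. Returns a dict mapping each non-reference
    category to its list of 0/1 codes, one code per case.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in values")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Hypothetical three-category variable with labels "A", "B", "C".
observed = ["A", "C", "B", "A", "C"]
dummies = make_dummies(observed, reference="A")
print(dummies)  # {'B': [0, 0, 1, 0, 0], 'C': [0, 1, 0, 0, 1]}
```

Cases in the reference category "A" get 0 on every dummy, which is why three categories need only two dummy variables.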

Example of using dummy variables:

Say we are using multiple regression analysis to analyze predictors of blood
pressure. Our unit of analysis is the person. The dependent variable is the
person’s diastolic blood pressure. We have a number of interval-level
independent variables, such as the person’s age, weight, etc. But we also want to
include in the equation the person’s “smoking history”, whether the person 1)
never smoked, 2) used to smoke, or 3) currently smokes.

To represent this three-category variable we use two dummy variables. We could
let the “never smoked” category be the reference category, and create two dummy
variables:

• SmokPast = 1 if a past smoker; 0 otherwise

• SmokNow = 1 if a current smoker; 0 otherwise

Then say we estimate our regression equation and get the following results:

BP = a + b Age + c Weight + …… + 6 SmokPast + 14 SmokNow

Interpreting the above results for the dummy variables involves a straightforward
comparison with the reference category. Past smokers, compared to people who
never smoked, have a predicted diastolic blood pressure 6 points higher,
controlling for the other independent variables. Current smokers, compared to
people who never smoked, have a predicted blood pressure 14 points higher,
controlling for the other independent variables. Comparing current smokers to
past smokers, we see that current smokers have a predicted blood pressure
8 points higher (14-6), controlling for the other independent variables.
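
The arithmetic behind these comparisons can be checked with a short Python sketch. It uses only the two dummy coefficients from the equation above (6 and 14). The dictionary and function names are hypothetical, and the intercept and the other coefficients are left out because they cancel when two people differ only in smoking history.

```python
# Coefficients on the two dummies, taken from the equation above.
# "never smoked" is the reference category, so its contribution is 0.
effect_vs_never = {"never": 0, "past": 6, "now": 14}


def adjusted_difference(category, baseline):
    """Predicted BP difference between two smoking categories,
    holding the other independent variables fixed."""
    return effect_vs_never[category] - effect_vs_never[baseline]


print(adjusted_difference("past", "never"))  # 6
print(adjusted_difference("now", "never"))   # 14
print(adjusted_difference("now", "past"))    # 8

# The same differences expressed relative to "past" instead of "never".
effect_vs_past = {c: adjusted_difference(c, "past") for c in effect_vs_never}
print(effect_vs_past)  # {'never': -6, 'past': 0, 'now': 8}
```

The last step illustrates that the choice of reference category changes how the coefficients are expressed, not the differences between categories: with past smokers as the reference, the same fitted model would carry dummy coefficients of -6 (never smoked) and 8 (current smoker), with the intercept absorbing the shift.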

See the Allison text for further coverage of dummy variables:

• pp. 10, 45 (#3), basic concepts
• p. 163, how to use dummy variables to represent a categorical variable with more
than 2 categories
• p. 46, questions 1-2, raises basic issues of interpretation
• pp. 164-165, example of representing possible non-linear effects using dummy
variables; similar example to my LA RTD vote example
