Class 8
AND
INTERNATIONAL RELATIONS
Posc/Uapp 816
I. AGENDA:
A. Elements of the linear model
B. Interpretation of regression parameters
C. Causal inference from non-experimental research
D. Least squares principle
E. Reading: Agresti and Finlay, Statistical Methods for the Social Sciences, 3rd
edition, Chapter 9.
D. Interpretation:
1. a is the intercept, that is, the value of Y when X equals zero. If the line is
graphed on a Y-X coordinate system (see below), then a is the point
where the line crosses the Y axis.
2. b, called the slope, is the amount of change in Y for a one-unit change in
X. It is expressed in units of Y per unit of X, so its numerical
value depends on the measurement scale of X: if X is measured in dollars, then b
will equal some particular value, but if the scale is thousands of dollars, b
will have a different value.
3. The figure presented in Class 7 shows a picture of the graph of a linear
relationship. Notice that the graph is a straight line.
4. The linear relationship described by this graph is:
Y = a + bX = 2 + (2)X
E. In other words, the intercept of this particular model is 2 and the slope is 2.0.
F. The numbers a and b are called regression parameters; note that they are constants
whereas X and Y are variables. The parameters show you how X affects or at least
is connected to Y.
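The interpretation of a and b above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `predict` and the rescaling factor of 1000 are illustrative assumptions. It uses the line Y = 2 + 2X from the example.

```python
def predict(a, b, x):
    """Return the value of Y on the line Y = a + b*X."""
    return a + b * x

a, b = 2.0, 2.0  # intercept and slope from the example line Y = 2 + 2X

# The intercept is the value of Y when X = 0.
y_at_zero = predict(a, b, 0)

# The slope is the change in Y for a one-unit change in X.
change_per_unit = predict(a, b, 4) - predict(a, b, 3)

# Assumed rescaling for illustration: if X is recorded in thousands of the
# original units, the same line has a slope 1000 times larger.
b_thousands = b * 1000
y_original_scale = predict(a, b, 5)            # X = 5 original units
y_rescaled = predict(a, b_thousands, 0.005)    # the same point, X = 0.005 thousand

print(y_at_zero, change_per_unit, y_original_scale, y_rescaled)
```

The two scales describe the same line: only the numerical value of b changes, not the predicted Y for a given case.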
Posc/Uapp 816 Class 8 - Two Variable Regression Page 2
B. Interpretation of parameters:
1. β0 is the regression constant or intercept, that is, the value of Y when X
equals zero. If the line is graphed on a Y-X coordinate system, then β0 is
the point where the line crosses the Y axis.
2. β1, called the slope or regression parameter, is the amount of change in Y
for a one-unit change in X. As noted above, be thoughtful when looking at
β1:
i. Its numerical value depends on the measurement scale: if X is
measured in dollars, then it will equal some particular value, but if
the scale is thousands of dollars, it will have a different value.
3. Another way of viewing the model: how an individual's (unit's) score on Y
is affected by the independent variable, X.
i. The parameter β1 is sometimes interpreted as a "causal" mechanism
linking X to Y.
ii. But see the next section.
iii. A linear model, in brief, is a summary of what we think we know
about the dependent variable.
C. Example:
1. Suppose the estimated or observed regression equation turns out to be:
Ŷ = 10.1 + .03X
i. Here β0 = 10.1.
a) Sometimes the constant has no “real” or substantive
meaning, as when for example we are relating achievement
to age. (Age = 0 would be meaningless in most social
science studies.)
ii. The regression coefficient (slope) is β1 = .03, which means that as X
increases 1 unit (say, one year), Y increases .03 units on whatever scale
Y is measured, say an achievement index.
a) This may or may not be a large change.
b) You have to ask two questions at least:
* What is the substantive meaning of a one-unit
increase or decrease in X?
* What is the substantive meaning of a β1 unit change
in Y?
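To make the size of the slope concrete, here is a small sketch using the coefficients quoted in the example (β0 = 10.1, β1 = .03). The X values chosen and the name `predicted_y` are illustrative assumptions, not part of the original notes.

```python
b0, b1 = 10.1, 0.03  # intercept and slope quoted in the example

def predicted_y(x):
    """Predicted Y for a given X under the example equation."""
    return b0 + b1 * x

# Predicted values at a few arbitrary X values.
for x in (0, 1, 10, 50):
    print(x, round(predicted_y(x), 2))

# A one-unit increase in X changes the prediction by b1 ...
one_unit = predicted_y(11) - predicted_y(10)
# ... and a ten-unit increase changes it by 10 * b1.
ten_units = predicted_y(20) - predicted_y(10)
print(round(one_unit, 2), round(ten_units, 2))
```

Whether a change of .03 (or .3 over ten units) is large depends on the substantive meaning of the units of X and Y, which is the point of the two questions above.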
D. Mean value interpretation of Y:
1. The linear model is sometimes written (see Agresti and Finlay, Statistical
Methods, 3rd edition, page 314) as:
E(Yi) = β0 + β1Xi
2. Interpretation:
i. The systematic part contains:
a) β0, the intercept or constant, which is the value of Y when X = 0
b) β1, the slope or regression coefficient which shows how
much Y changes for a one-unit change in X.
c) Suppose β1 = 0? What does that mean?
D. Error part:
1. εi represents random error--that is, measurement error in Y (but hopefully
not X), random factors causing variation in Y, etc. εi symbolizes the part of
the variation in Y (e.g., illegitimacy) that is not explained by the model.
2. See Agresti and Finlay, Statistical Methods for the Social Sciences, 3rd edition,
pages 314 to 319.
3. An important goal of the social sciences is to reduce the magnitude of the
εi's and to ensure that they are really random. Doing so has the effect of
increasing the explanatory power of the model compared to the error
component.
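The relation between the systematic part, the error part, and explanatory power can be illustrated with simulated data. This is a sketch under stated assumptions: normally distributed errors, the coefficient values borrowed from the earlier example, and arbitrary error standard deviations (0.5 and 5.0). None of these choices come from the notes or from real data.

```python
import random
import statistics

def simulate(b0, b1, sigma, xs, rng):
    """Draw Y = b0 + b1*X + e, with e ~ Normal(0, sigma), for each X."""
    return [b0 + b1 * x + rng.gauss(0, sigma) for x in xs]

def r_squared(xs, ys):
    """Squared Pearson correlation between xs and ys."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

rng = random.Random(816)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 100) for _ in range(2000)]
b0, b1 = 10.1, 0.03  # illustrative values borrowed from the example

# Same systematic part, different error magnitudes (assumed values).
small_error = r_squared(xs, simulate(b0, b1, 0.5, xs, rng))
large_error = r_squared(xs, simulate(b0, b1, 5.0, xs, rng))

# Mean value interpretation: average many Y's drawn at the same X.
mean_at_50 = statistics.fmean(simulate(b0, b1, 5.0, [50] * 5000, rng))

print(round(small_error, 3), round(large_error, 3), round(mean_at_50, 2))
```

With the smaller errors, X accounts for a much larger share of the variation in Y, and the average of Y at X = 50 settles near β0 + β1(50) = 11.6 even when individual cases are noisy.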
E. What we need is some method for finding numerical values of β0 and β1 when
the data are scattered about as in the example.
1. Before looking at how parameters are estimated, however, let's interpret
regression parameters from another angle.
AFDC payments and family structure. The question then arises: are the
differences in illegitimacy due to a) AFDC payments; b) family structure and
social norms; or c) both?
2. Figure 2 suggests alternative models.
3. "Hard scientists" would try to answer the question by manipulating
variables. (They would move families at random to different states, thus
cancelling out the association between welfare payments and family
structure.) In a sense they would be comparing apples with apples: the
states being compared would be the same in all relevant respects except for
AFDC payment level. If their illegitimacy rates differed, the investigators
could attribute the differences to the main independent variable.
4. But, of course, in the real world such manipulations are not possible;
families cannot be moved around to test hypotheses. (Actually social
scientists and policy analysts have attempted to experiment on welfare
recipients.)
5. The only solution is to adjust whatever statistical measure of the relationship
between Y and X is used, β1 for example, for the effects of other factors.
6. These considerations lead to two conclusions:
i. We have to be careful about translating statistical relationships, as
measured by the betas, into causal assertions of the form "X causes
(variation) in Y."
ii. We need methods to adjust the statistical measures, the β’s, to take
into account at least some possible confounding influences.
E. This is a matter we will deal with in the remainder of the course.
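The need for adjustment can be sketched with simulated data. Everything in this example is hypothetical: the variable Z stands in for a confounder such as family structure, the effect sizes (0.8 and 1.5) are arbitrary, and the true effect of X on Y is set to zero by construction. The adjustment shown, regressing Y on the part of X not explained by Z, is one simple way to control for a single confounder. The notes do not say which method the course adopts.

```python
import random
import statistics

def slope(xs, ys):
    """Least squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(xs, ys):
    """Residuals of ys after a least squares regression on xs."""
    b1 = slope(xs, ys)
    b0 = statistics.fmean(ys) - b1 * statistics.fmean(xs)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

rng = random.Random(8)  # fixed seed so the run is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]        # hypothetical confounder
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]   # X is partly driven by Z
y = [1.5 * zi + rng.gauss(0, 1) for zi in z]   # Y depends on Z only, not on X

# Unadjusted slope: X appears to "affect" Y because both follow Z.
naive = slope(x, y)

# Adjusted slope: remove from X the part predicted by Z, then regress Y
# on what is left. This equals the X coefficient when Z is also in the model.
x_net_of_z = residuals(z, x)
adjusted = slope(x_net_of_z, y)

print(round(naive, 2), round(adjusted, 2))
```

The unadjusted slope is clearly nonzero even though X has no effect on Y in the simulation, while the adjusted slope is close to zero. A statistical relationship alone does not establish that X causes variation in Y.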
Ŷi = β̂0 + β̂1Xi
where Ŷi is the predicted value of Y, and β̂0 and β̂1 are the estimated values
of β0 and β1.
2. Suppose, to continue with the above case, a county had X = 0, in which case
we would predict its value on Y to be 10.1 (see above), but in fact its
actual or observed illegitimacy rate is 20. Then the error or residual for this
county is the observed value minus the predicted value: 20 - 10.1 = 9.9.
3. A geometrical interpretation of residuals is shown in Figure 2.
4. Interpretation:
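The residual calculation, and its connection to the least squares principle listed in the agenda, can be sketched as follows. The coefficients 10.1 and .03 and the observed value 20 come from the example. The five data points and the function names are hypothetical, introduced only to show that the least squares line has a sum of squared residuals no larger than any other line.

```python
b0_hat, b1_hat = 10.1, 0.03  # estimates quoted in the example

def predict(x):
    """Predicted Y under the example equation."""
    return b0_hat + b1_hat * x

def residual(x, y_obs):
    """Observed Y minus predicted Y."""
    return y_obs - predict(x)

# The case from the notes: X = 0, observed Y = 20.
county_residual = residual(0, 20)

def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for the line Y = b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Intercept and slope that minimize the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical data, invented for illustration only.
xs = [0, 10, 20, 30, 40]
ys = [20.0, 9.5, 11.0, 12.5, 10.0]
ols_b0, ols_b1 = least_squares(xs, ys)

print(round(county_residual, 1))
print(ssr(ols_b0, ols_b1, xs, ys) <= ssr(b0_hat, b1_hat, xs, ys))
```

A positive residual means the case lies above the fitted line and a negative one means it lies below. Least squares chooses the line that makes the squared residuals, summed over all cases, as small as possible.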