Class 8

DEPARTMENT OF POLITICAL SCIENCE

AND
INTERNATIONAL RELATIONS
Posc/Uapp 816

TWO VARIABLE REGRESSION

I. AGENDA:
A. Elements of the linear model
B. Interpretation of regression parameters
C. Causal inference from non-experimental research
D. Least squares principle
E. Reading: Agresti and Finlay, Statistical Methods for the Social Sciences, 3rd
edition, Chapter 9.

II. GEOMETRY OF LINES:

A. See the notes from the last class (Class 7)
B. To understand the linear model let's review some simple math.
C. The equation of a linear (straight line) relationship between two variables, Y and
X, is
Y = a + bX

D. Interpretation:
1. a is the intercept, that is, the value of Y when X equals zero. If the line is
graphed on a Y-X coordinate system (see below), then a is the point
where the line crosses the Y axis.
2. b, called the slope, is the amount of change in Y for a one-unit change in
X. It is measured in units of Y per unit of X, so its numerical
value depends on the measurement scale of X: if X is measured in dollars, then b
will equal some particular value, but if the scale is thousands of dollars, b
will be 1,000 times as large.
3. The figure presented in Class 7 shows a picture of the graph of a linear
relationship. Notice that the graph is a straight line.
4. The linear relationship described by this graph is:
Y = a + bX = 2 + (2)X

E. In other words, the intercept of this particular model is 2 and the slope is 2.0.
F. The numbers a and b are called regression parameters; note that they are constants
whereas X and Y are variables. The parameters show you how X affects or at least
is connected to Y.
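
As a quick check of the points above, the following Python sketch evaluates the line Y = 2 + 2X and shows what happens to the slope when X is re-expressed in different units. The function name `line` and the dollar amounts are illustrative assumptions, not part of the notes.

```python
def line(x, a=2.0, b=2.0):
    """Return Y = a + bX for the example line with intercept 2 and slope 2."""
    return a + b * x

# The intercept is the value of Y when X equals zero.
intercept = line(0)

# The slope is the change in Y for a one-unit change in X,
# and it is the same wherever on the line it is measured.
slope_near_zero = line(1) - line(0)
slope_far_out = line(11) - line(10)

# Rescaling X changes the numerical value of b but not the line itself.
# Hypothetical case: X in dollars with b = 2 per dollar. If X is instead
# recorded in thousands of dollars, the slope becomes 2 * 1000.
b_per_dollar = 2.0
b_per_thousand = b_per_dollar * 1000

x_dollars = 3000
x_thousands = x_dollars / 1000
y_from_dollars = 2.0 + b_per_dollar * x_dollars
y_from_thousands = 2.0 + b_per_thousand * x_thousands

print(intercept, slope_near_zero, slope_far_out)
print(y_from_dollars, y_from_thousands)
```

Both versions of the equation give the same Y for the same underlying amount of money; only the number attached to b differs.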

III. INTERPRETING THE REGRESSION MODEL:

A. The equation of a linear (straight line) relationship between two variables, Y and
X, is
Yi = β0 + β1Xi + εi

B. Interpretation of parameters:
1. β0 is the regression constant or intercept, that is, the value of Y when X
equals zero. If the line is graphed on a Y-X coordinate system, then β0 is
the point where the line crosses the Y axis.
2. β1, called the slope or regression parameter, is the amount of change in Y
for a one-unit change in X. As noted above, be thoughtful when looking at β1:
i. Its numerical value depends on the measurement scale: if X is
measured in dollars, then it will equal some particular value, but if
the scale is thousands of dollars, it will have a different value.
3. Another way of viewing the model: how an individual's (unit's) score on Y
is affected by the independent variable, X.
i. The parameter β1 is sometimes interpreted as a "causal" mechanism
linking X to Y.
ii. But see the next section.
iii. A linear model, in brief, is a summary of what we think we know
about the dependent variable.
C. Example:
1. Suppose the estimated or observed regression equation turns out to be:

Ŷi = 10.1 + .03Xi

i. Here the estimated intercept is β̂0 = 10.1.
a) Sometimes the constant has no “real” or substantive
meaning, as when for example we are relating achievement
to age. (Age = 0 would be meaningless in most social
science studies.)
ii. The regression coefficient (slope) is β1 = .03, which means that as X
increases by 1 unit (say, one year), Y increases by .03 units on whatever
scale Y is measured, say an achievement index.
a) This may or may not be a large change.
b) You have to ask two questions at least:
* What is the substantive meaning of a one-unit
increase or decrease in X?
* What is the substantive meaning of a β1-unit change
in Y?
D. Mean value interpretation of Y:
1. The linear model is sometimes written (see Agresti and Finlay, Statistical
Methods, 3rd edition, page 314) as:
E(Yi) = β0 + β1Xi

i. This equation suggests that the average or expected value of Y
depends on a corresponding value of X. If β1 is positive, for
example, then the expected value of the dependent variable will
increase with increases in X.
ii. This interpretation leads to the next topic.
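
The mean value interpretation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it borrows 10.1 and .03 from the example equation, treats them as if they were the true parameters, and assumes normally distributed errors with mean zero and a standard deviation of 2. None of the function names or the error distribution come from the notes.

```python
import random

random.seed(816)  # arbitrary seed so the run is repeatable

BETA0, BETA1 = 10.1, 0.03   # values borrowed from the example equation
SIGMA = 2.0                 # assumed spread of the random error

def draw_y(x):
    """One simulated Y: systematic part plus a random error with mean zero."""
    return BETA0 + BETA1 * x + random.gauss(0.0, SIGMA)

def mean_y_at(x, n=20000):
    """Average of n simulated Y values that all share the same X."""
    return sum(draw_y(x) for _ in range(n)) / n

# Compare E(Y) = beta0 + beta1 * X with the simulated average at each X.
for x in (0, 100, 250):
    expected = BETA0 + BETA1 * x
    print(x, round(expected, 2), round(mean_y_at(x), 2))
```

Individual Y values scatter around the line, but their average at each value of X should sit close to β0 + β1X.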

IV. THE STATISTICAL MODEL:

A. Social and political relationships are seldom "determinate," which means that we
have to add "error" to our conceptions of how one thing affects another. Also, we
frequently deal with samples, not the total population, so we need to think about
estimates versus parameters.
B. Sources of error:
1. Random fluctuations caused by hundreds of idiosyncratic factors,
which presumably "cancel" each other out.
2. Random measurement error
C. The systematic part:
1. Suppose we have a quantitative dependent variable, Y, and a quantitative
independent variable, X. In a previous example Y is "out-of-wedlock
births" and X is the "average monthly AFDC payments." A statistical
model describing the relationship is:
Yi = β0 + β1Xi + εi

2. Interpretation:
i. The systematic part contains:
a) β0, the intercept or constant, which is the value of Y when X = 0
b) β1, the slope or regression coefficient which shows how
much Y changes for a one-unit change in X.
c) Suppose β1 = 0? What does that mean?
D. Error part:
1. εi represents random error--that is, measurement error in Y (but hopefully
not X), random factors causing variation in Y, etc. εi symbolizes the part of
the variation in Y (e.g., illegitimacy) that is not explained by the model.
2. See Agresti and Finlay, Statistical Methods for the Social Sciences, 3rd edition,
pages 314 to 319.
3. An important goal of the social sciences is to reduce the magnitude of the
εi's and to ensure that they are really random. Doing so has the effect of
increasing the explanatory power of the model compared to the error
component.
E. What we need is some method for finding numerical values of β0 and β1 when
the data are scattered about as in the example.
1. Before looking at how parameters are estimated, however, let's interpret
regression parameters from another angle.

V. CAUSAL INFERENCE IN NON-EXPERIMENTAL RESEARCH:

A. It is often said that natural science differs from social inquiry because, among other
things, investigators working in the former can literally manipulate variables to
observe the effects on various phenomena. Hence, a chemist can administer
varying amounts of a compound to rats to see what effect it has on, say, the
number of lymphocytes.
B. Moreover, so the conventional wisdom continues, the laboratory scientist can hold
all relevant factors constant, so that if there is a change in cell counts, the
difference can unambiguously be attributed to the compound. The researcher, it is
believed, can make a reasonably valid causal inference. The inference about
causality derives its strength from the experimenter's ability to eliminate alternative
explanations for any observed changes.
C. Now compare this situation with that facing the social scientist who wants to know
if changes in AFDC payments affect "deviant" or undesirable behavior. It is
possible, as we have already demonstrated, to compare areas having differing
payment levels. Or, as we just did, we can examine the association between
variation in one variable (AFDC payments) and out-of-wedlock births.
D. The problem comes in interpreting the results. Since we are dealing with
"observational" data (we have not manipulated anything, nor have we controlled for
possible alternative causal factors), it is difficult to interpret our results, especially
the regression coefficient, as a "causal" parameter.
1. Why? Suppose, for the moment, our data had confirmed Murray's
argument: states with the highest welfare benefits had the highest
proportion of out-of-wedlock births. (This is contrary to what we did find,
but let's suspend our knowledge for a moment.) But consider this
possibility: those states having low AFDC payments also happen to be
populated by groups with strong and extended families and consequently
illegitimacy violates well established social norms. Suppose, in addition,
those places with more generous benefits do not contain as many such
groups. There are, in other words, three relationships: one between the
dependent variable (births) and AFDC payments; another between births
and family structure; and a third between the two independent variables,
AFDC payments and family structure. The question then arises: are the
differences in illegitimacy due to a) AFDC payments; b) family structure and
social norms; or c) both?
2. Figure 2 suggests alternative models.
3. "Hard scientists" would try to answer the question by manipulating
variables. (They would move families at random to different states, thus
cancelling out the association between welfare payments and family
structure.) In a sense they would be comparing apples with apples: the
states being compared would be the same in all relevant respects except for
AFDC payment level. If their illegitimacy rates differed, the investigators
could attribute the differences to the main independent variable.

Figure 2: Alternative Causal Models

4. But, of course, in the real world such manipulations are not possible;
families cannot be moved around to test hypotheses. (Actually social
scientists and policy analysts have attempted to experiment on welfare
recipients.)
5. The only solution is to adjust whatever statistical measure of the relationship
between Y and X is used (β1, for example) for the effects of other factors.
6. These considerations lead to two conclusions:
i. We have to be careful about translating statistical relationships, as
measured by the betas, into causal assertions of the form "X causes
(variation) in Y."
ii. We need methods to adjust the statistical measures, the β’s, to take
into account at least some possible confounding influences.
E. This is a matter we will deal with in the remainder of the course.
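
The confounding argument in this section can be sketched with simulated data. Everything in the block below is made up for illustration: the confounder Z (standing in for something like strength of extended families), the coefficients, and the sample size are assumptions, and the helper `ols_slope` uses the standard two-variable least squares slope formula rather than anything given in these notes. In the simulated "observational" world X has no effect on Y at all, yet the slope is clearly nonzero because Z drives both variables. When X is assigned at random, the slope falls to roughly zero.

```python
import random

random.seed(8)  # arbitrary seed for repeatability

def ols_slope(xs, ys):
    """Least squares slope of ys on xs for a two-variable regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

N = 5000
# Hypothetical confounder Z, one value per simulated unit.
z = [random.gauss(0, 1) for _ in range(N)]

# Observational world: Z lowers X and lowers Y; X itself has NO effect on Y.
x_obs = [300 - 50 * zi + random.gauss(0, 20) for zi in z]
y_obs = [20 - 3 * zi + random.gauss(0, 1) for zi in z]

# Experimental world: X is assigned at random, so it is unrelated to Z.
x_rand = [random.uniform(150, 450) for _ in range(N)]
y_rand = [20 - 3 * zi + random.gauss(0, 1) for zi in z]

slope_observational = ols_slope(x_obs, y_obs)
slope_randomized = ols_slope(x_rand, y_rand)
print(round(slope_observational, 3), round(slope_randomized, 3))
```

The contrast between the two slopes is the reason a bivariate β1 from observational data should not be read as a causal effect without adjusting for other factors.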

VI. LEAST SQUARES PRINCIPLE:

A. Suppose we have two estimates of β0 and β1; for now it doesn't matter where they
came from. As an example, suppose the estimates for an equation are 10.1 for β0 and
.03 for β1. With these numbers we can obtain an estimated model (note the hats):

Ŷi = β̂0 + β̂1Xi

where Ŷi is the predicted value of Y, and β̂0 and β̂1 are the estimated values
of the parameters. For example,

Ŷi = 10.1 + .03Xi

1. Here, if X is 0, the estimated or predicted value of Y is

Ŷi = 10.1 + .03(0) = 10.1

2. If X is, say, 250, then the predicted value is

Ŷi = 10.1 + .03(250) = 17.6
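
The two predictions above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `predict` and its default arguments are illustrative choices, with 10.1 and .03 taken from the example estimates.

```python
def predict(x, b0=10.1, b1=0.03):
    """Predicted Y from an estimated equation with intercept b0 and slope b1."""
    return b0 + b1 * x

# Predicted values at the two X values used in the text.
for x in (0, 250):
    print(x, round(predict(x), 1))

# The gap between two predictions is the slope times the change in X.
change_for_100_units = predict(350) - predict(250)
print(round(change_for_100_units, 1))
```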

B. Residuals: A residual is the difference between a predicted value (predicted on the
basis of some model) and the corresponding observed value.
1. The formula is:
ε̂i = (Ŷi − Yi)

2. Suppose, to continue with the above example, a case had X = 0 (so that
we would predict its value on Y to be 10.1, as shown above) but in fact its
actual or observed illegitimacy rate is 20. Then the error or residual for this
county is 10.1 − 20 = −9.9; that is, the prediction falls 9.9 units below the
observed value.
3. A geometrical interpretation of residuals is shown in Figure 2.

Figure 2: Partition of Deviation

4. Interpretation:
i. (Yi − Ȳ) is the difference between the ith unit's score on Y and
the grand (overall) mean. This difference, when combined with all
the other corresponding differences, measures the total variation in
Y.
ii. When all of these differences are combined by first squaring and
then summing them, the result is the total sum of squares (TSS), an
important measure of variation in Y. The formula is:

TSS = Σ (Yi − Ȳ)², with the sum taken over i = 1, ..., N

iii. (Ŷi − Ȳ) is the difference between the predicted Y and the
grand mean. It, in a sense, represents how much we know about Y
given our knowledge of X. In other words, if we knew nothing we
would "predict" that a typical unit would have a score equal to the
grand mean. But with our model of X's impact on Y, we know
more than this; in fact we know that as X increases one unit (one
dollar in this example) the predicted value of Y will increase .03 units. Thus, a
portion of the total variation in Y is "explained" by our knowledge
of X, which is summarized mathematically in the equation:

Ŷi = 10.1 + .03Xi.

iv. Finally, ε̂i represents error in prediction. It is, stated in other
words, the difference between what we think Y should be and what
it actually is. This error together with all of the others represents
the portion of variation in Y that is not accounted for by X.
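
The partition of each unit's deviation can be checked numerically. The sketch below uses five made-up (X, Y) pairs, which are assumptions for illustration only, together with the example line 10.1 + .03X. The residual is computed as predicted minus observed, following the definition used in these notes, so each total deviation equals the explained deviation minus the residual.

```python
xs = [0, 100, 200, 250, 300]         # hypothetical X values
ys = [20.0, 12.0, 17.0, 18.0, 19.0]  # hypothetical observed Y values

def predict(x, b0=10.1, b1=0.03):
    """Predicted Y from the example equation 10.1 + .03X."""
    return b0 + b1 * x

y_bar = sum(ys) / len(ys)  # grand mean of Y

rows = []
for x, y in zip(xs, ys):
    y_hat = predict(x)
    total = y - y_bar          # (Yi - Ybar): total deviation
    explained = y_hat - y_bar  # (Yhat_i - Ybar): part accounted for by X
    residual = y_hat - y       # residual as defined in these notes
    rows.append((total, explained, residual))
    print(x, y, round(y_hat, 2), round(total, 2),
          round(explained, 2), round(residual, 2))

# Total sum of squares: square each total deviation, then add them up.
tss = sum(total ** 2 for total, _, _ in rows)
print("TSS =", round(tss, 2))
```

The first row reproduces the case discussed above: X = 0, an observed value of 20, a predicted value of 10.1, and a residual of 10.1 − 20.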
5. The Least Squares Principle:
i. We pick as estimators of β0 and β1 those particular values that
minimize the sum of squared residuals for the batch of N observations
under study. That is, thinking of β0 and β1 as population
parameters, we choose estimates of them in such a way that the
quantity

S² = Σ ε̂i² = Σ (Ŷi − Yi)², with each sum taken over i = 1, ..., N,

is a minimum.
6. Keep S² in mind because it comes up again and again.

7. The principle of least squares leads to computing formulas used to obtain
estimates of the parameters from a set of data. These formulas are described
by Agresti and Finlay and will be discussed later. For now we will rely on
MINITAB to compute the numerical estimates.
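
For readers who want to see the principle at work before the computing formulas are covered, here is a minimal Python sketch. It is not the MINITAB procedure and the formulas are not taken from these notes; it uses the standard closed-form least squares estimates for a two-variable regression (slope equal to the sum of cross-products of deviations divided by the sum of squared deviations in X). The data are made up for illustration, and the function names are hypothetical.

```python
def sum_sq_resid(b0, b1, xs, ys):
    """S^2: the sum of squared residuals (Yhat_i - Y_i)^2 for a candidate line."""
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Standard closed-form least squares estimates for a two-variable regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [0, 100, 200, 250, 300]         # hypothetical X values
ys = [20.0, 12.0, 17.0, 18.0, 19.0]  # hypothetical observed Y values

b0_hat, b1_hat = least_squares(xs, ys)
s2_min = sum_sq_resid(b0_hat, b1_hat, xs, ys)

# Any other line, including the 10.1 + .03X line used earlier for
# illustration, gives a sum of squared residuals at least as large.
candidates = [(10.1, 0.03), (b0_hat + 1, b1_hat), (b0_hat, b1_hat + 0.01)]
for b0, b1 in candidates:
    print(round(b0, 3), round(b1, 4), round(sum_sq_resid(b0, b1, xs, ys), 2))
print("least squares:", round(b0_hat, 3), round(b1_hat, 4), round(s2_min, 2))
```

Comparing S² across candidate lines is a direct, if inefficient, way to see what "minimize the sum of squared residuals" means.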

VII. NEXT TIME:

A. Examples of MINITAB regression
B. Measures of fit
C. Tests of significance
