Handout 41100 PolynomialRegression
Max H. Farrell
BUS 41100
August 28, 2015
In class we talked about polynomial regression, and the point was made that we always keep “lower order” terms whenever we add higher-order polynomial terms to the model. This handout explains the intuition and interpretation behind this rule, with examples. The bottom line: by assuming a certain coefficient is exactly equal to zero, you are making a strong assumption about how Y responds to X, one that you have no business making.
Contents
1 Building Intuition with the Intercept
1.1 Example: House Prices
1.2 Example: Wage Data
1 Building Intuition with the Intercept

Let’s return to simple linear regression and consider leaving out the intercept. This will give us good intuition for what will happen when we run polynomial regression but exclude lower order terms.
The general model is Y = β0 + β1 X + ε, so that E[Y |X] = β0 + β1 X. Remember that β0, β1, and σ² are unknown. In
particular, we don’t know if β0 = 0 or not. But suppose we force least squares to set b0 = 0. What
have we done?
If b0 = 0, the intercept of the line is at zero: that is, when X = 0, we predict Ŷ = 0. So that
means that we force our prediction to be exactly zero at the value X = 0, no matter what. This is
a geometric assumption: the graph of the line must pass through (0,0).
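If you want to see this restriction in R, here is a minimal sketch with simulated data (the numbers and object names are made up for illustration, not taken from any of the course datasets). Writing the model as y ~ x - 1 inside lm() is exactly the assumption b0 = 0:

    # Minimal sketch with simulated data (not a course dataset)
    set.seed(41100)
    x <- runif(100, min = 1, max = 4)
    y <- 10 + 5 * x + rnorm(100, sd = 2)   # the true intercept is 10, not 0

    fit.general <- lm(y ~ x)       # estimates both b0 and b1
    fit.origin  <- lm(y ~ x - 1)   # forces b0 = 0 (equivalently, y ~ 0 + x)

    coef(fit.general)   # b0 near 10, b1 near 5
    coef(fit.origin)    # only b1, pushed above 5 to compensate for the missing intercept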
Remember how we usually don’t put much stock in the intercept? Now we’re giving it a strong
interpretation, and requiring a lot of knowledge about it! Do you have a very good reason to believe
that is true? Usually not. Moreover, in general there is nothing special about the point Y = 0 or
X = 0. These will change if we measure the variables differently.
Remember that our goal is to extract the general trend in how Y changes with X. We used to
interpret b1 as the change in Y as X increases. If the intercept is zero, we don’t have this anymore!
We can only say that b1 measures the change in Y as X increases, assuming the intercept is fixed at
zero. That is because setting b0 = 0 only gives the “right” answer for b1 if the true β0 = 0 too.
That is, you must assume that Y = β1 X + ε. This is a very strong assumption! Let’s consider
some examples.
1.1 Example: House Prices

Return to the house price data from Lecture 2. We have data on house prices (in thousands of
dollars) and size (in thousands of square feet). What is our goal for this analysis? We want to find
out how price increases with size. So, I want to be able to answer questions like: On average, how
much more expensive is a 3,000 sq. ft. house than a 2,000 sq. ft. house? This is exactly how we
interpret b1 from the linear regression
price_i = b0 + b1 size_i + e_i.
What if we force b0 = 0? Now we are assuming β0 = 0, i.e. that a zero square foot house costs
nothing. Is that reasonable? Let’s look at the data:
[Figure: Price ($ in 1000s) versus size, with two fitted lines: the general line and the line forcing the intercept to 0.]
The most important thing is that the slope estimate went up! This makes it look like square footage is much more expensive: the “no intercept” model says that every extra 1,000 square feet costs $53k, compared to only $35k from the general regression. Which model is better? (Notice the R² is higher for the no-intercept fit, but who cares?!? Remember R² doesn’t mean anything.)
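One more reason not to compare those two R² values: when the intercept is dropped, R computes R² against an uncentered total sum of squares (sums of y² rather than deviations from ȳ), so the two numbers are not even on the same scale. A quick check, again with made-up data:

    # R^2 is computed differently once the intercept is dropped
    set.seed(1)
    x <- runif(50, 1, 4)
    y <- 100 + 35 * x + rnorm(50, sd = 20)
    summary(lm(y ~ x))$r.squared       # usual R^2, relative to mean(y)
    summary(lm(y ~ x - 1))$r.squared   # uncentered version; typically much closer to 1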
What happened? By forcing the intercept to be zero, we had to crank the line way up, artificially.
Note that I had to manually expand the range of the graph, so we could see both intercepts.
Here’s the main question: which one of these would you say better captures the general trend in
the response of price to size?
Now suppose I told you that all house sales are subject to a flat tax of $5,000. Then, only (price
- 5000) is under control of the buyer and seller, so we shift all the prices down by 5000. This
shouldn’t affect the slope of the line at all. But if you are still forcing the intercept to be zero, the
slope will have to change!¹
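Here is that thought experiment in R, as a sketch with simulated house-price-like data (made-up numbers, in the same units as the Lecture 2 data: thousands of dollars and thousands of square feet):

    # Shifting every price down by a constant (the $5,000 flat tax)
    set.seed(2)
    size  <- runif(80, 0.8, 4)                     # 1000s of sq. ft.
    price <- 40 + 35 * size + rnorm(80, sd = 15)   # $ in 1000s
    price.net <- price - 5                         # prices net of a $5,000 tax

    coef(lm(price ~ size))["size"]           # slope with an intercept ...
    coef(lm(price.net ~ size))["size"]       # ... unchanged by the shift
    coef(lm(price ~ size - 1))["size"]       # no-intercept slope ...
    coef(lm(price.net ~ size - 1))["size"]   # ... changes when all prices shift down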
1.2 Example: Wage Data

Let’s return to the wage data first used at the end of Lecture 1. Our goal is to find how the
number of hours worked (hours) responds to hourly pay rate (pay.rate). What’s different about
this example? Here, it makes sense to assume that β0 = 0, because that means that if you get paid
nothing, you work zero hours. Let’s see what the data turns up:
The interpretation of the intercept from the general regression makes no sense: you work almost 2000 hours if you are paid nothing! But the interpretation of the slope is fine: for each extra dollar per hour, you work an extra 80 hours per year. How about the no-intercept fit? Assuming you work 0 hours if you aren’t paid, you will work an extra 750 hours for each additional dollar per hour. So someone working for $1/hour (in 1966, remember) will work 750 hours, someone making $2/hour will work 1500 hours, and so on. Which is more reasonable to you? Look at the picture:
¹ Since b0 = Ȳ − b1 X̄, setting b0 = 0 lets us immediately solve for b1 = Ȳ/X̄. So our forecast for Ŷ at the average X, i.e. at X̄, is Ŷ(X = X̄) = b0 + b1 X̄ = 0 + (Ȳ/X̄)X̄ = Ȳ. So we fit a line that goes through the point (0, 0) and the point (X̄, Ȳ)! If we shift all the prices down by 5,000, the line will rotate toward being flatter, because the new b1 is the old b1 minus 5,000/X̄.
[Figure: Hours worked versus pay rate, with two fitted lines: the general line and the line forcing the intercept to 0.]
What’s going on here? Again, the restriction of b0 = 0 is forcing the line to slope up too fast.
Why? Look how far out of the sample you are “predicting” by assuming that β0 = 0 (no pay = no
work). The data we have are for working people (everyone has positive hours and a positive wage),
so these data don’t tell us anything about people who don’t work or aren’t paid. And yet we are
making a very strong assumption. What about social security or disability (no work, but positive
pay)? What about working odd jobs (no formal hours or pay)?
And again, there’s nothing special about Y = 0 or X = 0. It might make sense to measure pay
rate as hourly wage above minimum wage, or measure hours per year relative to a standard work
week. Both of these would change the “zero” point.
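The same check works in R: re-centering pay.rate moves the “zero” point, which leaves the general regression’s slope alone but changes the no-intercept fit. A sketch with made-up data (the $1 subtracted below is a hypothetical minimum wage, just for illustration):

    # Measuring pay relative to a different "zero" point
    set.seed(3)
    pay.rate <- runif(60, 1, 3)
    hours    <- 1900 + 80 * pay.rate + rnorm(60, sd = 150)
    pay.above.min <- pay.rate - 1   # hypothetical re-centering

    coef(lm(hours ~ pay.rate))["pay.rate"]                 # slope ...
    coef(lm(hours ~ pay.above.min))["pay.above.min"]       # ... identical after re-centering
    coef(lm(hours ~ pay.rate - 1))["pay.rate"]             # no-intercept slope ...
    coef(lm(hours ~ pay.above.min - 1))["pay.above.min"]   # ... very different after re-centering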
Intuitively, the same problem, a geometric one, will crop up for polynomial regression.
For now, let’s stick to squared terms. We are considering fitting
y_i = b0 + b1 x_i + b2 x_i² + e_i
and setting b1 = 0, that is, leaving out the linear term. Just like forcing the intercept to be zero
was a restriction on the graph of a line, this will also be a geometric/graphical restriction. Recall
the equation of a parabola:
y = a(x + v_x)² + v_y.
The point (x = −v_x, y = v_y) is the vertex (the bottom or the top of the parabola). If a > 0 the parabola opens upward (like the letter “U”); if a < 0 the parabola opens downward. Multiplying this out we get
y = (v_y + a v_x²) + (2a v_x) x + a x²,
so v_y + a v_x² plays the role of β0, 2a v_x plays the role of β1, and a plays the role of β2.
So if we force least squares to fit b1 = 0 then we are assuming the vertex of the parabola is at x = 0
(the bottom if a > 0, the top if a < 0). Suppose a > 0 (that is, β2 > 0) so the parabola opens
upward. Then the minimum response of Y to X occurs at X = 0, by construction. This is again a
very strong assumption! This is something you are forcing about the shape of how Y responds to
X: when X = 0, Y responds very little.
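In R, leaving out the linear term means fitting lm(y ~ I(x^2)) instead of lm(y ~ x + I(x^2)). Here is a sketch with simulated data (a made-up curve whose true minimum is at x = 3) showing what the restriction forces:

    # True curve has its minimum at x = 3, not at x = 0
    set.seed(4)
    x <- runif(100, 0, 6)
    y <- 2 * (x - 3)^2 + 5 + rnorm(100)

    full  <- lm(y ~ x + I(x^2))   # estimates b0, b1, b2
    nolin <- lm(y ~ I(x^2))       # forces b1 = 0

    # The vertex of a fitted parabola b0 + b1*x + b2*x^2 is at x = -b1 / (2*b2)
    -coef(full)["x"] / (2 * coef(full)["I(x^2)"])   # close to 3, the true minimum
    # For the restricted fit, b1 = 0 by construction, so its vertex sits at x = 0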
Again, there’s nothing special about the point X = 0. Suppose you measure the variable X, but I instead measure X + C for some number C. For example, if X is wage, I measure it as dollars/hour above minimum wage. This should not affect how Y responds to X at all; that is, the predicted values shouldn’t change. But if we assume there is no linear term, the response would be
Y = β0 + β2 (X + C)² = (β0 + β2 C²) + (2β2 C) X + β2 X²,
and now the term 2β2 C is playing the role of β1: a linear term shows up after all, even though we assumed it away.
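You can see the same algebra numerically in R (a sketch with made-up data): shifting X by a constant C leaves the full quadratic’s fitted values untouched, but changes the fit that omits the linear term.

    # Shift x by a constant C and compare fitted values
    set.seed(5)
    x <- runif(100, 0, 6)
    y <- 4 + 2 * x^2 + rnorm(100)   # a curve with no linear term in x
    C <- 1.5
    xC <- x + C

    full.x  <- lm(y ~ x + I(x^2))
    full.xC <- lm(y ~ xC + I(xC^2))
    max(abs(fitted(full.x) - fitted(full.xC)))   # essentially zero: same fitted curve

    rest.x  <- lm(y ~ I(x^2))
    rest.xC <- lm(y ~ I(xC^2))
    max(abs(fitted(rest.x) - fitted(rest.xC)))   # not zero: the restriction depends on where "zero" is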
Let’s return to the call center data from week 3. The goal is to predict productivity (measured by
calls per day) using work experience (months of employment). In class we had a quadratic fit:
y_i = b0 + b1 months_i + b2 months_i² + e_i.
There’s really no great reason to leave out the linear term here. Conceptually, what would that
mean in this example? It means that people’s productivity gain is the slowest when they are brand
new. Is that reasonable from an intuitive level? Probably not. And we have no data at months=0,
so it does not make sense to impose that the minimum productivity is there.
We can also leave out the intercept and the linear term, setting b0 = b1 = 0. This implies that
the parabola goes through (0,0), so that employees with zero experience make zero calls. Is that
reasonable? Probably not; even on your first day you could presumably make one call.
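For reference, the three curves in the figure below come from fits of this form. The sketch uses simulated stand-ins for calls and months (not the actual week 3 data), just to show the three model formulas:

    # Sketch only: simulated stand-in for the week 3 call center data
    set.seed(6)
    months <- sample(1:30, 70, replace = TRUE)
    calls  <- 32 - 25 * exp(-0.15 * months) + rnorm(70, sd = 2)   # productivity that levels off

    quad.full  <- lm(calls ~ months + I(months^2))   # general quadratic
    quad.nolin <- lm(calls ~ I(months^2))            # forces b1 = 0
    quad.none  <- lm(calls ~ I(months^2) - 1)        # forces b0 = b1 = 0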
I’m omitting the R output here, but here is the graph.² As you can see, it has all the same issues
as before. The blue and green curves do not fit the data at all: they are doing a terrible job of
extracting the general trend of how Y changes with X.
² It looks like the general curve goes through (0,0) too, but it does not: the intercept is b0 = −0.1404712.
[Figure: calls versus months, with three fitted curves: the general quadratic, the fit forcing b1 = 0, and the fit forcing b0 = b1 = 0.]
The story is the same for higher order polynomials, but more intricate. The graphical/geometric
interpretations of the above two cases are pretty clear. But what does it mean to leave the linear
term out of a cubic fit? To really understand it, you have to go back to the equation for a cubic
curve and figure out exactly what restriction you are imposing. I will not delve into the details.
The message should be clear: you are making a strong restriction on how Y responds to X.
In multiple linear regression we interpret each coefficient conditional on what else is in the model. In the first section, when we interpret b1 from a linear model, the interpretation depends on what we assume about the intercept. If we force b0 = 0, then the slope is interpreted conditional on this choice. In the quadratic model, forcing b1 = 0 implies a very specific mechanism for changes in Y.

In multiple linear regression, say with two variables X1 and X2, we estimate y_i = b0 + b1 x1,i + b2 x2,i + e_i. The interpretation of b2 is conditional on X1 being in the model. So b2 measures the change in Y as X2 increases controlling for X1, holding it fixed at any given value (this is where the term “controlling for” comes from in the popular press).
For example, suppose X1 = education and X2 = experience, and our goal is to predict Y = wages.
If we run the full model, then b1 measures the return to education holding experience constant.
That is, it gives the wage difference between two people who have exactly the same number of years
on the job, but one graduated college and the other only finished high school. If we set b2 = 0 (that is, regress wages on only education), then b1 measures the returns to education without holding experience fixed.
So now it gives the wage difference between two people where one graduated college and the other
only finished high school, no matter how many years on the job they have.
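To close, here is the education/experience story as an R sketch (simulated data, not a real wage dataset; the coefficients are made up). Dropping experience changes what the education coefficient measures, just like dropping lower order polynomial terms changes what the remaining coefficients measure.

    # Simulated wages: education and experience are correlated
    set.seed(7)
    n <- 500
    education  <- rnorm(n, mean = 14, sd = 2)               # years of schooling
    experience <- 20 - 0.6 * education + rnorm(n, sd = 3)   # more schooling, less time on the job
    wages      <- 10 + 2 * education + experience + rnorm(n, sd = 5)

    coef(lm(wages ~ education + experience))["education"]   # near 2: return holding experience fixed
    coef(lm(wages ~ education))["education"]                 # different: absorbs part of the experience effect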