03 Regression Analysis
(LV 0000001540)
Session 3
21 November 2022
Regression Analysis
Rolf Moeckel | Professor of Travel Behavior | Department of Mobility Systems Engineering | Technical University of Munich
Statistical learning
Shown are Sales vs. TV, Radio, and Newspaper ads, with a blue linear-regression line fit separately to each.
Can we predict Sales using these three? Perhaps we can do better using a model
Sales = f (TV, Radio, Newspaper)
Source: Trevor Hastie and Robert Tibshirani (2014) Statistical Learning
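As a minimal sketch, such a model could be fit by ordinary least squares. The file name and column names below are assumptions for illustration, not the actual course data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed file and column names (hypothetical) for the advertising data
ads = pd.read_csv("Advertising.csv")
X = ads[["TV", "Radio", "Newspaper"]]  # advertising budgets
y = ads["Sales"]                       # response we wish to predict

# Fit Sales = b0 + b1*TV + b2*Radio + b3*Newspaper
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)
```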
Notation (1)
Here, Sales is a dependent variable that we wish to predict. We generically refer to the response as $Y$.
The ideal or optimal predictor of $Y$ with regard to mean-squared prediction error, $f(x) = E(Y \mid X = x)$, is the function that minimizes $E\big[(Y - f(X))^2 \mid X = x\big]$ over all functions $f$ at all points $X = x$.
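A small simulation (not part of the original slides) illustrates this optimality: among constant predictions $c$ for $Y$ at a given $x$, the average squared error is smallest at $c = E(Y \mid X = x)$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate draws of Y | X = x with E(Y | X = x) = 2.0 and noise variance 1.0
y = 2.0 + rng.normal(scale=1.0, size=100_000)

# Average squared error of candidate constant predictions c
for c in [1.0, 1.5, 2.0, 2.5]:
    print(c, np.mean((y - c) ** 2))
# The minimum (about 1.0 = Var(eps)) occurs at c = 2.0 = E(Y | X = x)
```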
Squaring the error makes it positive and emphasizes large deviations. For any estimate $\hat{f}$, the expected squared prediction error splits into a reducible and an irreducible part:

$E\big[(Y - \hat{f}(X))^2 \mid X = x\big] = \underbrace{[f(x) - \hat{f}(x)]^2}_{\text{Reducible}} + \underbrace{\mathrm{Var}(\epsilon)}_{\text{Irreducible}}$
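This decomposition can be checked numerically. In the sketch below, the true function, the estimate, and the noise level are all made-up assumptions; the test error of the imperfect estimate splits into its reducible part plus $\mathrm{Var}(\epsilon)$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=200_000)
f = np.sin(2 * x)                          # assumed true regression function
eps = rng.normal(scale=0.5, size=x.size)   # irreducible noise, Var(eps) = 0.25
y = f + eps

fhat = 0.5 * x                             # a deliberately imperfect estimate
mse = np.mean((y - fhat) ** 2)             # total expected squared error
reducible = np.mean((f - fhat) ** 2)       # [f(x) - fhat(x)]^2, averaged over x
print(mse, reducible + 0.25)               # the two agree up to sampling noise
```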
Nearest neighbor methods can be lousy when $p$ is large. Reason: the curse of dimensionality. Nearest neighbors tend to be far away in high dimensions.
• We need to average a reasonable fraction of the $N$ values of $y_i$, e.g. 10%, in order to bring the variance down.
• A 10% neighborhood in high dimensions (i.e., when the number of independent variables $p$ is large) need no longer be local, so we lose the spirit of estimating $E(Y \mid X = x)$ by local averaging.
[Figure: a 10% neighborhood in one dimension ($x_1$) vs. a 10% neighborhood in two dimensions ($x_1$ and $x_2$)]
Source: Trevor Hastie and Robert Tibshirani (2014) Statistical Learning
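The geometry behind this can be made concrete with a one-line computation (an illustration, not from the slides): for data uniform on $[0,1]^p$, a sub-cube containing 10% of the observations must have edge length $0.1^{1/p}$.

```python
# Edge length of a hypercube that captures 10% of uniformly
# distributed data in p dimensions: 0.1 ** (1 / p)
for p in [1, 2, 5, 10, 100]:
    print(p, round(0.1 ** (1 / p), 2))
# 1 -> 0.10, 2 -> 0.32, 5 -> 0.63, 10 -> 0.79, 100 -> 0.98:
# in high dimensions the "10% neighborhood" spans nearly the whole
# range of every predictor and is no longer local
```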
Linear regression model
A quadratic model
$\hat{f}(X) = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\beta}_2 X^2$
may fit slightly better.
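A minimal sketch of such a quadratic fit, with simulated data (the function and coefficients are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 0.5 * x + 0.2 * x**2 + rng.normal(scale=1.0, size=100)

# Least-squares fit of f_hat(X) = b0 + b1*X + b2*X^2
b2, b1, b0 = np.polyfit(x, y, deg=2)   # polyfit returns the highest degree first
print(b0, b1, b2)                      # approximately 1.0, 0.5, 0.2
```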
The mean squared error computed on the training data may be biased towards models that are overfit. Instead, we should – when possible – compute the error using fresh test data $Te$:

$MSE_{Te} = \mathrm{Ave}_{i \in Te}\,\big[y_i - \hat{f}(x_i)\big]^2$
To create test data, the records are randomly split: for example, 80% of all records are used for model estimation, and the remaining records, which were not used for estimation, are used for model testing.
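A minimal sketch of this 80/20 split and of computing $MSE_{Te}$, here with scikit-learn on simulated data (names and sizes are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(500, 1))
y = 2.0 + 0.7 * X[:, 0] + rng.normal(scale=1.0, size=500)

# Randomly sample 80% of the records for estimation, hold out 20% for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)              # estimate on training data
mse_te = mean_squared_error(y_te, model.predict(X_te))  # MSE_Te on fresh test data
print(mse_te)
```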
Danger of overfitting a model (Example 1)
[Figure: test MSE $MSE_{Te}$ plotted against model flexibility, from simpler to more complex models]
Black curve is “truth.” Orange, blue and green curves/squares correspond to fits of different flexibility.
Danger of overfitting a model (Example 2)
[Figure: training MSE $MSE_{Tr}$ and test MSE $MSE_{Te}$ plotted against model flexibility, from simpler to more complex models]
Here, the truth is smoother (or simpler). The linear model does really well. Simple models are generally preferred over complex models.
Source: Trevor Hastie and Robert Tibshirani (2014) Statistical Learning
Danger of overfitting a model (Example 3)
[Figure: training and test MSE plotted against model flexibility, from simpler to more complex models]
Here, the truth is wiggly and the noise is low, so the more flexible fits do the best job.
Source: Trevor Hastie and Robert Tibshirani (2014) Statistical Learning
Bias-variance trade-off
Suppose we have fit a model $\hat{f}(x)$ to some training data $Tr$, and let $(x_0, y_0)$ be a test observation drawn from the population. If the true model is $Y = f(X) + \epsilon$ with $f(x) = E(Y \mid X = x)$, then

$E\big[(y_0 - \hat{f}(x_0))^2\big] = \mathrm{Var}\big(\hat{f}(x_0)\big) + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2 + \mathrm{Var}(\epsilon)$
Variance refers to the amount by which $\hat{f}$ would change if we estimated it using a different training data set. Bias refers to the error that is introduced by approximating a real-life problem by a much simpler model.

[Figure: observations, the true relationship, and the estimated curve, illustrating bias, variance, and the noise $\epsilon$]
There is a bias-variance trade-off. Typically, as $\hat{f}$ becomes more complex, its variance increases and its bias decreases.
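The trade-off can be checked by simulation (a sketch with assumed data, not from the slides): draw many training sets, fit polynomials of different flexibility, and estimate the variance and squared bias of $\hat{f}(x_0)$ at a fixed test point.

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sin(2 * x)        # assumed true regression function
x0, n, reps = 1.0, 50, 2000        # test point, training size, repetitions

for deg in [1, 3, 9]:              # polynomial degree = model flexibility
    preds = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(-2, 2, size=n)
        y = f(x) + rng.normal(scale=0.3, size=n)   # Var(eps) = 0.09
        coefs = np.polyfit(x, y, deg)              # fit on this training set
        preds[r] = np.polyval(coefs, x0)           # prediction at x0
    var = preds.var()                              # variance of f_hat(x0)
    bias2 = (preds.mean() - f(x0)) ** 2            # squared bias at x0
    # Expected test error at x0 = variance + bias^2 + Var(eps)
    print(deg, var, bias2, var + bias2 + 0.09)
```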
Bias-variance trade-offs for the three examples
[Figure: bias-variance trade-off for Examples 1, 2, and 3: squared bias, variance, the irreducible error $\mathrm{Var}(\epsilon)$, and test MSE plotted against model flexibility, from simpler to more complex models]