Lecture 1: Multiple Regression Models
Hanoi, 2023
5/31/2023 2
Outline
• Course introduction
• From economic to econometric model
• Estimating the parameters
• Hamburger chain data (andy.dta)
• Sampling properties
• Model specification
• Family income equation (edu_inc.dta)
• Poor data, collinearity, and insignificance
• Cars data (cars.dat)
• Required reading: Chap. 5&6 (Hill et al., 2011)
Course introduction
• Course objectives
• Changes in this semester
• Students’ assessment
• Course syllabus
• Students’ expectations?
• What do you expect to learn from this course?
• What could the instructor do to improve students’ learning
outcomes?
Economic model
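• The relationship between sales, price, and advertising
expenditure:
\[ \mathit{Sales} = \beta_1 + \beta_2\,\mathit{Price} + \beta_3\,\mathit{Advert} \tag{1} \]
where $\beta_1, \beta_2, \beta_3$ are the unknown parameters.
• A quantitative inference:
Marginal analysis: the change in Sales when Advert
increases by one unit, with Price held constant:
\[ \beta_3 = \left.\frac{\Delta \mathit{Sales}}{\Delta \mathit{Advert}}\right|_{\mathit{Price}\ \text{held constant}} = \frac{\partial \mathit{Sales}}{\partial \mathit{Advert}} \tag{2} \]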
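The marginal interpretation of 𝛽3 can be checked numerically. The sketch below is only an illustration: the parameter values and the Price and Advert levels are hypothetical, not estimates from the hamburger chain data. It raises Advert by one unit with Price fixed and recovers 𝛽3 as the resulting change in Sales.

```python
def sales(price, advert, beta1, beta2, beta3):
    """Economic model (1): Sales = beta1 + beta2*Price + beta3*Advert."""
    return beta1 + beta2 * price + beta3 * advert


# Hypothetical parameter values, chosen only for illustration.
beta1, beta2, beta3 = 100.0, -8.0, 2.0

price = 5.0                          # Price is held constant
advert_low, advert_high = 1.0, 2.0   # Advert increases by one unit

delta_sales = (sales(price, advert_high, beta1, beta2, beta3)
               - sales(price, advert_low, beta1, beta2, beta3))

# Change in Sales per unit change in Advert, as in equation (2).
marginal_effect = delta_sales / (advert_high - advert_low)
print(marginal_effect)
```

Because the model is linear, the same value results for any fixed Price and any starting level of Advert.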
Sampling properties:
Assumptions of the MRM
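• MR1: $y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + e_i,\quad i = 1, \dots, N$
• MR2: $E(y_i) = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} \Leftrightarrow E(e_i) = 0$
• MR3: $\mathrm{var}(y_i) = \mathrm{var}(e_i) = \sigma^2$
• MR4: $\mathrm{cov}(y_i, y_j) = \mathrm{cov}(e_i, e_j) = 0,\quad i \neq j$
• MR5: The values of each $x_{ik}$ are not random and are not
exact linear functions of the other explanatory variables.
• MR6: $y_i \sim N[(\beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK}), \sigma^2] \Leftrightarrow e_i \sim N(0, \sigma^2)$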
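To make the assumptions concrete, the following sketch generates artificial data that satisfy MR1 to MR6 by construction, with K = 3. Everything in it is an assumption made for illustration: the parameter values, the sample size, and the way the fixed regressor values are built are not taken from the lecture data.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values (not from the lecture).
beta1, beta2, beta3 = 1.0, 0.5, -0.3
sigma = 2.0
N = 10_000

# MR5: regressor values are fixed (non-random), and x3 is not an exact
# linear function of x2.
x2 = [float(i % 10) for i in range(N)]
x3 = [float((i // 10) % 10) for i in range(N)]

# MR2, MR3, MR4, MR6: independent errors with e_i ~ N(0, sigma^2).
e = [random.gauss(0.0, sigma) for _ in range(N)]

# MR1: the linear model generates y.
y = [beta1 + beta2 * a + beta3 * b + err for a, b, err in zip(x2, x3, e)]

# Sample checks: the error mean should be near 0 and the variance near sigma^2.
mean_e = sum(e) / N
var_e = sum((err - mean_e) ** 2 for err in e) / (N - 1)
print(mean_e, var_e)
```

In real data the errors are unobserved, so these checks are possible here only because the data were simulated.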
Model specification:
Omitted variables
• Essential features of model choice:
• Choice of functional forms
• Choice of explanatory variables to be included in the model
• Whether assumptions MR1–MR6 hold
• Omitted variables:
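• The econometric model of family income ($y$) regressed on
husband’s ($x_2$) and wife’s ($x_3$) years of education:
\[ y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + e \]
• The omitted-variable bias of $b_2^*$, the estimator of $\beta_2$ when
wife’s years of education are omitted:
\[ \mathrm{bias}(b_2^*) = E(b_2^*) - \beta_2 = \beta_3 \frac{\mathrm{cov}(x_2, x_3)}{\mathrm{var}(x_2)} \tag{9} \]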
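Equation (9) has an exact sample counterpart: the slope from the short regression (wife’s education omitted) equals the slope from the long regression plus the long-regression coefficient on the omitted variable times the sample cov(x2, x3)/var(x2). The sketch below checks this on a small made-up data set. The numbers are hypothetical and are not taken from the family income data; the names `hedu`, `wedu`, and `faminc` are used only to mirror the variables in the lecture.

```python
def mean(v):
    return sum(v) / len(v)


def s(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


# Hypothetical data: years of education and family income (in $1000).
hedu = [12, 16, 10, 14, 12, 18, 8, 16]
wedu = [12, 14, 11, 16, 10, 16, 9, 13]
faminc = [60, 95, 48, 90, 55, 110, 40, 85]

s22, s33, s23 = s(hedu, hedu), s(wedu, wedu), s(hedu, wedu)
s2y, s3y = s(hedu, faminc), s(wedu, faminc)

# Long regression: faminc on hedu and wedu (least squares in deviation form).
det = s22 * s33 - s23 ** 2
b2 = (s2y * s33 - s3y * s23) / det
b3 = (s3y * s22 - s2y * s23) / det

# Short regression: faminc on hedu only.
b2_star = s2y / s22

# Sample analogue of equation (9); s23/s22 equals cov(x2, x3)/var(x2)
# because the common divisor (N - 1) cancels.
implied_bias = b3 * s23 / s22
print(b2_star - b2, implied_bias)
```

The two printed numbers agree up to rounding error. Their sign depends on the sign of the coefficient on the omitted variable and on the sign of the covariance between the two education variables.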
Correlation matrix:
Family income data
Stata practice:
Family income data
• Data source: edu_inc.dat
• To do:
• Regressing family income (Faminc) on both husband’s
and wife’s years of education (Hedu and Wedu).
• Omitting wife’s years of education in the above
specification.
• Determining whether the resulting estimate is biased upward or downward.
• Adding the number of young children (Kl6) as another
regressor.
Model specification:
Irrelevant variables
• Adding two artificially generated variables X5 and
X6:
Collinearity
• Consequences:
• The standard errors are large, leading to insignificant
estimates of the coefficients/parameters.
• The estimates are sensitive to the addition or deletion of a few
observations or variables.
• An example:
• Data source: cars.dat
• To do: (i) regress fuel economy (miles per gallon,
MPG) on the number of cylinders; (ii) add engine
displacement (ENG) and vehicle weight (WGT) as regressors.
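Why collinearity inflates standard errors can be seen in the model with two explanatory variables, where var(b2) = σ² / [(1 − r23²) Σ(x_i2 − x̄2)²] and r23 is the sample correlation between the two regressors. As r23 approaches ±1, the factor 1/(1 − r23²) grows without bound. The sketch below computes this factor for made-up values of engine displacement and weight. The numbers are hypothetical and are not taken from cars.dat.

```python
import math


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical values: engine displacement and vehicle weight.
eng = [100, 150, 200, 250, 300, 350]
wgt = [2000, 2300, 2900, 3200, 3800, 4100]

r23 = pearson_r(eng, wgt)

# Factor by which var(b2) is scaled up relative to the case r23 = 0.
inflation = 1.0 / (1.0 - r23 ** 2)
print(r23, inflation)
```

A large factor means the data contain little independent variation in each regressor, which is why individual coefficients can be insignificant even when the variables matter jointly.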
Collinearity:
Estimated models of car data
• On the number of cylinders:
Next week
• Lecture 2: Using indicator variables
• Indicator and qualitative factors.
• Application.
• Log-linear and log-log model.
• Treatment effects.
• Required reading: Chap. 4&7 (Hill et al., 2011).