1. Linear regression Model - Applied_Part 1&2
3. Inference:
a. Hypothesis Testing
b. Size, power and p-values
c. Analysis
4. More analysis:
a. Data Problems: Multicollinearity, Outliers
b. Testing Functional Form
c. Selecting Regressors
1. LINEAR REGRESSION MODEL
APPLIED - PART 1 & 2
References:
Wooldridge: Chapter 3 (sections 1 and 2), Chapter 7
Verbeek: Chapter 2 (section 4), Chapter 3 (section 1)
1. The linear regression model
Assume a relationship between y (dependent or explained variable) and a set of
variables:
x1 ≡ 1 (a constant), x2, x3, …, xK,
valid for each individual in the population, such that the relationship is linear in
parameters (not necessarily in variables).
We write
𝑦 = 𝛽1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 + ⋯ + 𝛽𝐾 𝑥𝐾 + 𝜀
Reminder: Estimators vs. Estimates
Suppose we have n random variables Zi (i = 1, 2, …, n), distributed according to some pdf.
You have a sample and are interested in estimating some parameter 𝜃 of the distribution
Example: data on income for a sample of Irish workers
Estimator: a rule, a function of the random variables Zi, that gives you a sample value for the parameter of interest:
W = h(Z1, Z2, …, Zn)
• Ex: sample mean estimator: $\bar{Z} = \frac{Z_1 + Z_2 + \cdots + Z_n}{n}$
• The estimator is a random variable (new sample = new sample value for W) with its own probability distribution
• Sampling distribution: likelihood of the various outcomes of W across different samples
Estimate: the sample value of θ obtained for a specific sample drawn {z1, z2, …, zn}
• Ex: sample mean estimate: $\bar{z} = \frac{z_1 + z_2 + \cdots + z_n}{n}$
• Each estimate has a certain probability of occurring, given the pdf of the estimator
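The distinction can be illustrated with a short simulation (a sketch with made-up data and a hypothetical income-like distribution, not from the lecture): the same estimator, applied to different samples, yields different estimates.

```r
# Sketch: the sample mean as estimator (rule) vs. estimate (realized value)
set.seed(123)

draw_sample <- function(n) rnorm(n, mean = 100, sd = 15)  # hypothetical pdf

# Same estimator, two samples -> two different estimates
z1 <- draw_sample(500)
z2 <- draw_sample(500)
mean(z1)
mean(z2)

# Sampling distribution: the estimator's values across many repeated samples
means <- replicate(1000, mean(draw_sample(500)))
mean(means)  # centered near the population mean of 100
sd(means)    # close to 15 / sqrt(500)
```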
$$\sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} \left( y_i - \beta_1 - \beta_2 x_{2i} - \beta_3 x_{3i} - \cdots - \beta_K x_{Ki} \right)^2$$
Why?
$$\min_{\beta_1, \dots, \beta_K} \sum_{i=1}^{N} \left( y_i - \beta_1 - \beta_2 x_{2i} - \beta_3 x_{3i} - \cdots - \beta_K x_{Ki} \right)^2$$
First-order conditions (note the hats!):
$$-2 \sum_{i=1}^{N} \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_{2i} - \hat{\beta}_3 x_{3i} - \cdots - \hat{\beta}_K x_{Ki} \right) = 0$$
$$-2 \sum_{i=1}^{N} \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_{2i} - \hat{\beta}_3 x_{3i} - \cdots - \hat{\beta}_K x_{Ki} \right) x_{2i} = 0$$
…
➔ System of K equations in K unknowns $(\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \dots, \hat{\beta}_K)$ — our parameter estimates
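In practice the K-equation system is solved numerically. A minimal sketch in R, on simulated data (variable names are illustrative): in matrix form the first-order conditions are $X'X\hat{\beta} = X'y$.

```r
# Sketch: solving the K normal equations (X'X b = X'y) on simulated data
set.seed(1)
n  <- 200
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x2 - 0.5 * x3 + rnorm(n)   # true parameters: 1, 2, -0.5

X <- cbind(1, x2, x3)                    # first column = the constant
b_manual <- solve(t(X) %*% X, t(X) %*% y)

# lm() solves the same system (internally via a QR decomposition)
b_lm <- coef(lm(y ~ x2 + x3))
cbind(b_manual, b_lm)                    # identical up to numerical precision
```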
Unanswered questions:
And one answer to the question in your mind: can we have an example?
I cannot give you a numerical example or practice exercise where you are asked to find the estimates with paper, pencil, and a calculator…
a. Predicted values and residuals (and a nice picture)
Predicted values:
$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \cdots + \hat{\beta}_K x_{Ki}$$
Residuals:
$$\hat{\varepsilon}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_{2i} - \hat{\beta}_3 x_{3i} - \cdots - \hat{\beta}_K x_{Ki}$$
(sometimes written $e_i$)
Example: Wage equation (Verbeek Ch 3, section 6)
𝑦𝑖 = 𝑤𝑎𝑔𝑒
𝑥𝑖 = 𝑎𝑔𝑒, 𝑒𝑑𝑢𝑐𝑎𝑡𝑖𝑜𝑛, 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒, 𝑔𝑒𝑛𝑑𝑒𝑟, 𝑒𝑡𝑐.
Data on 1472 individuals, randomly sampled from the working population in Belgium in
1994.
• lm() function to run linear models in R
• stargazer(reg1, type = "text") to display the results
Calculate predicted values and residuals:
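In R the predicted values and residuals come straight from the fitted lm object; a sketch on simulated data (the Belgian wage file itself is not reproduced here, so the variable names are illustrative):

```r
# Sketch with simulated wage-like data
set.seed(42)
n     <- 100
educ  <- sample(1:5, n, replace = TRUE)
exper <- runif(n, 0, 40)
wage  <- 5 + 1.5 * educ + 0.2 * exper + rnorm(n)

reg   <- lm(wage ~ educ + exper)
y_hat <- fitted(reg)       # predicted values
e_hat <- residuals(reg)    # residuals

head(cbind(wage, y_hat, e_hat))
# Each observation decomposes exactly as y_i = y^_i + e^_i,
# and with an intercept the residuals sum to zero (first FOC)
c(max(abs(wage - (y_hat + e_hat))), sum(e_hat))
```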
b. Interpreting coefficients and estimates
Model: $y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_K x_K + \varepsilon$
$\beta_1$: value of $y$ predicted by the model when all x are equal to zero → intercept of the population line
$\hat{\beta}_1$: value of $y_i$ when all x are equal to zero, as predicted by the regression line → intercept of the regression line
Consider the model and take the partial derivative with respect to one of the
regressors (say 𝑥2 ) :
$$\frac{\partial y}{\partial x_2} = \beta_2$$
$\beta_2$: marginal effect of $x_2$ on y, assuming everything else remains constant (all the other x and $\varepsilon$), as predicted by the model → slope of the linear model
If $\Delta x_3 = \cdots = \Delta x_K = \Delta\varepsilon = 0$, then $\Delta y = \beta_2 \Delta x_2$ → $\beta_2 = \dfrac{\Delta y}{\Delta x_2}$
i.e. 𝛽2 is the change in 𝑦 for a unit change in 𝑥2 as predicted by the model when
nothing else changes (holding everything else constant)
Similarly,
$$\hat{\beta}_2 = \frac{\partial \hat{y}}{\partial x_2} \quad \text{or} \quad \hat{\beta}_2 = \frac{\Delta \hat{y}}{\Delta x_2}$$
$\hat{\beta}_2$ is the change in y predicted by our estimated regression line for a unit change in the regressor $x_2$, everything else held constant → slope of the regression line.
Intriguing questions:
Model: $y_i = \beta_1 + \beta_2 educ_i + \beta_3 exper_i + \varepsilon_i$
[Regression output omitted; dependent variable: wage]
1. Squared variables
Suppose the model includes $\beta_2 x_{2i} + \beta_3 x_{2i}^2$. Then
$$\frac{\partial y_i}{\partial x_{2i}} = \beta_2 + 2\beta_3 x_{2i}$$
[Figure: quadratic relationship between $x_{2i}$ and $y_i$, for the case $\beta_3 > 0$]
𝑦 = 𝛽1 + 𝛽2 𝑒𝑑𝑢𝑐 + 𝛽3 𝑒𝑥𝑝𝑒𝑟 + 𝛽4 𝑒𝑥𝑝𝑒𝑟 2 + 𝜀
# alternatively
reg3 <- lm(wage ~ educ + exper + I(exper^2), data = bwages1)
$y = \beta_1 + \beta_2 educ + \beta_3 exper + \beta_4 exper^2 + \varepsilon$
Estimated marginal effect of experience:
$$\frac{\partial \hat{y}_i}{\partial exper_i} = 0.3688409 + 2(-0.0044353)\, exper_i$$
[Regression output (dependent variable: wage): educ 1.933*** (0.081); Constant −0.057 (0.423); Observations 1,472]
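The marginal effect implied by the reported estimates can be evaluated directly; a sketch (only the two experience coefficients are used):

```r
# Marginal effect of experience from the quadratic specification
b_exper  <-  0.3688409
b_exper2 <- -0.0044353

marg_eff <- function(exper) b_exper + 2 * b_exper2 * exper
marg_eff(c(0, 10, 20, 30))    # the effect of an extra year declines with experience

# Experience level at which the predicted effect of an extra year is zero
-b_exper / (2 * b_exper2)     # about 41.6 years
```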
2. Interaction terms
What is an interaction?
𝑦 = 𝛽1 + 𝛽2 𝑥2 + 𝛽3 𝑥3 + 𝛽4 𝑥2 𝑥3 + ⋯ + 𝜀
Ex: $educ \times exper$: you obtain the data for this new variable by multiplying the two variables.
Interactions allow the effect of one explanatory variable to depend upon another.
$$\frac{\partial y}{\partial x_2} = \beta_2 + \beta_4 x_3 \qquad \text{and similarly} \qquad \frac{\partial y}{\partial x_3} = \beta_3 + \beta_4 x_2$$
Example: model includes the interaction between education and experience
$$\frac{\partial y_i}{\partial exper_i} = \beta_3 + \beta_4\, educ_i$$
Hence, the marginal effect of an extra year of experience depends upon the education level attained. In particular, if $\beta_4 > 0$ the returns to experience increase with education.
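In R an interaction can be added either by constructing the product by hand or with the formula operators (a sketch on simulated data; `educ * exper` expands to `educ + exper + educ:exper`):

```r
# Sketch: two equivalent ways to include an interaction term
set.seed(7)
n     <- 300
educ  <- sample(1:5, n, replace = TRUE)
exper <- runif(n, 0, 40)
wage  <- 2 + educ + 0.1 * exper + 0.05 * educ * exper + rnorm(n)

reg_a <- lm(wage ~ educ + exper + I(educ * exper))  # product built by hand
reg_b <- lm(wage ~ educ * exper)                    # same model via formula syntax

# Marginal effect of experience depends on education: b3 + b4 * educ
b <- coef(reg_b)
b["exper"] + b["educ:exper"] * c(1, 5)  # effect at lowest vs. highest educ
```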
𝑦 = 𝛽1 + 𝛽2 𝑒𝑑𝑢𝑐 + 𝛽3 𝑒𝑥𝑝𝑒𝑟 + 𝛽4 (𝑒𝑑𝑢𝑐 × 𝑒𝑥𝑝𝑒𝑟) + 𝜀
$$\frac{\partial y_i}{\partial exper_i} = \beta_3 + \beta_4\, educ_i$$
[Regression output (dependent variable: wage): educ 0.917*** (0.163); …]
3. Dummy variables
Suppose the model is $y_i = \beta_1 + \gamma D_i + \beta_2 x_i + \varepsilon_i$, where $D_i$ is a dummy (0/1) variable.
→ $\gamma$ is the intercept shift between the two groups (a parallel shift of the linear model — regression line)
[Figure: two parallel lines with slope $\beta_2$.
Group $D_i = 1$: $y_i = (\beta_1 + \gamma) + \beta_2 x_i$, intercept $\beta_1 + \gamma$.
Group $D_i = 0$: $y_i = \beta_1 + \beta_2 x_i$, intercept $\beta_1$.]
Suppose you wish to control for gender.
Define:
$$male_i = \begin{cases} 1 & \text{if male} \\ 0 & \text{otherwise} \end{cases}$$
In the model $y_i = \beta_1 + \beta_2 male_i + \beta_3 educ_i + \varepsilon_i$, $\beta_2$ is the intercept shift between males and females, i.e. the gender wage gap.
[Figure: wage against educ, two parallel lines with slope $\beta_3$.
Males: $y_i = (\beta_1 + \beta_2) + \beta_3 educ_i$, intercept $\beta_1 + \beta_2$.
Females: $y_i = \beta_1 + \beta_3 educ_i$, intercept $\beta_1$.]
The model is $y_i = \beta_1 + \beta_2 male_i + \varepsilon_i$.
[Regression output (dependent variable: wage): male 1.301*** (0.235); Constant 10.262*** (0.183); Observations 1,472]
1.301 is the unadjusted wage gap: the raw difference in the average wage of men and women in the sample.

bwages1 %>%
  group_by(male) %>%
  summarize("wages by sex" = mean(wage))

  male `wages by sex`
1    0           10.3
2    1           11.6

There can be reasons why men are paid more than women (education, experience, etc.) → you need to control for them.
Now the model is $y_i = \beta_1 + \beta_2 male_i + \beta_3 educ_i + \beta_4 exper_i + \varepsilon_i$.
[Regression output excerpts (dependent variable: wage), two specifications:
(1): male 1.346*** (0.193); exper 0.192*** (0.010); …
(2): male 1.560*** (0.373); female 0.214 (0.387); …]
Wage gap = 1.560 − 0.214 = 1.346
$$married_i = \begin{cases} 1 & \text{if married} \\ 0 & \text{otherwise} \end{cases}$$
The model is 𝑦𝑖 = 𝛽1 + 𝛽2 𝑚𝑎𝑙𝑒𝑖 + 𝛽3 𝑚𝑎𝑟𝑟𝑖𝑒𝑑𝑖 + 𝛽4 𝑒𝑑𝑢𝑐𝑖 + 𝛽5 𝑒𝑥𝑝𝑒𝑟𝑖 + 𝜀𝑖
- If $male_i = 0$ and $married_i = 0$ (unmarried females) → intercept equals $\beta_1$ (base category)
- If $male_i = 1$ and $married_i = 0$ (unmarried males) → intercept equals $\beta_1 + \beta_2$
- If $male_i = 0$ and $married_i = 1$ (married females) → intercept equals $\beta_1 + \beta_3$
- If $male_i = 1$ and $married_i = 1$ (married males) → intercept equals $\beta_1 + \beta_2 + \beta_3$
          Unmarried            Married
Female    $\beta_1$            $\beta_1 + \beta_3$
Male      $\beta_1 + \beta_2$  $\beta_1 + \beta_2 + \beta_3$
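The four group intercepts can be recovered from the fitted coefficients; a sketch on simulated data (the parameter values 9.5, 1.3, 0.8 are hypothetical):

```r
# Sketch: recovering the four group intercepts from two additive dummies
set.seed(3)
n       <- 400
male    <- rbinom(n, 1, 0.5)
married <- rbinom(n, 1, 0.5)
wage    <- 9.5 + 1.3 * male + 0.8 * married + rnorm(n)  # hypothetical parameters

reg <- lm(wage ~ male + married)
b   <- coef(reg)

c(unmarried_female = b[["(Intercept)"]],
  unmarried_male   = b[["(Intercept)"]] + b[["male"]],
  married_female   = b[["(Intercept)"]] + b[["married"]],
  married_male     = b[["(Intercept)"]] + b[["male"]] + b[["married"]])
```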
4. Dummy interactions
You can interact dummies with continuous variables and other dummies.
The model is:
𝑦𝑖 = 𝛽1 + 𝛽2 𝑥2𝑖 + 𝛿𝐷𝑖 𝑥2𝑖 + ⋯ + 𝜀𝑖
The interaction 𝐷𝑖 𝑥2𝑖 allows us to estimate a different slope for the two groups
[Figure: two lines from the same intercept $\beta_1$, plotted against $x_2$.
$D_i = 1$: $y_i = \beta_1 + (\beta_2 + \delta) x_{2i}$.
$D_i = 0$: $y_i = \beta_1 + \beta_2 x_{2i}$.]
Example: suppose you suspect the effect of experience on wage depends
on gender.
You can include in the model an interaction between experience and one
of the gender dummies.
The model is:
𝑦𝑖 = 𝛽1 + 𝛽2 𝑒𝑑𝑢𝑐𝑖 + 𝛽3 𝑒𝑥𝑝𝑒𝑟𝑖 + 𝛽4 (𝑒𝑥𝑝𝑒𝑟𝑖 × 𝑚𝑎𝑙𝑒𝑖 ) + 𝜀𝑖
[Regression output excerpts (dependent variable: wage), two specifications]
An extra year of experience increases wage by 0.149 BF for females and by (0.149 + 0.070) = 0.219 BF for males.
[Figure: lines against $x_{2i}$ with both a different intercept and a different slope.
$D_i = 1$: $y_i = (\beta_1 + \gamma) + (\beta_2 + \delta) x_{2i}$, intercept $\beta_1 + \gamma$.
$D_i = 0$: $y_i = \beta_1 + \beta_2 x_{2i}$, intercept $\beta_1$.]
Example: the model is $y_i = \beta_1 + \beta_2 male_i + \beta_3 educ_i + \beta_4 exper_i + \beta_5 (exper_i \times male_i) + \varepsilon_i$
$\beta_2$: wage gap (the wage difference between male and female workers that is unexplained by the variables controlled for)
[Figure: wage against exper.
Males: $y_i = (\beta_1 + \beta_2) + \beta_3 educ_i + (\beta_4 + \beta_5) exper_i$, intercept $\beta_1 + \beta_2$.
Females: $y_i = \beta_1 + \beta_3 educ_i + \beta_4 exper_i$, intercept $\beta_1$.]
[Regression output omitted; dependent variable: wage]
Note how the estimated gender gap shrinks as we enrich the model!
Interacting two dummies
Suppose the model is
𝑦𝑖 = 𝛽1 + 𝛽2 𝑚𝑎𝑙𝑒𝑖 + 𝛽3 𝑚𝑎𝑟𝑟𝑖𝑒𝑑𝑖 + 𝛽4 𝑚𝑎𝑙𝑒𝑖 × 𝑚𝑎𝑟𝑟𝑖𝑒𝑑𝑖 + ⋯ + 𝜀𝑖
          Unmarried            Married
Female    $\beta_1$            $\beta_1 + \beta_3$
Male      $\beta_1 + \beta_2$  $\beta_1 + \beta_2 + \beta_3 + \beta_4$

$$\text{gender wage gap} = \begin{cases} \beta_2 & \text{if unmarried} \\ \beta_2 + \beta_4 & \text{if married} \end{cases}$$
The interaction allows the gender wage gap to differ between married and unmarried individuals.
Similarly for the wage difference between married and unmarried workers (equal to $\beta_3$ for females and to $\beta_3 + \beta_4$ for males).
5. Ordinal variables
For example: 𝑒𝑑𝑢𝑐𝑖 = 𝑒𝑑𝑢𝑐𝑎𝑡𝑖𝑜𝑛 𝑙𝑒𝑣𝑒𝑙 (1, 2, … , 5)
• If you include $\beta_1 + \beta_2 educ_i$, then $\beta_2$ is the marginal effect of gaining an extra education level (from 1 to 2, 2 to 3, etc.) and it is modeled as constant.
• Define instead a full set of dummy variables
$$ed1_i = \begin{cases} 1 & \text{if } educ_i = 1 \\ 0 & \text{otherwise} \end{cases} \quad \dots \quad ed5_i = \begin{cases} 1 & \text{if } educ_i = 5 \\ 0 & \text{otherwise} \end{cases}$$
Now the marginal effect of gaining an extra education level depends on what
education level is gained.
Primary school: 𝑦𝑖 = 𝛽1 +𝛽6 𝑒𝑥𝑝𝑒𝑟𝑖 + 𝜀𝑖
Lower vocational: 𝑦𝑖 = 𝛽1 + 𝛽2 (𝑒𝑑2𝑖 = 1) +𝛽6 𝑒𝑥𝑝𝑒𝑟𝑖 + 𝜀𝑖
Intermediate vocational: 𝑦𝑖 = 𝛽1 + 𝛽3 (𝑒𝑑3𝑖 = 1) +𝛽6 𝑒𝑥𝑝𝑒𝑟𝑖 + 𝜀𝑖
Higher vocational: 𝑦𝑖 = 𝛽1 + 𝛽4 (𝑒𝑑4𝑖 = 1) +𝛽6 𝑒𝑥𝑝𝑒𝑟𝑖 + 𝜀𝑖
University level: 𝑦𝑖 = 𝛽1 + 𝛽5 (𝑒𝑑5𝑖 = 1) +𝛽6 𝑒𝑥𝑝𝑒𝑟𝑖 + 𝜀𝑖
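In R the full set of education dummies can be generated automatically with `factor()`, which drops the first level as the base category; a sketch on simulated data (the lecture's own output uses `factor(educ)` the same way):

```r
# Sketch: factor(educ) creates the ed2..ed5 dummies; level 1 is the base category
set.seed(11)
n     <- 500
educ  <- sample(1:5, n, replace = TRUE)
exper <- runif(n, 0, 40)
effect <- c(0, 1.5, 3.5, 5, 7)[educ]     # true level effects relative to level 1
wage  <- 8 + effect + 0.15 * exper + rnorm(n)

reg <- lm(wage ~ factor(educ) + exper)
coef(reg)
# Marginal effect of moving from level 2 to level 3:
coef(reg)[["factor(educ)3"]] - coef(reg)[["factor(educ)2"]]
```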
[Regression output excerpts (dependent variable: wage), two specifications:
(1): educ 1.930*** (0.082); Constant 1.074*** (0.373)
(2): factor(educ)2 1.812*** (0.427); factor(educ)3 3.527*** (0.411); …; Constant 3.302*** (0.441)]
Marginal effect of moving from level 2 to level 3: 3.527 − 1.812 = 1.714 (up to rounding)
6. Logarithms and Elasticities
Sometimes economists use a log transformation of the variables: instead of using $y$ or $x_{2i}$, use $\ln y$ or $\ln x_{2i}$.
Why?
- to allow for non-linearities
- if the dependent variable has an asymmetric distribution
- to reduce the problem of heteroskedasticity in the data (more on this later)
- to easily obtain elasticities.
Elasticity
Suppose the model is linear in variables
𝑦𝑖 = 𝛽1 +𝛽2 𝑥2𝑖 + 𝛽3 𝑥3𝑖 + ⋯ + 𝛽𝐾 𝑥𝐾𝑖 + 𝜀𝑖
Elasticity: the percentage change in y for a 1% change in x (say $x_{2i}$):
$$\frac{\Delta y_i / y_i}{\Delta x_{2i} / x_{2i}} = \frac{\Delta y_i}{\Delta x_{2i}} \cdot \frac{x_{2i}}{y_i} = \beta_2 \frac{x_{2i}}{y_i}$$
But remember that for the function $\ln y = a \ln x$, taking the total differential gives
$$\frac{1}{y}\,dy = a\,\frac{1}{x}\,dx \;\Rightarrow\; \frac{dy/y}{dx/x} = a$$
i.e. in a log-log model the coefficient itself is the (constant) elasticity.
Semi-elasticity
❑If dependent variable is in log and regressor is in levels (sometimes called log-lin
model), then
𝑙𝑛𝑦𝑖 = 𝛽1 +𝛽2 𝑥2𝑖 + 𝛽3 𝑥3𝑖 + ⋯ + 𝛽𝐾 𝑥𝐾𝑖 + 𝜀𝑖
𝛽𝑘 : a 1-unit increase in 𝑥𝑘 is associated with a 𝛽𝑘 × 100% change in y.
❑If the dependent variable is in levels and the regressor is in logs (sometimes called lin-log model), then
$y_i = \beta_1 + \beta_2 \ln x_{2i} + \beta_3 x_{3i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$
$\beta_2$: a 1% increase in $x_2$ is associated with a $\beta_2 / 100$ unit change in y.
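Both rules follow from the same total-differential argument as in the log-log case; a sketch of the derivation:

```latex
% Log-lin model: \ln y = \beta_1 + \beta_2 x_2 + \dots
d\ln y = \beta_2\,dx_2
\;\Rightarrow\; \frac{dy}{y} = \beta_2\,dx_2
\;\Rightarrow\; \%\Delta y \approx 100\,\beta_2\,\Delta x_2

% Lin-log model: y = \beta_1 + \beta_2 \ln x_2 + \dots
dy = \beta_2\,d\ln x_2 = \beta_2\,\frac{dx_2}{x_2}
\;\Rightarrow\; \Delta y \approx \frac{\beta_2}{100}\,(\%\Delta x_2)
```

For non-marginal changes the exact proportional change in the log-lin model is $e^{\beta_2 \Delta x_2} - 1$; the $100\,\beta_2\%$ rule is its first-order approximation, accurate when $\beta_2 \Delta x_2$ is small.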
Example: wage equation
c. Goodness-of-fit
How well does the regression line fit the data? How close to the line
are the observations?
Is the variation in x a good predictor of the variation in y?
The quality of the linear approximation offered by the model can be
measured by the R2.
• The R2 indicates the proportion of the variance in y that can be
explained by the linear combination of x variables
• In formula: $R^2 = \dfrac{\text{variance explained}}{\text{total variance}}$
• If the model contains an intercept (as it usually does), then we can write
$$R^2 = \frac{V(\hat{y}_i)}{V(y_i)} = 1 - \frac{V(\hat{\varepsilon}_i)}{V(y_i)}$$
In fact, $y_i = \hat{y}_i + \hat{\varepsilon}_i$ and $V(y_i) = V(\hat{y}_i) + V(\hat{\varepsilon}_i)$, since $\hat{y}_i$ and $\hat{\varepsilon}_i$ are uncorrelated.
$0 \le R^2 \le 1$.
$$\text{uncentered } R^2 = \frac{\sum_{i=1}^{N} \hat{y}_i^2}{\sum_{i=1}^{N} y_i^2} = 1 - \frac{\sum_{i=1}^{N} \hat{\varepsilon}_i^2}{\sum_{i=1}^{N} y_i^2}$$
Caveats!
1. R2 is sensitive to transformations of y → R2s cannot be compared if y differs (y, ln(y), ∆y, ∆ln(y), etc.)
2. There is no general rule to say whether an R2 is high or low; it depends upon the particular context: in a microeconometrics context 0.2 can be high, in time-series analysis 0.8 can be low.
3. It does not measure the quality of the model, only the quality of the linear approximation; hence it has little relevance when analysing results.
4. It is obtained by minimizing the variance of the error ε; hence OLS gives the highest R2 you can ever obtain in a linear model. If you use different estimators (maybe better in certain circumstances), their corresponding R2 will always be lower!
5. R2 will never decrease if a variable is added. Therefore, we define the adjusted R2 as
$$\text{adjusted } R^2 = 1 - \frac{\frac{1}{N-K} \sum_{i=1}^{N} \hat{\varepsilon}_i^2}{\frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2}$$
It has a penalty for larger K; it may decline if you add a regressor, and it can be negative.
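Both formulas can be checked against lm's own output; a sketch on simulated data:

```r
# Sketch: computing R^2 and adjusted R^2 by hand and comparing with summary(lm)
set.seed(5)
n  <- 120
K  <- 3                                   # parameters, including the intercept
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.5 * x2 + 0.3 * x3 + rnorm(n)

reg <- lm(y ~ x2 + x3)
e   <- residuals(reg)

r2     <- 1 - sum(e^2) / sum((y - mean(y))^2)
adj_r2 <- 1 - (sum(e^2) / (n - K)) / (sum((y - mean(y))^2) / (n - 1))

c(r2, summary(reg)$r.squared)             # match
c(adj_r2, summary(reg)$adj.r.squared)     # match
```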