Simple Regression Model
This is the regression model with one independent variable. The outline of this presentation: first we review some terminology, provide examples, and interpret the coefficients; then we get a little more theoretical, talk about the population regression function, and derive the OLS estimates; then things get more practical, with examples of simple regression including how to interpret the results; then we talk about variation and R-squared; then we discuss different log transformations of the dependent and independent variables; and we end with the very important Gauss-Markov assumptions and how they lead to unbiasedness of the estimators and to the variance formulas.

So let's start with the terminology. This is how a simple regression model looks: y = β0 + β1·x + u. Here y is the dependent variable, the one we are trying to explain; x is the independent variable, and in this case we have only one, hence simple regression; u is the error term; and β0 and β1 are the parameters.
Notice that this regression model is for the population; the population is everyone we are trying to find this relationship for. After we obtain sample data, we can estimate the following equation: ŷ = β̂0 + β̂1·x. Here ŷ is called the predicted value, and β̂0 and β̂1 are the coefficients, the numbers estimated using the sample data. As a result we have the residual, û, which is the difference between the actual value of the dependent variable and the predicted value of the dependent variable.

So here I have drawn an important distinction between population and sample. The population is, say, all of the U.S. workers that we are interested in; the sample would be, say, only the thousand people we survey in our data. The parameters β, which we do not know, are for the population; what we actually get from the sample data is called a coefficient, β̂. The error likewise refers to the population, so again we never know it; the residuals can be estimated with the sample data after the regression is run.
Let me provide a practical example of this terminology. Again we have a dependent variable y and an independent variable x: in this case y is the hourly wage in dollars and the independent variable x is years of experience. Suppose these are the first four observations: the first person has an hourly wage of twenty dollars and one year of experience, the second person has the next row of data, and so on. After we estimate the simple linear regression model, this would be the equation of the estimated line: ŷ = β̂0 + β̂1x, with β̂0 = 20 and β̂1 = 0.5, so ŷ = 20 + 0.5x. Suppose this is the equation we estimated. Because we have this equation, we can calculate the predicted values: we substitute into the formula whatever the actual value of x is. For the first person the predicted value would be 20 + 0.5 × 1, and for the next people we substitute their years of experience, 2, 1, and 3; I am basically just replacing x with the values that we have there. These are now the predicted values. The residuals, û, are the difference between the actual and the predicted value: the actual value is whatever hourly wage the person has, and the predicted value is what we predict using the model. That is how we calculate the residuals here.
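To make the arithmetic concrete, here is a minimal Python sketch of these calculations. The transcript only gives the wages for the first and third person (20 and 21), so the other two wages below are made-up placeholders just to run the numbers:

```python
import numpy as np

# Years of experience for the four people (from the example), and hourly wages;
# the wages for persons 2 and 4 are invented placeholders.
x = np.array([1, 2, 1, 3])
y = np.array([20.0, 21.5, 21.0, 21.0])

beta0_hat, beta1_hat = 20.0, 0.5    # the estimated line: y-hat = 20 + 0.5x
y_hat = beta0_hat + beta1_hat * x   # predicted values
u_hat = y - y_hat                   # residuals = actual - predicted

print(y_hat)  # [20.5 21.  20.5 21.5]
print(u_hat)  # residuals, positive and negative
```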
So we can now plot this estimated regression line on a graph; its equation is ŷ = 20 + 0.5x. If you look at this line, it hits the vertical axis at exactly 20, because when x is 0, ŷ is 20: that is the intercept. The slope is 0.5, which means that for every one-unit increase in x, every one additional year of experience, we would see a 0.5 increase in the hourly wage, basically 50 cents; the line is rising at 0.5. The predicted wages, the ŷ's, always fall on this line, and the actual values of y are the dots around it. For example, let's track the first person: 20 is the actual wage, and the predicted wage is right here on the line, 20.5. Or take the third person: 21 is the actual wage with one year of experience, so this dot is the third person, and the predicted value for them is again on the regression line. So again, the ŷ's are on the line and the actual points are around it. If we take the difference between the actual and the predicted value, that is the residual: for this person we have a positive residual, and for this person we have −0.5, a negative residual. The way we came up with this estimated line, this estimated equation, is that we try to draw a line that is as close as possible to all the actual points in the data. So again, our simple regression is hourly wage as a function of years of experience, and I have shown you, both in a table and in a graph, how this looks. So let's go a little more generic here and talk in general about actual values, predicted values, and residuals.
If the data points are these dots right here, these are the actual points of y, our dependent variable. Suppose here is yᵢ, an actual point, and this is the predicted value, here also called the fitted value, ŷᵢ. The difference between the actual value and the predicted value, y − ŷ, is û, the residual. In this case we have a positive residual; for this point here we have a negative residual; and so on. For this point right here, this is the actual value, the predicted value is right here on the line, and the residual is the difference between them, which would be a negative value. One important thing to note is that we will care a lot about these û's, the residuals, and we will think about their properties. The generic interpretation of β̂1 is that the predicted value of y changes by β̂1 units when x increases by one unit. β̂1 is also called the slope in the simple linear regression, and I showed you why on the graph we saw: the reason we call it the slope is that the derivative of a function is another function that gives its slope, and in y = β0 + β1x + u, if we take the derivative with respect to x, we find that the slope is exactly this β1 that we talked about. So the formula above is correct.
Now take the expected value, given x, of the equation that we have seen before. Because of the properties of the expected value, β0 and β1x do not vary once we condition on x, so the expectation of β0 is just β0 and these terms basically come out of the expectation; then we have plus the expected value of u given x. If we assume that this value is zero, then the whole expression equals β0 + β1x. This is a very important assumption, and we will call it the zero conditional mean later on; I will explain it a lot more. So what this population regression function shows is that the expected value of y given x, for the population, is a linear function of x: E(y | x) = β0 + β1x.
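Written out, the step is just the linearity of the conditional expectation:

```latex
\begin{aligned}
E(y \mid x) &= E(\beta_0 + \beta_1 x + u \mid x) \\
            &= \beta_0 + \beta_1 x + E(u \mid x) \\
            &= \beta_0 + \beta_1 x
            \quad \text{under the zero conditional mean assumption } E(u \mid x) = 0 .
\end{aligned}
```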
So let me show you graphically what this means. Here is the population regression function, E(y | x) = β0 + β1x. It looks like the estimated regression line, ŷ = β̂0 + β̂1x, but it is a different object: this one is for the population, while the other comes estimated from the sample we currently have. Again, these could be actual values, and these are the errors, which we truly do not know because they come from the population; we will never know the values of these parameters, we will only ever know coefficients estimated with sample data. What this population regression function also shows is that for a given value of x, the expected value of y given x is right here on the line, but the actual value could be anywhere along this vertical range: it could be here, here, or here, and many of you will recognize this as a normal distribution, with values most likely around the middle, and the expected value, the average, sits right on the population regression line. One small point to note: here x1, x2, and x3 refer not to different variables but to particular values of the single variable x, a given number xᵢ, rather than variable one, variable two, and variable three. So let's think about the derivation of the OLS estimates.
For the regression model y = β0 + β1x + u, in order to estimate the regression equation ŷ = β̂0 + β̂1x we need to find these coefficients; we do not have them yet. How are we going to find them? First, by writing the residuals: û = y − ŷ, the actual minus the predicted value. We can replace the predicted value with the expression β̂0 + β̂1x, so now we have an expression for û, the residuals. To find the two coefficients, we take a random sample of data with n observations, (xᵢ, yᵢ), where each observation i runs from 1 to the total number of observations n, and the goal is to obtain as good a fit as possible of this estimated regression equation.
What does it mean to have as good a fit as possible? We will minimize the sum of squared residuals; that is what this objective function does. Again: minimize the sum of the squared residuals. We take the residuals, square them, and we want the sum of those squares to be as small as possible. We already have an expression for û, so we substitute it in, and the way we minimize the function is basically by taking the derivatives and setting them to zero; the following expressions come out of that minimization process. We obtain the OLS coefficients as β̂1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)². If you look very carefully at this expression, the numerator is exactly the formula for the covariance of x and y: we are asking how each x differs from its mean and how each y differs from its mean. In the denominator, Σ(xᵢ − x̄)², x minus the average x squared, is part of the formula for the variance of x. So the slope is how x varies with y divided by the variance of x, or in other words, the covariance of x with y over the covariance of x with itself. And β̂0, the intercept, equals the average value of the dependent variable minus the estimated slope coefficient times the average value of x: β̂0 = ȳ − β̂1x̄.
OLS is called ordinary least squares, and it is based on minimizing the squared residuals: "least" because it is a minimum, "squares" because we square the residuals, and "ordinary" because it is not weighted or any other variant. That is where the OLS term comes from. So again, ordinary least squares is the method we use to obtain the coefficients, which is why they are called the OLS coefficients, and we do that by minimizing the sum of squared residuals: we basically want to be as close as possible to all of the actual points for y.
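As a minimal sketch of these formulas in code (Python, with made-up data rather than the lecture's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10, 3, size=200)               # made-up regressor
y = 5 + 0.8 * x + rng.normal(0, 2, size=200)  # made-up outcome, true slope 0.8

# Slope: sample covariance of x and y divided by sample variance of x.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: mean of y minus the slope times the mean of x.
beta0_hat = y.mean() - beta1_hat * x.mean()

print(beta0_hat, beta1_hat)  # should land near the true 5 and 0.8
```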
Based on the formulas that we have, here are some of the properties of the OLS estimators. First, ȳ = β̂0 + β̂1x̄. This comes directly from the formula for β̂0, just rearranged; what it says is that the sample averages of the dependent and independent variables lie on the regression line, because that is the line we are estimating, so the averages are on that line. The second property is Σûᵢ = 0: the residuals sum to zero. Residuals summing to zero means that if we have values of the actual y above the regression line, we have values below it such that those distances cancel out. Note that we minimize the sum of the squared residuals, but if we simply sum the residuals themselves, they sum to zero. The final property is Σxᵢûᵢ = 0: the summation of the independent variable times the residuals equals zero, or in other words, the sample covariance between the independent variable and the residuals is 0. That is a very important property, because it means the independent variable and the residuals are not correlated in any way: we would not see that when x is increasing, the residuals are also increasing. So this was the theoretical introduction of the OLS estimators for the simple regression.
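These properties are easy to verify numerically; a minimal sketch using the same made-up data as the block above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10, 3, size=200)
y = 5 + 0.8 * x + rng.normal(0, 2, size=200)

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
u_hat = y - (beta0_hat + beta1_hat * x)  # residuals

print(np.isclose(y.mean(), beta0_hat + beta1_hat * x.mean()))  # means lie on the line
print(np.isclose(u_hat.sum(), 0))                              # residuals sum to zero
print(np.isclose(np.sum(x * u_hat), 0))                        # x and residuals uncorrelated
```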
Now let's proceed with some examples. Let's look at CEO salary: we will estimate a simple regression model to explain how the return on equity, or ROE, affects the CEO's salary. Where before we had the regression model y = β0 + β1x + u, I now replace y with salary, for my particular example, and x with roe. This is the regression model that applies to the population: these are the parameters, which we don't know, and this is the error, which we don't know either. But if we have sample data, which we do, we can estimate the equation salary-hat (the same thing as ŷ) = β̂0 + β̂1·roe. Once we have these coefficients, we can also look at the residuals, û, which are the actual value of salary minus the predicted value of salary. So we are going to estimate this regression to find these coefficients, and we will interpret β̂1 as the change in CEO salary associated with a one-unit increase in return on equity, holding other factors fixed.
Here is the estimated equation after we had the data and actually ran our model: β̂0 is 963 and β̂1 is 18.501. Again, these are the estimated coefficients, and they came from our sample data. Salary here is measured in thousands of dollars, and return on equity happens to be measured in percent. So again, β̂1 measures the change in CEO salary associated with a one-unit increase in ROE, holding other factors fixed. How would we interpret this β̂1 coefficient? We would say that CEO salary increases by 18.501 units. What are the units for salary? Salary is measured in thousands of dollars, which is how we get eighteen thousand five hundred and one dollars: 18.501 thousand dollars. And this is for each one-unit increase in ROE; since ROE is measured in percent, we say for each one percentage point increase in ROE. That is how β̂1 is interpreted. The interpretation of β̂0, the intercept, is that if ROE were equal to zero, the predicted value of salary would be 963 units, and since the units are thousands of dollars, that is 963 thousand dollars. Now I will show you, just this one time, how we estimated this model; we did that with Stata.
This is how the output for a simple regression looks. This one came from Stata, but R, SAS, or any other software would give you a similar regression output. We are running the regression of salary on roe, and every computer program gives you these coefficients: this is the dependent variable, salary; this is the independent variable; and this is the constant, or intercept. We pick the coefficients from here. Notice that in the previous slide I said salary equals 963 plus 18.5 times roe: the 963 is the constant that we have here, and 18.5 is the coefficient on roe; that is what this value is. So whether you see a regression output that came from a statistical program or you see an estimated equation like this, they mean exactly the same thing. To make things even more interesting, the way economists like to present results is usually in a table, and that is the most common format we will see later on. Here this is the dependent variable, here is the independent variable, roe, and it is the same output, basically reporting the estimated coefficients. So you could put it in an equation, you could put it in a table, or you could look straight at the output from a statistical program.
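For comparison, here is a hedged sketch of running the same regression in Python with statsmodels instead of Stata; the file name and column names are assumptions, not the lecture's actual dataset:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical CSV with columns "salary" (in $1000s) and "roe" (in percent).
df = pd.read_csv("ceosal.csv")

model = smf.ols("salary ~ roe", data=df).fit()
print(model.summary())  # full coefficient table, like the Stata output
print(model.params)     # on the lecture's data this would be about 963 and 18.501
```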
So in this case we estimated the regression line, and this is how it looks: this is the equation we estimated, and we know these coefficients after we ran the model. Notice that the population regression function is still unknown; we will never know the true values of the parameters β0 and β1. But using our sample, exactly the CEOs and their salaries that are in our data, we calculated this line with sample data. So this line comes from the sample and we know it, while that one is for the population. If we got a different sample, we would estimate a different regression line, maybe something like that, but those would be different coefficients based on a different sample. This is the true model that we do not know, and we are basically trying to estimate those parameters while getting these coefficients.
Here is how the estimated regression looks when we plot it: on the horizontal axis is the x variable, return on equity, and on the vertical axis is our y variable, salary in thousands of dollars. These points are the actual data, and the regression line we estimated passes as close as possible to the actual values, the dots around it. Now, how can we calculate the residuals? The predicted values are right here on the line, and these are the actual values, so a residual is the difference between the actual, or true, value and the predicted value: a residual would be a vertical distance such as this one. Here are the residuals, in green, and if you look at them, they are centered around zero. Have we seen this before? Yes, we have, because the summation of all these residuals is 0. It means we have as many below as above the line, not just in count but in distance: the residuals below and above the line should basically sum up to zero. What we have effectively done in this plot is take away the slope, so we are left with just the residual values.
Everything we said about this line is also true here: it hits the vertical axis at nine hundred and something, and if you zoom the axis out to 5,000 you would see it crossing right at about 963; that is the intercept we saw. And the slope is 18.5, so again, for each one-unit increase in return on equity we see about an 18.5 thousand dollar increase in the salary of the CEO. Okay, to look at this yet another way, this is how the data looks.
We have the salary of the CEO, which is in thousands of dollars, and roe, the return on equity for their firm, measured in percent. We can compute the predicted value just by using the estimated coefficients: the intercept plus the slope times roe. So for the first observation, we take this number plus this number times that firm's roe, and that is how we get this predicted salary; for the second one, we take this number plus this number times its roe, 10.9, and that gives its predicted value; and so forth. This is how we calculate the predicted values for the dependent variable. The residuals are the actual value of salary, right here, minus the predicted value: this number minus this number is the residual, and the next number minus the next predicted value is the next residual. Notice that we have negative and positive residuals, and if you sum them all up, the mean of these residuals is zero. And if you compute the mean of the predicted values, it is exactly the same as the mean of the actual values of salary. These things are not coincidental; they come from the properties of OLS.
we can consider a simple regression model explaining how education affects the
wages for workers so in this case our regression model
would be wage equals beta0 plus beta1 education plus u so we're trying to
explain what is the hourly wage for workers given their education the estimated
equation would be wage hat equals beta0 had plus beta 1 hat times education so
these would be the estimated coefficients here and after we know the coefficients
we can calculate the residuals which are the actual minus the predicted value for a
wage so here beta1 head would be measuring the change in wage associated with one
more year of education
holding other factors uh fixed so you can say also how does uh wage
increase when um education increases by one year notice that as we talked about
before there is no correlation causation here everything is correlation uh when we
interpret these results and we also need to say holding other factors fixed because
in this case we're holding everything that is unobservable in the error term fixed
as well so if we estimate the equation we obtain that the intercept is minus 0.9
and the slope is point 54.
The negative intercept does not make literal sense as a predicted wage, but it is not a problem, because there is no one in the sample who has zero education. Here is how the regression output looks; again, this one is coming from Stata, but it could come from anything else. We see the coefficients right here: this is the intercept, −0.90, and this is the slope, 0.54; that is what we are concentrating on right now, and that is where to find them in a regression output. Okay, so let's talk then about the variation measures that we have.
the first variation that we're going
to be talking about is sum of squares total and that would be the sum of
squares measuring the total variation in the dependent variable so here we would
have the actual value minus the average value squared this will be the total
variation in the dependent variable sst sse would be the explained sum of squares
and that would be the difference between the predicted value and the actual value
squared and the sum of it and then the sum of squared residuals would be the
difference between the actual value and the
regression and some called the residual sum of squares the r uh call it
sum of squares for the error and so in this case e and the r are very confusingly
reversed so when you see sse and ssr always double check what what the what they
mean okay so let's look at them on a graph so what did we mean by these things so
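In symbols, the three sums of squares and their decomposition:

```latex
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 , \qquad
\mathrm{SSE} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 , \qquad
\mathrm{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 , \qquad
\mathrm{SST} = \mathrm{SSE} + \mathrm{SSR} .
```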
Okay, so let's look at them on a graph. Suppose we have an actual value of the dependent variable y, here; this number is ŷ, the predicted value; and this number right here is ȳ, the average value of the dependent variable. The distance from the actual value to the predicted value is the residual, and the total variation is the actual value minus the average value. So we are breaking the total variation down into what we can explain with the regression, the distance from ŷ to ȳ, and the residual, what we cannot explain with the regression. What we want is for as much as possible to be explained and as little as possible to be left unexplained by the regression. This leads us to a goodness-of-fit measure that we call R-squared.
R-squared is the explained sum of squares divided by the total sum of squares: R² = SSE/SST. And because SSE equals SST minus SSR, this is the same expression as R² = 1 − SSR/SST. What R-squared measures is the proportion of the total variation in the dependent variable that is explained by the regression. An R-squared of 0.7 would be interpreted as: 70 percent of the variation is explained by the regression, and the rest is due to error. As a rule of thumb, though not always used, an R-squared greater than 0.25 is typically considered a good fit: if our regression can explain at least 25 percent of the variation, we have a very good regression.
Here is how R-squared is calculated. If we look at the regression output, it gives you SS, the sums of squares: this is the total sum of squares, this is the residual sum of squares, and what we called "explained" is labeled the model sum of squares here; model means the same as explained. R-squared is calculated automatically, and it is 0.16, but if we want to compute it by hand, all we have to do is divide the SS for the model, the explained part, by the SS total: dividing this number by that number gives an R-squared of 0.16. The way to interpret this is that sixteen percent of the variation in the wage is explained by the regression, and the rest is due to error. This is not a very good fit, because we can explain only sixteen percent of the variation with this regression; the rest is actually due to error, so it is not a very good fit.
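A minimal check of this arithmetic with made-up data (same setup as the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10, 3, size=200)
y = 5 + 0.8 * x + rng.normal(0, 2, size=200)

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained (model) sum of squares
ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares

print(sse / sst, 1 - ssr / sst)        # two equivalent ways to get R-squared
```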
Now, completely changing gears, we will talk about log transformations, or logging variables. Sometimes the variables y or x are expressed in logs, log(y) or log(x), and with logs the interpretation is not in units but in percentages, or elasticities. So why would we use logs? Well, variables such as age or education that are measured in units such as years should not be logged. Why? Because with logs the interpretation is in percentages, and what would it mean to be one percent older? We wouldn't say that; we would say one year older. That is why we do not log variables that are measured in years. Variables measured in percentage points, such as interest rates, also should not be logged, because if the interest rate is already a percent, we should not be further logging it and talking about a percent increase in a percent.
So first, the log-log form, where both variables are logged: here β̂1 is the change in log(y) over the change in log(x). By the properties of logs, the change in log(y) is Δy/y, and the change in log(x) is Δx/x. And what is the change in y over the value of y? That is the percent change in y; likewise, the change in x divided by x is the percent change in x. So here we interpret the β̂1 coefficient as an elasticity: the dependent variable changes by β̂1 percent when the independent variable changes by one percent. Then we have the log-linear form, also called the semi-log form: here we log the dependent variable, but the x variable is not logged. In this case β̂1 is the change in log(y) divided by the change in x, and by the properties of the log, the numerator is Δy/y, the percent change in y, while the denominator is just the change in x. How would we interpret this coefficient? We would say that the dependent variable changes by β̂1 × 100 percent when the independent variable changes by one unit. Then we have the linear-log form, where the y variable stays as it is, but the independent variable x is the one that is logged. This β̂1 is the change in y over the change in log(x), and since the change in log(x) is the same as Δx/x, we have the change in y over the percent change in x. So the dependent variable y changes by β̂1/100 units when the independent variable changes by one percent; that would be the interpretation here.
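Summarizing the four functional forms and their slope interpretations in one place (a synthesis of the interpretations just given):

```latex
\begin{array}{lll}
\text{Form} & \text{Model} & \text{Interpretation of } \hat\beta_1 \\
\hline
\text{linear}     & y = \beta_0 + \beta_1 x + u            & \Delta y = \hat\beta_1\,\Delta x \\
\text{log-log}    & \log y = \beta_0 + \beta_1 \log x + u  & \%\Delta y = \hat\beta_1\,\%\Delta x \\
\text{log-linear} & \log y = \beta_0 + \beta_1 x + u       & \%\Delta y = (100\,\hat\beta_1)\,\Delta x \\
\text{linear-log} & y = \beta_0 + \beta_1 \log x + u       & \Delta y = (\hat\beta_1/100)\,\%\Delta x \\
\end{array}
```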
Okay, so let me give you examples; I will start with the easy example first. Here we have data on hourly wages, the log of the wage (we are taking logs of the wage variable), and education in years. Notice what the log does: here we have a very large value, and taking logs makes these values much more similar to the rest of the data; that will be important later, when I show you a graph. Here is the linear form, wage on education: education is on the horizontal axis and wage on the vertical axis, these are all the data points, you can see some large values here for the wages, and this is the estimated regression line. Now, if instead of using the raw wage variable y we take logs of y, we get these points: you see how they are much closer together than those points were, and the observations that had very high values are now much closer to the rest of the points. So again, taking logs helps put the data on a similar scale, and this is how the log-linear form looks with its estimated regression line. If we estimate the regressions, these are the results: here we have the linear model, where wage is regressed on education, and here we have log wage regressed on education.
Now consider the salary and sales example. Taking the log of salary brings these points closer together: instead of salary we have the log of salary, and you no longer see those extreme points as much. This is how the linear form looks, and this is how the log-log form looks with its estimated regression line, where we take logs of sales as well. So one option is to take the log only on this side, which brings these points down closer to each other but still leaves the sales points spread out in the original variable; the other way is to leave the y variable as the original variable but take the log of sales, so that now those points are closer to each other but the salary points are still far apart. Altogether, we can estimate four different forms, depending on which variable we decide to log and which one not to log.
Let's interpret the coefficients; this is how the table looks in terms of the different forms. The first, the linear form, is the traditional model where salary is regressed on sales. The way to interpret this coefficient: for each one-unit increase in sales, and sales is measured in millions of dollars, so for each 1 million dollars of additional sales, salary increases by the coefficient; and because salary is measured in thousands of dollars, it is basically increasing by 0.155 thousand dollars, which reads better as one hundred and fifty-five dollars. For the log-log form we have an easy elasticity interpretation: if sales increase by one percent, then salary increases by 0.25 percent. Notice we are not saying that the log of salary increases by that, but that salary itself increases by 0.25 percent. In the log-linear form, the interpretation of this coefficient, which is very, very small, is that salary increases by 0.0015 percent, which is basically the coefficient we see here times one hundred, for each additional one-unit increase in sales, with sales measured in millions of dollars.
so with the final
form we have that when uh sales increases by one percent we would see
that salary would increase by 2.64 thousand dollars basically we pick up this
coefficient we divide by 100 and we attach the units behind it so again depending
on which which of these models you want to estimate which of these transformation
you could have very different interpretation of these coefficients and that depends
basically on what what you would be interested in so with that said the review
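A hedged sketch of estimating all four forms in Python (statsmodels formulas; the file name and the salary and sales column names are assumptions, not the lecture's actual dataset):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ceosal.csv")  # hypothetical file with salary and sales columns

forms = {
    "linear":     "salary ~ sales",
    "log-log":    "np.log(salary) ~ np.log(sales)",
    "log-linear": "np.log(salary) ~ sales",
    "linear-log": "salary ~ np.log(sales)",
}
for name, formula in forms.items():
    fit = smf.ols(formula, data=df).fit()
    print(name, fit.params.iloc[-1])  # the slope coefficient under each form
```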
With that said, the review questions for this part: define the regression model, the estimated equation, and the residuals; think about what method is used to obtain the coefficients; what are the OLS properties; how is R-squared defined; and what does taking logs of the variables do?
So now let us continue with the Gauss-Markov assumptions for the linear regression model. These are the standard assumptions that we need for the model: one, linearity in parameters; two, random sampling; three, no perfect collinearity, or, in the case of simple regression, sample variance in the independent variable; four, exogeneity, or the zero conditional mean, which means the independent variables, the regressors, are not correlated with the error term; and five, homoskedasticity, which means the variance of the error term is constant. Let's review each of them in detail. Assumption number one is linearity in parameters: we have the linear form y = β0 + β1x + u. Here the relationship
between y and x is linear in the population; notice that the β0 and β1 parameters enter this function linearly. Note that the regression model can still contain different types of variables: for example, log variables, as we saw with log sales earlier; or squared variables, such as education squared; or, in multiple regression, interactions of variables, such as education times experience. It is the beta parameters that must be linear, so basically, even if you hand a transformed x to the computer, the model remains linear in the parameters. Assumption number two is random sampling, with the observation index i going through all of the n observations in the sample: we want the data to be a random sample drawn from the population, where each observation follows the population equation. So let's say we have data on workers, their wage and education.
Suppose the population is all of the workers in the U.S., about 150 million, and the sample is the workers selected for the study, say about a thousand people. We need to draw randomly from the population, which means each worker has an equal probability of being selected, and that is a very important assumption. Sometimes that is not how sampling is done: for example, young workers could be oversampled, so you have more young workers in the sample, but that would not be a random or representative sample, and therefore any inferences you try to make about the population are not going to be correct if you do not have a random sample to begin with. Assumption number three is no perfect collinearity.
what that means is that no two variables
move together exactly in the same way so in the simple regression with
only one independent variable that assumption means that there needs to be sample
variation in the independent variable or the variance of x needs to be positive so
the way to read this is the sum of squares total for x which is each value of x
minus the deviation from the mean squared needs to be positive so not all of the x
needs to be the same number such as if you have education but all everyone in the
sample has 12 years of education
you cannot estimate a model like that and the reason for that is because
this uh variable of education of 12 years for everyone would be perfectly
correlated with the constant in the model so therefore you cannot estimate the
model another way to see why we cannot estimate such model is that this sst or the
deviation of the sample value for observation i minus the mean is in the
denominator so if there's no variation and each one is equal to the sample mean of
x well this value would be a zero we
errors must sum up to zero but here we're saying that they must stop sum
up to zero for each value of x so basically uh for each value of x you want those
Let me show you an example. Say we have a regression model where wages are regressed on education: we want to explain how wage varies based on the education of a person. Now suppose that the ability of a person is unobserved; if such a variable is unobserved, it will be part of the error term, included in u. One of the issues that happens here is that when ability is higher, that likely also drives the education of that person to be higher. Then we would have a violation of the zero conditional mean assumption, because u would be higher when x is higher. We do not want that: no matter what the value of x is, we want u to be independent of it. Let me show you an example of this in graphs.
of this in graphs so here we have education
and suppose these are the residuals here that we have so what this means
is that for each value of education so pick for example education 12 we have that
the expected value of the residual of the error term equals zero so basically we
could have here these residuals being above zero or below zero but their expected
value is zero so on average we would have uh those uh those residuals being equal
to zero and so independent of where we're at for x each of these would be um the
expected value of of these would be equal to 0.
Now, I just made up an example where this is not the case. If we have a situation like this, where these are now the residuals, notice that for high values of education we have a higher expected value of the residuals, and for lower values of education we have a lower expected value of the residuals. This is the case where, again, supposing ability is wrapped up in the error term, people with higher education also have higher ability. That is a violation of the zero conditional mean assumption: it is the case where we have endogeneity, whereas here we have exogeneity. So this is a very important assumption that we need to have.
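Here is a small, made-up simulation of that story: ability is unobserved, sits in the error term, and is correlated with education, so the zero conditional mean fails and the OLS slope is biased. All the numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

ability = rng.normal(0, 1, n)
educ = 12 + 2 * ability + rng.normal(0, 1, n)  # higher ability -> more education
u = 2 * ability + rng.normal(0, 1, n)          # ability is wrapped up in the error
wage = 1 + 0.5 * educ + u                      # true slope is 0.5

# Zero conditional mean fails: E(u | educ) rises with education.
print(u[educ < 10].mean(), u[educ > 14].mean())  # negative vs. positive

# Consequence: the estimated slope is biased upward relative to the true 0.5.
slope = np.cov(educ, wage)[0, 1] / np.var(educ, ddof=1)
print(slope)  # roughly 1.3 here, not 0.5
```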
So why are these assumptions important? Well, the Gauss-Markov assumptions one through four, linearity, random sampling, no perfect collinearity, and zero conditional mean, which we talked about, lead to the unbiasedness of the OLS estimators. What unbiasedness means is that E(β̂0) = β0 and E(β̂1) = β1: the expected value of the sample coefficient is the same as the population parameter. Again, we do not know what the population parameters are, and we never will, but if we obtained many of these sample coefficients from different samples, their expected value, their average, would always be the population parameter; for any given sample, the coefficient can land above or below the truth.
Assumption number five is homoskedasticity: the variance of the error term is constant. A violation is the case we have here, where for low values of education there is a much tighter distribution of the error term, and for higher values of education the distribution is much more spread out. With the example I gave you of ability in the error term: if education is low, people may have very similar ability to each other, but among the highly educated, people may have very different ability from each other, and that would again be a violation of homoskedasticity, since for high education people's abilities are all over the place, from very low to very high. So again, we want homoskedasticity as our fifth assumption, not a case of heteroskedasticity, and if we detect heteroskedasticity in our data there are some corrections we need to apply to the data.
Now let's talk about estimating the variance of the error term: how can we calculate σ̂², the estimated variance of the error term? We use the following formula: we take the residuals, square them, sum them up, and divide by n − 2, so σ̂² = SSR/(n − 2). The result here is that the Gauss-Markov assumptions one through five that we have talked about so far, linearity, random sampling, no perfect collinearity, zero conditional mean, and homoskedasticity, lead to unbiasedness of this error-variance estimator: the expected value of the sample variance is the same as the population variance, E(σ̂²) = σ². So not only do we have unbiasedness of the coefficients, we also have unbiasedness of the error variance, and that is also a desirable outcome.
So let's go over the variances of the estimators. The variance of the slope coefficient β̂1 is σ², the variance of the error term, divided by the sum over the data points of (xᵢ − x̄)²; we already saw this denominator before, it is the SST for x, the total sum of squares, the total variation in x. So again, this is the variance of the error term divided by the total variation of x. For the variance of β̂0 we have an almost similar formula, except it is also multiplied by an extra factor involving the average of the squared x values, written out below.
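Written out (the β̂0 formula follows the standard textbook form, supplied here because the slide expression is not legible in the transcript):

```latex
\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
                                = \frac{\sigma^2}{\mathrm{SST}_x} ,
\qquad
\operatorname{Var}(\hat\beta_0) = \frac{\sigma^2 \, n^{-1} \sum_{i=1}^{n} x_i^2}{\mathrm{SST}_x} ,
\qquad
\operatorname{se}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\mathrm{SST}_x}} .
```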
The slope is what we really care about here, so let's think about when we will get a high variance for the coefficients: the variance is high when the variance of the error term is high, and it is also high when the total variation in x is low. Do we want high variance or low variance in our coefficients? We want low variance: we want coefficients that are what we call precise, which means they do not vary much from sample to sample; that is a desirable property. The way to get such a desirable property is to have low variance in the error term, so not much noise in the errors, but high variance in the independent variable. Again, we want the independent variable to be as variable as possible: we do not want everyone in the sample to have 12 years of education, we want people with six years and people with 20 years of education; that is a good thing.
To actually compute these, we replace the unknown σ² with the estimate σ̂² obtained from the sample, and the square roots of the resulting variance estimates are the standard errors, the ones that are also shown in the regression outputs I showed you earlier: the coefficients with their standard errors next to them. The standard errors are basically how much the coefficients vary from one sample to the next, or how precisely these coefficients are calculated. With that said, the review questions for the Gauss-Markov assumptions: you need to be able to list and explain the five of them, and you need to know which assumptions are needed for unbiasedness of the OLS estimators and which are needed for the variance formulas.