Sleep


STATISTICAL PROJECT

Mouhamed Fadhel hamza and Beñat Alzuri


INDEX
1. PART A – A WARM-UP
2. PART B – FINDING THE RIGHT SPECIFICATION
3. PART C – EVALUATING QUALITY OF YOUR MODEL (POST-REGRESSION ANALYSIS)
1. PART A – A WARM-UP
• Is the impact of time at work on sleep statistically significant?

First, we estimate the OLS model and, with the output obtained (see the picture below), write the
linear regression model:

Sleep = 3586.38 – 0.150746 × totwrk

Yes, the impact is statistically significant: the p-value of totwrk is lower than 0.05, so we reject
the null hypothesis that its coefficient is zero. See the picture below.
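
As a minimal sketch of what the OLS output contains, the simple regression and the t-statistic of the slope can be computed by hand. The function name `simple_ols` and the six data points are assumptions made up for illustration; they are not the project data, and Gretl reports the same quantities directly.

```python
import math

def simple_ols(x, y):
    """Fit y = b0 + b1*x by ordinary least squares.

    Returns (intercept, slope, standard error of slope, t-statistic of slope).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return intercept, slope, se_slope, slope / se_slope

# Made-up illustrative data (NOT the project data): weekly minutes of work and sleep.
totwrk = [0, 1200, 1800, 2400, 3000, 3600]
sleep = [3650, 3380, 3300, 3260, 3100, 3080]

b0, b1, se_b1, t_b1 = simple_ols(totwrk, sleep)
print(f"sleep = {b0:.2f} + ({b1:.4f}) * totwrk, t-statistic of slope = {t_b1:.2f}")
```

A slope whose t-statistic is large in absolute value corresponds to a small p-value, which is the criterion used above.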

• Which sign of the estimated coefficient do you intuitively expect? Explain briefly why. Is
the estimated coefficient in line with your intuition? If not, what does it suggest about your
model?

The sign of totwrk can intuitively be expected to be negative: the more time you spend working, the
less time is left for sleeping. The coefficient estimated by Gretl is in line with our intuition, as it is
negative: -0.150746.

• How much of variation in sleep does the model explain? What does the intercept in this
equation mean? If totwrk increases by 2 hours, by how much is sleep estimated to
change? Do you find this to be a large effect?

The intercept means that the estimated amount of sleep per week for someone who does not work is
3586.38 minutes, since sleep equals 3586.38 when totwrk is 0. This comes to about 8.5 hours per night.
If totwrk increases by 2 hours, the change in totwrk is 120 minutes, so the change in sleep is
-0.151 × 120 = -18.12 minutes per week. This is only a few minutes, so it is not a very large effect.
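
The arithmetic above can be checked with a few lines of code. This is only a sketch that reuses the coefficients reported in Part A; the variable names are our own. It uses the unrounded slope, so the result differs slightly from the value obtained with -0.151.

```python
MINUTES_PER_HOUR = 60
NIGHTS_PER_WEEK = 7

intercept = 3586.38   # estimated weekly sleep (minutes) when totwrk = 0
slope = -0.150746     # estimated change in sleep per extra minute of work

# Convert the intercept from minutes per week to hours per night.
hours_per_night = intercept / MINUTES_PER_HOUR / NIGHTS_PER_WEEK

# Two extra hours of work per week, expressed in minutes.
delta_totwrk = 2 * MINUTES_PER_HOUR
delta_sleep = slope * delta_totwrk

print(f"Sleep with no work: {hours_per_night:.2f} hours per night")
print(f"Change in weekly sleep for +2 hours of work: {delta_sleep:.2f} minutes")
```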

• How much of variation in sleep does your model explain? Is it enough?

R² = 0.103. This means that 10.3% of the variation in weekly minutes of sleep is explained by the
regression model. R² always lies between 0% and 100%: 0% represents a model that does not
explain any of the variation in the response variable around its mean, and 100% represents a model
that explains all the variation around its mean. Therefore, as 10.3% is close to 0%, we can say that
it is too low and the model does not explain enough of the variation in sleep.
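
A minimal sketch of how R² is defined, as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the small data set are assumptions used only for illustration.

```python
def r_squared(y, y_hat):
    """Share of the variation of y around its mean explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_residual / ss_total

# Made-up values (NOT the project data): observed and fitted weekly sleep in minutes.
observed = [3100, 3300, 3500, 3200, 3400]
fitted = [3250, 3280, 3350, 3300, 3320]
print(f"R-squared = {r_squared(observed, fitted):.3f}")
```

A model that only predicts the mean gives R² = 0, and a model that reproduces every observation gives R² = 1.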

• Do you believe that this regression model is correctly specified? Explain briefly.

No. The regression model is acceptable as a first step, but it would be better specified if more
variables were included. Sleep time does not depend only on how much you work; many other
factors matter, such as leisure, study, etc. The regression model should therefore
contain all the relevant predictors, including any necessary transformations and interaction terms.

2. PART B – FINDING THE RIGHT SPECIFICATION

Sleep as a function of totwrk, age and educ:

Sleep = 3638.25 – 0.148 × totwrk + 2.199 × age – 11.133 × educ


• a) Which signs do you intuitively expect on individual variables? Are the signs of
the estimated coefficients in line with your intuition? If not, which problems could
this indicate in your model?

The signs of totwrk and educ can be expected to be negative, as workers and students usually wake
up early to go to their job or school, whereas people who neither work nor study can wake up later.
On the other hand, we have no clear expectation for the sign of age, as in our opinion sleeping is a
matter of habits and lifestyle, not age. There are young people who sleep a lot or very little, just as
there are old people who sleep a lot or little.

The signs of the coefficients are in line with our intuition: totwrk: -0.148 and educ: -11.133.

• b) If someone worked 5 hours a week more, how would their total sleep time
change based on your estimates? Is it a lot or no, in your opinion?

Change in totwrk = 5 × 60 = 300 minutes

Change in sleep = -0.148 × 300 = -44.4 minutes

So, working 5 more hours per week leads to sleeping 44.4 minutes less per week. In our opinion this
is not a lot, as sleeping less than one hour less per week (about 6 minutes per night) is not important.

• c) Are more educated people sleeping more than less educated or less, based on
your results? How much is the value added (or depleted) of one year of education
on the total length of sleep per week? What could be the intuition behind your
result? If the purpose of life is to sleep enough, is it worth studying?

Although educ has a negative sign, its p-value is higher than 0.05, so it is not a statistically
significant variable and its sign should not be given much weight: based on our results, we cannot
say that education affects how much a person sleeps. Taken at face value, the coefficient of educ
says that each additional year of education decreases total sleep per week by 11.13 minutes, that is,
the less educated you are, the more you sleep. If the purpose of life were to sleep enough, the
coefficient of educ would suggest that you should not study. This does not make much sense, and
the p-value confirms that education is not a significant variable.
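
The partial effects discussed in b) and c) follow the same rule: the predicted change in sleep is the coefficient times the change in the regressor, holding the other regressors constant. A small sketch with the Part B coefficients (the helper name `predicted_change` is our own):

```python
# Coefficients of the Part B model, in minutes of sleep per week.
coefficients = {"totwrk": -0.148, "age": 2.199, "educ": -11.133}

def predicted_change(coefs, deltas):
    """Predicted change in sleep when the listed regressors change by the
    given amounts and every other regressor is held constant."""
    return sum(coefs[name] * delta for name, delta in deltas.items())

five_more_hours = predicted_change(coefficients, {"totwrk": 5 * 60})
one_more_year = predicted_change(coefficients, {"educ": 1})
both = predicted_change(coefficients, {"totwrk": 5 * 60, "educ": 1})

print(f"+5 hours of work:       {five_more_hours:.1f} minutes of sleep per week")
print(f"+1 year of education:   {one_more_year:.1f} minutes of sleep per week")
print(f"both changes together:  {both:.1f} minutes of sleep per week")
```

These are point estimates only; for educ the effect is not statistically distinguishable from zero, as noted above.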

• d) How much of the variation in sleep does your model explain? Is it a lot or a
little?

R² = 0.113. This means that 11.3% of the variation in weekly minutes of sleep is explained by the
regression model. R² always lies between 0% and 100%: 0% represents a model that does not
explain any of the variation in the response variable around its mean, and 100% represents a model
that explains all the variation around its mean. Therefore, as 11.3% is close to 0%, we can say that
it is too low and the model does not explain enough of the variation in sleep.
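
Because R² never decreases when regressors are added, the two models are better compared with the adjusted R², which penalises extra variables. The sketch below is illustrative only: the sample size n = 706 is an assumption (it is not stated in the text), and the function name is our own.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for a model with k regressors (plus a constant) and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

N_OBS = 706  # ASSUMED sample size, used only for illustration

adj_basic = adjusted_r_squared(0.103, N_OBS, k=1)  # sleep on totwrk
adj_full = adjusted_r_squared(0.113, N_OBS, k=3)   # sleep on totwrk, age, educ

print(f"Adjusted R-squared, Part A model: {adj_basic:.4f}")
print(f"Adjusted R-squared, Part B model: {adj_full:.4f}")
```

Under this assumed sample size the adjusted R² still rises slightly when age and educ are added, but it remains low in both models.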

• e) Do you think the model is correctly specified? In other words, do you think it
would be appropriate to include other variables in the model? First, look at the list
of variables in the file. Which of them could explain the variability in total sleep
duration and why? Apart from these, can you think of variable(s) which may be
missing in the data file, but you think they should definitely not be missing in the
sleep model? If so, what does this mean for your ability to accurately model sleep
(if data is not available)? Regardless of your answer to the previous question,
assume that you do not lack anything important to model sleep time from now on
(i.e. assume that sleep is only affected by the variables that you have available in
the file). Which of the variables would you include in your final model and why?
Which signs do you intuitively expect for these variables and why?

Not yet, as education and age are not significant variables. Moreover, it would be appropriate to
include more variables, as sleeping time depends on more factors than the ones chosen.
Among the variables in the list, some could explain the variability in total sleep
duration: slpnaps (if you take naps, you do not need as much sleep at night), workscnd (if you have
two jobs, you may need to get up earlier or go to bed later) and yngkid (children below the age of 3
can make you sleep less because they cry at night).

Apart from these variables, other relevant variables could be missing from the data file. In our
opinion, psychological health should not be missing from the sleep model, as people with insomnia or
stress tend to sleep less. If data on relevant variables such as this are missing, the model is not
correctly specified and we cannot model sleep accurately.

As explained above, we would include slpnaps, workscnd and yngkid. We expect all of them to have
a negative sign:

- slpnaps: if someone takes naps, they will not be as tired at night and will sleep less.

- workscnd: if someone has 2 jobs, they will probably need to get up earlier to go to 2 different
jobs, or go to sleep later because they finish work late.

- yngkid: young children usually cry at night and wake other people up, interrupting their sleep.

• f) Estimate the full model that you found reasonable in the previous sub-question.
Briefly comment on what has changed in the new model compared to the basic
model sleep_i = β0 + β1 totwrk_i + β2 educ_i + β3 age_i + ε_i that you estimated earlier
and evaluate the results. Are the newly added variables statistically significant and
do they have correct signs? Has the explanatory power of the model increased?
When adding each variable and deciding whether it should be in the final version
of the model, 4 specification criteria (from Studenmund textbook, also in lecture
notes) can help you. Use them to select your temporary final version of the model.
Now think about the model and evaluate it:

Sleep = 604.339 – 0.008 × totwrk + 30.6 × yngkid + 0.7908 × slpnaps

The model does change. The only significant variable is slpnaps, but by our logic it has the wrong
sign: it should affect night sleep negatively, not positively as the model shows. yngkid and totwrk
are not significant, as their p-values are higher than 0.05, and neither was workscnd.

However, the explanatory power of the model increased: the R-squared value is now 79.8%, while
in the previous models it was 11.3% and 10.3%. So, our model now explains most of the variability
of the data.

i) The variable in the model that could have a non-linear effect on sleep
is yngkid: having a child under 3 does not necessarily mean that you cannot
sleep, because a young child does not necessarily cry all the time, or a
babysitter may take care of the child. So you can sleep despite having a
child under 3 years of age.

The following model results from re-evaluating the model and eliminating the variable yngkid to
prevent nonlinearity. The variable totwrk also shows non-linearity.

ii. Regardless of your conclusions based on the scatter graphs, extend your
model by (at least) age², i.e. estimate the model

sleep_i = β0 + β1 totwrk_i + β2 educ_i + β3 age_i + β4 age_i² + (other variables of your choice) + ε_i.

Use the 4 specification criteria from the lecture to
evaluate whether age² belongs to the model or not, evaluate and comment.

The signs of the coefficients do not make sense.

F-test:

H0: β1 = β2 = β3 = β4 = 0

F(5, 701) = 36615.32 > Fc = 2.70307 and the p-value is less than the alpha level => we reject H0, so
the regressors are jointly statistically significant.
R² = 0.996186, which is too high.
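
As a consistency check, the overall F-statistic can be recovered from R² and the degrees of freedom. This is a sketch using the standard formula F = (R²/k) / ((1 − R²)/df), with k = 5 and df = 701 taken from the reported F(5, 701); the function name is our own.

```python
def overall_f_statistic(r2, k, df_resid):
    """Overall F-statistic of a regression with k regressors and df_resid residual
    degrees of freedom, computed from its R-squared."""
    return (r2 / k) / ((1 - r2) / df_resid)

f_stat = overall_f_statistic(0.996186, k=5, df_resid=701)
critical_value = 2.70307  # critical value quoted in the text

print(f"F = {f_stat:.1f}, reject H0: {f_stat > critical_value}")
```

The result is very close to the reported 36615.32; the small difference comes from the rounding of R².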

So we conclude that age² belongs in the model.
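
To see why age² captures a non-linear effect, note that in a model with β3·age + β4·age² the marginal effect of age is β3 + 2·β4·age, and it changes sign at age = −β3 / (2·β4). The coefficients in the sketch below are hypothetical, chosen only for illustration; they are not estimates from this project.

```python
def marginal_effect_of_age(b_age, b_age2, age):
    """Change in predicted sleep for one more year of age, at a given age."""
    return b_age + 2 * b_age2 * age

def turning_point(b_age, b_age2):
    """Age at which the effect of age on sleep changes sign."""
    return -b_age / (2 * b_age2)

# HYPOTHETICAL coefficients (not estimated from the project data).
b_age, b_age2 = -8.0, 0.1

for age in (30, 50):
    effect = marginal_effect_of_age(b_age, b_age2, age)
    print(f"Marginal effect of age at {age}: {effect:.1f} minutes per week")
print(f"Turning point: {turning_point(b_age, b_age2):.1f} years")
```

With coefficients of these signs, sleep would first fall with age and then rise again after the turning point.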

iii. Finally, if you haven't already done so, expand your model to see if sleep patterns differ
between men and women, or possibly for families with a small child (under 3 years). Comment on
the results (signs, statistical significance). According to your results, who sleeps more and by how
much? If there are other dummy variables in the file that could make sense, include them in the
model and briefly explain why.
The estimated sleep patterns differ between men and women:

Sleep = 328.59 – 32.1807 × male

For men (male = 1):

Sleep = 328.59 – 32.1807 = 296.41

For women (male = 0):

Sleep = 328.59

So, according to the estimates, men sleep about 32 minutes less than women.

However, the coefficient of male is not statistically significant.
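
The dummy-variable interpretation can be sketched as follows: the coefficient of male is the difference in predicted sleep between men and women, other things equal. The values are the ones from the equation above; the function name is our own.

```python
intercept = 328.59
b_male = -32.1807

def predicted_sleep(male):
    """Predicted sleep for male = 1 (men) or male = 0 (women)."""
    return intercept + b_male * male

women = predicted_sleep(0)
men = predicted_sleep(1)

print(f"Women: {women:.2f}, men: {men:.2f}, difference: {men - women:.2f}")
```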

iv. Is it possible that the results in your model will be burdened by high multicollinearity or
heteroskedasticity? Which variables could correlate with totwrk? Think intuitively first, then
evaluate formally using appropriate tools. If there is multicollinearity or heteroskedasticity in the
model, what are the implications for the quality of the estimate and how would you deal with it in
this case?

Intuitively, we think that gdhlth (“good or excellent health”) could correlate with totwrk: if you are
in good health, you can work more.

For heteroskedasticity, we test:

H0: the variance of the residuals is constant (homoskedasticity).

Prob > chi2: this is the p-value that corresponds to the chi-square test statistic. In this case, it is
0.159680. Since this value is higher than 0.05, we cannot reject the null hypothesis, and we conclude
that there is no evidence of heteroskedasticity in the data.
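
The decision rule used for this test can be written down explicitly. A minimal sketch, assuming the usual 5% significance level; the function name `decide` is our own.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# p-value of the heteroskedasticity test reported in the text.
print(decide(0.159680))
```

A p-value above alpha means the data do not provide enough evidence against H0 (constant variance); it does not prove that H0 is true.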

• g) Which of the estimated versions of the model do you think is the best and why?

sleep = 614.960 – 0.00724573 × totwrk + 0.0385032 × rlxall + 0.751541 × slpnaps – 10.7936 × inlf

We think that this is the best model, as the signs of all the coefficients make sense
and its R-squared is 79%.

These variables are also highly correlated with sleep.
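
How strongly a variable is correlated with sleep (or with another regressor such as totwrk, when checking for multicollinearity) can be measured with the Pearson correlation coefficient. The sketch below uses made-up numbers, not the project data, and the function name is our own.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up values (NOT the project data): weekly minutes of sleep and of sleep including naps.
sleep = [3100, 3300, 3500, 3200, 3400]
slpnaps = [3200, 3350, 3700, 3250, 3600]
print(f"corr(sleep, slpnaps) = {pearson_r(sleep, slpnaps):.3f}")
```

Values close to +1 or -1 indicate a strong linear relationship; values close to 0 indicate a weak one.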


3. PART C – EVALUATING QUALITY OF YOUR MODEL (POST-REGRESSION ANALYSIS)
RESET test:

H0: γ = 0. If H0 is rejected, this indicates signs of misspecification, particularly nonlinearity in the data.

The p-value is 0.0088, which is less than the usual alpha levels => we reject H0, so there are signs
of misspecification, particularly nonlinearity in the data.
The residuals are also not very close to a normal distribution.

So we should change our model: other variables that have a significant effect on sleep could be
included in it.

Conclusion:

Our model contains variables that are highly correlated with sleep and whose coefficients make
sense, and some of its variables are statistically significant. However, a better model could be found
by including other variables.
