Worksheet 7 Olympic Swimmers
Olympic Swimmers
Recall the Olympics case in which we were exploring the determinants of weight (in kilos) for
Olympic swimmers – in particular the role of height (in centimeters) and age.
For every additional year, Olympic swimmers, on average, gain 0.64 kg.
Holding height constant, for every additional year, Olympic swimmers gain 0.33 kg on
average.
c. Can you intuitively explain why the coefficient on age is much smaller in the third
regression?
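When height is excluded, the age coefficient captures some of the positive effect of
height on weight (see regression 2), because people usually get taller as they age
(assuming they are young enough, of course). Once height is included in the third
regression, that part of the effect is attributed to height, so the age coefficient shrinks.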
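The intuition in part c can be checked numerically. The sketch below uses a small, made-up dataset (the ages, heights and weights are hypothetical and are not the worksheet's sample) and shows that the age coefficient from the regression without height equals the age coefficient with height plus the height coefficient times the slope of height on age.

```python
# Synthetic, made-up data for illustration only (NOT the worksheet's sample).
ages    = [16, 18, 20, 22, 24, 26]          # years
heights = [170, 176, 179, 183, 182, 186]    # cm
weights = [58, 65, 69, 74, 75, 80]          # kg

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance; cov(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Short regression: weight on age only.
short_age = cov(ages, weights) / cov(ages, ages)

# Long regression: weight on age and height (closed-form two-regressor OLS).
den = cov(ages, ages) * cov(heights, heights) - cov(ages, heights) ** 2
long_age = (cov(ages, weights) * cov(heights, heights)
            - cov(heights, weights) * cov(ages, heights)) / den
long_height = (cov(heights, weights) * cov(ages, ages)
               - cov(ages, weights) * cov(ages, heights)) / den

# Auxiliary regression: height on age (how much taller per extra year).
delta = cov(ages, heights) / cov(ages, ages)

# The short coefficient equals the long one plus the part routed through height.
gap = short_age - (long_age + long_height * delta)
print(short_age, long_age, long_height, delta, gap)
```

In this toy sample the short-regression age coefficient is larger than the long-regression one because height has a positive coefficient and rises with age, which is the same mechanism described in the answer.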
d. Which of the three models will you use to predict the annual increase in weight for a
swimmer who is 30 years old? What would your prediction be?
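You would use regression 3 because it controls for height: a 30-year-old swimmer is no
longer growing taller, so the relevant age effect is the one that holds height constant.
This swimmer is predicted to gain 0.33 kg per year.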
e. What is your prediction for the difference in weight between two fraternal twins, one of
whom is 2 cm taller than the other? Provide a 95% confidence interval.
0.806 × 2 = 1.612 kg
f. Somebody argues that once you control for height, age does not predict weight for this
population, as metabolism only starts to slow down in your forties. Can you prove him
wrong?
H0: β_age = 0
95% CI: 0.326 ± 2 × 0.070 = [0.186, 0.466]
t-stat = (0.326 − 0) / 0.070 = 4.66
Since the confidence interval does not contain 0 and the t-statistic is greater than 2,
we can reject the null and prove him wrong.
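The arithmetic in part f can be reproduced with a few lines of Python. This is a minimal sketch that uses the worksheet's own numbers (coefficient 0.326, standard error 0.070) and the rule-of-thumb critical value of 2; the function name ci_and_t is a hypothetical helper introduced for illustration.

```python
def ci_and_t(coef, se, null_value=0.0, crit=2.0):
    """Return an approximate 95% CI (coef ± crit*se) and the t-statistic."""
    margin = crit * se
    ci = (coef - margin, coef + margin)
    t = (coef - null_value) / se
    return ci, t

# Age coefficient and standard error from part f.
(ci_low, ci_high), t_stat = ci_and_t(0.326, 0.070)

# Reject H0 when the null value (0) lies outside the interval.
reject = not (ci_low <= 0.0 <= ci_high)
print(round(ci_low, 3), round(ci_high, 3), round(t_stat, 2), reject)
# prints: 0.186 0.466 4.66 True
```

The two checks always agree here: with a critical value of 2, the interval excludes 0 exactly when the absolute t-statistic exceeds 2.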
MARCH MADNESS!
Some people claim that the No. 5 seed teams in the NCAA tournament are jinxed. Look at the
graph below to see why! It shows the average win percentage for each seed over this period.
(Don’t know what NCAA or seed means? It doesn’t matter. Just think of seed as a variable.)
a. Let’s start with a simple regression. Define win_probability as the outcome variable and
seed as the explanatory variable (seed=1,2,3…8). What is your best estimate of the slope
based on the graph above?
Around -7. The change in y is close to -50 and the change in x is 7 (seed 1 to seed 8), so the slope is about -50/7 ≈ -7.
Now you construct dummy variables for the eight seeds: seed1=1 if the team is seed 1, and
seed1=0 if the team is not seed 1. Similarly define seed2, seed3, seed4, etc., through seed8.
b. b1 = 100
c. b8 = -47
d. b2 = -6
e. Based on your data, why do you think people say that “No. 5 seeds are jinxed?”
Because even though they are ranked higher than the No. 6 seeds (and thus face a weaker
opponent), their probability of winning is smaller.
f. c1 = 94
g. c8 = -41
h. c2 = 6
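The two sets of answers above are consistent with dummy-variable regressions in which one seed is left out as the baseline: the intercept is then the baseline seed's average win percentage, and each dummy coefficient is the difference from that baseline. The sketch below assumes average win percentages of 100, 94 and 53 for seeds 1, 2 and 8; these are inferred from the answers rather than stated in the worksheet, and the function name is hypothetical.

```python
def baseline_coefficients(group_means, baseline):
    """Intercept and dummy coefficients when `baseline` is the omitted group."""
    intercept = group_means[baseline]
    slopes = {g: m - intercept for g, m in group_means.items() if g != baseline}
    return intercept, slopes

# ASSUMPTION: win percentages inferred from the answers, not read off the graph.
win_pct = {1: 100, 2: 94, 8: 53}

# Seed 1 omitted: the intercept is seed 1's win percentage.
b_int, b = baseline_coefficients(win_pct, baseline=1)
# Seed 2 omitted: the intercept is seed 2's win percentage.
c_int, c = baseline_coefficients(win_pct, baseline=2)

print(b_int, b[2], b[8])   # prints: 100 -6 -47
print(c_int, c[1], c[8])   # prints: 94 6 -41
```

Changing the omitted seed shifts every coefficient by the same amount but leaves the predicted win percentage for each seed unchanged.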