Chapter 2, Exercise 2
(a) Regression problem; inference; n = 500 (the top 500 firms in the US); p = 3 (profit, number of
employees, industry).
(b) Classification problem; prediction; n = 20 (similar products); p = 13 (price charged for the
product, marketing budget, competition price, and ten other variables).
(c) Regression problem; prediction; n = 52 (the weeks of 2012); p = 3 (the % change in the US market,
the % change in the British market, and the % change in the German market).
Chapter 3, Exercise 1
In Table 3-4, the null hypothesis for “TV” is that TV ads have no effect on sales in the presence
of radio and newspaper ads. By the same token, the null hypothesis for “radio” is that radio ads
have no effect on sales in the presence of TV and newspaper ads. The same is true for
“newspaper”.
A low p-value is evidence against the null hypothesis. The p-value of TV in Table 3-4 is less than
0.0001, so we reject the null hypothesis and conclude that there is a relationship between TV and
sales. The same holds for radio: we can conclude that there is a relationship between radio ads
and sales. Conversely, the p-value of newspaper is high (0.8599), so we fail to reject its null
hypothesis. This does not prove that the null hypothesis is true; it only means there is no
evidence that newspaper ads affect sales once TV and radio are accounted for.
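The decision rule used in this answer can be written out as a minimal sketch. The 0.05 significance level and the function name `decide` are assumptions made for illustration. The p-values are the ones quoted from Table 3-4; TV is only reported as below 0.0001, so that bound is used in its place.

```python
def decide(p_value, alpha=0.05):
    """Return the test conclusion at significance level alpha (assumed 0.05)."""
    if p_value < alpha:
        return "reject H0"
    # A large p-value never proves H0; it only fails to contradict it.
    return "fail to reject H0"

# p-values quoted in the text; TV is only known to be below 0.0001
p_values = {"TV": 0.0001, "newspaper": 0.8599}

conclusions = {name: decide(p) for name, p in p_values.items()}
for name, conclusion in conclusions.items():
    print(name, "->", conclusion)
```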
Chapter 3, Exercise 4
(a) The polynomial regression is expected to have a lower training RSS than the linear
regression, because its extra flexibility lets it fit the training data more closely.
(b) Conversely, the polynomial regression is expected to have a higher test RSS. Because the
true relationship between X and Y is linear, the extra terms overfit the training data, so the
polynomial regression would have more test error than the linear regression.
(c) The polynomial regression still has a lower training RSS than the linear regression. It is the
more flexible model, so it fits the training data at least as well as simple linear regression,
whatever the true relationship is.
(d) We don’t know how far the true relationship between X and Y is from linear, so we can’t tell
which regression’s test RSS would be lower.
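The training-RSS claim in (a) and (c) can be checked with a small simulation. This is only a sketch under assumed settings: the true model Y = 1 + 2X + noise, the sample size, the seed, and the cubic degree are all illustrative choices, not values from the exercise. It relies on NumPy's `polyfit` and `polyval`.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

def rss(y, y_hat):
    """Residual sum of squares."""
    return float(np.sum((y - y_hat) ** 2))

def simulate(n):
    """Draw n points from an assumed linear truth: Y = 1 + 2X + noise."""
    x = rng.uniform(-2, 2, n)
    y = 1 + 2 * x + rng.normal(0, 1, n)
    return x, y

x_train, y_train = simulate(100)
x_test, y_test = simulate(100)

# Least-squares fits of degree 1 (linear) and degree 3 (cubic polynomial)
linear = np.polyfit(x_train, y_train, deg=1)
cubic = np.polyfit(x_train, y_train, deg=3)

train_rss_linear = rss(y_train, np.polyval(linear, x_train))
train_rss_cubic = rss(y_train, np.polyval(cubic, x_train))
test_rss_linear = rss(y_test, np.polyval(linear, x_test))
test_rss_cubic = rss(y_test, np.polyval(cubic, x_test))

print("training RSS:", train_rss_linear, train_rss_cubic)
print("test RSS:    ", test_rss_linear, test_rss_cubic)
```

Because the linear model is nested inside the cubic one, the cubic training RSS can never exceed the linear training RSS. The test RSS comparison, by contrast, varies from one simulated sample to the next, which is why (b) is stated as an expectation.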
Chapter 3, Exercise 9
(b)
(c)
i: Since R-squared = 0.821, the predictors together explain about 82% of the variance in the
response, so there is a relationship between the predictors and the response.
ii: The constant, displacement, weight, year, and origin have a statistically significant
relationship to the response.
iii: Since the coefficient of year is 0.7058, mpg increases by about 0.71 for each one-unit
increase in year, with the other predictors held fixed.
(d)
Yes, the residual plots do suggest that.
The plots also show that many observations have high leverage, especially the 13th one.
(e)
The interaction displacement:weight appears to be statistically significant, since its p-value is less than 0.05.
Chapter 4, Exercise 9
Chapter 4, Exercise 13
(a)
(b)
Lag2 appears to be statistically significant, since its p-value is the only one among the
predictors that is less than 0.05.
(c)
The confusion matrix shows that 430 observations that were actually Down were wrongly predicted
as Up, and 48 observations that were actually Up were wrongly predicted as Down. So, the overall
fraction of correct predictions is 0.56.
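The 0.56 figure can be reproduced from the two error counts. This is a minimal sketch that assumes the model was evaluated on the full Weekly data set with 1089 observations; that total does not appear in the answer above and is an assumption here.

```python
# Assumption: the Weekly data set has 1089 rows and all were used in (c)
n_total = 1089

# Misclassification counts quoted in the text
down_predicted_up = 430
up_predicted_down = 48

n_wrong = down_predicted_up + up_predicted_down
n_correct = n_total - n_wrong
accuracy = n_correct / n_total

print(round(accuracy, 2))  # 0.56
```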
(d)
(e)
(g)
(h)
(i)
The overall fraction of correct predictions is the key measure by which we can decide which
method provides the best result. Since the fraction for logistic regression is the highest,
logistic regression has the best performance.
(j)
Based on the previous results, the model gives the best result when K=4.
Chapter 5, Exercise 5
(a)
(b)
(c)
(d)
From the results, we can see that leaving out the dummy variable for student gives a lower test
error rate, so including it does not appear to reduce the test error rate.
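The test error rates compared in (d) are estimated on held-out data. The mechanics can be sketched as follows, assuming the validation set approach with an even split; the helper names `validation_split` and `error_rate` are hypothetical, and the labels are toy values, not results from the exercise.

```python
import random

def validation_split(n, seed=1):
    """Shuffle the indices 0..n-1 and split them into two halves."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    half = n // 2
    return indices[:half], indices[half:]

def error_rate(y_true, y_pred):
    """Fraction of observations whose predicted label is wrong."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

train_idx, valid_idx = validation_split(10)

# Toy labels for illustration only: one of four predictions is wrong
example_error = error_rate(["No", "No", "Yes", "No"],
                           ["No", "Yes", "Yes", "No"])
print(example_error)  # 0.25
```

Each candidate model would be fit on the rows in `train_idx` and scored with `error_rate` on the rows in `valid_idx`. Since the estimate depends on the random split, it is worth repeating the comparison with several seeds.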