2021 Quiz2 Sample
Name:
Instructions
True/false questions
Problem 1.
For a fixed set of covariates, a fitted L2 regularized (ridge) regression model will typically have more non-zero
coefficients than an L1 regularized (lasso) regression model.
Problem 2.
Consider a linear regression model. The prediction interval for a specific response will typically be wider
than the confidence interval for the mean response.
Problem 3.
You are given a dataset of two variables: hon and female, where hon is a binary variable indicating whether
or not a student is in an honors class, and female is a binary variable indicating whether or not the student
is female.
You train the following logistic regression model:
model <- glm(hon ~ 1 + female, family = binomial, data = honor)
Problem 4.
For an (unregularized) logistic regression fit on some training set, the AUC on a test set will typically be
lower than the AUC on the training set.
Multiple choice questions
Problem 5.
You are given the model, house_price ~ 1 + year_built + sq_feet + school_district_quality,
where year_built and sq_feet (the square footage of the house) are continuous covariates, and
school_district_quality is a categorical variable expressing the quality of the school district, with four
possible values: A, B, C, and D. How many coefficients does the model have (including the intercept)?
(a) 7
(b) 6
(c) 4
(d) 3
Problem 6.
Included below is a plot of two variables x and y.
[Plot: scatter of y (vertical axis, ticks at −15 and −5) against x (horizontal axis, 0 to 10)]
Let ρ denote the correlation coefficient between x and y. Which of the following best describes the value of ρ?
(a) ρ = −1
(b) −1 < ρ < 0
(c) ρ = 0
(d) 0 < ρ < 1
Problem 7.
Consider the following plot showing the ROC curves for two different logistic regression models A and B.
Which of the following is a valid conclusion to draw about the performance (as measured by AUC) of these
two models?
(a) Model A has better performance than model B
(b) Model B has better performance than model A
(c) Models A and B have fairly comparable performance
(d) There isn’t enough information in the plot to draw meaningful conclusions about the performance of A
and B
Problem 8.
Which of the following statements about model selection is NOT true?
(a) Regularization can help reduce overfitting
(b) The train-validate-test process can help detect model overfitting
(c) Cross validation is particularly helpful when you have a lot of training data
(d) Maximizing performance on a training set can lead to overfitting
Problem 9.
Researchers are planning to conduct a study of the effect of working in a factory on health outcomes. The
researchers plan to study workers from one factory and compare them with retirees who have never worked
in a factory. A reviewer of the research proposal is worried about selection bias. Which of the following best
represents the reviewer’s concern?
(a) Retirees should not be compared to factory workers because factory workers are under more stress than
retirees
(b) Retirees should not be compared to factory workers because factory workers’ incomes differ from those
of retirees
(c) Retirees should not be compared to factory workers because factory workers are likely to need to
maintain a certain level of health in order to work in a factory while retirees would not necessarily be
as healthy
(d) Retirees should not be compared to factory workers because factory workers likely live in a different
city than the retirees
Problem 10.
Suppose the average treatment effect (ATE) is greater than 0. Typically, is the magnitude of the ATE among
compliers (ATEc) bigger than, smaller than, or the same as the ATE?
(a) Bigger
(b) Smaller
(c) Same
(d) Not enough information to determine
Answer sheet
Name:
True/false questions
Fill in the circle of the correct answer. (T = true, F = false)
1 T F
2 T F
3 T F
4 T F
5 T F
6 a b c d 16 a b c d
7 a b c d 17 a b c d
8 a b c d 18 a b c d
9 a b c d 19 a b c d
10 a b c d 20 a b c d
11 a b c d
12 a b c d
13 a b c d
14 a b c d
15 a b c d
Solutions
Problem 1: True
Problem 2: True
Problem 3: False
Problem 4: True
Problem 5: (b)
Problem 6: (b)
Problem 7: (a)
Problem 8: (c)
Problem 9: (c)
Problem 10: (a)
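The count in Problem 5 can be checked by expanding the model formula into its design-matrix columns. The following is a minimal Python sketch, not part of the original quiz. It assumes the usual treatment (dummy) coding, in which a categorical covariate with k levels contributes k − 1 indicator columns when the model has an intercept. The helper name design_columns and the choice of the first level as the reference are hypothetical, for illustration only.

```python
def design_columns(continuous, categorical_levels):
    """List the coefficient names of a linear model with an intercept.

    continuous: names of continuous covariates (one coefficient each).
    categorical_levels: dict mapping each categorical covariate to its levels.
    Assumes treatment (dummy) coding: the first level of each categorical
    covariate is the reference level and gets no column of its own.
    """
    columns = ["(Intercept)"]
    columns += list(continuous)
    for name, levels in categorical_levels.items():
        # Drop the reference level; keep one indicator per remaining level.
        columns += [f"{name}{level}" for level in levels[1:]]
    return columns


columns = design_columns(
    continuous=["year_built", "sq_feet"],
    categorical_levels={"school_district_quality": ["A", "B", "C", "D"]},
)
print(columns)
print(len(columns))  # 1 intercept + 2 continuous + 3 indicators
```

Under these assumptions the model has 1 + 2 + 3 = 6 coefficients, which agrees with answer (b).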