Metrics Practice Test 1 Group 7
This dataset covers infant birth weights and includes the following variables: birth weight (bwght), the age and
education level of both mother and father, the number of cigarettes the mother smoked per day during pregnancy
(cigs), the number of alcoholic drinks the mother consumed per day during pregnancy (drink), the number of
prenatal care visits during pregnancy (npvis), and the ethnicity and race of both mother and father (mwhte,
mblck, moth, fwhte, fblck, foth), among others.
1. Provide summary statistics (mean, median, standard deviation, maximum value, minimum value) for the
variables bwght, cigs, drink, npvis, mwhte, and fwhte.
2. Run a simple regression with bwght as the dependent variable and cigs as the independent variable.
3. State the null and alternative hypotheses to test whether smoking during pregnancy affects birth
weight.
4. Using the provided output, conduct the test at the 5% significance level. You can use the rule of thumb
for t-tests.
5. Interpret the coefficient associated with cigs.
6. Do you think that npvis has an impact on birth weight? If so, in the context of the regression in Question
2, state the Gauss-Markov assumption SLR.3 in your own words. Without performing data analysis, do
you believe this assumption is reasonable in this context?
7. The variable lbwght is the log-transformed birth weight. Regress this variable on cigs and interpret the
results.
8. Create a new variable called lcigs such that lcigs = ln(cigs + 1). Regress lbwght on lcigs and interpret the
results.
1. Provide summary statistics (mean, median, standard deviation, maximum value, minimum value) for the
variables bwght, cigs, drink, npvis, mwhte, and fwhte.
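The code for this question is hidden and no output is shown, so the following is only a minimal sketch of how the five requested statistics could be computed in R. The data frame `toy` and the helper `five_stats` are hypothetical names, and the values in `toy` are made up for illustration; they are not taken from the actual dataset.

```r
# Toy stand-in for the real data; values are made up for illustration only.
toy <- data.frame(
  bwght = c(3200, 3450, 2980, 3600),
  cigs  = c(0, 0, 10, 5),
  npvis = c(12, 10, 8, 11)
)

# Compute the five requested statistics for one numeric vector.
five_stats <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE))
}

# Apply the helper to every column; transpose so each variable is a row.
stats_table <- t(sapply(toy, five_stats))
print(round(stats_table, 2))
```

With the real data, the same helper could presumably be applied to the columns bwght, cigs, drink, npvis, mwhte, and fwhte of the data frame used in the regressions.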
2. Run a simple regression with bwght as the dependent variable and cigs as the independent variable.
Call:
lm(formula = bwght ~ cigs, data = df)
Residuals:
Min 1Q Median 3Q Max
-3061.71 -323.71 8.29 365.54 1782.29
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3421.711 14.145 241.897 < 2e-16 ***
cigs -11.478 3.245 -3.538 0.000414 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
3. State the null and alternative hypotheses to test whether smoking during pregnancy affects birth weight.
Null Hypothesis: Smoking during pregnancy has no effect on birth weight. In other words, the coefficient
that measures the impact of cigarette consumption on birth weight is zero:
H0 : βcigs = 0
Alternative Hypothesis: Smoking during pregnancy does affect birth weight, meaning the coefficient is
not zero:
HA : βcigs ≠ 0
4. Using the provided output, conduct the test at the 5% significance level. You can use the rule of thumb for
t-tests.
Call:
lm(formula = bwght ~ cigs, data = data)
Residuals:
Min 1Q Median 3Q Max
-3061.71 -323.71 8.29 365.54 1782.29
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3421.711 14.145 241.897 < 2e-16 ***
cigs -11.478 3.245 -3.538 0.000414 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
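The output above reports a t value of -3.538 for cigs. Under the rule of thumb, H0 is rejected at the 5% level when the absolute t statistic exceeds roughly 2, so here H0 : βcigs = 0 is rejected. The short sketch below reproduces that arithmetic from the reported estimate and standard error. The variable names are illustrative, and the p-value uses a normal approximation (an assumption, since the residual degrees of freedom are not shown in the output).

```r
# Estimate and standard error copied from the regression output for cigs.
estimate  <- -11.478
std_error <- 3.245

# t statistic for H0: beta_cigs = 0.
t_stat <- estimate / std_error

# Rule of thumb: reject at the 5% level when |t| exceeds about 2.
reject_h0 <- abs(t_stat) > 2

# Two-sided p-value using a normal approximation (assumption: large sample).
p_approx <- 2 * pnorm(-abs(t_stat))

cat(sprintf("t = %.3f, reject H0 at 5%%: %s\n", t_stat, reject_h0))
```

The t statistic computed this way differs from the reported -3.538 only in the last digit, because the inputs are rounded.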
5. Interpret the coefficient associated with cigs.
Call:
lm(formula = bwght ~ cigs, data = df)
Residuals:
Min 1Q Median 3Q Max
-3061.71 -323.71 8.29 365.54 1782.29
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3421.711 14.145 241.897 < 2e-16 ***
cigs -11.478 3.245 -3.538 0.000414 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The estimated coefficient on cigs is -11.478: each additional cigarette smoked per day during pregnancy is
associated with a predicted birth weight that is approximately 11.48 grams lower, holding other factors fixed.
The estimate is statistically significant at the 1% level (p = 0.000414).
6. Do you think that npvis has an impact on birth weight? If so, in the context of the regression in Question 2,
state the Gauss-Markov assumption SLR.3 in your own words. Without performing data analysis, do you
believe this assumption is reasonable in this context?
Yes: npvis (the number of prenatal care visits) likely has an impact on birth weight, since frequent prenatal
care can substantially affect fetal development.
The Gauss-Markov assumption SLR.3 (zero conditional mean) in this context: The expected value of the error
term, given any number of cigarettes smoked, should be zero. In mathematical notation:
E(u ∣ cigs) = 0
This assumption is likely violated in this context. Other important determinants of birth weight (e.g., npvis)
are left out of the simple regression and therefore sit in the error term u. Women who smoke during pregnancy
might also differ in other behaviors or characteristics that affect birth weight (such as diet, alcohol
consumption, or socioeconomic status). If any of these omitted factors is correlated with cigs, then
E(u ∣ cigs) ≠ 0 and the OLS estimate of the cigs coefficient is biased.
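To make the omitted-variable argument concrete, here is a small simulation sketch. Everything in it is hypothetical: the data are simulated, not the practice-test data, and the coefficients (a true cigs effect of -10, an npvis effect of 20, and smokers attending fewer visits) are assumptions chosen only to illustrate the mechanism.

```r
set.seed(42)
n <- 20000

# Simulated prenatal visits (hypothetical distribution).
npvis <- rpois(n, lambda = 11)

# Assumption: women with fewer visits smoke more, so cigs and npvis are correlated.
cigs <- pmax(0, round(15 - 0.8 * npvis + rnorm(n, sd = 3)))

# Assumed true model: both npvis and cigs affect birth weight.
u <- rnorm(n, sd = 400)
bwght <- 3200 + 20 * npvis - 10 * cigs + u

# Simple regression omits npvis, so npvis ends up in the error term.
short_fit <- lm(bwght ~ cigs)
# Regression that controls for npvis.
long_fit <- lm(bwght ~ cigs + npvis)

b_short <- unname(coef(short_fit)["cigs"])
b_long  <- unname(coef(long_fit)["cigs"])
cat(sprintf("cigs slope without npvis: %.2f, with npvis: %.2f\n", b_short, b_long))
```

In this setup the simple-regression slope overstates the harm of smoking, because it also picks up the effect of fewer prenatal visits; the direction and size of any bias in the real data would depend on the actual correlations.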
7. The variable lbwght is the log-transformed birth weight. Regress this variable on cigs and interpret the
results.
Call:
lm(formula = lbwght ~ cigs, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.23489 -0.08507 0.01932 0.11840 0.43619
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.120997 0.004940 1643.877 < 2e-16 ***
cigs -0.003436 0.001133 -3.032 0.00246 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The estimated coefficient on cigs is −0.003436. This means that one additional cigarette smoked per day is
associated with about a 0.34% decrease in birth weight. The coefficient is statistically significant at the 1%
level (p = 0.00246).
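The percentage reading of a log-level coefficient is an approximation. As a quick check, the sketch below compares the approximation 100 · b with the exact implied change 100 · (exp(b) − 1), using the slope reported in the output; the variable names are illustrative.

```r
# Slope copied from the lbwght ~ cigs output.
b_cigs <- -0.003436

# Common approximation: 100 * b percent change per extra cigarette.
approx_pct <- 100 * b_cigs

# Exact percentage change implied by a log-level model.
exact_pct <- 100 * (exp(b_cigs) - 1)

cat(sprintf("approximate: %.4f%%, exact: %.4f%%\n", approx_pct, exact_pct))
```

For a coefficient this small the two values are nearly identical, which is why the 0.34% reading is safe here.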
8. Create a new variable called lcigs such that lcigs = ln(cigs + 1). Regress lbwght on lcigs and interpret the
results.
Call:
lm(formula = lbwght ~ lcigs, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.23611 -0.08305 0.01883 0.11890 0.43497
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.122209 0.004983 1629.974 < 2e-16 ***
lcigs -0.023707 0.006750 -3.512 0.000456 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The estimated coefficient on lcigs is -0.023707. Because both variables are in logs, it is an elasticity: a 1%
increase in (cigs + 1) is associated with approximately a 0.024% decrease in birth weight. This relationship is
statistically significant at the 1% level (p = 0.000456).
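Since a "1% increase in cigs + 1" is hard to picture, a worked example may help. The sketch below shows one way to build lcigs and then uses the reported slope to compute the predicted change in birth weight when moving from 0 to 1 cigarette per day. The vector `cigs_example` and the before/after values are hypothetical inputs chosen for illustration.

```r
# Building the transformed regressor; log1p(x) computes log(1 + x).
cigs_example <- c(0, 1, 5, 20)        # hypothetical values
lcigs_example <- log1p(cigs_example)  # same as log(cigs_example + 1)

# Slope copied from the lbwght ~ lcigs output.
b_lcigs <- -0.023707

# Going from 0 to 1 cigarette per day doubles (cigs + 1).
cigs_before <- 0
cigs_after  <- 1
d_lcigs <- log(cigs_after + 1) - log(cigs_before + 1)

# Predicted change in log birth weight and the implied percentage change.
d_lbwght   <- b_lcigs * d_lcigs
pct_change <- 100 * (exp(d_lbwght) - 1)

cat(sprintf("predicted change in birth weight: %.2f%%\n", pct_change))
```

Under this model, the first cigarette per day is associated with a larger proportional drop in birth weight than the step from, say, 10 to 11 cigarettes, because the log transformation compresses changes at higher levels of cigs.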