
Further Statistics

This document summarizes the statistical analysis of factors that predict baby's birth weight. The analysis included exploring relationships between birth weight and predictor variables like gestational age, mother's height and weight, and smoking status through histograms, scatterplots, and boxplots. Pearson correlation coefficients showed gestational age had the strongest positive correlation with birth weight. Linear regression found smoking status was a statistically significant predictor of lower birth weight, with smokers having babies weighing on average 0.375 kg less than non-smokers, after controlling for other variables.

Uploaded by

Alka Alka
Copyright
© All Rights Reserved

MASTER OF PUBLIC HEALTH

COURSEWORK ASSESSMENT SUBMISSION

Course Name Further Epidemiology and Statistics

Student id no. 2711522

QUESTION 1.a
This study aimed to investigate whether birth weight can be predicted from gestational age (in weeks) and from variables relating to the mother (the mother's
height and weight, and whether or not the mother smokes). The length of the baby and the head circumference are excluded because these variables do not relate to the
mother. Before doing the statistical analysis, we inspect the variables using histograms to assess skewness and the presence of outliers.
We then plot the data using scatterplots to explore possible relationships between the baby's birth weight and the continuous predictor variables
before performing any statistical test. The graphs and plots are presented below.

Fig 1: Histograms for the continuous variables

Based on Fig 1:
- The histograms for birth weight and gestation show a bell-shaped curve, which means these
variables look approximately normally distributed.
- The histogram for the mother’s pre-pregnancy weight shows a distribution skewed to the left, which
means this variable does not look normally distributed.
- The histogram for maternal age shows a distribution skewed to the right, which means this variable
does not look normally distributed.
- The histogram for the mother’s height shows a distribution skewed to the left, which means this variable
does not look normally distributed.

The data can then be plotted using scatterplots to see the relationship between birth weight and
each of the mother-related variables.

Fig 2: Scatterplots show the relationship between a baby's birth weight and each predictor variable.

Fig 3: Matrix Scatterplots show the relationship between a baby's birth weight and each predictor variable

Based on Fig 2 and Fig 3:

The matrix scatterplot in Fig 3 gives a subjective impression of the relationship between the outcome and
each predictor variable. The same relationships can be seen in the scatterplot for each predictor variable in Fig 2.
- The first plot, in the top left-hand corner, shows birth weight against gestational age. It shows a
positive relationship with little variation: the circles are fairly closely clustered around an
underlying straight line (instead of a curve or a random scattering). From this plot, as gestational age
increases, the average birth weight increases.
- The second plot shows a wide spread of birth weights for any given maternal age, so the
relationship between maternal age and birth weight is weak, with a lot of variation between these
two variables.
- Birth weight and maternal height have a minimal association, and there is a lot of variation between
the two variables.
- In the last plot, birth weight and the mother's pre-pregnancy weight have a minimal association, and
there is a lot of variation between these two variables.
- It is interesting to note that some of the continuous predictor variables are related to each other.
The circles in the scatterplot of maternal height against the mother's pre-pregnancy weight are clustered
closely around a straight line, which suggests a strong positive relationship between these two predictors.

Fig 4: Box-Whisker Plots show the distribution of birth weight based on smoking status

Based on Fig 4:

- The box plot of birth weight for non-smokers appears asymmetrical, and the box plot for smokers
is slightly asymmetrical. From these box plots, we can see a difference in birth weight between
smokers and non-smokers: the median birth weight of babies born to non-smoking mothers is higher
than that of babies born to mothers who smoke.

The above plots (Fig 1, Fig 2, Fig 3, Fig 4) show the relationship between birth weight and the predictor
variables; however, the interpretation of plots and graphs is subjective. To learn more about any
association, we need to apply statistical tests that assess the relationship between these variables. As a
first step, we can produce some five-number summaries.
Table 1: Summary Statistics
Variable                            min   Q1     median  Q3      max   mean      sd        n   Missing data
Birth weight (kg)                   1.92  2.94   3.295   3.6475  4.57  3.312857  0.603895  42  0
Gestation (weeks)                   33    38     39.5    41      45    39.19048  2.643336  42  0
Maternal age (years)                18    20.25  24      29      41    25.54762  5.666342  42  0
Mother’s height                     149   161    164.5   169.5   181   164.4524  6.504041  42  0
Mother’s pre-pregnancy weight (kg)  45    52.25  57      62      78    57.5      7.198408  42  0

Table 2: Five-number summaries for birth weight stratified by smoking status
Smoking status  min   Q1      median  Q3      max   mean      sd         n   Missing data
Non smoker      2.65  3.1400  3.385   3.9325  4.55  3.509500  0.5184945  20  0
Smoker          1.92  2.7425  3.185   3.5450  4.57  3.134091  0.6312471  22  0

Based on Table 1:
- The median and mean for birth weight, gestation, maternal age, mother’s height, and mother’s
pre-pregnancy weight are close to each other (the difference between mean and median for each
variable is small), which is consistent with the data being approximately normally distributed. Also,
the sample size is more than 30 (42 observations).

Based on Table 2:

- In this interpretation, we use the median as the average. The median birth weight for non-smoking
mothers is higher than the median for mothers who smoke. This result confirms the interpretation of the box plot in Fig 4.

The matrix scatterplot subjectively revealed some weak and strong associations between the
continuous variables. As can be seen from the plots above (Fig 2), the data are approximately normally
distributed and the assumption of linearity is reasonable, so a correlation can be estimated appropriately. Thus, in order to understand the
strength of these associations, we use the parametric Pearson correlation to calculate correlation
coefficients.

Fig 5: R output shows the Pearson correlation coefficient


##             Birthweight Gestation Mage Mheight Mppwt
## Birthweight        1.00      0.71 0.00    0.36  0.40
## Gestation          0.71      1.00 0.01    0.21  0.26
## Mage               0.00      0.01 1.00    0.06  0.27
## Mheight            0.36      0.21 0.06    1.00  0.68
## Mppwt              0.40      0.26 0.27    0.68  1.00
##
## n= 42
##
##
## P
##             Birthweight Gestation Mage   Mheight Mppwt
## Birthweight             0.0000    0.9991 0.0181  0.0085
## Gestation   0.0000                0.9460 0.1809  0.1030
## Mage        0.9991      0.9460           0.7060  0.0789
## Mheight     0.0181      0.1809    0.7060         0.0000
## Mppwt       0.0085      0.1030    0.0789 0.0000

Based on the Pearson correlation coefficients in Fig 5:

- The first block of the R output shows that none of the correlations is negative. The strongest correlation is
between birth weight and gestation (0.71). The weakest correlation is between birth weight and
maternal age (0.00).
- The third block presents the corresponding p-values. All correlations with birth weight are
statistically significant (p-value less than 0.05) except for the correlation between birth weight and
maternal age (p-value = 0.9991). Among the predictor variables, only the correlation between the mother's
height and the mother's pre-pregnancy weight is statistically significant (p-value shown as 0.0000).
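
The p-values in Fig 5 come from a t test on each correlation coefficient, t = r × sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The short Python sketch below (Python is used here only for illustration; the report's own output was produced in R) recomputes that statistic from the rounded correlations in Fig 5. The function and variable names are illustrative, and because the inputs are rounded to two decimals the results only approximate the t values reported later in Fig 7a and Fig 7b.

```python
import math

def t_from_r(r, n):
    """t statistic for testing zero correlation, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 42  # sample size reported in Fig 5

# Correlations of each predictor with birth weight, as rounded in Fig 5
r_with_birthweight = {"Gestation": 0.71, "Mage": 0.00, "Mheight": 0.36, "Mppwt": 0.40}

t_stats = {name: t_from_r(r, n) for name, r in r_with_birthweight.items()}

for name, t in t_stats.items():
    print(f"{name}: r = {r_with_birthweight[name]:.2f}, t = {t:.2f} on {n - 2} df")
```

The values for mother's height and pre-pregnancy weight land close to the slope t values of 2.464 and 2.768 in Fig 7a and Fig 7b, which is expected because testing a correlation and testing the slope of a one-predictor regression are the same test.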

Fig 6: R output for the linear regression predicting birth weight from observed smoking status
##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$smoker)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.21409 -0.39159 0.02591 0.41935 1.43591
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5095 0.1298 27.040 <2e-16 ***
## bwg1$smokersmoker -0.3754 0.1793 -2.093 0.0427 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5804 on 40 degrees of freedom
## Multiple R-squared: 0.09874, Adjusted R-squared: 0.07621
## F-statistic: 4.382 on 1 and 40 DF, p-value: 0.0427

Based on the results of the linear regression of birth weight on smoking status:
- The regression of birth weight on smoking status produces an R squared value of
0.09874. Therefore, 9.9% of the variation in birth weight is explained by its relationship with
smoking status. The association is statistically significant (p-value = 0.0427): babies of smokers
weigh on average 0.375 kg less than babies of non-smokers.
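
When the only predictor is a two-level indicator, the fitted intercept is the mean of the reference group and the slope is the difference between the two group means. The Python sketch below (an illustration, not the R code used for the report; variable names are my own) checks this against the group means in Table 2 and also recovers R squared from the slope's t value, using the identity R² = t² / (t² + residual df) that holds for a one-predictor regression.

```python
# Group means of birth weight (kg), copied from Table 2
mean_nonsmoker = 3.509500
mean_smoker = 3.134091

# With a 0/1 smoker indicator, non-smokers are the reference group
intercept = mean_nonsmoker
slope = mean_smoker - mean_nonsmoker

# t value of the slope and residual degrees of freedom, copied from Fig 6
t_value, resid_df = -2.093, 40
r_squared = t_value ** 2 / (t_value ** 2 + resid_df)

print(f"intercept = {intercept:.4f} kg, slope = {slope:.4f} kg")
print(f"R squared recovered from t = {r_squared:.4f}")
```

Both quantities agree with Fig 6 to the precision of the rounded inputs.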

Based on the Pearson correlation coefficients in Fig 5, some predictor variables are highly correlated with each other,
which can cause problems in the regression model. The explanatory variables mother's height and mother's
pre-pregnancy weight show a strong correlation (0.68), so only one of the two should be included
in the final model.
The procedure is to run a univariate regression of birth weight on the mother's height and another on the mother's
pre-pregnancy weight, and then to compare the R squared values
of the two regressions. The final model will include the variable with the higher R squared,
because it accounts for a larger portion of the variability in the dependent variable. The R squared
value tells us what proportion of the variance in birth weight is accounted for by the model in this sample.
Fig 7a: R output for the linear regression predicting birth weight from observed mother's height
##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$mheight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.31503 -0.32047 0.02239 0.35715 1.31981
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.23073 2.25127 -0.991 0.3277
## bwg1$mheight 0.03371 0.01368 2.464 0.0181 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5697 on 40 degrees of freedom
## Multiple R-squared: 0.1318, Adjusted R-squared: 0.1101
## F-statistic: 6.073 on 1 and 40 DF, p-value: 0.01812

Based on Fig 7a:

- The regression of birth weight on mother's height produces an R squared value of 0.13.
Therefore, 13% of the variation in birth weight is explained by its relationship with the mother’s
height. Also, the p-value shows that there is significant evidence that birth weight is associated with
the mother's height (p-value = 0.018).

Fig 7b: R output for the linear regression predicting birth weight from observed mother's pre-pregnancy
weight
##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$mppwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.24604 -0.41515 -0.03875 0.38833 1.25396
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.37905 0.70407 1.959 0.05715 .
## bwg1$mppwt 0.03363 0.01215 2.768 0.00851 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5601 on 40 degrees of freedom
## Multiple R-squared: 0.1607, Adjusted R-squared: 0.1397
## F-statistic: 7.659 on 1 and 40 DF, p-value: 0.008513

Based on Fig 7b:

- The regression of birth weight on mother's pre-pregnancy weight produces an R squared
value of 0.16. Therefore, 16% of the variation in birth weight is explained by its relationship with
the mother's pre-pregnancy weight. Also, the p-value shows that there is significant evidence that birth
weight is associated with the mother's pre-pregnancy weight (p-value = 0.009).

From the univariate linear regressions in Fig 7a and 7b, we can conclude that the R squared value from the
regression of birth weight on the mother's pre-pregnancy weight (0.16) is greater than the value from the regression of birth
weight on the mother's height (0.13). Both associations are statistically significant, but the correlation coefficients in Fig 5 also show that birth weight
is slightly less strongly correlated with the mother's height (0.36) than with the mother's
pre-pregnancy weight (0.40). These results indicate that we should exclude the mother's height from the final model
for multiple linear regression.
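
In a regression with a single predictor, R squared is simply the square of the Pearson correlation between the predictor and the outcome, so the comparison above could also have been read directly from Fig 5. A minimal Python check (illustrative only; the names are my own, and the correlations are the two-decimal values from Fig 5, so agreement is approximate):

```python
# (rounded Pearson r from Fig 5, Multiple R-squared from Fig 7a / Fig 7b)
univariate = {
    "Mheight": (0.36, 0.1318),
    "Mppwt": (0.40, 0.1607),
}

squared = {name: r ** 2 for name, (r, _) in univariate.items()}

for name, (r, reported) in univariate.items():
    print(f"{name}: r^2 = {squared[name]:.4f}, reported R squared = {reported}")

# The predictor with the larger R squared is the one retained
retained = max(univariate, key=lambda name: univariate[name][1])
print("retained:", retained)
```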

QUESTION 1.b
Gestation, maternal age, smoking status, and the mother's pre-pregnancy weight are all factors that we
can consider as predictors of birth weight. We first set up the null hypothesis (H0) and the alternative
hypothesis (Ha) for the multiple linear regression. Then, we run a first
model to choose the variables for the final model.

H0: None of the independent variables is related to birth weight.


Ha: At least one of the independent variables is helpful in explaining/predicting the birth weight

Fig 8: R output shows the multiple linear regression


##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$Gestation + +bwg1$smoker +
## bwg1$mage + bwg1$mppwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58200 -0.29720 -0.03732 0.30943 0.89729
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.233673 0.975667 -3.314 0.00206 **
## bwg1$Gestation 0.141978 0.024131 5.884 9.02e-07 ***
## bwg1$smokersmoker -0.299665 0.124850 -2.400 0.02153 *
## bwg1$mage -0.002268 0.011549 -0.196 0.84542
## bwg1$mppwt 0.020822 0.009186 2.267 0.02934 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3926 on 37 degrees of freedom
## Multiple R-squared: 0.6185, Adjusted R-squared: 0.5773
## F-statistic: 15 on 4 and 37 DF, p-value: 2.253e-07

## 2.5 % 97.5 %
## (Intercept) -5.210562099 -1.25678465
## bwg1$Gestation 0.093084104 0.19087102
## bwg1$smokersmoker -0.552635913 -0.04669383
## bwg1$mage -0.025668375 0.02113336
## bwg1$mppwt 0.002209861 0.03943393

Based on Fig 8:
- The regression output indicates that gestation, smoking status, and the mother's pre-pregnancy
weight are significant variables in this model. These variables are statistically significant as they have
p-values of <0.001, 0.02, and 0.03, respectively (all p-values are less than 0.05), implying that gestation,
smoking status, and the mother's pre-pregnancy weight are significant predictors of birth weight. On the
other hand, maternal age is excluded since its p-value is greater than 0.05 (p-value = 0.85).
- Thus, the final model uses gestation, smoking status, and the mother's pre-pregnancy weight to
predict the birth weight.
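
A further way to see that dropping maternal age costs almost nothing is to compare the adjusted R squared of the four-predictor model in Fig 8 with that of the three-predictor model in Fig 9. Adjusted R squared penalises each extra predictor: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The Python sketch below (illustrative; the function name is my own and this is not the R code behind the report) reproduces both reported values from the Multiple R-squared figures.

```python
def adjusted_r_squared(r_squared, n, n_predictors):
    """Adjusted R squared for a linear model with an intercept."""
    return 1 - (1 - r_squared) * (n - 1) / (n - n_predictors - 1)

n = 42  # observations in the birth weight data set

# Multiple R-squared values copied from Fig 8 (4 predictors) and Fig 9 (3 predictors)
adj_full = adjusted_r_squared(0.6185, n, 4)
adj_reduced = adjusted_r_squared(0.6181, n, 3)

print(f"with maternal age:    {adj_full:.4f}")
print(f"without maternal age: {adj_reduced:.4f}")
```

The unadjusted R squared barely changes, while the adjusted value rises once maternal age is removed, which supports leaving it out.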

Fig 9: The R output shows the multiple linear regression for the final model
##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$Gestation + bwg1$smoker +
## bwg1$mppwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60383 -0.28810 -0.05007 0.31103 0.89116
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.267614 0.948005 -3.447 0.0014 **
## bwg1$Gestation 0.142182 0.023801 5.974 6.18e-07 ***
## bwg1$smokersmoker -0.304964 0.120346 -2.534 0.0155 *
## bwg1$mppwt 0.020313 0.008701 2.335 0.0249 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3876 on 38 degrees of freedom
## Multiple R-squared: 0.6181, Adjusted R-squared: 0.5879
## F-statistic: 20.5 on 3 and 38 DF, p-value: 4.565e-08

## 2.5 % 97.5 %
## (Intercept) -5.186748818 -1.34847869
## bwg1$Gestation 0.093999411 0.19036543
## bwg1$smokersmoker -0.548591906 -0.06133642
## bwg1$mppwt 0.002699597 0.03792709

Based on Fig 9:

- The adjusted R squared value of 0.59 shows that the regression model explains 59% of the variation in
birth weight using gestation, smoking status, and the mother's pre-pregnancy weight.
- Multiple linear regression was conducted to investigate the relationship between birth weight and gestational age at birth
(weeks), the mother's pre-pregnancy weight, and whether the mother smokes. There
is a significant relationship between gestation and birth weight (p-value < 0.001), smoking and birth
weight (p-value = 0.016), and pre-pregnancy weight and birth weight (p-value = 0.025).
- We can predict a future birth weight using the final model, in which gestation,
smoking status, and the mother's pre-pregnancy weight are the predictor variables.
- The final model is:
birth weight = (-3.267614) + (0.142182 × gestational age) – (0.304964 × smoking status) + (0.020313
× mother's pre-pregnancy weight)
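
As a worked illustration of the fitted equation, the Python sketch below evaluates it for a hypothetical mother with a 40-week gestation and a pre-pregnancy weight of 57.5 kg (the sample mean in Table 1), once as a non-smoker and once as a smoker. The function name and the example inputs are my own choices; the coefficients are copied from Fig 9, and smoking status is coded 1 for a smoker and 0 for a non-smoker.

```python
def predict_birth_weight(gestation_weeks, smoker, prepregnancy_weight_kg):
    """Predicted birth weight (kg) from the final model in Fig 9.

    smoker is 1 for a mother who smokes and 0 otherwise.
    """
    return (-3.267614
            + 0.142182 * gestation_weeks
            - 0.304964 * smoker
            + 0.020313 * prepregnancy_weight_kg)

# Hypothetical inputs chosen for illustration only
nonsmoker_prediction = predict_birth_weight(40, 0, 57.5)
smoker_prediction = predict_birth_weight(40, 1, 57.5)

print(f"non-smoker: {nonsmoker_prediction:.2f} kg")
print(f"smoker:     {smoker_prediction:.2f} kg")
print(f"difference: {nonsmoker_prediction - smoker_prediction:.3f} kg")
```

The two predictions differ by exactly the smoking coefficient, because gestation and pre-pregnancy weight are held fixed.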

QUESTION 1.c

Based on the final model:

- The following equation can be used to predict the weight of the baby:
birth weight = (-3.267614) + (0.142182 × gestational age) – (0.304964 × smoking status) + (0.020313
× mother's pre-pregnancy weight)
where the indicator variable smoking status takes the value 1 for smokers and 0 for non-smokers.

- The intercept of the population regression line is estimated as -3.27 kg (the average value of birth weight
when gestation, smoking status, and the mother's pre-pregnancy weight are all equal to 0, a combination far outside the observed data).
- The slope of the population regression line for gestation is estimated as 0.14 (birth weight increases
by 0.14 kg, on average, for every one-week increase in gestational age) after adjusting for smoking
status and the mother's pre-pregnancy weight.
- The slope of the population regression line for the mother's pre-pregnancy weight is estimated as 0.02 (birth
weight increases by 0.02 kg, on average, for every 1 kg increase in the mother's pre-pregnancy weight)
after adjusting for gestation and smoking status.
- The coefficient for smoking status is estimated as -0.30 (babies of smokers weigh 0.30
kg less, on average, than babies of non-smokers) after adjusting for gestation and the mother's pre-pregnancy
weight.
- The p-values are all < 0.05, so we can reject H0 and conclude that there is significant evidence that
birth weight is associated with gestation, smoking status, and the mother's pre-pregnancy weight.
- We are 95% confident that the increase in birth weight in the population could be as little as 0.09 kg
or as much as 0.19 kg for every one-week rise in gestational age after adjustment for smoking status
and the mother's pre-pregnancy weight. Our best estimate for the increase is 0.14 kg.
- We are 95% confident that the increase in birth weight in the population could be as little as 0.003 kg
or as much as 0.04 kg for every one-kilogram rise in the mother's pre-pregnancy weight after
adjustment for smoking status and gestation. Our best estimate for the increase is 0.02 kg.
- We are 95% confident that the decrease in birth weight in the population could be as little as 0.06 kg
or as much as 0.55 kg for mothers who smoke after adjustment for gestation and the mother's
pre-pregnancy weight. Our best estimate for the decrease is 0.30 kg.
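
Each of the 95% confidence intervals above has the form estimate ± t × standard error, where t is the 97.5th percentile of the t distribution on the residual degrees of freedom (38 here, giving a multiplier of about 2.02). The Python sketch below (illustrative, with names of my own choosing; it is not the R `confint` call itself) works backwards from the interval limits and standard errors in Fig 9 to show that every coefficient uses the same multiplier and that each estimate sits at the midpoint of its interval.

```python
# (estimate, standard error, lower 95% limit, upper 95% limit), copied from Fig 9
coefficients = {
    "Gestation": (0.142182, 0.023801, 0.093999411, 0.19036543),
    "smoker": (-0.304964, 0.120346, -0.548591906, -0.06133642),
    "mppwt": (0.020313, 0.008701, 0.002699597, 0.03792709),
}

multipliers = {}
midpoints = {}
for name, (estimate, se, lower, upper) in coefficients.items():
    multipliers[name] = (upper - lower) / (2 * se)  # half-width in units of SE
    midpoints[name] = (lower + upper) / 2           # should equal the estimate
    print(f"{name}: multiplier = {multipliers[name]:.3f}, "
          f"midpoint = {midpoints[name]:.6f}, estimate = {estimate:.6f}")
```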

QUESTION 1.d

After identifying the best model, the final model must be checked to ensure that it meets the
assumptions of linear regression. To draw conclusions about a population based on a
regression analysis, several assumptions must be true.

- All predictor variables need to be quantitative or categorical (with two categories), and the outcome
variable needs to be quantitative, continuous, and unbounded. In the final model, the predictor
variables are either continuous (gestation and mother's pre-pregnancy weight) or categorical with two
categories (smoking status). The outcome variable is continuous (birth weight).
- The predictors should have some variation in value (non-zero variance).
- There should be no perfect linear relationship between two or more predictors, and the predictor
variables should not be too highly correlated. From the Pearson correlation coefficient results, the predictor
variables mother's height and mother's pre-pregnancy weight are highly correlated (0.68).
After running the univariate linear regressions for the mother's height and the mother's pre-pregnancy
weight, the R squared value for the mother's pre-pregnancy weight is greater than that for the
mother's height. Therefore, the mother's height is excluded from the final model.
- The mean values of the outcome variable for the predictor variables in the final model should lie along a
straight line. Scatterplots can be used to check visually whether the predictor and outcome variables are
linearly related. The assumption of linearity has been met based on Fig 10 below.

Fig 10: Scatterplots show the relationship between the baby's birth weight and the predictor variables

Checking the assumption about residuals

Fig 11: Checking assumption about the residuals

Based on Fig 11:

- The plots in Fig 11 indicate that the model satisfies the linear regression assumptions. The residuals
versus fitted values plot does not have a funnel or a curve shape, which indicates that
the linearity and homoscedasticity assumptions have been satisfied. The Q-Q plot shows only a
slight deviation from the diagonal line, with nothing out of the ordinary. According to
the scale-location plot, the red line is roughly horizontal, and the spread of standardized residuals around the
red line is constant, which also supports homoscedasticity. The Residuals vs
Leverage plot shows that all cases are well inside the Cook’s distance lines (red dashed lines).

Fig 11: Histogram of residuals

Based on Fig 11:

- The histogram of residuals in Fig 11 shows a bell-shaped curve, indicating that the assumption of
residual normality has been satisfied. We can conclude that the final model satisfies the assumptions
of linear regression.

The data meet the homoscedasticity and linearity assumptions, and the residuals approximately follow a normal
distribution. We can predict a future birth weight using the final model with the mother-related variables:
gestation, smoking status, and the mother's pre-pregnancy weight. However, the modeling could be improved.
There are several suggestions to improve the linear regression modeling.
- Increasing the sample size may increase the precision of the parameter estimates.
- Including pre-pregnancy maternal BMI or first-trimester BMI as a predictor variable. BMI can
reflect the maternal nutritional status, which is critical to assuring a constant supply of nutrients to
the developing fetus.
- The modeling also depends on the study design used to predict
birth weight. Perhaps there is a third (mediating) variable at work. Only a randomized controlled trial can
establish cause and effect.

QUESTION 2.a
This study aimed to predict low birth weight based on the mother's characteristics. The outcome
variable is whether the newborn baby is of low birth weight (<2500 g) or not (>=2500 g). The predictor
variables are the mother's age, the mother's weight, smoking status, the number of previous premature babies, hypertension, and the number of GP visits
during the first trimester of pregnancy. First, we plot and tabulate the data before applying the
statistical model.

Fig 12: The box plot of age of mother in years by low birth weight

Based on Fig 12:

- Subjectively, the box plots of the mother's age for low birth weight (<2500 g) and birth weight (>=2500 g)
show an asymmetrical shape. However, the medians for low birth weight (<2500 g)
and birth weight (>=2500 g) are similar. The five-number summary statistics below can confirm the
results from the box plot.

Table 3: Five summary statistics: age of mother in years by low birth weight
Birth weight  min  Q1  median  Q3  max  mean      sd        n    missing
>=2500 g      14   19  23      28  45   23.82911  5.799399  158  0
<2500 g       14   20  23      25  34   22.83117  4.569356  77   0

Based on Table 3:
- The summary statistics show that the median (23) and minimum (14) values of the age of the mother are
the same for low birth weight (<2500 g) and birth weight (>=2500 g). Q3, the maximum value, and the
mean value are higher for birth weight (>=2500 g) than for low
birth weight (<2500 g), while Q1 is slightly lower (19 versus 20). These results are consistent with the interpretation of the box plot.

Fig 13: the cross tabulation of the categorical variables smoking status and newborn baby's birth weight
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ===================================================
## lbw$smoke
## lbw$bwcat nonsmoker smoker Total
## ---------------------------------------------------
## birthweight >= 2500g 110 48 158
## 73.8% 55.8%
## ---------------------------------------------------
## birthweight<2500g 39 38 77
## 26.2% 44.2%
## ---------------------------------------------------
## Total 149 86 235
## 63.4% 36.6%
## ===================================================

Based on Fig 13:
- The column percentages for non-smoking mothers show that 73.8% had a newborn baby with birth weight >=
2500 g while 26.2% had a baby with birth weight <2500 g.
- The column percentages for mothers who smoke show that 55.8% had a newborn baby with birth weight >= 2500 g
while 44.2% had a baby with birth weight <2500 g.
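
The column percentages in Fig 13 are each cell count divided by its column total. The Python sketch below (illustrative only, not the R cross-tabulation call; dictionary keys are my own labels) recomputes them from the counts and, as an extra step that the text above does not report, derives the crude odds ratio of low birth weight for smokers relative to non-smokers from the same four counts.

```python
# Cell counts copied from Fig 13, grouped by smoking status (the table's columns)
counts = {
    "nonsmoker": {">=2500g": 110, "<2500g": 39},
    "smoker": {">=2500g": 48, "<2500g": 38},
}

# Column percentage: cell count as a share of its smoking-status total
column_percent = {
    group: {cat: 100 * count / sum(cells.values()) for cat, count in cells.items()}
    for group, cells in counts.items()
}

# Odds of low birth weight within each group, and their ratio (unadjusted)
odds = {group: cells["<2500g"] / cells[">=2500g"] for group, cells in counts.items()}
crude_odds_ratio = odds["smoker"] / odds["nonsmoker"]

for group, cells in column_percent.items():
    print(group, {cat: round(pct, 1) for cat, pct in cells.items()})
print(f"crude odds ratio (smoker vs non-smoker): {crude_odds_ratio:.2f}")
```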

Fig 14: The box plot of mother with history of previous premature babies by low birth weight

Based on Fig 14:

- Subjectively, the box plots of the number of previous premature babies for low birth
weight (<2500 g) and birth weight (>=2500 g) show an asymmetrical shape. However, the medians for
low birth weight (<2500 g) and birth weight (>=2500 g) are similar. The five-number summary statistics
below can confirm the results from the box plot.

Table 4: Five summary statistics: history of mother with number of previous premature babies by low birth
weight
Birth weight  min  Q1  median  Q3  max  mean       sd         n    missing
>=2500 g      0    0   0       0   3    0.1202532  0.4276673  158  0
<2500 g       0    0   0       1   2    0.3376623  0.5527536  77   0

Based on Table 4:
- The summary statistics show that the median and minimum values of the number of previous
premature babies are the same (0) for low birth weight (<2500 g) and birth weight (>=2500 g).
The maximum number of previous premature babies is higher for birth weight
(>=2500 g). However, Q3 and the mean value are higher for low birth weight (<2500 g). These results are
consistent with the interpretation of the box plot.

Fig 15: the box plot of the number of GP visits by low birth weight

Based on Fig 15:
- The box plots of the number of visits to the GP during the first trimester for low birth
weight (<2500 g) and birth weight (>=2500 g) show somewhat asymmetrical distributions.
The median for birth weight (>=2500 g) is higher than for low birth weight (<2500 g).
However, the maximum values for low birth weight (<2500 g) and birth weight (>=2500 g) appear similar.
The five-number summary statistics below can confirm the results from the box plot.

Table 5: Five summary statistics: the number of visits to GP during first trimester by low birth weight
Birth weight  min  Q1  median  Q3  max  mean       sd         n    missing
>=2500 g      0    0   1       1   6    0.8607595  1.0495274  158  0
<2500 g       0    0   0       1   4    0.6623377  0.9815454  77   0

Based on Table 5:
- The summary statistics show that the median value for birth weight (>=2500 g) is higher than for low
birth weight (<2500 g). Moreover, the maximum and mean values of the number of visits to the GP are
higher for birth weight (>=2500 g). These results are consistent with the interpretation of the box plot.

Fig 16: The box plot of mother’s weight at last menstrual period by low birth weight

Based on Fig 16:

- The box plot of the mother's weight at the last menstrual period has an asymmetrical shape for birth weight (>=2500 g),
while the box plot for low birth weight (<2500 g) is only slightly asymmetrical. From
the box plot, the median for birth weight (>=2500 g) is higher than for low birth weight (<2500 g).
The five-number summary statistics below can confirm the results from the box plot.

Table 6: Five summary statistics: mother’s weight at last menstrual period by low birth weight
Birth weight  min       Q1        median    Q3        max        mean      sd        n    missing
>=2500 g      38.55540  52.16319  56.01872  67.69874  113.39822  60.74413  14.29398  158  0
<2500 g       36.28743  47.62725  54.43115  59.87426  90.71858   56.22196  12.76118  77   0

Based on Table 6:
- The summary statistics show that the median value for birth weight (>=2500 g) is higher than for low
birth weight (<2500 g). Moreover, the maximum, minimum, Q1, Q3 and mean values of the mother’s
weight at the last menstrual period are all higher for birth weight (>=2500 g). These results are consistent with the
interpretation of the box plot.

Fig 17: the cross-tabulation of the categorical variables hypertension history and newborn baby's birth
weight
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ===============================================================================
## lbw$hypercat
## lbw$bwcat hypertension history no hypertension history Total
## -------------------------------------------------------------------------------
## birthweight >= 2500g 6 152 158
## 35.3% 69.7%
## -------------------------------------------------------------------------------
## birthweight<2500g 11 66 77
## 64.7% 30.3%
## -------------------------------------------------------------------------------
## Total 17 218 235
## 7.2% 92.8%
## ===============================================================================

Based on Fig 17:

- The column percentages for mothers with a history of hypertension show that 35.3% had newborn babies
with birth weight >= 2500 g while 64.7% had babies with birth weight <2500 g.
- The column percentages for mothers with no history of hypertension show that 69.7% had a newborn
baby with birth weight >= 2500 g while 30.3% had a baby with birth weight <2500 g.

QUESTION 2.b

1. Building the logistic regression model from each of the predictor variables

Fig 18: R output shows the results of the logistic regression of low birth weight on the age of mothers
##
## Call:
## glm(formula = lbw$low ~ lbw$age, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0161 -0.9213 -0.8320 1.4416 1.6628
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.09915 0.63136 0.157 0.875
## lbw$age -0.03508 0.02662 -1.318 0.188
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 295.49 on 233 degrees of freedom
## AIC: 299.49
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -1.13829618 1.33658818
## lbw$age -0.08725216 0.01709816

## OR 2.5 % 97.5 %
## (Intercept) 1.1042275 0.3203644 3.806036
## lbw$age 0.9655311 0.9164460 1.017245

Based on Fig 18:

- The estimated odds ratio is 0.97: the odds of low birth weight are multiplied by 0.97, that is,
reduced by about 3%, for every additional year of the mother's age.
- However, the 95% confidence interval runs from 0.92 to 1.02 and therefore includes 1.0, and the
p-value is 0.19 (greater than 0.05). Age is not a statistically significant predictor, so it is not
included in the final logistic regression model.
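
The step from the regression output to the odds ratio is an exponentiation of the log-odds estimates. The sketch below is an illustration in Python, not the author's R code; the numbers are copied from Fig 18 and the variable names are assumptions.

```python
import math

# Estimate and 95% limits for lbw$age on the log-odds scale, copied from Fig 18.
coef, lower, upper = -0.03508, -0.08725216, 0.01709816

odds_ratio = math.exp(coef)
ci_low, ci_high = math.exp(lower), math.exp(upper)
pct_change = (odds_ratio - 1) * 100  # a negative value is a reduction in odds

print(f"OR = {odds_ratio:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
print(f"Change in odds per year of age: {pct_change:.1f}%")
# If the interval contains 1, no change in odds is compatible with the data.
print("CI includes 1:", ci_low < 1 < ci_high)
```

Because the interval contains 1, the data are compatible with age having no effect on the odds of low birth weight.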

Fig 18a: The R output shows the AUCs in low birth weight based on the mother's age
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred1, ci = TRUE)
##
## Data: lbw1$lowpred1 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.5298
## 95% CI: 0.4536-0.6059 (DeLong)

Fig 18b: ROC Curve in low birth weight based on the mother’s age

Based on Fig 18a and 18b:
- The AUC is 0.5298 (95% CI: 0.4536-0.6059). The interval includes 0.5, so the mother's age has very
poor predictive ability for low birth weight, no better than chance.
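
For readers unfamiliar with the AUC: it equals the proportion of case/control pairs in which the case receives the higher predicted value, so 0.5 corresponds to chance. The Python sketch below illustrates this definition on a small set of hypothetical scores; these numbers are invented for illustration and are not taken from the lbw data, and the function name is an assumption.

```python
def auc(case_scores, control_scores):
    """Share of case/control pairs in which the case scores higher (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical predicted probabilities, NOT taken from the lbw data.
toy_cases = [0.40, 0.60, 0.35]
toy_controls = [0.20, 0.35, 0.50, 0.30]
toy_auc = auc(toy_cases, toy_controls)
print(f"Toy AUC = {toy_auc:.3f}")
```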

Fig 19: R output shows the logistic regression results of low birth weight on the mother's weight at the
last menstrual period.
##
## Call:
## glm(formula = lbw$low ~ lbw$lweight, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0944 -0.9291 -0.8124 1.3546 1.8768
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.81738 0.67293 1.215 0.2245
## lbw$lweight -0.02634 0.01148 -2.296 0.0217 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 291.40 on 233 degrees of freedom
## AIC: 295.4
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -0.50155125 2.136302228
## lbw$lweight -0.04883783 -0.003851365

## OR 2.5 % 97.5 %
## (Intercept) 2.2645487 0.6055905 8.468067
## lbw$lweight 0.9739994 0.9523356 0.996156

Based on Fig 19:

- The estimated odds ratio is 0.97: the odds of low birth weight are reduced by about 3% for every
additional kilogram of the mother's weight at the last menstrual period.
- The 95% confidence interval runs from 0.95 to 0.996 and therefore does not include 1.0, and the
p-value is 0.02 (less than 0.05). The mother's weight at the last menstrual period is statistically
significant. The odds ratio is close to 1 because it applies per kilogram, but it is still significant.
- Logistic regression assumes that the relationship between the mother's weight at the last menstrual
period and the log-odds of low birth weight is linear.
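
To make the fitted model concrete, the sketch below converts the log-odds from Fig 19 into predicted probabilities for a few weights. It is a Python illustration, not the author's R code; the function name is an assumption, and weights are taken to be in kilograms as stated in the text.

```python
import math

# Intercept and slope for lbw$lweight on the log-odds scale, copied from Fig 19.
INTERCEPT, SLOPE = 0.81738, -0.02634

def predicted_probability(weight_kg):
    """Predicted probability of low birth weight for a given mother's weight."""
    log_odds = INTERCEPT + SLOPE * weight_kg
    return 1 / (1 + math.exp(-log_odds))

for weight in (40, 50, 60, 70, 80, 90):
    log_odds = INTERCEPT + SLOPE * weight
    prob = predicted_probability(weight)
    print(f"{weight} kg: log-odds = {log_odds:.3f}, probability = {prob:.3f}")
```

The log-odds fall by the same amount for every 10 kg, while the probabilities do not: the model is linear on the log-odds scale only.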

Figure 19a: The R output shows the AUCs in low birth weight based on the mother's weight at the last
menstrual period.

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred2, ci = TRUE)
##
## Data: lbw1$lowpred2 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.6011
## 95% CI: 0.5211-0.6811 (DeLong)

Figure 19b: ROC Curve in low birth weight based on the mother's weight at the last menstrual period.

Based on Fig 19a and 19b:

- The AUC is 0.6011 (95% CI: 0.5211-0.6811), so we can conclude that the mother's weight at the last
menstrual period has poor predictive ability for low birth weight in this model.

The model in Fig 19 assumes that the relationship between the mother's weight at her last menstrual period
and the log-odds of having a low birth weight baby is linear. This assumption needs to be checked by
adding a squared term for the continuous variable mother's weight to the regression, predicting the
probabilities, and plotting the probabilities against the mother's weight. The R outputs are below.

Fig 19c: R output shows the squared term in regression for the continuous variable mother's weight
##
## Call:
## glm(formula = lbw$low ~ lbw$lweight + lbw$lweight2, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2126 -0.9103 -0.7695 1.3182 1.7976
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.9940510 2.2135834 1.353 0.176
## lbw$lweight -0.0955843 0.0676572 -1.413 0.158
## lbw$lweight2 0.0005204 0.0004950 1.051 0.293
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 290.37 on 232 degrees of freedom
## AIC: 296.37
##
## Number of Fisher Scoring iterations: 4

## OR 2.5 % 97.5 %
## (Intercept) 19.9664026 0.2606719 1529.344725
## lbw$lweight 0.9088418 0.7959731 1.037715
## lbw$lweight2 1.0005206 0.9995503 1.001492

Based on Fig 19c:

- The squared term of the mother's weight has a p-value of 0.293 (greater than 0.05), indicating that it
is not statistically significant. As a result, the squared term will not be included in the final model;
the final model will include only the linear term for the mother's weight.

To better understand the relationship between low birth weight and the mother's weight, the predicted
probability of low birth weight is calculated and plotted against the mother's weight.

Fig 19d: the plot for the probabilities against the mother's weight.

Based on Fig 19d:

- The predicted probability of low birth weight is highest when the mother's weight at the last menstrual
period is around 40 kg. The probability decreases until the mother's weight is approximately 90 kg.
Beyond this point, the probability of low birth weight tends to increase.
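
The turning point near 90 kg can be checked against the squared-term model. The sketch below assumes that Fig 19d plots predictions from the model in Fig 19c; it is a Python illustration with assumed variable names, using the coefficients copied from that output.

```python
# Coefficients for lbw$lweight and lbw$lweight2 on the log-odds scale, copied from Fig 19c.
b_linear, b_squared = -0.0955843, 0.0005204

# A quadratic b_linear*w + b_squared*w^2 turns at w = -b_linear / (2 * b_squared).
# Because b_squared is positive, the turning point is a minimum.
turning_point = -b_linear / (2 * b_squared)
print(f"Fitted log-odds are lowest at about {turning_point:.1f} kg")
```

The minimum falls at roughly 92 kg, in line with the plot. Since the squared term is not statistically significant (p = 0.293), the upturn beyond this weight is only weakly supported by the data.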

Fig 19e: the plot for predicting the probabilities of low birth weight against the mother's weight using the
cubed term.

Based on Fig 19e:

- The predicted probability of low birth weight peaks when the mother's weight at the last menstrual
period is about 40 kg, then declines until the mother weighs approximately 90 kg. Beyond this point,
the probability of having a low birth weight baby begins to rise.

Figures 19d and 19e do not differ much from each other, although there is a slight difference after
adding the cubed term to predict the probabilities. The R output for the regression on the cubed term of
the mother's weight is shown below.

Fig 19f: The R output shows the cubed term in regression for the continuous variable mother's weight
##
## Call:
## glm(formula = LBW ~ +I(MLWEIGHT^2) + I(MLWEIGHT^3), family = binomial,
## data = lbwcomp)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1220 -0.9245 -0.7924 1.3414 1.8401
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.707e-01 7.967e-01 0.716 0.474
## I(MLWEIGHT^2) -6.367e-04 5.188e-04 -1.227 0.220
## I(MLWEIGHT^3) 4.258e-06 4.726e-06 0.901 0.368
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 291.49 on 232 degrees of freedom
## AIC: 297.49
##
## Number of Fisher Scoring iterations: 4

Based on Fig 19f:

- The p-value for the cubed term of the mother's weight is 0.368 (greater than 0.05), so the cubed term
is not statistically significant. This result is consistent with the plot of predicted probabilities
using the cubed term (Fig 19e) being little different from the plot without it (Fig 19d). Note that
the model in Fig 19f, as fitted, contains the squared and cubed terms but no linear term for the
mother's weight.

Fig 20: R output shows the logistic regression results of low birth weight on the number of previous
premature babies.
##
## Call:
## glm(formula = lbw$low ~ lbw$prev.prem, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9699 -0.8237 -0.8237 1.1814 1.5785
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.9066 0.1551 -5.844 5.11e-09 ***
## lbw$prev.prem 0.8972 0.2962 3.029 0.00245 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 287.28 on 233 degrees of freedom
## AIC: 291.28
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -1.2106966 -0.6025278
## lbw$prev.prem 0.3167052 1.4777617

## OR 2.5 % 97.5 %
## (Intercept) 0.4038902 0.2979896 0.5474261
## lbw$prev.prem 2.4528080 1.3725979 4.3831241

Based on Fig 20:

- The odds of low birth weight are multiplied by about 2.5 for every additional previous premature
baby.
- Moreover, we can be 95% confident that the odds ratio lies between 1.4 and 4.4, with the best
estimate being 2.5. Also, the p-value is 0.002 (less than 0.05). Thus, we can conclude that the number
of previous premature babies will be included in the final model.
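
Because the predictor is a count, the odds ratio compounds with each additional previous premature baby. The following Python sketch illustrates this with the coefficient copied from Fig 20; the variable names are assumptions, and the calculation relies on the model's assumption that each additional baby has the same effect on the log-odds.

```python
import math

# Coefficient for lbw$prev.prem on the log-odds scale, copied from Fig 20.
coef_prev_prem = 0.8972

for n_previous in (1, 2, 3):
    # Each additional previous premature baby adds the coefficient to the log-odds,
    # so the odds ratio versus no previous premature babies is exp(coef * n).
    odds_ratio = math.exp(coef_prev_prem * n_previous)
    print(f"{n_previous} previous premature babies: odds multiplied by {odds_ratio:.2f}")

or_one = math.exp(coef_prev_prem)
or_two = math.exp(coef_prev_prem * 2)
```

Under this model, a mother with two previous premature babies has about six times the odds of a low birth weight baby compared with a mother with none.
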
Fig 20a: The R output shows the AUC for low birth weight based on the number of previous premature
babies.
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred3, ci = TRUE)
##
## Data: lbw1$lowpred3 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.6029
## 95% CI: 0.5469-0.6588 (DeLong)

Fig 20b: ROC Curve in low birth weight based on the number of previous premature babies.

Based on Fig 20a and 20b:
- The AUC is 0.6029 (95% CI: 0.5469-0.6588), so we can conclude that the number of previous premature
babies has poor predictive ability for low birth weight.

Fig 21: R output shows the logistic regression results of low birth weight on the mother's smoking status
during pregnancy.
##
## Call:
## glm(formula = lbw$low ~ lbw$smoke, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0799 -0.7791 -0.7791 1.2781 1.6373
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0369 0.1864 -5.564 2.64e-08 ***
## lbw$smokesmoker 0.8033 0.2861 2.807 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 289.37 on 233 degrees of freedom
## AIC: 293.37
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -1.402187 -0.6716502
## lbw$smokesmoker 0.242463 1.3641447

## OR 2.5 % 97.5 %
## (Intercept) 0.3545455 0.2460582 0.5108648
## lbw$smokesmoker 2.2329060 1.2743841 3.9123755

Based on Fig 21:

- The odds ratio is estimated as 2.23, suggesting that smokers (coded as 1) have about 2.2 times the
odds of low birth weight compared with non-smokers.
- Moreover, we can be 95% confident that the odds ratio lies between 1.27 and 3.91, with the best
estimate being 2.23. Also, the p-value is 0.005 (less than 0.05). Thus, we can conclude that smoking
status will be included in the final model.

Fig 21a: The R output shows the AUCs in low birth weight based on the mother's smoking status during
pregnancy.
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred4, ci = TRUE)
##
## Data: lbw1$lowpred4 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.5949
## 95% CI: 0.5281-0.6616 (DeLong)

Fig 21b: ROC Curve in low birth weight based on the mother's smoking status during pregnancy.

Based on Fig 21a and 21b:
- The AUC is 0.5949 (95% CI: 0.5281-0.6616), so we can conclude that the mother's smoking status
during pregnancy has poor predictive ability for low birth weight.

Fig 22: R output shows the logistic regression results of low birth weight on the mother's history of
hypertension.
##
## Call:
## glm(formula = lbw$low ~ lbw$hyper.tension, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4432 -0.8492 -0.8492 1.5459 1.5459
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8342 0.1474 -5.659 1.52e-08 ***
## lbw$hyper.tension 1.4404 0.5285 2.725 0.00642 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 289.42 on 233 degrees of freedom
## AIC: 293.42
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -1.1231490 -0.5453026
## lbw$hyper.tension 0.4045319 2.4761913

## OR 2.5 % 97.5 %
## (Intercept) 0.4342105 0.325254 0.5796663
## lbw$hyper.tension 4.2222222 1.498601 11.8958696

Based on Fig 22:

- The odds ratio is estimated as 4.2, suggesting that mothers with a history of hypertension (coded as
1) have about 4.2 times the odds of low birth weight.
- Moreover, we can be 95% confident that the odds ratio lies between 1.5 and 11.9, with the best
estimate being 4.2. The interval is wide, reflecting the small number of mothers (17) with a history
of hypertension.
- The p-value is 0.006 (less than 0.05). Thus, we can conclude that the mother's history of
hypertension will be included in the final model.

Fig 22a: The R output shows the AUCs in low birth weight based on the mother's history of hypertension.
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred5, ci = TRUE)
##
## Data: lbw1$lowpred5 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.5524
## 95% CI: 0.5104-0.5945 (DeLong)

Fig 22b: ROC Curve in low birth weight based on the mother's history of hypertension.

Based on Fig 22a and 22b:
- The AUC is 0.5524 (95% CI: 0.5104-0.5945), so we can conclude that a history of hypertension on its
own has very poor predictive ability for low birth weight.

Fig 23: R output shows the logistic regression results of low birth weight on the mother's number of GP
visits during the 1st trimester.
##
## Call:
## glm(formula = lbw$low ~ lbw$gp.visits, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9481 -0.9481 -0.8732 1.4255 1.7875
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.5666 0.1743 -3.251 0.00115 **
## lbw$gp.visits -0.2012 0.1459 -1.379 0.16779
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 295.26 on 233 degrees of freedom
## AIC: 299.26
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -0.9081690 -0.22498644
## lbw$gp.visits -0.4870886 0.08469505

## OR 2.5 % 97.5 %
## (Intercept) 0.5674641 0.4032619 0.7985271
## lbw$gp.visits 0.8177515 0.6144126 1.0883851

Based on Fig 23:

- The estimated odds ratio is 0.82: the odds of low birth weight are reduced by about 18% for every
additional GP visit during the 1st trimester.
- However, the 95% confidence interval for the odds ratio runs from 0.61 to 1.09 and therefore
includes 1.0.
- The p-value is 0.17 (greater than 0.05). Thus, we can conclude that the number of GP visits during
the 1st trimester is not included in the final model.

Fig 23a: The R output shows the AUCs in low birth weight based on the number of GP visits during the 1st
trimester.
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred6, ci = TRUE)
##
## Data: lbw1$lowpred6 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.5629
## 95% CI: 0.4907-0.6351 (DeLong)

Fig 23b: ROC Curve in low birth weight based on the number of GP visits during the 1st trimester.

Based on Fig 23a and 23b:
- The AUC is 0.5629 (95% CI: 0.4907-0.6351). The interval includes 0.5, so we can conclude that the
number of GP visits during the 1st trimester has very poor predictive ability for low birth weight.

Based on the logistic regression results for each predictor variable, the predictor variables for the final
model are the mother's weight at the last menstrual period, smoking status, the number of previous preterm
babies, and a history of hypertension. The mother's age (p = 0.19) and the number of GP visits during the 1st
trimester (p = 0.17) are excluded since their p-values are greater than 0.05. Moreover, the AUC results
confirm that both age and the number of GP visits have very poor predictive ability for low birth weight.

2. Logistic Regression Model for Low Birth Weight

Fig 24: R output shows the final logistic regression results of the independent variables to predict low birth
weight.
##
## Call:
## glm(formula = lbw$low ~ +lbw$lweight + lbw$prev.prem + lbw$smoke +
## lbw$hyper.tension, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0742 -0.7933 -0.6845 1.0492 2.2463
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.77094 0.75012 1.028 0.304062
## lbw$lweight -0.03539 0.01278 -2.769 0.005623 **
## lbw$prev.prem 0.71046 0.31414 2.262 0.023723 *
## lbw$smokesmoker 0.65009 0.30609 2.124 0.033679 *
## lbw$hyper.tension 2.07203 0.61500 3.369 0.000754 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 265.36 on 230 degrees of freedom
## AIC: 275.36
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -0.69926096 2.24114566
## lbw$lweight -0.06043663 -0.01033931
## lbw$prev.prem 0.09475336 1.32615793
## lbw$smokesmoker 0.05017821 1.25001109
## lbw$hyper.tension 0.86666366 3.27740361

## OR 2.5 % 97.5 %
## (Intercept) 2.1618025 0.4969524 9.404099
## lbw$lweight 0.9652309 0.9413534 0.989714
## lbw$prev.prem 2.0349182 1.0993877 3.766544
## lbw$smokesmoker 1.9157221 1.0514585 3.490382
## lbw$hyper.tension 7.9409557 2.3789606 26.506861

Based on Fig 24:
- The logistic regression output indicates that the mother's weight at the last menstrual period,
smoking status, the number of previous preterm babies, and a history of hypertension are all
statistically significant predictors in this model.
- The p-values are 0.006 for the mother's weight, 0.03 for smoking status (smoker), 0.02 for the number
of previous premature babies, and less than 0.001 for a history of hypertension.

The results from the final model imply that the mother's weight at the last menstrual period, smoking
status, the number of previous preterm babies, and a history of hypertension can all be considered
predictors of low birth weight. These variables are retained after excluding the mother's age and the number
of GP visits. Judged by the single-predictor AUC values, the number of previous preterm babies predicts best,
followed by the mother's weight, smoking status, and history of hypertension. The final model predicts low
birth weight using the equation below, in which the coefficients are the estimates from Fig 24 on the
log-odds scale (not the odds ratios).
log-odds(low birth weight) = 0.771 - 0.035 x the mother's weight at the last menstrual period +
0.710 x the number of previous preterm babies + 0.650 x smoking status + 2.072 x history of
hypertension

where the indicator variable smoking status takes the value 1 in smokers and 0 in non-smokers, and the
indicator variable history of hypertension takes the value 1 in mothers with such a history and 0 otherwise.
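
A prediction from the final model applies the coefficient estimates in Fig 24 on the log-odds scale and then converts the result to a probability. The sketch below is a Python illustration, not the author's R code; the function and dictionary names are assumptions, and the two example mothers are hypothetical.

```python
import math

# Coefficient estimates on the log-odds scale, copied from Fig 24.
COEF = {
    "intercept": 0.77094,
    "lweight": -0.03539,      # per kilogram of mother's weight
    "prev_prem": 0.71046,     # per previous premature baby
    "smoker": 0.65009,        # 1 = smoker, 0 = non-smoker
    "hypertension": 2.07203,  # 1 = history of hypertension, 0 = none
}

def predict_low_birth_weight(lweight, prev_prem, smoker, hypertension):
    """Return the predicted probability of low birth weight from the final model."""
    log_odds = (COEF["intercept"]
                + COEF["lweight"] * lweight
                + COEF["prev_prem"] * prev_prem
                + COEF["smoker"] * smoker
                + COEF["hypertension"] * hypertension)
    return 1 / (1 + math.exp(-log_odds))

# Two hypothetical mothers, both weighing 60 kg with no previous premature babies.
low_risk = predict_low_birth_weight(60, 0, 0, 0)
high_risk = predict_low_birth_weight(60, 0, 1, 1)
print(f"Non-smoker, no hypertension: {low_risk:.2f}")
print(f"Smoker with hypertension:    {high_risk:.2f}")
```

For these two hypothetical mothers, the predicted probability rises from about 0.21 to about 0.80.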

QUESTION 2.C:

- Odds ratio for mother's weight during the last menstrual period = 0.97, 95% CI: 0.94 to 0.99.

The odds of low birth weight are multiplied by 0.97, which means the odds of low birth weight are
reduced by about 3% for each additional kilogram of the mother's weight. The 95% confidence interval
shows that the reduction in odds lies between 1% and 6% for each kilogram of the mother's weight. The
95% CI from 0.94 to 0.99 does not include 1.0, and the p-value is 0.006, indicating that the mother's
weight during the last menstrual period is a helpful predictor of low birth weight in addition to
smoking status, the number of previous preterm babies and having a history of hypertension.

- The odds ratio for having previous preterm babies = 2.03, 95% CI: 1.1 to 3.8.

The odds of low birth weight are multiplied by 2.03 for each additional previous preterm baby, so each
previous preterm baby roughly doubles the odds of low birth weight. The 95% confidence interval for
this odds ratio lies between 1.1 and 3.8 and hence does not include unity. The p-value is less than
0.05 (p-value = 0.02), indicating that the number of previous preterm babies is a helpful predictor of
low birth weight in addition to the mother's weight during the last menstrual period, smoking status,
and having a history of hypertension.

- The odds ratio for smokers = 1.92, 95% CI: 1.05 to 3.5.

The odds of low birth weight are multiplied by 1.92, which means that a mother who smokes has about
1.92 times the odds of low birth weight of a non-smoker. The 95% confidence interval for the odds
ratio in smokers is 1.05 to 3.5 and hence does not include unity. The p-value is less than 0.05
(p-value = 0.03), indicating that smoking status is a helpful predictor of low birth weight in addition
to the mother's weight, the number of previous preterm babies, and having a history of hypertension.

- The odds ratio for having a history of hypertension = 7.94, 95% CI: 2.4 to 26.5.

The odds of low birth weight are multiplied by 7.94, which means that a mother with a history of
hypertension has about 7.94 times the odds of low birth weight. The 95% confidence interval for the
odds ratio for a history of hypertension is 2.4 to 26.5 and hence does not include unity, although it
is wide. The p-value is less than 0.05 (p-value < 0.001), indicating that having a history of
hypertension is a helpful predictor of low birth weight in addition to the mother's weight during the
last menstrual period, the number of previous preterm babies, and smoking status.
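
The odds ratios and confidence limits quoted above can be approximated from the estimates and standard errors in Fig 24 using Wald intervals (estimate plus or minus 1.96 standard errors, then exponentiated). The sketch below is a Python illustration with assumed names; whether the original R analysis used Wald or profile-likelihood intervals is not stated, so the Wald form here is an assumption.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# (estimate, standard error) on the log-odds scale, copied from Fig 24.
terms = {
    "lweight": (-0.03539, 0.01278),
    "prev_prem": (0.71046, 0.31414),
    "smoker": (0.65009, 0.30609),
    "hypertension": (2.07203, 0.61500),
}

odds_ratios = {}
for name, (estimate, std_error) in terms.items():
    low = math.exp(estimate - Z_95 * std_error)
    high = math.exp(estimate + Z_95 * std_error)
    odds_ratios[name] = (math.exp(estimate), low, high)
    print(f"{name}: OR = {math.exp(estimate):.2f}, 95% CI {low:.2f} to {high:.2f}")
```

These Wald limits agree closely with the intervals printed in Fig 24.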

However, significant variables do not necessarily mean that a model is good. The goodness of fit test
can help to determine this.

Fig 25: R output shows the results of Hosmer-Lemeshow Statistic (H-L)


> logitgof(lbwcomp2$LBW,lbwcomp2$lbwpred2,g=8)
## Hosmer and Lemeshow test (binary model)
##
## data: lbwcomp2$LBW, lbwcomp2$lbwpred2
## X-squared = 9.8381, df = 6, p-value = 0.1316

> logitgof(lbwcomp2$LBW,lbwcomp2$lbwpred2,g=10)
## Hosmer and Lemeshow test (binary model)
##
## data: lbwcomp2$LBW, lbwcomp2$lbwpred2
## X-squared = 10.456, df = 8, p-value = 0.2345

> logitgof(lbwcomp2$LBW,lbwcomp2$lbwpred2,g=12)
## Hosmer and Lemeshow test (binary model)
##
## data: lbwcomp2$LBW, lbwcomp2$lbwpred2
## X-squared = 14.742, df = 10, p-value = 0.1418

> logitgof(lbwcomp2$LBW,lbwcomp2$lbwpred2,g=14)
## Hosmer and Lemeshow test (binary model)
##
## data: lbwcomp2$LBW, lbwcomp2$lbwpred2
## X-squared = 17.993, df = 12, p-value = 0.1159

Based on Fig 25:

- The Hosmer-Lemeshow test gives p-values greater than 0.05 for every grouping tried (g = 8, 10, 12
and 14), so there is no evidence of poor fit. We can conclude that the final model fits the data
adequately and can reasonably be used to predict low birth weight.
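
The p-values in Fig 25 follow from comparing each X-squared statistic with a chi-squared distribution on g - 2 degrees of freedom. The Python sketch below reproduces them using a closed form that is valid only for even degrees of freedom, which covers all four groupings here; the function name is an assumption and this is not the `logitgof` implementation.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    total = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * total

# (number of groups g, X-squared) copied from Fig 25; the test uses df = g - 2.
results = [(8, 9.8381), (10, 10.456), (12, 14.742), (14, 17.993)]
p_values = {g: chi2_sf_even_df(x2, g - 2) for g, x2 in results}
for g, p in p_values.items():
    print(f"g = {g}: df = {g - 2}, p = {p:.4f}")
```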

Fig 26: R output shows the Area under the Receiver Operator Characteristic (ROC) Curve
##
## Call:
## roc.default(response = lbwcomp2$LBW, predictor = lbwcomp2$lbwpred2, ci = TRUE)
##
## Data: lbwcomp2$lbwpred2 in 158 controls (lbwcomp2$LBW 0) < 77 cases (lbwcomp2$LBW 1).
## Area under the curve: 0.7212
## 95% CI: 0.6485-0.7938 (DeLong)

Based on Fig 26:
- The AUC for the final model is 0.7212 (95% CI: 0.6485-0.7938), and the interval excludes 0.5. It can
be concluded that the final model has good predictive ability for low birth weight, better than any of
the single-predictor models. This finding is consistent with the results of the Hosmer-Lemeshow test.

QUESTION 2.d
A likelihood ratio test is used to explore interaction effects within the regression model.
1. Test for interaction between smoking status and age

Fig 27a: R output shows the likelihood ratio test for smoking status and age
## Likelihood ratio test
##
## Model 1: lbw$low ~ lbw$age + lbw$smoke + lbw$age:lbw$smoke
## Model 2: lbw$low ~ lbw$age + lbw$smoke
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -142.87
## 2 3 -143.96 -1 2.1799 0.1398

Based on Fig 27a:

- The interaction term in the model is not statistically significant, since the p-value is 0.1398, which
is greater than 0.05. There is therefore no evidence that the effect of smoking on low birth weight
differs by age, and no stratified subgroup analyses are needed.

2. Test for interaction between hypertension and age

Fig 27b: R output shows the likelihood ratio test for hypertension and age
## Likelihood ratio test
##
## Model 1: lbw$low ~ lbw$age + lbw$hyper.tension + lbw$age:lbw$hyper.tension
## Model 2: lbw$low ~ lbw$age + lbw$hyper.tension
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -142.37
## 2 3 -143.72 -1 2.7005 0.1003

Based on Fig 27b:

- The interaction term in the model is not statistically significant, since the p-value is 0.1003, which
is greater than 0.05. There is therefore no evidence that the effect of hypertension on low birth weight
differs by age, and no stratified subgroup analyses are needed.
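
Each likelihood ratio statistic is twice the difference in log-likelihood between the model with the interaction and the model without it, compared with a chi-squared distribution on 1 degree of freedom. The Python sketch below recomputes both tests from the log-likelihoods printed in Fig 27a and Fig 27b; the function name is an assumption, and small differences from the R output arise because the printed log-likelihoods are rounded.

```python
import math

def lrt_one_df(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and p-value for a single extra parameter (1 df)."""
    statistic = 2 * (loglik_full - loglik_reduced)
    # For 1 df the chi-squared upper tail equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Log-likelihoods copied from Fig 27a and Fig 27b (rounded to two decimals in the output).
smoke_stat, smoke_p = lrt_one_df(-142.87, -143.96)
hyper_stat, hyper_p = lrt_one_df(-142.37, -143.72)
print(f"smoking x age:      Chisq = {smoke_stat:.2f}, p = {smoke_p:.3f}")
print(f"hypertension x age: Chisq = {hyper_stat:.2f}, p = {hyper_p:.3f}")
```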
