Further Statistics

This document summarizes a statistical analysis of factors that predict a baby's birth weight. The analysis explores the relationships between birth weight and predictor variables (gestational age, mother's height and weight, and smoking status) using histograms, scatterplots, and boxplots. Among the Pearson correlation coefficients, gestational age has the strongest positive correlation with birth weight. Linear regression found smoking status to be a statistically significant predictor of lower birth weight: babies of smokers weighed on average 0.375 kg less than babies of non-smokers in the unadjusted model, and 0.30 kg less after adjusting for gestational age and the mother's pre-pregnancy weight.

Uploaded by

Alka Alka

MASTER OF PUBLIC HEALTH

COURSEWORK ASSESSMENT SUBMISSION

Course Name Further Epidemiology and Statistics

Student id no. 2711522

QUESTION 1.a
This study aimed to investigate whether birth weight can be predicted by gestational age (in weeks) and by variables relating to the mother (the mother's
height and weight, and whether or not the mother smokes). Length of the baby and head circumference are excluded because they are not maternal
variables. Firstly, before doing the statistical analysis, we should inspect the variables using histograms to assess skewness and the presence of outliers.
Then, we need to plot the data using scatterplots to explore a possible relationship between the baby's birth weight and the continuous predictor variables
before performing any statistical test. The graphs and plots are presented below.

Fig 1: Histograms of the continuous variables

Based on Fig 1:
- The histograms for birth weight and gestation show a bell-shaped curve, which means these
variables look approximately normally distributed.
- The histogram for the mother's pre-pregnancy weight is skewed to the left, which
means this variable does not look normally distributed.
- The histogram for maternal age is skewed to the right, which means this variable
does not look normally distributed.
- The histogram for the mother's height is skewed to the left, which means this variable
does not look normally distributed.

The data can be plotted using scatterplots to see the relationship between birth weight and
the mother-related variables.

Fig 2: Scatterplots show the relationship between a baby's birth weight and each predictor variable.

Fig 3: Matrix Scatterplots show the relationship between a baby's birth weight and each predictor variable

Based on Fig 2 and Fig 3:

From the matrix scatterplot in Fig 3, we can subjectively assess the relationship between the outcome and
each predictor variable. We can also see these relationships in the individual scatterplots in Fig 2.
- The top left-hand corner shows the first plot, birth weight against gestational age. This plot shows a
positive relationship with little variation between birth weight and gestation. The scatterplot's
circles are fairly closely clustered around an underlying straight line (instead of a curve or a
random scattering). From this plot, as gestational age increases, the average birth weight increases.
- The second plot shows a wide spread of birth weights for any given maternal age, so the
relationship between maternal age and birth weight is weak, with a lot of variation between
these two variables.
- Birth weight and maternal height have a minimal association, and there is a lot of variation between
the two variables.
- In the last plot, birth weight and the mother's pre-pregnancy weight have a minimal association, and
there is a lot of variation between these two variables.
- It is interesting to note that there is also a relationship between some of the continuous predictor
variables. There appears to be a strong positive relationship between maternal height and the mother's
pre-pregnancy weight: the circles in that scatterplot are clustered closely around a straight line.

Fig 4: Box-Whisker Plots show the distribution of birth weight based on smoking status

Based on Fig 4:

- The box plot of birth weight for non-smokers appears asymmetrical, and the box plot for smokers
is slightly asymmetrical. From these box plots, we can see a difference in birth weight between
smokers and non-smokers: the median birth weight of babies born to non-smoking mothers is
higher than that of babies born to mothers who smoke.

The above plots (Fig 1, Fig 2, Fig 3, Fig 4) show the relationship between birth weight and the predictor
variables; however, the interpretation of plots and graphs is subjective. To learn more about any
association, we need to apply statistical tests that assess the relationship between these variables. As a
first step, we can produce some summary statistics.
Table 1: Summary Statistics
Variable                             min   Q1     median  Q3      max   mean      sd        n   Missing data
Birth weight (kg)                    1.92  2.94   3.295   3.6475  4.57  3.312857  0.603895  42  0
Gestation (weeks)                    33    38     39.5    41      45    39.19048  2.643336  42  0
Maternal age (years)                 18    20.25  24      29      41    25.54762  5.666342  42  0
Mother's height                      149   161    164.5   169.5   181   164.4524  6.504041  42  0
Mother's pre-pregnancy weight (kg)   45    52.25  57      62      78    57.5      7.198408  42  0

Table 2: Summary statistics for birth weight (kg) stratified by smoking status
Smoking status  min   Q1      median  Q3      max   mean      sd         n   Missing data
Non smoker      2.65  3.1400  3.385   3.9325  4.55  3.509500  0.5184945  20  0
Smoker          1.92  2.7425  3.185   3.5450  4.57  3.134091  0.6312471  22  0

Based on Table 1:
- The mean and median are close to each other for birth weight, gestation, mother's height, and
mother's pre-pregnancy weight, which is consistent with roughly symmetric, approximately normal
distributions. The largest gap is for maternal age (mean 25.5 vs median 24 years), in line with the
right skew seen in Fig 1. Also, the sample size is more than 30 (42 observations).
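The mean-versus-median comparison can be put on a common scale with Pearson's second skewness coefficient, 3 × (mean − median) / sd. The Python sketch below is only an illustration (the report itself used R; the function and dictionary names are made up here) and uses the values from Table 1:

```python
# Summary statistics copied from Table 1 as (mean, median, sd).
table1 = {
    "Birth weight (kg)": (3.312857, 3.295, 0.603895),
    "Gestation (weeks)": (39.19048, 39.5, 2.643336),
    "Maternal age (years)": (25.54762, 24.0, 5.666342),
    "Mother's height": (164.4524, 164.5, 6.504041),
    "Pre-pregnancy weight (kg)": (57.5, 57.0, 7.198408),
}


def pearson_skew2(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3.0 * (mean - median) / sd


# Values near 0 suggest symmetry; positive values suggest a longer right tail.
skew = {name: pearson_skew2(*stats) for name, stats in table1.items()}
for name, value in skew.items():
    print(f"{name:28s} {value:+.2f}")
```

On these figures maternal age stands out with a clearly positive coefficient, while birth weight and mother's height are close to zero. This is a rough screen only and does not replace the histograms.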

Based on Table 2:

- In this interpretation, we use the median as the average. The median birth weight for babies of
non-smokers is higher than for babies of smokers. This result also confirms the interpretation of the box plot in Fig 4.

The matrix scatterplot subjectively revealed some weak and some strong associations between the
continuous variables. As can be seen from the plots above (Fig 2), the relationships look linear and the
data are approximately normally distributed, so the assumptions needed to estimate a correlation are
reasonable. Thus, to quantify the strength of these associations, we use the parametric Pearson
correlation coefficient.

Fig 5: R output shows the Pearson correlation coefficient

##             Birthweight Gestation Mage Mheight Mppwt
## Birthweight        1.00      0.71 0.00    0.36  0.40
## Gestation          0.71      1.00 0.01    0.21  0.26
## Mage               0.00      0.01 1.00    0.06  0.27
## Mheight            0.36      0.21 0.06    1.00  0.68
## Mppwt              0.40      0.26 0.27    0.68  1.00
##
## n= 42
##
##
## P
##             Birthweight Gestation Mage   Mheight Mppwt
## Birthweight             0.0000    0.9991 0.0181  0.0085
## Gestation   0.0000                0.9460 0.1809  0.1030
## Mage        0.9991      0.9460           0.7060  0.0789
## Mheight     0.0181      0.1809    0.7060         0.0000
## Mppwt       0.0085      0.1030    0.0789 0.0000

Based on the Pearson correlation coefficients in Fig 5:

- The first block of the R output shows that none of the correlations is negative. The strongest
correlation with birth weight is for gestation (0.71) and the weakest is for maternal age (0.00).
- The third block presents the corresponding p-values. The correlations of birth weight with gestation
(p < 0.0001), mother's height (p = 0.0181), and mother's pre-pregnancy weight (p = 0.0085) are
statistically significant at the 5% level; the correlation with maternal age is not (p = 0.9991). Among
the predictors, only mother's height and pre-pregnancy weight are significantly correlated with each
other (r = 0.68, p < 0.0001).
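The p-values in Fig 5 come from a t test on each correlation, with t = r × sqrt(n − 2) / sqrt(1 − r²) on n − 2 degrees of freedom. The Python sketch below recomputes t from the rounded correlations with birth weight. The critical value 2.021 (two-sided 5%, 40 degrees of freedom) is taken from standard t tables rather than from the report, and because r is rounded to two decimals the results only approximate the R output.

```python
import math

N = 42          # sample size reported in Fig 5
T_CRIT = 2.021  # assumption: two-sided 5% critical value of t with 40 df (t tables)


def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Correlations with birth weight, copied from the first block of Fig 5.
r_with_birth_weight = {"Gestation": 0.71, "Mage": 0.00, "Mheight": 0.36, "Mppwt": 0.40}

t_values = {name: t_from_r(r, N) for name, r in r_with_birth_weight.items()}
for name, t in t_values.items():
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name:10s} r = {r_with_birth_weight[name]:.2f}  t = {t:6.3f}  {verdict}")
```

The t values for mother's height and pre-pregnancy weight are close to the slope t values reported later in Fig 7a and Fig 7b, as expected for simple linear regression.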

Fig 6: R output for the linear regression predicting birth weight from smoking status
##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$smoker)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.21409 -0.39159 0.02591 0.41935 1.43591
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5095 0.1298 27.040 <2e-16 ***
## bwg1$smokersmoker -0.3754 0.1793 -2.093 0.0427 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5804 on 40 degrees of freedom

## Multiple R-squared: 0.09874, Adjusted R-squared: 0.07621
## F-statistic: 4.382 on 1 and 40 DF, p-value: 0.0427

Based on the results of the linear regression of birth weight on smoking status:
- The regression of birth weight on smoking status gives an R squared value of 0.09874, so about
9.9% of the variation in birth weight is explained by smoking status. The association is statistically
significant (p = 0.0427): on average, babies of smokers weigh 0.375 kg less than babies of non-smokers.
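With a single two-level predictor, the least-squares intercept equals the mean of the reference group and the slope equals the difference between the two group means. The Python sketch below (illustrative names; the report itself used R) checks this against the group means in Table 2 and the coefficients in Fig 6:

```python
# Group means of birth weight (kg), copied from Table 2.
mean_nonsmoker = 3.509500
mean_smoker = 3.134091

# With smoker coded 1 and non-smoker coded 0 (the reference group):
intercept = mean_nonsmoker           # predicted birth weight for a non-smoker
slope = mean_smoker - mean_nonsmoker  # change in prediction for a smoker

print(f"intercept = {intercept:.4f}")  # Fig 6 reports 3.5095
print(f"slope     = {slope:.4f}")      # Fig 6 reports -0.3754
```

So the 0.375 kg difference in Fig 6 is simply the unadjusted difference between the two sample means.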

Based on the Pearson correlation coefficients in Fig 5, some predictor variables are highly correlated
with each other, which can cause problems in a regression model. Mother's height and mother's
pre-pregnancy weight show a strong correlation (0.68), so we must choose only one of these two variables
for the final model.
The procedure is to run a separate simple linear regression of birth weight on the mother's height
and on the mother's pre-pregnancy weight, and then to compare the R squared values of the two
regressions. The final model will include the variable with the higher R squared, because it accounts
for a larger proportion of the variability in birth weight. The R squared value gives the proportion of
the variance in birth weight in the sample that is explained by the predictor; the adjusted R squared
estimates how much variance would be accounted for if the model had been derived from the population
from which the sample was taken.
Fig 7a: R output for the linear regression predicting birth weight from mother's height
##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$mheight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.31503 -0.32047 0.02239 0.35715 1.31981
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.23073 2.25127 -0.991 0.3277
## bwg1$mheight 0.03371 0.01368 2.464 0.0181 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5697 on 40 degrees of freedom
## Multiple R-squared: 0.1318, Adjusted R-squared: 0.1101
## F-statistic: 6.073 on 1 and 40 DF, p-value: 0.01812

Based on Fig 7a:

- The regression of birth weight on mother's height gives an R squared value of 0.13.
Therefore, 13% of the variation in birth weight is explained by its relationship with the mother's
height. Also, the p-value shows that there is significant evidence that birth weight is associated with
the mother's height (p = 0.018).

Fig 7b: R output for the linear regression predicting birth weight from mother's pre-pregnancy weight
##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$mppwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.24604 -0.41515 -0.03875 0.38833 1.25396
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 1.37905 0.70407 1.959 0.05715 .
## bwg1$mppwt 0.03363 0.01215 2.768 0.00851 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5601 on 40 degrees of freedom
## Multiple R-squared: 0.1607, Adjusted R-squared: 0.1397
## F-statistic: 7.659 on 1 and 40 DF, p-value: 0.008513

Based on Fig 7b:

- The regression of birth weight on mother's pre-pregnancy weight gives an R squared value of 0.16.
Therefore, 16% of the variation in birth weight is explained by its relationship with the
mother's pre-pregnancy weight. Also, the p-value shows that there is significant evidence that birth
weight is associated with the mother's pre-pregnancy weight (p = 0.009).

From the simple linear regressions in Fig 7a and 7b, the R squared value for the regression of birth
weight on the mother's pre-pregnancy weight (0.16) is greater than that for the regression on the mother's
height (0.13). The correlation coefficients in Fig 5 agree: the association of birth weight with mother's
height (r = 0.36, p = 0.018) is weaker than with pre-pregnancy weight (r = 0.40, p = 0.009), although
both are statistically significant. These results indicate that we should exclude the mother's height
from the final multiple linear regression model.

QUESTION 1.b
Gestation, maternal age, smoking status, and mother's pre-pregnancy weight are the candidate
predictors of birth weight. We first set up the null hypothesis (H0) and the alternative hypothesis (Ha)
for the linear regression, and then run an initial model to choose the variables for the final model.

H0: None of the predictor variables is associated with birth weight (all regression coefficients are zero).

Ha: At least one of the predictor variables is helpful in explaining/predicting birth weight.

Fig 8: R output shows the multiple linear regression

##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$Gestation + +bwg1$smoker +
## bwg1$mage + bwg1$mppwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58200 -0.29720 -0.03732 0.30943 0.89729
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.233673 0.975667 -3.314 0.00206 **
## bwg1$Gestation 0.141978 0.024131 5.884 9.02e-07 ***
## bwg1$smokersmoker -0.299665 0.124850 -2.400 0.02153 *
## bwg1$mage -0.002268 0.011549 -0.196 0.84542
## bwg1$mppwt 0.020822 0.009186 2.267 0.02934 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3926 on 37 degrees of freedom
## Multiple R-squared: 0.6185, Adjusted R-squared: 0.5773
## F-statistic: 15 on 4 and 37 DF, p-value: 2.253e-07

## 2.5 % 97.5 %
## (Intercept) -5.210562099 -1.25678465
## bwg1$Gestation 0.093084104 0.19087102
## bwg1$smokersmoker -0.552635913 -0.04669383
## bwg1$mage -0.025668375 0.02113336
## bwg1$mppwt 0.002209861 0.03943393

Based on Fig 8:
- The regression output indicates that gestation, smoking status, and mother's pre-pregnancy
weight are significant variables in this model, with p-values of <0.001, 0.02, and 0.03,
respectively (all less than 0.05). This implies that gestation, smoking status, and mother's
pre-pregnancy weight are statistically significant predictors of birth weight. On the
other hand, maternal age is excluded since its p-value is greater than 0.05 (p = 0.85).
- Thus, the final model uses gestation, smoking status, and the mother's pre-pregnancy weight to
predict birth weight.
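Each t value in Fig 8 is the estimate divided by its standard error, and a coefficient is significant at the 5% level when the absolute t value exceeds the critical value for 37 residual degrees of freedom. The Python sketch below repeats that screen on the reported numbers. The critical value 2.026 is an assumption taken from t tables (it matches the half-widths of the confidence intervals printed in Fig 8); the dictionary keys are shortened labels, not the R variable names.

```python
T_CRIT_37 = 2.026  # assumption: two-sided 5% critical value of t with 37 df

# (estimate, standard error) copied from the coefficient table in Fig 8.
fig8 = {
    "Gestation": (0.141978, 0.024131),
    "smoker": (-0.299665, 0.124850),
    "mage": (-0.002268, 0.011549),
    "mppwt": (0.020822, 0.009186),
}

# t value = estimate / standard error
t_ratio = {name: est / se for name, (est, se) in fig8.items()}

# Keep predictors whose |t| exceeds the critical value.
retained = [name for name, t in t_ratio.items() if abs(t) > T_CRIT_37]

for name, t in t_ratio.items():
    print(f"{name:10s} t = {t:7.3f}")
print("retained:", retained)
```

Only maternal age falls below the threshold, which is the basis for dropping it from the final model.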

Fig 9: The R output shows the multiple linear regression for the final model
##
## Call:
## lm(formula = bwg1$Birthweight ~ bwg1$Gestation + bwg1$smoker +
## bwg1$mppwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60383 -0.28810 -0.05007 0.31103 0.89116
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.267614 0.948005 -3.447 0.0014 **
## bwg1$Gestation 0.142182 0.023801 5.974 6.18e-07 ***
## bwg1$smokersmoker -0.304964 0.120346 -2.534 0.0155 *
## bwg1$mppwt 0.020313 0.008701 2.335 0.0249 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3876 on 38 degrees of freedom
## Multiple R-squared: 0.6181, Adjusted R-squared: 0.5879
## F-statistic: 20.5 on 3 and 38 DF, p-value: 4.565e-08

## 2.5 % 97.5 %
## (Intercept) -5.186748818 -1.34847869
## bwg1$Gestation 0.093999411 0.19036543
## bwg1$smokersmoker -0.548591906 -0.06133642
## bwg1$mppwt 0.002699597 0.03792709

Based on Fig 9:

- The adjusted R squared value of 0.59 shows that the regression model explains about 59% of the
variation in birth weight using gestation, smoking status, and the mother's pre-pregnancy weight.
- Multiple linear regression was conducted to investigate the relationship between birth weight and
gestational age at birth (weeks), the mother's pre-pregnancy weight, and whether the mother smokes.
There is a significant relationship between gestation and birth weight (p < 0.001), smoking and birth
weight (p = 0.016), and pre-pregnancy weight and birth weight (p = 0.025).
- The final model can be used to predict birth weight, with gestation, smoking status, and mother's
pre-pregnancy weight as the predictor variables.
- The final model is:
birth weight = (-3.267614) + (0.142182 × gestational age) - (0.304964 × smoking status) + (0.020313
× mother's pre-pregnancy weight)
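The adjusted R squared values in Fig 8 and Fig 9 can be reproduced from the multiple R squared with the usual formula 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors. The Python sketch below (the function name is illustrative) uses the rounded R squared values printed in the R output, so agreement is to about three decimal places:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R squared for a model with p predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


N = 42  # number of observations

adj_initial = adjusted_r2(0.6185, N, 4)  # Fig 8: gestation, smoker, maternal age, mppwt
adj_final = adjusted_r2(0.6181, N, 3)    # Fig 9: gestation, smoker, mppwt

print(f"initial model: {adj_initial:.4f}")  # Fig 8 reports 0.5773
print(f"final model:   {adj_final:.4f}")    # Fig 9 reports 0.5879
```

Dropping maternal age barely changes the multiple R squared (0.6185 to 0.6181) but raises the adjusted R squared, because the penalty for an extra predictor is removed.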

QUESTION 1.c

Based on the final model,

- This equation can be used to predict the weight of the baby.
- The equation for the model is:
birth weight = (-3.267614) + (0.142182 × gestational age) - (0.304964 × smoking status) + (0.020313
× mother's pre-pregnancy weight)
where the indicator variable smoking status takes the value 1 for smokers and 0 for non-smokers.
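As a worked example of using the equation, the Python sketch below evaluates it for a hypothetical mother at 40 weeks' gestation with a pre-pregnancy weight of 57 kg, once as a non-smoker and once as a smoker. The input values and the function name are illustrative assumptions, not cases from the data set.

```python
# Coefficients of the final model, copied from Fig 9.
COEF = {
    "intercept": -3.267614,
    "gestation": 0.142182,  # kg per week of gestation
    "smoker": -0.304964,    # kg, smoker (1) versus non-smoker (0)
    "mppwt": 0.020313,      # kg per kg of pre-pregnancy weight
}


def predict_birth_weight(gestation_weeks, smoker, mppwt_kg):
    """Predicted birth weight in kg from the final model."""
    return (
        COEF["intercept"]
        + COEF["gestation"] * gestation_weeks
        + COEF["smoker"] * (1 if smoker else 0)
        + COEF["mppwt"] * mppwt_kg
    )


# Hypothetical inputs chosen for illustration only.
non_smoker = predict_birth_weight(40, False, 57)
smoker = predict_birth_weight(40, True, 57)
print(f"non-smoker: {non_smoker:.2f} kg, smoker: {smoker:.2f} kg")
```

The two predictions differ by exactly the smoking coefficient, since all other inputs are held fixed.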

- The intercept of the population regression line is estimated as -3.27 kg (the average value of birth
weight when gestation, smoking status, and the mother's pre-pregnancy weight are all equal to 0). It has
no practical interpretation, because a gestation and a pre-pregnancy weight of 0 lie outside the range
of the data.
- The slope of the population regression line for gestation is estimated as 0.14 (birth weight increases
by 0.14 kg, on average, for every one-week increase in gestational age) after adjusting for smoking
status and the mother's pre-pregnancy weight.
- The slope of the population regression line for mother's pre-pregnancy weight is estimated as 0.02
(birth weight increases by 0.02 kg, on average, for every 1 kg increase in the mother's pre-pregnancy
weight) after adjusting for gestation and smoking status.
- The coefficient for smoking is estimated as -0.30 (babies of smokers weigh 0.30 kg less, on average,
than babies of non-smokers) after adjusting for gestation and the mother's pre-pregnancy weight.
- The p-values are all < 0.05, so we can reject H0 and conclude that there is significant evidence that
birth weight is associated with gestation, smoking status, and the mother's pre-pregnancy weight.
- We are 95% confident that the increase in birth weight in the population could be as little as 0.09 kg
or as much as 0.19 kg for every one-week rise in gestational age after adjustment for smoking status
and the mother's pre-pregnancy weight. Our best estimate for the increase is 0.14 kg.
- We are 95% confident that the increase in birth weight in the population could be as little as 0.003 kg
or as much as 0.04 kg for every one-kilogram rise in the mother's pre-pregnancy weight after
adjustment for smoking status and gestation. Our best estimate for the increase is 0.02 kg.
- We are 95% confident that the decrease in birth weight in the population could be as little as 0.06 kg
or as much as 0.55 kg for mothers who smoke after adjustment for gestation and the mother's
pre-pregnancy weight. Our best estimate for the decrease is 0.30 kg.
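The 95% confidence intervals quoted above follow from estimate ± t × standard error, with t taken on the 38 residual degrees of freedom of the final model. The Python sketch below reproduces the intervals printed in Fig 9. The critical value 2.0244 is an assumption from t tables (it agrees with the half-widths in the R output); the dictionary keys are shortened labels.

```python
T_CRIT_38 = 2.0244  # assumption: two-sided 5% critical value of t with 38 df

# (estimate, standard error) copied from the coefficient table in Fig 9.
fig9 = {
    "Gestation": (0.142182, 0.023801),
    "smoker": (-0.304964, 0.120346),
    "mppwt": (0.020313, 0.008701),
}

# 95% confidence interval: estimate -/+ t * standard error
ci = {
    name: (est - T_CRIT_38 * se, est + T_CRIT_38 * se)
    for name, (est, se) in fig9.items()
}

for name, (lower, upper) in ci.items():
    print(f"{name:10s} {lower:8.4f} to {upper:8.4f}")
```

None of the three intervals contains zero, which matches the p-values below 0.05 in Fig 9.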

QUESTION 1.d

After identifying the best model, the final model must be checked to ensure that it meets the
assumptions of linear regression. To draw conclusions about a population based on a
regression analysis, several assumptions must hold.

- All predictor variables need to be quantitative or categorical (with two categories), and the outcome
variable needs to be quantitative, continuous, and unbounded. In the final model, the predictor
variables are either continuous (gestation and mother's pre-pregnancy weight) or categorical with two
categories (smoking status). The outcome variable (birth weight) is continuous.
- The predictors should have some variation in value (non-zero variance).
- There should be no perfect linear relationship between two or more predictors, and the predictors
should not be too highly correlated. From the Pearson correlation coefficients, mother's height and
mother's pre-pregnancy weight are highly correlated (0.68). In the simple linear regressions, the
R squared value for mother's pre-pregnancy weight is greater than that for mother's height.
Therefore, the mother's height is excluded from the final model.
- The mean values of the outcome variable for each value of the predictor variables in the final model
should lie along a straight line. Scatterplots can be used to check visually whether the predictor and
outcome variables are linearly related. Based on Fig 10 below, the assumption of linearity has been met.

Fig 10: Scatterplots show the relationship between the baby's birth weight and the predictor variables

Checking the assumption about residuals

Fig 11: Checking assumption about the residuals

Based on Fig 11:

- The plots in Fig 11 indicate that the model satisfies the linear regression assumptions. The residuals
versus fitted values plot shows neither a funnel nor a curved shape, so the linearity and
homoscedasticity assumptions appear to be satisfied. The Q-Q plot shows only slight deviations from
the diagonal line, with nothing out of the ordinary. In the scale-location plot, the red line is roughly
horizontal and the spread of the standardized residuals around it is constant, which again supports
homoscedasticity. The Residuals vs Leverage plot shows that all cases are well inside the Cook's
distance lines (red dashed lines).

Fig 11: Histogram of residuals

Based on the histogram in Fig 11:

- The histogram of residuals shows a bell-shaped curve, indicating that the assumption of
normally distributed residuals has been satisfied. We can conclude that the final model satisfies the
assumptions of linear regression.

The data meet the homoscedasticity and linearity assumptions, and the residuals are approximately
normally distributed. We can predict birth weight using the final model with the mother-related variables:
gestation, smoking status, and mother's pre-pregnancy weight. However, the modeling could be improved.
There are several suggestions for improving the linear regression model:
- Increasing the sample size may increase the precision of the parameter estimates.
- Including pre-pregnancy maternal BMI, or BMI in the first trimester, as a predictor variable. BMI can
reflect maternal nutritional status, which is critical to assuring a constant supply of nutrients to the
developing fetus.
- Improving the study design used to predict birth weight. There may be a third (mediating) variable
at work, and only a randomized controlled trial can establish cause and effect.

QUESTION 2.a
This study aimed to predict low birth weight based on the mother's characteristics. The outcome
variable is whether the newborn baby is of low birth weight (<2500 g) or not (>=2500 g). The predictor
variables are mother's age, mother's weight, smoking status, number of previous premature babies, history
of hypertension, and the total number of GP visits during the first trimester of pregnancy. Firstly, we
need to plot and tabulate the data before applying the statistical model.

Fig 12: The box plot of age of mother in years by low birth weight

Based on Fig 12:

- Subjectively, the box plots of the mother's age for the low birth weight group (<2500 g) and the
birth weight >=2500 g group both show an asymmetrical shape. However, the medians of the two groups
are similar. The summary statistics below can confirm the results from the box plot.

Table 3: Summary statistics: age of mother in years by low birth weight
Birth weight  min  Q1  median  Q3  max  mean      sd        n    missing
>=2500 g      14   19  23      28  45   23.82911  5.799399  158  0
<2500 g       14   20  23      25  34   22.83117  4.569356  77   0

Based on Table 3:
- The summary statistics show that the median and minimum values of the mother's age are the same
in the low birth weight group (<2500 g) and the birth weight >=2500 g group. The Q3, maximum, and
mean values are higher in the birth weight >=2500 g group, whereas Q1 is slightly lower (19 vs 20
years). These results are in line with the interpretation of the box plot.

Fig 13: the cross tabulation of the categorical variables smoking status and newborn baby's birth weight
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ===================================================
## lbw$smoke
## lbw$bwcat nonsmoker smoker Total
## ---------------------------------------------------
## birthweight >= 2500g 110 48 158
## 73.8% 55.8%
## ---------------------------------------------------
## birthweight<2500g 39 38 77
## 26.2% 44.2%
## ---------------------------------------------------
## Total 149 86 235
## 63.4% 36.6%
## ===================================================

Based on Fig 13:
- The column percentages for non-smoking mothers show that 73.8% had a newborn baby with birth weight
>=2500 g, while 26.2% had a baby with birth weight <2500 g.
- The column percentages for mothers who smoke show that 55.8% had a newborn baby with birth weight
>=2500 g, while 44.2% had a baby with birth weight <2500 g.
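The column percentages in Fig 13 are each cell count divided by its column total. The Python sketch below recomputes them from the counts and, as an extra illustration that is not part of the original output, derives the crude (unadjusted) odds ratio of low birth weight for smokers versus non-smokers. The dictionary layout and key names are made up for this example.

```python
# Cell counts copied from the cross tabulation in Fig 13.
counts = {
    "nonsmoker": {"ge2500": 110, "lt2500": 39},
    "smoker": {"ge2500": 48, "lt2500": 38},
}

# Column percentage = cell count / column total * 100
col_pct = {}
for group, cells in counts.items():
    total = sum(cells.values())
    col_pct[group] = {cat: 100.0 * n / total for cat, n in cells.items()}

# Odds of low birth weight within each smoking group, and their ratio.
odds = {group: cells["lt2500"] / cells["ge2500"] for group, cells in counts.items()}
crude_or = odds["smoker"] / odds["nonsmoker"]

for group in counts:
    print(f"{group:10s} >=2500 g: {col_pct[group]['ge2500']:.1f}%  "
          f"<2500 g: {col_pct[group]['lt2500']:.1f}%")
print(f"crude odds ratio (smoker vs non-smoker): {crude_or:.2f}")
```

An odds ratio above 1 means the odds of low birth weight are higher among smokers in this sample; whether the difference is statistically significant is a question for the logistic regression.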

Fig 14: The box plot of mother with history of previous premature babies by low birth weight

Based on Fig 14:

- Subjectively, the box plots of the number of previous premature babies for the low birth weight
group (<2500 g) and the birth weight >=2500 g group show an asymmetrical shape. However, the
medians of the two groups are similar. The summary statistics below can confirm the results from
the box plot.

Table 4: Summary statistics: number of previous premature babies by low birth weight
Birth weight  min  Q1  median  Q3  max  mean       sd         n    missing
>=2500 g      0    0   0       0   3    0.1202532  0.4276673  158  0
<2500 g       0    0   0       1   2    0.3376623  0.5527536  77   0

Based on Table 4:
- The summary statistics show that the median and minimum numbers of previous premature babies
are the same in the low birth weight group (<2500 g) and the birth weight >=2500 g group.
The maximum number of previous premature babies is higher in the birth weight >=2500 g group
(3 vs 2). However, the mean is higher in the low birth weight group (0.34 vs 0.12). These results
also confirm the interpretation of the box plot.

Fig 15: the box plot of the number of GP visits by low birth weight

Based on Fig 15:
- The box plots of the number of GP visits during the first trimester for the low birth weight
group (<2500 g) and the birth weight >=2500 g group show somewhat asymmetrical distributions.
Furthermore, the median for the birth weight >=2500 g group is higher than for the low birth weight group.
However, the third quartile is the same in both groups. The summary statistics below can confirm
the results from the box plot.

Table 5: Summary statistics: number of visits to GP during first trimester by low birth weight
Birth weight  min  Q1  median  Q3  max  mean       sd         n    missing
>=2500 g      0    0   1       1   6    0.8607595  1.0495274  158  0
<2500 g       0    0   0       1   4    0.6623377  0.9815454  77   0

Based on table 5:
- The summary statistics show that the median value for birth weight (>=2500 g) is higher than low
birth weight (<2500 g). Moreover, the maximum and mean values of the number of visits to GP are
higher in birth weight (>=2500 g). These results also confirm the interpretation from the box plot.

Fig 16: The box plot of mother’s weight at last menstrual period by low birth weight

Based on Fig 16:

- The box plot of the mother's weight at the last menstrual period has an asymmetrical shape for the
birth weight >=2500 g group, while the low birth weight group (<2500 g) is only slightly asymmetrical.
From the box plot, the median for the birth weight >=2500 g group is higher than for the low birth
weight group. The summary statistics below can confirm the results from the box plot.

Table 6: Summary statistics: mother's weight (kg) at last menstrual period by low birth weight
Birth weight  min       Q1        median    Q3        max        mean      sd        n    missing
>=2500 g      38.55540  52.16319  56.01872  67.69874  113.39822  60.74413  14.29398  158  0
<2500 g       36.28743  47.62725  54.43115  59.87426  90.71858   56.22196  12.76118  77   0

Based on table 6:
- The summary statistics show that the median value for birth weight (>=2500 g) is higher than low
birth weight (<2500 g). Moreover, the maximum, minimum, Q1, Q3 and mean values of mother’s
weight at last menstrual period are higher in birth weight (>=2500 g). These results also confirm the
interpretation from the box plot.

Fig 17: the cross-tabulation of the categorical variables hypertension history and newborn baby's birth
weight
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ===============================================================================
## lbw$hypercat
## lbw$bwcat hypertension history no hypertension history Total
## -------------------------------------------------------------------------------
## birthweight >= 2500g 6 152 158
## 35.3% 69.7%
## -------------------------------------------------------------------------------
## birthweight<2500g 11 66 77
## 64.7% 30.3%
## -------------------------------------------------------------------------------
## Total 17 218 235
## 7.2% 92.8%
## ===============================================================================

Based on Fig 17:

- The column percentages for mothers with a history of hypertension show that 35.3% had a newborn baby
with birth weight >=2500 g, while 64.7% had a baby with birth weight <2500 g.
- The column percentages for mothers with no history of hypertension show that 69.7% had a newborn
baby with birth weight >=2500 g, while 30.3% had a baby with birth weight <2500 g.

QUESTION 2.b

1. Building the logistic regression model from each of the predictor variables

Fig 18: R output for the logistic regression of low birth weight on the mother's age
##
## Call:
## glm(formula = lbw$low ~ lbw$age, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0161 -0.9213 -0.8320 1.4416 1.6628
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.09915 0.63136 0.157 0.875
## lbw$age -0.03508 0.02662 -1.318 0.188
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 295.49 on 233 degrees of freedom
## AIC: 299.49
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -1.13829618 1.33658818
## lbw$age -0.08725216 0.01709816

## OR 2.5 % 97.5 %
## (Intercept) 1.1042275 0.3203644 3.806036
## lbw$age 0.9655311 0.9164460 1.017245

Based on Fig 18:

- For every additional year of the mother's age, the odds of low birth weight are multiplied by 0.97,
that is, they are reduced by about 3%.
- However, the 95% confidence interval for the odds ratio (0.92 to 1.02) includes 1.0, and the p-value
is 0.19 (greater than 0.05). Age is therefore not a statistically significant predictor and is not
included in the final logistic regression model.
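The odds ratio and its confidence limits in Fig 18 are the exponentials of the log-odds coefficient and of its confidence limits. The Python sketch below (variable names are illustrative) reproduces the OR row for age from the numbers printed in the R output:

```python
import math

# Log-odds coefficient for age and its 95% confidence limits, copied from Fig 18.
coef = -0.03508
ci_lower, ci_upper = -0.08725216, 0.01709816

# Exponentiating moves from the log-odds scale to the odds-ratio scale.
odds_ratio = math.exp(coef)
or_lower, or_upper = math.exp(ci_lower), math.exp(ci_upper)

# Percentage change in the odds per additional year of age.
pct_change = (odds_ratio - 1.0) * 100.0

print(f"OR = {odds_ratio:.3f} (95% CI {or_lower:.3f} to {or_upper:.3f})")
print(f"change in odds per year: {pct_change:.1f}%")
```

Because the interval on the odds-ratio scale contains 1 (equivalently, the interval on the log-odds scale contains 0), the data are compatible with no association between age and low birth weight.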

Fig 18a: The R output shows the AUCs in low birth weight based on the mother's age
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred1, ci = TRUE)
##
## Data: lbw1$lowpred1 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.5298
## 95% CI: 0.4536-0.6059 (DeLong)

Fig 18b: ROC Curve in low birth weight based on the mother’s age

Based on Fig 18a and 18b:
- The AUC value is 0.5298 (95% CI: 0.4536-0.6059). The interval includes 0.5, so we can conclude that
the mother's age has very poor predictive ability for low birth weight.
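
To make the AUC easier to read, it can be thought of as the share of (case, control) pairs in which the case receives the higher predicted probability, with ties counted as one half. The sketch below illustrates that definition in Python with made-up scores; the function name `auc_from_scores` and the data are hypothetical and are not taken from the lbw data or from the pROC package.

```python
def auc_from_scores(control_scores, case_scores):
    """Share of (case, control) pairs where the case scores higher; ties count 0.5."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up predicted probabilities, for illustration only.
controls = [0.2, 0.3, 0.4]
cases = [0.3, 0.5]
auc = auc_from_scores(controls, cases)
print(auc)
```

An AUC of 0.5 means the model ranks cases above controls no more often than a coin toss would, which is why a value of 0.53 is read as very poor discrimination.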

Fig 19: R output shows the logistic regression results of low birth weight on the mother's weight at the
last menstrual period.
##
## Call:
## glm(formula = lbw$low ~ lbw$lweight, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0944 -0.9291 -0.8124 1.3546 1.8768
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.81738 0.67293 1.215 0.2245
## lbw$lweight -0.02634 0.01148 -2.296 0.0217 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 291.40 on 233 degrees of freedom
## AIC: 295.4
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -0.50155125 2.136302228
## lbw$lweight -0.04883783 -0.003851365

## OR 2.5 % 97.5 %
## (Intercept) 2.2645487 0.6055905 8.468067
## lbw$lweight 0.9739994 0.9523356 0.996156

Based on the Fig 19:

- The odds ratio is 0.97, so the odds of low birth weight are reduced by about 3% for every additional
kilogram of the mother's weight at the last menstrual period.
- The 95% Confidence Interval from 0.952 to 0.996 lies just below 1.0 and does not include it, which
agrees with the p-value of 0.02 (less than 0.05). The mother's weight at the last menstrual period is
therefore statistically significant: the effect per kilogram is small, but it is still significant.
- The logistic regression assumes that the relationship between the mother's weight at the last
menstrual period and the log-odds of having a low birth weight is linear.
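
The z value and p-value in Fig 19 follow directly from the estimate and its standard error. As a rough check, sketched in Python rather than R, the Wald statistic is the estimate divided by the standard error, and the two-sided p-value comes from the standard normal distribution.

```python
import math

# Estimate and standard error for lbw$lweight, copied from Fig 19.
estimate = -0.02634
std_error = 0.01148

# Wald z statistic.
z = estimate / std_error

# Two-sided p-value: 2 * (1 - Phi(|z|)), written with the complementary error function.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The small differences from the printed output arise because the inputs here are already rounded.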

Figure 19a: The R output shows the AUCs in low birth weight based on the mother's weight at the last
menstrual period.

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred2, ci = TRUE)
##
## Data: lbw1$lowpred2 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.6011
## 95% CI: 0.5211-0.6811 (DeLong)

Figure 19b: ROC Curve in low birth weight based on the mother's weight at the last menstrual period.

Based on Fig 19a and 19b:

- The AUC value is 0.6011 (95% CI: 0.5211-0.6811), so we can conclude that the mother's weight at the
last menstrual period has poor predictive ability for low birth weight.

The model in Fig 19 assumes that the relationship between the mother's weight at her last menstrual period
and the log-odds of having a low birth weight is linear. To check this assumption, we add a squared term for
the continuous variable mother's weight to the regression, predict the probabilities, and plot the
probabilities against the mother's weight. The R outputs are below.

Fig 19c: R output shows the squared term in regression for the continuous variable mother's weight
##
## Call:
## glm(formula = lbw$low ~ lbw$lweight + lbw$lweight2, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2126 -0.9103 -0.7695 1.3182 1.7976
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.9940510 2.2135834 1.353 0.176
## lbw$lweight -0.0955843 0.0676572 -1.413 0.158
## lbw$lweight2 0.0005204 0.0004950 1.051 0.293
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 290.37 on 232 degrees of freedom
## AIC: 296.37
##
## Number of Fisher Scoring iterations: 4

## OR 2.5 % 97.5 %
## (Intercept) 19.9664026 0.2606719 1529.344725
## lbw$lweight 0.9088418 0.7959731 1.037715
## lbw$lweight2 1.0005206 0.9995503 1.001492

Based on the Fig 19c:

- The squared term of the mother's weight has a p-value of 0.293 (greater than 0.05), indicating that it
is not statistically significant. As a result, the squared term of the mother's weight will not be
included in the final model, which keeps only the linear term for the mother's weight.
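
The AIC values in Fig 19 and Fig 19c point the same way. For a binary (0/1) outcome the residual deviance equals minus twice the log-likelihood, so the AIC is the residual deviance plus twice the number of estimated parameters. The short Python sketch below only re-derives the printed AIC values from the printed deviances; the helper name `aic_binary` is made up for this illustration.

```python
def aic_binary(residual_deviance, n_parameters):
    """AIC for an ungrouped binary logistic model: deviance + 2 * parameters."""
    return residual_deviance + 2 * n_parameters


# Residual deviances copied from Fig 19 (linear) and Fig 19c (linear + squared).
aic_linear = aic_binary(291.40, 2)   # intercept + weight
aic_squared = aic_binary(290.37, 3)  # intercept + weight + weight squared

print(f"AIC linear = {aic_linear:.2f}, AIC squared = {aic_squared:.2f}")
```

The model with only the linear term has the lower AIC, which supports leaving the squared term out.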

To better understand the relationship between the low birth weight and the mother's weight, calculate
the probability of low birth weight and plot it against the mother's weight.

Fig 19d: the plot for the probabilities against the mother's weight.

Based on the Fig 19d:

- The predicted probability of having a low birth weight is highest when the mother's weight at the last
menstrual period is around 40 kg. The probability decreases until the mother's weight is approximately
90 kg. Beyond this point, the probability of having a low birth weight tends to increase.
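
The shape described above can be checked against the coefficients in Fig 19c. The Python sketch below assumes that Fig 19d was drawn from the squared-term model; it converts the log-odds to a probability with the inverse logit and locates the turning point of the quadratic at -b1 / (2 × b2).

```python
import math

# Coefficients copied from Fig 19c (intercept, weight, weight squared).
b0, b1, b2 = 2.9940510, -0.0955843, 0.0005204


def predicted_probability(weight_kg):
    """Predicted probability of low birth weight from the squared-term model."""
    log_odds = b0 + b1 * weight_kg + b2 * weight_kg ** 2
    return 1.0 / (1.0 + math.exp(-log_odds))


# Weight at which the fitted log-odds (and hence the probability) is lowest.
turning_point = -b1 / (2 * b2)

for weight in (40, 60, 90, 110):
    print(weight, round(predicted_probability(weight), 3))
print("turning point (kg):", round(turning_point, 1))
```

The turning point falls close to 90 kg, in line with the plot. Since the squared term is not statistically significant, the upturn at high weights should be read with caution.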

Fig 19e: the plot for predicting the probabilities of low birth weight against the mother's weight using the
cubed term.

Based on the Fig 19e:

- The probability of having a low birth weight peaks when the mother's weight at the last menstrual
period is 40 kg and then declines until the mother weighs approximately 90 kg. From this point, the
chance of having a low birth weight baby begins to rise.

Figures 19d and 19e do not seem to differ much from each other. However, there is a slight difference after
adding the cubed term to predict the probabilities. The R output for the regression on the cubed term of the
mother's weight is below.

Fig 19f: The R output shows the cubed term in regression for the continuous variable mother's weight
##
## Call:
## glm(formula = LBW ~ +I(MLWEIGHT^2) + I(MLWEIGHT^3), family = binomial,
## data = lbwcomp)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1220 -0.9245 -0.7924 1.3414 1.8401
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.707e-01 7.967e-01 0.716 0.474
## I(MLWEIGHT^2) -6.367e-04 5.188e-04 -1.227 0.220
## I(MLWEIGHT^3) 4.258e-06 4.726e-06 0.901 0.368
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 291.49 on 232 degrees of freedom
## AIC: 297.49
##
## Number of Fisher Scoring iterations: 4

Based on the Fig 19f:

- The p-value for the cubed term of the mother's weight is 0.368 (greater than 0.05), so the cubed term
is not statistically significant. This result also agrees with the plots: the predicted probabilities
of low birth weight using the cubed term are not much different from the predicted probabilities
without it.

Fig 20: R output shows the logistic regression results of low birth weight on the number of previous
premature babies.
##
## Call:
## glm(formula = lbw$low ~ lbw$prev.prem, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9699 -0.8237 -0.8237 1.1814 1.5785
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.9066 0.1551 -5.844 5.11e-09 ***
## lbw$prev.prem 0.8972 0.2962 3.029 0.00245 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 287.28 on 233 degrees of freedom
## AIC: 291.28
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -1.2106966 -0.6025278
## lbw$prev.prem 0.3167052 1.4777617

## OR 2.5 % 97.5 %
## (Intercept) 0.4038902 0.2979896 0.5474261
## lbw$prev.prem 2.4528080 1.3725979 4.3831241

Based on the Fig 20:

- The odds ratio is 2.45, so the odds of low birth weight are multiplied by about 2.5 for each additional
previous premature baby.
- Moreover, we can be 95% confident that the odds ratio lies between 1.4 and 4.4, with the best
estimate being 2.5. Also, the p-value is 0.002 (p-value < 0.05). Thus, we can conclude that the number
of previous premature babies will be included in the final model.
Fig 20a: The R output shows the AUCs in low birth weight based on the number of having a history of
previous premature babies.
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred3, ci = TRUE)
##
## Data: lbw1$lowpred3 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.6029
## 95% CI: 0.5469-0.6588 (DeLong)

Fig 20b: ROC Curve in low birth weight based on the number of having a history of previous premature
babies.

Based on Fig 20a and 20b:
- The AUC value is 0.6029 (95% CI: 0.5469-0.6588), so we can conclude that the number of previous
premature babies has poor predictive ability for low birth weight.

Fig 21: R output shows the logistic regression results of low birth weight on whether the mother smoked
during pregnancy.
##
## Call:
## glm(formula = lbw$low ~ lbw$smoke, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0799 -0.7791 -0.7791 1.2781 1.6373
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0369 0.1864 -5.564 2.64e-08 ***
## lbw$smokesmoker 0.8033 0.2861 2.807 0.005 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 289.37 on 233 degrees of freedom
## AIC: 293.37
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -1.402187 -0.6716502
## lbw$smokesmoker 0.242463 1.3641447

## OR 2.5 % 97.5 %
## (Intercept) 0.3545455 0.2460582 0.5108648
## lbw$smokesmoker 2.2329060 1.2743841 3.9123755

Based on the Fig 21:

- The odds ratio is estimated as 2.23, suggesting that smokers (coded as 1) have higher odds of low
birth weight than non-smokers.
- Moreover, we can be 95% confident that the odds ratio lies between 1.27 and 3.91, with the best
estimate being 2.23. Also, the p-value is 0.005 (p-value < 0.05). Thus, we can conclude that the
variable for smoking status will be included in the final model.
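
An odds ratio is not the same as a ratio of risks, so it can help to translate the coefficients in Fig 21 into predicted probabilities for each group. The Python sketch below does this with the inverse logit; it is an illustration based on the printed coefficients, not part of the original R analysis.

```python
import math

# Coefficients copied from Fig 21.
intercept = -1.0369    # log-odds of low birth weight for non-smokers
smoker_coef = 0.8033   # change in log-odds for smokers


def inverse_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


p_non_smoker = inverse_logit(intercept)
p_smoker = inverse_logit(intercept + smoker_coef)

# Recover the odds ratio from the two probabilities as a consistency check.
odds_ratio = (p_smoker / (1 - p_smoker)) / (p_non_smoker / (1 - p_non_smoker))

print(f"non-smokers: {p_non_smoker:.3f}, smokers: {p_smoker:.3f}, OR: {odds_ratio:.2f}")
```

In this unadjusted model the predicted probability of low birth weight is roughly 26% for non-smokers and 44% for smokers, so the odds are 2.23 times higher while the probability itself is about 1.7 times higher.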

Fig 21a: The R output shows the AUCs in low birth weight based on the mother's smoking status during
pregnancy.
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred4, ci = TRUE)
##
## Data: lbw1$lowpred4 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.5949
## 95% CI: 0.5281-0.6616 (DeLong)

Fig 21b: ROC Curve in low birth weight based on the mother's smoking status during pregnancy.

Based on Fig 21a and 21b:
- The AUC value is 0.5949 (95% CI: 0.5281-0.6616), so we can conclude that the mother's smoking status
during pregnancy has poor predictive ability for low birth weight.

Fig 22: R output shows the results of the logistic regression of low birth weight on the mother's
history of hypertension.
##
## Call:
## glm(formula = lbw$low ~ lbw$hyper.tension, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4432 -0.8492 -0.8492 1.5459 1.5459
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8342 0.1474 -5.659 1.52e-08 ***
## lbw$hyper.tension 1.4404 0.5285 2.725 0.00642 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 289.42 on 233 degrees of freedom
## AIC: 293.42
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -1.1231490 -0.5453026
## lbw$hyper.tension 0.4045319 2.4761913

## OR 2.5 % 97.5 %
## (Intercept) 0.4342105 0.325254 0.5796663
## lbw$hyper.tension 4.2222222 1.498601 11.8958696

Based on the Fig 22:

- The odds ratio is estimated as 4.2, suggesting that mothers with a history of hypertension (coded as 1)
have higher odds of low birth weight.
- Moreover, we can be 95% confident that the odds ratio lies between 1.5 and 11.9, with the best estimate
being 4.2. The interval is wide, so the estimate is imprecise.
- The p-value is 0.006 (p-value < 0.05). Thus, we can conclude that the variable for the mother's history
of hypertension will be included in the final model.

Fig 22a: The R output shows the AUCs in low birth weight based on the mother's history of hypertension.
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred5, ci = TRUE)
##
## Data: lbw1$lowpred5 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.5524
## 95% CI: 0.5104-0.5945 (DeLong)

Fig 22b: ROC Curve in low birth weight based on the mother's history of hypertension.

Based on Fig 22a and 22b:
- The AUC value is 0.5524 (95% CI: 0.5104-0.5945), so we can conclude that the history of hypertension
has very poor predictive ability for low birth weight.

Fig 23: R output shows the logistic regression results of low birth weight on the mother's
number of GP visits during the 1st trimester.
##
## Call:
## glm(formula = lbw$low ~ lbw$gp.visits, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9481 -0.9481 -0.8732 1.4255 1.7875
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.5666 0.1743 -3.251 0.00115 **
## lbw$gp.visits -0.2012 0.1459 -1.379 0.16779
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 295.26 on 233 degrees of freedom
## AIC: 299.26
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -0.9081690 -0.22498644
## lbw$gp.visits -0.4870886 0.08469505

## OR 2.5 % 97.5 %
## (Intercept) 0.5674641 0.4032619 0.7985271
## lbw$gp.visits 0.8177515 0.6144126 1.0883851

Based on the Fig 23:

- The odds ratio is 0.82, which means the odds of low birth weight are reduced by about 18% for every
additional GP visit during the 1st trimester.
- However, the 95% confidence interval for the odds ratio, from 0.61 to 1.09, includes 1.0, with the
best estimate being 0.82.
- The p-value is 0.17 (p-value > 0.05). Thus, we can conclude that the variable for number of GP visits
during the 1st trimester is not included in the final model.

Fig 23a: The R output shows the AUCs in low birth weight based on the number of GP visits during the 1st
trimester.
##
## Call:
## roc.formula(formula = lbw1$LBWO ~ lbw1$lowpred6, ci = TRUE)
##
## Data: lbw1$lowpred6 in 158 controls (lbw1$LBWO 0) < 77 cases (lbw1$LBWO 1).
## Area under the curve: 0.5629
## 95% CI: 0.4907-0.6351 (DeLong)

Fig 23b: ROC Curve in low birth weight based on the number of GP visits during the 1st trimester.

Based on Fig 23a and 23b:
- The AUC value is 0.5629 (95% CI: 0.4907-0.6351). The interval includes 0.5, so we can conclude that the
number of GP visits during the 1st trimester has very poor predictive ability for low birth weight.

Based on the logistic regression for each predictor variable, the predictor variables for the final model
are the mother's weight at the last menstrual period, smoking status, the number of previous preterm babies,
and a history of hypertension. The mother's age and the number of GP visits during the 1st trimester are
excluded since their p-values are greater than 0.05. Moreover, the AUC results also confirm that both
variables have very poor predictive ability for low birth weight.

2. Logistic Regression Model for Low Birth Weight

Fig 24: R output shows the final logistic regression results of the independent variables to predict low birth
weight.
##
## Call:
## glm(formula = lbw$low ~ +lbw$lweight + lbw$prev.prem + lbw$smoke +
## lbw$hyper.tension, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0742 -0.7933 -0.6845 1.0492 2.2463
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.77094 0.75012 1.028 0.304062
## lbw$lweight -0.03539 0.01278 -2.769 0.005623 **
## lbw$prev.prem 0.71046 0.31414 2.262 0.023723 *
## lbw$smokesmoker 0.65009 0.30609 2.124 0.033679 *
## lbw$hyper.tension 2.07203 0.61500 3.369 0.000754 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 297.28 on 234 degrees of freedom
## Residual deviance: 265.36 on 230 degrees of freedom
## AIC: 275.36
##
## Number of Fisher Scoring iterations: 4

## 2.5 % 97.5 %
## (Intercept) -0.69926096 2.24114566
## lbw$lweight -0.06043663 -0.01033931
## lbw$prev.prem 0.09475336 1.32615793
## lbw$smokesmoker 0.05017821 1.25001109
## lbw$hyper.tension 0.86666366 3.27740361

## OR 2.5 % 97.5 %
## (Intercept) 2.1618025 0.4969524 9.404099
## lbw$lweight 0.9652309 0.9413534 0.989714
## lbw$prev.prem 2.0349182 1.0993877 3.766544
## lbw$smokesmoker 1.9157221 1.0514585 3.490382
## lbw$hyper.tension 7.9409557 2.3789606 26.506861

Based on the Fig 24:
- The logistic regression output indicates that the mother's weight at the last menstrual period,
smoking status, the number of previous preterm babies, and a history of hypertension are all
statistically significant predictors in this model.
- Mother's weight (p value = 0.006), smoking status (smoker) (p value = 0.03), the number of previous
premature babies (p value = 0.02), and a history of hypertension (p value < 0.001) are significant.

The results from the final model imply that the mother's weight at the last menstrual period, smoking
status, the number of previous preterm babies and a history of hypertension are variables that can be
considered predictors of low birth weight. These variables are included after excluding the mother's age and
the number of GP visits. Also, from the single-variable AUC values, the number of previous preterm babies
predicts best, followed by the mother's weight, smoking status, and history of hypertension. Therefore, we
can predict low birth weight from the final model using the equation below, which uses the coefficient
estimates from Fig 24 (the odds ratios are the exponentials of these coefficients).
log-odds(low birth weight) = 0.771 - 0.035 x the mother's weight at the last menstrual period +
0.710 x the number of previous preterm babies + 0.650 x smoking status + 2.072 x history
of hypertension

where the indicator variable smoking status takes the value 1 for smokers and 0 for non-smokers, and the
indicator variable history of hypertension takes the value 1 for mothers with such a history and 0 otherwise
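
To show how the final model turns into a predicted probability, the Python sketch below plugs the coefficient estimates from Fig 24 into the inverse logit. The two example mothers (60 kg, with and without smoking and hypertension) are hypothetical and chosen only for illustration, and the weight is taken to be in kilograms as stated in this report.

```python
import math

# Coefficient estimates copied from Fig 24 (log-odds scale).
COEFFICIENTS = {
    "intercept": 0.77094,
    "weight_kg": -0.03539,
    "previous_preterm": 0.71046,
    "smoker": 0.65009,
    "hypertension": 2.07203,
}


def predicted_probability(weight_kg, previous_preterm, smoker, hypertension):
    """Predicted probability of low birth weight; smoker and hypertension are 0 or 1."""
    log_odds = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["weight_kg"] * weight_kg
        + COEFFICIENTS["previous_preterm"] * previous_preterm
        + COEFFICIENTS["smoker"] * smoker
        + COEFFICIENTS["hypertension"] * hypertension
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical mothers, for illustration only.
low_risk = predicted_probability(60, 0, 0, 0)
high_risk = predicted_probability(60, 0, 1, 1)
print(f"{low_risk:.3f} {high_risk:.3f}")
```

Under these assumptions the predicted probability rises from about 0.21 to about 0.80 when the same mother is a smoker with a history of hypertension.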

QUESTION 2.c

- The odds ratio for the mother's weight at the last menstrual period = 0.97, 95% CI: 0.94 to 0.99.

The odds of low birth weight are multiplied by 0.97, which means the odds of low birth weight are
reduced by about 3% for each additional kilogram of the mother's weight. The 95% confidence interval
shows that the reduction in odds lies between 1% and 6% for each kilogram of the mother's weight. The
95% CI from 0.94 to 0.99 does not include 1.0, and the p-value is 0.006, indicating that the mother's
weight at the last menstrual period is a helpful predictor of low birth weight in addition to
smoking status, the number of previous preterm babies and having a history of hypertension.

- The odds ratio for the number of previous preterm babies = 2.03, 95% CI: 1.1 to 3.8.

The odds of low birth weight are multiplied by 2.03 for each additional previous preterm baby, so each
previous preterm baby roughly doubles the odds of low birth weight. The 95% confidence interval for
this odds ratio lies between 1.1 and 3.8 and hence does not include unity. The p-value is less than
0.05 (p value = 0.02), indicating that the number of previous preterm babies is a helpful predictor of
low birth weight in addition to the mother's weight at the last menstrual period, smoking status, and
having a history of hypertension.

- The odds ratio for smokers = 1.92, 95% CI: 1.05 to 3.49.

The odds of low birth weight are multiplied by 1.92, which means that the odds of low birth weight
for a mother who smokes are about 1.92 times those for a non-smoker. The 95% confidence interval for
the odds ratio in smokers is 1.05 to 3.49 and hence does not include unity. The p-value is less than
0.05 (p value = 0.03), indicating that smoking status is a helpful predictor of low birth weight in
addition to the mother's weight, the number of previous preterm babies, and having a history of
hypertension.

- The odds ratio for having a history of hypertension = 7.94, 95% CI: 2.4 to 26.5.

The odds of low birth weight are multiplied by 7.94, which means that the odds of low birth weight
for a mother with a history of hypertension are about 7.94 times those for a mother without one. The
95% confidence interval for the odds ratio for a history of hypertension is 2.4 to 26.5 and hence does
not include unity, although it is wide. The p-value is less than 0.05 (p value < 0.001), indicating
that having a history of hypertension is a helpful predictor of low birth weight in addition to the
mother's weight at the last menstrual period, the number of previous preterm babies, and smoking status.
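
The percentage statements above come from subtracting 1 from the odds ratio. The Python sketch below shows that arithmetic for the mother's weight, using the odds ratio and limits printed in Fig 24, and also scales the effect to a 10 kg difference. The 10 kg figure is an extrapolation that assumes the effect is linear on the log-odds scale; it is added here for illustration and does not appear in the R output.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio (negative = reduction)."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratio and 95% limits for lbw$lweight, copied from Fig 24.
or_weight, ci_low, ci_high = 0.9652309, 0.9413534, 0.989714

change = percent_change_in_odds(or_weight)
change_low = percent_change_in_odds(ci_low)
change_high = percent_change_in_odds(ci_high)

# Odds ratio for a 10 kg difference, assuming linearity on the log-odds scale.
or_per_10kg = or_weight ** 10

print(f"{change:.1f}% ({change_low:.1f}% to {change_high:.1f}%), per 10 kg: {or_per_10kg:.2f}")
```

A reduction of about 3.5% per kilogram therefore corresponds to the odds being multiplied by roughly 0.70 for a mother who is 10 kg heavier.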

However, significant variables do not necessarily mean that a model is good. The goodness of fit test
can help to determine this.

Fig 25: R output shows the results of Hosmer-Lemeshow Statistic (H-L)

> logitgof(lbwcomp2$LBW,lbwcomp2$lbwpred2,g=8)
## Hosmer and Lemeshow test (binary model)
##
## data: lbwcomp2$LBW, lbwcomp2$lbwpred2
## X-squared = 9.8381, df = 6, p-value = 0.1316

> logitgof(lbwcomp2$LBW,lbwcomp2$lbwpred2,g=10)
## Hosmer and Lemeshow test (binary model)
##
## data: lbwcomp2$LBW, lbwcomp2$lbwpred2
## X-squared = 10.456, df = 8, p-value = 0.2345

> logitgof(lbwcomp2$LBW,lbwcomp2$lbwpred2,g=12)
## Hosmer and Lemeshow test (binary model)
##
## data: lbwcomp2$LBW, lbwcomp2$lbwpred2
## X-squared = 14.742, df = 10, p-value = 0.1418

> logitgof(lbwcomp2$LBW,lbwcomp2$lbwpred2,g=14)
## Hosmer and Lemeshow test (binary model)
##
## data: lbwcomp2$LBW, lbwcomp2$lbwpred2
## X-squared = 17.993, df = 12, p-value = 0.1159

Based on the Fig 25:

- The Hosmer-Lemeshow tests give p-values greater than 0.05 for every number of groups tried (g = 8, 10,
12 and 14), so there is no evidence of lack of fit. We can conclude that the probabilities predicted by
the final model agree reasonably well with the observed low birth weight outcomes.
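
In Fig 25 the degrees of freedom are the number of groups minus 2, and each p-value is the upper tail of a chi-squared distribution. As an illustrative check in Python (the tests themselves were run with logitgof in R), the sketch below uses the closed-form tail probability that holds when the degrees of freedom are even, which is the case for all four runs. The function name is made up for this sketch.

```python
import math


def chi_square_upper_tail_even_df(statistic, df):
    """P(X > statistic) for a chi-squared variable with positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = statistic / 2.0
    total = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * total


# (number of groups g, X-squared) pairs copied from Fig 25; df = g - 2.
runs = [(8, 9.8381), (10, 10.456), (12, 14.742), (14, 17.993)]
p_values = {g: chi_square_upper_tail_even_df(x, g - 2) for g, x in runs}

for g, p in p_values.items():
    print(g, round(p, 4))
```

All four p-values stay above 0.05, so the conclusion does not depend on the number of groups chosen.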

Fig 26: R output shows the Area under the Receiver Operator Characteristic (ROC) Curve
##
## Call:
## roc.default(response = lbwcomp2$LBW, predictor = lbwcomp2$lbwpred2, ci = TRUE)
##
## Data: lbwcomp2$lbwpred2 in 158 controls (lbwcomp2$LBW 0) < 77 cases (lbwcomp2$LBW 1).
## Area under the curve: 0.7212
## 95% CI: 0.6485-0.7938 (DeLong)

Based on the Fig 26:
- The AUC value for the final model is 0.7212 (95% CI: 0.6485-0.7938). It can be concluded that the final
model has good predictive ability for low birth weight. This finding is consistent with the results
from the Hosmer-Lemeshow Statistic.

QUESTION 2.d
The likelihood ratio test is used to explore interaction effects within the regression model.
1. Test for interaction between smoking status and age

Fig 27a: R output shows the likelihood ratio test for smoking status and age
## Likelihood ratio test
##
## Model 1: lbw$low ~ lbw$age + lbw$smoke + lbw$age:lbw$smoke
## Model 2: lbw$low ~ lbw$age + lbw$smoke
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -142.87
## 2 3 -143.96 -1 2.1799 0.1398

Based on the Fig 27a:

- The interaction term in the model is not statistically significant since the p-value equals 0.1398,
which is greater than 0.05. Therefore, there is no evidence of an interaction between smoking status
and age, and no stratified subgroup analyses are needed.

2. Test for interaction between hypertension and age

Fig 27b: R output shows the likelihood ratio test for hypertension and age
## Likelihood ratio test
##
## Model 1: lbw$low ~ lbw$age + lbw$hyper.tension + lbw$age:lbw$hyper.tension
## Model 2: lbw$low ~ lbw$age + lbw$hyper.tension
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -142.37
## 2 3 -143.72 -1 2.7005 0.1003

Based on the Fig 27b:

- The interaction term in the model is not statistically significant since the p-value equals 0.1003,
which is greater than 0.05. Thus, there is no evidence of an interaction between hypertension and age,
and no stratified subgroup analyses are needed.
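
The likelihood ratio statistics in Fig 27a and Fig 27b can be reproduced from the printed log-likelihoods: the statistic is twice the difference between the log-likelihoods of the model with and without the interaction, compared with a chi-squared distribution on 1 degree of freedom. The Python sketch below is a hand check of that calculation, with a made-up function name, and is not the R code used for the analysis.

```python
import math


def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and p-value for one extra parameter (1 df)."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    # Upper tail of chi-squared with 1 df, written with the complementary error function.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Log-likelihoods copied from Fig 27a (smoking x age) and Fig 27b (hypertension x age).
smoke_stat, smoke_p = likelihood_ratio_test_1df(-142.87, -143.96)
hyper_stat, hyper_p = likelihood_ratio_test_1df(-142.37, -143.72)

print(f"smoking x age: {smoke_stat:.2f}, p = {smoke_p:.4f}")
print(f"hypertension x age: {hyper_stat:.2f}, p = {hyper_p:.4f}")
```

Both p-values are above 0.05, matching the R output to within the rounding of the printed log-likelihoods.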

No ratings yet
Chapter 14 Multiple Regression
28 pages
AI Lab File - C
No ratings yet
AI Lab File - C
52 pages
Chap 10 Regression Analysis
No ratings yet
Chap 10 Regression Analysis
68 pages
ML Supervised Regression
No ratings yet
ML Supervised Regression
70 pages
Population Growth Prediction
100% (2)
Population Growth Prediction
10 pages
Computers & Education: Daniel Darghan Felisoni, Alexandra Strommer Godoi Mark
No ratings yet
Computers & Education: Daniel Darghan Felisoni, Alexandra Strommer Godoi Mark
13 pages
Stat 362 Study Guide
No ratings yet
Stat 362 Study Guide
88 pages
The Practice of Statistics in the Life Sciences 3rd Edition Brigitte Baldi - The ebook is now available, just one click to start reading
100% (1)
The Practice of Statistics in the Life Sciences 3rd Edition Brigitte Baldi - The ebook is now available, just one click to start reading
47 pages

Fig 1: Histograms of the continuous variables

Based on Fig 2 and 3:

Based on Fig 4:

Based on Table 2:

Fig 5: R output showing the Pearson correlation coefficients

Based on the Pearson correlation coefficients in Fig 5:

Based on Fig 7a:

Based on Fig 7b:

H0 (null hypothesis): there is no relationship between the variables.

Fig 8: R output showing the multiple linear regression

Based on Fig 9:

Based on the final model,

Fig 11: Checking assumptions about the residuals

Based on Fig 11:

Fig 11: Histogram of residuals

Based on Fig 11:

Based on Fig 12:

Based on Fig 14:

Based on Fig 16:

Based on Fig 17:

Based on Fig 18:

Based on Fig 19:

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

Based on Fig 19a and 19b:

Based on Fig 19c:

Based on Fig 19d:

Based on Fig 19e:

Based on Fig 19f:

Based on Fig 20:

Based on Fig 21:

Based on Fig 22:

Based on Fig 23:

2. Logistic Regression Model for Low Birth Weight

Fig 25: R output showing the results of the Hosmer-Lemeshow (H-L) statistic

Based on Fig 25:

Based on Fig 27a:

2. Test for interaction between hypertension and age

Based on Fig 27b:
