T3. Logistic Regressions

Biostatistics Session 2022–2023 (Semester 2)

Tutorial 4
1. (ref Kleinbaum) Suppose you are interested in describing whether socioeconomic
status, as measured by a binary variable SOC (0 low, 1 high), is associated with
cardiovascular disease, as defined by a binary variable called CVD (0 no, 1 yes).
Suppose further that you have carried out a 12-year follow-up study of 200 men who
are 60 years old or older. In assessing the relationship between SOC and CVD, you
decide that you want to control for smoking (SMK, 0 no, 1 yes) and blood pressure
(SBP, continuous). Two logistic regression models are fitted: Model 1 includes the
interaction terms of SOC with SBP and SMK, and Model 2 omits them. The results are
as follows:
Variable      Model 1        Model 2
              coefficient    coefficient
constant      -1.18          -1.19
SOC           -0.52          -0.50
SBP            0.04           0.01
SMK           -0.56          -0.42
SOC × SBP     -0.033          -
SOC × SMK      0.175          -
(a) Give the study design.
(b) State for each of the two models the form of the estimated model in logit terms.
(c) Using model 1, compute the estimated risk for CVD for SOC=1, SMK=1 and
SBP=150.
(d) Using model 2, compute the estimated risk for CVD for person 1 with SOC=1,
SMK=1 and SBP=150 and for person 2 with SOC=0, SMK=1 and SBP=150.
(e) Compare the estimated risk in question c with the estimated risk for person 1
in question d. Why are they different?
(f) Using the results from question d, compute the risk ratio that compares person
1 with person 2. Interpret your answer.
(g) Formulate the null hypothesis in terms of parameters to test model 1 versus
model 2.
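
For parts (c), (d) and (f), the estimated risk follows from the fitted logit through the inverse logit, risk = 1 / (1 + exp(-logit)). The Python sketch below only checks this arithmetic with the coefficients from the table in question 1; the function and variable names are illustrative assumptions and not part of the tutorial.

```python
import math

# Coefficients copied from the table in question 1.
# Model 1 includes the SOC interactions; Model 2 omits them (set to 0 here).
MODEL_1 = {"const": -1.18, "SOC": -0.52, "SBP": 0.04, "SMK": -0.56,
           "SOC_SBP": -0.033, "SOC_SMK": 0.175}
MODEL_2 = {"const": -1.19, "SOC": -0.50, "SBP": 0.01, "SMK": -0.42,
           "SOC_SBP": 0.0, "SOC_SMK": 0.0}


def logit(model, soc, smk, sbp):
    """Linear predictor (log odds of CVD) for one covariate pattern."""
    return (model["const"]
            + model["SOC"] * soc
            + model["SBP"] * sbp
            + model["SMK"] * smk
            + model["SOC_SBP"] * soc * sbp
            + model["SOC_SMK"] * soc * smk)


def risk(model, soc, smk, sbp):
    """Estimated risk via the inverse logit."""
    return 1.0 / (1.0 + math.exp(-logit(model, soc, smk, sbp)))


risk_c = risk(MODEL_1, soc=1, smk=1, sbp=150)    # part (c)
risk_p1 = risk(MODEL_2, soc=1, smk=1, sbp=150)   # part (d), person 1
risk_p2 = risk(MODEL_2, soc=0, smk=1, sbp=150)   # part (d), person 2
risk_ratio = risk_p1 / risk_p2                   # part (f)

print(f"model 1 risk: {risk_c:.3f}")
print(f"model 2 risks: {risk_p1:.3f} (person 1), {risk_p2:.3f} (person 2)")
print(f"risk ratio person 1 vs person 2: {risk_ratio:.3f}")
```

Setting the interaction coefficients of Model 2 to zero lets one function serve both models, which also makes the nested structure behind part (g) visible.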

2. Consider the following genotype counts for a polymorphism in the FGG gene in
thrombosis cases and controls (obtained from Uitte de Willige et al., 2005):

            Genotype
Outcome     GG     GA     AA     Total
Cases       213    201    57     471
Controls    245    198    28     471

(a) Check whether the genotype counts in the controls are in Hardy-Weinberg
equilibrium, either by hand or in R.
(b) Compute the odds ratios of disease of GA vs GG and of AA vs GG using R.
(c) Which genetic model seems to fit these data best (see the lecture for the models)?
Why? Also give the odds ratio(s) under this model.
(d) Write down the corresponding logistic regression model.
(e) Fit the model from d using R and write down your conclusion.
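
For parts (a) and (b), the calculations can be checked by hand from the table. The sketch below, in Python with only the standard library, assumes the usual 1-degree-of-freedom Pearson chi-square test for Hardy-Weinberg equilibrium and the cross-product odds ratio with GG as reference; the helper names are hypothetical, and the exercise itself asks for pen and paper or R.

```python
import math

# Genotype counts copied from the table in question 2.
cases = {"GG": 213, "GA": 201, "AA": 57}
controls = {"GG": 245, "GA": 198, "AA": 28}


def hwe_chi_square(n_gg, n_ga, n_aa):
    """Pearson chi-square test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_gg + n_ga + n_aa
    p = (2 * n_gg + n_ga) / (2 * n)   # frequency of the G allele
    q = 1.0 - p                       # frequency of the A allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_gg, n_ga, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def odds_ratio(genotype, reference="GG"):
    """Odds ratio of disease for `genotype` versus `reference`."""
    return (cases[genotype] * controls[reference]) / (
        controls[genotype] * cases[reference])


chi2, p_value = hwe_chi_square(controls["GG"], controls["GA"], controls["AA"])
or_ga = odds_ratio("GA")
or_aa = odds_ratio("AA")

print(f"HWE in controls: chi2 = {chi2:.2f}, p = {p_value:.3f}")
print(f"OR GA vs GG = {or_ga:.2f}, OR AA vs GG = {or_aa:.2f}")
```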

3. You may find the dataset l05dat.txt on virtuale. The data come from a group
of men who had been diagnosed with a heart attack, restricted to those who
were cigarette smokers. After the heart attack, some men continued to smoke and
others quit smoking. The cohort was followed for five years and the risk of mortality
was assessed.
(a) Load the data in R and fit a logistic regression model to these data. Interpret
the results.
(b) Compute from this model the probability of dying within 5 years for men who
continued smoking.
(c) Compute the relative risk for smoking versus non-smoking.
(d) Are the estimates in b and c unbiased?

4. You may find the dataset data1.txt on virtuale. The data are from a hypothetical
cohort study of 100 women between 40 and 60 years of age. The outcome of interest
is a disease. The exposure of interest is the binary variable obese (yes/no). The
variable age is also included; it is binary, coded 1 for women above 50 years of age
and 0 otherwise.
(a) Fit a logistic regression model with obese as exposure and outcome as dependent
variable. Formulate your conclusion.
(b) Fit a logistic regression model with obese as exposure and outcome as dependent
variable and adjust for age. Formulate your conclusion.
(c) Fit a logistic regression model with exposure as dependent and age as independent
variable. Formulate your conclusion.
(d) Now explain the results of questions a, b, and c.

5. An investigator aims to study the effect of condom use on HIV. Data are available for
a group of 500 men in a high-risk group. In addition to HIV (yes/no) and condom use
(yes/no), the researchers have information on number of partners, drug use, and
AIDS (yes/no). Discuss whether the investigator should include each of these variables
in the model.

6. Evans data. These data were also presented during the lecture. The data are available
in the lbreg package.
(a) Fit the model including the interactions given in the slides.
(b) Remove the interactions one by one until only the two significant
interactions and the main effects remain in the model. At each step, remove the
least significant interaction. For each removal, give the Wald statistic and the
likelihood ratio statistic for testing the null hypothesis of no interaction, together
with the corresponding p-values.

7. Find the dust.txt dataset on virtuale. The dust data were collected in a survey of the
employees of a Munich factory. The data frame has 1246 observations on the following 4
variables.
• bronch: chronic bronchial reaction, no = 0, yes = 1 (outcome)
• dust: dust concentration (mg/cm3) at the workplace (exposure)
• smoke: is the employee a smoker?, no = 0, yes = 1
• years: years of dust exposure
(a) Of interest is the effect of dust on bronch. Write down a modelling strategy and fit
the models. Explain your reasoning.
(b) A person has the following values for the variables: dust 6 mg/cm3, smoke yes,
25 years of exposure. For the final model, give the odds ratio for this person
versus a person with values dust 0 mg/cm3, smoke no, 0 years of exposure.
(c) Give the 95% confidence interval for the odds ratio in b.
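
For parts (b) and (c), the odds ratio between two covariate patterns is the exponential of a linear combination of coefficients, and a Wald interval needs the covariance matrix of those coefficients, not only their standard errors. The Python sketch below illustrates the computation under the assumption that the final model contains only the main effects dust, smoke and years; if the final model differs (for example, an interaction is kept), the difference vector must be adapted. The coefficient and covariance values are made-up placeholders for illustration only, not results from dust.txt, and the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(beta, cov, delta, z=1.959964):
    """Odds ratio and Wald CI for a difference `delta` in covariates.

    beta  : coefficients of the covariates (the intercept cancels out)
    cov   : covariance matrix of those coefficients (list of lists)
    delta : covariate values of person A minus those of person B
    z     : normal quantile (1.959964 gives a 95% interval)
    """
    k = len(delta)
    log_or = sum(b * d for b, d in zip(beta, delta))
    # Variance of the linear combination: delta' * cov * delta.
    var = sum(delta[i] * cov[i][j] * delta[j]
              for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Difference from part (b): dust 6 vs 0, smoke 1 vs 0, years 25 vs 0.
delta = [6, 1, 25]

# PLACEHOLDER numbers, NOT estimates from the dust data: replace them with
# the coefficients and covariance matrix of your own fitted model.
beta_placeholder = [0.1, 0.5, 0.04]
cov_placeholder = [[0.0004, 0.0, 0.0],
                   [0.0, 0.01, 0.0],
                   [0.0, 0.0, 0.0001]]

or_hat, ci_low, ci_high = odds_ratio_with_ci(
    beta_placeholder, cov_placeholder, delta)
print(f"OR = {or_hat:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```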

8. A dataset contains information on N=2073 subjects (403 cases and 1670 controls).
The genotypes of 10,000 SNPs on one chromosome are available for all the subjects.
The genotypes are coded as 0, 1 and 2, corresponding to the number of A alleles carried
by the subject. For example, 0 corresponds to the genotype aa, 1 corresponds to the
genotype aA or Aa and 2 corresponds to the genotype AA.
(a) Explain why the researchers first test for HWE in the controls. What would be
a suitable cut-off point for declaring significance? Motivate your answer.
(b) Then the researchers fit 10,000 logistic regression models where the genotype
is modelled multiplicatively on the odds ratio scale. Formulate the model in logit
terms.
(c) The lowest p-value was 4.4 × 10^-6 for SNP300. What is your conclusion?
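
For the cut-off in parts (a) and (c), one common choice with 10,000 tests is a Bonferroni correction, which divides the overall significance level by the number of tests. The short Python sketch below assumes an overall level of 0.05 and a Bonferroni correction; both are assumptions, and the lecture may motivate a different threshold.

```python
# Assumed overall significance level and number of SNPs tested.
alpha = 0.05
n_tests = 10_000

# Bonferroni-corrected per-test threshold.
bonferroni_threshold = alpha / n_tests

# Lowest p-value reported in part (c), for SNP300.
lowest_p = 4.4e-6
significant = lowest_p < bonferroni_threshold

print(f"per-test threshold: {bonferroni_threshold:.1e}")
print(f"SNP300 below threshold: {significant}")
```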
