SOLUTIONS MANUAL
FOR
Korosteleva, O. (2018). Advanced Regression Models with SAS and R, CRC Press
By
OLGA KOROSTELEVA
Department of Mathematics and Statistics
California State University, Long Beach
TABLE OF CONTENTS
CHAPTER 1
CHAPTER 2
CHAPTER 3
CHAPTER 4
CHAPTER 5
CHAPTER 6
CHAPTER 7
CHAPTER 8
CHAPTER 9
CHAPTER 10
CHAPTER 1
EXERCISE 1.1. Show that the normal distribution belongs to the exponential family of
distributions.
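$$f(y;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}\right\}=\exp\left\{-\frac{1}{2}\ln(2\pi\sigma^2)-\frac{y^2-2y\mu+\mu^2}{2\sigma^2}\right\}.$$
Let $\theta=\mu$ and $\phi=\sigma^2$. Then
$$f(y;\theta,\phi)=\exp\left\{-\frac{1}{2}\ln(2\pi\phi)-\frac{y^2-2y\theta+\theta^2}{2\phi}\right\}=\exp\left\{\frac{y\theta-c(\theta)}{\phi}+h(y,\phi)\right\},$$
where $c(\theta)=\dfrac{\theta^2}{2}$ and
$$h(y,\phi)=-\frac{1}{2}\ln(2\pi\phi)-\frac{y^2}{2\phi}.$$
Hence the normal distribution belongs to the exponential family.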
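As a numerical sanity check on the derivation (a sketch, not part of the original solution), the R code below evaluates the exponential-family form with θ = μ, φ = σ², c(θ) = θ²/2 and h(y, φ) as above, and compares it with R's built-in normal density `dnorm`. The function name `expfam_normal` and the values μ = 2, σ = 1.5 are illustrative assumptions.

```r
# Exponential-family form of the normal density:
# f(y) = exp{ (y*theta - c(theta))/phi + h(y, phi) },
# with theta = mu, phi = sigma^2, c(theta) = theta^2/2.
expfam_normal <- function(y, mu, sigma) {
  theta <- mu
  phi <- sigma^2
  c_theta <- theta^2 / 2
  h <- -0.5 * log(2 * pi * phi) - y^2 / (2 * phi)
  exp((y * theta - c_theta) / phi + h)
}

y <- seq(-3, 7, by = 0.5)
f_expfam <- expfam_normal(y, mu = 2, sigma = 1.5)
f_dnorm <- dnorm(y, mean = 2, sd = 1.5)   # dnorm takes the SD, not the variance
print(max(abs(f_expfam - f_dnorm)))       # differences are at floating-point noise level
```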
EXERCISE 1.2. (a) Verify normality of the response variable, then fit the linear regression model
to the data. State the fitted model. Give estimates for all parameters.
In SAS:
data weightloss;
input drug$ age gender$ EWL @@;
cards;
A 49 F 14.2 A 54 M 25.4 A 37 F 14.1 A 43 F 20.0 A 57 M 11.7 A 48 M 16.6
A 34 F 15.9 A 51 F 17.4 A 54 F 22.8 A 45 F 16.7 A 36 M 12.7 A 57 M 15.0
A 44 M 8.4 A 56 M 11.2 A 44 M 17.3 A 47 M 20.5 A 44 F 6.7 B 52 F 29.4
B 51 M 21.9 B 44 F 23.6 B 53 F 23.8 B 55 M 7.4 B 30 F 23.1 B 47 M 16.8
B 26 M 14.1 B 56 F 24.6 B 28 F 17.8 B 34 M 27.8 B 43 M 10.6 B 55 M 26.8
B 52 F 15.7 B 54 F 23.7
;
Goodness-of-Fit Tests for Normal Distribution
Test Statistic p Value
Kolmogorov-Smirnov D 0.10216310 Pr > D >0.150
Cramer-von Mises W-Sq 0.05103595 Pr > W-Sq >0.250
Anderson-Darling A-Sq 0.28788730 Pr > A-Sq >0.250
The p-values of all three normality tests are large (above 0.15), and together with the histogram they give no evidence against normality, so we may treat the response variable as normally distributed.
The fitted model is $\hat{E}(\text{EWL}) = 9.2146 + 4.8103\cdot\text{drugB} + 0.1102\cdot\text{age} + 2.7235\cdot\text{female}$, and $\hat{\sigma} = 5.2451$.
In R:
weightloss.data<- read.csv(file="C:/./Exercise1.2Data.csv", header = TRUE, sep = ",")
shapiro.test(weightloss.data$EWL)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.2146 5.6981 1.617 0.1171
drug.relB 4.8103 1.9988 2.407 0.0229
age 0.1102 0.1140 0.967 0.3420
gender.relF 2.7235 1.9952 1.365 0.1831
5.607257
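For readers without the CSV file, here is a minimal self-contained sketch that rebuilds the Exercise 1.2 data from the SAS cards block and refits the model with `glm`. It assumes the reference levels drug A and male, which is consistent with the `drug.relB` and `gender.relF` rows above. The data frame `weightloss` and the object `fit` are illustrative names and need not match the author's script.

```r
# Rebuild the Exercise 1.2 data from the SAS cards block.
# Fields per record: drug, age, gender, EWL (several records per line).
ewl_text <- "
A 49 F 14.2 A 54 M 25.4 A 37 F 14.1 A 43 F 20.0 A 57 M 11.7 A 48 M 16.6
A 34 F 15.9 A 51 F 17.4 A 54 F 22.8 A 45 F 16.7 A 36 M 12.7 A 57 M 15.0
A 44 M 8.4 A 56 M 11.2 A 44 M 17.3 A 47 M 20.5 A 44 F 6.7 B 52 F 29.4
B 51 M 21.9 B 44 F 23.6 B 53 F 23.8 B 55 M 7.4 B 30 F 23.1 B 47 M 16.8
B 26 M 14.1 B 56 F 24.6 B 28 F 17.8 B 34 M 27.8 B 43 M 10.6 B 55 M 26.8
B 52 F 15.7 B 54 F 23.7"
weightloss <- as.data.frame(scan(text = ewl_text,
  what = list(drug = "", age = 0, gender = "", EWL = 0), quiet = TRUE))

# Reference levels chosen to match the R output above: drug A and male
weightloss$drug.rel <- relevel(factor(weightloss$drug), ref = "A")
weightloss$gender.rel <- relevel(factor(weightloss$gender), ref = "M")

fit <- glm(EWL ~ drug.rel + age + gender.rel, data = weightloss,
           family = gaussian(link = "identity"))
print(round(coef(fit), 4))
print(sqrt(summary(fit)$dispersion))  # residual standard deviation on n - p df

# Predicted EWL for a 35-year-old male taking drug A
pred <- predict(fit, newdata = data.frame(drug.rel = "A", age = 35, gender.rel = "M"))
print(pred)
```

The `scan()` call with a list-valued `what` reads the fields in turn regardless of how many records share a line, which mirrors the `@@` pointer in the SAS `input` statement.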
(b) Which regression coefficients turn out to be significant at the 5% level? Discuss goodness of fit of the
model.
Drug B is the only significant predictor in the model at the 5% significance level since the
corresponding p-value is the only one under 0.05.
In SAS:
data deviance_test;
deviance = -2*(-102.6326 - (-98.4395));
pvalue = 1 - probchi(deviance,3);
run;
deviance pvalue
8.3862 0.038669
The p-value of the deviance test is less than 0.05, so the fitted model is significantly better than the intercept-only (null) model; in this sense the model fits well. The R code and output are:
#checking model fit
null.model<- glm(EWL ~ 1, data=weightloss.data, family=gaussian(link=identity))
print(deviance<- -2*(logLik(null.model)-logLik(fitted.model)))
print(p.value<- 1-pchisq(deviance, df=3))
8.386158
0.03867005
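The SAS data step above computes the deviance from the two maximized log-likelihoods and refers it to a chi-square distribution whose degrees of freedom equal the difference in the number of parameters between the two models. The same arithmetic in R, wrapped in a small helper (`deviance_test` is a hypothetical name) and applied to the log-likelihoods reported by SAS in Exercises 1.2 through 1.6, might look like this:

```r
# Deviance (likelihood ratio) test of a fitted model against the
# intercept-only model, computed from the two maximized log-likelihoods.
deviance_test <- function(loglik_null, loglik_full, df) {
  dev <- -2 * (loglik_null - loglik_full)
  # upper-tail probability; lower.tail = FALSE avoids cancellation in 1 - pchisq()
  p_value <- pchisq(dev, df = df, lower.tail = FALSE)
  c(deviance = dev, p.value = p_value)
}

# Log-likelihoods and degrees of freedom as reported in Exercises 1.2 through 1.6
results <- rbind(
  ex1.2 = deviance_test(-102.6326, -98.4395, df = 3),
  ex1.3 = deviance_test(-91.1942, -67.2613, df = 7),
  ex1.4 = deviance_test(-73.0195, -54.6201, df = 10),
  ex1.5 = deviance_test(-74.6263, -56.4150, df = 5),
  ex1.6 = deviance_test(-117.8512, -96.2779, df = 8)
)
print(results)
```

Using `lower.tail = FALSE` rather than `1 - pchisq(...)` keeps full relative precision for very small p-values such as those in Exercises 1.3, 1.5 and 1.6.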
(c) Is one of the drugs more efficient for weight loss than the other? Interpret all estimated significant
coefficients.
The estimated average EWL for subjects taking drug B is 4.8103 percentage points higher than that for subjects taking drug A, with age and gender held fixed. Since this coefficient is significant at the 5% level, drug B is more effective than drug A.
(d) According to the model, what is the predicted percent decrease in excess body weight for a 35-year-old male who is taking drug A?
The predicted percent decrease in excess body weight for a 35-year-old male taking drug A is computed by hand as $\widehat{\text{EWL}} = 9.2146 + 0.1102\cdot 35 = 13.0716$ (drug A and male are the reference levels, so their indicators are zero).
In SAS:
/*creating the observation to predict*/
data predict;
input drug$ age gender$;
cards;
A 35 M
;
data weightloss;
set weightloss predict;
run;
proc genmod;
class drug gender;
model EWL = drug age gender / dist=normal link=identity;
output out=outdata p=pEWL;
run;
pEWL
13.0718
In R:
#using fitted model for prediction
print(predict(fitted.model, data.frame(drug.rel="A", age=35, gender.rel="M")))
13.0718
EXERCISE 1.3. (a) Reduce the car price by a factor of 1000. Check that the distribution of the
price is normal. Fit a general linear regression model to predict the price of a car. Write down the
fitted model, specifying all estimated parameters.
In SAS:
data carsales;
length bodystyle $9;
input bodystyle$ country$ hwy doors leather$ price @@;
priceK=price/1000;
cards;
coupe USA 26 4 no 17445 coupe USA 40 4 no 23500
coupe USA 35 2 no 19600 coupe Germany 37 4 no 23400
coupe Germany 25 4 no 24100 coupe Germany 24 2 no 12400
coupe Japan 26 2 no 13300 coupe Japan 27 4 no 15550
coupe Japan 20 4 yes 29345 hatchback USA 30 2 no 12540
hatchback USA 39 4 no 17595 hatchback USA 38 2 no 17300
hatchback Germany 38 4 no 17800 hatchback Germany 32 4 no 22500
hatchback Germany 34 4 no 20300 hatchback Japan 38 4 yes 27300
hatchback Japan 38 2 yes 23300 hatchback Japan 38 2 yes 29300
sedan USA 29 4 no 32000 sedan USA 25 2 yes 34200
sedan USA 33 4 yes 33395 sedan Germany 40 4 no 22850
sedan Germany 23 2 yes 36000 sedan Germany 25 4 no 19900
sedan Japan 40 4 yes 36700 sedan Japan 35 4 yes 31600
sedan Japan 37 4 no 24600
run;
The p-values of all the normality tests exceed 0.05, so normality is not rejected; the histogram also looks close to bell-shaped.
/*fitting general linear model*/
proc genmod;
class bodystyle(ref="hatchback") country(ref="Japan") leather(ref="no");
model priceK=bodystyle country hwy doors leather/dist=normal link=identity;
run;
The fitted model is $\hat{E}(\text{priceK}) = 5.1353 + 2.2698\cdot\text{coupe} + 6.4107\cdot\text{sedan} + 3.1959\cdot\text{Germany} + 3.2128\cdot\text{USA} + 0.1305\cdot\text{hwy} + 1.5554\cdot\text{doors} + 12.1757\cdot\text{leather}$, and $\hat{\sigma} = 2.9219$.
In R:
carsales.data<- read.csv(file="C:/./Exercise1.3Data.csv",header=TRUE, sep=",")
#rescaling price
priceK<- carsales.data$price/1000
shapiro.test(priceK)
Shapiro-Wilk normality test
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.1353 5.5909 0.919 0.36986
bodystyle.relcoupe 2.2698 2.0070 1.131 0.27216
bodystyle.relsedan 6.4107 1.8450 3.475 0.00254
country.relGermany 3.1959 2.0098 1.590 0.12829
country.relUSA 3.2128 1.8812 1.708 0.10394
hwy 0.1305 0.1332 0.980 0.33937
doors 1.5554 0.7904 1.968 0.06384
leather.relyes 12.1757 1.9332 6.298 4.79e-06
3.483088
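SAS reports σ = 2.9219 for this model, while R prints 3.483088. The two agree if, as the arithmetic suggests, PROC GENMOD reports the maximum-likelihood scale √(SSE/n), whereas the R value is the residual standard error √(SSE/(n − p)), with p counting all regression coefficients including the intercept. A quick check across the exercises that show both numbers is sketched below; n is counted from the data listings, and `ml_sigma` is a hypothetical helper name.

```r
# Convert R's residual standard error s = sqrt(SSE/(n - p)) into the
# maximum-likelihood scale sqrt(SSE/n) reported by SAS PROC GENMOD.
ml_sigma <- function(s, n, p) s * sqrt((n - p) / n)

# n = number of observations in each data listing;
# p = number of regression coefficients, including the intercept
checks <- data.frame(
  exercise  = c("1.2", "1.3", "1.5", "1.6"),
  s_R       = c(5.607257, 3.483088, 1.001351, 7.161087),
  n         = c(32, 27, 42, 30),
  p         = c(4, 8, 6, 9),
  sigma_SAS = c(5.2451, 2.9219, 0.9271, 5.9914)
)
checks$sigma_ML <- ml_sigma(checks$s_R, checks$n, checks$p)
print(checks)
```

Neither value is wrong: the SAS scale is the maximum-likelihood estimate, while the R value corrects for the p estimated coefficients.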
(b) How good is the model fit? Discuss significance of the regression coefficients.
The p-value of the deviance test is far below 0.05, indicating that the model fits significantly better than the intercept-only model. The significant variables at the 5% level are sedan body style and leather interior.
In SAS:
data deviance_test;
deviance = -2*(-91.1942 - (-67.2613));
pvalue = 1 - probchi(deviance,7);
run;
deviance pvalue
47.8658 3.7823E-8
In R:
3.78218e-08
(c) Interpret the estimates of those regression coefficients that differ significantly from zero.
As estimated, a sedan costs on average $6,410.70 more than a hatchback, all else being equal. The estimated average price of a car with a leather interior is $12,175.70 higher than that of a car without one, all else being equal.
(d) What is the predicted price of a sedan made in USA that has 4 doors, leather seats, and runs 30
mpg on highway?
The predicted price of a sedan that is made in the USA, has 4 doors and leather seats, and gets 30 mpg on the highway is calculated as $\widehat{\text{price}} = \$1{,}000\times(5.1353 + 6.4107 + 3.2128 + 0.1305\cdot 30 + 1.5554\cdot 4 + 12.1757) = \$37{,}071.10$.
In SAS:
/*creating the observation to predict*/
data predict;
input bodystyle$ country$ hwy doors leather$;
cards;
sedan USA 30 4 yes
;
data carsales;
set carsales predict;
run;
proc genmod;
class bodystyle country leather;
model priceK = bodystyle country hwy doors leather / dist=normal link=identity;
output out=outdata p=ppriceK;
run;
data final_prediction;
set outdata;
pprice=ppriceK*1000;
run;
pprice
37071.14
In R:
37071.14
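The hand calculation in part (d) is an inner product of the coefficient vector with the covariate row, in which the reference levels (hatchback, Japan, no leather) contribute zero indicators, followed by rescaling from thousands of dollars back to dollars. A sketch in R, with the illustrative names `beta` and `x`:

```r
# Estimated coefficients for priceK (price in $1000s), copied from the R output
beta <- c(intercept = 5.1353, coupe = 2.2698, sedan = 6.4107, Germany = 3.1959,
          USA = 3.2128, hwy = 0.1305, doors = 1.5554, leather = 12.1757)

# Covariate row: USA-made sedan, 4 doors, leather seats, 30 mpg on the highway.
# Reference levels (hatchback, Japan, no leather) have zero indicators.
x <- c(intercept = 1, coupe = 0, sedan = 1, Germany = 0,
       USA = 1, hwy = 30, doors = 4, leather = 1)

priceK_hat <- sum(beta * x[names(beta)])  # prediction in thousands of dollars
price_hat <- 1000 * priceK_hat            # undo the rescaling
print(price_hat)
```

The small gap between this value and the 37071.14 reported by SAS and R comes from rounding the coefficients to four decimals.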
EXERCISE 1.4. (a) Show normality of the distribution of the number of hours of sleep per night.
Regress the number of hours of sleep on all the given factors. Write explicitly what the fitted model
is.
In SAS:
data sleep;
input age gender$ quiettime nchildren stresslevel jobstatus$ nactivities pastvac
sleephours @@;
cards;
62 F 60 1 5 unempl 1 15 7.7 28 F 15 1 6 unempl 5 11 5.3
50 M 15 0 5 unempl 1 19 6.4 36 M 60 1 6 full 1 21 7.7
56 F 50 0 3 part 4 5 7.6 48 M 180 0 5 full 0 6 6.4
55 M 40 0 8 full 8 23 7.0 26 F 80 0 7 student 9 8 8.3
44 M 180 1 3 part 6 20 9.6 49 F 5 0 7 unempl 5 15 5.5
29 M 60 2 5 student 5 7 7.7 56 M 10 1 4 unempl 4 17 5.7
46 F 40 1 7 part 3 3 7.4 41 F 5 2 6 full 9 10 6.2
22 M 15 0 8 full 4 3 6.3 36 F 45 2 5 part 8 14 7.5
54 F 120 1 8 part 7 10 8.5 42 F 60 3 1 full 9 11 6.3
58 F 5 1 7 full 1 17 5.3 33 M 100 2 1 full 9 5 8.3
50 F 2 2 6 full 3 12 5.1 59 M 30 2 5 full 2 6 6.9
32 M 30 1 8 full 5 9 6.9 50 M 60 2 8 part 8 13 8.0
56 F 10 0 3 unempl 7 7 6.1 42 F 240 0 1 part 8 21 8.8
58 F 10 2 7 full 9 4 6.2 57 F 15 1 6 full 2 16 6.3
30 F 30 0 2 full 8 9 8.3 54 M 20 2 8 full 6 7 6.5
57 M 45 2 4 full 7 18 7.5 45 F 120 0 9 part 2 13 6.6
33 F 40 1 6 unempl 9 24 7.0 56 F 120 0 5 part 2 20 8.7
59 F 60 2 9 part 4 19 8.1 41 M 60 2 3 student 2 3 7.5
62 M 40 0 1 unempl 0 2 8.6 29 M 15 1 7 unempl 3 20 6.3
34 F 30 0 7 unempl 9 0 6.6 32 F 20 3 7 unempl 2 8 7.8
46 F 20 2 3 unempl 9 18 7.9 45 M 60 0 2 unempl 0 22 9.0
23 M 45 0 6 part 4 12 7.6 38 M 60 4 5 full 3 5 7.8
45 M 30 0 5 unempl 9 7 6.8 63 F 40 0 6 unempl 5 5 7.3
27 F 120 0 4 student 1 16 7.3 30 F 45 0 7 part 8 10 7.7
34 F 5 3 6 full 0 4 6.0 62 M 10 0 10 part 8 11 6.0
;
Goodness-of-Fit Tests for Normal Distribution
Test Statistic p Value
Kolmogorov-Smirnov D 0.08733974 Pr > D >0.150
Cramer-von Mises W-Sq 0.06145088 Pr > W-Sq >0.250
Anderson-Darling A-Sq 0.32815950 Pr > A-Sq >0.250
The normality tests (all p-values above 0.05) and the bell-shaped histogram are consistent with a normally distributed response variable.
In R:
shapiro.test(sleep.data$sleephours)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.826002 0.798388 8.550 1.78e-10
age -0.003656 0.010494 -0.348 0.72943
gender.relM 0.356815 0.241401 1.478 0.14741
quiettime 0.007421 0.003238 2.292 0.02738
nchildren 0.120419 0.123020 0.979 0.33368
stresslevel -0.139828 0.060734 -2.302 0.02674
jobstatus.relpart 1.048386 0.360976 2.904 0.00603
jobstatus.relstudent 0.628623 0.493437 1.274 0.21021
jobstatus.relunempl 0.381840 0.323501 1.180 0.24501
nactivities 0.020373 0.039031 0.522 0.60465
pastvac 0.005046 0.019222 0.263 0.79430
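0.8168443
The fitted model is $\hat{E}(\text{sleephours}) = 6.8260 - 0.0037\cdot\text{age} + 0.3568\cdot\text{male} + 0.0074\cdot\text{quiettime} + 0.1204\cdot\text{nchildren} - 0.1398\cdot\text{stresslevel} + 1.0484\cdot\text{part} + 0.6286\cdot\text{student} + 0.3818\cdot\text{unempl} + 0.0204\cdot\text{nactivities} + 0.0050\cdot\text{pastvac}$, where the reference levels are female and full-time employment; the residual standard error printed by R is 0.8168.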
(b) How good is the model fit? What beta coefficients are significantly different from zero at the 5%
level of significance?
The p-value of the deviance test is smaller than 0.05, so the model fits significantly better than the intercept-only model. The variables significant at the 5% level are quiet time, stress level, and part-time employment status.
In SAS:
/*checking model fit*/
proc genmod;
model sleephours = / dist=normal link=identity;
run;
Log Likelihood -73.0195
data deviance_test;
deviance = -2*(-73.0195 - (-54.6201));
pvalue = 1 - probchi(deviance,10);
run;
deviance pvalue
36.7988 .000061312
In R:
36.79887
6.131066e-05
Below we calculate the predicted number of hours of sleep per night for a 30-year-old mother of three children under the age of five who works full time, gets 10 minutes a day for herself, walks to the park with her kids every day of the week (7 activities), rates her stress level as 7, and has not had a vacation in one year (12 months).
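In SAS:
/*creating the observation to predict*/
data predict;
input age gender$ quiettime nchildren stresslevel jobstatus$ nactivities pastvac;
cards;
30 F 10 3 7 full 7 12
;
data sleep;
set sleep predict;
run;
proc genmod;
class gender jobstatus;
model sleephours = age gender quiettime nchildren stresslevel jobstatus
nactivities pastvac / dist=normal link=identity;
output out=outdata p=psleephours;
run;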
psleephours
6.37616
In R:
#using fitted model for prediction
print(predict(fitted.model, data.frame(age=30, gender.rel="F", quiettime=10,
nchildren=3, stresslevel=7, jobstatus.rel="full", nactivities=7, pastvac=12)))
6.376164
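To see how the predicted 6.376 hours arises, the sketch below recomputes it by hand from the R coefficients and prints each term's contribution. It assumes female and full-time employment are the reference levels, as implied by the `gender.relM` and `jobstatus.rel...` rows of the R output; `beta`, `x`, and `contrib` are illustrative names.

```r
# Coefficients from the R output (reference levels: female, full-time job)
beta <- c(intercept = 6.826002, age = -0.003656, male = 0.356815,
          quiettime = 0.007421, nchildren = 0.120419, stresslevel = -0.139828,
          part = 1.048386, student = 0.628623, unempl = 0.381840,
          nactivities = 0.020373, pastvac = 0.005046)

# Covariates of the 30-year-old full-time working mother; being female and
# employed full time (the reference levels), all gender/job dummies are 0.
x <- c(intercept = 1, age = 30, male = 0, quiettime = 10, nchildren = 3,
       stresslevel = 7, part = 0, student = 0, unempl = 0,
       nactivities = 7, pastvac = 12)

contrib <- beta * x[names(beta)]  # each term's contribution, in hours
print(round(contrib, 4))
sleep_hat <- sum(contrib)
print(sleep_hat)
```

Apart from the intercept, the stress-level term (7 × −0.139828 ≈ −0.98 hours) has the largest effect on this prediction.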
EXERCISE 1.5. (a) Compute the total time spent on both transitions. Verify normality of the
distribution of this variable, and fit a general linear regression model. Specify the fitted model.
In SAS:
data time;
input age gender$ run t1 bike t2 swim @@;
transitiontime=t1+t2;
cards;
55 M 24.17 2.60 37.95 2.50 5.70 59 F 34.88 2.83 52.15 3.05 5.20
24 M 32.97 2.55 59.20 3.47 5.37 53 F 22.2 1.83 46.70 2.15 5.50
51 M 27.35 1.75 42.05 2.32 3.75 38 F 32.13 2.38 50.92 2.95 6.00
66 M 25.39 1.95 41.57 2.80 3.93 30 F 24.67 1.58 48.28 2.77 5.68
43 F 42.33 2.78 63.60 4.08 7.18 47 F 28.73 2.35 45.57 3.90 6.62
26 F 29.62 2.92 51.23 3.85 4.92 45 M 22.23 2.07 38.95 2.35 4.28
29 F 26.93 2.10 44.33 2.45 7.47 34 M 17.75 0.75 33.27 1.23 3.65
39 M 37.47 2.52 55.67 4.47 8.60 54 M 36.63 3.27 43.92 3.08 7.15
26 M 34.42 2.73 52.62 2.67 9.23 36 M 27.38 2.22 39.03 2.92 7.43
42 M 21.37 2.12 35.95 1.93 3.95 49 M 29.03 4.50 38.53 3.95 8.80
42 F 28.53 3.27 49.85 3.67 8.13 42 F 25.12 1.72 39.52 2.50 4.55
42 F 26.33 1.70 48.98 2.30 5.02 41 F 36.75 3.95 62.85 3.13 6.93
15 M 25.12 1.70 44.75 3.20 7.48 48 M 26.52 4.43 40.98 3.82 6.58
37 M 28.3 2.85 41.78 3.47 6.02 55 M 31.25 2.70 43.43 3.25 5.25
42 M 24.38 1.45 37.13 1.83 3.70 25 M 33.45 2.25 51.38 4.03 7.45
12 F 27.62 2.23 55.47 2.97 4.37 23 F 28.55 2.17 54.57 2.55 7.90
49 M 33.88 2.77 54.82 3.87 6.90 53 F 26.97 1.77 42.33 3.40 6.58
45 F 26.58 1.65 44.30 2.52 5.40 33 F 32.32 2.10 54.87 2.32 6.25
63 M 40.53 3.78 69.75 3.83 12.17 50 M 33.68 3.07 43.57 3.13 5.77
43 F 34.93 2.58 62.35 2.95 7.92 24 M 22.88 1.82 39.55 2.12 4.03
44 M 29.25 2.47 45.60 2.75 9.18 51 F 36.98 3.70 46.58 5.18 7.60
;
/*running normality check*/
proc univariate;
var transitiontime;
histogram/normal;
run;
The p-values of the normality tests are above 0.05, so normality of the response variable is not rejected; the bell-shaped histogram supports this conclusion.
The fitted model is $\hat{E}(\text{transitiontime}) = 0.5293 + 0.0067\cdot\text{age} + 0.0961\cdot\text{female} + 0.1964\cdot\text{run} - 0.0565\cdot\text{bike} + 0.2475\cdot\text{swim}$, and $\hat{\sigma} = 0.9271$.
In R:
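time.data<- read.csv(file="C:/./Exercise1.5Data.csv", header=TRUE, sep=",")
#total time spent on both transitions (assumes the CSV columns are named t1 and t2, as in SAS)
transition.time<- time.data$t1 + time.data$t2
shapiro.test(transition.time)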
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.529266 1.107464 0.478 0.635605
age 0.006659 0.013837 0.481 0.633232
gender.relF 0.096094 0.351716 0.273 0.786250
run 0.196405 0.053953 3.640 0.000849
bike -0.056487 0.035412 -1.595 0.119427
swim 0.247544 0.110615 2.238 0.031507
1.001351
(b) Discuss the model fit. Are all the predictors in that model significant at the 5% significance level?
In SAS:
/*checking model fit*/
proc genmod;
model transitiontime = / dist=normal link=identity;
run;
Log Likelihood -74.6263
data deviance_test;
deviance = -2*(-74.6263 - (-56.4150));
pvalue = 1 - probchi(deviance,5);
run;
deviance pvalue
36.4226 .000000782
Since the p-value of the deviance test is tiny, the model fits significantly better than the intercept-only model. The only predictors significant at the 5% level are run time and swim time.
In R:
36.42269
7.817128e-07
(c) Interpret only the estimated significant regression coefficients of this model.
The estimated average transition time increases by 0.1964 minutes for every additional minute of run time, holding the other predictors fixed. Likewise, each additional minute of swim time increases the estimated average transition time by 0.2475 minutes.
(d) What is the predicted total time at transitions for the student, if his best result in 5-kilometer run is
27:32, 13-mile bike is 56:17, and 200-meter swim is 8:46?
Below we compute the predicted total transition time for the 25-year-old male student with a 27:32 run, a 56:17 bike, and an 8:46 swim. First we convert the times into minutes: $27 + 32/60 \approx 27.53$, $56 + 17/60 \approx 56.28$, and $8 + 46/60 \approx 8.77$. The calculation is $\widehat{\text{transitiontime}} = 0.5293 + 0.0067\cdot 25 + 0.1964\cdot 27.53 - 0.0565\cdot 56.28 + 0.2475\cdot 8.77 \approx 5.09$.
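In SAS:
/*creating the observation to predict*/
data predict;
input age gender$ run bike swim;
cards;
25 M 27.53 56.28 8.77
;
data time;
set time predict;
run;
proc genmod;
class gender;
model transitiontime = age gender run bike swim / dist=normal link=identity;
output out=outdata p=ptransitiontime;
run;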
ptransitiontime
5.09465
In R:
#using fitted model for prediction
print(predict(fitted.model, data.frame(age=25, gender.rel="M", run=27.53,
bike=56.28, swim=8.77)))
5.094653
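The conversion from mm:ss to decimal minutes and the subsequent prediction can be scripted as below. This is a sketch: `to_minutes` is a hypothetical helper, the times are rounded to two decimals to match the hand calculation, and the male student corresponds to female = 0.

```r
# Convert "mm:ss" strings into decimal minutes
to_minutes <- function(mmss) {
  parts <- strsplit(mmss, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60, numeric(1))
}

times <- round(to_minutes(c("27:32", "56:17", "8:46")), 2)
names(times) <- c("run", "bike", "swim")
print(times)  # rounded to two decimals, as in the hand calculation

# Coefficients from the R output (reference level: male)
beta <- c(intercept = 0.529266, age = 0.006659, female = 0.096094,
          run = 0.196405, bike = -0.056487, swim = 0.247544)
x <- c(intercept = 1, age = 25, female = 0, times)
transition_hat <- sum(beta * x[names(beta)])
print(transition_hat)
```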
EXERCISE 1.6. (a) Check that the measurements for the heart rate are coming from a normal
distribution. Fit the regression model and specify all estimated parameters.
In SAS:
data heartrate;
length AQI $9.;
input age gender$ ethnicity$ BMI nmeds AQI$ HR @@;
cards;
48 F Black 29.9 0 good 76 56 F White 22.9 3 unhealthy 112
67 F White 23.4 1 good 94 82 M Black 29.7 0 good 92
64 F White 31.4 3 good 97 58 M White 18.9 2 moderate 79
72 F Black 25.2 0 moderate 114 70 F Black 25.9 1 moderate 115
54 M Hispanic 29.6 0 moderate 80 57 F Hispanic 20.2 2 good 81
50 F Black 23.9 1 unhealthy 97 59 F Hispanic 22.6 0 good 86
61 M Hispanic 32.8 1 good 84 69 M Hispanic 24.1 2 unhealthy 94
65 F Black 23.4 2 moderate 114 66 F Hispanic 27.8 3 good 82
74 M White 32.4 1 moderate 97 66 M Hispanic 22.9 2 good 86
53 M Hispanic 25.2 0 good 84 55 M Hispanic 24.6 0 moderate 94
73 F Hispanic 24.8 3 moderate 105 45 F Hispanic 19.0 2 unhealthy 83
71 F White 20.3 2 unhealthy 111 63 M Black 23.8 2 unhealthy 108
71 F White 21.5 2 moderate 100 62 M Hispanic 27.4 3 good 79
44 F Hispanic 17.2 0 unhealthy 86 49 M White 17.1 1 good 75
63 M Black 28.0 2 good 91 65 F Hispanic 22.2 1 moderate 106
;
Goodness-of-Fit Tests for Normal Distribution
Test Statistic p Value
Kolmogorov-Smirnov D 0.15627802 Pr > D 0.061
Cramer-von Mises W-Sq 0.09496306 Pr > W-Sq 0.129
Anderson-Darling A-Sq 0.65250988 Pr > A-Sq 0.084
Based on the histogram and the p-values of the normality tests, all of which exceed 0.05 (the smallest is 0.061, for the Kolmogorov-Smirnov test), normality of the heart rate is not rejected.
/*fitting general linear model*/
proc genmod;
class gender ethnicity(ref="Hispanic") AQI(ref="good");
model HR = age gender ethnicity BMI nmeds AQI / dist=normal link=identity;
run;
The fitted model is $\hat{E}(\text{HR}) = 38.0164 + 0.6503\cdot\text{age} + 7.1031\cdot\text{female} + 7.5351\cdot\text{Black} + 2.2633\cdot\text{White} + 0.0431\cdot\text{BMI} + 0.4384\cdot\text{nmeds} + 10.8596\cdot\text{AQImoderate} + 14.1674\cdot\text{AQIunhealthy}$, and $\hat{\sigma} = 5.9914$.
In R:
hr.data<- read.csv(file="C:/./Exercise1.6Data.csv", header=TRUE, sep=",")
shapiro.test(hr.data$HR)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.01638 12.24005 3.106 0.00535
age 0.65033 0.17599 3.695 0.00134
gender.relF 7.10311 2.82173 2.517 0.02002
ethnicity.relBlack 7.53509 3.46094 2.177 0.04102
ethnicity.relWhite 2.26328 3.33411 0.679 0.50466
BMI 0.04306 0.38543 0.112 0.91210
nmeds 0.43836 1.42454 0.308 0.76133
AQI.relmoderate 10.85963 3.22023 3.372 0.00288
AQI.relunhealthy 14.16737 3.81333 3.715 0.00128
7.161087
(b) Discuss the goodness-of-fit of the model. What variables are significant predictors of heart rate at
the 5% level of significance?
In SAS:
/*checking model fit*/
proc genmod;
model HR = / dist=normal link=identity;
run;
data deviance_test;
deviance = -2*(-117.8512 - (-96.2779));
pvalue = 1 - probchi(deviance,8);
run;
deviance pvalue
43.1466 .000000824
Since the p-value of the deviance test is tiny, the model fits significantly better than the intercept-only model. The significant predictors at the 5% level are age, gender, the Black ethnicity level, and both the moderate and unhealthy AQI levels.
In R:
43.14658
8.243212e-07
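Below we compute the predicted heart rate of a 50-year-old Hispanic man with a BMI of 20 who takes no medications, when the AQI is moderate.
In SAS:
/*creating the observation to predict*/
data predict;
input age gender$ ethnicity$ BMI nmeds AQI$;
cards;
50 M Hispanic 20 0 moderate
;
data heartrate;
set heartrate predict;
run;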
proc genmod;
class gender ethnicity AQI;
model HR = age gender ethnicity BMI nmeds AQI / dist=normal link=identity;
output out=outdata p=pHR;
run;
pHR
82.2536
In R:
#using fitted model for prediction
print(predict(fitted.model, data.frame(age=50, gender.rel="M", ethnicity.rel="Hispanic",
BMI=20, nmeds=0, AQI.rel="moderate")))
82.25361
CHAPTER 2
EXERCISE 2.1. (a) Is the decrease in BMI percentile (preBMI-postBMI) normally distributed?
Plot a histogram and test for normality of the distribution.
In SAS:
data obesity;
input gender$ age group$ preBMI postBMI @@;
BMIdiff=preBMI-postBMI;
female=(gender="F");
control=(group="Cx");
cards;
F 6 Cx 85.7 83.8 F 6 Cx 93.8 92.9 F 7 Cx 93.5 92.5 F 8 Cx 90.1 89.8
F 9 Tx 92.3 90.7 F 9 Tx 90.3 88.3 F 12 Cx 87.6 85.9 F 12 Cx 87.2 84.1
F 12 Tx 96.9 94.9 F 12 Tx 85.8 81.2 F 13 Cx 96.7 94.1 F 13 Cx 93.5 92.9
F 13 Tx 92.3 87.5 F 13 Tx 85.3 83.7 F 14 Tx 95.5 78.7 F 15 Cx 91.3 89.9
F 15 Tx 95.8 87.1 F 16 Tx 90.7 87.2 M 6 Cx 92.6 88.1 M 7 Cx 95.8 94.7
M 7 Cx 90.4 89.1 M 7 Cx 91.2 88.6 M 8 Tx 94.4 87.8 M 8 Tx 93.2 87.3
M 10 Cx 93.9 91.5 M 10 Tx 96.2 91.1 M 10 Tx 89.4 87.9 M 11 Tx 86.2 77.1
M 11 Tx 95.4 84.8 M 12 Cx 97.7 95.8 M 13 Tx 85.3 80.0 M 13 Tx 86.2 82.4
M 14 Cx 85.5 83.6 M 14 Cx 97.8 93.8 M 16 Cx 95.0 93.6 M 16 Tx 93.1 86.8
;
Neither the histogram nor the normality tests support normality of the response. In fact, the
distribution is right-skewed.
In R:
shapiro.test(BMIdiff)
(b) Find the optimal lambda for Box-Cox transformation. Transform the change in BMI percentile
(find the appropriate transformation in Table 2.1), and show that the transformed variable is
normally distributed. Plot the histogram and do a formal testing.
In SAS:
/*finding optimal lambda for Box-Cox transformation*/
proc transreg;
model BoxCox(BMIdiff) = identity(age female control);
run;
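The SAS step above asks PROC TRANSREG for the optimal λ. As an independent illustration of the criterion, here is a minimal base-R sketch of the standard Box-Cox profile log-likelihood, ℓ(λ) = −(n/2) log(RSS_λ/n) + (λ − 1) Σ log yᵢ, using the same predictors (age, female, control). The search grid from −2 to 2 in steps of 0.01 is an assumption, so the selected λ may differ slightly from TRANSREG's output, which is not reproduced here. The names `box_cox`, `profile_loglik`, and `design` are hypothetical.

```r
# Exercise 2.1 data copied from the SAS cards block: gender, age, group, preBMI, postBMI
obesity_text <- "
F 6 Cx 85.7 83.8 F 6 Cx 93.8 92.9 F 7 Cx 93.5 92.5 F 8 Cx 90.1 89.8
F 9 Tx 92.3 90.7 F 9 Tx 90.3 88.3 F 12 Cx 87.6 85.9 F 12 Cx 87.2 84.1
F 12 Tx 96.9 94.9 F 12 Tx 85.8 81.2 F 13 Cx 96.7 94.1 F 13 Cx 93.5 92.9
F 13 Tx 92.3 87.5 F 13 Tx 85.3 83.7 F 14 Tx 95.5 78.7 F 15 Cx 91.3 89.9
F 15 Tx 95.8 87.1 F 16 Tx 90.7 87.2 M 6 Cx 92.6 88.1 M 7 Cx 95.8 94.7
M 7 Cx 90.4 89.1 M 7 Cx 91.2 88.6 M 8 Tx 94.4 87.8 M 8 Tx 93.2 87.3
M 10 Cx 93.9 91.5 M 10 Tx 96.2 91.1 M 10 Tx 89.4 87.9 M 11 Tx 86.2 77.1
M 11 Tx 95.4 84.8 M 12 Cx 97.7 95.8 M 13 Tx 85.3 80.0 M 13 Tx 86.2 82.4
M 14 Cx 85.5 83.6 M 14 Cx 97.8 93.8 M 16 Cx 95.0 93.6 M 16 Tx 93.1 86.8"
ob <- as.data.frame(scan(text = obesity_text,
  what = list(gender = "", age = 0, group = "", preBMI = 0, postBMI = 0), quiet = TRUE))
ob$BMIdiff <- ob$preBMI - ob$postBMI   # response; Box-Cox needs positive values
ob$female <- as.numeric(ob$gender == "F")
ob$control <- as.numeric(ob$group == "Cx")

# Box-Cox transformation: (y^lambda - 1)/lambda, or log(y) when lambda = 0
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for a normal linear model (constants dropped);
# the (lambda - 1) * sum(log(y)) term is the log-Jacobian of the transformation
profile_loglik <- function(lambda, y, design) {
  z <- box_cox(y, lambda)
  rss <- sum(lm.fit(design, z)$residuals^2)
  n <- length(y)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

design <- cbind(1, ob$age, ob$female, ob$control)  # intercept, age, female, control
grid <- seq(-2, 2, by = 0.01)                      # assumed search grid
ll <- vapply(grid, profile_loglik, numeric(1), y = ob$BMIdiff, design = design)
lambda_hat <- grid[which.max(ll)]
print(lambda_hat)
```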