SOLUTIONS MANUAL
FOR
Korosteleva, O. (2018). Advanced Regression Models with SAS and R, CRC Press
By
OLGA KOROSTELEVA
Department of Mathematics and Statistics
California State University, Long Beach

TABLE OF CONTENTS
CHAPTER 1
CHAPTER 2
CHAPTER 3
CHAPTER 4
CHAPTER 5
CHAPTER 6
CHAPTER 7
CHAPTER 8
CHAPTER 9
CHAPTER 10

CHAPTER 1

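EXERCISE 1.1. Show that the normal distribution belongs to the exponential family of distributions.

$$f(y,\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}\right\}=\exp\left\{-\frac{1}{2}\ln(2\pi\sigma^2)-\frac{y^2-2y\mu+\mu^2}{2\sigma^2}\right\}.$$

Let $\theta=\mu$ and $\phi=\sigma^2$. Then we can write

$$f(y,\theta,\phi)=\exp\left\{-\frac{1}{2}\ln(2\pi\phi)-\frac{y^2-2y\theta+\theta^2}{2\phi}\right\}=\exp\left\{\frac{y\theta-\theta^2/2}{\phi}-\frac{1}{2}\ln(2\pi\phi)-\frac{y^2}{2\phi}\right\}=\exp\left\{\frac{y\theta-c(\theta)}{\phi}+h(y,\phi)\right\},$$

where $c(\theta)=\theta^2/2$ and

$$h(y,\phi)=-\frac{1}{2}\ln(2\pi\phi)-\frac{y^2}{2\phi}.$$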
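As a quick numerical sanity check of this derivation (not part of the original solution), the R sketch below evaluates the exponential-family form with $c(\theta)=\theta^2/2$ and $h(y,\phi)=-\frac{1}{2}\ln(2\pi\phi)-\frac{y^2}{2\phi}$ and compares it with R's built-in `dnorm()`. The helper names and the values of `y`, `mu`, and `sigma` are arbitrary choices for illustration.

```r
# Pieces of the exponential-family form f(y) = exp{(y*theta - c(theta))/phi + h(y, phi)}
c_theta <- function(theta) theta^2 / 2
h_y_phi <- function(y, phi) -0.5 * log(2 * pi * phi) - y^2 / (2 * phi)

expfam_normal <- function(y, theta, phi) {
  exp((y * theta - c_theta(theta)) / phi + h_y_phi(y, phi))
}

# Arbitrary illustrative values, with theta = mu and phi = sigma^2
y <- c(-1.3, 0, 2.7)
mu <- 1.5
sigma <- 2
check <- all.equal(expfam_normal(y, theta = mu, phi = sigma^2),
                   dnorm(y, mean = mu, sd = sigma))
print(check)  # TRUE when both forms agree up to floating-point tolerance
```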

EXERCISE 1.2. (a) Verify normality of the response variable, then fit the linear regression model
to the data. State the fitted model. Give estimates for all parameters.
In SAS:

data weightloss;
input drug$ age gender$ EWL @@;
cards;
A 49 F 14.2 A 54 M 25.4 A 37 F 14.1 A 43 F 20.0 A 57 M 11.7 A 48 M 16.6
A 34 F 15.9 A 51 F 17.4 A 54 F 22.8 A 45 F 16.7 A 36 M 12.7 A 57 M 15.0
A 44 M 8.4 A 56 M 11.2 A 44 M 17.3 A 47 M 20.5 A 44 F 6.7 B 52 F 29.4
B 51 M 21.9 B 44 F 23.6 B 53 F 23.8 B 55 M 7.4 B 30 F 23.1 B 47 M 16.8
B 26 M 14.1 B 56 F 24.6 B 28 F 17.8 B 34 M 27.8 B 43 M 10.6 B 55 M 26.8
B 52 F 15.7 B 54 F 23.7
;

/*running normality check*/


proc univariate;
var EWL;
histogram/normal;
run;

Goodness-of-Fit Tests for Normal Distribution
Test Statistic p Value
Kolmogorov-Smirnov D 0.10216310 Pr > D >0.150
Cramer-von Mises W-Sq 0.05103595 Pr > W-Sq >0.250
Anderson-Darling A-Sq 0.28788730 Pr > A-Sq >0.250

Based on the large p-values of the normality tests and the histogram, we can conclude that the
response variable follows a normal distribution.

/*fitting general linear model*/


proc genmod;
class drug(ref="A") gender;
model EWL = drug age gender / dist=normal link=identity;
run;

Log Likelihood -98.4395

Analysis Of Maximum Likelihood Parameter Estimates


Parameter DF Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 9.2146 5.3301 -1.2322 19.6614 2.99 0.0838
drug B 1 4.8103 1.8697 1.1456 8.4749 6.62 0.0101
drug A 0 0.0000 0.0000 0.0000 0.0000 . .
age 1 0.1102 0.1067 -0.0988 0.3192 1.07 0.3015
gender F 1 2.7235 1.8664 -0.9346 6.3815 2.13 0.1445
gender M 0 0.0000 0.0000 0.0000 0.0000 . .
Scale 1 5.2451 0.6556 4.1054 6.7012

The fitted model is $\widehat{E}(\text{EWL}) = 9.2146 + 4.8103\cdot\text{drugB} + 0.1102\cdot\text{age} + 2.7235\cdot\text{female}$, and $\hat{\sigma} = 5.2451$.

In R:
weightloss.data<- read.csv(file="C:/./Exercise1.2Data.csv", header=TRUE, sep=",",
stringsAsFactors=TRUE) #factors are required by relevel() in R >= 4.0

#running normality check


library(rcompanion)
plotNormalHistogram(weightloss.data$EWL)

shapiro.test(weightloss.data$EWL)

Shapiro-Wilk normality test

W = 0.97424, p-value = 0.6234

#specifying reference levels


drug.rel<- relevel(weightloss.data$drug, ref="A")
gender.rel<- relevel(weightloss.data$gender, ref="M")

#fitting general linear model


summary(fitted.model<- glm(EWL ~ drug.rel + age + gender.rel, data =
weightloss.data, family=gaussian(link=identity)))

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.2146 5.6981 1.617 0.1171
drug.relB 4.8103 1.9988 2.407 0.0229
age 0.1102 0.1140 0.967 0.3420
gender.relF 2.7235 1.9952 1.365 0.1831

#outputting estimated sigma


sigma(fitted.model)

5.607257
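The two reported values of $\sigma$ differ because PROC GENMOD reports the maximum likelihood estimate (the Scale parameter, whose variance estimate divides the residual sum of squares by $n$), whereas R's `sigma()` for a Gaussian `glm` divides by the residual degrees of freedom $n-p$. The sketch below assumes $n = 32$ observations and $p = 4$ regression coefficients (intercept, drug B, age, female) and converts the SAS value into the R value.

```r
# Values copied from the Exercise 1.2 outputs
sigma_ml <- 5.2451   # SAS GENMOD "Scale" (maximum likelihood, divisor n)
n <- 32              # observations in the weightloss data
p <- 4               # intercept + drug B + age + female
# RSS = n * sigma_ml^2 = (n - p) * sigma_r^2, hence:
sigma_r <- sigma_ml * sqrt(n / (n - p))
print(round(sigma_r, 4))
```

The same factor $\sqrt{n/(n-p)}$ links the standard errors in the two outputs (e.g. $1.8697\cdot\sqrt{32/28}\approx 1.9988$ for drug B). This explains why the SAS Wald p-values are somewhat smaller than the R t-test p-values.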

(b) Which regression coefficients turn out to be significant at the 5% level? Discuss the goodness of fit of the
model.

Drug B is the only significant predictor in the model at the 5% significance level since the
corresponding p-value is the only one under 0.05.

In SAS:

/*checking model fit*/


proc genmod;
model EWL = / dist=normal link=identity;
run;

Log Likelihood -102.6326

data deviance_test;
deviance = -2*(-102.6326 - (-98.4395));
pvalue = 1 - probchi(deviance,3);
run;

proc print noobs;


run;

deviance pvalue
8.3862 0.038669

The p-value for the deviance test is less than 0.05, indicating a good fit of the model: it fits the data significantly
better than the intercept-only (null) model. The R code and output are:
#checking model fit
null.model<- glm(EWL ~ 1, data=weightloss.data, family=gaussian(link=identity))
print(deviance<- -2*(logLik(null.model)-logLik(fitted.model)))
8.386158

print(p.value<- pchisq(deviance, df=3, lower.tail=FALSE))

0.03867005
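Both versions of the deviance test compute the same likelihood-ratio statistic $-2(\ell_0-\ell_1)$, where $\ell_0$ and $\ell_1$ are the log-likelihoods of the intercept-only and fitted models. The statistic is referred to a chi-square distribution with degrees of freedom equal to the number of extra parameters in the fitted model. A small helper such as the hypothetical `deviance_test()` below takes the log-likelihood values as printed by SAS and could be reused for the remaining exercises.

```r
# Hypothetical helper: likelihood-ratio (deviance) test of a fitted model
# against the intercept-only model, from the two log-likelihood values
deviance_test <- function(loglik_null, loglik_fitted, df) {
  deviance <- -2 * (loglik_null - loglik_fitted)
  p_value <- pchisq(deviance, df = df, lower.tail = FALSE)
  c(deviance = deviance, p.value = p_value)
}

# Exercise 1.2: three extra parameters (drug B, age, female)
res <- deviance_test(loglik_null = -102.6326, loglik_fitted = -98.4395, df = 3)
print(res)
```

For instance, `deviance_test(-91.1942, -67.2613, df = 7)` corresponds to the Exercise 1.3 test.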

(c) Is one of the drugs more efficient for weight loss than the other? Interpret all estimated significant
coefficients.
The estimated average EWL for subjects taking drug B is 4.8103 percentage points higher than that for
subjects taking drug A, keeping all the other predictors fixed. Since this coefficient is significant (p = 0.0229
in R, 0.0101 in SAS), drug B is estimated to be more effective for weight loss than drug A.

(d) According to the model, what is the predicted percent decrease in excess body weight for a 35-year-old
male who is taking drug A?

Since drug A and male are the reference levels, only the intercept and the age term contribute. The predicted
percent decrease in excess body weight is computed by hand as $\widehat{\text{EWL}} = 9.2146 + 0.1102\cdot 35 = 13.0716$.
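The hand calculation multiplies the estimated coefficients by the subject's row of the design matrix. In that row the dummy variables for the non-reference levels (drug B, female) are 0. A minimal sketch using the rounded SAS estimates (the vector names are illustrative):

```r
# Rounded estimates from the fitted model of Exercise 1.2
beta <- c(intercept = 9.2146, drugB = 4.8103, age = 0.1102, female = 2.7235)
# 35-year-old male on drug A: both dummy variables are 0
x_new <- c(intercept = 1, drugB = 0, age = 35, female = 0)
pred_EWL <- sum(beta * x_new)
print(pred_EWL)
```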

In SAS:

/*using fitted model for prediction*/


data predict;
input drug$ age gender$;
cards;
A 35 M
;

data weightloss;
set weightloss predict;
run;

proc genmod;
class drug gender;
model EWL = drug age gender / dist=normal link=identity;
output out=outdata p=pEWL;
run;

proc print data=outdata (firstobs=33) noobs;


var pEWL;
run;

pEWL
13.0718

In R:
#using fitted model for prediction
print(predict(fitted.model, data.frame(drug.rel="A", age=35, gender.rel="M")))

13.07178

EXERCISE 1.3. (a) Divide the car price by 1000 (that is, express it in thousands of dollars). Check that the
distribution of the price is normal. Fit a general linear regression model to predict the price of a car. Write
down the fitted model, specifying all estimated parameters.

In SAS:
data carsales;
length bodystyle $9; /*"hatchback" has 9 characters*/
input bodystyle$ country$ hwy doors leather$ price @@;
priceK=price/1000;
cards;
coupe USA 26 4 no 17445 coupe USA 40 4 no 23500
coupe USA 35 2 no 19600 coupe Germany 37 4 no 23400
coupe Germany 25 4 no 24100 coupe Germany 24 2 no 12400
coupe Japan 26 2 no 13300 coupe Japan 27 4 no 15550
coupe Japan 20 4 yes 29345 hatchback USA 30 2 no 12540
hatchback USA 39 4 no 17595 hatchback USA 38 2 no 17300
hatchback Germany 38 4 no 17800 hatchback Germany 32 4 no 22500
hatchback Germany 34 4 no 20300 hatchback Japan 38 4 yes 27300
hatchback Japan 38 2 yes 23300 hatchback Japan 38 2 yes 29300
sedan USA 29 4 no 32000 sedan USA 25 2 yes 34200
sedan USA 33 4 yes 33395 sedan Germany 40 4 no 22850
sedan Germany 23 2 yes 36000 sedan Germany 25 4 no 19900
sedan Japan 40 4 yes 36700 sedan Japan 35 4 yes 31600
sedan Japan 37 4 no 24600
run;

/*running normality check*/


proc univariate;
var priceK;
histogram/normal;
run;

Goodness-of-Fit Tests for Normal Distribution


Test Statistic p Value
Kolmogorov-Smirnov D 0.11287889 Pr > D >0.150
Cramer-von Mises W-Sq 0.05867848 Pr > W-Sq >0.250
Anderson-Darling A-Sq 0.37263698 Pr > A-Sq >0.250

P-values for the normality tests are all in excess of 0.05, indicating that normality holds. The
histogram also displays a distribution close to bell-shaped.
/*fitting general linear model*/
proc genmod;
class bodystyle(ref="hatchback") country(ref="Japan") leather(ref="no");
model priceK=bodystyle country hwy doors leather/dist=normal link=identity;
run;

Log Likelihood -67.2613

Analysis Of Maximum Likelihood Parameter Estimates


Parameter DF Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 5.1353 4.6900 -4.0570 14.3276 1.20 0.2735
bodystyle coupe 1 2.2698 1.6836 -1.0301 5.5696 1.82 0.1776
bodystyle sedan 1 6.4107 1.5477 3.3772 9.4441 17.16 <.0001
bodystyle hatchback 0 0.0000 0.0000 0.0000 0.0000 . .
country Germany 1 3.1959 1.6859 -0.1085 6.5002 3.59 0.0580
country USA 1 3.2128 1.5780 0.1199 6.3058 4.15 0.0418
country Japan 0 0.0000 0.0000 0.0000 0.0000 . .
hwy 1 0.1305 0.1117 -0.0884 0.3494 1.36 0.2427
doors 1 1.5554 0.6630 0.2560 2.8549 5.50 0.0190
leather yes 1 12.1757 1.6217 8.9972 15.3541 56.37 <.0001
leather no 0 0.0000 0.0000 0.0000 0.0000 . .
Scale 1 2.9219 0.3976 2.2378 3.8150

The fitted model is $\widehat{E}(\text{priceK}) = 5.1353 + 2.2698\cdot\text{coupe} + 6.4107\cdot\text{sedan} + 3.1959\cdot\text{Germany} + 3.2128\cdot\text{USA} + 0.1305\cdot\text{hwy} + 1.5554\cdot\text{doors} + 12.1757\cdot\text{leather}$, and $\hat{\sigma} = 2.9219$.

In R:
carsales.data<- read.csv(file="C:/./Exercise1.3Data.csv", header=TRUE, sep=",", stringsAsFactors=TRUE)

#rescaling price
priceK<- carsales.data$price/1000

#running normality check


library(rcompanion)
plotNormalHistogram(priceK)

shapiro.test(priceK)
Shapiro-Wilk normality test

W = 0.95482, p-value = 0.28


#specifying reference levels
bodystyle.rel<- relevel(carsales.data$bodystyle, ref="hatchback")
country.rel<- relevel(carsales.data$country, ref="Japan")
leather.rel<- relevel(carsales.data$leather, ref="no")

#fitting general linear model


summary(fitted.model<- glm(priceK ~ bodystyle.rel + country.rel + hwy + doors +
leather.rel, data=carsales.data, family=gaussian(link=identity)))

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.1353 5.5909 0.919 0.36986
bodystyle.relcoupe 2.2698 2.0070 1.131 0.27216
bodystyle.relsedan 6.4107 1.8450 3.475 0.00254
country.relGermany 3.1959 2.0098 1.590 0.12829
country.relUSA 3.2128 1.8812 1.708 0.10394
hwy 0.1305 0.1332 0.980 0.33937
doors 1.5554 0.7904 1.968 0.06384
leather.relyes 12.1757 1.9332 6.298 4.79e-06

#outputting estimated sigma


sigma(fitted.model)

3.483088

(b) How good is the model fit? Discuss significance of the regression coefficients.
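The p-value in the deviance test (3.78E-8) is far below 0.05, indicating a good model fit. In the R output,
the significant variables at the 5% level are sedan body style and leather interior. The SAS Wald tests, which
use the smaller maximum likelihood standard errors, also flag country USA (p = 0.0418) and the number of
doors (p = 0.0190). The corresponding R t-tests give p = 0.1039 and p = 0.0638.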
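The two sets of p-values can be reproduced from the printed estimates and standard errors. SAS reports Wald chi-square tests on 1 degree of freedom that use the maximum likelihood standard errors. R's `summary()` of a Gaussian `glm` reports t-tests on $n-p = 27-8 = 19$ residual degrees of freedom, with larger standard errors. The sketch below does this for the number of doors; the variable names are illustrative.

```r
# Exercise 1.3, coefficient of doors, copied from the SAS and R outputs
est <- 1.5554
se_sas <- 0.6630    # ML-based standard error (SAS GENMOD)
se_r <- 0.7904      # residual-df-based standard error (R glm)
df_resid <- 27 - 8  # observations minus estimated coefficients

p_wald <- pchisq((est / se_sas)^2, df = 1, lower.tail = FALSE)  # SAS Wald chi-square
p_t <- 2 * pt(-abs(est / se_r), df = df_resid)                  # R two-sided t-test
print(round(c(wald = p_wald, t = p_t), 4))
```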

In SAS:

/*checking model fit*/


proc genmod;
model priceK = / dist=normal link=identity;
run;

Log Likelihood -91.1942

data deviance_test;
deviance = -2*(-91.1942 - (-67.2613));
pvalue = 1 - probchi(deviance,7);
run;

proc print noobs;


run;

deviance pvalue
47.8658 3.7823E-8

In R:

#checking model fit


null.model<- glm(priceK ~ 1, data=carsales.data, family=gaussian(link=identity))
print(deviance<- -2*(logLik(null.model)-logLik(fitted.model)))
47.86586

print(p.value<- pchisq(deviance, df=7, lower.tail = FALSE))

3.78218e-08

(c) Interpret the estimates of those regression coefficients that differ significantly from zero.
As estimated, a sedan costs on average $6,410.70 more than a hatchback, all other predictors being equal.
The estimated average price of a car with a leather interior is $12,175.70 higher than that of a car without a
leather interior.

(d) What is the predicted price of a sedan made in the USA that has 4 doors, leather seats, and gets 30
mpg on the highway?
The predicted price of a sedan that is made in the USA, has 4 doors and leather seats, and gets 30 mpg on the
highway is calculated as $\widehat{\text{price}} = \$1{,}000\,(5.1353 + 6.4107 + 3.2128 + 0.1305\cdot 30 + 1.5554\cdot 4 + 12.1757) = \$37{,}071.10$.
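The same computation in R, as a sketch with the rounded SAS estimates. Only the terms that are nonzero for this car are listed, since the coupe and Germany dummies are 0. The price is modeled in thousands of dollars, so the prediction is multiplied by 1000 at the end.

```r
# Rounded estimates for the terms that apply to a US-made sedan with leather seats
beta <- c(intercept = 5.1353, sedan = 6.4107, USA = 3.2128,
          hwy = 0.1305, doors = 1.5554, leather = 12.1757)
x_new <- c(intercept = 1, sedan = 1, USA = 1, hwy = 30, doors = 4, leather = 1)
price_K <- sum(beta * x_new)   # predicted price in thousands of dollars
price <- 1000 * price_K
print(price)
```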

In SAS:

/*using fitted model for prediction*/


data predict;
input bodystyle$ country$ hwy doors leather$;
cards;
sedan USA 30 4 yes
;

data carsales;
set carsales predict;
run;

proc genmod;
class bodystyle country leather;
model priceK = bodystyle country hwy doors leather / dist=normal link=identity;
output out=outdata p=ppriceK;
run;

data final_prediction;
set outdata;
pprice=ppriceK*1000;
run;

proc print data=final_prediction (firstobs=28) noobs;


var pprice;
run;

pprice
37071.14

In R:

#using fitted model for prediction


prediction<- (predict(fitted.model, data.frame(bodystyle.rel="sedan", country.rel
="USA", hwy=30, doors=4, leather.rel="yes")))
print(prediction*1000)

37071.14

EXERCISE 1.4. (a) Show normality of the distribution of the number of hours of sleep per night.
Regress the number of hours of sleep on all the given factors. Write explicitly what the fitted model
is.
In SAS:
data sleep;
input age gender$ quiettime nchildren stresslevel jobstatus$ nactivities pastvac
sleephours @@;
cards;
62 F 60 1 5 unempl 1 15 7.7 28 F 15 1 6 unempl 5 11 5.3
50 M 15 0 5 unempl 1 19 6.4 36 M 60 1 6 full 1 21 7.7
56 F 50 0 3 part 4 5 7.6 48 M 180 0 5 full 0 6 6.4
55 M 40 0 8 full 8 23 7.0 26 F 80 0 7 student 9 8 8.3
44 M 180 1 3 part 6 20 9.6 49 F 5 0 7 unempl 5 15 5.5
29 M 60 2 5 student 5 7 7.7 56 M 10 1 4 unempl 4 17 5.7
46 F 40 1 7 part 3 3 7.4 41 F 5 2 6 full 9 10 6.2
22 M 15 0 8 full 4 3 6.3 36 F 45 2 5 part 8 14 7.5
54 F 120 1 8 part 7 10 8.5 42 F 60 3 1 full 9 11 6.3
58 F 5 1 7 full 1 17 5.3 33 M 100 2 1 full 9 5 8.3
50 F 2 2 6 full 3 12 5.1 59 M 30 2 5 full 2 6 6.9
32 M 30 1 8 full 5 9 6.9 50 M 60 2 8 part 8 13 8.0
56 F 10 0 3 unempl 7 7 6.1 42 F 240 0 1 part 8 21 8.8
58 F 10 2 7 full 9 4 6.2 57 F 15 1 6 full 2 16 6.3
30 F 30 0 2 full 8 9 8.3 54 M 20 2 8 full 6 7 6.5
57 M 45 2 4 full 7 18 7.5 45 F 120 0 9 part 2 13 6.6
33 F 40 1 6 unempl 9 24 7.0 56 F 120 0 5 part 2 20 8.7
59 F 60 2 9 part 4 19 8.1 41 M 60 2 3 student 2 3 7.5
62 M 40 0 1 unempl 0 2 8.6 29 M 15 1 7 unempl 3 20 6.3
34 F 30 0 7 unempl 9 0 6.6 32 F 20 3 7 unempl 2 8 7.8
46 F 20 2 3 unempl 9 18 7.9 45 M 60 0 2 unempl 0 22 9.0
23 M 45 0 6 part 4 12 7.6 38 M 60 4 5 full 3 5 7.8
45 M 30 0 5 unempl 9 7 6.8 63 F 40 0 6 unempl 5 5 7.3
27 F 120 0 4 student 1 16 7.3 30 F 45 0 7 part 8 10 7.7
34 F 5 3 6 full 0 4 6.0 62 M 10 0 10 part 8 11 6.0
;

/*running normality check*/


proc univariate;
var sleephours;
histogram/normal;
run;

Goodness-of-Fit Tests for Normal Distribution
Test Statistic p Value
Kolmogorov-Smirnov D 0.08733974 Pr > D >0.150
Cramer-von Mises W-Sq 0.06145088 Pr > W-Sq >0.250
Anderson-Darling A-Sq 0.32815950 Pr > A-Sq >0.250

The normality tests (p-values > 0.05) as well as the bell-shaped histogram indicate normality of the
response variable.

/*fitting general linear model*/


proc genmod;
class gender(ref="F") jobstatus(ref="full");
model sleephours = age gender quiettime nchildren stresslevel jobstatus
nactivities pastvac / dist=normal link=identity;
run;

Log Likelihood -54.6201

Analysis Of Maximum Likelihood Parameter Estimates


Parameter DF Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 6.8260 0.7051 5.4440 8.2080 93.72 <.0001
age 1 -0.0037 0.0093 -0.0218 0.0145 0.16 0.6932
gender M 1 0.3568 0.2132 -0.0610 0.7747 2.80 0.0942
gender F 0 0.0000 0.0000 0.0000 0.0000 . .
quiettime 1 0.0074 0.0029 0.0018 0.0130 6.74 0.0095
nchildren 1 0.1204 0.1086 -0.0925 0.3334 1.23 0.2677
stresslevel 1 -0.1398 0.0536 -0.2450 -0.0347 6.80 0.0091
jobstatus part 1 1.0484 0.3188 0.4235 1.6732 10.81 0.0010
jobstatus student 1 0.6286 0.4358 -0.2255 1.4828 2.08 0.1492
jobstatus unempl 1 0.3818 0.2857 -0.1781 0.9418 1.79 0.1814
jobstatus full 0 0.0000 0.0000 0.0000 0.0000 . .
nactivities 1 0.0204 0.0345 -0.0472 0.0879 0.35 0.5545
pastvac 1 0.0050 0.0170 -0.0282 0.0383 0.09 0.7663
Scale 1 0.7214 0.0721 0.5930 0.8776

The fitted model is

$$\widehat{E}(\text{sleephours}) = 6.8260 - 0.0037\cdot\text{age} + 0.3568\cdot\text{male} + 0.0074\cdot\text{quiettime} + 0.1204\cdot\text{nchildren} - 0.1398\cdot\text{stresslevel} + 1.0484\cdot\text{parttime} + 0.6286\cdot\text{student} + 0.3818\cdot\text{unempl} + 0.0204\cdot\text{nactivities} + 0.0050\cdot\text{pastvac},$$

and $\hat{\sigma} = 0.7214$.

In R:

sleep.data<- read.csv(file="C:/./Exercise1.4Data.csv", header=TRUE, sep=",", stringsAsFactors=TRUE)

#running normality check


library(rcompanion)
plotNormalHistogram(sleep.data$sleephours)

shapiro.test(sleep.data$sleephours)

Shapiro-Wilk normality test

W = 0.98284, p-value = 0.6762

#specifying reference levels


gender.rel<- relevel(sleep.data$gender, ref="F")
jobstatus.rel<- relevel(sleep.data$jobstatus, ref="full")

#fitting general linear model


summary(fitted.model<- glm(sleephours ~ age + gender.rel + quiettime + nchildren
+ stresslevel + jobstatus.rel + nactivities + pastvac, data=sleep.data,
family=gaussian(link=identity)))

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.826002 0.798388 8.550 1.78e-10
age -0.003656 0.010494 -0.348 0.72943
gender.relM 0.356815 0.241401 1.478 0.14741
quiettime 0.007421 0.003238 2.292 0.02738
nchildren 0.120419 0.123020 0.979 0.33368
stresslevel -0.139828 0.060734 -2.302 0.02674
jobstatus.relpart 1.048386 0.360976 2.904 0.00603
jobstatus.relstudent 0.628623 0.493437 1.274 0.21021
jobstatus.relunempl 0.381840 0.323501 1.180 0.24501
nactivities 0.020373 0.039031 0.522 0.60465
pastvac 0.005046 0.019222 0.263 0.79430

#outputting estimated sigma


sigma(fitted.model)

0.8168443

(b) How good is the model fit? What beta coefficients are significantly different from zero at the 5%
level of significance?
The p-value in the deviance test is smaller than 0.05, which indicates a good fit of the model.
Significant variables at the 5% level are quiet time, stress level, and part-time employment status.
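The 10 degrees of freedom used in the deviance test equal the number of non-intercept columns in the design matrix. Each numeric predictor contributes one column, gender (2 levels) contributes one dummy, and job status (4 levels) contributes three dummies. The sketch below uses `model.matrix()` on a few made-up rows that only share the variable types of the sleep data; the values themselves are hypothetical.

```r
# Made-up rows: only the variable types matter here, not the values
toy <- data.frame(
  age = c(30, 45, 52, 61),
  gender = factor(c("F", "M", "F", "M")),
  quiettime = c(10, 60, 30, 5),
  nchildren = c(0, 2, 1, 3),
  stresslevel = c(7, 3, 5, 8),
  jobstatus = factor(c("full", "part", "student", "unempl")),
  nactivities = c(1, 4, 7, 0),
  pastvac = c(12, 3, 6, 20)
)
# One column per numeric predictor, (levels - 1) dummy columns per factor
X <- model.matrix(~ age + gender + quiettime + nchildren + stresslevel +
                    jobstatus + nactivities + pastvac, data = toy)
df_test <- ncol(X) - 1  # the intercept is shared with the null model
print(colnames(X))
print(df_test)
```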

In SAS:
/*checking model fit*/
proc genmod;
model sleephours = / dist=normal link=identity;
run;
Log Likelihood -73.0195

data deviance_test;
deviance = -2*(-73.0195 - (-54.6201));
pvalue = 1 - probchi(deviance,10);
run;

proc print noobs;


run;

deviance pvalue
36.7988 .000061312

In R:

#checking model fit


null.model<- glm(sleephours ~ 1, data=sleep.data, family=gaussian(link=identity))
print(deviance<- -2*(logLik(null.model)-logLik(fitted.model)))

36.79887

print(p.value<- pchisq(deviance, df=10, lower.tail = FALSE))

6.131066e-05

(c) Interpret the estimated significant regression coefficients.


It is estimated that for each extra minute of quiet time, a person gets on average 0.0074 hours more sleep
per night. For a unit increase in stress level, the estimated average number of hours of night sleep decreases
by 0.1398. It is estimated that, on average, someone working part-time gets 1.0484 more hours of sleep
than someone working full-time.
(d) Find the estimated number of hours of night's sleep that a 30-year-old full-time mom of three
children under the age of five has, if she gets 10 minutes a day for herself, walks to the park with her
kids every day of the week, estimates her stress level as 7, and hasn't gotten any vacation for one
year.

For this mother, age = 30, quiettime = 10, nchildren = 3, stresslevel = 7, nactivities = 7 (a walk to the
park every day of the week), and pastvac = 12 (one year without a vacation). Female and full-time are the
reference levels, so the gender and job status terms drop out, and the predicted number of hours of night's
sleep is

$$\widehat{\text{sleephours}} = 6.8260 - 0.0037\cdot 30 + 0.0074\cdot 10 + 0.1204\cdot 3 - 0.1398\cdot 7 + 0.0204\cdot 7 + 0.0050\cdot 12 = 6.3744.$$

The small difference from the software prediction (6.37616) is due to rounding of the estimates.
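The rounding effect can be checked by repeating the calculation with the estimates rounded to four decimals (SAS table) and to six decimals (R summary). This is a sketch; the vector names are illustrative, and the coefficients are listed in the same order as the predictor values.

```r
# Predictor values for the mother; female and full-time are reference levels,
# so their dummy variables are 0 and are omitted
x_new <- c(intercept = 1, age = 30, quiettime = 10, nchildren = 3,
           stresslevel = 7, nactivities = 7, pastvac = 12)

# Same coefficients, rounded to 4 decimals (SAS) and to 6 decimals (R)
beta_4dp <- c(6.8260, -0.0037, 0.0074, 0.1204, -0.1398, 0.0204, 0.0050)
beta_6dp <- c(6.826002, -0.003656, 0.007421, 0.120419, -0.139828, 0.020373, 0.005046)

pred_4dp <- sum(beta_4dp * x_new)
pred_6dp <- sum(beta_6dp * x_new)
print(round(c(rounded_4dp = pred_4dp, rounded_6dp = pred_6dp), 4))
```

With six-decimal estimates, the prediction moves from the hand-calculated 6.3744 to essentially the value reported by SAS and R.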
In SAS:

/*using fitted model for prediction*/


data predict;
input age gender$ quiettime nchildren stresslevel jobstatus$ nactivities pastvac;
cards;
30 F 10 3 7 full 7 12
;

data sleep;
set sleep predict;
run;

proc genmod;
class gender jobstatus;
model sleephours = age gender quiettime nchildren stresslevel jobstatus
nactivities pastvac / dist=normal link=identity;
output out=outdata p=psleephours;
run;

proc print data=outdata (firstobs=51) noobs;


var psleephours;
run;

psleephours
6.37616

In R:
#using fitted model for prediction
print(predict(fitted.model, data.frame(age=30, gender.rel="F", quiettime=10,
nchildren=3, stresslevel=7, jobstatus.rel="full", nactivities=7, pastvac=12)))

6.376164

EXERCISE 1.5. (a) Compute the total time spent on both transitions. Verify normality of the
distribution of this variable, and fit a general linear regression model. Specify the fitted model.
In SAS:

data time;
input age gender$ run t1 bike t2 swim @@;
transitiontime=t1+t2;
cards;
55 M 24.17 2.60 37.95 2.50 5.70 59 F 34.88 2.83 52.15 3.05 5.20
24 M 32.97 2.55 59.20 3.47 5.37 53 F 22.2 1.83 46.70 2.15 5.50
51 M 27.35 1.75 42.05 2.32 3.75 38 F 32.13 2.38 50.92 2.95 6.00
66 M 25.39 1.95 41.57 2.80 3.93 30 F 24.67 1.58 48.28 2.77 5.68
43 F 42.33 2.78 63.60 4.08 7.18 47 F 28.73 2.35 45.57 3.90 6.62
26 F 29.62 2.92 51.23 3.85 4.92 45 M 22.23 2.07 38.95 2.35 4.28
29 F 26.93 2.10 44.33 2.45 7.47 34 M 17.75 0.75 33.27 1.23 3.65
39 M 37.47 2.52 55.67 4.47 8.60 54 M 36.63 3.27 43.92 3.08 7.15
26 M 34.42 2.73 52.62 2.67 9.23 36 M 27.38 2.22 39.03 2.92 7.43
42 M 21.37 2.12 35.95 1.93 3.95 49 M 29.03 4.50 38.53 3.95 8.80
42 F 28.53 3.27 49.85 3.67 8.13 42 F 25.12 1.72 39.52 2.50 4.55
42 F 26.33 1.70 48.98 2.30 5.02 41 F 36.75 3.95 62.85 3.13 6.93
15 M 25.12 1.70 44.75 3.20 7.48 48 M 26.52 4.43 40.98 3.82 6.58
37 M 28.3 2.85 41.78 3.47 6.02 55 M 31.25 2.70 43.43 3.25 5.25
42 M 24.38 1.45 37.13 1.83 3.70 25 M 33.45 2.25 51.38 4.03 7.45
12 F 27.62 2.23 55.47 2.97 4.37 23 F 28.55 2.17 54.57 2.55 7.90
49 M 33.88 2.77 54.82 3.87 6.90 53 F 26.97 1.77 42.33 3.40 6.58
45 F 26.58 1.65 44.30 2.52 5.40 33 F 32.32 2.10 54.87 2.32 6.25
63 M 40.53 3.78 69.75 3.83 12.17 50 M 33.68 3.07 43.57 3.13 5.77
43 F 34.93 2.58 62.35 2.95 7.92 24 M 22.88 1.82 39.55 2.12 4.03
44 M 29.25 2.47 45.60 2.75 9.18 51 F 36.98 3.70 46.58 5.18 7.60
;
/*running normality check*/
proc univariate;
var transitiontime;
histogram/normal;
run;

Goodness-of-Fit Tests for Normal Distribution


Test Statistic p Value
Kolmogorov-Smirnov D 0.07499320 Pr > D >0.150
Cramer-von Mises W-Sq 0.03895414 Pr > W-Sq >0.250
Anderson-Darling A-Sq 0.26390584 Pr > A-Sq >0.250

The p-values in the normality tests are above 0.05, which means that the response variable has a
normal distribution. The histogram displays a bell-shaped curve, supporting the normality conclusion.

/*fitting general linear model*/


proc genmod;
class gender;
model transitiontime = age gender run bike swim / dist=normal link=identity;
run;

Log Likelihood -56.4150

Analysis Of Maximum Likelihood Parameter Estimates


Parameter DF Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 0.5293 1.0253 -1.4803 2.5388 0.27 0.6057
age 1 0.0067 0.0128 -0.0184 0.0318 0.27 0.6032
gender F 1 0.0961 0.3256 -0.5421 0.7343 0.09 0.7679
gender M 0 0.0000 0.0000 0.0000 0.0000 . .
run 1 0.1964 0.0500 0.0985 0.2943 15.46 <.0001
bike 1 -0.0565 0.0328 -0.1207 0.0078 2.97 0.0849
swim 1 0.2475 0.1024 0.0468 0.4483 5.84 0.0156
Scale 1 0.9271 0.1012 0.7486 1.1481

The fitted model is $\widehat{E}(\text{transitiontime}) = 0.5293 + 0.0067\cdot\text{age} + 0.0961\cdot\text{female} + 0.1964\cdot\text{run} - 0.0565\cdot\text{bike} + 0.2475\cdot\text{swim}$, and $\hat{\sigma} = 0.9271$.

In R:
time.data<- read.csv(file="C:/./Exercise1.5Data.csv", header=TRUE, sep=",", stringsAsFactors=TRUE)

#computing total transition time


transition.time<- time.data$t1 + time.data$t2

#running normality check


library(rcompanion)
plotNormalHistogram(transition.time)

shapiro.test(transition.time)

Shapiro-Wilk normality test

W = 0.97896, p-value = 0.6216

#specifying reference levels


gender.rel<- relevel(time.data$gender, ref="M")

#fitting general linear model


summary(fitted.model<- glm(transition.time ~ age + gender.rel + run + bike +
swim, data=time.data, family=gaussian(link=identity)))

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.529266 1.107464 0.478 0.635605
age 0.006659 0.013837 0.481 0.633232
gender.relF 0.096094 0.351716 0.273 0.786250
run 0.196405 0.053953 3.640 0.000849
bike -0.056487 0.035412 -1.595 0.119427
swim 0.247544 0.110615 2.238 0.031507

#outputting estimated sigma


sigma(fitted.model)

1.001351

(b) Discuss the model fit. Are all the predictors in that model significant at the 5% significance level?

In SAS:
/*checking model fit*/
proc genmod;
model transitiontime = / dist=normal link=identity;
run;

Log Likelihood -74.6263

data deviance_test;
deviance = -2*(-74.6263 - (-56.4150));
pvalue = 1 - probchi(deviance,5);
run;

proc print noobs;


run;

deviance pvalue
36.4226 .000000782

Since the p-value in the deviance test is tiny, the model has a good fit. The only significant predictors
at the 5% level are run time and swim time.

In R:

#checking model fit


null.model<- glm(transition.time ~ 1, data=time.data,
family=gaussian(link=identity))
print(deviance<- -2*(logLik(null.model)-logLik(fitted.model)))

36.42269

print(p.value<- pchisq(deviance, df=5, lower.tail = FALSE))

7.817128e-07

(c) Interpret only the estimated significant regression coefficients of this model.

The estimated average transition time increases by 0.1964 minutes for each one-minute increase in run time,
holding the other predictors fixed. For each one-minute increase in swim time, the estimated average
transition time increases by 0.2475 minutes.

(d) What is the predicted total time at transitions for the student, if his best result in 5-kilometer run is
27:32, 13-mile bike is 56:17, and 200-meter swim is 8:46?

Below we compute the predicted total transition time for the 25-year-old male student with a 27:32 run,
56:17 bike, and 8:46 swim. First we convert the times into minutes: 27 + 32/60 = 27.53, 56 + 17/60 = 56.28,
and 8 + 46/60 = 8.77. Since male is the reference level for gender, the calculation is
$\widehat{\text{transitiontime}} = 0.5293 + 0.0067\cdot 25 + 0.1964\cdot 27.53 - 0.0565\cdot 56.28 + 0.2475\cdot 8.77 = 5.09$ minutes.
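The time conversion and the prediction can be sketched in R as follows. The helper `to_minutes()` is hypothetical and simply splits "mm:ss" strings. Because the minutes are not rounded to two decimals first, the result can differ from the hand calculation in the fourth decimal place.

```r
# Hypothetical helper: convert "minutes:seconds" strings to decimal minutes
to_minutes <- function(mmss) {
  parts <- strsplit(mmss, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60, numeric(1))
}

times <- to_minutes(c("27:32", "56:17", "8:46"))
names(times) <- c("run", "bike", "swim")

# Rounded SAS estimates; male is the reference level, so there is no gender term
beta <- c(intercept = 0.5293, age = 0.0067, run = 0.1964, bike = -0.0565, swim = 0.2475)
x_new <- c(intercept = 1, age = 25, times)
pred_transition <- sum(beta * x_new)
print(round(times, 2))
print(round(pred_transition, 2))
```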

In SAS:

/*using fitted model for prediction*/


data predict;
input age gender$ run bike swim;
cards;
25 M 27.53 56.28 8.77
;
data time;
set time predict;
run;

proc genmod;
class gender;
model transitiontime = age gender run bike swim / dist=normal link=identity;
output out=outdata p=ptransitiontime;
run;

proc print data=outdata (firstobs=43) noobs;


var ptransitiontime;
run;

ptransitiontime
5.09465

In R:
#using fitted model for prediction
print(predict(fitted.model, data.frame(age=25, gender.rel="M", run=27.53,
bike=56.28, swim=8.77)))

5.094653

EXERCISE 1.6. (a) Check that the measurements for the heart rate come from a normal distribution. Fit the
regression model and specify all estimated parameters.

In SAS:
data heartrate;
length AQI $9.;
input age gender$ ethnicity$ BMI nmeds AQI$ HR @@;
cards;
48 F Black 29.9 0 good 76 56 F White 22.9 3 unhealthy 112
67 F White 23.4 1 good 94 82 M Black 29.7 0 good 92
64 F White 31.4 3 good 97 58 M White 18.9 2 moderate 79
72 F Black 25.2 0 moderate 114 70 F Black 25.9 1 moderate 115
54 M Hispanic 29.6 0 moderate 80 57 F Hispanic 20.2 2 good 81
50 F Black 23.9 1 unhealthy 97 59 F Hispanic 22.6 0 good 86
61 M Hispanic 32.8 1 good 84 69 M Hispanic 24.1 2 unhealthy 94
65 F Black 23.4 2 moderate 114 66 F Hispanic 27.8 3 good 82
74 M White 32.4 1 moderate 97 66 M Hispanic 22.9 2 good 86
53 M Hispanic 25.2 0 good 84 55 M Hispanic 24.6 0 moderate 94
73 F Hispanic 24.8 3 moderate 105 45 F Hispanic 19.0 2 unhealthy 83
71 F White 20.3 2 unhealthy 111 63 M Black 23.8 2 unhealthy 108
71 F White 21.5 2 moderate 100 62 M Hispanic 27.4 3 good 79
44 F Hispanic 17.2 0 unhealthy 86 49 M White 17.1 1 good 75
63 M Black 28.0 2 good 91 65 F Hispanic 22.2 1 moderate 106
;

/*running normality check*/


proc univariate;
var HR;
histogram/normal;
run;

Goodness-of-Fit Tests for Normal Distribution
Test Statistic p Value
Kolmogorov-Smirnov D 0.15627802 Pr > D 0.061
Cramer-von Mises W-Sq 0.09496306 Pr > W-Sq 0.129
Anderson-Darling A-Sq 0.65250988 Pr > A-Sq 0.084

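Based on the histogram and the p-values, all of which exceed 0.05 (some only marginally, such as 0.061 for the
Kolmogorov-Smirnov test), we can conclude that the heart rate approximately follows a normal distribution.
/*fitting general linear model*/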
proc genmod;
class gender ethnicity(ref="Hispanic") AQI(ref="good");
model HR = age gender ethnicity BMI nmeds AQI / dist=normal link=identity;
run;

Log Likelihood -96.2779

Analysis Of Maximum Likelihood Parameter Estimates


Parameter DF Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 38.0164 10.2408 17.9449 58.0879 13.78 0.0002
age 1 0.6503 0.1472 0.3617 0.9389 19.51 <.0001
gender F 1 7.1031 2.3608 2.4760 11.7303 9.05 0.0026
gender M 0 0.0000 0.0000 0.0000 0.0000 . .
ethnicity Black 1 7.5351 2.8956 1.8598 13.2104 6.77 0.0093
ethnicity White 1 2.2633 2.7895 -3.2041 7.7306 0.66 0.4172
ethnicity Hispanic 0 0.0000 0.0000 0.0000 0.0000 . .
BMI 1 0.0431 0.3225 -0.5890 0.6751 0.02 0.8938
nmeds 1 0.4384 1.1919 -1.8976 2.7743 0.14 0.7130
AQI moderate 1 10.8596 2.6942 5.5790 16.1402 16.25 <.0001
AQI unhealthy 1 14.1674 3.1905 7.9142 20.4206 19.72 <.0001
AQI good 0 0.0000 0.0000 0.0000 0.0000 . .
Scale 1 5.9914 0.7735 4.6520 7.7165

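The fitted model is $E(HR) = 38.0164 + 0.6503 \cdot \text{age} + 7.1031 \cdot \text{female} + 7.5351 \cdot \text{Black} + 2.2633 \cdot \text{White} + 0.0431 \cdot \text{BMI} + 0.4384 \cdot \text{nmeds} + 10.8596 \cdot \text{AQI}_{\text{moderate}} + 14.1674 \cdot \text{AQI}_{\text{unhealthy}}$, and $\sigma = 5.9914$.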

In R:

hr.data<- read.csv(file="C:/./Exercise1.6Data.csv", header=TRUE, sep=",")

#running normality check


library(rcompanion)
plotNormalHistogram(hr.data$HR)

shapiro.test(hr.data$HR)

Shapiro-Wilk normality test

W = 0.93047, p-value = 0.05054

#specifying reference levels


gender.rel<- relevel(factor(hr.data$gender), ref="M")
ethnicity.rel<- relevel(factor(hr.data$ethnicity), ref="Hispanic")
AQI.rel<- relevel(factor(hr.data$AQI), ref="good")

#fitting general linear model


summary(fitted.model<- glm(HR ~ age + gender.rel + ethnicity.rel + BMI + nmeds +
AQI.rel, data=hr.data, family=gaussian(link=identity)))

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.01638 12.24005 3.106 0.00535
age 0.65033 0.17599 3.695 0.00134
gender.relF 7.10311 2.82173 2.517 0.02002
ethnicity.relBlack 7.53509 3.46094 2.177 0.04102
ethnicity.relWhite 2.26328 3.33411 0.679 0.50466
BMI 0.04306 0.38543 0.112 0.91210
nmeds 0.43836 1.42454 0.308 0.76133
AQI.relmoderate 10.85963 3.22023 3.372 0.00288
AQI.relunhealthy 14.16737 3.81333 3.715 0.00128

#outputting estimated sigma


sigma(fitted.model)

7.161087
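The two estimates of σ differ because they use different divisors. PROC GENMOD reports the maximum likelihood estimate of the scale, which divides the residual sum of squares by n. R's sigma() divides it by the residual degrees of freedom, n − p. The minimal R sketch below converts one into the other. It assumes n = 30 observations, which is consistent with firstobs=31 in part (d), and p = 9 estimated regression coefficients.

```r
# Hedged sketch: convert the ML scale estimate reported by PROC GENMOD
# into the residual standard error reported by R's sigma().
# Assumes n = 30 observations and p = 9 regression coefficients.
n <- 30                 # number of observations (assumed from firstobs=31)
p <- 9                  # intercept + 8 slope/indicator coefficients
sigma.ml <- 5.9914      # "Scale" estimate from the SAS output
sigma.unbiased <- sigma.ml * sqrt(n / (n - p))
print(round(sigma.unbiased, 4))
```

The same factor, sqrt(30/21) ≈ 1.1952, also links the two sets of standard errors. For example, for the intercept, 10.2408 × 1.1952 ≈ 12.240.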

(b) Discuss the goodness-of-fit of the model. What variables are significant predictors of heart rate at
the 5% level of significance?

In SAS:
/*checking model fit*/
proc genmod;
model HR = / dist=normal link=identity;

run;

Log Likelihood -117.8512

data deviance_test;
deviance = -2*(-117.8512 - (-96.2779));
pvalue = 1 - probchi(deviance,8);
run;

proc print noobs;


run;

deviance pvalue
43.1466 .000000824

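Since the p-value of the deviance test is tiny, the fitted model fits the data significantly better than the intercept-only (null) model, so the model has a good fit. The significant predictors at the 5% level are age, gender, the Black level of ethnicity, and both levels of AQI (moderate and unhealthy).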
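The deviance statistic and its degrees of freedom can be written out explicitly. The fitted model has 9 regression coefficients and the intercept-only model has 1, so the test has 9 − 1 = 8 degrees of freedom:

```latex
D = -2\left(\ell_{\text{null}} - \ell_{\text{fitted}}\right)
  = -2\left(-117.8512 - (-96.2779)\right) = 43.1466,
\qquad
p\text{-value} = P\left(\chi^2_{8} > 43.1466\right) \approx 8.24 \times 10^{-7}.
```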

In R:

#checking model fit


null.model<- glm(HR ~ 1, data=hr.data, family=gaussian(link=identity))
print(deviance<- -2*(logLik(null.model)-logLik(fitted.model)))

43.14658

print(p.value<- pchisq(deviance, df=8, lower.tail=FALSE))

8.243212e-07

(c) Give interpretation of the estimated statistically significant regression coefficients.


As age increases by one year, the estimated average heart rate increases by 0.6503 beats per minute.
The estimated average heart rate for females is 7.1031 beats per minute larger than that for males.
The estimated average heart rate for Blacks is 7.5351 beats per minute larger than that for Hispanics.
The estimated average heart rate for people living with moderate air quality is 10.8596 beats per
minute larger than that for people living with good air quality. The estimated average heart rate for
people living with unhealthy air quality is 14.1674 beats per minute larger than that for people living
with good air quality.
(d) Compute the predicted heart rate of a 50-year-old Hispanic male who has a BMI of 20, is not
taking any heart medications, and resides in an area with a moderate air quality.
The predicted heart rate is computed as follows. Male, Hispanic, and good air quality are the reference levels, so their indicators contribute zero, and the nmeds term drops out because nmeds = 0:

$\widehat{HR} = 38.0164 + 0.6503 \cdot 50 + 0.0431 \cdot 20 + 10.8596 = 82.253.$
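The hand calculation above is the inner product of the estimated coefficients with the covariate vector of this individual. The minimal R sketch below uses the SAS estimates. The element names (female, black, moderate, and so on) are hypothetical labels chosen for illustration; they are not the names used in the textbook's code.

```r
# Hedged sketch: predicted heart rate as sum(beta * x) for a 50-year-old
# Hispanic male, BMI 20, no heart medications, moderate air quality.
# Reference levels (male, Hispanic, good AQI) have indicator value 0.
beta <- c(intercept = 38.0164, age = 0.6503, female = 7.1031,
          black = 7.5351, white = 2.2633, BMI = 0.0431, nmeds = 0.4384,
          moderate = 10.8596, unhealthy = 14.1674)
x <- c(intercept = 1, age = 50, female = 0, black = 0, white = 0,
       BMI = 20, nmeds = 0, moderate = 1, unhealthy = 0)
pred.HR <- sum(beta * x[names(beta)])   # align x with beta by name
print(round(pred.HR, 3))
```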


In SAS:
/*using fitted model for prediction*/
data predict;
input age gender$ ethnicity$ BMI nmeds AQI$;
cards;
50 M Hispanic 20 0 moderate
;

data heartrate;
set heartrate predict;
run;

proc genmod;
class gender ethnicity AQI;
model HR = age gender ethnicity BMI nmeds AQI / dist=normal link=identity;
output out=outdata p=pHR;
run;

proc print data=outdata (firstobs=31) noobs;


var pHR;
run;

pHR
82.2536

In R:
#using fitted model for prediction
print(predict(fitted.model, data.frame(age=50, gender.rel="M", ethnicity.rel="Hispanic",
BMI=20, nmeds=0, AQI.rel="moderate")))

82.25361

CHAPTER 2
EXERCISE 2.1. (a) Is the decrease in BMI percentile (preBMI-postBMI) normally distributed?
Plot a histogram and test for normality of the distribution.

In SAS:
data obesity;
input gender$ age group$ preBMI postBMI @@;
BMIdiff=preBMI-postBMI;
female=(gender="F");
control=(group="Cx");
cards;
F 6 Cx 85.7 83.8 F 6 Cx 93.8 92.9 F 7 Cx 93.5 92.5 F 8 Cx 90.1 89.8
F 9 Tx 92.3 90.7 F 9 Tx 90.3 88.3 F 12 Cx 87.6 85.9 F 12 Cx 87.2 84.1
F 12 Tx 96.9 94.9 F 12 Tx 85.8 81.2 F 13 Cx 96.7 94.1 F 13 Cx 93.5 92.9
F 13 Tx 92.3 87.5 F 13 Tx 85.3 83.7 F 14 Tx 95.5 78.7 F 15 Cx 91.3 89.9
F 15 Tx 95.8 87.1 F 16 Tx 90.7 87.2 M 6 Cx 92.6 88.1 M 7 Cx 95.8 94.7
M 7 Cx 90.4 89.1 M 7 Cx 91.2 88.6 M 8 Tx 94.4 87.8 M 8 Tx 93.2 87.3
M 10 Cx 93.9 91.5 M 10 Tx 96.2 91.1 M 10 Tx 89.4 87.9 M 11 Tx 86.2 77.1
M 11 Tx 95.4 84.8 M 12 Cx 97.7 95.8 M 13 Tx 85.3 80.0 M 13 Tx 86.2 82.4
M 14 Cx 85.5 83.6 M 14 Cx 97.8 93.8 M 16 Cx 95.0 93.6 M 16 Tx 93.1 86.8
;

/*running normality check of response*/


proc univariate;
var BMIdiff;
histogram/normal;
run;

Goodness-of-Fit Tests for Normal Distribution


Test Statistic p Value
Kolmogorov-Smirnov D 0.18720025 Pr > D <0.010
Cramer-von Mises W-Sq 0.36512474 Pr > W-Sq <0.005
Anderson-Darling A-Sq 2.15289200 Pr > A-Sq <0.005

Neither the histogram nor the normality tests (all p-values are below 0.01) support normality of the response. In fact, the
distribution is right-skewed.

In R:

bmi.data<- read.csv(file="C:/./Exercise2.1Data.csv",header=TRUE, sep=",")

#creating the difference in BMI


BMIdiff<- bmi.data$preBMI-bmi.data$postBMI

#running normality check of response


library(rcompanion)
plotNormalHistogram(BMIdiff)

shapiro.test(BMIdiff)

Shapiro-Wilk normality test


W = 0.79159, p-value = 1.114e-05
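A quick numerical check of the right skew is sketched below in R. It assumes the 36 (preBMI, postBMI) pairs are typed in from the SAS data step, in the same order, instead of being read from the CSV file.

```r
# Hedged sketch: the 36 (preBMI, postBMI) pairs from the SAS data step above.
preBMI <- c(85.7, 93.8, 93.5, 90.1, 92.3, 90.3, 87.6, 87.2, 96.9, 85.8,
            96.7, 93.5, 92.3, 85.3, 95.5, 91.3, 95.8, 90.7, 92.6, 95.8,
            90.4, 91.2, 94.4, 93.2, 93.9, 96.2, 89.4, 86.2, 95.4, 97.7,
            85.3, 86.2, 85.5, 97.8, 95.0, 93.1)
postBMI <- c(83.8, 92.9, 92.5, 89.8, 90.7, 88.3, 85.9, 84.1, 94.9, 81.2,
             94.1, 92.9, 87.5, 83.7, 78.7, 89.9, 87.1, 87.2, 88.1, 94.7,
             89.1, 88.6, 87.8, 87.3, 91.5, 91.1, 87.9, 77.1, 84.8, 95.8,
             80.0, 82.4, 83.6, 93.8, 93.6, 86.8)
BMIdiff <- preBMI - postBMI

# Moment-based sample skewness (no small-sample correction);
# a positive value indicates a long right tail
centered <- BMIdiff - mean(BMIdiff)
skewness <- mean(centered^3) / mean(centered^2)^1.5

print(round(c(mean = mean(BMIdiff), median = median(BMIdiff),
              min = min(BMIdiff), max = max(BMIdiff),
              skewness = skewness), 3))
```

Several features of these data match the right skew noted above:

* The mean difference (about 3.73) is well above the median (2.5).
* The largest difference (16.8) lies far from the bulk of the values.
* All 36 differences are strictly positive, which part (b) relies on because the Box-Cox transformation requires a positive response.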

(b) Find the optimal lambda for Box-Cox transformation. Transform the change in BMI percentile
(find the appropriate transformation in Table 2.1), and show that the transformed variable is
normally distributed. Plot the histogram and do a formal testing.
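The standard Box-Cox family maps a strictly positive response y to (y^λ − 1)/λ for λ ≠ 0, and to ln y in the limiting case λ = 0. The case λ = 0 is the one applied in the SAS code that follows. Below is a minimal R sketch of that family. The function name boxcox.transform is hypothetical and not taken from the textbook or from a package.

```r
# Hedged sketch of the standard Box-Cox transformation.
# boxcox.transform is a hypothetical helper name used only for illustration.
boxcox.transform <- function(y, lambda) {
  # Box-Cox is defined only for strictly positive responses
  stopifnot(all(y > 0))
  if (lambda == 0) {
    log(y)                     # limiting case as lambda -> 0
  } else {
    (y^lambda - 1) / lambda
  }
}

# lambda = 0 gives the natural log; lambda = 0.5 is a shifted, rescaled square root
print(boxcox.transform(c(1, exp(1)), lambda = 0))
print(boxcox.transform(4, lambda = 0.5))
```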

In SAS:
/*finding optimal lambda for Box-Cox transformation*/
proc transreg;
model BoxCox(BMIdiff) = identity(age female control);
run;

/*applying Box-Cox transformation with lambda=0*/


data obesity;
set obesity;
