Statistical Models in R
6.1 Simple Linear Regression

The simple linear regression model has the form

y_i = β_0 + β_1 x_i + ϵ_i.
Moreover, it is assumed that, for each i = 1, 2, ..., n, the error terms ϵ_i have constant variance σ², are independent, and are identically and normally distributed with ϵ_i ∼ N(0, σ²).
In matrix form, the model is written

y = Xβ + ϵ,

where it is assumed that the design matrix, X, has full column rank; y is the vector of observed responses; and the entries of the error vector ϵ satisfy the above-mentioned assumptions.
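For concreteness, the design matrix for a simple linear regression can be inspected directly with model.matrix; a small sketch, where the toy predictor x.toy is purely illustrative:

x.toy <- c(1, 2, 3)
model.matrix(~ x.toy)  # a column of ones (the intercept) plus the predictor column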
A traditional application of simple linear regression typically involves a study in which the continuous response variable is, in theory, assumed to be linearly related to a continuous explanatory variable, and for which the data provide evidence in support of this structural requirement as well as of all fundamental assumptions on the error terms. The (simulated) data for the illustrations to follow are given in Table 5.1 and entered below.
x <- c(0.00, 0.75, 1.50, 2.25, 3.00, 3.75, 4.50, 5.25, 6.00, 6.75, 7.50, 8.25,
       9.00, 9.75, 10.50, 11.25, 12.00, 12.75)  # reconstructed from the summary below
y <- c(54.3, 50.8, 58.0, 54.6, 45.3, 47.0, 51.7, 43.3, 44.7, 38.5, 42.1, 40, 32,
       ...)  # the remaining five response values are truncated in the source
SimpleRegData <- data.frame(x, y)
with(SimpleRegData, hist(y, sub = "fig 5.1: histogram of response variable y"))
[Figure 5.1: histogram of the response variable y.]
Such figures might give some feel for clear deviations from symmetry in the observed responses. A preliminary assessment of the manner in which the response might be (at least approximately) related to the explanatory variable can be made using a basic scatterplot (see Figure 5.2).
with(SimpleRegData, plot(x, y, xlab = "x", ylab = "y"))
[Figure 5.2: scatterplot of y against x.]
At this point, boxplots can also be used in a preliminary check for potential outlying data, as can basic numerical summaries:

summary(SimpleRegData)
## x y
## Min. : 0.000 Min. :26.90
## 1st Qu.: 3.188 1st Qu.:33.70
## Median : 6.375 Median :42.70
## Mean : 6.375 Mean :42.15
## 3rd Qu.: 9.562 3rd Qu.:49.85
## Max. :12.750 Max. :58.00
Combining the observation that the response does not suggest a deviation from symmetry with the fact that there appears to be an approximate linear relationship between X and Y, it is reasonable to fit a simple linear regression model to the data.
Simple.mod <- lm(y ~ x, data = SimpleRegData)
Then names(Simple.mod) shows that the resulting object, Simple.mod, contains several
sub-objects each of which contains useful information. To determine what the fitted model
is, execute
Simple.mod$coefficients
## (Intercept) x
## 56.417544 -2.238046
Thus, the fitted model is ŷ = 56.418 − 2.238x. Among the objects contained in Simple.mod that will find use in the diagnostic phase are coefficients, which contains the parameter estimates β̂_0 and β̂_1; fitted.values, which contains the fitted values ŷ_i; residuals, which contains the residuals, ϵ̂_i, for the fitted model; and df.residual, the residual degrees of freedom.
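As a brief illustration of how these sub-objects fit together (a sketch; the check should return TRUE, since the residuals are the observed minus the fitted values):

Simple.mod$df.residual                     # residual degrees of freedom: n - 2 = 16
all.equal(unname(Simple.mod$residuals),
          SimpleRegData$y - unname(Simple.mod$fitted.values))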
In the case of simple linear regression models, it is a simple matter to obtain the traditional ANOVA table:
anova(Simple.mod)
This table contains the residual sum of squares (SSE = 151.5) and the regression sum of squares (SSR = 1365.1). The total sum of squares (SSTo) is the sum of SSE and SSR. The mean square error (MSE = 9.47) is also contained in this table, along with the F- and p-value for the corresponding goodness-of-fit F-test for the model.
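Should these quantities be needed later, they can be pulled directly from the anova object; a minimal sketch using the row and column names as they appear in the printed table:

Simple.aov <- anova(Simple.mod)
SSE  <- Simple.aov["Residuals", "Sum Sq"]   # 151.5
SSR  <- Simple.aov["x", "Sum Sq"]           # 1365.1
SSTo <- SSE + SSR
MSE  <- Simple.aov["Residuals", "Mean Sq"]  # 9.47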
It is important to remember that this approach to obtaining the traditional ANOVA table will not work for the multiple linear regression models encountered later; there, anova reports sequential sums of squares, one row per term, rather than a single regression sum of squares.
The summary function can be used to obtain the model summary statistics, which include measures of fit. This summary also provides the essentials of the information given in the traditional ANOVA table. The summary statistics produced can be stored in an object for later recall using the command (output not shown)

(Simple.sum <- summary(Simple.mod))

Observe that by enclosing the whole assignment statement within parentheses, R not only performs the assignment, but also outputs the summary statistics.
The object Simple.sum contains summary statistics associated with the residuals and the coefficients, along with s, the residual standard error, stored as sigma; the coefficient of determination, r², stored as Multiple R-squared; and the relevant F-statistic and degrees of freedom for the goodness-of-fit F-test, stored as fstatistic.
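For example, these pieces can be recalled individually; a quick sketch:

Simple.sum$sigma       # s, the residual standard error
Simple.sum$r.squared   # r^2, the coefficient of determination
Simple.sum$fstatistic  # F-statistic with its numerator and denominator df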
Testing the model’s significance via the hypotheses
H_0 : y_i = β_0 + ϵ_i {reduced model}, vs.
H_1 : y_i = β_0 + β_1 x_i + ϵ_i {full model},
then boils down to interpreting the F-statistic line in the object Simple.sum generated above (equivalently, the F-test row of anova(Simple.mod)). Finally, the line Multiple R-squared: 0.9001,
Adjusted R-squared: 0.8939 in Simple.sum contains the coefficient of determination, r2
≈ 0.9001. Recall that, by definition, this indicates that close to 90% of the variation in the observed responses is explained by the fitted model and variation in the explanatory variable. Note that in simple linear regression cases, Multiple R-squared should be read simply as r-squared and Adjusted R-squared should be ignored.
6.1.3 Diagnostics
For many, graphical diagnostic methods are considered adequate, and for pretty much all, these methods are essential in providing visual support when numerical assessment methods are used. To facilitate simpler code, begin by extracting the needed information from the objects Simple.mod and Simple.sum.
e <- residuals(Simple.mod)
y.hat <- fitted.values(Simple.mod)
s <- Simple.sum$sigma
r <- e/s
d <- rstudent(Simple.mod)
The objects e, y.hat, and s are fairly self-explanatory, r contains standardized residuals,
and d contains what are referred to as studentized deleted residuals.
To assess whether the variances of the error terms might not be constant, the two traditional plots used are given in Figure 5.3; the code is sketched below.
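A plausible reconstruction, assuming the two panels plot the residuals e against the fitted values y.hat and against x, with a dotted reference line at zero:

par(mfrow = c(1, 2))
plot(y.hat, e, xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, lty = 3)
plot(SimpleRegData$x, e, xlab = "x", ylab = "Residuals")
abline(h = 0, lty = 3)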
Here, ideal plots should not have any noticeable trends and should have the bulk of the
plotted points randomly scattered and (approximately) symmetrically bounded about the
horizontal axis. These plots can also serve as a preliminary means of flagging potential
outlying values of the response variable.
[Figure 5.3: residuals plotted against the fitted values (left) and against x (right).]
The QQ normal probability plot is a popular graphical assessment of the normality assumption; the residuals or standardized residuals may be used, each with an appropriate reference line. Figure 5.4 was obtained using code along the lines sketched below.
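A plausible reconstruction, assuming a QQ plot of the standardized residuals r with the line y = x added as reference:

qqnorm(r, main = NULL)
abline(a = 0, b = 1)  # reference line y = x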
The line y = x works as a reference line for standardized residuals, but not for unstandardized residuals. An alternative reference line, which should be used for QQ plots of unstandardized residuals, passes through the first and third quartiles and can be plotted using

qqline(e, lty = 3)
The correlation coefficient test (for which there is no built-in R function) appears to be more versatile with respect to sample size than the Shapiro-Wilk test.
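Although no built-in function exists, the test statistic itself is simple to compute; a minimal sketch, where the helper name ppcc and the use of ppoints for the normal scores are our own choices, and the resulting statistic must be compared against published critical values:

ppcc <- function(res) {
  scores <- qnorm(ppoints(length(res)))  # approximate expected normal order statistics
  cor(sort(res), scores)                 # correlation of ordered residuals with scores
}
ppcc(e)  # values close to 1 are consistent with normality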
[Figure 5.4: QQ normal probability plot of the standardized residuals with the reference line y = x.]
Shapiro-Wilk test
This test is a modification of the correlation coefficient test, and the associated R command
to perform this test is
shapiro.test(e)
##
## Shapiro-Wilk normality test
##
## data: e
## W = 0.95673, p-value = 0.5399
which also indicates that there is not enough evidence to reject the assumption of normality (for α < p-value).
In the case of simple linear regression, an assessment of the influence of an outlying case, an ordered pair (x_k, y_k) for which x_k, y_k, or both influence the fit of the model, is best performed using the earlier constructed xy-scatterplot in Figure 5.2. Typically, an influential outlier is indicated by a plotted point that lies far from the bulk of the data and that does not fit in with, or appears to disrupt, the general trend of the bulk of the data.
It might also correctly be inferred that if an observed response is an outlying case, then the corresponding residual will lie farther from zero than the bulk of the residuals. Thus, if a particular observed response is an outlier, this will be evident in the plots shown in Figures 5.3 and 5.4.
One way to test for outliers is to decide beforehand how many points to test. Suppose the 5% of residuals that are most extreme fit the bill. For the current data, this amounts to a single data value. Cut-off values can be obtained using a Bonferroni adjustment and then placed in a plot of the studentized deleted residuals such as in Figure 5.5.

In general, suppose it is determined that the most extreme 5% of the data involve m observations, which may or may not be potential outliers. Then cutoff values for possible outliers are obtained using t(α/(2m), n − p − 2), where n is the sample size and, in the case of simple regression, p = 1. Thus, for the current model, the code
n <- length(e)
m <- ceiling(0.05*n)
p <- 1
cv <- qt(0.05/(2*m), n - p - 2, lower.tail = FALSE)
plot(d ~ y.hat, ylim = c(-cv - 0.5, cv + 0.5), xlab = "", ylab = "")
title(xlab = "Fitted Values", ylab = "Studentized Deleted Residuals")
abline(h = c(-cv, 0, cv), lty = c(3, 1, 3))
[Figure 5.5: studentized deleted residuals against the fitted values, with Bonferroni cutoff lines.]
produces Figure 5.5. While
none of the plotted points for the example in question lie outside the plotted cutoff lines,
there are occasions when outliers will be present.
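Should any cases fall outside the cutoffs, their indices can be listed; for example,

which(abs(d) > cv)  # observation numbers of flagged cases (none for these data)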
6.2 Multiple Linear Regression
The multiple linear regression model with p explanatory variables has the form

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + · · · + β_p x_ip + ϵ_i.

As with the simple linear regression model, it is assumed that, for each i = 1, 2, ..., n, the error terms ϵ_i have constant variance σ², are independent, and are identically and normally distributed with ϵ_i ∼ N(0, σ²).
The following data will be used for all illustrations:
y <- c(15.09, 34.37, 24.64, 19.16, 26.68, 38.04, 5.59, 29.31, 28.27, 7.04,
38.56, 3.95, 1.57, 10.38, 47.61, 12.71,28.19, 6.8, 26.25, 45.33, 29.38,
3.3, 16.17, 29.24, 48, 39.97,17.16, 13.8, 22.86, 30.05, 16.5, 40.04, 2.9,
42, 39.08, 24.74, 34.61, 45.54, 5.6, 23.7)
x1 <- c(9.83, 6.62, 5.27, 1.96, 6.47,9.02, 9.32, 9.8, 7.89, 2.45, 5.73,
8.91, 7.95, 6.09, 3.28, 4.28, 1.56, 5.24, 3.48, 6.91, 3.14, 8.09, 6.76,
7.59, 4.7, 8.18, 1, 9.24, 8.05, 3.87,7.83, 7.48, 6.06, 5.81, 9.69, 7.01,
1.53, 7.8, 6.9, 3.51)
x2 <- c(1.87, 3.94, 4.95, 4.78, 2.87, 1.11, 1.09,1.06, 4.59, 3.18, 2.29,
2.01, 2.16, 4.6, 2.8, 3.23, 2.42, 2.46,2.48, 4.73, 2.82, 1.72, 4.57, 2.43,
4.54, 4.15, 2.72, 3.1, 3.87,4.73, 1.36, 4.53, 2.73, 1.72,4.26, 1.81, 3.74,
2.14, 4.89, 4.67)
x3 <- c(8.14, 14.16, 9.17, 5.38, 11.2, 16.99, 4.01, 13.55,11.09, 1.31, 15.72,
3.73, 1.98, 3.34, 17.63, 4.87, 10.12, 1.79, 9.82, 17.68, 11.32, 2.04, 6.53,
12.55, 17.83, 16.7, 4.99, 6.59, 9.81, 11.07, 7.7, 16.08, 1.61, 16.86, 16.43,
10.18, 12.5, 19.27,1.52, 7.53)
MultipleReg <- data.frame(y, x1, x2, x3)
An initial graphical analysis might focus on the distributional properties of Y; the relationships between Y and each of X1, X2, and X3; the relationships between pairs of X1, X2, and X3; and the presence of unusual observations. One might begin with a histogram of the observed responses
with(MultipleReg, hist(y))
title(sub = "Fig. 5.5 Histogram for observed responses in MultipleReg dataset")
[Fig. 5.5: Histogram of the observed responses in the MultipleReg dataset.]
(see Figure 5.5). Histograms might provide some information on the symmetry and
spread of the observed responses in relation to the normal distribution having the same
mean and standard deviation as those of the sample data.
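One way to make that comparison concrete is to draw the histogram on the density scale and overlay the normal density with the sample mean and standard deviation; a minimal sketch:

hist(MultipleReg$y, freq = FALSE, xlab = "y", main = "Histogram of y")
curve(dnorm(x, mean = mean(MultipleReg$y), sd = sd(MultipleReg$y)),
      add = TRUE, lty = 2)  # normal density matched to the sample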
Boxplots can also be used as they provide information on symmetry and the presence of
outliers (in the univariate sense); however, a caution is appropriate here. When combining
boxplots on a single figure, pay attention to scale (and units). It is preferable to use separate
plots if there is a large difference in ranges between variables, and it is definitely preferable
to use separate plots if variable units differ. A matrix scatterplot can be used to look for
patterns or surprising behavior in both the response and the explanatory variables (see
Figure 5.6). One of the ways to obtain this figure is
with(MultipleReg, pairs(cbind(y, x1, x2, x3)))
title(sub = "fig 5.6: Matrix scatterplot of MultipleReg dataset illustrating pairwise relationships for all variables in the dataset.")
[Figure 5.6: matrix scatterplot of the MultipleReg dataset showing pairwise relationships among y, x1, x2, and x3.]
Another way is to simply enter plot(MultipleReg). An examination of the (X_j, Y) pairs in scatterplots such as Fig 5.6 provides some (often vague) information on issues associated with approximate relational fits between the response and explanatory variables. The more useful information obtainable from such plots concerns the presence of possible linear relationships between pairs of explanatory variables, (X_j, X_k), where j ≠ k.
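A numerical complement to the scatterplot matrix is the matrix of pairwise correlations among the explanatory variables; a quick sketch:

round(cor(MultipleReg[, c("x1", "x2", "x3")]), 2)  # pairwise correlation matrix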
The summary function can be used here if summary statistics for the data are desired. The model is fitted with
Multiple.mod <- lm(y ~ x1 + x2 + x3, data = MultipleReg)
and the contents of Multiple.mod include information analogous to the simple regression
case.
round(Multiple.mod$coefficients,4)
## (Intercept) x1 x2 x3
## 2.9411 -0.9098 0.8543 2.4917
Multiple.sum <- summary(Multiple.mod)
Take a look at the contents of the object Multiple.sum with the help of the function
names. Then,
Multiple.sum$coefficients
which provides quite a bit of additional information: each row gives an estimate, its standard error, the t-statistic, and the corresponding p-value. The last three lines of the full summary output (not shown) are also of interest. The third of these gives the results of the goodness-of-fit F-test, the model significance test involving the hypotheses

H_0 : β_1 = β_2 = β_3 = 0 vs. H_1 : β_j ≠ 0 for at least one j.
The second line of the Multiple.sum output shown previously contains the coefficient of multiple determination (R²), Multiple R-squared, and the adjusted coefficient of multiple determination (R²_adj), Adjusted R-squared. Finally, the first line gives the residual standard error (s) along with its degrees of freedom.
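As in the simple regression case, these quantities can be recalled individually; for example,

Multiple.sum$sigma           # residual standard error, s
Multiple.sum$r.squared       # coefficient of multiple determination, R^2
Multiple.sum$adj.r.squared   # adjusted R^2
Multiple.sum$fstatistic      # F-statistic with its degrees of freedom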
Diagnostics
There is quite a bit more involved in the diagnostics stage for multiple regression models, and this can require some preparatory work. The objects of interest contained in the fitted model object Multiple.mod include coefficients (β̂), residuals (ϵ̂), and fitted.values (ŷ). Of use in the model summary object Multiple.sum are sigma (s); the degrees of freedom of s², contained in df[2]; and cov.unscaled, which is the matrix (X′X)⁻¹. These objects will be used to obtain the various statistics needed for the diagnostics to be performed.
In addition to the usual (unstandardized) residuals, ϵ̂_i, available in Multiple.mod, two transformed versions of the residuals may find use in the preliminary stages of the following diagnostics: standardized (or semistudentized) residuals,

r̂_i = ϵ̂_i / s,

and studentized (or internally studentized) residuals,

r_i = ϵ̂_i / (s √(1 − h_ii)).
The h_ii terms in the above formula are called leverages and are the diagonal entries of the hat-matrix H = X(X′X)⁻¹X′. These find use in the analysis of explanatory variable p-tuples, (x_i1, x_i2, ..., x_ip), for flagging potential outlying cases.
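To make the connection concrete, the hat-matrix can be assembled from cov.unscaled and its diagonal compared with R's hatvalues function; a brief sketch:

X <- model.matrix(Multiple.mod)                  # the design matrix
H <- X %*% Multiple.sum$cov.unscaled %*% t(X)    # H = X(X'X)^{-1}X'
all.equal(unname(diag(H)),
          unname(hatvalues(Multiple.mod)))       # the diagonal entries are the leverages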
One further form of residuals, seen earlier, that plays a role in outlier analysis is the studentized deleted (or externally studentized) residual,

d̂_i* = ϵ̂_i √[(n − p − 2) / (s²(n − p − 1)(1 − h_ii) − ϵ̂_i²)].
The choice of which form of residuals to use in the diagnostics process is sometimes guided
by the variability present in the data and also depends on a combination of the task to be
performed and individual preference.
To ease the way for the illustrations to follow, compute and store each of the above-listed
statistics in appropriately named variables.
y.hat <- fitted.values(Multiple.mod)
e <- residuals(Multiple.mod)
s <- Multiple.sum$sigma
r <- e/s                             # compute standardized residuals
h <- hatvalues(Multiple.mod)         # extract leverages
stud <- e/(s*sqrt(1 - h))            # compute studentized residuals
stud.del <- rstudent(Multiple.mod)   # extract studentized deleted residuals
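As a quick consistency check, the extracted studentized deleted residuals should agree with the formula given earlier; a sketch, with p = 3 explanatory variables here:

n <- length(e)
p <- 3
d.check <- e*sqrt((n - p - 2)/(s^2*(n - p - 1)*(1 - h) - e^2))
all.equal(unname(d.check), unname(stud.del))  # should be TRUE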
Although the plots shown in fig 5.7 serve mainly to assess the constant variances assumption, recall that they also serve to flag other potential issues, such as the presence of outliers, the absence of important variables, or possible issues of fit. Ideal plots have all plotted points randomly scattered and approximately symmetrically spread about the horizontal axis. Only the studentized residuals are used here; however, it should be remembered that any one of the three forms of (undeleted) residuals provides equivalent information, though in potentially varying levels of clarity. The code used to produce fig 5.7 is shown below.
par(mfrow = c(2, 2))
plot(y.hat, stud, xlab = "Fitted values", ylab = "Studentized residuals", sub = "(a)")
abline(h = c(-2, 0, 2), lty = c(3, 1, 3))
plot(x1, stud, xlab = "x1", ylab = "Studentized residuals", sub = "(b)")
abline(h = c(-2, 0, 2), lty = c(3, 1, 3))
plot(x2, stud, xlab = "x2", ylab = "Studentized residuals", sub = "(c)")
abline(h = c(-2, 0, 2), lty = c(3, 1, 3))
plot(x3, stud, xlab = "x3", ylab = "Studentized residuals", sub = "(d)")
abline(h = c(-2, 0, 2), lty = c(3, 1, 3))
[Figure 5.7: studentized residuals against (a) the fitted values, (b) x1, (c) x2, and (d) x3, with reference lines at 0 and ±2.]
The included horizontal lines are simply for reference and do not represent cutoff values.
The normality assumption may be assessed by looking at a QQ normal probability plot of any one of the unstandardized, standardized, or studentized residuals. Fig 5.8 was obtained using the code
qqnorm(stud, main = NULL)
abline(a = 0, b = 1)
[Figure 5.8: QQ normal probability plot of the studentized residuals with the line y = x as reference.]
Observe that, by setting main = NULL, the default figure title does not appear; not specifying an lty results in a default solid line. For the fitted model in question, the QQ plot of the standardized residuals looks very much like fig. 5.8. The tests of normality encountered earlier work here, too.
shapiro.test(stud)
##
## Shapiro-Wilk normality test
##
## data: stud
## W = 0.98147, p-value = 0.744
also indicating that there is not enough evidence to reject the assumption of normality (for α < p-value).