
STATS 362

Statistical Computing and Data Analysis 2


UNIT 1
• REGRESSION

1. LINEAR REGRESSION MODEL

2. MULTIPLE REGRESSION MODEL

3. EXPONENTIAL REGRESSION

REGRESSION
• FORMS OF REGRESSION

▪ LINEAR
▪ QUADRATIC
▪ CUBIC
▪ QUARTIC

• MULTIPLE REGRESSION
• EXPONENTIAL REGRESSION
LINEAR REGRESSION
A major objective of many statistical investigations is to establish
relationships that make it possible to predict one or more variables in
terms of others. The simple linear regression model is of the form y = a + bx, where
y is the dependent variable,
x is the independent variable,
a is the y-intercept,
and b is the slope coefficient of x.
Note: The command ‘lm()’ is used to perform least squares regression.

R-code for linear regression
>x=c(numeric values)
>y=c(numeric values)
>a=lm(y~x)
>summary(a)

• Here y is the dependent variable
• x is the independent variable

EXAMPLE
The following table presents data on the fretting wear of mild steel and oil
viscosity, where x = oil viscosity and y = wear volume.

Viscosity(X) 3.9 2.1 6.4 5.7 4.7 2.8 3.4 7.5 3 4.5
Volume(Y) 6.5 7.8 5.2 8.2 9.2 8.9 7.3 9.8 5.6 4.6

Using the R software, fit an appropriate linear, quadratic, cubic and quartic
model to these data.

SOLUTION
LINEAR MODEL

>x=c(3.9,2.1,6.4,5.7,4.7,2.8,3.4,7.5,3.0,4.5)
>y=c(6.5,7.8,5.2,8.2,9.2,8.9,7.3,9.8,5.6,4.6)
>a=lm(y~x)
>summary(a)

OUTPUT
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.7255 -1.3034 0.4168 1.5894 2.0108
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.6299 1.7085 3.881 0.00467 **
x 0.1546 0.3642 0.424 0.68244
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.873 on 8 degrees of freedom
Multiple R-squared: 0.02202, Adjusted R-squared: -0.1002
F-statistic: 0.1801 on 1 and 8 DF, p-value: 0.6824

MODEL: y = 6.6299 + 0.1546x
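Once the model is stored, the fitted coefficients can be extracted and used for prediction with base-R helpers. A quick sketch using the same data (the object name `a` follows the slides):

```r
# Data and fit from the example above
x <- c(3.9, 2.1, 6.4, 5.7, 4.7, 2.8, 3.4, 7.5, 3.0, 4.5)
y <- c(6.5, 7.8, 5.2, 8.2, 9.2, 8.9, 7.3, 9.8, 5.6, 4.6)
a <- lm(y ~ x)

coef(a)                                  # intercept ~ 6.6299, slope ~ 0.1546
predict(a, newdata = data.frame(x = 5))  # predicted wear volume at viscosity 5 (~ 7.40)
```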

R-code for Quadratic
>x=c(numeric values)
>y=c(numeric values)
>b=lm(y~x+I(x^2))
>summary(b)

• Here y is the dependent variable
• x is the independent variable

QUADRATIC

>x=c(3.9,2.1,6.4,5.7,4.7,2.8,3.4,7.5,3.0,4.5)
>y=c(6.5,7.8,5.2,8.2,9.2,8.9,7.3,9.8,5.6,4.6)
>b=lm(y~x+I(x^2))
>summary(b)

Call:
lm(formula = y ~ x + I(x^2))
Residuals:
Min 1Q Median 3Q Max
-2.38024 -1.26603 0.08659 1.09552 2.57011
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.8590 4.8338 2.453 0.0439 *
x -2.3402 2.1926 -1.067 0.3213
I(x^2) 0.2612 0.2265 1.153 0.2867
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.836 on 7 degrees of freedom
Multiple R-squared: 0.1781, Adjusted R-squared: -0.05667
F-statistic: 0.7587 on 2 and 7 DF, p-value: 0.5032

MODEL: y = 11.859 - 2.3402x + 0.2612x^2


R-Code for Cubic
>x=c(numeric values)
>y=c(numeric values)
>d=lm(y~x+I(x^2)+I(x^3))
>summary(d)

• Here y is the dependent variable
• x is the independent variable

CUBIC
>x=c(3.9,2.1,6.4,5.7,4.7,2.8,3.4,7.5,3.0,4.5)
>y=c(6.5,7.8,5.2,8.2,9.2,8.9,7.3,9.8,5.6,4.6)
>d=lm(y~x+I(x^2)+I(x^3))
>summary(d)

Call:
lm(formula = y ~ x + I(x^2) + I(x^3))
Residuals:
Min 1Q Median 3Q Max
-2.0692 -1.4149 0.1101 1.2082 2.5999
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.78399 14.68099 0.462 0.660
x 1.44018 10.50169 0.137 0.895
I(x^2) -0.59553 2.33259 -0.255 0.807
I(x^3) 0.05974 0.16178 0.369 0.725
Residual standard error: 1.961 on 6 degrees of freedom
Multiple R-squared: 0.1964, Adjusted R-squared: -0.2054
F-statistic: 0.4888 on 3 and 6 DF, p-value: 0.7026

MODEL: y = 6.78399 + 1.44018x - 0.59553x^2 + 0.05974x^3

R-Code for QUARTIC
>x=c(numeric values)
>y=c(numeric values)
>e=lm(y~x+I(x^2)+I(x^3)+I(x^4))
>summary(e)

• Here y is the dependent variable
• x is the independent variable

QUARTIC
>x=c(3.9,2.1,6.4,5.7,4.7,2.8,3.4,7.5,3.0,4.5)
>y=c(6.5,7.8,5.2,8.2,9.2,8.9,7.3,9.8,5.6,4.6)
>e=lm(y~x+I(x^2)+I(x^3)+I(x^4))
>summary(e)

Call:
lm(formula = y ~ x + I(x^2) + I(x^3) + I(x^4))
Residuals:
1 2 3 4 5 6 7 8 9 10
-0.5671 -0.4279 -1.3427 1.5557 2.0758 1.9301 0.3901 0.2268 -1.2881 -2.5526
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.57466 46.70585 0.783 0.469
x -28.93120 46.28444 -0.625 0.559
I(x^2) 10.22251 16.19846 0.631 0.556
I(x^3) -1.54560 2.38226 -0.649 0.545
I(x^4) 0.08439 0.12491 0.676 0.529

Residual standard error: 2.056 on 5 degrees of freedom


Multiple R-squared: 0.2636, Adjusted R-squared: -0.3255
F-statistic: 0.4475 on 4 and 5 DF, p-value: 0.772

MODEL: y = 36.57466 - 28.93120x + 10.22251x^2 - 1.54560x^3 + 0.08439x^4
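None of the higher-order terms in these fits is individually significant, and a nested-model F test (via anova()) makes the comparison directly. A sketch with the same data; the model names are illustrative:

```r
# Same data as in the example
x <- c(3.9, 2.1, 6.4, 5.7, 4.7, 2.8, 3.4, 7.5, 3.0, 4.5)
y <- c(6.5, 7.8, 5.2, 8.2, 9.2, 8.9, 7.3, 9.8, 5.6, 4.6)

a <- lm(y ~ x)                             # linear
e <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))  # quartic

# F test: do the extra polynomial terms improve the fit?
anova(a, e)
```

Here the large p-value confirms that the quadratic, cubic and quartic terms add nothing useful over the linear model.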


MULTIPLE REGRESSION
A regression with two or more explanatory variables is called a multiple
regression. Rather than modelling the mean as a straight line as in simple linear
regression, the mean is now modelled as a function of several explanatory variables.

R-code for Multiple Regression
>explanatory1=c(numeric values)
>explanatory2=c( numeric values)
>response=c(numeric values)
>s=lm(response~explanatory1+explanatory2)
>summary(s)
Here the terms response, explanatory1 and explanatory2 should be replaced with the
names of the response and explanatory variables respectively.
NOTE: The explanatory variables can be more than two.
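With three or more explanatory variables the call extends in the same way, one + term per variable. A sketch with made-up toy numbers (purely illustrative, not from the slides):

```r
# Toy data -- hypothetical values, for illustration only
x1 <- c(2.1, 3.4, 4.0, 5.2, 6.1, 7.3)
x2 <- c(10, 12, 9, 14, 11, 15)
x3 <- c(0.5, 0.7, 0.6, 0.9, 0.8, 1.1)
y  <- c(20, 25, 22, 31, 28, 36)

# One '+' term per explanatory variable
s <- lm(y ~ x1 + x2 + x3)
summary(s)
```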

EXAMPLE
The electric power consumed each month by a chemical plant is thought
to be related to the average ambient temperature (x1) and tons of
product produced (x2). The past 8 months' historical data are presented
below:

x1 91 90 88 87 91 94 87 86
x2 25 21 24 25 25 26 25 25
y 240 236 270 274 301 316 300 296

Fit a multiple regression to the set of data above.


SOLUTION
> x1=c(91,90,88,87,91,94,87,86)
> x2=c(25,21,24,25,25,26,25,25)
> y=c(240,236,270,274,301,316,300,296)
> a=lm(y~x1+x2)
> summary(a)

SOLUTION
Call:
lm(formula = y ~ x1 + x2)

Residuals:
1 2 3 4 5 6 7 8
-45.718 4.871 -2.461 -12.285 15.282 17.024 13.715 9.573
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -43.4558 331.1189 -0.131 0.9007
x1 -0.1417 3.4741 -0.041 0.9690
x2 13.6828 6.2329 2.195 0.0796 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 24.8 on 5 degrees of freedom


Multiple R-squared: 0.4927, Adjusted R-squared: 0.2897
F-statistic: 2.428 on 2 and 5 DF, p-value: 0.1833

MODEL: y = -43.4558 - 0.1417x1 + 13.6828x2
When there is interaction among the
independent variables.
The R-Code command is:
> x1=c(numeric values)
> x2=c(numeric values)
>y=c(numeric values)
>b=lm(y~x1+x2+x1*x2)
> summary(b)

EXAMPLE
The electric power consumed each month by a chemical plant is thought
to be related to the average ambient temperature (x1) and tons of
product produced (x2). There is an interaction between ambient
temperature and product produced. The past 8 months' historical data are
presented below:

x1 91 90 88 87 91 94 87 86
x2 25 21 24 25 25 26 25 25
y 240 236 270 274 301 316 300 296

Fit a multiple regression.


When there is interaction among the independent variables.

> x1=c(91,90,88,87,91,94,87,86)
> x2=c(25,21,24,25,25,26,25,25)
>y=c(240,236,270,274,301,316,300,296)
>b=lm(y~x1+x2+x1*x2)
> summary(b)

OUTPUT
Call:
lm(formula = y ~ x1 + x2 + x1 * x2)

Residuals:
1 2 3 4 5 6 7 8
-36.003 5.462 -14.565 -10.977 24.997 7.283 15.023 8.780
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15661.115 19353.552 0.809 0.464
x1 -174.234 214.539 -0.812 0.462
x2 -607.238 765.100 -0.794 0.472
x1:x2 6.880 8.477 0.812 0.463

Residual standard error: 25.69 on 4 degrees of freedom


Multiple R-squared: 0.5644, Adjusted R-squared: 0.2377
F-statistic: 1.727 on 3 and 4 DF, p-value: 0.299

MODEL: y = 15661.115 - 174.234x1 - 607.238x2 + 6.880x1x2
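A note on the formula: in R, x1*x2 already expands to x1 + x2 + x1:x2, so writing y~x1+x2+x1*x2 as on the previous slide and the shorter y~x1*x2 fit the same model, because duplicate terms are dropped. A quick check with the same data:

```r
x1 <- c(91, 90, 88, 87, 91, 94, 87, 86)
x2 <- c(25, 21, 24, 25, 25, 26, 25, 25)
y  <- c(240, 236, 270, 274, 301, 316, 300, 296)

b1 <- lm(y ~ x1 + x2 + x1:x2)  # main effects plus interaction, written out
b2 <- lm(y ~ x1 * x2)          # shorthand: '*' expands to the same terms

all.equal(coef(b1), coef(b2))  # TRUE
```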


EXPONENTIAL REGRESSION
Example
The data shows the cooling temperatures of a freshly brewed cup of coffee
after it is poured from the brewing pot into a serving cup. The brewing pot
temperature is approximately 180°F.

Time 0 5 8 11 15 18 22 25 30 34 38 42 45 50
Temp 179.5 168.7 158.1 149.2 141.7 134.6 125.4 123.5 116.3 113.2 109.1 105.7 102.2 100.5

Find the law in the form T = ab^t.

R-Code
>time=c(0,5,8,11,15,18,22,25,30,34,38,42,45,50)
>temp=c(179.5,168.7,158.1,149.2,141.7,134.6,125.4,123.5,116.3,113.2,
109.1,105.7,102.2,100.5)
>a=lm(log(temp)~time)
>summary(a)

SOLUTION
Call:
lm(formula = log(temp) ~ time)

Residuals:
Min 1Q Median 3Q Max
-0.052753 -0.025261 -0.005929 0.014306 0.056930

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.1443601 0.0172782 297.74 < 2e-16 ***
time -0.0118227 0.0005988 -19.74 1.62e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03415 on 12 degrees of freedom
Multiple R-squared: 0.9701, Adjusted R-squared: 0.9676
F-statistic: 389.8 on 1 and 12 DF, p-value: 1.621e-10

The fitted line is log(T) = 5.1444 - 0.011823t, so a = exp(5.1444) ≈ 171.46 and b = exp(-0.011823) ≈ 0.9882.

MODEL: T = 171.46(0.9882)^t
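Because lm() was fitted to log(temp), the constants of T = ab^t come from exponentiating the coefficients (log() is the natural log, so exp() undoes it). A sketch with the same data:

```r
time <- c(0, 5, 8, 11, 15, 18, 22, 25, 30, 34, 38, 42, 45, 50)
temp <- c(179.5, 168.7, 158.1, 149.2, 141.7, 134.6, 125.4,
          123.5, 116.3, 113.2, 109.1, 105.7, 102.2, 100.5)

fit <- lm(log(temp) ~ time)
a <- exp(coef(fit)[1])  # intercept on the log scale -> multiplier a
b <- exp(coef(fit)[2])  # slope on the log scale -> base b
c(a = unname(a), b = unname(b))  # a is roughly 171.5, b just under 1
```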
Example 2
The periods and mean distances of some of the planets are given below:

Period P (days) 87.97 224.7 365.3 687.0 4333 10760
Mean distance s (millions of km) 58 108 150 228 778 1426

Find a law in the form P = ks^n.

R-Code
>p=c(87.97,224.7,365.3,687.0,4333,10760)
>s=c(58,108,150,228,778,1426)
>a=lm(log(p)~log(s))
>summary(a)

Call:
lm(formula = log(p) ~ log(s))

Residuals:
1 2 3 4 5 6
-9.492e-04 3.871e-03 -3.153e-03 1.154e-04 -9.961e-05 2.155e-04
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.6154480 0.0052541 -307.5 6.71e-10 ***
log(s) 1.5006720 0.0009335 1607.5 8.99e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.002544 on 4 degrees of freedom


Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 2.584e+06 on 1 and 4 DF, p-value: 8.986e-13

The fitted line is log(P) = -1.61545 + 1.50067 log(s), so k = exp(-1.61545) ≈ 0.1988 and n ≈ 1.5007 (close to 3/2, as Kepler's third law predicts).

MODEL: P = 0.1988 s^1.5007
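For a power law only the intercept needs back-transforming: on the log-log scale the slope is the exponent n itself, while k = exp(intercept). A sketch with the same data:

```r
p <- c(87.97, 224.7, 365.3, 687.0, 4333, 10760)
s <- c(58, 108, 150, 228, 778, 1426)

fit <- lm(log(p) ~ log(s))
k <- exp(coef(fit)[1])  # exp of intercept -> constant k, about 0.199
n <- coef(fit)[2]       # slope is the exponent n, about 1.5
```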


Example 3
The width of successive whorls of a shell of Turbo duplicatus have been
measured:

Position of whorl n 1 2 3 4 5 6 7 8
Width of whorl w (cm) 3.33 2.84 2.39 2.03 1.70 1.45 1.22 1.04

Find the law in the form w = ab^n.

R-Code
>n=c(1,2,3,4,5,6,7,8)
>w=c(3.33,2.84,2.39,2.03,1.70,1.45,1.22,1.04)
>r=lm(log(w)~n)
>summary(r)

Call:
lm(formula = log(w) ~ n)

Residuals:
Min 1Q Median 3Q Max
-0.0065511 -0.0033214 0.0006323 0.0036527 0.0049239
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.3733474 0.0035109 391.2 1.88e-14 ***
n -0.1672336 0.0006953 -240.5 3.48e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.004506 on 6 degrees of freedom


Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
F-statistic: 5.786e+04 on 1 and 6 DF, p-value: 3.484e-13
The fitted line is log(w) = 1.37335 - 0.167234n, so a = exp(1.37335) ≈ 3.948 and b = exp(-0.167234) ≈ 0.846.

MODEL: w = 3.948(0.846)^n
