Simple Linear Regression
Excel
Girth Yield
50 480
45 375
62 500
78 650
55 440
40 400
52 468
57 513
45 408
66 540
RStudio
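# Read the Girth/Yield table copied from Excel (first row holds the column names)
Reg_data=read.table("clipboard",header=TRUE)
# Quick look at the raw data (first column, Girth, goes on the x-axis)
plot(Reg_data)
# Scatter plot of Girth against Yield; Girth is on the y-axis, so the 40-80 range is a ylim
plot(Girth~Yield,data=Reg_data,xlab="Yield",ylab="Girth",ylim=c(40,80))
# Fit the simple linear regression of Girth on Yield
lm1.fit=lm(Girth~Yield,data=Reg_data)
lm1.fit
# Add the fitted line while the scatter plot is still the active plot
abline(lm1.fit,col="Blue")
# ANOVA table for the regression
summary(aov(lm1.fit))
# Diagnostic plots in a 2x2 grid
par(mfrow=c(2,2))
plot(lm1.fit)
# Fitted values with 95% prediction intervals
pred=predict(lm1.fit,interval = "predict")
pred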
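Reading from "clipboard" depends on the platform and on what was last copied, so the script above is hard to rerun later. A minimal sketch of one alternative, assuming the ten rows of the Excel table are typed in exactly as listed, is to build the data frame inside the script:

```r
# Sketch: enter the Excel table directly so the script does not depend on the clipboard.
# Values are copied from the Girth/Yield table above.
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)

str(Reg_data)                                   # 10 observations of 2 numeric variables
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)   # same model as in the script
coef(lm1.fit)                                   # intercept and slope of the fitted line
```

With the data defined this way, the rest of the script should run unchanged.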
RStudio Results
> Reg_data=read.table("clipboard",header=1)
> plot(Girth~Yield,data=Reg_data,xlab="Yield",ylab="Girth",xlim=c(40,80))
> plot(Reg_data)
>
> lm1.fit=lm(Girth~Yield,data=Reg_data)
> lm1.fit
Call:
lm(formula = Girth ~ Yield, data = Reg_data)
Coefficients:
(Intercept)        Yield
    -8.7523       0.1335
>
> summary(aov(lm1.fit))
            Df Sum Sq Mean Sq F value   Pr(>F)
Yield        1 1039.2  1039.2   67.71 3.56e-05 ***
Residuals    8  122.8    15.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> par(mfrow=c(2,2))
> plot(lm1.fit)
>
> pred=predict(lm1.fit,interval = "predict")
Warning message:
In predict.lm(lm1.fit, interval = "predict") :
predictions on current data refer to _future_ responses
> pred
        fit      lwr      upr
1  55.34721 45.87153 64.82288
2  41.32544 31.10463 51.54625
3  58.01802 48.50517 67.53087
4  78.04911 66.58164 89.51659
5  50.00558 40.42758 59.58358
6  44.66396 34.75591 54.57200
7  53.74472 44.26301 63.22642
8  59.75405 50.18566 69.32243
9  45.73228 35.90759 55.55697
10 63.35964 53.59914 73.12015
>
> abline(lm1.fit,col="Blue")
>
> ggplot(Reg_data, aes(x = Yield, y = Girth)) +
+ geom_point() +
+ geom_smooth(method = "lm", se = FALSE, color = "red") +
+ labs(x = "Yield", y = "Girth", title = "Scatter Plot with Regression Line")
`geom_smooth()` using formula = 'y ~ x'
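The warning from predict() in the transcript appears because no new data were supplied: the intervals are for future responses at the ten observed Yield values. A hedged sketch of the more usual call follows. The yields 450 and 550 are hypothetical values chosen only for illustration, and the data frame is re-entered so the block runs on its own.

```r
# Data re-entered from the Excel table so this block is self-contained
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)

# Hypothetical new yields at which to predict girth (not part of the original data)
new_yield <- data.frame(Yield = c(450, 550))

# Prediction intervals: where a single future Girth value is expected to fall
pred_int <- predict(lm1.fit, newdata = new_yield, interval = "prediction")
pred_int

# Confidence intervals: uncertainty in the mean Girth at the same yields (narrower)
conf_int <- predict(lm1.fit, newdata = new_yield, interval = "confidence")
conf_int
```

Supplying newdata makes it explicit which x values the intervals refer to, and comparing the two interval types shows how much of the width comes from observation-level scatter.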
Interpretation
[Figure: diagnostic plots from plot(lm1.fit), including Residuals vs Fitted, Q-Q Residuals, and the standardized-residual panels with Cook's distance; observations 1, 5 and 6 are labelled.]
According to the residual analysis, there is no reason to reject the model: the diagnostic plots show no clear
violation of the underlying assumptions.
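The visual check can be complemented with the numbers behind the plots. The sketch below is one possible way to tabulate them. The cut-offs of 2 for standardized residuals and 1 for Cook's distance are common rules of thumb assumed here, not values taken from the original analysis.

```r
# Data re-entered from the Excel table so this block is self-contained
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)

# One row per observation: the quantities shown in the diagnostic plots
diag_tab <- data.frame(
  fitted    = fitted(lm1.fit),          # predicted Girth
  residual  = resid(lm1.fit),           # observed minus fitted
  std_resid = rstandard(lm1.fit),       # standardized residuals
  cooks_d   = cooks.distance(lm1.fit)   # influence of each observation
)
round(diag_tab, 3)

# Rule-of-thumb checks (assumed thresholds, conventions rather than formal tests)
which(abs(diag_tab$std_resid) > 2)   # unusually large residuals
which(diag_tab$cooks_d > 1)          # highly influential points
```

If both which() calls return no indices, that would agree with the conclusion drawn from the plots.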
> summary(aov(lm1.fit))
            Df Sum Sq Mean Sq F value   Pr(>F)
Yield        1 1039.2  1039.2   67.71 3.56e-05 ***
Residuals    8  122.8    15.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
According to these results, the p-value for Yield (3.56e-05) is less than 0.05; therefore Yield is a significant
predictor of Girth at the 5% level. The ANOVA table does not test the intercept.
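The ANOVA table reports only the F test for Yield. To see a test of the intercept as well, one option is the coefficient table from summary() of the fitted model. The sketch below re-enters the data so it runs on its own. It also checks the standard identity that, in simple linear regression, the squared t statistic of the slope equals the F statistic.

```r
# Data re-entered from the Excel table so this block is self-contained
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)

fit_sum <- summary(lm1.fit)

# Estimate, standard error, t value and p-value for the intercept and for Yield
fit_sum$coefficients

# Proportion of variation in Girth explained by Yield;
# from the ANOVA table this is 1039.2 / (1039.2 + 122.8), about 0.894
fit_sum$r.squared

# With one predictor, the squared t statistic of the slope equals the ANOVA F statistic
t_yield <- fit_sum$coefficients["Yield", "t value"]
f_stat  <- fit_sum$fstatistic[["value"]]
all.equal(t_yield^2, f_stat)
```

The slope's p-value in this table should therefore match the Pr(>F) value of 3.56e-05 in the ANOVA output.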