Linear Regression in R - R Tutorial
This article explains how to run linear regression in R. It covers the assumptions of linear regression and how to treat violations of them, then walks through fitting the model and calculating performance metrics to evaluate it. Linear regression is one of the most popular statistical techniques. It has been in use for more than three decades and is widely accepted in almost every domain because its output is easy to understand.
Linear Regression
Regression Equation
Interpretation:
Algorithm
4. No Outlier Problem
Standardized Coefficients
1. R-squared
R-Squared Formula
Rule :
2. Adjusted R-squared
Adjusted R-Squared
RMSE Calculation
Important Point
RMSE vs MAE
1. Data Preparation
2. Testing of Multicollinearity
3. Treatment of Multicollinearity
4. Checking for Autocorrelation
5. Checking for Outliers
6. Checking for Heteroscedasticity
7. Testing of Normality of Residuals
8. Forward, Backward and Stepwise Selection
9. Calculating RMSE
10. Box Cox Transformation of Dependent Variable
11. Calculating R-Squared and Adj. R-squared manually
12. Calculating Residual and Predicted values
13. Calculating Standardized Coefficient
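The outline above names several formulas whose bodies are not reproduced here. For reference, these are the standard textbook definitions, not necessarily the exact notation the article used. The symbols are assumptions: n observations, p predictors (excluding the intercept), observed values y_i, fitted values \hat{y}_i and sample mean \bar{y}.

```latex
\begin{align*}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \\
\text{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
\text{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\end{align*}
```

Because RMSE squares the errors before averaging, it penalizes large errors more heavily than MAE does, and RMSE is never smaller than MAE on the same data.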
library(ggplot2)
library(car)
library(caret)
library(corrplot)
Read Data
We will use the mtcars dataset from the datasets package, which ships with R. The data
was extracted from the 1974 Motor Trend US magazine and
comprises fuel consumption and 10 aspects of automobile
design and performance for 32 automobiles.
#Loading data
data(mtcars)
# Looking at variables
str(mtcars)
Variable Description
hp Gross horsepower
vs V/S
Summarize Data
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> summary(mtcars)
      mpg             cyl             disp             hp
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
 Median :19.20   Median :6.000   Median :196.3   Median :123.0
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
      drat             wt             qsec             vs
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
       am              gear            carb
 Min.   :0.0000   Min.   :3.000   Min.   :1.000
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
 Median :0.0000   Median :4.000   Median :2.000
 Mean   :0.4062   Mean   :3.688   Mean   :2.812
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
 Max.   :1.0000   Max.   :5.000   Max.   :8.000
Data Preparation
mtcars$am = as.factor(mtcars$am)
mtcars$cyl = as.factor(mtcars$cyl)
mtcars$vs = as.factor(mtcars$vs)
mtcars$gear = as.factor(mtcars$gear)
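#Calculating Correlation between the numeric variables
numericData <- mtcars[sapply(mtcars, is.numeric)]
descrCor <- cor(numericData)
#Dimensions of dat3 (the data left after dropping highly correlated variables)
> dim(dat3)
[1] 32 8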
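The step that produces dat3 is not shown in the text. A minimal sketch of one way to arrive at a 32 x 8 data set follows. Two things here are assumptions rather than anything the article states: the 0.7 correlation cutoff, and the choice to drop disp, hp and wt (inferred only because those three columns are absent from the model output further down). The sketch uses base R only.

```r
# Sketch only: one possible route to a 32 x 8 data set like dat3.
data(mtcars)
for (v in c("cyl", "vs", "am", "gear")) mtcars[[v]] <- as.factor(mtcars[[v]])

# Correlation matrix of the numeric columns
numericData <- mtcars[sapply(mtcars, is.numeric)]
descrCor <- cor(numericData)

# List predictor pairs whose absolute correlation exceeds a cutoff.
# The cutoff of 0.7 is an assumption for illustration.
predCor <- descrCor[rownames(descrCor) != "mpg", colnames(descrCor) != "mpg"]
high <- which(abs(predCor) > 0.7 & upper.tri(predCor), arr.ind = TRUE)
highPairs <- data.frame(
  var1 = rownames(predCor)[high[, "row"]],
  var2 = colnames(predCor)[high[, "col"]],
  r    = predCor[high]
)
print(highPairs)

# Assumed choice: drop disp, hp and wt, the columns missing from the later model.
dropCols <- c("disp", "hp", "wt")
dat3 <- mtcars[, !(colnames(mtcars) %in% dropCols)]
dim(dat3)
```

Which member of a correlated pair to drop is a judgment call; automated helpers exist for this, but the manual route keeps the decision visible.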
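#Fitting the linear regression model on all remaining variables
fit = lm(mpg ~ ., data = dat3)
summary(fit)
#Extracting Coefficients
summary(fit)$coeff
anova(fit)
#Diagnostic plots in a 2 x 2 grid
par(mfrow=c(2,2))
plot(fit)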
> summary(fit)
Call:
lm(formula = mpg ~ ., data = dat3)
Residuals:
Min 1Q Median 3Q Max
-5.4850 -1.3058 0.1856 1.5278 5.2439
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.7823 19.6148 0.856 0.401
cyl6 -1.8025 2.6085 -0.691 0.497
cyl8 -3.5873 4.0324 -0.890 0.383
drat 1.4283 2.1997 0.649 0.523
qsec 0.1003 0.7729 0.130 0.898
vs1 0.7068 2.3291 0.303 0.764
am1 3.2396 2.4702 1.311 0.203
gear4 1.3869 3.0466 0.455 0.653
gear5 2.3776 3.4334 0.692 0.496
carb -1.4836 0.6305 -2.353 0.028 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(fit)
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
cyl 2 824.78 412.39 45.7033 1.464e-08 ***
drat 1 14.45 14.45 1.6017 0.21890
qsec 1 2.83 2.83 0.3137 0.58108
vs 1 1.02 1.02 0.1132 0.73969
am 1 26.35 26.35 2.9198 0.10157
gear 2 8.15 4.07 0.4513 0.64254
carb 1 49.96 49.96 5.5363 0.02798 *
Residuals 22 198.51 9.02
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1. Residuals vs Fitted
2. Normal Q-Q
3. Scale-Location
4. Residuals vs Leverage
Residuals and Normal Q-Q Plot
[1] 0.8237094
[1] 0.7515905
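The two values printed above agree with what the ANOVA table implies for R-squared and adjusted R-squared: a residual sum of squares of 198.51 against a total of about 1126.05. A minimal sketch of the manual calculation follows, assuming dat3 is mtcars without disp, hp and wt (an inference from the coefficient list, not something the text states).

```r
# Rebuild the assumed dat3 and the full model
data(mtcars)
dat3 <- mtcars[, !(names(mtcars) %in% c("disp", "hp", "wt"))]
for (v in c("cyl", "vs", "am", "gear")) dat3[[v]] <- as.factor(dat3[[v]])
fit <- lm(mpg ~ ., data = dat3)

n <- nrow(dat3)                 # number of observations
p <- length(coef(fit)) - 1      # number of estimated slopes (intercept excluded)

ss_res <- sum(residuals(fit)^2)                  # residual sum of squares
ss_tot <- sum((dat3$mpg - mean(dat3$mpg))^2)     # total sum of squares

r2     <- 1 - ss_res / ss_tot
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(r2)
print(adj_r2)
```

The same numbers are available directly as summary(fit)$r.squared and summary(fit)$adj.r.squared; computing them by hand is mainly useful as a check.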
AIC(fit)
[1] 171.2156
BIC(fit)
[1] 187.3387
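The code that creates stepBIC is not shown. A minimal sketch of backward selection with a BIC penalty follows, using step() from base R's stats package with k = log(n). This is an assumption about the approach: the article may have used a different function (MASS::stepAIC accepts the same k argument), and this sketch is not guaranteed to land on the same model as the summary that follows. The construction of dat3 is also assumed, as noted in the comment.

```r
# Assumed dat3: mtcars without disp, hp and wt, with four columns as factors
data(mtcars)
dat3 <- mtcars[, !(names(mtcars) %in% c("disp", "hp", "wt"))]
for (v in c("cyl", "vs", "am", "gear")) dat3[[v]] <- as.factor(dat3[[v]])
fit <- lm(mpg ~ ., data = dat3)

# Backward elimination; k = log(n) turns the AIC penalty into the BIC penalty
n <- nrow(dat3)
stepBIC <- step(fit, direction = "backward", k = log(n), trace = 0)

formula(stepBIC)   # the model that survived selection
BIC(fit)           # BIC of the full model
BIC(stepBIC)       # BIC of the selected model
```

Setting k = 2 (the default) would select on AIC instead, which penalizes extra terms less and tends to keep larger models.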
> summary(stepBIC)
Call:
lm(formula = mpg ~ vs + am + carb, data = dat3)
Residuals:
Min 1Q Median 3Q Max
-6.2803 -1.2308 0.4078 2.0519 4.8197
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  19.5174     1.6091  12.130 1.16e-12 ***
vs1           4.1957     1.3246   3.168  0.00370 **
am1           6.7980     1.1015   6.172 1.15e-06 ***
carb         -1.4308     0.4081  -3.506  0.00155 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Standardised coefficients
library(QuantPsyc)
lm.beta(stepBIC)
std.Coeff = data.frame(Standardized.Coeff = stdz.coff(stepBIC))
std.Coeff = cbind(Variable = row.names(std.Coeff), std.Coeff)
row.names(std.Coeff) = NULL
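The helper stdz.coff() is called above but its definition does not appear in the text. A hypothetical version is sketched here, assuming it is meant to rescale each slope by sd(x) / sd(y). It reads the predictors from the model matrix so that factor terms (such as vs1 and am1) are handled through their 0/1 dummy columns, and it assumes the model has an intercept.

```r
# Hypothetical helper: standardized slope = slope * sd(predictor) / sd(response)
stdz.coff <- function(regmodel) {
  b <- coef(regmodel)[-1]                            # slopes, intercept dropped
  X <- model.matrix(regmodel)[, -1, drop = FALSE]    # design matrix without intercept
  y <- model.response(model.frame(regmodel))         # response vector
  b * apply(X, 2, sd) / sd(y)
}

# Usage on the model reported in the summary above (dat3 construction is assumed)
data(mtcars)
dat3 <- mtcars[, !(names(mtcars) %in% c("disp", "hp", "wt"))]
for (v in c("cyl", "vs", "am", "gear")) dat3[[v]] <- as.factor(dat3[[v]])
stepBIC <- lm(mpg ~ vs + am + carb, data = dat3)
stdz.coff(stepBIC)
```

A quick sanity check on the definition: in a regression with a single numeric predictor, the standardized coefficient equals the Pearson correlation between predictor and response.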
vif(stepBIC)
Testing Other Assumptions
#Autocorrelation Test
durbinWatsonTest(stepBIC)
#See Residuals
resid = residuals(stepBIC)
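The outline lists a normality check on the residuals, but no code for it survives here. One common option is the Shapiro-Wilk test from base R, sketched below. The choice of test is an assumption, as is the construction of dat3; the model formula is the one reported in the stepBIC summary.

```r
# Assumed dat3 and the final model as reported in the summary output
data(mtcars)
dat3 <- mtcars[, !(names(mtcars) %in% c("disp", "hp", "wt"))]
for (v in c("cyl", "vs", "am", "gear")) dat3[[v]] <- as.factor(dat3[[v]])
stepBIC <- lm(mpg ~ vs + am + carb, data = dat3)

# Residuals of the final model
resid <- residuals(stepBIC)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal,
# so a small p-value is evidence against normality
sw <- shapiro.test(resid)
print(sw)
```

With only 32 observations the test has limited power, so it is worth reading it alongside the Normal Q-Q plot rather than on its own.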
#Relative Importance
install.packages("relaimpo")
library(relaimpo)
calc.relimp(stepBIC)
                   mpg     pred
Mazda RX4         21.0 20.59222
Mazda RX4 Wag     21.0 20.59222
Datsun 710        22.8 29.08031
Hornet 4 Drive    21.4 22.28235
Hornet Sportabout 18.7 16.65583
Valiant           18.1 22.28235
#Predicted values from the final model
pred = predict(stepBIC, dat3)
#Calculating RMSE
rmse = sqrt(mean((dat3$mpg - pred)^2))
print(rmse)
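The outline also mentions RMSE vs MAE. A short sketch that computes both on the training data follows, assuming the final model is the mpg ~ vs + am + carb fit reported above and that dat3 is built as in the comment.

```r
# Assumed dat3 and the final model as reported in the summary output
data(mtcars)
dat3 <- mtcars[, !(names(mtcars) %in% c("disp", "hp", "wt"))]
for (v in c("cyl", "vs", "am", "gear")) dat3[[v]] <- as.factor(dat3[[v]])
stepBIC <- lm(mpg ~ vs + am + carb, data = dat3)

pred <- predict(stepBIC, dat3)
err  <- dat3$mpg - pred

rmse <- sqrt(mean(err^2))   # squares errors first, so large misses weigh more
mae  <- mean(abs(err))      # treats every unit of error equally

print(c(RMSE = rmse, MAE = mae))
```

Both metrics here are in-sample, so they are optimistic estimates of how the model would do on new cars.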
K-fold cross-validation
library(DAAG)
kfold = cv.lm(data=dat3, stepBIC, m=5)
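To make the mechanics of k-fold cross-validation visible, here is a minimal base-R sketch of the same idea. It is not what cv.lm does internally. The seed, the random fold assignment and the use of RMSE as the per-fold metric are all assumptions for illustration, as is the construction of dat3.

```r
# Assumed dat3: mtcars without disp, hp and wt, with four columns as factors
data(mtcars)
dat3 <- mtcars[, !(names(mtcars) %in% c("disp", "hp", "wt"))]
for (v in c("cyl", "vs", "am", "gear")) dat3[[v]] <- as.factor(dat3[[v]])

set.seed(1)   # arbitrary seed so the fold assignment is reproducible
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(dat3)))   # random fold labels 1..k

fold_rmse <- numeric(k)
for (i in seq_len(k)) {
  train   <- dat3[folds != i, ]    # fit on k - 1 folds
  holdout <- dat3[folds == i, ]    # evaluate on the held-out fold
  m <- lm(mpg ~ vs + am + carb, data = train)
  p <- predict(m, newdata = holdout)
  fold_rmse[i] <- sqrt(mean((holdout$mpg - p)^2))
}

cv_rmse <- mean(fold_rmse)   # average out-of-fold RMSE
print(fold_rmse)
print(cv_rmse)
```

Note that this refits a fixed formula in each fold. Repeating the variable selection inside every fold would give a more honest estimate, at the cost of more code.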