R Lab 4

Uploaded by

sdcphdwork


Simple Linear Regression and Curvilinear Regression

Lab 4 R Notes: EXST 7014/15

Contents
0.1 Objectives
0.2 Lab Setup
0.3 The Data
0.4 Fitting the Simple Linear Regression Model
    0.4.1 Probing the lm function
    0.4.2 Assessing Homogeneity of Variance and Normality Assumptions
0.5 Fitting the Exponential Model
0.6 Lab Assignment
    0.6.1 Question 1
    0.6.2 Question 2
    0.6.3 Question 3
    0.6.4 Question 4

0.1 Objectives
1. Use the lm function to fit a simple linear model

2. Check the assumptions of homogeneous variance and normality

3. Use the lm function to fit curvilinear models

Simple linear regression (SLR) is a common analysis procedure used to describe the relationship
between two variables in such a manner that one variable can be predicted or explained using
information from the other. In the previous labs, we learned how to evaluate the SLR model comprehensively
by interpreting the ANOVA table, R², parameter estimates, residual plot, normality test, and diagnostic
statistics.
However, many systems encountered in research and practice exhibit curvilinear rather than simple
linear relationships. Luckily, many curvilinear relationships can be re-expressed as linear relationships
through a transformation.
As you may have noticed in the last lab, heterogeneity of variance is a common violation of the assumptions
of linear regression, which require constant variability about the regression line (homogeneity of variance).
If the variability increases as the predicted value increases, a transformation of the response can help. Among
the choices are the log, square root, and reciprocal transformations. Usually the need for one of these
transformations is determined by examining the residual plot: a fan-shaped residual plot indicates
heterogeneity of variance. The log transformation is the one most commonly used to alleviate heterogeneity
of variance. Using a log transformation implies that the underlying relationship is exponential. If
the transformation works and the underlying relationship is exponential, then the regression model should
improve, and the residual plot should look more oval than fan-shaped.
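The effect described above can be illustrated with a small simulation. This is only a sketch on made-up data, not the lab data set: the sample size, coefficients, and error standard deviation are assumptions chosen for illustration, and spread_ratio is a hypothetical helper, not a function from any package.

```r
# Simulated data (hypothetical values; not the lab data set)
set.seed(1)
x <- runif(200, min = 20, max = 120)

# Exponential relationship with multiplicative error:
# y = exp(b0 + b1 * x + e), so log(y) = b0 + b1 * x + e
y <- exp(3.5 + 0.02 * x + rnorm(200, sd = 0.4))

fit_raw <- lm(y ~ x)       # straight line on the original scale
fit_log <- lm(log(y) ~ x)  # straight line on the log scale

# Hypothetical helper: ratio of the residual spread in the upper half of
# the fitted values to the spread in the lower half.
# A ratio well above 1 suggests a fan-shaped residual plot.
spread_ratio <- function(fit) {
  upper <- fitted(fit) > median(fitted(fit))
  sd(resid(fit)[upper]) / sd(resid(fit)[!upper])
}

spread_ratio(fit_raw)  # spread grows with the fitted value
spread_ratio(fit_log)  # spread is roughly constant
```

On data generated this way, the ratio for the untransformed fit should be clearly larger than the ratio for the log fit, which is the numerical counterpart of a fan shape turning into an oval.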

0.2 Lab Setup

Run the following code to both install and load the required packages.
install.packages('olsrr') # install the package that runs residual plots and check assumptions
library(olsrr) # Load the package

0.3 The Data
The dataset is from Chapter 8, Problem 10 in your textbook. We are trying to estimate the survival of liver
transplant patients using information on the patients collected before the operation. The variables are:
• CLOT: a measure of the clotting potential of the patient’s blood;
• PROG: a subjective index of the patient’s prospect of recovery;
• ENZ: a measure of a protein present in the body;
• LIV: a measure relating to white blood cell count;
• TIME (the response): a measure of the survival time of the patient.
In this lab we will use TIME as the dependent variable and ENZ as the independent variable. The data
were originally available at
http://statweb.lsu.edu/EXSTWeb/StatLab/DataSets/EXST7015/FW&M%20Data%202010/TEXT/DATATAB_8_31.TXT
#' The data link above is no longer available,
#' so download the data_lab4.txt file to your working directory
#' Create an object to hold the data set
#'
#' @sep="" because the columns are separated by whitespace
#'

patients <- read.table('data_lab4.txt', header = TRUE, sep = "")

str(patients) # get a structure (description) of your dataset

## 'data.frame': 54 obs. of 6 variables:

## $ obs : int 1 2 3 4 5 6 7 8 9 10 ...
## $ clot: num 3.7 8.7 6.7 6.7 3.2 5.2 3.6 5.8 5.7 6 ...
## $ prog: int 51 45 51 26 64 54 28 38 46 85 ...
## $ enz : int 41 23 43 68 65 56 99 72 63 28 ...
## $ liv : num 1.55 2.52 1.86 2.1 0.74 2.71 1.3 1.42 1.91 2.98 ...
## $ time: int 34 58 65 70 71 72 75 80 80 87 ...
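Before fitting a model it can help to look at a quick numerical summary and to check for missing values. The sketch below does not read data_lab4.txt; it rebuilds only the first ten rows shown in the str() output above, under the hypothetical name patients10, so that it runs on its own. With the full file loaded, the same calls would be applied to patients.

```r
# First ten rows only, copied from the str() output above
# (the full data set has 54 observations)
patients10 <- data.frame(
  obs  = 1:10,
  clot = c(3.7, 8.7, 6.7, 6.7, 3.2, 5.2, 3.6, 5.8, 5.7, 6),
  prog = c(51, 45, 51, 26, 64, 54, 28, 38, 46, 85),
  enz  = c(41, 23, 43, 68, 65, 56, 99, 72, 63, 28),
  liv  = c(1.55, 2.52, 1.86, 2.1, 0.74, 2.71, 1.3, 1.42, 1.91, 2.98),
  time = c(34, 58, 65, 70, 71, 72, 75, 80, 80, 87)
)

summary(patients10[, c("enz", "time")])  # quartiles and mean of X and Y
colSums(is.na(patients10))               # number of missing values per column
```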

0.4 Fitting the Simple Linear Regression Model

$TIME = \beta_0 + \beta_1 \, ENZ + \varepsilon$, where $TIME$ is the response $Y$, $ENZ$ is the predictor $X$, and
$\varepsilon$ is a random error term that is normally distributed with mean 0 and unknown variance $\sigma^2$.
$\beta_0$ is the $Y$-intercept and $\beta_1$ is the slope coefficient; both are estimated from the data.
#' Create an object called lm_patients (it can be any name)
#' to hold the fitted model
#'
#'
#' Specify the model, time = B0 + B1 * enz, using the lm function

lm_patients <- lm(time ~ enz, data = patients)
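To see what lm computes for a simple linear regression, the least-squares estimates can be reproduced by hand. This is a minimal sketch that uses only the first ten enz and time values shown in the str() output above (not the full 54 rows), so the numbers will differ from those of lm_patients; the names b0, b1, and fit10 are chosen here for illustration.

```r
# First ten observations only, copied from the str() output above
enz  <- c(41, 23, 43, 68, 65, 56, 99, 72, 63, 28)
time <- c(34, 58, 65, 70, 71, 72, 75, 80, 80, 87)

# Least-squares estimates from the usual formulas:
# slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)
b1 <- sum((enz - mean(enz)) * (time - mean(time))) / sum((enz - mean(enz))^2)
b0 <- mean(time) - b1 * mean(enz)

fit10 <- lm(time ~ enz)

c(b0 = b0, b1 = b1)  # hand-computed estimates
coef(fit10)          # estimates from lm; the two should agree
```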

The figure below shows the fitted model together with the observations. The blue line segments represent the
residuals (the vertical distance between each observation and its fitted value). The red line is the fitted model.

[Figure: Fitted Model of Survival Time vs Enzyme (Blood Protein). x-axis: enz; y-axis: time.]
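The notes do not show the code behind this figure. One way to draw a comparable plot with base R is sketched below; it is an assumption about how such a figure could be made, not the original plotting code, and it uses only the first ten observations from the str() output so that it runs without the data file.

```r
# First ten observations only, copied from the str() output above
enz  <- c(41, 23, 43, 68, 65, 56, 99, 72, 63, 28)
time <- c(34, 58, 65, 70, 71, 72, 75, 80, 80, 87)

fit10 <- lm(time ~ enz)

# Scatterplot of the observations
plot(enz, time, main = "Fitted model of time vs enz (first ten rows)")

# Red line: the fitted regression line
abline(fit10, col = "red")

# Blue segments: residuals, drawn from each observation to its fitted value
segments(x0 = enz, y0 = time, x1 = enz, y1 = fitted(fit10), col = "blue")
```

With the full data set loaded, replacing fit10 with lm_patients and the two vectors with patients$enz and patients$time would give the corresponding plot for all 54 patients.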

0.4.1 Probing the lm function

The model created above contains a lot of information. The components of the object that holds the lm fit can be
extracted by name; the available names are listed with:
names(lm_patients) # lists the names of the components of the lm object

## [1] "coefficients" "residuals" "effects" "rank"

## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
For example, a component of the lm object can be extracted by name:
lm_patients$coefficients # extracts the coefficients (parameter estimates) of the model

## (Intercept) enz
## -108.71614 3.96678
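The estimated coefficients define the estimated regression function, which can be used to predict survival time for a given enzyme value. The sketch below plugs in the two coefficients reported above; the enzyme value of 50 is a hypothetical input chosen for illustration.

```r
# Coefficients as reported in the output above
b0 <- -108.71614  # intercept
b1 <- 3.96678     # slope for enz

enz_new  <- 50                 # hypothetical enzyme value
time_hat <- b0 + b1 * enz_new  # predicted survival time
time_hat
```

With the model object available, predict(lm_patients, newdata = data.frame(enz = 50)) should give the same value up to rounding of the printed coefficients.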
Certain methods specific to lm objects can also be applied to the model object (lm_patients):
methods(class = class(lm_patients))[1:10] # lists the first 10 methods

## [1] "add1.lm" "alias.lm"

## [3] "anova.lm" "case.names.lm"
## [5] "coerce,oldClass,S3-method" "confint.lm"
## [7] "cooks.distance.lm" "deviance.lm"
## [9] "dfbeta.lm" "dfbetas.lm"
An example is:

confint(lm_patients) # produces 95% CI of parameter estimates

## 2.5 % 97.5 %
## (Intercept) -232.564499 15.132220
## enz 2.417402 5.516158
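The interval for each parameter is the estimate plus or minus a t critical value times its standard error, with n - 2 degrees of freedom in simple linear regression. The sketch below reproduces confint by hand on the first ten observations from the str() output (not the full data, so the numbers differ from those above); manual_ci is a name chosen here for illustration.

```r
# First ten observations only, copied from the str() output above
enz  <- c(41, 23, 43, 68, 65, 56, 99, 72, 63, 28)
time <- c(34, 58, 65, 70, 71, 72, 75, 80, 80, 87)

fit10 <- lm(time ~ enz)

# Estimates and standard errors from the coefficient table
est <- coef(summary(fit10))[, "Estimate"]
se  <- coef(summary(fit10))[, "Std. Error"]

# t critical value for a 95% interval with n - 2 degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit10))

manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
manual_ci
confint(fit10)  # should match manual_ci
```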
Generic base R functions such as plot(), summary(), and print() can also be applied to the lm object.
Example:
print(lm_patients)

##
## Call:
## lm(formula = time ~ enz, data = patients)
##
## Coefficients:
## (Intercept) enz
## -108.716 3.967
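The object returned by summary() can be subsetted in the same way as the lm object, which is convenient when only R-squared or the F-test is needed. The sketch below again uses only the first ten observations from the str() output, so its values are for illustration and are not the results for the full data.

```r
# First ten observations only, copied from the str() output above
enz  <- c(41, 23, 43, 68, 65, 56, 99, 72, 63, 28)
time <- c(34, 58, 65, 70, 71, 72, 75, 80, 80, 87)

fit10 <- lm(time ~ enz)
s <- summary(fit10)

r2 <- s$r.squared   # multiple R-squared
f  <- s$fstatistic  # named vector: value, numdf, dendf

# p-value of the overall F-test (upper tail of the F distribution)
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

c(r.squared = r2, F = f[["value"]], p.value = p_value)
```

In simple linear regression the F-test and the t-test for the slope are equivalent, so p_value should equal the Pr(>|t|) entry for enz in the coefficient table.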
#' @plot with x and y vectors does not have a data argument,
#' so to avoid using the $ (indexing/subsetting) symbol,
#' @with is used to evaluate the @plot() call
#' inside the patients dataset

with(patients, plot(enz, time)) # this produces a scatterplot of time against enz

[Figure: scatterplot of time (y-axis) against enz (x-axis).]

0.4.2 Assessing Homogeneity of Variance and Normality Assumptions
The residual plot can be used to detect various problems such as non-linearity, non-homogeneous variance,
and outliers. If the variance is homogeneous, the residuals scatter randomly around the zero line (the mean
residual) with roughly constant spread. If patterns such as curvature (non-linearity) or a fan shape
(non-homogeneity of variance) are detected in the residual plot, we may consider a transformation or a more
complicated model.
Checking Homogeneity of Variance
#' The function below is from the olsrr package
#'
#' @ols_plot_resid_fit plots the model residuals against
#' the fitted values of the model;
#' its main argument is the name of the lm object

ols_plot_resid_fit(lm_patients)

#' Alternatively, you can use the plot function from base R
#'
#' Applying plot() to the lm object produces several diagnostic plots;
#' @which= selects a particular plot, in this case
#' the plot of residuals against fitted values
plot(lm_patients, which = 1)
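The same plot can also be built by hand from the residuals and fitted values, which makes it clear what is being displayed. This is a sketch on the first ten observations from the str() output, not on the full data; res and fit are names chosen here for illustration.

```r
# First ten observations only, copied from the str() output above
enz  <- c(41, 23, 43, 68, 65, 56, 99, 72, 63, 28)
time <- c(34, 58, 65, 70, 71, 72, 75, 80, 80, 87)

fit10 <- lm(time ~ enz)
res <- resid(fit10)   # observed minus fitted
fit <- fitted(fit10)  # values on the regression line

# Residuals against fitted values with a reference line at zero
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# For a least-squares fit with an intercept, the residuals average zero
# and are uncorrelated with the fitted values (up to rounding error)
mean(res)
cor(fit, res)
```

Because the mean and the linear correlation are zero by construction, the informative features of the plot are changes in spread (fan shape), curvature, and isolated points.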

Checking Normality of the Residuals

This assumption is assessed by checking the normality of the residuals. The Shapiro-Wilk test is a popular test
for evaluating the normality of data (in this case, the residuals).
- Null Hypothesis: The residuals are normally distributed.
- Decision Rule: Reject the null IF the p-value is LESS THAN the significance level (say, 0.05), and conclude
  that the residuals are NOT normally distributed.
#' @ols_test_normality is also from the olsrr package
#'
#' its main argument is the name of the lm object

ols_test_normality(lm_patients)

#' Alternatively, you can use the shapiro.test function from base R
#' Extract the model residuals with @lm_patients$residuals

shapiro.test(lm_patients$residuals)
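The decision rule above can be applied in code by extracting the p-value from the test result. The sketch below uses the first ten observations from the str() output, so the outcome says nothing about the full data; alpha and decision are names chosen here for illustration.

```r
# First ten observations only, copied from the str() output above
enz  <- c(41, 23, 43, 68, 65, 56, 99, 72, 63, 28)
time <- c(34, 58, 65, 70, 71, 72, 75, 80, 80, 87)

fit10 <- lm(time ~ enz)
sw <- shapiro.test(resid(fit10))

alpha <- 0.05  # significance level

sw$statistic   # the W statistic
sw$p.value     # the p-value

# Apply the decision rule
decision <- if (sw$p.value < alpha) {
  "Reject the null: residuals are not normally distributed"
} else {
  "Fail to reject the null: no evidence against normality"
}
decision
```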

0.5 Fitting the Exponential Model

$\log(TIME) = \beta_0 + \beta_1 \, ENZ + \varepsilon$, where $\log(TIME)$ is the response $Y$, $ENZ$ is the predictor $X$,
and $\varepsilon$ is a random error term that is normally distributed with mean 0 and unknown variance $\sigma^2$.
$\beta_0$ is the $Y$-intercept and $\beta_1$ is the slope coefficient; both are estimated from the data.
#' The model below fits log(Y) against X
#' In R, log() computes the natural log by default
#'
#' Specify the model, log(time) = B0 + B1 * enz, using the lm function

log_patients <- lm(log(time) ~ enz, data = patients)

summary(log_patients)

## Call:
## lm(formula = log(time) ~ enz, data = patients)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.19415 -0.29725 -0.02198 0.34125 1.01853
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.558633 0.245526 14.494 < 2e-16 ***
## enz 0.019727 0.003072 6.423 4.12e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4753 on 52 degrees of freedom
## Multiple R-squared: 0.4424, Adjusted R-squared: 0.4316
## F-statistic: 41.25 on 1 and 52 DF, p-value: 4.118e-08
with(patients, plot(enz, log(time)))
[Figure: scatterplot of log(time) (y-axis) against enz (x-axis).]
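Because the response is on the log scale, the coefficients from summary(log_patients) need to be back-transformed before they can be read in the original units of time. The sketch below uses the two estimates reported in the output above; the enzyme value of 50 is a hypothetical input chosen for illustration.

```r
# Estimates as reported in the summary(log_patients) output above
b0 <- 3.558633  # intercept on the log scale
b1 <- 0.019727  # slope on the log scale

# A one-unit increase in enz multiplies time by exp(b1),
# which corresponds to this percentage change
pct_change <- (exp(b1) - 1) * 100
pct_change

# Back-transformed prediction for a hypothetical enzyme value
enz_new  <- 50
time_hat <- exp(b0 + b1 * enz_new)
time_hat
```

Under the model's assumption of normal errors on the log scale, exp() of a predicted log value estimates the median of time at that enzyme value rather than its mean.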

0.6 Lab Assignment

Your assignment is to answer the following questions by performing the necessary analysis in either SAS or R.
Only report or print the necessary results.

0.6.1 Question 1
Make a scatter plot to show the relationship between TIME and ENZ. What do you observe? Then make a
scatter plot of Log-Time (that is, the log transform of TIME) against ENZ. What do you observe?

0.6.2 Question 2
Fit the simple linear regression model $TIME = \beta_0 + \beta_1 \, ENZ + \varepsilon$. Write down the estimated regression
function and examine the residual plot and normality test. Describe what you observe and comment
briefly. Hint: you need to check the ANOVA table (that is, the F-statistic and its p-value on the
last line of the summary(yourmodel) output), the parameter estimates table, R-Square, the residual plot, and
the normality test.

0.6.3 Question 3
Fit the exponential model $\log(TIME) = \beta_0 + \beta_1 \, ENZ + \varepsilon$. Write down the estimated regression function.
Does the model fit well? Why? Hint: you need to check the ANOVA table (that is, the F-statistic and
its p-value on the last line of the summary(yourmodel) output), the parameter estimates table, R-Square,
the residual plot, and the normality test.
Remember to attach your code.

0.6.4 Question 4
Compare the simple linear model in Question 2 and the exponential model in Question 3. Do you observe any
improvement from the exponential model relative to the linear model? Support your conclusion
with details (such as R-Square, homogeneity of variance, and the normality test).
