Computing with R

A.A. Ayenigba
October 31, 2019

Contents
History and Overview of R
  Advantages of R
  R for Data Science
  What you will learn
  R and R Studio
  R as a calculator
  Comment in R
  Variable assignment
  Variable assignment and data types in R
  Naming Rules for Variables
  Rules for naming variables
  Basic classes of objects

Basic data structure or types
  Create a vector
  Naming a vector
  Create a vector
  Arithmetic operation with vectors
  Vector selection
  Short group work
  Create special vectors
  Short group work
  Group work

Matrices
  Short group work
  Progressing from vector to matrix
  Naming a matrix
  Other examples
  Matrices selection
  Arithmetic Operation
  Inverse
  Short group work
  Short group work
  System of linear equation
  Eigenvalues and Eigenvectors
  Eigenvalues and Eigenvectors
  Short group work

Dataframe
  Quick, have a look at your dataset
  Using built-in datasets in R

Statistical modelling in R
  Simple linear regression
  Example
  Multiple linear regression
  Example

History and Overview of R

Figure 1: R programming

R is a dialect of S, a language developed by John Chambers and others at the old Bell Telephone
Laboratories, originally part of AT&T Corp. S was initiated in 1976 as an internal statistical
analysis environment, originally implemented as Fortran libraries.
R came into use well after S had been developed. One key limitation of the S language was that it
was only available in a commercial package, S-PLUS. In 1991, R was created by Ross Ihaka and
Robert Gentleman in the Department of Statistics at the University of Auckland. In 1993 the first
announcement of R was made to the public.
R is a programming language and free software environment for statistical computing, data cleaning, data
analysis and graphical representation of data. The R language is widely used among statisticians and data
miners for developing statistical software and for data analysis.

Advantages of R
1. Availability:
R is open source, which makes it highly cost-effective for a project of any size. Because it is open
source, development happens at a rapid pace and the community of developers is huge. All of this, along
with a tremendous amount of learning resources, makes R a strong choice for learning programming for data
science. Because many new developers are exploring R, it is also easier and more cost-effective to recruit
or outsource to R developers.
2. Academia:
R is a very popular language in academia. Many researchers and scholars use R for data analysis, and many
popular books and learning resources on statistics use R for statistical analysis as well. Since it is a
language preferred by academics, there is a large pool of people who have a good working knowledge of R
programming. Put differently, if many people study R during their academic years, then this creates a
large pool of skilled statisticians who can use this knowledge when they move to industry, which in turn
increases the traction of the language.
3. Data wrangling:
Data wrangling is the process of cleaning messy and complex data sets to enable convenient consumption and
further analysis. This is a very important and time-consuming process in data science. R has an extensive
collection of tools for data manipulation and wrangling. Some of the popular packages for data manipulation
in R include:
dplyr - Created and maintained by Hadley Wickham, dplyr is best known for its data exploration and
transformation capabilities and its highly adaptive chaining syntax.
data.table - It allows faster manipulation of data sets with minimal coding. It simplifies data aggregation
and drastically reduces compute time.
readr - It helps in reading various forms of data into R. By not converting characters into factors, it
can perform the task about 10x faster.
4. Data visualization:
Data visualization is the representation of data in graphical form. It allows you to analyze data from
angles that are not clear in unorganized or tabulated data. R has many tools that can help in data
visualization, analysis, and representation. The R packages ggplot2 and ggedit are widely used for
plotting. While the ggplot2 package is focused on visualizing data, ggedit helps users bridge the gap
between making a plot and getting all of those pesky plot aesthetics precisely correct.
5. Specificity:
R is a language designed especially for statistical analysis and data reconfiguration. R libraries focus on
one thing: making data analysis easier, more approachable and more detailed. New statistical methods are
often first made available through R libraries, which makes R a strong choice for data analysis and
projection. Members of the R community are very active and supportive, and they have great knowledge of
statistics as well as programming. All of this gives R a special edge for data science projects.
6. Machine learning:
At some point in data science, a programmer may need to train an algorithm and bring in automation and
learning capabilities to make predictions possible. R provides ample tools for developers to train and
evaluate an algorithm and predict future events. Thus, R makes machine learning (a branch of data science)
much easier and more approachable. The list of R packages for machine learning is extensive. It includes
mice (to take care of missing values), rpart and party (for recursive partitioning, that is, tree-based
models), caret (for classification and regression training), randomForest (for random forests built from
many decision trees) and many more.

R for Data Science
Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and
knowledge. The goal of R for Data Science is to help you learn the most important tools in R that will allow
you to do data science. Data science is a huge field, and there’s no way you can master it by reading a single
book.

What you will learn

Figure 2: Data science phases

R and R Studio
R is a statistical programming language for data analysis and visualization, while R Studio is an integrated
development environment (IDE) for R programming. R Studio makes programming in R easier.

Figure 3: R Studio

In this section, you will take your first steps with R. You will learn how to use the console as a calculator and
how to assign variables. You will also get to know the basic data types in R. Let’s get started!

R as a calculator
In its most basic form, R can be used as a simple calculator. Consider the following arithmetic operations:
• Addition: +
• Subtraction: -
• Multiplication: *
• Division: /
• Exponentiation: ^
• Modulo: %%
Calculate 6 + 12
6 + 12

## [1] 18
Calculate 800 − 900
800 - 900

## [1] -100
Calculate 4 × 5
4 * 5

## [1] 20
Calculate $\frac{2018}{2}$
2018 / 2

## [1] 1009
Calculate $2^3$
2^3

## [1] 8
Calculate 20%%3
20 %% 3

## [1] 2

Calculate the square root of 4
sqrt(4)

## [1] 2

Calculate $(\sqrt{4})^2$
(sqrt(4))^2

## [1] 4
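
The modulo operator pairs naturally with integer division. As a small sketch (the operator `%/%` is part of base R but is not used elsewhere in this text, and the variable names are only illustrative), the quotient and remainder together recover the original number:

```r
# Integer division gives the whole-number quotient; %% gives the remainder
quotient  <- 20 %/% 3   # 6
remainder <- 20 %% 3    # 2

# Quotient times divisor plus remainder returns the original number
quotient * 3 + remainder
```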

Comment in R
R uses the # sign to add comments, so that you and others can understand what the R code is about.
Just like Twitter! Comments are not run as R code, so they will not influence your result. For example, a
line such as # 3 + 4 typed at the console is a comment: R ignores everything from the # to the end of the
line, so the code will not run.
# 3+4
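
A comment can also follow code on the same line. The short sketch below (the variable name `total` is made up for illustration) shows that only the part after the # is skipped:

```r
# A whole-line comment: R skips this line entirely
total <- 3 + 4  # a trailing comment: the code before the # still runs
total
```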

Variable assignment
A basic concept in statistical programming is the variable. A variable allows you to store a value (e.g. 5)
or an object (e.g. a function description) in R. You can later use the variable’s name to easily access
the value or the object stored in it.
Example
Store the value 4 in a variable named after your first name
ezekiel <- 4

To see what is stored under your first name, type your first name in the console and press the return
key on the keyboard
ezekiel

## [1] 4

Variable assignment and data types in R


x <- 3
y <- 4
z <- 10

x + y

## [1] 7
z - x - y

## [1] 3
x * y

## [1] 12
z^x

## [1] 1000

Naming Rules for Variables


The best naming convention is to choose a variable name that will tell the reader of the program what the
variable represents

Rules for naming variables


• Variable names must begin with a letter of the alphabet (R also accepts a leading dot, provided it is
not followed by a digit).
• After the initial letter, variable names can also contain underscores (_), dots (.) and numbers. Spaces
and other special characters are not allowed.
• Uppercase characters are different from lowercase characters (in R and also in Python).

Example

Samples of acceptable variable names    Samples of unacceptable variable names

Grade                                   Grade(Test)
GradeOnTest                             GradeTest#1
Ibadan_R_users                          Ibadan R users
sales_price_2017                        2017sales_price
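
One way to check a name against these rules is sketched below. It relies on the base R function `make.names()`, which rewrites a string into a syntactically valid name; the helper `is_valid_name()` and the test values are hypothetical and only serve as an illustration.

```r
# Hypothetical helper: a name is syntactically valid when make.names()
# leaves it unchanged
is_valid_name <- function(x) x == make.names(x)

candidates <- c("Grade", "Grade(Test)", "sales_price_2017", "2017sales_price")
is_valid_name(candidates)

# Names are case sensitive: these are two different variables
grade <- 1
Grade <- 2
grade == Grade
```

The first call should flag the second and fourth candidates as invalid, and the last comparison shows that `grade` and `Grade` hold different values.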

Basic classes of objects


R works with numerous atomic classes of objects. Some of the most basic data types to get started
are:
• Decimal values like 4.7 are called numeric
• Whole numbers like 4L are called integers (the L suffix tells R to store the value as an integer; a
bare 4 is stored as numeric). Integers are also numeric
• Boolean values (TRUE or FALSE) are called logical
• Text (or string) values are called characters
• Factors : Categorical variables where each level is a category
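
The base R function `class()` reports which of these classes a value belongs to. The values below are arbitrary examples:

```r
class(4.7)                        # "numeric"
class(4)                          # "numeric": a bare 4 is stored as a double
class(4L)                         # "integer": the L suffix requests an integer
class(TRUE)                       # "logical"
class("Ibadan")                   # "character"
class(factor(c("low", "high")))   # "factor"
```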

Basic data structure or types


1. Vector : A collection of elements of the same class
2. Matrix : All columns must uniformly contain only one variable type
3. data.frame : The columns can contain different classes
4. List : Can hold objects of different classes and lengths
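
Vectors, matrices and dataframes are each covered in their own section. As a brief sketch of the fourth structure, a list can be built with `list()`; the name `record` and its contents are made up for illustration.

```r
# A list can mix classes and lengths (all values here are made up)
record <- list(
  name   = "Ezekiel",          # character, length 1
  scores = c(70, 85, 90),      # numeric, length 3
  passed = TRUE                # logical, length 1
)

str(record)          # compact overview of each component
record$scores        # extract one component by name
length(record)       # number of components
```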

Create a vector
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. In R, you can
create a vector with the combine function c(). You place the vector elements, separated by commas, between
the parentheses.
For example
character.vector <- c('Ayenigba', 'Emmanuel', 'Ezekiel', 'Ajayi', 'Ebun')

numeric_vector <- c(1, 2, 3, 6, 7, 10)

Notice
Adding a space after each comma in the c() function improves the readability of your code.

Naming a vector
As a data analyst, it is important to have a clear view of the data that you are using. Understanding what
each element refers to is essential. You can give a name to the elements of a vector with the names()
function.

Create a vector
Example
sales_tax <- c(140000, 200000, 600000, 180000, 170000)
names(sales_tax) <- c(
  "Monday", "Tuesday", "Wednesday",
  "Thursday", "Friday"
)
sales_tax

##    Monday   Tuesday Wednesday  Thursday    Friday
##    140000    200000    600000    180000    170000

Arithmetic operation with vectors


It is important to know that if you sum two vectors in R, it takes the element-wise sum
Example
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 10)
c <- a + b
c

## [1] 7 9 11 13 15
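
The same element-wise rule applies to the other arithmetic operators. The sketch below reuses the vectors a and b from the example; multiplying by a single number works because R recycles the shorter operand.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 10)

b - a      # 5 5 5 5 5
a * b      # 6 14 24 36 50
a * 2      # the single value 2 is recycled: 2 4 6 8 10
```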

Vector selection
To select elements of a vector (and later of matrices and data frames), you can use square brackets [ ].
Between the square brackets, you indicate which elements to select.
To select the first element of vector a, you type a[1].
To select the second element of the vector, you type a[2], etc.
Example
a
a[1]
a[2]
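
Square brackets also accept several positions at once. A minimal sketch, assuming the vector a from the arithmetic example; the named vector `sales` and its figures are illustrative only:

```r
a <- c(1, 2, 3, 4, 5)

a[c(1, 3)]   # first and third elements: 1 3
a[2:4]       # second to fourth elements: 2 3 4
a[-1]        # everything except the first element: 2 3 4 5

# Named vectors can also be indexed by name (figures are illustrative)
sales <- c(Monday = 140000, Tuesday = 200000)
sales["Tuesday"]
```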

Short group work


What does it do?
a[a>3]

Create special vectors


a <- 1:10 # Create sequence 1 to 10
b <- 10:1 # Create sequence 10 to 1

To create a sequence from 1 to 16 with an increment of 2, we can use the seq() function, e.g.
seq(1, 16, 2)
seq(1, 20, 0.1)
seq(20, 1, -0.1)

If you know only the start of a sequence, its increment and its length, but not its last element, you can
pass the length instead, e.g.
seq(5, by = 2, length.out = 50)
length(seq(5, by = 2, length.out = 50))

Repeating elements a given number of times

rep(5, 10) # Repeat 5 ten times
rep(1:4, 5) # Repeat the sequence 1 to 4 five times
rep(1:4, each = 3) # Repeat each element of 1 to 4 three times

Short group work


What is the output of each of the following lines of code?
rep(1:4, each = 3, times = 2)
rep(1:4, 1:4)
rep(1:4, c(4, 1, 8, 2))

Group work

Figure 4: Group work

Matrices
In R, a matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into
a fixed number of rows and columns.
Since we are only working with rows and columns, a matrix is called a two-dimensional array.
You can construct a matrix in R with the matrix() function.
Example
A <- matrix(1:9, nrow = 3, byrow = TRUE)
A
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9

• The first argument is the collection of elements that R will arrange into the rows and columns of
the matrix. Here, we use 1:9, which is a shortcut for c(1, 2, ..., 9).
• The argument byrow indicates that the matrix is filled by rows. If we want the matrix to be filled
by columns, we set byrow = FALSE.
• The argument nrow indicates that the matrix should have 3 rows.

Short group work


Construct a matrix with 3 rows containing the numbers 1 up to 9 filled column-wise

Progressing from vector to matrix


fiscal_year2016_17 <- c(140, 134)

fiscal_year2017_18 <- c(160, 158)

performance_analysis <- matrix(
  c(fiscal_year2016_17, fiscal_year2017_18),
  nrow = 2,
  ncol = 2,
  byrow = TRUE
)
performance_analysis

## [,1] [,2]
## [1,] 140 134
## [2,] 160 158

Naming a matrix
To help you understand what is stored in the performance_analysis matrix, it is good to add names for
the rows and columns. Not only does this help you to read the data, but it is also useful for selecting
certain elements from the matrix.
rownames(performance_analysis) <-
c(
"Fiscal year July-June 2016/17",
"Fiscal year July-June 2017/18"
)

colnames(performance_analysis) <- c("Actual", "Target")

performance_analysis

## Actual Target
## Fiscal year July-June 2016/17 140 134
## Fiscal year July-June 2017/18 160 158

Other examples
A <- matrix(c(1, 3, 5, 7, 9, 11, 13, 15, 17),
  ncol = 3,
  byrow = FALSE
)
A

##      [,1] [,2] [,3]
## [1,]    1    7   13
## [2,]    3    9   15
## [3,]    5   11   17
B <- matrix(c(2, 4, 6, 8, 10, 12, 14, 16, 18),
  ncol = 3,
  byrow = FALSE
)
B

##      [,1] [,2] [,3]
## [1,]    2    8   14
## [2,]    4   10   16
## [3,]    6   12   18

Matrices selection
To select elements in a matrix we can use square brackets [ , ], between the square brackets, you indicate the
position of the row and column in which the elements to select are.
To select the element in the first row and second column of matrix A, you type A[1,2].
To select the element in the third row and second column of matrix A, you type A[3,2], etc.
Example
A
A[1, 2]
A[3, 2]
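
Leaving one side of the comma empty selects an entire row or column. A short sketch, assuming the matrix A from the earlier example:

```r
A <- matrix(c(1, 3, 5, 7, 9, 11, 13, 15, 17), ncol = 3)

A[1, ]        # entire first row: 1 7 13
A[, 2]        # entire second column: 7 9 11
A[1:2, 2:3]   # rows 1-2 of columns 2-3, returned as a 2 x 2 matrix
```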

Arithmetic Operation
We can perform all the arithmetic operations on matrices
• Addition
C <- A + B
C

## [,1] [,2] [,3]


## [1,] 3 15 27
## [2,] 7 19 31
## [3,] 11 23 35
• Subtraction
D <- B - A
D

## [,1] [,2] [,3]


## [1,] 1 1 1
## [2,] 1 1 1
## [3,] 1 1 1
• Multiplication
# Named E rather than F, because F is R's shorthand for FALSE
E <- A %*% B
E

##      [,1] [,2] [,3]
## [1,]  108  234  360
## [2,]  132  294  456
## [3,]  156  354  552
• Transpose

$$G = A^{T} = \begin{pmatrix} 1 & 7 & 13 \\ 3 & 9 & 15 \\ 5 & 11 & 17 \end{pmatrix}^{T}$$

G <- t(A)
G

## [,1] [,2] [,3]


## [1,] 1 3 5
## [2,] 7 9 11
## [3,] 13 15 17
• Determinant

$$G = \det(A) = \begin{vmatrix} 1 & 7 & 13 \\ 3 & 9 & 15 \\ 5 & 11 & 17 \end{vmatrix}$$

G <- det(A)
G

## [1] 4.263256e-14
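
Returning to the multiplication step: the operator `%*%` computes the row-by-column matrix product, whereas a plain `*` multiplies matching entries. The sketch below contrasts the two on the matrices A and B defined earlier; the names `elementwise` and `product` are illustrative.

```r
A <- matrix(c(1, 3, 5, 7, 9, 11, 13, 15, 17), ncol = 3)
B <- matrix(c(2, 4, 6, 8, 10, 12, 14, 16, 18), ncol = 3)

elementwise <- A * B      # multiplies matching entries
product     <- A %*% B    # row-by-column matrix product

elementwise[1, 1]   # 1 * 2 = 2
product[1, 1]       # 1*2 + 7*4 + 13*6 = 108
```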

Inverse
To find the inverse of a matrix, we use solve(), a base function in R
H <- solve(B)
H
Did you encounter a problem?
Be of good cheer; for I have overcome the world!- Jesus Christ in John 16:33
Inverse function to tackle the problem
inverse <- function(A) {
  # abs() so that matrices with a negative determinant are not rejected
  if (abs(det(A)) < 0.01) {
    cat("The given matrix is singular.\nSorry, I can't find the inverse.\n")
  } else {
    solve(A)
  }
}
inverse(A)

## The given matrix is singular.
## Sorry, I can't find the inverse.

Short group work
Use the function that you wrote to find the inverse of matrix J, where J is:

$$J = \begin{pmatrix} 5 & 1 & 0 \\ 3 & -1 & 2 \\ 4 & 0 & -1 \end{pmatrix}$$

Note
Assign the matrix to J and call inverse(J) in R

Short group work


Can you also confirm the result with the base function solve(J)?
solve(J)

Are they the same? Try it with this R-code


inverse(J) == solve(J)

System of linear equation


We can use matrices to solve a system of linear equations, provided the coefficient matrix is invertible
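Solve the following system of equations

$$x - y = 3$$
$$2x + 3y = -4$$

Matrices preparation

$$A = \begin{pmatrix} 1 & -1 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} x \\ y \end{pmatrix}, \quad C = \begin{pmatrix} 3 \\ -4 \end{pmatrix}$$

$$B = A^{-1} C$$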

Codes in R
A <- matrix(c(1, -1, 2, 3), nrow = 2, byrow = T)
A

## [,1] [,2]
## [1,] 1 -1
## [2,] 2 3
C <- matrix(c(3, -4), nrow = 2, byrow = T)
C

## [,1]
## [1,] 3
## [2,] -4

Codes in R
B <- solve(A) %*% C
B

## [,1]
## [1,] 1
## [2,] -2
x <- B[1, 1]
x

## [1] 1
y <- B[2, 1]
y

## [1] -2
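
As an alternative sketch, `solve()` also accepts the right-hand side as a second argument, so the system can be solved in one call without forming the inverse explicitly. The matrices are the same A and C as above:

```r
A <- matrix(c(1, -1, 2, 3), nrow = 2, byrow = TRUE)
C <- matrix(c(3, -4), nrow = 2)

# solve(A, C) solves A %*% B = C for B without forming the inverse
B <- solve(A, C)
B

# Check: substituting the solution back should reproduce C
all.equal(A %*% B, C)
```

Both approaches should give x = 1 and y = -2 for this system.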

Eigenvalues and Eigenvectors


Consider the following matrix

$$B = \begin{pmatrix} 1 & -6 \\ 3 & -8 \end{pmatrix}$$

1. Determine the eigenvalues of B


2. Determine the eigenvectors corresponding to each eigenvalue of B
Solution
B <- matrix(c(1, -6, 3, -8),
nrow = 2, ncol = 2,
byrow = TRUE
)
print(B) # To see the matrix

## [,1] [,2]
## [1,] 1 -6
## [2,] 3 -8

Eigenvalues and Eigenvectors


The function for calculating eigenvalues and eigenvectors is eigen(). Note that eigen() returns its results
as a list (one of the basic data structures listed earlier), with the eigenvalues in $values and the
eigenvectors in the columns of $vectors.
eigen(B)

## eigen() decomposition
## $values
## [1] -5 -2
##
## $vectors
## [,1] [,2]
## [1,] 0.7071068 0.8944272
## [2,] 0.7071068 0.4472136
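
A quick way to check an eigenpair is to confirm that multiplying the matrix by the eigenvector equals the eigenvalue times that eigenvector. The sketch below does this for the first pair; the names `e`, `lambda1` and `v1` are illustrative.

```r
B <- matrix(c(1, -6, 3, -8), nrow = 2, byrow = TRUE)
e <- eigen(B)

lambda1 <- e$values[1]       # first eigenvalue
v1      <- e$vectors[, 1]    # matching eigenvector (first column)

# Both sides should agree up to floating-point rounding
all.equal(as.vector(B %*% v1), lambda1 * v1)
```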

Short group work
Consider the following matrix

$$B = \begin{pmatrix} 4 & 5 & -5 \\ 0 & 4 & 1 \\ 0 & 1 & 2 \end{pmatrix}$$

1. Determine the eigenvalues of B


2. Determine the eigenvectors corresponding to each eigenvalue of B

Dataframe
Dataframes are another way to put data in tables! Unlike matrices, dataframes can hold different types of
data in different columns.
A dataframe has the variables of a data set as columns and the observations as rows. This will be a familiar
concept for those coming from other software packages such as Excel, SPSS, or Stata.
The function for creating a dataframe is data.frame().
Example
# Make a dataframe with columns named a and b
data.frame(a = 2:4, b = 5:7)

a b
2 5
3 6
4 7

The numbers 1 2 3 at the left on your console are row labels and are not a column of the dataframe
Each column in a dataframe is a vector!
Example
a <- c(6, 5, 1)

b <- c(1, 1, 3)

data <- data.frame(a, b) # The output is ?
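
Because each column is a vector, it can be pulled out and used like one. A minimal sketch with the same values as above; `df` and the new column `total` are hypothetical names:

```r
# df is a hypothetical name for the dataframe built from a and b
df <- data.frame(a = c(6, 5, 1), b = c(1, 1, 3))

df$a                      # column a as a plain vector: 6 5 1
is.vector(df$a)           # TRUE
df[2, "b"]                # row 2 of column b: 1
df$total <- df$a + df$b   # a new column built from existing ones
df
```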

Group work
Create a dataframe and call it data for the following vectors:
# Set the same seed to get the same sample
set.seed(123)
height <- rnorm(n = 100, mean = 135, sd = 12)
weight <- rnorm(n = 100, mean = 55, sd = 9)

Quick, have a look at your dataset


Working with large datasets is common in data science. When you work with (extremely) large datasets and
dataframes, your first task as a data analyst is to develop a clear understanding of its structure and main
elements. Therefore, it is often useful to show only part of the entire dataset.
1. head(): enables you to show the first observations of a dataframe.
2. tail(): enables you to print out the last observations in your dataset.
Both head() and tail() print a top line called header, which contains the names of the different variables
in your data set.
Another method that is often used to get a rapid overview of your dataset is the function str().
3. str(): Shows you the structure of your dataset
The structure of a dataframe tells you :
1. The total number of observations
2. The total number of variables
3. A full list of the variable names
4. The first observations
Note
Applying the str() function will often be the first thing that you do when receiving a new dataset or
dataframe. It is a great way to get more insight in your dataset before diving into the real analysis.
Example
Consider the vectors:
height <- rnorm(n = 120, mean = 135, sd = 12)
weight <- rnorm(n = 120, mean = 55, sd = 9)

Create a dataframe for it.


data <- data.frame(height, weight)

str(data)

## 'data.frame': 120 obs. of 2 variables:


## $ height: num 161 151 132 142 130 ...
## $ weight: num 57.1 66 43 60.9 50.3 ...
Example
head(data, 5)

height weight
161.3857 57.13687
150.7490 65.96298
131.8183 42.95103
141.5183 60.94738
130.0279 50.29379

tail(data, 3)

height weight
118 119.8962 64.76298
119 155.2132 34.97511
120 145.9367 66.12124
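
The counts and names that str() reports can also be retrieved individually. The sketch below rebuilds a dataframe of the same shape; the seed is arbitrary and only added for reproducibility, so the simulated values will differ from the output printed above.

```r
set.seed(123)  # arbitrary seed so the illustration is reproducible
height <- rnorm(n = 120, mean = 135, sd = 12)
weight <- rnorm(n = 120, mean = 55, sd = 9)
data <- data.frame(height, weight)

nrow(data)    # number of observations: 120
ncol(data)    # number of variables: 2
names(data)   # variable names: "height" "weight"
dim(data)     # both dimensions at once: 120 2
```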

Using built-in datasets in R
There are several ways to find the included datasets in R. Using data() will give you a list of the datasets of
all loaded packages.
data()

Example
library(datasets)

data <- airquality

str(data)

To get help for the proper description of the dataset


?airquality

Statistical modelling in R
In this section, we will use R for statistical modelling.

Simple linear regression


When a change in one variable is accompanied by a proportional change in another, we say that there is a
linear relationship between them. A simple linear model has one independent variable (X) that is related to
one dependent variable (Y).
The simple linear regression model is:
$Y = b_0 + b_1 X + \varepsilon$, where $b_0$ is the intercept on the Y-axis, $b_1$ is the slope and $\varepsilon$ is the error term.

Example
We shall use the women dataset in R. The description of the women dataset can be viewed with ?women, i.e.
?women
data <- women

head(data)

height weight
58 115
59 117
60 120
61 123
62 126
63 129

str(women)

## 'data.frame': 15 obs. of 2 variables:


## $ height: num 58 59 60 61 62 63 64 65 66 67 ...
## $ weight: num 115 117 120 123 126 129 132 135 139 142 ...
In this data, the dependent variable is height and independent variable is weight.

model <- lm(height ~ weight, data = data)

The function lm() is used to fit the linear model. In the formula, ~ separates the dependent variable (on the
left) from the independent variable (on the right), and the data argument names the dataframe that holds
the variables.
To see the results:
model

##
## Call:
## lm(formula = height ~ weight, data = data)
##
## Coefficients:
## (Intercept) weight
## 25.7235 0.2872
From the results, we see that:
height = 25.7235 + 0.2872 × weight.
To see the full results, we use the summary() function, i.e.
summary(model)

##
## Call:
## lm(formula = height ~ weight, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.83233 -0.26249 0.08314 0.34353 0.49790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.723456 1.043746 24.64 2.68e-12 ***
## weight 0.287249 0.007588 37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.44 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
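
Individual pieces of the fitted model can be extracted rather than read off the printout. A minimal sketch using base R functions `coef()` and `predict()`; the model name `fit` and the weight of 130 are hypothetical choices for illustration.

```r
fit <- lm(height ~ weight, data = women)

coef(fit)                  # intercept and slope as a named vector
summary(fit)$r.squared     # the Multiple R-squared value from the summary

# Predicted height for a hypothetical weight of 130
predict(fit, newdata = data.frame(weight = 130))
```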

Multiple linear regression


Multiple linear regression is an extension of simple linear regression in which we have more than one
independent variable.
The statistical model for multiple linear regression is:
$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p + \varepsilon$, where $p$ is the number of independent variables in the model and
$\varepsilon$ is the error term.

Example
We shall use the attitude dataset in R. The description of the dataset can be viewed with ?attitude,
i.e.
?attitude
dataset <- attitude

head(dataset)

rating complaints privileges learning raises critical advance


43 51 30 39 61 92 45
63 64 51 54 63 73 47
71 70 68 69 76 86 48
61 63 45 47 54 84 35
81 78 56 66 71 83 47
43 55 49 44 54 49 34

str(dataset)

## 'data.frame': 30 obs. of 7 variables:


## $ rating : num 43 63 71 61 81 43 58 71 72 67 ...
## $ complaints: num 51 64 70 63 78 55 67 75 82 61 ...
## $ privileges: num 30 51 68 45 56 49 42 50 72 45 ...
## $ learning : num 39 54 69 47 66 44 56 55 67 47 ...
## $ raises : num 61 63 76 54 71 54 66 70 71 62 ...
## $ critical : num 92 73 86 84 83 49 68 66 83 80 ...
## $ advance : num 45 47 48 35 47 34 35 41 31 41 ...
In this data, the dependent variable is rating.
model <- lm(rating ~ ., data = dataset)

The function lm() is used to fit the linear model. In the formula, ~ separates the dependent variable from
the independent variables, and the dot (.) stands for all the other variables in the dataset. The data
argument names the dataframe, here dataset.
To see the results:
model

##
## Call:
## lm(formula = rating ~ ., data = dataset)
##
## Coefficients:
## (Intercept) complaints privileges learning raises
## 10.78708 0.61319 -0.07305 0.32033 0.08173
## critical advance
## 0.03838 -0.21706
From the results, we see that:
rating = 10.78707639 + 0.61318761(complaints) − 0.07305014(privileges) + 0.32033212(learning) +
0.08173213(raises) + 0.03838145(critical) − 0.21705668(advance)
To see the full results, we use the summary() function, i.e.
summary(model)

##
## Call:
## lm(formula = rating ~ ., data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.9418 -4.3555 0.3158 5.5425 11.5990
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.78708 11.58926 0.931 0.361634
## complaints 0.61319 0.16098 3.809 0.000903 ***
## privileges -0.07305 0.13572 -0.538 0.595594
## learning 0.32033 0.16852 1.901 0.069925 .
## raises 0.08173 0.22148 0.369 0.715480
## critical 0.03838 0.14700 0.261 0.796334
## advance -0.21706 0.17821 -1.218 0.235577
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.068 on 23 degrees of freedom
## Multiple R-squared: 0.7326, Adjusted R-squared: 0.6628
## F-statistic: 10.5 on 6 and 23 DF, p-value: 1.24e-05
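
The dot in the formula is shorthand for listing every other column. The sketch below writes the predictors out explicitly and checks that the coefficients match; it also fits a smaller model with two predictors chosen purely for illustration. The names `full`, `explicit` and `reduced` are hypothetical.

```r
full     <- lm(rating ~ ., data = attitude)
explicit <- lm(
  rating ~ complaints + privileges + learning + raises + critical + advance,
  data = attitude
)

# The two fits should have identical coefficients
all.equal(coef(full), coef(explicit))

# A smaller model with two hand-picked predictors (chosen only for illustration)
reduced <- lm(rating ~ complaints + learning, data = attitude)
coef(reduced)
```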
