R Unit 4th and 5th
Salary dataset:
Years of experience    Salary
1.1                    39343.00
1.3                    46205.00
1.5                    37731.00
2.0                    43525.00
2.2                    39891.00
2.9                    56642.00
3.0                    60150.00
3.2                    54445.00
3.2                    64445.00
3.7                    57189.00
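As a minimal sketch, the salary table above can be loaded into a data frame and fitted with lm(). The names salary_data, YearsExperience and salary_model are chosen here for illustration and are not part of the original notes.

```r
# Salary data copied from the table above
# (column and object names are assumptions made for this sketch)
salary_data <- data.frame(
  YearsExperience = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  Salary = c(39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189)
)

# Fit a simple linear regression: Salary explained by years of experience
salary_model <- lm(Salary ~ YearsExperience, data = salary_data)

# Intercept and slope of the fitted line
print(coef(salary_model))
```

The slope is the estimated change in salary for one additional year of experience.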
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Fit a simple linear regression of y on x
relation <- lm(y ~ x)
print(relation)
Coefficients:
(Intercept) x
-38.4551 0.6746
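Once the model is fitted, it can be used to predict y for a new x. The sketch below refits the same model and predicts for x = 170, a value chosen here purely for illustration. With the coefficients printed above, the prediction is -38.4551 + 0.6746 × 170, which is approximately 76.23.

```r
# Same data as above
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Fit the simple linear regression
relation <- lm(y ~ x)

# New observation to predict for (x = 170 is an illustrative value)
new_data <- data.frame(x = 170)

# predict() applies the fitted intercept and slope to the new x
result <- predict(relation, new_data)
print(result)
```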
1. The selling price of a house can depend on the desirability of the location,
the number of bedrooms, the number of bathrooms, the year the house
was built, the square footage of the lot, and a number of other factors.
2. The height of a child can depend on the height of the mother, the height of
the father, nutrition, and environmental factors.
Or,
Syntax
The basic syntax for lm() function in multiple regression is −
lm(y ~ x1+x2+x3...,data)
                   mpg   disp   hp    wt
Mazda RX4          21.0  160    110   2.620
Mazda RX4 Wag      21.0  160    110   2.875
Datsun 710         22.8  108    93    2.320
Hornet 4 Drive     21.4  258    110   3.215
Hornet Sportabout  18.7  360    175   3.440
Valiant            18.1  225    105   3.460
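# Model mpg from disp, hp and wt (the mtcars columns shown above)
model <- lm(mpg ~ disp + hp + wt, data = mtcars)
a <- coef(model)[1]
print(a)
Xdisp <- coef(model)[2]
Xhp <- coef(model)[3]
Xwt <- coef(model)[4]
print(Xdisp)
print(Xhp)
print(Xwt)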
Stepwise Regression in R
Stepwise regression (or stepwise selection) consists of iteratively adding and removing predictors
in the predictive model in order to find the subset of variables in the data set that gives the
best-performing model, that is, the model with the lowest prediction error.
There are three strategies of stepwise regression (James et al. 2014; Bruce and Bruce 2017):
1. Forward selection, which starts with no predictors in the model, iteratively adds the most
contributive predictors, and stops when the improvement is no longer statistically significant.
2. Backward selection (or backward elimination), which starts with all predictors in the model
(full model), iteratively removes the least contributive predictors, and stops when you have a
model where all predictors are statistically significant.
3. Stepwise selection (or sequential replacement), which is a combination of forward and
backward selections. You start with no predictors, then sequentially add the most contributive
predictors (like forward selection). After adding each new variable, remove any variables that no
longer provide an improvement in the model fit (like backward selection).
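The three strategies above can be sketched with the step() function from base R's stats package. This is only one possible implementation: step() compares models by AIC rather than by significance tests, and the use of mtcars with mpg as the response is an assumption made for illustration.

```r
# Full model: every other mtcars column is a candidate predictor of mpg
full_model <- lm(mpg ~ ., data = mtcars)

# Null model: intercept only, no predictors
null_model <- lm(mpg ~ 1, data = mtcars)

# 1. Forward selection: start empty, add predictors one at a time
forward_model <- step(null_model, scope = formula(full_model),
                      direction = "forward", trace = 0)

# 2. Backward elimination: start full, drop predictors one at a time
backward_model <- step(full_model, direction = "backward", trace = 0)

# 3. Stepwise selection: start empty, allow both adding and dropping
both_model <- step(null_model,
                   scope = list(lower = formula(null_model),
                                upper = formula(full_model)),
                   direction = "both", trace = 0)

# Show which predictors each strategy kept
print(formula(forward_model))
print(formula(backward_model))
print(formula(both_model))
```

Setting trace = 0 suppresses the step-by-step log; leave it at its default to see each candidate model and its AIC.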
Decision Tree
Examples of the use of decision trees include predicting whether an email is spam or not spam,
predicting whether a tumor is cancerous, and classifying a loan as a good or bad credit risk based
on the factors in each case.
Syntax
The basic syntax for creating a decision tree in R is −
ctree(formula, data)
Following is the description of the parameters used −
• formula is a formula describing the predictor and response variables.
• data is the name of the data set used.
This algorithm can easily be implemented in R. Some important points about decision tree
classifiers are:
• Easy to interpret
• Automatically handles decision-making
• Recursively splits the feature space into smaller regions
• Prone to overfitting
• Can be trained on a small training set
• Strongly affected by noise
Example
We will use the ctree() function to create the decision tree and see its graph.
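A minimal sketch of such an example follows. It assumes the party package (which provides ctree()) is installed, and it uses the readingSkills data set that ships with party. The choice of data set, the row subset and the output file name are assumptions made for illustration.

```r
# Assumes the party package is installed: install.packages("party")
library(party)

# readingSkills ships with party; use the first 105 rows (illustrative subset)
input_data <- readingSkills[1:105, ]

# Give the chart file a name (file name is an assumption)
png(file = "decision_tree.png")

# Predict nativeSpeaker from age, shoe size and reading score
output_tree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = input_data)

# Draw the fitted tree
plot(output_tree)

# Save the file
dev.off()
```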
Random Forest
Random forest is a machine learning algorithm that combines a collection of decision
trees to give more flexible and accurate predictions than a single tree. An individual
decision tree tends to overfit, so combining many trees usually improves accuracy.
Random forest can be used for both classification and regression. As a supervised
learning algorithm, it applies the bagging method to decision trees: each tree is
trained on a bootstrap sample of the data, and the predictions of all trees are
combined.
At each split, random forest searches for the best feature within a random subset of
the features. This adds randomness to the model and generally results in a more
accurate model. Let us look at the random forest approach with an example. Suppose a
man named Bob wants to buy a T-shirt from a store. The salesman first asks him
about his favourite colour. This constitutes a decision tree based on the colour feature.
The salesman then asks about other features of the T-shirt, such as size, type of
fabric and type of collar. More selection criteria give more decision trees. Together,
all the decision trees make up the random forest approach to selecting a T-shirt,
based on the many features of the T-shirt that Bob would like to buy from the store.
Syntax
The basic syntax for creating a random forest in R is −
randomForest(formula, data)
Following is the description of the parameters used −
• formula is a formula describing the predictor and response variables.
• data is the name of the data set used.
Example
We will use the randomForest() function to create the random forest and see its graph.
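A minimal sketch of such an example follows. It assumes the randomForest package is installed. The built-in iris data set, the seed, the number of trees and the file name are all assumptions made for illustration.

```r
# Assumes the randomForest package is installed: install.packages("randomForest")
library(randomForest)

# Fix the random seed so the bootstrap samples are reproducible
set.seed(42)

# Classify iris species from the four measurements, using 100 trees
output_forest <- randomForest(Species ~ ., data = iris, ntree = 100)

# Summary: out-of-bag error estimate and confusion matrix
print(output_forest)

# Plot the error rate against the number of trees and save it
png(file = "random_forest_error.png")
plot(output_forest)
dev.off()

# Importance of each predictor in the fitted forest
print(importance(output_forest))
```

For a random forest, plot() shows how the error rate changes as trees are added, rather than drawing a single tree.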
R - Pie Charts
The R programming language has numerous libraries to create charts and graphs. A pie chart is a
representation of values as slices of a circle with different colors. The slices are labeled, and the
numbers corresponding to each slice can also be shown in the chart.
In R the pie chart is created using the pie() function which takes positive numbers as a vector input.
The additional parameters are used to control labels, color, title etc.
Syntax
The basic syntax for creating a pie chart in R is −
pie(x, labels, radius, main, col, clockwise)
Following is the description of the parameters used −
• x is a vector containing the numeric values used in the pie chart.
• labels is used to give description to the slices.
• radius indicates the radius of the circle of the pie chart (a value between −1 and +1).
• main indicates the title of the chart.
• col indicates the color palette.
• clockwise is a logical value indicating whether the slices are drawn clockwise or
anticlockwise.
Example
A very simple pie-chart is created using just the input vector and labels. The below script will
create and save the pie chart in the current R working directory.
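A minimal sketch of such a script is shown below. The values, city labels, title and file name are hypothetical and chosen only for illustration.

```r
# Create data for the chart (values and labels are illustrative)
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Give the chart file a name
png(file = "city.png")

# Plot the chart with one color per slice
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))

# Save the file
dev.off()
```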
R - Bar Charts
A bar chart represents data in rectangular bars, with the length of each bar proportional to the
value of the variable. R uses the function barplot() to create bar charts. R can draw both vertical
and horizontal bars in the bar chart. Each bar in a bar chart can be given a different color.
Syntax
The basic syntax to create a bar-chart in R is −
barplot(H,xlab,ylab,main, names.arg,col)
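Following is the description of the parameters used −
• H is a vector or matrix containing the numeric values used in the bar chart.
• xlab is the label for the x axis.
• ylab is the label for the y axis.
• main is the title of the bar chart.
• names.arg is a vector of names appearing under each bar.
• col is used to give colors to the bars in the graph.
Example
The script below will create and save the bar chart in the current R working directory.
# Create the data for the chart
H <- c(7,12,28,3,41)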
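A fuller sketch of a bar chart script follows, using the vector H from these notes. The month names, axis labels, colors and file names are assumptions made for illustration.

```r
# Create the data for the chart
H <- c(7, 12, 28, 3, 41)

# Names shown under each bar (illustrative labels)
M <- c("Mar", "Apr", "May", "Jun", "Jul")

# Give the chart file a name
png(file = "barchart_months.png")

# Plot vertical bars with labels, a title and colors
barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue",
        col = "blue", main = "Revenue chart", border = "red")

# Save the file
dev.off()

# The same data as horizontal bars, using horiz = TRUE
png(file = "barchart_horizontal.png")
barplot(H, names.arg = M, horiz = TRUE, col = "blue", main = "Revenue chart")
dev.off()
```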
R - Boxplots
Syntax
The basic syntax to create a boxplot in R is −
boxplot(x, data, notch, varwidth, names, main)
Following is the description of the parameters used −
• x is a vector or a formula.
• data is the data frame.
• notch is a logical value. Set as TRUE to draw a notch.
• varwidth is a logical value. Set as TRUE to draw the width of each box proportional to the
square root of the sample size.
• names are the group labels which will be printed under each boxplot.
• main is used to give a title to the graph.
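The parameters above can be tried out with a short sketch. It assumes the built-in mtcars data set, with mileage (mpg) grouped by number of cylinders (cyl). The labels and file name are chosen for illustration.

```r
# Give the chart file a name
png(file = "boxplot.png")

# One box per cylinder group; box widths reflect the group sizes
bp <- boxplot(mpg ~ cyl, data = mtcars,
              notch = FALSE,      # set to TRUE to draw notches around the medians
              varwidth = TRUE,
              names = c("4 cyl", "6 cyl", "8 cyl"),
              xlab = "Number of cylinders",
              ylab = "Miles per gallon",
              main = "Mileage data")

# Save the file
dev.off()

# boxplot() also returns the statistics it drew, such as group sizes
print(bp$n)
```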
R - Histograms
A histogram represents the frequencies of values of a variable bucketed into ranges. A histogram is
similar to a bar chart, but the difference is that it groups the values into continuous ranges. The
height of each bar in a histogram represents the number of values present in that range.
R creates a histogram using the hist() function. This function takes a vector as input and uses
some more parameters to plot the histogram.
Syntax
The basic syntax for creating a histogram using R is −
hist(v,main,xlab,xlim,ylim,breaks,col,border)
Following is the description of the parameters used −
• v is a vector containing numeric values used in histogram.
• main indicates title of the chart.
• col is used to set color of the bars.
• border is used to set border color of each bar.
• xlab is used to give description of x-axis.
• xlim is used to specify the range of values on the x-axis.
• ylim is used to specify the range of values on the y-axis.
• breaks is used to specify the breakpoints between bars (or the number of bars), which sets the width of each bar.
Example
A simple histogram is created using input vector, label, col and border parameters.
The script given below will create and save the histogram in the current R working directory.
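A minimal sketch of such a script is shown below. The input values, axis label, colors and file name are hypothetical and chosen for illustration.

```r
# Create data for the graph (illustrative values)
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# Give the chart file a name
png(file = "histogram.png")

# Create the histogram; hist() also returns the bins it used
h <- hist(v, main = "Histogram of v", xlab = "Weight",
          col = "yellow", border = "blue")

# Save the file
dev.off()

# Inspect the bin edges and the number of values in each bin
print(h$breaks)
print(h$counts)
```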
R - Line Graphs
A line chart is a graph that connects a series of points by drawing line segments between them.
The points are ordered by one of their coordinates (usually the x-coordinate). Line charts are
usually used to identify trends in data.
The plot() function in R is used to create the line graph.
Syntax
The basic syntax to create a line chart in R is −
plot(v,type,col,xlab,ylab,main)
Following is the description of the parameters used −
• v is a vector containing the numeric values.
• type takes the value "p" to draw only the points, "l" to draw only the lines and "o" to
draw both points and lines.
• xlab is the label for x axis.
• ylab is the label for y axis.
• main is the Title of the chart.
• col is used to give colors to both the points and lines.
Example
A simple line chart is created using the input vector and the type parameter as "o". The script
below will create and save a line chart in the current R working directory.
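A minimal sketch of such a script is shown below. The input values, labels, color and file name are hypothetical and chosen for illustration.

```r
# Create the data for the chart (illustrative values)
v <- c(7, 12, 28, 3, 41)

# Give the chart file a name
png(file = "line_chart.png")

# type = "o" draws both the points and the lines joining them
plot(v, type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
     main = "Rain fall chart")

# Save the file
dev.off()
```

Note that the type value is case-sensitive: use a lowercase "o".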
R - Scatterplots
Scatterplots show many points plotted in the Cartesian plane. Each point represents the values of
two variables. One variable is plotted on the horizontal axis and the other on the vertical axis.
A simple scatterplot is created using the plot() function.
Syntax
The basic syntax for creating scatterplot in R is −
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Following is the description of the parameters used −
• x is the data set whose values are the horizontal coordinates.
• y is the data set whose values are the vertical coordinates.
• main is the title of the graph.
• xlab is the label on the horizontal axis.
• ylab is the label on the vertical axis.
• xlim is the limits of the values of x used for plotting.
• ylim is the limits of the values of y used for plotting.
• axes indicates whether both axes should be drawn on the plot.
Example
We use the data set "mtcars" available in the R environment to create a basic scatterplot. Let's
use the columns "wt" and "mpg" in mtcars.
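# Get the input values.
input <- mtcars[, c("wt", "mpg")]
# Give the chart file a name (file name chosen for illustration).
png(file = "scatterplot.png")
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt, y = input$mpg,
   xlab = "Weight",
   ylab = "Mileage",
   xlim = c(2.5, 5),
   ylim = c(15, 30),
   main = "Weight vs Mileage"
)
# Save the file.
dev.off()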