Unit 3 - Supervised Learning
Introduction to machine learning – Linear Regression Models: Least squares, single & multiple variables, Bayesian linear
regression, gradient descent.
Linear Classification Models: Discriminant function – Probabilistic discriminative model – Logistic regression, Probabilistic
generative model – Naive Bayes, Maximum margin classifier – Support vector machine, Decision Tree, Random forests.
Machine learning is a rapidly growing technology that enables computers to learn automatically from past data. Machine learning uses
various algorithms to build mathematical models and make predictions from historical data or information. It is currently
used for various tasks such as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender
systems, and many more.
This machine learning tutorial gives you an introduction to machine learning along with the wide range of machine learning techniques
such as Supervised, Unsupervised, and Reinforcement learning. You will learn about regression and classification models, clustering
methods, hidden Markov models, and various sequential models.
What is Regression?
Regression allows researchers to predict or explain the variation in one variable based on another
variable.
The variable that researchers are trying to explain or predict is called the response variable. It is also
sometimes called the dependent variable because it depends on another variable.
The variable that is used to explain or predict the response variable is called the explanatory variable. It is
also sometimes called the independent variable because it is independent of the other variable.
In regression, the order of the variables is very important. The explanatory variable (or the independent
variable) always belongs on the x-axis. The response variable (or the dependent variable) always belongs on
the y-axis.
Example:
If it is already known that there is a significant correlation between students’ GPA and their self-esteem, the
next question researchers might ask is: Can students’ scores on a self-esteem scale be predicted based on
GPA? In other words, does GPA explain self-esteem? These are the types of questions that regression
answers.
Note that these questions do not imply a causal relationship. In this example, GPA is the explanatory
variable (or the independent variable) and self-esteem is the response variable (or the dependent variable).
GPA belongs on the x-axis and self-esteem belongs on the y-axis.
Regression is essential for any machine learning problem that involves continuous numbers, which includes
a vast array of real-life applications, for example:
1. Automobile testing
2. Weather analysis
3. Time series forecasting
3.4.3 Types of Regression
Types of regression (2M, 16M)
What are the three approaches in stepwise regression? (2M)
Linear Regression
Logistic Regression
Polynomial Regression
Stepwise Regression
Ridge Regression
Lasso Regression
Elastic Net Regression
Simple linear regression is useful for finding the relationship between two continuous variables: one is the predictor
or independent variable and the other is the response or dependent variable. It looks for a statistical relationship
rather than a deterministic relationship. The relationship between two variables is said to be deterministic if one variable
can be expressed exactly in terms of the other; for example, a temperature in degrees Celsius can be converted exactly
to Fahrenheit. A statistical relationship is not exact; for example, the relationship between height and weight.
The core idea is to obtain a line that best fits the data. The best-fit line is the one for which the total prediction
error (over all data points) is as small as possible, where the error is the distance between a point and the regression line.
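As an illustration (not part of the original notes), the following minimal sketch fits a best-fit line by ordinary least squares using NumPy; the data values are made up purely for demonstration:
import numpy as np
# Hypothetical data: predictor x and response y (values chosen only for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])
# Least-squares estimates of slope b and intercept a for the line y = a + b*x
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print("slope b =", b, "intercept a =", a)
print("prediction at x = 6:", a + b * 6)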
Example: Calculate the regression coefficients and obtain the lines of regression for the following data.
Solution:
Regression coefficient of X on Y
(The data table and the worked numerical solution are given as figures in the original notes.)
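For reference (these formulas are added here; they are not reproduced from the original figure), such a problem uses the two regression coefficients and the corresponding lines of regression:
b_yx = r·(σ_y/σ_x) = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²   (regression coefficient of Y on X)
b_xy = r·(σ_x/σ_y) = Σ(x − x̄)(y − ȳ) / Σ(y − ȳ)²   (regression coefficient of X on Y)
Line of regression of Y on X:  y − ȳ = b_yx (x − x̄)
Line of regression of X on Y:  x − x̄ = b_xy (y − ȳ)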
Explain the least squares regression line with an example. (13M)
The least squares method is a form of mathematical regression analysis used to determine the line of best
fit for a set of data, providing a visual demonstration of the relationship between the data points. Each
point of data represents the relationship between a known independent variable and an unknown
dependent variable.
This method of regression analysis begins with a set of data points to be plotted on an x- and y-axis graph.
An analyst using the least squares method will generate a line of best fit that explains the potential
relationship between independent and dependent variables.
The least squares method is used in a wide variety of fields, including finance and investing. For financial
analysts, the method can help to quantify the relationship between two or more variables—such as a
stock’s share price and its earnings per share (EPS). By performing this type of analysis investors often try
to predict the future behavior of stock prices or other factors.
The regression line under the least squares method is calculated using the following formula:
y = a + bx
where,
y = dependent variable
x = independent variable
a = y-intercept
b = slope of the line
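The coefficients are the values that minimize the sum of squared errors Σ(yᵢ − a − bxᵢ)²; the resulting closed-form expressions (stated here for completeness, consistent with the sketch given earlier) are:
b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
a = ȳ − b·x̄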
If the data show a linear relationship between two variables, the line that best fits this linear relationship
is known as the least-squares regression line, which minimizes the vertical distance from the data points to
the regression line. The term “least squares” is used because this line gives the smallest possible sum of
squares of the errors.
In regression analysis, dependent variables are illustrated on the vertical y-axis, while independent
variables are illustrated on the horizontal x-axis. These designations will form the equation for the line of
best fit, which is determined from the least squares method.
In contrast to a linear problem, a non-linear least-squares problem has no closed solution and is generally
solved by iteration.
EXAMPLE:
The line of best fit is a straight line drawn through a scatter of data points that best represents the
relationship between them.
Let us consider the following graph wherein a set of data is plotted along the x and y-axis. These data points
are represented using the blue dots. Three lines are drawn through these points – a green, a red, and a blue
line. The green line passes through a single point, and the red line passes through three data points.
However, the blue line passes through four data points, and the distance between the residual points to the
blue line is minimal as compared to the other two lines.
In the above graph, the blue line represents the line of best fit, as it lies closest to all the values and the
distance between the points outside the line and the line itself is minimal (this distance, summed over all
points, is the sum of squares of the residuals). For the other two lines, the red and the green, the distance
between the residuals and the lines is greater than for the blue line.
3.4.5 MULTIPLE REGRESSION:
Multiple regression is a statistical technique that can be used to analyze the relationship between a single
dependent variable and several independent variables. The objective of multiple regression analysis is to use
the independent variables whose values are known to predict the value of the single dependent variable. Each
predictor value is weighted, the weights denoting its relative contribution to the overall prediction.
Y = a + b1X1 + b2X2 + … + bnXn
Here Y is the dependent variable, and X1, …, Xn are the n independent variables. In calculating the weights a,
b1, …, bn, regression analysis ensures maximal prediction of the dependent variable from the set of
independent variables. This is usually done by least squares estimation.
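A minimal sketch (not from the original notes) of least squares estimation with several independent variables, using scikit-learn; the data below are synthetic and purely illustrative:
import numpy as np
from sklearn.linear_model import LinearRegression
# Synthetic data: 100 samples with 3 independent variables (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.1, size=100)
# Fit Y = a + b1*X1 + b2*X2 + b3*X3 by least squares
model = LinearRegression().fit(X, y)
print("intercept a:", model.intercept_)
print("weights b1, b2, b3:", model.coef_)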
Linear regression, although commonly used, is limited to just one independent and one dependent variable;
it is also restricted to the given training data set and cannot capture non-linear relationships.
To cover these limitations we use multiple regression. It focuses on overcoming one particular limitation:
it allows more than one independent variable to be analyzed.
We will start the discussion by first taking a look at the simple linear regression equation:
y = bx + a
Here y is the dependent variable we need to find and x is an independent variable; the constants a and b drive the
equation. Since multiple regression takes several independent variables, the equation has multiple x terms:
y = a + b1x1 + b2x2 + … + bnxn
Here, to calculate the value of the dependent variable y, we have multiple independent variables x1, x2,
and so on. The number of independent variables can grow up to n, and the constant b attached to each variable
denotes its numeric weight. The purpose of the constant a is to give the dependent variable's value
when all the independent variables are zero.
Example: A researcher decides to study students' performance at a school over a period of time. He
observed that as lectures moved online, the performance of students started to decline as well. The
dependent variable "decrease in performance" is explained by various independent variables such as
"lack of attention", "more internet addiction", "neglecting studies", and so on.
So, for the above example, the multiple regression equation would take the form:
decrease in performance = a + b1·(lack of attention) + b2·(internet addiction) + b3·(neglect of studies) + …
The variables considered for the model should be relevant, and the model should be reliable.
The variance should be constant for all levels of the predicted variable.
Multiple regression analysis helps us to study the various predictor variables at hand more thoroughly.
It increases reliability by avoiding dependence on just one variable, since more than one
independent variable supports the outcome.
Multiple regression analysis also permits more elaborate hypotheses to be studied.
Logistic regression
Logistic regression is a statistical analysis method to predict a binary outcome, such as yes or no, based on
prior observations of a data set.
A logistic regression model predicts a dependent data variable by analyzing the relationship between one or
more existing independent variables. For example, a logistic regression could be used to predict whether a
political candidate will win or lose an election or whether a high school student will be admitted or not to a
particular college. These binary outcomes allow straightforward decisions between two alternatives.
A logistic regression model can take into consideration multiple input criteria. In the case of college
acceptance, the logistic function could consider factors such as the student's grade point average, SAT score
and number of extracurricular activities. Based on historical data about earlier outcomes involving the same
input criteria, it then scores new cases on their probability of falling into one of two outcome categories.
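A minimal sketch (not part of the original notes) of fitting a logistic regression classifier with scikit-learn on college-admission-style data; all feature values and the admit/reject labels below are hypothetical:
import numpy as np
from sklearn.linear_model import LogisticRegression
# Hypothetical inputs per student: [GPA, SAT score, number of extracurricular activities]
X = np.array([[3.9, 1450, 4], [2.8, 1100, 1], [3.5, 1300, 2],
              [2.5, 1000, 0], [3.7, 1400, 3], [3.0, 1150, 1]])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = admitted, 0 = not admitted
clf = LogisticRegression(max_iter=1000).fit(X, y)
# Estimated probability of each outcome for a new applicant (illustrative values)
print(clf.predict_proba([[3.4, 1250, 2]]))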
In Bayesian linear regression, the mean of the response variable is characterized by a weighted sum of the input variables.
This type of conditional modelling aims to determine the posterior distribution of the regression coefficients (as well as
other parameters describing the distribution of the regressand), and it ultimately permits out-of-sample prediction of the
regressand conditional on observed values of the regressors.
The simplest and most widely used version of this model is the normal linear model, in which the distribution of y given X
is Gaussian. For this model the posterior can be determined analytically for a particular set of prior distributions on the
parameters, known as conjugate priors; for more arbitrarily chosen priors, the posteriors generally have to be approximated.
Bayesian regression can be quite helpful when the dataset has too little or poorly dispersed data. In contrast
to conventional regression techniques, where the output is a single point estimate derived from the data, a
Bayesian regression model's output is a probability distribution.
The output "y" is assumed to be drawn from a normal distribution (characterized by its mean and variance). The goal of
the Bayesian regression model is to identify the 'posterior' distribution of the model parameters rather than
single values of the parameters themselves: both the output y and the model parameters are assumed to follow a distribution.
The approach rests on Bayes' theorem:
P(H | E) = P(E | H) · P(H) / P(E)
Posterior: the probability that a hypothesis H holds given that the evidence E has been observed, i.e., P(H | E).
Prior: the probability of H before the evidence is observed, i.e., P(H).
P(E | H) is the likelihood of observing the evidence E given H, and P(E) is the probability of the evidence itself;
P(E) cannot be zero because the evidence has already been observed.
According to this formula, the posterior distribution of the model parameters is proportional to the likelihood of
the data multiplied by the prior over the parameters. This is unlike Ordinary Least Squares (OLS), which yields only
a single point estimate of the parameters.
As more data points are collected, the influence of the likelihood rises and eventually outweighs the prior;
with an unlimited number of data points, the parameter values converge to the values obtained by OLS.
Consequently, we start our regression method with an initial estimate (the prior).
As we include additional data points, the accuracy of the model improves. Therefore, a considerable amount of
training data is required to make a Bayesian Ridge Regression model accurate.
Let's quickly review the mathematical side. If 'y' is the expected value in a linear model, then
y(w, x) = w0 + w1x1 + ... + wpxp
where w = (w0, w1, …, wp) is the vector of weights and x = (x1, …, xp) is the vector of input features.
To obtain a fully probabilistic model, the output "y" is assumed to be Gaussian distributed around Xw:
p(y | X, w, α) = N(y | Xw, α)
where α is a hyper-parameter treated as a random variable to be estimated from the data, with a Gamma prior.
The prior over the weights w is a spherical Gaussian:
p(w | λ) = N(w | 0, λ⁻¹ I_p)
The priors over α and λ are themselves Gamma distributions, each described by a shape parameter and an
inverse scale (rate) parameter. This is the formulation used in the Bayesian Ridge Regression implementation discussed below.
We have discussed Bayesian Linear Regression so, let us now discuss some of its real-life applications.
Some of the real-life applications of Bayesian Linear Regression are given below:
Using priors: Consider a scenario in which your supermarkets carry a new product, and we want to
predict its initial Christmas sales. For the new product's Christmas effect, we may simply use the
average of comparable items as the prior.
Additionally, once we obtain data from the new item's first Christmas sales, the prior is immediately
updated. As a result, the forecast for the next Christmas is influenced by both the prior and the new item's data.
Regularizing priors: With the season, day of the week, trend, holidays, and a large number of promotion
indicators, such a model is severely over-parameterized, so regularization is crucial to keep the
forecasts in check.
Since we got an idea regarding the real-life applications of Bayesian Linear Regression, we will now learn about
its advantages and disadvantages.
It is particularly well suited to online learning (where data arrive one observation at a time), as opposed to batch
learning, where the complete dataset is known before training begins; Bayesian regression can be updated
without having to store all of the data.
The Bayesian technique is mathematically well founded and has been applied successfully; using it requires
no additional prior knowledge of the dataset.
On the other hand, the Bayesian strategy is not worthwhile if a large amount of data is available for the dataset,
since the ordinary frequentist approach then does the task more effectively.
After going through the definitions, applications, and advantages and disadvantages of Bayesian Linear
Regression, it is time for us to explore how to implement Bayesian Regression using Python.
We shall apply Bayesian Ridge Regression in this example. The Bayesian method, however, can be combined with other
regression techniques, such as ridge regression, lasso regression, etc. To implement Bayesian Ridge
Regression, we'll use the scikit-learn library.
We'll make use of the Boston Housing dataset, which includes details on the average price of homes in various
Boston neighborhoods.
The r2 score will be used for evaluation. The best possible r2 score is 1.0. The r2 score is zero
if the model always predicts a constant value regardless of the attributes, and even worse models can have a
negative r2 score.
However, before we begin the coding, you must understand the key parameters of a Bayesian Ridge
Regression model (the defaults below are those used by scikit-learn):
tol: convergence threshold that decides when to stop the fitting procedure. 1e-3 is the default value.
alpha_1: shape parameter of the Gamma prior over the alpha (noise precision) parameter. 1e-6 is
the default value.
alpha_2: inverse scale (rate) parameter of the Gamma prior over the alpha parameter. 1e-6 is the
default value.
lambda_1: shape parameter of the Gamma prior over the lambda (weight precision) parameter. 1e-6 is the default value.
lambda_2: inverse scale (rate) parameter of the Gamma prior over the lambda parameter. 1e-6 is the default
value.
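The original code listing is not reproduced in these notes, so the following is a minimal sketch. Recent scikit-learn releases no longer bundle the Boston Housing dataset, so the sketch substitutes the California Housing dataset; the structure of the code is otherwise the same:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
# Load a housing dataset (California Housing stands in for the Boston data here)
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Bayesian Ridge Regression with the default hyper-parameters described above
model = BayesianRidge(tol=1e-3, alpha_1=1e-6, alpha_2=1e-6, lambda_1=1e-6, lambda_2=1e-6)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("r2 score:", r2_score(y_test, y_pred))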
Linear Classification Models: Machine learning introduction – Logistic regression – Naive Bayes, Maximum margin classifier
– Support vector machine, Decision Tree, Random forests.
An example
Data: Loan application data
Task: Predict whether a loan should be approved or not.
Performance measure: accuracy.
No learning: classify all future applications (test data) to the majority class (i.e., Yes):
Accuracy = 9/15 = 60%.
We can do better than 60% with learning.
Decision tree learning is one of the most widely used techniques for classification.
Its classification accuracy is competitive with other methods, and
it is very efficient.
The classification model is a tree, called a decision tree.
Basic algorithm (a greedy divide-and-conquer algorithm)
Assume attributes are categorical now (continuous attributes can be handled too)
Tree is constructed in a top-down recursive manner
At start, all the training examples are at the root
Examples are partitioned recursively based on selected attributes
Attributes are selected on the basis of an impurity function (e.g., information gain); a small illustrative sketch follows below.
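The sketch below (not part of the original notes) trains a decision tree classifier with scikit-learn on a toy loan-style dataset; the feature encoding and the labels are hypothetical:
from sklearn.tree import DecisionTreeClassifier, export_text
# Hypothetical loan-application records: [age_group, has_job, owns_house, credit_rating]
# encoded as small integers; labels: 1 = approve the loan, 0 = reject it
X = [[0, 0, 0, 0], [0, 1, 0, 1], [1, 1, 1, 1], [2, 0, 1, 2],
     [2, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 2], [2, 1, 0, 1]]
y = [0, 1, 1, 1, 0, 0, 1, 1]
# criterion="entropy" makes the tree select attributes by information gain
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
# Each root-to-leaf path of the printed tree corresponds to one classification rule
print(export_text(tree, feature_names=["age", "has_job", "owns_house", "credit"]))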
A decision tree can be converted to a set of rules: each path from the root to a leaf is a rule.
Expressiveness
Decision trees can express any function of the input attributes.
E.g., for Boolean functions, truth table row → path to leaf:
Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f
is nondeterministic in x), but it probably won't generalize to new examples.
We therefore prefer to find more compact decision trees.
An example: the k-nearest neighbour (kNN) classifier.
k is usually chosen empirically via a validation set or cross-validation by trying a range of k values.
The distance function is crucial, but the right choice depends on the application.
Discussions
kNN can deal with complex and arbitrary decision boundaries.
Despite its simplicity, researchers have shown that the classification accuracy of kNN can be quite
strong and in many cases as accurate as that of more elaborate methods.
kNN is slow at classification time.
kNN does not produce an understandable model. A short illustrative sketch is given below.
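A minimal sketch (added for illustration, not from the original notes) of a kNN classifier in scikit-learn, with k chosen empirically by cross-validation over a small range; the dataset is synthetic:
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
# Synthetic two-class dataset (illustrative only)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
# Try a range of k values and report the mean cross-validated accuracy for each
for k in [1, 3, 5, 7, 9]:
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print("k =", k, "mean CV accuracy =", round(score, 3))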
Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for
Classification as well as Regression problems. However, primarily, it is used for Classification problems in
Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional
space into classes so that we can easily put the new data point in the correct category in the future. This best
decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called
support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram, in which
two different categories are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used for the kNN classifier. Suppose we see a
strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a
cat or a dog, such a model can be created using the SVM algorithm. We first train the model with many
images of cats and dogs so that it can learn their different features, and then we test it with this
strange creature. Because SVM creates a decision boundary between the two classes (cat and dog) and chooses
the extreme cases (support vectors), it will look at the extreme cases of cat and dog and, on the basis of the
support vectors, classify the creature as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data, i.e., if a dataset can be classified
into two classes by using a single straight line, then such data is termed linearly separable data, and
the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, i.e., if a dataset
cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier
used is called a Non-linear SVM classifier.
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but
we need to find the best decision boundary that helps to classify the data points. This best boundary is known
as the hyperplane of SVM.
The dimension of the hyperplane depends on the number of features present in the dataset: if there are 2 features
(as in the image), the hyperplane is a straight line, and if there are 3 features, the hyperplane is a
two-dimensional plane.
We always create the hyperplane that has the maximum margin, i.e., the maximum distance between the hyperplane
and the nearest data points of either class.
Support Vectors:
The data points or vectors that are the closest to the hyperplane and which affect the position of the hyperplane
are termed as Support Vector. Since these vectors support the hyperplane, hence called a Support vector.
Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has
two tags (green and blue), and the dataset has two features x1 and x2. We want a classifier that can classify the
pair(x1, x2) of coordinates in either green or blue. Consider the below image:
So as it is 2-d space so by just using a straight line, we can easily separate these two classes.
But there can be multiple lines that can separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called
as a hyperplane. SVM algorithm finds the closest point of the lines from both the classes. These points are called
support vectors. The distance between the vectors and the hyperplane is called as margin. And the goal of SVM
is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we cannot draw
a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data, we have used two dimensions
x and y, so for non-linear data, we will add a third dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:
Since we are in 3-d Space, hence it is looking like a plane parallel to the x-axis. If we convert it in 2d space with
z=1, then it will become as:
So now, SVM will divide the datasets into classes in the following way. Consider the below image:
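A short sketch (illustrative only, not from the original notes) of a non-linear SVM using an RBF kernel in scikit-learn; the kernel implicitly performs a transformation similar in spirit to adding the z = x² + y² dimension described above:
from sklearn.datasets import make_circles
from sklearn.svm import SVC
# Two classes arranged as concentric circles: not separable by a straight line in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
# An RBF kernel lets the SVM separate them without constructing the extra dimension by hand
clf = SVC(kernel='rbf').fit(X, y)
print("Training accuracy:", clf.score(X, y))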
Advantages of SVM:
Effective in high-dimensional cases.
It is memory-efficient, as it uses only a subset of training points (the support vectors) in the decision
function.
Different kernel functions can be specified for the decision function, and it is possible to specify
custom kernels.
SVM implementation in python:
Objective: Predict if cancer is Benign or malignant.
Using historical data about patients diagnosed with cancer, enable the doctors to differentiate malignant cases
and benign given the independent attributes.
Dataset: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Importing Data file
data = pd.read_csv('bc2.csv')
dataset = pd.DataFrame(data)
dataset.columns
Output:
Index(['ID', 'ClumpThickness', 'Cell Size', 'Cell Shape', 'Marginal Adhesion',
'Single Epithelial Cell Size', 'Bare Nuclei', 'Normal Nucleoli', 'Bland Chromatin',
'Mitoses', 'Class'], dtype='object')
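The listing above stops after inspecting the columns. A possible continuation (a sketch only, assuming the same 'bc2.csv' file and the column names printed above; the '?' handling and the 2 = benign / 4 = malignant encoding follow the original UCI data and may need adjusting for this particular file) is:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# 'Bare Nuclei' contains missing values marked '?' in the original UCI data;
# coerce to numeric and fill gaps with the column median (an assumption about this file)
dataset['Bare Nuclei'] = pd.to_numeric(dataset['Bare Nuclei'], errors='coerce')
dataset['Bare Nuclei'] = dataset['Bare Nuclei'].fillna(dataset['Bare Nuclei'].median())
X = dataset.drop(columns=['ID', 'Class'])   # independent attributes
y = dataset['Class']                        # 2 = benign, 4 = malignant in the UCI encoding
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = SVC(kernel='linear').fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))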
As we have seen, SVM is a supervised learning algorithm. The aim of using SVM is to correctly
classify unseen data. SVMs have a number of applications in several fields.
Some common applications of SVM are:
Face detection – SVMs classify parts of an image as face or non-face and create a square
boundary around the face.
Text and hypertext categorization – SVMs allow Text and hypertext categorization for both
inductive and transductive models. They use training data to classify documents into different
categories. It categorizes on the basis of the score generated and then compares with the threshold
value.
Classification of images – Use of SVMs provides better search accuracy for image classification.
It provides better accuracy in comparison to the traditional query-based searching techniques.
Bioinformatics – It includes protein classification and cancer classification. We use SVM for
identifying the classification of genes, patients on the basis of genes and other biological problems.
Protein fold and remote homology detection – Apply SVM algorithms for protein remote
homology detection.
Handwriting recognition – SVMs are widely used to recognize handwritten characters.
Generalized predictive control (GPC) – SVM-based GPC is used to control chaotic dynamics with
useful parameters.
Random Forest Algorithm
Random Forest is a supervised machine learning algorithm made up of decision trees. Random Forest is used for
both classification and regression—for example, classifying whether an email is “spam” or “not spam”.
It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various
subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead
of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority
votes of the predictions, predicts the final output.
Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and
the second is to make predictions using each tree created in the first phase.
The working process can be explained in the following steps:
Step-1: Select random data points (subsets) from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build, and repeat the previous two steps for each tree.
Finally, for a new data point, take the prediction of each tree and select the most voted prediction result as the final prediction.
This combination of multiple models is called Ensemble. Ensemble uses two methods:
1. Bagging: Creating a different training subset from sample training data with replacement is called
Bagging. The final output is based on majority voting.
2. Boosting: Combining weak learners into strong learners by creating sequential models such that the
final model has the highest accuracy is called Boosting.
Bagging: From the principle mentioned above, we can understand that Random Forest uses the Bagging technique. Let
us understand this concept in detail. Bagging, also known as Bootstrap Aggregation, is used by random forest.
The process begins with the original data, from which random samples with replacement, known as bootstrap
samples, are drawn; this process is known as bootstrapping. The models are then trained individually on each
sample, yielding different results. In the last step, all the results are combined (aggregation), and the generated
output is based on majority voting. This overall procedure is known as Bagging and is carried out using an ensemble classifier.
The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. So, this dataset is given to the Random
Forest classifier. The dataset is divided into subsets and given to each decision tree. During the training phase,
each decision tree produces a prediction result, and when a new data point occurs, then based on the majority of
results, the Random Forest classifier predicts the final decision. Consider the below image:
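A minimal sketch (not part of the original notes) of a Random Forest classifier in scikit-learn; synthetic data stand in for the fruit-image features used in the example above:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Synthetic multi-class dataset standing in for the fruit features (illustrative only)
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# N = 100 trees, each trained on a bootstrap sample (bagging); prediction by majority vote
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))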
There are mainly four sectors where Random Forest is mostly used:
1. Banking: Banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
o It enhances the accuracy of the model and prevents the overfitting issue.
o Although random forest can be used for both classification and regression tasks, it is not as well suited
to regression tasks.
o Random forest is a collection of decision trees; still, there are a lot of differences in their behavior.
**************************************