
Noida Institute of Engineering and Technology, Greater Noida

MACHINE LEARNING (KCS 055)
Unit 2: MINING ASSOCIATION AND SUPERVISED LEARNING

Dr. Hitesh Singh
Associate Professor
B Tech 5th Sem, Section A & B, IT Department

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 1
Brief Introduction of Faculty

I am pleased to introduce myself as Dr. Hitesh Singh, presently associated with NIET, Greater Noida as Assistant Professor in the IT Department. I completed my Ph.D. degree under the supervision of Boncho Bonev (PhD), Technical University of Sofia, Sofia, Bulgaria, in 2019. My research interests are radio wave propagation and machine learning, and I have rich experience with millimetre-wave technologies.
I started my research career in 2009 and have since published research articles in SCI/Scopus-indexed journals and conferences (Springer, IEEE, Elsevier). I have presented research work at reputed international conferences such as the IEEE International Conference on Infocom Technologies and Unmanned Systems (ICTUS'2017), Dubai, and ELECTRONICA, Sofia. Four patents and two book chapters have been published (Elsevier Publication) under my inventorship and authorship.

Evaluation Scheme


Subject Syllabus

Text Books


Branch Wise Applications
Course Objective

• To introduce students to the basic concepts of Machine Learning.
• To develop skills of implementing machine learning for solving practical problems.
• To gain experience of doing independent study and research related to Machine Learning.
Course Outcome

At the end of the semester, the student will be able to:

Course Outcome (CO) | Description | Bloom's Taxonomy
CO1 | Understand the utilization and implementation of the proper machine learning algorithm. | K2
CO2 | Understand the basic supervised machine learning algorithms. | K2
CO3 | Understand the difference between supervised and unsupervised learning. | K2
CO4 | Understand algorithmic topics of machine learning, mathematically deep enough to introduce the required theory. | K2
CO5 | Apply an appreciation for what is involved in learning from data. | K3
Program Outcome

 1. Engineering knowledge
 2. Problem analysis
 3. Design/development of solutions
 4. Conduct investigations of complex problems
 5. Modern tool usage
 6. The engineer and society
 7. Environment and sustainability
 8. Ethics
 9. Individual and team work
 10. Communication
 11. Project management and finance
 12. Life-long learning
CO-PO and PSO Mapping

Correlation Matrix of CO with PO

CO.K      PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
KCS055.1   3   2   2   1   2   2   -   -   -   1    -    -
KCS055.2   3   2   2   3   2   2   1   -   2   1    1    2
KCS055.3   2   2   2   2   2   2   2   1   1   -    1    3
KCS055.4   3   3   1   3   1   1   2   -   2   1    1    2
KCS055.5   3   2   1   2   1   2   1   1   2   1    1    1
AVG       2.8 2.2 1.6 2.2 1.6 1.8 1.2 0.4 1.4 0.8  0.8  1.6


Program Specific Outcomes

• PSO1: Work as a software developer, database administrator, tester or networking engineer for providing solutions to real-world and industrial problems.
• PSO2: Apply core subjects of information technology related to data structures and algorithms, software engineering, web technology, operating systems, databases and networking to solve complex IT problems.
• PSO3: Practice multi-disciplinary and modern computing techniques by lifelong learning to establish an innovative career.
• PSO4: Work in a team or individually to manage projects with ethical concern to be a successful employee or employer in the IT industry.
CO-PO and PSO Mapping

Matrix of CO/PSO:

          PSO1 PSO2 PSO3 PSO4
RCS080.1   3    2    3    1
RCS080.2   3    2    2    3
RCS080.3   3    2    3    2
RCS080.4   2    1    1    1
RCS080.5   2    2    1    2
AVG       2.6  1.8   2   1.8


Program Educational Objectives

• PEO1: Able to apply sound knowledge in the field of information technology to fulfil the needs of the IT industry.
• PEO2: Able to design innovative and interdisciplinary systems through the latest digital technologies.
• PEO3: Able to inculcate professional and social ethics, team work and leadership for serving the society.
• PEO4: Able to inculcate lifelong learning in the field of computing for a successful career in organizations and R&D sectors.
Result Analysis

• ML Result of 2020-21: 89.39%
• Average Marks: 46.05


End Semester Question Paper Template


Prerequisites

• Statistics
• Linear Algebra
• Calculus
• Probability
• Programming Languages


Brief Introduction to Subject

https://www.youtube.com/watch?v=PPLop4L2eGk&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN


Topic Mapping with Course Outcome

Topics | Course outcome
Classification and Regression; Regression | CO2
Decision Trees: ID3, C4.5, CART | CO2
Apriori Algorithm: Market basket analysis, Association Rules | CO2


Lecture Plan

[Lecture plan tables — images not reproduced.]

CONTENT

➢ Unit 2 Content:
 Classification and Regression
 Regression: Linear Regression
 Multiple Linear Regression
 Logistic Regression
 Polynomial Regression
 Decision Trees: ID3, C4.5, CART
 Apriori Algorithm: Market basket analysis, Association Rules
 Neural Networks: Introduction, Perceptron, Multilayer Perceptron, Support Vector Machine


Unit Objective

The objectives of Unit 2 are:

1. To understand the basics of Regression
2. To understand the concept of the Decision Tree algorithm
3. To understand the brief working of Artificial Neural Networks


Topic Objective

Students will be able to understand:

 Regression
 Linear Regression
 Logistic Regression
 Polynomial Regression


Regression (CO1)

• Linear Regression vs Logistic Regression
• Linear Regression and Logistic Regression are two famous machine learning algorithms that come under the supervised learning technique.
• Since both algorithms are supervised in nature, they use labeled datasets to make predictions.
• The main difference between them is how they are used:
• Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.




Regression (CO1)

• Linear Regression:
• Linear Regression is one of the simplest machine learning algorithms; it comes under the supervised learning technique and is used for solving regression problems.
• It is used for predicting a continuous dependent variable with the help of independent variables.
• The goal of linear regression is to find the best-fit line that can accurately predict the output for the continuous dependent variable.
• If a single independent variable is used for prediction, it is called Simple Linear Regression; if there is more than one independent variable, the regression is called Multiple Linear Regression.
• By finding the best-fit line, the algorithm establishes the relationship between the dependent variable and the independent variables, and this relationship should be linear in nature.
• The output of linear regression should only be continuous values such as price, age, salary, etc.
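As a small illustration of the best-fit line idea above (this sketch is not from the slides; the data and function names are made up for the example), simple linear regression can be solved in closed form with ordinary least squares:

```python
# Simple linear regression by ordinary least squares:
# fit y = intercept + slope * x on a tiny, perfectly linear dataset.

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares best-fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope

years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]      # exactly salary = 25 + 5 * years
b0, b1 = fit_simple_linear_regression(years, salary)
print(b0, b1)                      # → 25.0 5.0
```

With real, noisy data the same formula returns the line minimizing the sum of squared prediction errors.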




Logistic Regression (CO1)

• Logistic Regression:
• Logistic regression is one of the most popular machine learning algorithms that come under the supervised learning technique.
• It can be used for classification as well as for regression problems, but it is mainly used for classification problems.
• Logistic regression is used to predict a categorical dependent variable with the help of independent variables.
• The output of a logistic regression problem can only be between 0 and 1.
• Logistic regression can be used where the probability of one of two classes is required, such as whether it will rain today or not: either 0 or 1, true or false, etc.
• Logistic regression is based on the concept of Maximum Likelihood estimation. According to this estimation, the observed data should be most probable.
• In logistic regression, we pass the weighted sum of inputs through an activation function that maps values in between 0 and 1. Such an activation function is known as the sigmoid function, and the curve obtained is called the sigmoid curve or S-curve.
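The "weighted sum through a sigmoid" step above can be sketched in a few lines (an illustrative sketch, not the slides' own code; the helper names are assumptions):

```python
import math

def sigmoid(z):
    """Map any real value z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, inputs):
    """Logistic regression prediction: sigmoid of the weighted sum of inputs."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)
```

For example, `sigmoid(0)` is exactly 0.5 (the decision boundary), while large negative and large positive weighted sums are squashed towards 0 and 1 respectively.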


Regression (CO1)

The equation for logistic regression is:

log( y / (1 − y) ) = b0 + b1·x1 + b2·x2 + … + bn·xn

where y is the predicted probability, x1 … xn are the independent variables, and b0 … bn are the learned coefficients.


What is Regression?

Regression analysis is a predictive modelling technique. It estimates the relationship between a dependent (target) variable and an independent variable (predictor).
[Figure slides: regression use cases, fitting the best-fit line step by step, and the logistic (sigmoid) curve — images not reproduced.]
Logistic Regression (CO1)

• Probability of passing an exam versus hours of study.
• To answer the following question: a group of 20 students spends between 0 and 6 hours studying for an exam. How does the number of hours spent studying affect the probability of the student passing the exam?
Logistic Regression (CO1)

Hours  Pass        Hours  Pass
0.5    0           2.75   1
0.75   0           3      0
1      0           3.25   1
1.25   0           3.5    0
1.5    0           4      1
1.75   0           4.25   1
1.75   1           4.5    1
2      0           4.75   1
2.25   1           5      1
2.5    0           5.5    1

Fitted model:

           Coefficient  Std. Error  z-value  P-value (Wald)
Intercept  −4.0777      1.761       −2.316   0.0206
Hours       1.5046      0.6287       2.393   0.0167
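Plugging the fitted coefficients from the table above (intercept −4.0777, hours coefficient 1.5046) into the sigmoid gives the pass probability for any study time. This is an illustrative sketch; the function name is an assumption:

```python
import math

def pass_probability(hours, intercept=-4.0777, coef=1.5046):
    """Probability of passing, using the logistic model fitted on the slide's data."""
    z = intercept + coef * hours
    return 1.0 / (1.0 + math.exp(-z))
```

For 2 hours of study this gives roughly 0.26, and for 4 hours roughly 0.87; the model crosses 0.5 at about 4.0777 / 1.5046 ≈ 2.71 hours of study.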
Logistic Regression (CO1)

Linear Regression | Logistic Regression
Linear regression is used to predict a continuous dependent variable using a given set of independent variables. | Logistic regression is used to predict a categorical dependent variable using a given set of independent variables.
Linear regression is used for solving regression problems. | Logistic regression is used for solving classification problems.
In linear regression, we predict the value of continuous variables. | In logistic regression, we predict the values of categorical variables.
In linear regression, we find the best-fit line, by which we can easily predict the output. | In logistic regression, we find the S-curve, by which we can classify the samples.
The least-squares estimation method is used for estimation of accuracy. | The maximum likelihood estimation method is used for estimation of accuracy.
The output of linear regression must be a continuous value, such as price, age, etc. | The output of logistic regression must be a categorical value, such as 0 or 1, Yes or No, etc.
In linear regression, the relationship between the dependent and independent variables must be linear. | In logistic regression, a linear relationship between the dependent and independent variables is not required.
In linear regression, there may be collinearity between the independent variables. | In logistic regression, there should not be collinearity between the independent variables.
Decision Tree (CO1,2,3,5)

Introduction

• Classification is a two-step process in machine learning: a learning step and a prediction step.
• In the learning step, the model is developed based on given training data.
• In the prediction step, the model is used to predict the response for given data.
• Decision Tree is one of the easiest and most popular classification algorithms to understand and interpret.


Decision Tree (CO1,2,3,5)

Decision Tree Algorithm

• The Decision Tree algorithm belongs to the family of supervised learning algorithms.
• Unlike many other supervised learning algorithms, the decision tree algorithm can be used for solving both regression and classification problems.
• The goal of using a decision tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data).
• In decision trees, to predict a class label for a record we start from the root of the tree.
• We compare the value of the root attribute with the record's attribute.
• On the basis of this comparison, we follow the branch corresponding to that value and jump to the next node.


Decision Tree (CO1,2,3,5)

Types of Decision Trees

Types of decision trees are based on the type of target variable we have. There are two types:
1. Categorical Variable Decision Tree: a decision tree with a categorical target variable is called a categorical variable decision tree.
2. Continuous Variable Decision Tree: a decision tree with a continuous target variable is called a continuous variable decision tree.


Decision Tree (CO1,2,3,5)

Important Terminology related to Decision Trees:

1. Root Node: represents the entire population or sample; it further gets divided into two or more homogeneous sets.
2. Splitting: the process of dividing a node into two or more sub-nodes.
3. Decision Node: when a sub-node splits into further sub-nodes, it is called a decision node.
4. Leaf / Terminal Node: nodes that do not split are called leaf or terminal nodes.
5. Pruning: removing sub-nodes of a decision node; it can be seen as the opposite of splitting.
6. Branch / Sub-Tree: a subsection of the entire tree is called a branch or sub-tree.
7. Parent and Child Node: a node which is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are the children of the parent node.



Decision Tree (CO1,2,3,5)

• Decision trees classify examples by sorting them down the tree from the root to some leaf/terminal node, with the leaf/terminal node providing the classification of the example.
• Each node in the tree acts as a test case for some attribute, and each edge descending from the node corresponds to one of the possible answers to the test case.
• This process is recursive in nature and is repeated for every subtree rooted at a new node.


Decision Tree (CO1,2,3,5)

Assumptions while creating a Decision Tree:

• In the beginning, the whole training set is considered as the root.
• Feature values are preferred to be categorical.
• If the values are continuous, they are discretized prior to building the model.
• Records are distributed recursively on the basis of attribute values.
• The order of placing attributes as the root or as internal nodes of the tree is decided using some statistical approach.


Decision Tree (CO1,2,3,5)

How do Decision Trees work?

• The decision of making strategic splits heavily affects a tree's accuracy.
• The decision criteria are different for classification and regression trees.
• Decision trees use multiple algorithms to decide whether to split a node into two or more sub-nodes.
• The creation of sub-nodes increases the homogeneity of the resultant sub-nodes.
• In other words, we can say that the purity of the nodes increases with respect to the target variable.
• The decision tree tries splits on all available variables and then selects the split which results in the most homogeneous sub-nodes.


Decision Tree (CO1,2,3,5)

• The algorithm selection is also based on the type of target variable. Some algorithms used in decision trees:
• ID3 → (extension of D3)
• C4.5 → (successor of ID3)
• CART → (Classification And Regression Tree)
• CHAID → (Chi-square Automatic Interaction Detection; performs multi-level splits when computing classification trees)
• MARS → (Multivariate Adaptive Regression Splines)


Decision Tree (CO1,2,3,5)

The ID3 (Iterative Dichotomiser 3) algorithm builds decision trees using a top-down greedy search through the space of possible branches, with no backtracking. A greedy algorithm, as the name suggests, always makes the choice that seems best at that moment.

Steps in the ID3 algorithm:
1. It begins with the original set S as the root node.
2. On each iteration, the algorithm iterates through every unused attribute of the set S and calculates the entropy (H) and information gain (IG) of that attribute.
3. It then selects the attribute which has the smallest entropy (equivalently, the largest information gain).
4. The set S is then split by the selected attribute to produce subsets of the data.
5. The algorithm continues to recur on each subset, considering only attributes never selected before.


Decision Tree (CO1,2,3,5)

• Attribute Selection Measures
• If the dataset consists of N attributes, then deciding which attribute to place at the root or at different levels of the tree as internal nodes is a complicated step.
• Just randomly selecting any attribute to be the root cannot solve the issue.
• If we follow a random approach, it may give us bad results with low accuracy.
• To solve this attribute selection problem, researchers devised some solutions.


Decision Tree (CO1,2,3,5)

They suggested using criteria such as:
1. Entropy
2. Information gain
3. Gini index
4. Gain Ratio
5. Reduction in Variance
6. Chi-Square


Decision Tree (CO1,2,3,5)

• Entropy:
• Entropy is a measure of the randomness in the information being processed.
• The higher the entropy, the harder it is to draw any conclusions from that information.
• Flipping a coin is an example of an action that provides information that is random.


Decision Tree (CO1,2,3,5)

• From the entropy graph, it is quite evident that the entropy H(X) is zero when the probability is either 0 or 1.
• The entropy is maximum when the probability is 0.5, because this reflects perfect randomness in the data and there is no chance of perfectly determining the outcome.
• ID3 follows the rule: a branch with an entropy of zero is a leaf node, and a branch with entropy more than zero needs further splitting.


Decision Tree (CO1,2,3,5)

• Mathematically, entropy for one attribute is represented as:

E(S) = − Σ_i p_i · log2(p_i)

• where S → the current state, and p_i → the probability of an event i of state S, or the percentage of class i in a node of state S.
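The entropy formula above translates directly into code (an illustrative sketch; the function name is an assumption):

```python
import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the class proportions in `labels`."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A fair coin (`['H', 'T']`) has entropy 1.0 bit — maximum randomness — while a pure node (`['H', 'H']`) has entropy 0, matching the leaf-node rule on the previous slide.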


Decision Tree (CO1,2,3,5)

• Mathematically, entropy for multiple attributes is represented as:

E(T, X) = Σ_{c ∈ X} P(c) · E(c)

• where T → the current state, X → the selected attribute, and P(c) → the proportion of examples taking value c of X.


Decision Tree (CO1,2,3,5)

Information Gain:
• Information gain (IG) is a statistical property that measures how well a given attribute separates the training examples according to their target classification.
• Constructing a decision tree is all about finding an attribute that returns the highest information gain and the smallest entropy.


Decision Tree (CO1,2,3,5)

• Information gain is a decrease in entropy.
• It computes the difference between the entropy before a split and the average entropy after the split of the dataset, based on given attribute values.
• The ID3 (Iterative Dichotomiser) decision tree algorithm uses information gain.


Decision Tree (CO1,2,3,5)

• Mathematically, IG is represented as:

IG = Entropy(before) − Σ_{j=1}^{K} w_j · Entropy(j, after)

• where "before" is the dataset before the split, K is the number of subsets generated by the split, (j, after) is subset j after the split, and w_j is the fraction of examples falling in subset j.
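The IG formula — entropy before the split minus the weighted average entropy after it — can be sketched as follows (illustrative code, not the slides' own; function names are assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the class proportions in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, subsets):
    """Entropy of the parent minus the weighted entropy of the split subsets."""
    n = len(parent_labels)
    weighted_after = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent_labels) - weighted_after
```

A perfect split of a 50/50 parent into two pure subsets recovers the full parent entropy (gain 1.0), while a split that leaves each subset 50/50 gains nothing.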


Decision Tree (CO1,2,3,5)

Gini Index
• You can understand the Gini index as a cost function used to evaluate splits in the dataset.
• It is calculated by subtracting the sum of the squared probabilities of each class from one.
• It favors larger partitions and is easy to implement, whereas information gain favors smaller partitions with distinct values.
• The Gini index works with the categorical target variable "Success" or "Failure". It performs only binary splits.
• The lower the value of the Gini index, the higher the homogeneity of the node.


Decision Tree (CO1,2,3,5)

• Steps to calculate the Gini index for a split:
1. Calculate Gini for the sub-nodes, using the formula above for success (p) and failure (q): Gini = 1 − (p² + q²).
2. Calculate the Gini index for the split using the weighted Gini score of each node of that split.
• CART (Classification and Regression Tree) uses the Gini index method to create split points.
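The two steps above can be sketched directly (illustrative code with binary 0/1 labels; names are assumptions):

```python
def gini(labels):
    """Node impurity: 1 - (p^2 + q^2) for success proportion p and failure q."""
    n = len(labels)
    p = labels.count(1) / n    # success proportion
    q = 1.0 - p                # failure proportion
    return 1.0 - (p * p + q * q)

def gini_of_split(subsets):
    """Split Gini: weighted average of each sub-node's Gini."""
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * gini(s) for s in subsets)
```

A perfectly mixed node scores 0.5 (the worst case for two classes), a pure node scores 0, and CART picks the split with the lowest weighted score.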


Decision Tree (CO1,2,3,5)

Gain ratio
• Information gain is biased towards choosing attributes with a large number of values as root nodes: it prefers attributes with many distinct values.
• C4.5, an improvement of ID3, uses the gain ratio, a modification of information gain that reduces this bias and is usually the best option.
• The gain ratio overcomes the problem with information gain by taking into account the number of branches that would result before making the split.
• It corrects information gain by taking the intrinsic information of a split into account:

Gain Ratio = Information Gain / Split Info,  where Split Info = − Σ_{j=1}^{K} w_j · log2(w_j)

• Here "before" is the dataset before the split, K is the number of subsets generated by the split, and w_j is the fraction of examples in subset j after the split.
Decision Tree (CO1,2,3,5)

Reduction in Variance:
• Reduction in variance is an algorithm used for continuous target variables (regression problems).
• This algorithm uses the standard formula of variance to choose the best split; the split with the lower variance is selected as the criterion to split the population:

Variance = Σ (X − X̄)² / n

• Above, X̄ is the mean of the values, X is an actual value, and n is the number of values.
• Steps to calculate variance:
• Calculate the variance for each node.
• Calculate the variance for each split as the weighted average of each node's variance.
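Those two steps can be sketched as follows (an illustrative sketch for regression trees; names are assumptions):

```python
def variance(values):
    """Population variance: sum((x - mean)^2) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def split_variance(subsets):
    """Variance of a split: weighted average of each node's variance."""
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * variance(s) for s in subsets)
```

A split that groups identical target values together drives the weighted variance to 0, so it would be preferred over any split that mixes low and high target values.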


Decision Tree (CO1,2,3,5)

Chi-Square:
• The acronym CHAID stands for Chi-squared Automatic Interaction Detector.
• It is one of the oldest tree classification methods.
• It finds the statistical significance of the differences between sub-nodes and the parent node.
• We measure it by the sum of squares of the standardized differences between the observed and expected frequencies of the target variable.
• It works with the categorical target variable "Success" or "Failure".
• It can perform two or more splits.
• The higher the value of Chi-square, the higher the statistical significance of the differences between the sub-node and the parent node.
• It generates a tree called CHAID (Chi-square Automatic Interaction Detector).


Decision Tree (CO1,2,3,5)

• Mathematically, the Chi-square value for one class in a node is represented as:

Chi-square = √( (Observed − Expected)² / Expected )

• Steps to calculate Chi-square for a split:
1. Calculate Chi-square for an individual node by calculating the deviation for both Success and Failure.
2. Calculate the Chi-square of the split as the sum of all Chi-square values of Success and Failure of each node of the split.
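The "sum of squares of standardized differences" described above is equivalent to summing (Observed − Expected)² / Expected over the Success/Failure cells. A minimal sketch, assuming observed/expected counts are already tabulated per cell:

```python
def chi_square(observed_expected_pairs):
    """Sum of squared standardized differences over (observed, expected) counts.

    Each squared standardized difference ((O - E) / sqrt(E))^2 equals (O - E)^2 / E.
    """
    return sum((o - e) ** 2 / e for o, e in observed_expected_pairs)
```

If observed counts match the expected ones exactly, the statistic is 0 (no significance); larger deviations from the parent's class proportions give a larger statistic.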


Decision Tree (CO1,2,3,5)

Issues in Decision Trees:
• The common problem with decision trees, especially when a table has many columns, is that they overfit a lot.
• Sometimes it looks like the tree has memorized the training data set.
• If no limit is set on a decision tree, it will give you 100% accuracy on the training data set, because in the worst case it will end up making one leaf per observation.
• This affects the accuracy when predicting samples that are not part of the training set.
• Here are two ways to remove overfitting:
1. Pruning decision trees.
2. Random Forest


Decision Tree (CO1,2,3,5)

Pruning Decision Trees:
• In pruning, you trim off branches of the tree, i.e., remove decision nodes starting from the leaf nodes, such that the overall accuracy is not disturbed.
• This is done by segregating the actual training set into two sets: a training data set D and a validation data set V.
• Prepare the decision tree using the segregated training data set D.
• Then continue trimming the tree so as to optimize the accuracy on the validation data set V.


[Figure slides: pruning a decision tree, step by step — images not reproduced.]

Decision Tree (CO1,2,3,5)

In the pruning diagram, the 'Age' attribute on the left-hand side of the tree has been pruned, as it has more importance on the right-hand side of the tree, hence removing overfitting.


Decision Tree (CO1,2,3,5)

Pseudocode of the C4.5 algorithm
• Let's see the pseudocode of the C4.5 algorithm in data mining:
1. First, check for the base cases.
2. For each attribute X, find the normalized information gain ratio from splitting on X.
3. Let X be the attribute with the highest normalized information gain ratio.
4. Create a decision node that splits on attribute X.
5. Recur on the sublists obtained by splitting on attribute X, and add the resulting nodes as children of the node.

Advantages of C4.5 over other decision tree systems:
1. The algorithm is very helpful in mitigating overfitting, because C4.5 inherently employs a single-pass pruning process.
2. C4.5 can work with both discrete and continuous data.
3. C4.5 is very helpful in handling data incompleteness.
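Step 2 of the pseudocode — the normalized information gain ratio — can be sketched as information gain divided by the split information (an illustrative sketch; helper names are assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the class proportions in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent_labels, subsets):
    """C4.5 criterion: information gain normalized by split information."""
    n = len(parent_labels)
    weights = [len(s) / n for s in subsets]
    gain = entropy(parent_labels) - sum(w * entropy(s)
                                        for w, s in zip(weights, subsets))
    # Split info penalizes attributes that shatter the data into many branches.
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    return gain / split_info if split_info > 0 else 0.0
```

An attribute with many tiny branches has large split information, so its gain ratio shrinks — this is the bias correction the gain-ratio slide describes.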


Example 2: Decision Tree

[Figure slides: worked decision tree example — images not reproduced.]


Solved Example 1: Decision Tree using the CART algorithm

[Figure slides: CART worked example — images not reproduced.]


Decision TreeLEARNING
THE CONCEPT (CO1,2,3,5)
TASK

Random Forest
• Random Forest is an example of ensemble learning, in which we combine
multiple machine learning algorithms to obtain better predictive
performance.

Why the name “Random”?


• Two key concepts that give it the name random:
1. A random sampling of training data set when building trees.
2. Random subsets of features considered when splitting nodes.

• A technique known as bagging is used to create an ensemble of trees, where
multiple training sets are generated by sampling with replacement.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 3 122


Decision Tree (CO1,2,3,5)
• In the bagging technique, a data set is divided into N samples using randomized
sampling.
• Then, using a single learning algorithm a model is built on all samples.
• Later, the resultant predictions are combined using voting or averaging in parallel.

9/7/2022 123
Dr. Hitesh Singh KCS 055 ML Unit 3
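The bagging procedure described above can be sketched in plain Python. The majority-class "learner" below is a hypothetical stand-in for a decision tree; it only serves to show the bootstrap-and-vote mechanics:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # Randomized sampling with replacement: one of the N training sets
    return [rng.choice(data) for _ in data]

def majority_label(sample):
    # Stand-in "weak learner": predicts its sample's majority class
    return Counter(label for _, label in sample).most_common(1)[0][0]

def bagging_predict(data, n_models=25, seed=1):
    rng = random.Random(seed)
    # Train one model per bootstrap sample, then combine by voting
    votes = [majority_label(bootstrap(data, rng)) for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]

data = [(x, 0) for x in range(7)] + [(x, 1) for x in range(7, 10)]
print(bagging_predict(data))  # class 0 dominates the vote
```

In a real random forest, each bootstrap sample would train a decision tree that also considers only a random subset of features at each split.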
Introduction of Machine Learning Approaches (CO1,2,3,4)

A Neural Network
• A neural network is a processing device, either an algorithm or actual
hardware, whose design was inspired by the design and functioning of
animal brains and their components.

• The neural networks have ability to learn by example, which makes them
very flexible and powerful.

• These networks are also well suited for real-time systems because of their
fast response and computational times which are because of their parallel
architecture.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 124
THE CONCEPT LEARNING TASK

9/7/2022 Gaurav Kumar RCS080 and ML Unit 1 125


Introduction of Machine Learning Approaches (CO1,2,3,4)
Artificial Neural Network: Definition:
• An artificial neural network (ANN) may be defined as an information
processing model that is inspired by the way biological nervous systems,
such as the brain, process information.
• This model tries to replicate only the most basic functions of the brain.
• The key element of ANN is the novel structure of its information
processing system.
• An ANN is composed of a large number of highly interconnected
processing elements (neurons) working in unison to solve specific
problems.
• Artificial neural networks, like people, learn by example.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 126
Introduction of Machine Learning
THE CONCEPT LEARNING TASK
Approaches(CO1,2,3,4)

Advantages of Neural Networks:

• Adaptive learning: An ANN is endowed with the ability to learn how to do
tasks based on the data given for training or initial experience.
• Self-organization: An ANN can create its own organization or
representation of the information it receives during learning time.
• Real-time operation: ANN computations may be carried out in parallel.
Special hardware devices are being designed and manufactured to take
advantage of this capability of ANNs.
• Fault tolerance via redundant information coding: Partial destruction of a
neural network leads to the corresponding degradation of performance.
However, some network capabilities may be retained even after major
network damage.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 127
Introduction of Machine Learning Approaches (CO1,2,3,4)

[Slide 128: figure; not recoverable from this extraction.]
Introduction of Machine Learning Approaches (CO1,2,3,4)

Application Scope of Neural Networks:


• Air traffic control.
• Animal behavior, predator/prey relationships and population cycles.
• Appraisal and valuation of property, buildings, automobiles, machinery,
etc.
• Betting on horse races, stock markets, sporting events, etc.
• Criminal sentencing could be predicted using a large sample of crime
details as input and the resulting sentence as output.
• Complex physical and chemical processes that may involve the interaction
of numerous (possibly unknown) mathematical formulas could be
modeled heuristically using a neural network.
• Echo patterns from sonar, radar, seismic and magnetic instruments could
be used to predict their targets.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 129
Neural Networks (CO1, CO2, CO3)

• An artificial neural network (ANN) is an efficient information processing


system which resembles in characteristics with a biological neural
network.
• ANNs possess large number of highly interconnected processing elements
called nodes or units or neuron, which usually operate in parallel and are
configured in regular architectures.
• Each neuron is connected with the other by a connection link.
• Each connection link is associated with weights which contain information
about input signals.
• This information is used by the neuron to solve a particular problem.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 130
Neural Networks (CO1, CO2, CO3)

• To depict the basic operation of a neural net, consider a set of neurons, say X1
and X2, transmitting signals to another neuron, Y.
• Here X1 and X2 are input neurons, which transmit signals, and Y is the output
neuron, which receives signals.
• Input neurons X1 and X2 are connected to the output neuron Y over
weighted interconnection links (W1 and W2).

Activation Function

Dr. Hitesh Singh KCS - 402 TAFL Unit Number: 1


9/7/2022 131
Biological Neural Networks (CO1, CO2, CO3)

• The Architecture of a Neuron

[Slides 132–136: the biological neuron architecture and the ANN model diagrams; not recoverable from this extraction.]


Neural Networks (CO1, CO2, CO3)

Characteristics of ANN:

• It is a neurally implemented mathematical model.


• There exists a large number of highly interconnected processing elements
called neurons in an ANN.
• The interconnections with their weighted linkages hold the informative
knowledge.
• The input signals arrive at the processing elements through connections and
connecting weights.
• The processing elements of the ANN have the ability to learn, recall and
generalize from the given data by suitable assignment or adjustment of
weights.
• The computational power can be demonstrated only by the collective
behavior of neurons, and it should be noted that no single neuron carries
specific information.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 137
Models of Neural Networks (CO1, CO2, CO3)

The models of ANN are specified by three basic entities, namely:

1. The model’s synaptic interconnections.


2. The training or learning rules adopted for updating
and adjusting the connection weights.
3. Their activation functions.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 138
Models of Neural Networks (CO1, CO2, CO3)

Network Architecture of ANN:

1. single-layer feed-forward network;


2. multilayer feed-forward network;
3. single node with its own feedback;
4. single-layer recurrent network;
5. multilayer recurrent network.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 139
Models of Neural Networks (CO1, CO2, CO3)

Single Layer Feed Forward Network:

[Slides 140–145: diagrams of the single-layer feed-forward, multilayer feed-forward, and recurrent network architectures, and the lateral inhibition structure; not recoverable from this extraction.]
Activation Function of ANN (CO1, CO2, CO3)

Types of Activation Function of ANN :


1. Identity Function
2. Binary Step Function
3. Bipolar Step Function
4. Sigmoidal Functions
5. Ramp Function

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 146
Activation Function of ANN (CO1, CO2, CO3)

• Identity function: It is a linear function and can be defined as f(x) = x for all x.
• The output here remains the same as the input. The input layer uses the
identity activation function.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 147
Activation Function of ANN (CO1, CO2, CO3)

• Binary step function: This function can be defined as f(x) = 1 if x ≥ θ, and
f(x) = 0 if x < θ,
• where θ represents the threshold value. This function is most widely used
in single-layer nets to convert the net input to a binary output (1 or 0).

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 148
Activation Function of ANN (CO1, CO2, CO3)

• Bipolar step function: This function can be defined as f(x) = 1 if x ≥ θ, and
f(x) = -1 if x < θ,
• where θ represents the threshold value. This function is also used in
single-layer nets to convert the net input to a bipolar output (+1 or -1).

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 149
Activation Function of ANN (CO1, CO2, CO3)

• Sigmoidal functions: The sigmoidal functions are widely used in back-
propagation nets because of the relationship between the value of the
function at a point and the value of its derivative at that point, which
reduces the computational burden during training.
• Binary sigmoid function: It is also referred to as the logistic sigmoid
function or unipolar sigmoid function. It can be defined as
f(x) = 1 / (1 + e^(-λx)),
• where λ is the steepness parameter. The derivative of this function is
f'(x) = λ f(x)[1 - f(x)].
• Here the range of the sigmoid function is from 0 to 1.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 150
Activation Function of ANN (CO1, CO2, CO3)

• Binary sigmoidal function: f(x) = 1 / (1 + e^(-λx)), with range (0, 1).
• Bipolar sigmoidal function: f(x) = (1 - e^(-λx)) / (1 + e^(-λx)), with range (-1, 1).
• Ramp function: f(x) = 1 for x > 1, f(x) = x for 0 ≤ x ≤ 1, and f(x) = 0 for x < 0.

[Slides 151–153: graphs of these functions; not recoverable from this extraction.]
Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1
9/7/2022 153
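The activation functions listed above can be sketched as follows (θ is the threshold and λ the steepness parameter, matching the equations given earlier):

```python
import math

def identity(x):
    return x                       # f(x) = x

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0  # binary output (1 or 0)

def bipolar_step(x, theta=0.0):
    return 1 if x >= theta else -1  # bipolar output (+1 or -1)

def binary_sigmoid(x, lam=1.0):
    # Logistic (unipolar) sigmoid, range (0, 1)
    return 1.0 / (1.0 + math.exp(-lam * x))

def bipolar_sigmoid(x, lam=1.0):
    # Range (-1, 1); equals 2 * binary_sigmoid(x) - 1
    return (1.0 - math.exp(-lam * x)) / (1.0 + math.exp(-lam * x))

def ramp(x):
    # 0 below 0, linear on [0, 1], saturates at 1
    return max(0.0, min(1.0, x))

print(binary_step(0.2), bipolar_step(-0.2), round(binary_sigmoid(0.0), 2))
```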
Terminologies of ANN (CO1, CO2, CO3)

Weights:

• In the architecture of an ANN, each neuron is connected to other neurons


by means of directed communication links, and each communication link
is associated with weights.
• The weights contain information about the input signal.
• This information is used by the net to solve a problem.
• The weights can be represented in the form of a matrix.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 154
Neural Networks (CO1, CO2, CO3)
Bias:
• The bias included in the network has impact in calculating the net input.
• The bias is included by adding a component x0 = 1 to the input vector X.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 155
Neural Networks (CO1, CO2, CO3)

Threshold

• Threshold is a set value based upon which the final output of the network
may be calculated.
• The threshold value is used in the activation function.
• A comparison is made between the calculated net input and the threshold
to obtain the network output.
• For each and every application there is a threshold limit.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 156
Neural Networks (CO1, CO2, CO3)

Learning Rate:
• The learning rate is denoted by "α". It is used to control the amount of
weight adjustment at each step of training. The learning rate, ranging
from 0 to 1, determines the rate of learning at each time step.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 157
Model of ANN (CO1, CO2, CO3)

[Slides 158–160: ANN model diagrams; not recoverable from this extraction.]


Neural Networks Questions (CO1, CO2, CO3)

• NOTE: Everyone has to solve the questions sent to you and revert
back in the group.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 161
Neural Networks (CO1, CO2, CO3)

[Slides 162–166: practice questions and figures; not recoverable from this extraction.]
McCulloch-Pitts Neuron (CO1, CO2, CO3)

McCulloch-Pitts Neuron:

• The McCulloch-Pitts neuron, proposed in 1943, was the earliest neural
network model.
• It is usually called the M-P neuron.
• The M-P neurons are connected by directed weighted paths.
• It should be noted that the activation of an M-P neuron is binary, that is, at
any time step the neuron may fire or may not fire.
• The weights associated with the communication links may be excitatory
(weight is positive) or inhibitory (weight is negative).
• All the excitatory connection weights entering into a particular neuron will
have the same weight.
• The threshold plays a major role in M-P neuron:

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 167
McCulloch-Pitts Neuron (CO1, CO2, CO3)

[Slide 168: diagram of a simple M-P neuron; not recoverable from this extraction.]
McCulloch-Pitts Neuron (CO1, CO2, CO3)

• A simple M-P neuron is shown in the figure on the previous slide.
• The M-P neuron has both excitatory and inhibitory connections.
• A connection is excitatory with weight w (w > 0) or inhibitory with weight -p (p > 0).
• Inputs from X1 to Xn possess excitatory weighted connections, and inputs from
Xn+1 to Xn+m possess inhibitory weighted interconnections.
• The activation function here is defined as f(y_in) = 1 if y_in ≥ θ, and
f(y_in) = 0 if y_in < θ.
• For inhibition to be absolute, the threshold with the activation function should
satisfy the condition θ > nw - p.
• For the neuron to fire on k or more excitatory inputs, the threshold must satisfy
kw ≥ θ > (k - 1)w.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 169
THE CONCEPT
McCulloch-Pitts LEARNINGCO2,
Neuron(CO1, TASK CO3)

• The M-P neuron has no particular training algorithm.
• An analysis has to be performed to determine the values of the weights and the
threshold.
• Here the weights of the neuron are set along with the threshold to make the
neuron perform a simple logic function.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 170
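As a sketch of the analysis above, an M-P neuron computing the logical AND function can be written as follows (weights of 1 on both excitatory inputs and a threshold θ = 2 are one choice that satisfies kw ≥ θ > (k - 1)w with k = 2):

```python
def mp_neuron(inputs, weights, theta):
    # M-P neuron: fires (output 1) iff the net input reaches the threshold
    y_in = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_in >= theta else 0

# AND function: two excitatory inputs of weight 1, threshold theta = 2
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron((x1, x2), (1, 1), theta=2))  # only (1, 1) fires
```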
Hebb Network (CO1, CO2, CO3)

• For a neural net, the Hebb learning rule is a simple one.


• Donald Hebb stated in 1949 that in the brain, the learning is performed by
the change in the synaptic gap.
• When an axon of cell A is near enough to excite cell B, and repeatedly
or persistently takes part in firing it, some growth process or metabolic
change takes place in one or both cells such that A's efficiency, as one
of the cells firing B, is increased.
• According to the Hebb rule, the weight vector is found to increase proportionately
to the product of the input and the learning signal.
• Here the learning signal is equal to the neuron's output.

Dr. Hitesh Singh KCS - 056 ASC Unit Number: 1


9/7/2022 171
Hebb Network (CO1,2,3,5)

• The weight update in the Hebb rule is given by: w_i(new) = w_i(old) + x_i · y.

• The Hebb rule is more suited for bipolar data than binary data. If binary data is
used, then above weight updating formula cannot distinguish two conditions
namely;
1. A training pair in which an input unit is "on" and target value is "off."
2. A training pair in which both the input unit and the target value are "off."

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 4 172


Hebb Network (CO1,2,3,5)

[Slide 173: figure; not recoverable from this extraction.]


Hebb Network (CO1,2,3,5)

• Design a Hebb net to implement the logical AND function (use bipolar inputs
and targets).
• Solution: The training data for the AND function is given in the table:

  x1   x2   b    t
   1    1   1    1
   1   -1   1   -1
  -1    1   1   -1
  -1   -1   1   -1

• The network is trained using the Hebb network training algorithm. Initially the
weights and bias are set to zero, i.e., w1 = w2 = b = 0.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 4 174
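The one-pass Hebb training for the bipolar AND example can be verified in a few lines of Python (the update is w_i(new) = w_i(old) + x_i · t and b(new) = b(old) + t, since the output y is taken as the target during training):

```python
# Bipolar AND training pairs: ((x1, x2), target)
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w1 = w2 = b = 0  # initial weights and bias, as in the example
for (x1, x2), t in samples:
    # Hebb rule: weight change is input times target
    w1 += x1 * t
    w2 += x2 * t
    b += t

print(w1, w2, b)  # -> 2 2 -2 after one pass over the four pairs
for (x1, x2), t in samples:
    y = 1 if w1 * x1 + w2 * x2 + b > 0 else -1
    assert y == t  # the learned net reproduces the AND targets
```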


Hebb Network (CO1,2,3,5)

[Slides 175–180: the step-by-step weight updates and the decision boundary for the AND example, shown as tables and figures; not recoverable from this extraction.]


Perceptron Networks (CO1,2,3,5)

• A perceptron network consists of three units, namely, the sensory unit (input
unit), the associator unit (hidden unit), and the response unit (output unit).
• The sensory units are connected to associator units with fixed weights having
values 1, 0 or -1, which are assigned at random.
• The binary activation function is used in the sensory unit and the associator unit.
• The response unit has an activation of 1, 0 or -1.
• A binary step function with a fixed threshold θ is used as the activation for the
associator unit.
• The output signals that are sent from the associator unit to the response unit are
only binary.
• The output of the perceptron network is given by y = f(y_in),
• where f is the activation function, defined as f(y_in) = 1 if y_in > θ;
0 if -θ ≤ y_in ≤ θ; and -1 if y_in < -θ.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 4 181


Perceptron Networks (CO1,2,3,5)

• The perceptron learning rule is used in the weight updating between the associator unit
and the response unit.
• For each training input, the net will calculate the response and it will determine whether
or not an error has occurred.
• The error calculation is based on the comparison of the targets with those of the
calculated outputs.
• The weights on the connections from the units that send the nonzero signal will get
adjusted suitably.
• The weights will be adjusted on the basis of the learning rule if an error has
occurred for a particular training pattern, i.e.,
w_i(new) = w_i(old) + α t x_i, b(new) = b(old) + α t.
• If no error occurs, there is no weight update and hence the training process may
be stopped.
• In the above equations, the target value t is +1 or -1 and α is the learning rate.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 4 182
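A minimal Python sketch of the perceptron learning rule above (α = 1, θ = 0; the bipolar AND data from the Hebb example is reused as a toy problem):

```python
def train_perceptron(samples, alpha=1.0, theta=0.0, epochs=10):
    # Perceptron rule: when y != t, update w_i += alpha * t * x_i and
    # b += alpha * t; when y == t, no update is made.
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        changed = False
        for x, t in samples:
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))
            # activation: 1 above theta, -1 below -theta, 0 in between
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
            if y != t:
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:  # converged: a full epoch with no weight change
            break
    return w, b

# Bipolar AND, now learned with the perceptron rule
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_perceptron(samples)
print(w, b)  # converges to w = [1.0, 1.0], b = -1.0 for this data
```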


Perceptron Networks (CO1,2,3,5)

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 4 183


Perceptron Networks (CO1,2,3,5)

• Perceptron Learning Rule:
• The perceptron learning rule is explained as follows:
• Consider a finite set of N input training vectors x(n), with their associated target
(desired) values t(n), where n ranges from 1 to N. The target is either +1 or -1.
The output y is obtained on the basis of the net input calculated and the
activation function applied over the net input.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 4 184


Perceptron Networks (CO1,2,3,5)

• The weight update in the case of perceptron learning is as shown.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 4 185


Perceptron Networks (CO1,2,3,5)

[Slides 186–194: the perceptron training algorithm and worked examples, shown as figures; not recoverable from this extraction.]


ASSIGNMENTS (CO1,2,3,5)

Q1: Design Hebb Network for both AND and OR function.


Q2: Design Perceptron Network for both AND and OR Function.

NOTE: Complete the assignment and send it to Group.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 4 195


SVM (CO1,2,3,4,5)

• The Support Vector Machine is a supervised learning algorithm mostly used for
classification but it can be used also for regression.
• The main idea is that based on the labeled data (training data) the algorithm tries
to find the optimal hyperplane which can be used to classify new data points.
• In two dimensions the hyperplane is a simple line.
• Usually a learning algorithm tries to learn the most common characteristics (what
differentiates one class from another) of a class and the classification is based on
those representative characteristics learnt (so classification is based on differences
between classes).
• The SVM works the other way around: it finds the most similar examples
between classes. Those will be the support vectors.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 196


SVM (CO1,2,3,4,5)

• As an example, lets consider two classes, apples and lemons.


• Other algorithms will learn the most evident, most
representative characteristics of apples and lemons, like
apples are green and rounded while lemons are yellow and
have elliptic form.
• In contrast, SVM will search for apples that are very similar to
lemons, for example apples which are yellow and have elliptic
form.
• This will be a support vector. The other support vector will be
a lemon similar to an apple (green and rounded).
• So other algorithms learn the differences, while SVM learns
similarities.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 197


SVM (CO1,2,3,4,5)

[Slide 198: figure of the apples/lemons example; not recoverable from this extraction.]


SVM (CO1,2,3,4,5)

• As we go from left to right, all the examples will be classified as apples
until we reach the yellow apple.
• From this point, the confidence that a new example is an apple drops
while the lemon class confidence increases.
• When the lemon class confidence becomes greater than the apple class
confidence, the new examples will be classified as lemons (somewhere
between the yellow apple and the green lemon).
• Based on these support vectors, the algorithm tries to find the best
hyperplane that separates the classes.
• In 2D the hyperplane is a line, so it would look like this:

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 199


SVM (CO1,2,3,4,5)

[Slide 200: figure of the separating line in 2D; not recoverable from this extraction.]


SVM (CO1,2,3,4,5)

• Finding the Optimal Hyperplane


• Intuitively the best line is the line that is far away from both apple and lemon
examples (has the largest margin). To have optimal solution, we have to maximize
the margin in both ways (if we have multiple classes, then we have to maximize it
considering each of the classes).

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 201


SVM (CO1,2,3,4,5)

• So if we compare the picture above with the picture below, we can easily observe,
that the first is the optimal hyperplane (line) and the second is a sub-optimal
solution, because the margin is far shorter.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 202


SVM (CO1,2,3,4,5)

• Because we want to maximize the margins taking in consideration all the classes,
instead of using one margin for each class, we use a “global” margin, which takes
in consideration all the classes. This margin would look like the purple line in the
following picture:

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 203


SVM (CO1,2,3,4,5)

• This margin is orthogonal to the boundary and equidistant to the support vectors.
• So where do we have vectors? Each of the calculations (calculate distance and optimal
hyperplanes) are made in vectorial space, so each data point is considered a vector. The
dimension of the space is defined by the number of attributes of the examples. To
understand the math behind, please read this brief mathematical description of vectors,
hyperplanes and optimizations: SVM Succintly.
• All in all, support vectors are data points that define the position and the margin of the
hyperplane. We call them "support" vectors because these are the representative data
points of the classes: if we move one of them, the position and/or the margin will change.
Moving other data points won't have any effect on the margin or the position of the
hyperplane.
• To make classifications, we don’t need all the training data points (like in the case of KNN), we
have to save only the support vectors. In worst case all the points will be support vectors, but
this is very rare and if it happens, then you should check your model for errors or bugs.
• So basically the learning is equivalent with finding the hyperplane with the best margin, so
it is a simple optimization problem.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 204


SVM (CO1,2,3,4,5)

• Basic Steps
• The basic steps of the SVM are:
1. select two hyperplanes (in 2D) which separates the data with no points between
them (red lines)
2. maximize their distance (the margin)
3. the average line (here the line half way between the two red lines) will be the
decision boundary
• This is very nice and easy, but finding the best margin, the optimization problem is
not trivial (it is easy in 2D, when we have only two attributes, but what if we have
N dimensions with N a very big number)
• To solve the optimization problem, we use Lagrange multipliers. To
understand this technique you can read the following two articles: Duality
Lagrange Multiplier and A Simple Explanation of Why Lagrange Multipliers
Works.
• Until now we had linearly separable data, so we could use a line as class boundary.
But what if we have to deal with non-linear data sets?

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 205
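The slides solve the margin-maximization problem with Lagrange multipliers; an alternative sketch, shown here only for intuition, is plain gradient descent on the regularized hinge loss, which likewise trades margin width against training errors (the toy data, step size and epoch count are illustrative choices, not from the slides):

```python
def train_linear_svm(samples, lam=0.001, lr=0.05, epochs=500):
    # Minimize (lam/2)*||w||^2 + (1/m) * sum(max(0, 1 - y*(w.x + b))):
    # among separating hyperplanes, a smaller ||w|| means a larger margin.
    m, n = len(samples), len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for x, y in samples:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:
                # point is inside the margin (or misclassified)
                gw = [gwi - y * xi / m for gwi, xi in zip(gw, x)]
                gb -= y / m
        w = [wi - lr * gwi for wi, gwi in zip(w, gw)]
        b -= lr * gb
    return w, b

# Linearly separable 2-D toy data ("apples" = -1, "lemons" = +1)
data = [((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
        ((5, 5), 1), ((6, 5), 1), ((5, 6), 1)]
w, b = train_linear_svm(data)
# after training, all six points should be on the correct side
assert all((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y > 0)
           for x, y in data)
```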


SVM (CO1,2,3,4,5)

• SVM for Non-Linear Data Sets


• An example of non-linear data is:

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 206


SVM (CO1,2,3,4,5)

• In this case we cannot find a straight line to separate apples from lemons. So how
can we solve this problem. We will use the Kernel Trick!
• The basic idea is that when a data set is inseparable in the current dimensions, add
another dimension, maybe that way the data will be separable.
• Just think about it, the example above is in 2D and it is inseparable, but maybe in
3D there is a gap between the apples and the lemons, maybe there is a level
difference, so apples are on level one and lemons are on level two.
• In this case we can easily draw a separating hyperplane (in 3D a hyperplane is a
plane) between level 1 and 2.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 207


SVM (CO1,2,3,4,5)

• Mapping to Higher Dimensions


• To solve this problem we shouldn't just blindly add another dimension; we should transform
the space so that we generate this level difference intentionally.
• Mapping from 2D to 3D
• Let's assume that we add another dimension called X3. Another important transformation is
that in the new dimension the points are organized using this formula x1² + x2².
• If we plot the plane defined by the x² + y² formula, we will get something like this:

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 208
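The 2D-to-3D mapping described above can be checked numerically: the feature map φ(x1, x2) = (x1, x2, x1² + x2²) adds the squared distance from the origin as a third coordinate. The two rings below are a made-up toy set mimicking the figure (lemons near the center, apples around them):

```python
def phi(x1, x2):
    # Lift a 2D point into 3D; the third coordinate is x1^2 + x2^2
    return (x1, x2, x1**2 + x2**2)

lemons = [(0, 1), (1, 0), (-1, 0), (0, -1)]   # inner ring, near the origin
apples = [(0, 3), (3, 0), (-3, 0), (0, -3)]   # outer ring

# Inseparable by a line in 2D, but in 3D the plane z = 5 separates them:
assert all(phi(*p)[2] < 5 for p in lemons)
assert all(phi(*p)[2] > 5 for p in apples)
print("separable in 3D with the plane z = 5")
```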


SVM (CO1,2,3,4,5)

• Now we have to map the apples and lemons (which are just simple points) to this
new space.
• Think about it carefully, what did we do?
• We just used a transformation in which we added levels based on distance.
• If you are in the origin, then the points will be on the lowest level.
• As we move away from the origin, it means that we are climbing the hill (moving
from the center of the plane towards the margins) so the level of the points will be
higher.
• Now if we consider that the origin is the lemon from the center, we will have
something like this:

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 209


SVM (CO1,2,3,4,5)

[Slide 210: figure of the data after the 3D mapping; not recoverable from this extraction.]


SVM (CO1,2,3,4,5)

• Now we can easily separate the two classes.
• These transformations are called kernels.
• Popular kernels are:
  – Polynomial Kernel,
  – Gaussian Kernel,
  – Radial Basis Function (RBF),
  – Laplace RBF Kernel,
  – Sigmoid Kernel,
  – ANOVA RBF Kernel, etc.

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 211


SVM (CO1,2,3,4,5)

• SVM Kernel Functions
• SVM algorithms use a set of mathematical functions that are defined as
the kernel. The function of a kernel is to take data as input and transform it
into the required form. Different SVM algorithms use different types of
kernel functions: for example linear, nonlinear, polynomial, radial basis
function (RBF), and sigmoid. Kernel functions can be introduced for
sequence data, graphs, text, and images, as well as vectors. The most used
type of kernel function is the RBF, because it has a localized and finite
response along the entire x-axis.
• The kernel functions return the inner product between two points in a
suitable feature space, thus defining a notion of similarity with little
computational cost even in very high-dimensional spaces.
9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 212


SVM (CO1,2,3,4,5)

• Kernel Rules
• Define kernel or a window function as follows:

9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 213


SVM (CO1,2,3,4,5)

Examples of SVM Kernels

Let us see some common kernels used with SVMs and their uses:
1. Polynomial kernel
• It is popular in image processing.
Equation: k(x, y) = (xᵀy + c)^d, where d is the degree of the polynomial.

2. Gaussian kernel
• It is a general-purpose kernel, used when there is no prior knowledge
about the data. Equation: k(x, y) = exp(-||x - y||² / (2σ²)).
9/7/2022 Dr. Hitesh Singh KCS 055 ML Unit 2 214


SVM (CO1,2,3,4,5)

3. Gaussian radial basis function (RBF)


• It is a general-purpose kernel; used when there is no prior knowledge about the
data.
Equation is:

4. Laplace RBF kernel


• It is a general-purpose kernel; used when there is no prior knowledge about the data.
Equation is:

5. Hyperbolic tangent kernel


• We can use it in neural networks.
Equation is:

SVM (CO1,2,3,4,5)

6. Sigmoid kernel
• We can use it as the proxy for neural networks. Equation is

7. Bessel function of the first kind Kernel

8. ANOVA radial basis kernel

9. Linear splines kernel in one-dimension
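A hedged sketch of a few of the kernels listed above as plain functions; the hyperparameter names (gamma, degree, coef0) follow scikit-learn's convention, and the default values and sample vectors are invented for illustration:

```python
import numpy as np

# Plain-function sketches of a few kernels from the list above.
# Hyperparameter names (gamma, degree, coef0) follow scikit-learn's naming;
# the default values and sample vectors are made up for illustration.
def linear(x, y):
    return np.dot(x, y)

def polynomial(x, y, degree=3, coef0=1.0):
    return (np.dot(x, y) + coef0) ** degree

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid(x, y, alpha=0.01, coef0=0.0):
    return np.tanh(alpha * np.dot(x, y) + coef0)

x = np.array([1.0, 2.0])
y = np.array([2.0, 1.0])
print(linear(x, y))      # 4.0
print(polynomial(x, y))  # (4 + 1)^3 = 125.0
```

Each function maps a pair of input vectors to a single similarity score, which is all the SVM optimizer needs from a kernel.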

SVM (CO1,2,3,4,5)

• Mapping from 1D to 2D
• Another, easier example in 2D would be:

SVM (CO1,2,3,4,5)

• After using the kernel and after all the transformations we will get:

SVM (CO1,2,3,4,5)

• So after the transformation, we can easily delimit the two classes using just a single line.
• In real-life applications we won't have a simple straight line, but lots of curves and high dimensions. In some cases we won't have two hyperplanes that separate the data with no points between them, so we need some trade-off: a tolerance for outliers. Fortunately, the SVM algorithm has a so-called regularization parameter to configure that trade-off and tolerate outliers.

SVM (CO1,2,3,4,5)

• Tuning Parameters
• As we saw in the previous section, choosing the right kernel is crucial, because if the transformation is incorrect the model can give very poor results. As a rule of thumb, always check whether you have linear data, and in that case always use linear SVM (a linear kernel). A linear SVM is a parametric model, but an RBF-kernel SVM isn't, so the complexity of the latter grows with the size of the training set. Not only is it more expensive to train an RBF-kernel SVM, but you also have to keep the kernel matrix around, and the projection into this "infinite" higher-dimensional space where the data becomes linearly separable is more expensive during prediction as well. Furthermore, you have more hyperparameters to tune, so model selection is more expensive too. And finally, it's much easier to overfit a complex model!

SVM (CO1,2,3,4,5)

• Regularization
• The regularization parameter (in Python it's called C) tells the SVM optimization how much you want to avoid misclassifying each training example.
• If C is high, the optimization will choose a smaller-margin hyperplane, so the misclassification rate on the training data will be lower.
• On the other hand, if C is low, the margin will be big, even if some training examples are misclassified. This is shown in the following two diagrams:
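This trade-off can be sketched with the soft-margin objective J(w, b) = 0.5*||w||^2 + C * sum(hinge losses). The toy 1-D data and the two candidate hyperplanes below are made up for illustration:

```python
import numpy as np

# Soft-margin objective J(w, b) = 0.5*||w||^2 + C * sum(hinge losses),
# evaluated for two candidate hyperplanes f(x) = w*x + b on toy 1-D data.
X = np.array([-3.0, -2.0, -1.5, 1.5, 2.0, -0.3])  # last point: class +1 outlier
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

def objective(w, b, C):
    hinge = np.maximum(0.0, 1.0 - y * (w * X + b)).sum()
    return 0.5 * w ** 2 + C * hinge

wide = (0.7, 0.0)    # small |w| -> wide margin, but misclassifies the outlier
narrow = (4.0, 2.3)  # large |w| -> narrow margin, fits every training point

for C in (0.01, 10.0):
    chosen = "wide" if objective(*wide, C) < objective(*narrow, C) else "narrow"
    print(f"C={C}: optimizer prefers the {chosen}-margin hyperplane")
```

With C = 0.01 the wide-margin hyperplane has the lower objective despite the misclassified outlier; with C = 10 the misclassification penalty dominates and the narrow-margin hyperplane wins.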

SVM (CO1,2,3,4,5)

• Gamma
• The next important parameter is Gamma. The gamma parameter defines how far the influence of a single training example reaches: a high Gamma considers only points close to the plausible hyperplane, while a low Gamma also considers points at a greater distance.

SVM (CO1,2,3,4,5)

• As you can see, decreasing Gamma means that finding the correct hyperplane considers points at greater distances, so more and more points are used (green lines indicate which points were considered when finding the optimal hyperplane).
• Margin
• The last parameter is the margin. We've already talked about margin: a higher margin yields a better model and thus better classification (or prediction). The margin should always be maximized.

SVM (CO1,2,3,4,5)

• The most important parameters are:


1. kernel: the kernel type to be used. The most common kernels are rbf (the default value), poly and sigmoid, but you can also create your own kernel.
2. C: the regularization parameter described in the Tuning Parameters section.
3. gamma: also described in the Tuning Parameters section.
4. degree: used only if the chosen kernel is poly; it sets the degree of the polynomial.
5. probability: a boolean parameter; if true, the model will return for each prediction the vector of probabilities of belonging to each class of the response variable — i.e. the confidence for each prediction.
6. shrinking: whether or not you want the shrinking heuristic used in your optimization of the SVM (it is used in Sequential Minimal Optimization). Its default value is true, and unless you have a good reason, don't change it to false: shrinking greatly improves performance for very little loss of accuracy in most cases.
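As a hedged usage sketch, these parameter names match scikit-learn's sklearn.svm.SVC; the toy data and parameter values below are invented for illustration:

```python
# Usage sketch with scikit-learn's SVC, whose parameter names match the
# list above; the toy data below is invented for illustration.
from sklearn.svm import SVC

X = [[i, i] for i in range(6)] + [[10 + i, 10 + i] for i in range(6)]
y = [0] * 6 + [1] * 6

model = SVC(
    kernel="rbf",      # default; "poly", "sigmoid", "linear" are also built in
    C=1.0,             # regularization parameter (see Tuning Parameters)
    gamma="scale",     # reach of a single training example
    degree=3,          # only used when kernel="poly"
    probability=True,  # enables predict_proba (per-class confidence vector)
    shrinking=True,    # shrinking heuristic used by SMO (the default)
)
model.fit(X, y)
print(model.predict([[1, 1], [13, 13]]))
print(model.predict_proba([[1, 1]]))
```

Note that probability=True makes training slower, since scikit-learn fits an extra calibration step internally.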

SVM (CO1,2,3,4,5)
Advantages
1. SVM can be very efficient, because it uses only a subset of the training data: the support vectors.
2. Works very well on smaller data sets, on non-linear data sets and in high-dimensional spaces.
3. Very effective in cases where the number of dimensions is greater than the number of samples.
4. Can have high accuracy, sometimes performing even better than neural networks.
5. Not very sensitive to overfitting.

Introduction (CO1)

Disadvantages
1. Training time is high when we have large data sets.
2. When the data set has more noise (i.e. target classes are overlapping), SVM doesn't perform well.

Popular Use Cases
1. Text classification
2. Detecting spam
3. Sentiment analysis
4. Aspect-based recognition
5. Handwritten digit recognition
SVM EXAMPLE (CO1)

(The worked SVM example on the original slides is a sequence of figures, omitted here.)

Apriori Algorithm (CO2)

• Market Basket Analysis


• Market basket analysis is a data mining technique used by
retailers to increase sales by better understanding customer
purchasing patterns.
• It involves analyzing large data sets, such as purchase history,
to reveal product groupings and products that are likely to be
purchased together.

Apriori Algorithm (CO2)

How does Market Basket Analysis Work?


• Market Basket Analysis is modelled on Association rule mining, i.e., the IF
{}, THEN {} construct. For example, IF a customer buys bread, THEN he is
likely to buy butter as well.
• Association rules are usually represented as: {Bread} -> {Butter}
• Some terminologies to familiarize yourself with Market Basket Analysis
are:
• Antecedent: Items or 'itemsets' found within the data are antecedents. In
simpler words, it's the IF component, written on the left-hand side. In the
above example, bread is the antecedent.
• Consequent: A consequent is an item or set of items found in combination
with the antecedent. It's the THEN component, written on the right-hand
side. In the above example, butter is the consequent.

Apriori Algorithm (CO2)

• Descriptive market basket analysis: This type only derives


insights from past data and is the most frequently used
approach. The analysis here does not make any predictions
but rates the association between products using statistical
techniques. For those familiar with the basics of Data Analysis,
this type of modelling is known as unsupervised learning.

Apriori Algorithm (CO2)

• Predictive market basket analysis: This type uses supervised learning


models like classification and regression. It essentially aims to mimic the
market to analyze what causes what to happen. Essentially, it considers
items purchased in a sequence to determine cross-selling. For example,
buying an extended warranty is more likely to follow the purchase of an
iPhone. While it isn't as widely used as a descriptive MBA, it is still a very
valuable tool for marketers.

Apriori Algorithm (CO2)

• Differential market basket analysis: This type of analysis is beneficial for


competitor analysis. It compares purchase history between stores,
between seasons, between two time periods, between different days of
the week, etc., to find interesting patterns in consumer behaviour. For
example, it can help determine why some users prefer to purchase the
same product at the same price on Amazon vs Flipkart. The answer can be
that the Amazon reseller has more warehouses and can deliver faster, or
maybe something more profound like user experience.

Apriori Algorithm (CO2)

• Algorithms associated with Market Basket Analysis

• In market basket analysis, association rules are used to predict the


likelihood of products being purchased together. Association rules count
the frequency of items that occur together, seeking to find associations
that occur far more often than expected.

• Algorithms that use association rules include AIS, SETM and Apriori. The
Apriori algorithm is commonly cited by data scientists in research articles
about market basket analysis. It identifies frequent items in the database
and then evaluates their frequency as the datasets are expanded to larger
sizes.
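A minimal sketch of the Apriori idea on a made-up basket dataset: count 1-itemsets, keep those meeting minimum support, and form 2-itemset candidates only from those — the Apriori property prunes everything else:

```python
from itertools import combinations

# Toy illustration of the Apriori idea (the basket data is made up):
# keep 1-itemsets meeting minimum support, then form 2-itemset candidates
# only from those -- the Apriori property prunes everything else.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "notebook"},
    {"bread", "butter", "notebook"},
]
min_support = 0.5  # an itemset must appear in at least half the baskets

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = {i for t in transactions for i in t}
L1 = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
L2 = {a | b for a, b in combinations(sorted(L1, key=sorted), 2)
      if support(a | b) >= min_support}
print([sorted(s) for s in L2])  # [['bread', 'butter']]
```

Here every single item is frequent, but only {bread, butter} survives as a frequent 2-itemset; a full Apriori implementation repeats the same expand-and-prune step for larger itemsets.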

Apriori Algorithm (CO2)

• There are three components in APRIORI ALGORITHM:

• SUPPORT
• CONFIDENCE
• LIFT

Apriori Algorithm (CO2)

• For example, suppose 5000 transactions have been made through a popular e-Commerce website, and we want to calculate the support, confidence, and lift for two products — say pen and notebook. Out of 5000 transactions, 500 contain a pen, 700 contain a notebook, and 100 contain both.

SUPPORT
• Support is the number of transactions containing the item(s) divided by the total number of transactions:
1. Support = freq(A, B)/N
• support(pen) = transactions containing a pen / total transactions
• i.e. support -> 500/5000 = 10 percent

Apriori Algorithm (CO2)

CONFIDENCE

• Confidence measures whether the products sell on their own or through combined sales. It is calculated as the combined transactions divided by the antecedent's individual transactions.

• Confidence = freq(A, B)/freq(A)

• Confidence = combined transactions / individual transactions

• i.e. confidence -> 100/500 = 20 percent

Apriori Algorithm (CO2)

LIFT

• Lift is the ratio that tells how much more often the items are bought together than expected.

• Lift = confidence percent / support percent

• Lift -> 20/10 = 2

• When the lift value is below 1, the combination is not frequently bought together by consumers. Here, a lift of 2 shows that the probability of buying both items together is high compared with the sales of the individual items.
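A quick numeric check of the three measures in Python (N = 5000 and the pen count of 500 are the slide's figures; the joint count of 100 is the value consistent with support = 10 percent, confidence = 20 percent and lift = 2):

```python
# Numeric check of the three measures. N and the pen count are the slide's
# figures; the joint count of 100 is the value consistent with the stated
# support = 10 percent, confidence = 20 percent and lift = 2.
N = 5000
pen = 500               # transactions containing a pen
pen_and_notebook = 100  # transactions containing both items

support = pen / N                    # 0.1  -> 10 percent
confidence = pen_and_notebook / pen  # 0.2  -> 20 percent
lift = confidence / support          # 2.0  -> strong association
print(support, confidence, lift)
```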

Apriori Algorithm (CO2)

(The worked Apriori example on the original slides is a sequence of figures, omitted here.)

Assignment 1

1. Calculate the regression coefficient and obtain the lines of regression for the following data

2. Calculate the two regression equations of X on Y and Y on X from the data given below, taking deviations from the actual means of X and Y.

Estimate the likely demand when the price is Rs.20.

Assignment 2

•What is entropy?
•What is information gain?
•How are entropy and information gain related vis-a-vis decision trees?
•How do you calculate the entropy of children nodes after a split based on a feature?
•How do you decide a feature's suitability when working with a decision tree?
•Explain feature selection using the information gain/entropy technique.
•Which (packaged) algorithm is used for building models based on the decision tree?
•What are some of the techniques to decide decision tree pruning?

Daily Quiz

1.Decision Tree
2.Entropy, Information Gain, Gini Impurity
3.Decision Tree Working For Categorical and Numerical Features
4.What are the scenarios where Decision Tree works well
5.Decision Tree Low Bias And High Variance- Overfitting
6.Hyperparameter Techniques
7.Library used for constructing decision tree
8.Impact of Outliers Of Decision Tree
9.Impact of missing values on Decision Tree
10.Does Decision Tree require Feature Scaling

9/7/2022 Gaurav Kumar RCS080 and ML Unit 1 252


Glossary Questions

1. A _________ is a decision support tool that uses a tree-like graph or model of


decisions and their possible consequences, including chance event outcomes,
resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks

2. Decision Tree is a display of an algorithm.


a) True
b) False

Glossary Questions

3. What is Decision Tree?


a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each branch
represents outcome of test and each leaf node represents class label
c) Flow-Chart & Structure in which internal node represents test on an attribute, each
branch represents outcome of test and each leaf node represents class label
d) None of the mentioned

4. Decision Trees can be used for Classification Tasks.


a) True
b) False

MCQ

Question 1 :
SVM stands for?
Options :
a. Simple Vector Machine
b. Support Vector Machine
c. Super Vector Machine
d. All the Above

Question 2 :
SVM is classified into how many types?
Options :
a. One
b. Two
c. Three
d. Four

MCQ

Question 3 :
The SVM hyperplane best segregates the data into how many classes?
Options :
a. One
b. Two
c. Three
d. Four

Question 4 :
SVM is a supervised machine learning algorithm that can be used for
Options :
a. Regression
b. Classification
c. Either a or b
d. None of These

Faculty Video Links, YouTube & NPTEL Video Links and Online Courses Details

Youtube video-

•https://www.youtube.com/watch?v=PDYfCkLY_DE
•https://www.youtube.com/watch?v=ncOirIPHTOw
•https://www.youtube.com/watch?v=cW03t3aZkmE

Weekly Assignment

Assignment 1
1. What are Support Vector Machines (SVMs)?

2. What are Support Vectors in SVMs?


3. What is the basic principle of a Support Vector Machine?

4. What are hard margin and soft Margin SVMs?

5. What do you mean by Hinge loss?

6. What is the “Kernel trick”?

Old Question Papers

Note: No old question paper is available for this subject; it has been introduced for the first time.

I have added expected questions for the university exam in the next slide.

Expected Questions for University Exam

• What is the role of the C hyper-parameter in SVM? Does it affect


the bias/variance trade-off?
• Explain different types of kernel functions.
• How do you formulate SVM for a regression problem statement?
• What affects the decision boundary in SVM?
• What is a slack variable?
• What is a dual and primal problem and how is it relevant to
SVMs?

References

Text books:

1. Tom M. Mitchell, "Machine Learning", McGraw-Hill Education (India) Private Limited, 2013.
2. Ethem Alpaydin, "Introduction to Machine Learning" (Adaptive Computation and Machine Learning), The MIT Press, 2004.
3. Stephen Marsland, "Machine Learning: An Algorithmic Perspective", CRC Press, 2009.
4. Bishop, C., "Pattern Recognition and Machine Learning", Berlin: Springer-Verlag.

Recap of Unit
• The performance and interpretation of linear regression analysis are subject to
a variety of pitfalls, which are discussed here in detail. The reader is made
aware of common errors of interpretation through practical examples. Both the
opportunities for applying linear regression analysis and its limitations are
presented.
• A decision tree is a tree-shaped diagram that shows statistical probability or determines a course of action. It shows analysts, and thereby decision-makers, which steps to take and how different choices could affect the whole process.
• SVM is a supervised learning algorithm which separates the data into
different classes through the use of a hyperplane. The chosen hyperplane is
one with the greatest margin between the hyperplane and all points, this yields
the greatest likelihood of accurate classification.
• Neural networks are suitable for predicting time series mainly because of
learning only from examples, without any need to add additional information
that can bring more confusion than prediction effect. Neural networks are able
to generalize and are resistant to noise. On the other hand, it is generally not
possible to determine exactly what a neural network learned and it is also hard
to estimate possible prediction error.
Thank you