Module-3

SRI RAMAKRISHNA ENGINEERING

COLLEGE
[Educational Service : SNR Sons Charitable Trust]
[Autonomous Institution, Reaccredited by NAAC with ‘A+’ Grade]
[Approved by AICTE and Permanently Affiliated to Anna University, Chennai]
[ISO 9001:2015 Certified and all Eligible Programmes Accredited by NBA]
VATTAMALAIPALAYAM, N.G.G.O. COLONY POST, COIMBATORE – 641 022.

Department of Information Technology

20IT211- Data Science

Presentation by
Mrs.S.Jansi Rani, AP(Sr.Gr)/IT
COURSE OUTCOMES
20IT211- Data Science
CO1: Understand the basic concepts of data science and data mining (PO1, PO2, PO12)
CO2: Identify the techniques to explore and evaluate data (PO3, PO5, PO12)
CO3: Apply various data mining algorithms for real-time applications (PO2, PO3, PO5, PO12)
CO4: Implement the concepts of clustering and model evaluation (PO3, PO5, PO12)


20IT211- Data Science

Module I : INTRODUCTION 9 hours

What is data science – Case for data science – Data science classification – Data science algorithms – Data science process – Prior knowledge – Data preprocessing – Data cleaning – Data integration – Data reduction – Data transformation and data discretization – Feature selection – Data sampling – Modeling – Application.


20IT211- Data Science

Module II : DATA EXPLORATION AND VISUALIZATION 9 hours

Objectives of Data exploration – Datasets – Descriptive statistics – Data Visualization – Univariate visualization – Multivariate visualization – Visualizing high dimensional data – Roadmap for data exploration.


20IT211- Data Science

Module III : CLASSIFICATION AND ASSOCIATION ANALYSIS 18 hours

Basic concepts of Classification – Decision tree induction – Bayes classification methods – Rule based classification – Techniques to improve classification accuracy – Support vector machines – Regression methods: Linear regression – Logistic regression – Association analysis: Frequent Item set mining methods – Pattern evaluation methods.
20IT211- Data Science

Module IV : CLUSTERING AND MODEL EVALUATION 9 hours

Basic concepts and methods in cluster analysis – Partitioning methods – Density based methods – Model evaluation: Confusion matrix – Receiver Operator Characteristics (ROC) and Area under the Curve (AUC) – Lift curves – Evaluating the Predictions – Implementation.


TEXTBOOKS
1. Vijay Kotu and Bala Deshpande, “Data Science: Concepts and Practice”, 2nd Edition, Morgan Kaufmann Publishers, 2019.

2. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, 3rd Edition, Morgan Kaufmann Publishers, 2012.

3. Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk From The Frontline”, O’Reilly, 2016.


Reference(s)
1. Mohammed J. Zaki and Wagner Meira Jr., “Data Mining and Analysis: Fundamental Concepts and Algorithms”, Cambridge University Press, 2014.

2. Matt Harrison, “Learning the Pandas Library: Python Tools for Data Munging, Analysis and Visualization”, O’Reilly, 2016.

3. Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly Media, 2015.

4. Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O’Reilly Media, 2012.


WEB REFERENCES
1. https://nptel.ac.in/courses/106/106/106106179/


Acknowledgement
Resources are taken from the internet and textbooks



Supervised vs. Unsupervised Learning
Supervised learning (classification)
◦ Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
◦ New data is classified based on the training set

Unsupervised learning (clustering)
◦ The class labels of the training data are unknown
◦ Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Prediction Problems: Classification vs. Numeric Prediction
Classification
◦ predicts categorical class labels (discrete or nominal)
◦ classifies data (constructs a model) based on the training set and the values
(class labels) in a classifying attribute and uses it in classifying new data
◦ Two steps: Learning step and Classification step
Numeric Prediction
◦ models continuous-valued functions, i.e., predicts unknown or missing
values
Typical applications
◦ Credit/loan approval
◦ Medical diagnosis: whether a tumor is cancerous or benign
◦ Fraud detection: whether a transaction is fraudulent
◦ Web page categorization: which category a page belongs to
Classification—A Two-Step Process

Model construction: describing a set of predetermined classes


◦ Each tuple/sample is assumed to belong to a predefined class, as determined by the class
label attribute
◦ The set of tuples used for model construction is training set
◦ The model is represented as classification rules, decision trees, or mathematical formulae

Model usage: for classifying future or unknown objects


◦ Estimate accuracy of the model
◦ The known label of test sample is compared with the classified result from the model
◦ Accuracy rate is the percentage of test set samples that are correctly classified by the
model
◦ Test set is independent of training set (otherwise overfitting)
◦ If the accuracy is acceptable, use the model to classify new data

Note: If the test set is used to select models, it is called validation (test) set
Process (1): Model Construction

The training data are fed to a classification algorithm, which produces the classifier (model).

Training data:

NAME   RANK            YEARS  TENURED
Mike   Assistant Prof  3      no
Mary   Assistant Prof  7      yes
Bill   Professor       2      yes
Jim    Associate Prof  7      yes
Dave   Assistant Prof  6      no
Anne   Associate Prof  3      no

Learned model (classification rule):
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
Process (2): Using the Model

The classifier is first evaluated on testing data, then applied to unseen data.

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) -> Tenured?
Decision Tree Induction

A decision tree is a non-parametric supervised learning algorithm, used for both classification and regression tasks.

It has a hierarchical, flow-chart-like tree structure, which consists of a root node, branches, internal nodes, and leaf nodes.
Decision Tree Induction

A decision tree starts with a root node, which does not have any incoming branches. The outgoing branches from the root node feed into the internal nodes, also known as decision nodes. Based on the available features, both node types conduct evaluations to form homogeneous subsets, which are denoted by leaf nodes, or terminal nodes. The leaf nodes represent all the possible outcomes within the dataset.

The flowchart structure also creates an easy-to-digest representation of decision making.
Decision Tree Induction: An Example

Training data set: Buys_computer (the data set follows an example of Quinlan’s ID3, Playing Tennis).

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31...40  high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31...40  low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31...40  medium  no       excellent      yes
31...40  high    yes      fair           yes
>40      medium  no       excellent      no

Resulting tree:

age?
  <=30     -> student?
                no        -> No
                yes       -> Yes
  31...40  -> Yes
  >40      -> credit_rating?
                excellent -> No
                fair      -> Yes
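To make this concrete, here is a minimal Python sketch (an illustration assuming scikit-learn and pandas are available; it is not from the original slides). Note that scikit-learn grows CART-style binary trees rather than ID3's multiway splits, so the printed tree is equivalent in effect but not identical in shape.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
df = pd.DataFrame(rows, columns=["age", "income", "student",
                                 "credit_rating", "buys_computer"])

# One-hot encode the categorical attributes for scikit-learn's tree.
X = pd.get_dummies(df.drop(columns="buys_computer"))
y = df["buys_computer"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))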
Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
◦ Tree is constructed in a top-down recursive divide-and-conquer manner
◦ At start, all the training examples are at the root
◦ Attributes are categorical (if continuous-valued, they are discretized in
advance)
◦ Examples are partitioned recursively based on selected attributes
◦ Test attributes are selected on the basis of a heuristic or statistical
measure (e.g., information gain)
Conditions for stopping partitioning
◦ All samples for a given node belong to the same class
◦ There are no remaining attributes for further partitioning – majority voting
is employed for classifying the leaf
◦ There are no samples left
Attribute Selection Measure (ID3/C4.5)

Information gain is the attribute selection measure used in ID3. The expected information (entropy) needed to classify a tuple in D is

Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

where p_i is the probability that an arbitrary tuple in D belongs to class C_i. After partitioning D on attribute A into v subsets D_1, ..., D_v, the expected information is

Info_A(D) = \sum_{j=1}^{v} (|D_j| / |D|) \times Info(D_j)

and the information gained by branching on A is

Gain(A) = Info(D) - Info_A(D)

The attribute with the highest information gain is chosen as the splitting attribute. Example: for the Buys_computer data above, Info(D) = 0.940 bits and Gain(age) = 0.246 bits, the largest gain among the four attributes, so age is selected as the root split.
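As a quick check (a small sketch using only the Python standard library, not taken from the slides), the following reproduces Info(D) and Gain(age) for the Buys_computer data:

from collections import Counter
from math import log2

def entropy(labels):
    # Info(D) = -sum(p_i * log2(p_i)) over the class distribution
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    # Gain(A) = Info(D) - sum_j |Dj|/|D| * Info(Dj), partitioning on A
    n = len(labels)
    parts = {}
    for v, l in zip(values, labels):
        parts.setdefault(v, []).append(l)
    expected = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - expected

age = ["<=30", "<=30", "31...40", ">40", ">40", ">40", "31...40",
       "<=30", "<=30", ">40", "<=30", "31...40", "31...40", ">40"]
buys = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

print(round(entropy(buys), 3))         # 0.94 -> Info(D) = 0.940 bits
print(round(info_gain(age, buys), 3))  # ~0.247 (0.246 with textbook rounding)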
Gini Index

The Gini index is used in CART. It measures the impurity of a data partition D as

Gini(D) = 1 - \sum_{i=1}^{m} p_i^2

where p_i is the probability that a tuple in D belongs to class C_i.

The Gini index considers a binary split for each attribute. If A has v possible values, then there are 2^v possible subsets.

For example, if income has three possible values, namely {low, medium, high}, then the possible subsets are {low, medium, high}, {low, medium}, {low, high}, {medium, high}, {low}, {medium}, {high}, and {}.

We exclude the full set, {low, medium, high}, and the empty set from consideration since, conceptually, they do not represent a split.

Therefore, there are 2^v - 2 possible ways to form two partitions of the data, D, based on a binary split on A.

If a binary split on A partitions D into D1 and D2, the Gini index of D given that partitioning is

Gini_A(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)
Repeat the same procedure to find the next splitting attribute if the partitions are not yet fully classified.
The attribute with the largest reduction in the Gini index, \Delta Gini(A) = Gini(D) - Gini_A(D), is selected as the splitting attribute.
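For concreteness, a brief sketch of these Gini computations in plain Python (the split counts below are read off the Buys_computer table: {low, medium} gives 10 tuples with 7 yes / 3 no, {high} gives 4 tuples with 2 yes / 2 no):

def gini(counts):
    # Gini(D) = 1 - sum(p_i^2) for the class counts of a partition
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(partitions):
    # Gini_A(D) for a binary split, given class counts of each partition
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)

d = [9, 5]                                     # 9 yes, 5 no
print(round(gini(d), 3))                       # 0.459
print(round(gini_split([[7, 3], [2, 2]]), 3))  # 0.443 for income in {low, medium} vs {high}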
Overfitting and Tree Pruning
Overfitting: An induced tree may overfit the training data
◦ Too many branches, some may reflect anomalies due to noise or
outliers
◦ Poor accuracy for unseen samples
Two approaches to avoid overfitting
◦ Prepruning: Halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
  ◦ Difficult to choose an appropriate threshold
◦ Postpruning: Remove branches from a “fully grown” tree to get a sequence of progressively pruned trees
  ◦ Use a set of data different from the training data to decide which is the “best pruned tree”
Bayesian Classification
Bayesian classifiers are statistical classifiers

They can predict class membership probabilities, such as the probability that a given tuple
belongs to a particular class

Bayesian classification is based on Bayes’ theorem

Studies comparing classification algorithms have found a simple Bayesian classifier known as the
naive Bayesian classifier to be comparable in performance with decision tree and selected neural
network classifiers.

Bayesian classifiers have also exhibited high accuracy and speed when applied to large
databases.

Bayesian Classification
Naïve Bayesian classifiers assume that the effect of an attribute value on a
given class is independent of the values of the other attributes. This
assumption is called class conditional independence

Bayesian belief networks are graphical models, which unlike naïve Bayesian
classifiers, allow the representation of dependencies among subsets of
attributes.
Bayesian belief networks can also be used for classification.

Bayesian Classification
Bayes’ Theorem:

Let X be a data tuple. In Bayesian terms, X is considered “evidence.”

 Let H be some hypothesis, such as that the data tuple X belongs to a specified class
C.

For classification problems, we want to determine P(H|X), the probability that the
hypothesis H holds given the “evidence” or observed data tuple X.

In other words, we are looking for the probability that tuple X belongs to class C, given
that we know the attribute description of X. P(H|X) is the posterior probability, or a
posteriori probability, of H conditioned on X.

P(H) is the prior probability, or a priori probability, of H.

The posterior probability, P(H|X), is based on more information than the prior probability, P(H), which is independent of X. Similarly, P(X|H) is the posterior probability of X conditioned on H.

Bayes’ theorem relates these probabilities:

P(H|X) = P(X|H) P(H) / P(X)
Naïve Bayesian Classification

Given a tuple X = (x1, x2, ..., xn), the naïve Bayesian classifier predicts the class Ci with the highest posterior probability P(Ci|X) = P(X|Ci) P(Ci) / P(X). Under the class conditional independence assumption,

P(X|Ci) = \prod_{k=1}^{n} P(x_k | Ci)
As P(X) is constant for all classes, only P(X|Ci)P(Ci) need be maximized. If
the class prior probabilities are not known, then it is commonly assumed that
the classes are equally likely, that is, P(C1) = P(C2) = ··· = P(Cm), and we
would therefore maximize P(X|Ci). Otherwise, we maximize P(X|Ci)P(Ci). Note
that the class prior probabilities may be estimated by P(Ci) = |Ci,D|/|D|,
where |Ci,D| is the number of training tuples of class Ci in D

Naïve Bayes Classifier: Training Dataset

Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data to be classified:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31...40  high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31...40  low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31...40  medium  no       excellent      yes
31...40  high    yes      fair           yes
>40      medium  no       excellent      no
P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357

Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|Ci):
P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci) P(Ci):
P(X|buys_computer = “yes”) P(buys_computer = “yes”) = 0.044 x 0.643 = 0.028
P(X|buys_computer = “no”) P(buys_computer = “no”) = 0.019 x 0.357 = 0.007

Since 0.028 > 0.007, the naïve Bayes classifier predicts buys_computer = “yes” for tuple X.
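The same hand calculation in a few lines of Python (a sketch that simply reproduces the counted probabilities above):

priors = {"yes": 9 / 14, "no": 5 / 14}
likelihoods = {
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit=fair": 6 / 9},
    "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5,
            "student=yes": 1 / 5, "credit=fair": 2 / 5},
}

scores = {}
for c, prior in priors.items():
    p = prior
    for attr_prob in likelihoods[c].values():
        p *= attr_prob               # class conditional independence
    scores[c] = p

print(scores)                        # {'yes': ~0.0282, 'no': ~0.0069}
print(max(scores, key=scores.get))   # 'yes'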
Bayesian Belief Networks
Bayesian belief networks specify joint conditional probability distributions.

They allow class conditional independencies to be defined between subsets of variables.

They provide a graphical model of causal relationships, on which learning can be


performed.

Trained Bayesian belief networks can be used for classification.

 Bayesian belief networks are also known as belief networks, Bayesian networks, and probabilistic networks.

A belief network is defined by two components: a directed acyclic graph and a set of conditional probability tables (CPTs).

Each node in the directed acyclic graph represents a random variable. The variables may be discrete or continuous-valued. They may correspond to actual attributes given in the data or to “hidden variables” believed to form a relationship.
Bayesian Belief Networks

Bayesian belief networks (also known as Bayesian networks, probabilistic networks) allow class conditional independencies between subsets of variables.

A belief network is a (directed acyclic) graphical model of causal relationships:
◦ Represents dependency among the variables
◦ Gives a specification of the joint probability distribution

Nodes: random variables
Links: dependency

Example: in a graph with arcs X -> Z, Y -> Z, and Y -> P, X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P. The graph has no loops/cycles.
Bayesian Belief Network: An Example

The network contains the variables FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, and Dyspnea, with FH and S as the parents of LC.

CPT: Conditional Probability Table for the variable LungCancer, given each combination of values of its parents:

       (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC     0.8      0.5       0.7       0.1
~LC    0.2      0.5       0.3       0.9

The CPT shows the conditional probability for each possible combination of the values of a node’s parents.

Derivation of the probability of a particular combination of values x1, ..., xn of X from the CPT:

P(x_1, ..., x_n) = \prod_{i=1}^{n} P(x_i | Parents(Y_i))

where Y_i is the variable corresponding to x_i.
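As a minimal illustration (plain Python, not from the slides), a CPT lookup for the LungCancer node might look like:

cpt_lc = {  # P(LC = yes | FH, S), taken from the table above
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

def p_lung_cancer(lc, fh, s):
    # Return P(LC = lc | FH = fh, S = s) from the CPT
    p_yes = cpt_lc[(fh, s)]
    return p_yes if lc else 1.0 - p_yes

print(p_lung_cancer(True, fh=True, s=False))   # 0.5
print(p_lung_cancer(False, fh=False, s=True))  # 0.3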
Rule-Based Classification

A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule has the form

IF condition THEN conclusion

The “IF” part is the rule antecedent or precondition; the “THEN” part is the rule consequent. An example is rule R1:

R1: IF age = youth AND student = yes THEN buys_computer = yes
How can we use rule-based classification to predict the class label of a given tuple, X? If a rule is satisfied by X, the rule is said to be triggered. For example, let

X = (age = youth, income = medium, student = yes, credit_rating = fair)

To classify X according to buys_computer: X satisfies R1, which triggers the rule.


If R1 is the only rule satisfied, then the rule fires by returning the class
prediction for X.

If more than one rule is triggered: need a conflict resolution strategy to


figure out which rule gets to fire and assign its class prediction to X.

possible strategies are

1. Size ordering

2. Rule ordering.

Approaches to Rule Generation


Rule Extraction from a Decision Tree

To extract rules from a decision tree, one rule is created for each path from the root to a leaf node. Each splitting criterion along a given path is logically ANDed to form the rule antecedent (“IF” part). The leaf node holds the class prediction, forming the rule consequent (“THEN” part).

For example, the leftmost path of the buys_computer tree above yields:
IF age = ‘<=30’ AND student = ‘no’ THEN buys_computer = ‘no’
Direct Extraction of Rules
Rule Induction Using a Sequential Covering Algorithm

IF-THEN rules can be extracted directly from the training data (i.e., without
having to generate a decision tree first) using a sequential covering algorithm.

The name comes from the notion that the rules are learned sequentially (one at
a time), where each rule for a given class will ideally cover many of the class’s
tuples (and hopefully none of the tuples of other classes).

Rules are learned one at a time. Each time a rule is learned, the tuples covered
by the rule are removed, and the process repeats on the remaining tuples.

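A toy Python sketch of the idea (illustrative only; learn_one_rule here greedily picks a single attribute-value test, which is far simpler than the full Learn_One_Rule procedure):

def learn_one_rule(data, target):
    # Pick the (attr, value) test whose covered tuples are most purely `target`
    best, best_score = None, 0.0
    for row, _ in data:
        for attr, value in row.items():
            covered = [l for r, l in data if r[attr] == value]
            if covered.count(target) > 0:
                score = covered.count(target) / len(covered)
                if score > best_score:
                    best, best_score = (attr, value), score
    return best

def sequential_covering(data, target):
    rules = []
    while any(l == target for _, l in data):
        rule = learn_one_rule(data, target)
        if rule is None:
            break
        rules.append(rule)
        attr, value = rule
        # Remove the tuples covered by the new rule and repeat
        data = [(r, l) for r, l in data if r[attr] != value]
    return rules

data = [
    ({"student": "yes", "credit": "fair"}, "buys"),
    ({"student": "yes", "credit": "excellent"}, "buys"),
    ({"student": "no",  "credit": "fair"}, "no"),
    ({"student": "no",  "credit": "excellent"}, "no"),
]
print(sequential_covering(data, "buys"))  # [('student', 'yes')]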
Metrics for Evaluating Classifier Performance
True positives (TP): the positive tuples that were correctly labeled by the classifier.

True negatives (TN): the negative tuples that were correctly labeled by the classifier.

False positives (FP): the negative tuples that were incorrectly labeled as positive (e.g., tuples of class buys_computer = no for which the classifier predicted buys_computer = yes).

False negatives (FN): the positive tuples that were mislabeled as negative (e.g., tuples of class buys_computer = yes for which the classifier predicted buys_computer = no).
Class imbalance problem: the main class of interest is rare; the data set distribution reflects a significant majority of the negative class and a minority positive class. For example, in fraud detection applications, the class of interest (or positive class) is “fraud,” which occurs much less frequently than the negative “nonfraudulent” class.

For imbalanced data, the sensitivity and specificity measures can be used:

sensitivity = TP / P   (true positive rate, also called recall)
specificity = TN / N   (true negative rate)

where P and N are the numbers of positive and negative tuples, respectively. Precision, the fraction of tuples labeled positive that actually are positive, is TP / (TP + FP).
An alternative way to use precision and recall is to combine them into a single measure. This is the approach of the F measure (also known as the F1 score or F-score) and the F_beta measure:

F = (2 x precision x recall) / (precision + recall)

F_beta = ((1 + beta^2) x precision x recall) / (beta^2 x precision + recall)
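A hedged scikit-learn sketch of these metrics on illustrative labels (1 = positive class; the label vectors are made up for the example):

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                   # 3 4 2 1
print(accuracy_score(y_true, y_pred))   # (TP+TN)/(P+N) = 0.7
print(recall_score(y_true, y_pred))     # sensitivity = TP/P = 0.75
print(tn / (tn + fp))                   # specificity = TN/N ~ 0.667
print(precision_score(y_true, y_pred))  # TP/(TP+FP) = 0.6
print(f1_score(y_true, y_pred))         # F measure ~ 0.667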
Support Vector Machines

Support Vector Machines

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.

Picture two different categories that are classified using a decision boundary, or hyperplane.
Support Vector Machines - Example

Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm.

We first train our model with lots of images of cats and dogs so that it can learn their different features, and then test it with this strange creature.

The support vector machine creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each class.

On the basis of the support vectors, it will classify the creature as a cat.
Types of SVM

SVM can be of two types:

Linear SVM:
Linear SVM is used for linearly separable data: if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a linear SVM classifier.

Non-linear SVM:
Non-linear SVM is used for non-linearly separable data: if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm

Hyperplane:

There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features, the hyperplane is a straight line; if there are 3 features, the hyperplane is a 2-dimensional plane.

SVM creates the hyperplane with maximum margin, i.e., the maximum distance between the data points of the two classes.
Hyperplane and Support Vectors in the SVM algorithm

Support Vectors:

The data points or vectors that are closest to the hyperplane and which affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
How does SVM work?

Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has two tags (green and blue), and the dataset has two features, x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue.
Since it is a 2-D space, just by using a straight line we can easily separate these two classes. But multiple lines can separate them.
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane.

The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors.

The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.

The hyperplane with maximum margin is called the optimal hyperplane.
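A minimal scikit-learn sketch of a linear SVM on toy 2-D data (an illustration, not from the slides; the points are made up):

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 0.5],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])          # two tags, e.g. "blue" vs "green"

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)               # points closest to the hyperplane
print(clf.coef_, clf.intercept_)          # w and b of the hyperplane w.x + b = 0
print(clf.predict([[3.0, 2.0]]))          # classify a new point -> [0]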
How does SVM work?

Non-Linear SVM:
If data is linearly arranged, we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line.
To separate non-linear data points, we add a third dimension z, calculated as z = x^2 + y^2. With this added dimension, the data become linearly separable in 3-D space.
SVM now divides the dataset into classes with a separating plane in the 3-D space. Since we are in 3-D space, the boundary looks like a plane parallel to the x-axis. Converting it back to 2-D space with z = 1, the boundary becomes a circumference of radius 1 around the inner class of the non-linear data.
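In practice the extra dimension need not be added by hand; a kernel SVM learns the circular boundary directly. A hedged sketch on synthetic circular data:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)  # inside unit circle = 1

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))                        # close to 1.0 on this toy data
print(clf.predict([[0.1, 0.2], [1.8, 1.5]]))  # -> [1 0]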
SVM – Linear: a worked numerical example of constructing the maximum-margin hyperplane (presented graphically in the original slides).
Regression
 Regression analysis is a statistical method that helps us to analyze and
understand the relationship between two or more variables of interest
 Used to estimate the relationship between variables
 Predict the value of one variable (dependent variable) on the basis of
other variables (Independent variable)

Variables

Dependent variable: the variable that we are trying to understand or forecast, e.g., What is a person’s expected income? What is the probability that an applicant will default on a loan?

Independent variables: the factors that influence the analysis or target variable and provide us with information regarding the relationship of the variables with the target variable.
Types of Regression
Simple Regression

Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y. The dependent variable considered here is always a continuous variable. It is a predictive model used for finding the linear relationship between a dependent variable and one independent variable.
Multiple Regression

If the independent variables used to explain the dependent variable are multiple in number, then it is called multiple regression.
Linear Regression

Linear regression is the simplest form of regression, in which the data are modelled using a straight line:

y = α + β x

where α and β are regression coefficients specifying the y-intercept and slope of the line, respectively; y is the response variable and x is a predictor variable.

The coefficients can be estimated by the method of least squares:

β = Σ (x_i - x̄)(y_i - ȳ) / Σ (x_i - x̄)^2,    α = ȳ - β x̄
Example

Example 2: You have to examine the relationship between the age and price of used cars sold in the last year by a car dealership company. Find the price in dollars when the car age is 15 years.
Example

Find a linear regression equation for the following two sets of data:

x: 2  4  6  8
y: 3  7  5  10


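A quick NumPy check of the least-squares formulas on this data (the fitted line works out to y = 1.5 + 0.95x):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([3.0, 7.0, 5.0, 10.0])

beta = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
alpha = y.mean() - beta * x.mean()
print(round(alpha, 2), round(beta, 2))  # 1.5 0.95  ->  y = 1.5 + 0.95x

# np.polyfit returns the same line (highest-degree coefficient first)
print(np.polyfit(x, y, 1))              # [0.95 1.5]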
Logistic Regression
What is logistic regression?

Logistic regression is an example of supervised learning. It is used to calculate or predict the probability of a binary (yes/no) event occurring.

An example of logistic regression could be applying machine learning to determine whether a person is likely to be infected with COVID-19 or not.
In the example, the probability of a person being infected with COVID-19 could be based on the viral load, the symptoms, and the presence of antibodies, etc. Viral load, symptoms, and antibodies would be our factors (independent variables), which would influence our outcome (dependent variable).

In linear regression, the outcome is continuous and can be any possible value. In logistic regression, however, the predicted outcome is discrete and restricted to a limited set of values.
The three types of logistic regression

1. Binary logistic regression - when we have two possible outcomes, like our original example of whether a person is likely to be infected with COVID-19 or not.

2. Multinomial logistic regression - when we have multiple outcomes, say if we build out our original example to predict whether someone may have the flu, an allergy, a cold, or COVID-19.

3. Ordinal logistic regression - when the outcome is ordered, like if we build out our original example to also help determine the severity of a COVID-19 infection, sorting it into ordered severity levels.
Logistic vs Linear

Logistic regression is much like linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.


Logistic Regression
 It is used for predicting the categorical dependent variable using a given
set of independent variables.

Logistic Regression

Produces a result in a binary format, which is used to predict the outcome of a categorical dependent variable. The outcome is discrete/categorical, for example:
To predict whether an email is spam (1) or not (0)
Whether a tumor is malignant (1) or not (0)
Logistic Regression

In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function, which predicts two maximum values (0 or 1).

The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.

Logistic regression can provide probabilities and classify new data using both continuous and discrete datasets. (The “S”-shaped curve is the sigmoid curve.)
Sigmoid Function

The sigmoid function is a mathematical function used to map predicted values to probabilities. It maps any real value to a value within the range 0 to 1:

σ(z) = 1 / (1 + e^(-z))

The value of the logistic regression must be between 0 and 1 and cannot go beyond this limit, so it forms an “S”-shaped curve. The S-form curve is called the sigmoid function or the logistic function.

A threshold value defines the probability of either 0 or 1: values above the threshold tend to 1, and values below the threshold tend to 0.
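A tiny sketch of the sigmoid mapping (NumPy):

import numpy as np

def sigmoid(z):
    # Maps any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-6.0, 0.0, 6.0])))  # ~[0.0025 0.5 0.9975]
# With a threshold of 0.5: values above 0.5 -> class 1, below -> class 0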
Assumptions for Logistic Regression

The dependent variable must be categorical in nature.

The independent variables should not exhibit multicollinearity.
Logistic Regression Equation

The logistic regression equation can be obtained from the straight-line equation:

y = b0 + b1 x        (range: -infinity to +infinity)

Since a probability p must lie between 0 and 1, we work with the odds, p / (1 - p) (range: 0 to +infinity), and take their logarithm:

log(p / (1 - p)) = b0 + b1 x

Solving for p gives the logistic form, whose range is 0 to 1:

p = 1 / (1 + e^(-(b0 + b1 x)))
glm() function

In R, a logistic regression model is fitted with glm(formula, data, family), where:
◦ formula is the symbol presenting the relationship between the variables,
◦ data is the data set giving the values of these variables,
◦ family is an R object used to specify the details of the model; its value is binomial for logistic regression.
Types of Logistic Regression

Binomial: there can be only two possible types of the dependent variable, e.g., 0 or 1, Pass or Fail.

Multinomial: there can be 3 or more possible unordered types of the dependent variable, e.g., "cat", "dogs", or "sheep".

Ordinal: there can be 3 or more possible ordered types of the dependent variable, e.g., "low", "medium", or "high".
Implementing logistic regression
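The original slides implement this in R via glm(); as a hedged Python equivalent, a minimal scikit-learn sketch on made-up data (hours studied vs. pass/fail):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # fail (0) / pass (1)

model = LogisticRegression().fit(X, y)
print(model.predict([[1.2], [3.2]]))      # -> [0 1]
print(model.predict_proba([[2.2]]))       # probabilities for class 0 and 1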
Confusion Matrix

A confusion matrix is a table that is often used to describe the performance of a model (or “classifier”) on a set of test data for which the true values are known. The matrix compares the actual target values with those predicted by the machine learning model.
Techniques to Improve Classification Accuracy

Ensemble Method:

An ensemble for classification is a composite model, made up of a combination of classifiers. The individual classifiers vote, and a class label prediction is returned by the ensemble based on the collection of votes. Ensembles tend to be more accurate than their component classifiers.

The main principle behind the ensemble model is that a group of weak learners come together to form a strong learner.
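A brief scikit-learn sketch of majority voting by an ensemble of heterogeneous learners (illustrative data generated with make_classification):

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier([
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=1000)),
])  # majority ("hard") voting by default
print(ensemble.fit(X_tr, y_tr).score(X_te, y_te))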
Thank You
