Module-3
Presentation by
Mrs.S.Jansi Rani, AP(Sr.Gr)/IT
COURSE OUTCOMES
20IT211 - Data Science
CO1: Understand the basic concepts of data science and data mining (PO1, PO2, PO12)
Note: if the test set is used to select models, it is called a validation set.
Process (1): Model Construction
(Figure: training data is fed into a classification algorithm, which produces the classifier, i.e., the learned model; the classifier is then applied to testing data and to unseen data such as the tuple (Jeff, Professor, 4) to answer "Tenured?")

NAME     RANK            YEARS   TENURED
Tom      Assistant Prof  2       no
Merlisa  Associate Prof  7       no
George   Professor       5       yes
Joseph   Assistant Prof  7       yes
Decision Tree Induction
A decision tree is a non-parametric supervised learning method used for classification and regression.
The outgoing branches from the root node feed into the internal nodes, also known as decision nodes. Based on the available features, both node types conduct evaluations to form homogeneous subsets, which are denoted by leaf nodes, or terminal nodes. The leaf nodes represent all the possible outcomes within the dataset.
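As an illustration, here is a minimal sketch (assuming scikit-learn and pandas are available, and reusing the tenure table from the earlier slide) of inducing a decision tree; the library's CART-style splitting may differ in detail from the induction procedure discussed here.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training data mirroring the model-construction slide.
df = pd.DataFrame({
    "rank":  ["Assistant Prof", "Associate Prof", "Professor", "Assistant Prof"],
    "years": [2, 7, 5, 7],
    "tenured": ["no", "no", "yes", "yes"],
})
X = pd.get_dummies(df[["rank", "years"]])              # one-hot encode "rank"
clf = DecisionTreeClassifier(random_state=0).fit(X, df["tenured"])
print(export_text(clf, feature_names=list(X.columns)))  # root, decision, and leaf nodes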
Bayesian Classification
Bayesian classifiers are statistical classifiers
They can predict class membership probabilities, such as the probability that a given tuple
belongs to a particular class
Studies comparing classification algorithms have found a simple Bayesian classifier known as the
naive Bayesian classifier to be comparable in performance with decision tree and selected neural
network classifiers.
Bayesian classifiers have also exhibited high accuracy and speed when applied to large
databases.
Bayesian belief networks are graphical models that, unlike naive Bayesian classifiers, allow the representation of dependencies among subsets of attributes.
Bayesian belief networks can also be used for classification.
Let H be some hypothesis, such as that the data tuple X belongs to a specified class C.
For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the "evidence" or observed data tuple X.
In other words, we are looking for the probability that tuple X belongs to class C, given that we know the attribute description of X. P(H|X) is the posterior probability, or a posteriori probability, of H conditioned on X.
By Bayes' theorem, P(H|X) = P(X|H) P(H) / P(X), where P(H) is the prior probability of H, P(X|H) is the posterior probability of X conditioned on H, and P(X) is the prior probability of X.
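As a quick illustration (a sketch with made-up encoded attributes, not the slides' own example), scikit-learn's naive Bayes classifier returns exactly these posterior probabilities P(H|X):

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Toy training tuples: two encoded categorical attributes per tuple.
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
y = np.array([0, 1, 1, 0])                   # e.g., 1 = "buys_computer = yes"

nb = CategoricalNB().fit(X, y)
print(nb.predict_proba(np.array([[1, 1]])))  # posterior P(H|X) for each class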
Bayesian Belief Networks
Bayesian belief networks specify joint conditional probability distributions.
They are also known as belief networks, Bayesian networks, and probabilistic networks.
Each node in the directed acyclic graph represents a random variable, and each arc represents a probabilistic dependence between variables.
A network over variables x1, ..., xn specifies the joint distribution as the product P(x1, ..., xn) = P(x1 | Parents(x1)) × ... × P(xn | Parents(xn)).
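A toy sketch of this factorization (hypothetical probability tables, not from the slides) for the two-node network Smoker → LungCancer:

# Hypothetical CPTs for the tiny network Smoker -> LungCancer.
p_smoker = {True: 0.3, False: 0.7}            # P(S)
p_cancer_given = {True: 0.2, False: 0.01}     # P(C = True | S)

def joint(smoker, cancer):
    # P(S, C) = P(S) * P(C | S), the belief-network factorization.
    p_c = p_cancer_given[smoker]
    return p_smoker[smoker] * (p_c if cancer else 1 - p_c)

print(joint(smoker=True, cancer=True))        # 0.3 * 0.2 = 0.06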
If more than one rule is triggered, a conflict resolution strategy is needed. Two common strategies are:
1. Size ordering: assign the highest priority to the triggering rule with the "toughest" requirements, i.e., the rule antecedent with the most attribute tests.
2. Rule ordering: prioritize the rules beforehand, either class-based or rule-based.
IF-THEN rules can be extracted directly from the training data (i.e., without
having to generate a decision tree first) using a sequential covering algorithm.
The name comes from the notion that the rules are learned sequentially (one at
a time), where each rule for a given class will ideally cover many of the class’s
tuples (and hopefully none of the tuples of other classes).
Rules are learned one at a time. Each time a rule is learned, the tuples covered
by the rule are removed, and the process repeats on the remaining tuples.
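A minimal sketch of this idea (toy data and a deliberately simple one-test rule grower, not the exact algorithm from the slides):

def learn_one_rule(rows, target):
    # Greedily pick the single (attribute, value) test whose covered
    # tuples are purest for the target class (stand-in for a real grower).
    best = None
    for attr in (k for k in rows[0] if k != "class"):
        for value in {r[attr] for r in rows}:
            covered = [r for r in rows if r[attr] == value]
            acc = sum(r["class"] == target for r in covered) / len(covered)
            if best is None or acc > best[0]:
                best = (acc, attr, value)
    return best[1], best[2]

def sequential_covering(rows, target):
    rules, remaining = [], list(rows)
    while any(r["class"] == target for r in remaining):      # positives left
        attr, value = learn_one_rule(remaining, target)
        rules.append(f"IF {attr} = {value} THEN class = {target}")
        remaining = [r for r in remaining if r[attr] != value]  # remove covered tuples
    return rules

data = [{"rank": "prof", "class": "yes"}, {"rank": "asst", "class": "no"},
        {"rank": "prof", "class": "yes"}, {"rank": "assoc", "class": "no"}]
print(sequential_covering(data, "yes"))   # e.g., ['IF rank = prof THEN class = yes']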
True positives (TP): These are the positive tuples that were correctly labeled by the classifier.
True negatives (TN): These are the negative tuples that were correctly labeled by the classifier.
False positives (FP): These are the negative tuples that were incorrectly labeled as positive (e.g., tuples of class buys_computer = no for which the classifier predicted buys_computer = yes).
False negatives (FN): These are the positive tuples that were mislabeled as negative (e.g., tuples of class buys_computer = yes for which the classifier predicted buys_computer = no).
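As a brief sketch (invented 0/1 labels, where 1 stands for buys_computer = yes), these four counts can be read off scikit-learn's confusion matrix:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # classifier's predictions

# With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")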
First, we train our model with many images of cats and dogs so that it can learn the different features of cats and dogs, and then we test it with this strange creature.
Because the SVM creates a decision boundary between the two classes (cat and dog) using the extreme cases, called support vectors, it will classify the strange creature by looking at the extreme cases of cat and dog.
Linear SVM:
Linear SVM is used for linearly separable data: if a dataset can be classified into two classes by a single straight line, the data is termed linearly separable, and the classifier used is called a linear SVM classifier.
Non-linear SVM:
Non-linear SVM is used for non-linearly separable data: if a dataset cannot be classified by a straight line, the data is termed non-linear, and the classifier used is called a non-linear SVM classifier.
There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary for classifying the data points.
The dimensions of the hyperplane depend on the number of features in the dataset: if there are 2 features, the hyperplane is a straight line.
The goal is to create the hyperplane with the maximum margin, i.e., the maximum distance between the hyperplane and the nearest data points of either class.
The data points or vectors that are closest to the hyperplane are called support vectors: the SVM algorithm finds the closest points from both classes and positions the boundary between them.
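A minimal sketch (synthetic points, not the slide's image) of fitting a linear SVM and inspecting its support vectors with scikit-learn:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])        # two linearly separable classes

clf = SVC(kernel="linear", C=1.0)       # kernel="rbf" would handle non-linear data
clf.fit(X, y)
print(clf.support_vectors_)             # the points closest to the hyperplane
print(clf.predict([[4, 4]]))            # classify a new point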
x is a predictor variable and y is the response:
x: 2  4  6  8
y: 3  7  5  10
Logistic regression models the probability of an event occurring.
1. Binary logistic regression: when there are two possible outcomes, as in our original example.
2. Multinomial logistic regression: when there are more than two unordered outcomes, as when we build out our original example to predict whether someone may have the flu, an allergy, a cold, or COVID-19.
3. Ordinal logistic regression: when the outcomes are ordered, as when we build out our example so that the possible outcomes follow a natural order (e.g., mild, moderate, severe).
For example:
To predict whether an email is spam (1) or not (0).
To predict whether a tumor is malignant (1) or not (0).
The S-form curve is called the sigmoid function or the logistic function.
In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1: values above the threshold tend to 1, and values below the threshold tend to 0.
Logistic equation:
log(y / (1 - y)) = b0 + b1*x1 + b2*x2 + ... + bn*xn
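A hedged sketch (made-up spam data) showing the sigmoid probabilities and the 0.5 threshold in scikit-learn:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])   # a single predictor
y = np.array([0, 0, 0, 1, 1, 1])               # 1 = spam, 0 = not spam

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]           # sigmoid output, P(y = 1 | x)
print((probs >= 0.5).astype(int))              # values above the threshold tend to 1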
The main principle behind the ensemble model is that a group of weak
learners come together to form a strong learner.
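One common realization of this principle is bagging; a minimal sketch (synthetic data) that combines 50 shallow "weak" decision stumps with scikit-learn:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
weak = DecisionTreeClassifier(max_depth=1)           # a weak learner (decision stump)
ensemble = BaggingClassifier(weak, n_estimators=50, random_state=0)
print(ensemble.fit(X, y).score(X, y))                # the combined strong learner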