Module-3

SRI RAMAKRISHNA ENGINEERING

COLLEGE
[Educational Service : SNR Sons Charitable Trust]
[Autonomous Institution, Reaccredited by NAAC with ‘A+’ Grade]
[Approved by AICTE and Permanently Affiliated to Anna University, Chennai]
[ISO 9001:2015 Certified and all Eligible Programmes Accredited by NBA]
VATTAMALAIPALAYAM, N.G.G.O. COLONY POST, COIMBATORE – 641 022.

Department of Information Technology

20IT211- Data Science

Presentation by
Mrs.S.Jansi Rani, AP(Sr.Gr)/IT
COURSE OUTCOMES
20IT211- Data Science
CO1: Understand the basic concepts of data science and data mining (PO1, PO2, PO12)
CO2: Identify the techniques to explore and evaluate data (PO3, PO5, PO12)
CO3: Apply various data mining algorithms for real-time applications (PO2, PO3, PO5, PO12)
CO4: Implement the concepts of clustering and model evaluation (PO3, PO5, PO12)


20IT211- Data Science

Module I : INTRODUCTION 9 hours

What is data science – Case for data science – Data science classification – Data science algorithms – Data science process – Prior knowledge – Data preprocessing – Data cleaning – Data integration – Data reduction – Data transformation and data discretization – Feature selection – Data sampling – Modeling – Application.


20IT211- Data Science

Module II : DATA EXPLORATION AND VISUALIZATION 9 hours

Objectives of Data exploration – Datasets – Descriptive statistics – Data Visualization – Univariate visualization – Multivariate visualization – Visualizing high dimensional data – Roadmap for data exploration.


20IT211- Data Science

Module III : CLASSIFICATION AND ASSOCIATION ANALYSIS 18 hours

Basic concepts of Classification – Decision tree induction – Bayes classification methods – Rule based classification – Techniques to improve classification accuracy – Support vector machines – Regression methods: Linear regression – Logistic regression – Association analysis: Frequent Item set mining methods – Pattern evaluation methods.
20IT211- Data Science

Module IV : CLUSTERING AND MODEL EVALUATION 9 hours

Basic concepts and methods in cluster analysis – Partitioning methods – Density based methods – Model evaluation: Confusion matrix – Receiver Operator Characteristics (ROC) and Area under the Curve (AUC) – Lift curves – Evaluating the Predictions – Implementation.


TEXTBOOKS
1. Vijay Kotu and Bala Deshpande, “Data Science: Concepts and Practice”, 2nd Edition, Morgan Kaufmann Publishers, 2019.

2. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, 3rd Edition, Morgan Kaufmann Publishers, 2012.

3. Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk From The Frontline”, O’Reilly, 2016.


Reference(s)
1. Mohammed J. Zaki and Wagner Meira Jr., “Data Mining and Analysis: Fundamental Concepts and Algorithms”, Cambridge University Press, 2014.

2. Matt Harrison, “Learning the Pandas Library: Python Tools for Data Munging, Analysis and Visualization”, O’Reilly, 2016.

3. Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly Media, 2015.

4. Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O’Reilly Media, 2012.


WEB REFERENCES
1. https://nptel.ac.in/courses/106/106/106106179/


Acknowledgement
Resources are taken from the internet and textbooks



Supervised vs. Unsupervised Learning
Supervised learning (classification)
◦ Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
◦ New data is classified based on the training set

Unsupervised learning (clustering)
◦ The class labels of the training data are unknown
◦ Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Prediction Problems: Classification vs. Numeric Prediction
Classification
◦ predicts categorical class labels (discrete or nominal)
◦ classifies data (constructs a model) based on the training set and the values
(class labels) in a classifying attribute and uses it in classifying new data
◦ Two steps: Learning step and Classification step
Numeric Prediction
◦ models continuous-valued functions, i.e., predicts unknown or missing
values
Typical applications
◦ Credit/loan approval
◦ Medical diagnosis: whether a tumor is cancerous or benign
◦ Fraud detection: whether a transaction is fraudulent
◦ Web page categorization: which category a page belongs to
Classification—A Two-Step Process

Model construction: describing a set of predetermined classes


◦ Each tuple/sample is assumed to belong to a predefined class, as determined by the class
label attribute
◦ The set of tuples used for model construction is training set
◦ The model is represented as classification rules, decision trees, or mathematical formulae

Model usage: for classifying future or unknown objects


◦ Estimate accuracy of the model
◦ The known label of test sample is compared with the classified result from the model
◦ Accuracy rate is the percentage of test set samples that are correctly classified by the
model
◦ Test set is independent of training set (otherwise overfitting)
◦ If the accuracy is acceptable, use the model to classify new data

Note: If the test set is used to select models, it is called validation (test) set
Process (1): Model Construction

The training data are fed to a classification algorithm, which produces the classifier (model).

Training data:

NAME   RANK            YEARS  TENURED
Mike   Assistant Prof  3      no
Mary   Assistant Prof  7      yes
Bill   Professor       2      yes
Jim    Associate Prof  7      yes
Dave   Assistant Prof  6      no
Anne   Associate Prof  3      no

Learned model (classification rule):
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
Process (2): Using the Model

The classifier is first evaluated on testing data, then applied to unseen data.

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) -> Tenured?
Decision Tree Induction

A decision tree is a non-parametric supervised learning algorithm, used for both classification and regression tasks.

It has a hierarchical, flow-chart-like tree structure, which consists of a root node, branches, internal nodes, and leaf nodes.
Decision Tree Induction

A decision tree starts with a root node, which does not have any incoming branches. The outgoing branches from the root node feed into the internal nodes, also known as decision nodes. Based on the available features, both node types conduct evaluations to form homogeneous subsets, which are denoted by leaf nodes, or terminal nodes. The leaf nodes represent all the possible outcomes within the dataset.

The flowchart structure also creates an easy-to-digest representation of decision making.
Decision Tree Induction: An Example

Training data set: Buys_computer (the data set follows an example of Quinlan’s ID3, Playing Tennis).

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31...40  high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31...40  low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31...40  medium  no       excellent      yes
31...40  high    yes      fair           yes
>40      medium  no       excellent      no

Resulting tree:

age?
  <=30     -> student?
                no        -> No
                yes       -> Yes
  31...40  -> Yes
  >40      -> credit_rating?
                excellent -> No
                fair      -> Yes
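To make this concrete, here is a minimal Python sketch (an illustration assuming scikit-learn and pandas are available; it is not from the original slides). Note that scikit-learn grows CART-style binary trees rather than ID3's multiway splits, so the printed tree is equivalent in effect but not identical in shape.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
df = pd.DataFrame(rows, columns=["age", "income", "student",
                                 "credit_rating", "buys_computer"])

# One-hot encode the categorical attributes for scikit-learn's tree.
X = pd.get_dummies(df.drop(columns="buys_computer"))
y = df["buys_computer"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))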
Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
◦ Tree is constructed in a top-down recursive divide-and-conquer manner
◦ At start, all the training examples are at the root
◦ Attributes are categorical (if continuous-valued, they are discretized in
advance)
◦ Examples are partitioned recursively based on selected attributes
◦ Test attributes are selected on the basis of a heuristic or statistical
measure (e.g., information gain)
Conditions for stopping partitioning
◦ All samples for a given node belong to the same class
◦ There are no remaining attributes for further partitioning – majority voting
is employed for classifying the leaf
◦ There are no samples left
Attribute Selection Measure (ID3/C4.5)

Information gain is the attribute selection measure used in ID3. The expected information (entropy) needed to classify a tuple in D is

Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

where p_i is the probability that an arbitrary tuple in D belongs to class C_i. After partitioning D on attribute A into v subsets D_1, ..., D_v, the expected information is

Info_A(D) = \sum_{j=1}^{v} (|D_j| / |D|) \times Info(D_j)

and the information gained by branching on A is

Gain(A) = Info(D) - Info_A(D)

The attribute with the highest information gain is chosen as the splitting attribute. Example: for the Buys_computer data above, Info(D) = 0.940 bits and Gain(age) = 0.246 bits, the largest gain among the four attributes, so age is selected as the root split.
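As a quick check (a small sketch using only the Python standard library, not taken from the slides), the following reproduces Info(D) and Gain(age) for the Buys_computer data:

from collections import Counter
from math import log2

def entropy(labels):
    # Info(D) = -sum(p_i * log2(p_i)) over the class distribution
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    # Gain(A) = Info(D) - sum_j |Dj|/|D| * Info(Dj), partitioning on A
    n = len(labels)
    parts = {}
    for v, l in zip(values, labels):
        parts.setdefault(v, []).append(l)
    expected = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - expected

age = ["<=30", "<=30", "31...40", ">40", ">40", ">40", "31...40",
       "<=30", "<=30", ">40", "<=30", "31...40", "31...40", ">40"]
buys = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

print(round(entropy(buys), 3))         # 0.94 -> Info(D) = 0.940 bits
print(round(info_gain(age, buys), 3))  # ~0.247 (0.246 with textbook rounding)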
Gini Index

The Gini index is used in CART. It measures the impurity of a data partition D as

Gini(D) = 1 - \sum_{i=1}^{m} p_i^2

where p_i is the probability that a tuple in D belongs to class C_i.

The Gini index considers a binary split for each attribute. If A has v possible values, then there are 2^v possible subsets.

For example, if income has three possible values, namely {low, medium, high}, then the possible subsets are {low, medium, high}, {low, medium}, {low, high}, {medium, high}, {low}, {medium}, {high}, and {}.

We exclude the full set, {low, medium, high}, and the empty set from consideration since, conceptually, they do not represent a split.

Therefore, there are 2^v - 2 possible ways to form two partitions of the data, D, based on a binary split on A.

If a binary split on A partitions D into D1 and D2, the Gini index of D given that partitioning is

Gini_A(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)
Repeat the same procedure to find the next splitting attribute if the partitions are not yet fully classified.
The attribute with the largest reduction in the Gini index, \Delta Gini(A) = Gini(D) - Gini_A(D), is selected as the splitting attribute.
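For concreteness, a brief sketch of these Gini computations in plain Python (the split counts below are read off the Buys_computer table: {low, medium} gives 10 tuples with 7 yes / 3 no, {high} gives 4 tuples with 2 yes / 2 no):

def gini(counts):
    # Gini(D) = 1 - sum(p_i^2) for the class counts of a partition
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(partitions):
    # Gini_A(D) for a binary split, given class counts of each partition
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)

d = [9, 5]                                     # 9 yes, 5 no
print(round(gini(d), 3))                       # 0.459
print(round(gini_split([[7, 3], [2, 2]]), 3))  # 0.443 for income in {low, medium} vs {high}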
Overfitting and Tree Pruning
Overfitting: An induced tree may overfit the training data
◦ Too many branches, some may reflect anomalies due to noise or
outliers
◦ Poor accuracy for unseen samples
Two approaches to avoid overfitting
◦ Prepruning: Halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
  ◦ Difficult to choose an appropriate threshold
◦ Postpruning: Remove branches from a “fully grown” tree to get a sequence of progressively pruned trees
  ◦ Use a set of data different from the training data to decide which is the “best pruned tree”
Bayesian Classification
Bayesian classifiers are statistical classifiers

They can predict class membership probabilities, such as the probability that a given tuple
belongs to a particular class

Bayesian classification is based on Bayes’ theorem

Studies comparing classification algorithms have found a simple Bayesian classifier known as the
naive Bayesian classifier to be comparable in performance with decision tree and selected neural
network classifiers.

Bayesian classifiers have also exhibited high accuracy and speed when applied to large
databases.

Bayesian Classification
Naïve Bayesian classifiers assume that the effect of an attribute value on a
given class is independent of the values of the other attributes. This
assumption is called class conditional independence

Bayesian belief networks are graphical models, which unlike naïve Bayesian
classifiers, allow the representation of dependencies among subsets of
attributes.
Bayesian belief networks can also be used for classification.

Bayesian Classification
Bayes’ Theorem:

Let X be a data tuple. In Bayesian terms, X is considered “evidence.”

 Let H be some hypothesis, such as that the data tuple X belongs to a specified class
C.

For classification problems, we want to determine P(H|X), the probability that the
hypothesis H holds given the “evidence” or observed data tuple X.

In other words, we are looking for the probability that tuple X belongs to class C, given
that we know the attribute description of X. P(H|X) is the posterior probability, or a
posteriori probability, of H conditioned on X.

P(H) is the prior probability, or a priori probability, of H.

The posterior probability, P(H|X), is based on more information than the prior probability, P(H), which is independent of X. Similarly, P(X|H) is the posterior probability of X conditioned on H.

Bayes’ theorem relates these probabilities:

P(H|X) = P(X|H) P(H) / P(X)
Naïve Bayesian Classification

Given a tuple X = (x1, x2, ..., xn), the naïve Bayesian classifier predicts the class Ci with the highest posterior probability P(Ci|X) = P(X|Ci) P(Ci) / P(X). Under the class conditional independence assumption,

P(X|Ci) = \prod_{k=1}^{n} P(x_k | Ci)
As P(X) is constant for all classes, only P(X|Ci)P(Ci) need be maximized. If
the class prior probabilities are not known, then it is commonly assumed that
the classes are equally likely, that is, P(C1) = P(C2) = ··· = P(Cm), and we
would therefore maximize P(X|Ci). Otherwise, we maximize P(X|Ci)P(Ci). Note
that the class prior probabilities may be estimated by P(Ci) = |Ci,D|/|D|,
where |Ci,D| is the number of training tuples of class Ci in D

Naïve Bayes Classifier: Training Dataset

Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data to be classified:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31...40  high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31...40  low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31...40  medium  no       excellent      yes
31...40  high    yes      fair           yes
>40      medium  no       excellent      no
P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357

Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|Ci):
P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci) P(Ci):
P(X|buys_computer = “yes”) P(buys_computer = “yes”) = 0.044 x 0.643 = 0.028
P(X|buys_computer = “no”) P(buys_computer = “no”) = 0.019 x 0.357 = 0.007

Since 0.028 > 0.007, the naïve Bayes classifier predicts buys_computer = “yes” for tuple X.
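The same hand calculation in a few lines of Python (a sketch that simply reproduces the counted probabilities above):

priors = {"yes": 9 / 14, "no": 5 / 14}
likelihoods = {
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit=fair": 6 / 9},
    "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5,
            "student=yes": 1 / 5, "credit=fair": 2 / 5},
}

scores = {}
for c, prior in priors.items():
    p = prior
    for attr_prob in likelihoods[c].values():
        p *= attr_prob               # class conditional independence
    scores[c] = p

print(scores)                        # {'yes': ~0.0282, 'no': ~0.0069}
print(max(scores, key=scores.get))   # 'yes'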
Bayesian Belief Networks
Bayesian belief networks specify joint conditional probability distributions.

They allow class conditional independencies to be defined between subsets of variables.

They provide a graphical model of causal relationships, on which learning can be


performed.

Trained Bayesian belief networks can be used for classification.

 Bayesian belief networks are also known as belief networks, Bayesian networks, and probabilistic networks.

A belief network is defined by two components: a directed acyclic graph and a set of conditional probability tables (CPTs).

Each node in the directed acyclic graph represents a random variable. The variables may be discrete or continuous-valued. They may correspond to actual attributes given in the data or to “hidden variables” believed to form a relationship.
Bayesian Belief Networks

Bayesian belief networks (also known as Bayesian networks, probabilistic networks) allow class conditional independencies between subsets of variables.

A belief network is a (directed acyclic) graphical model of causal relationships:
◦ Represents dependency among the variables
◦ Gives a specification of the joint probability distribution

Nodes: random variables
Links: dependency

Example: in a graph with arcs X -> Z, Y -> Z, and Y -> P, X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P. The graph has no loops/cycles.
Bayesian Belief Network: An Example

The network contains the variables FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, and Dyspnea, with FH and S as the parents of LC.

CPT: Conditional Probability Table for the variable LungCancer, given each combination of values of its parents:

       (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC     0.8      0.5       0.7       0.1
~LC    0.2      0.5       0.3       0.9

The CPT shows the conditional probability for each possible combination of the values of a node’s parents.

Derivation of the probability of a particular combination of values x1, ..., xn of X from the CPT:

P(x_1, ..., x_n) = \prod_{i=1}^{n} P(x_i | Parents(Y_i))

where Y_i is the variable corresponding to x_i.
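As a minimal illustration (plain Python, not from the slides), a CPT lookup for the LungCancer node might look like:

cpt_lc = {  # P(LC = yes | FH, S), taken from the table above
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

def p_lung_cancer(lc, fh, s):
    # Return P(LC = lc | FH = fh, S = s) from the CPT
    p_yes = cpt_lc[(fh, s)]
    return p_yes if lc else 1.0 - p_yes

print(p_lung_cancer(True, fh=True, s=False))   # 0.5
print(p_lung_cancer(False, fh=False, s=True))  # 0.3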
Rule-Based Classification

A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule has the form

IF condition THEN conclusion

The “IF” part is the rule antecedent or precondition; the “THEN” part is the rule consequent. An example is rule R1:

R1: IF age = youth AND student = yes THEN buys_computer = yes
How can we use rule-based classification to predict the class label of a given tuple, X? If a rule is satisfied by X, the rule is said to be triggered. For example, let

X = (age = youth, income = medium, student = yes, credit_rating = fair)

To classify X according to buys_computer: X satisfies R1, which triggers the rule.


If R1 is the only rule satisfied, then the rule fires by returning the class
prediction for X.

If more than one rule is triggered: need a conflict resolution strategy to


figure out which rule gets to fire and assign its class prediction to X.

possible strategies are

1. Size ordering

2. Rule ordering.

Approaches to Rule Generation


Rule Extraction from a Decision Tree

To extract rules from a decision tree, one rule is created for each path from the root to a leaf node. Each splitting criterion along a given path is logically ANDed to form the rule antecedent (“IF” part). The leaf node holds the class prediction, forming the rule consequent (“THEN” part).

For example, the leftmost path of the buys_computer tree above yields:
IF age = ‘<=30’ AND student = ‘no’ THEN buys_computer = ‘no’
Direct Extraction of Rules
Rule Induction Using a Sequential Covering Algorithm

IF-THEN rules can be extracted directly from the training data (i.e., without
having to generate a decision tree first) using a sequential covering algorithm.

The name comes from the notion that the rules are learned sequentially (one at
a time), where each rule for a given class will ideally cover many of the class’s
tuples (and hopefully none of the tuples of other classes).

Rules are learned one at a time. Each time a rule is learned, the tuples covered
by the rule are removed, and the process repeats on the remaining tuples.

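A toy Python sketch of the idea (illustrative only; learn_one_rule here greedily picks a single attribute-value test, which is far simpler than the full Learn_One_Rule procedure):

def learn_one_rule(data, target):
    # Pick the (attr, value) test whose covered tuples are most purely `target`
    best, best_score = None, 0.0
    for row, _ in data:
        for attr, value in row.items():
            covered = [l for r, l in data if r[attr] == value]
            if covered.count(target) > 0:
                score = covered.count(target) / len(covered)
                if score > best_score:
                    best, best_score = (attr, value), score
    return best

def sequential_covering(data, target):
    rules = []
    while any(l == target for _, l in data):
        rule = learn_one_rule(data, target)
        if rule is None:
            break
        rules.append(rule)
        attr, value = rule
        # Remove the tuples covered by the new rule and repeat
        data = [(r, l) for r, l in data if r[attr] != value]
    return rules

data = [
    ({"student": "yes", "credit": "fair"}, "buys"),
    ({"student": "yes", "credit": "excellent"}, "buys"),
    ({"student": "no",  "credit": "fair"}, "no"),
    ({"student": "no",  "credit": "excellent"}, "no"),
]
print(sequential_covering(data, "buys"))  # [('student', 'yes')]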
Metrics for Evaluating Classifier Performance
True positives (TP): the positive tuples that were correctly labeled by the classifier.

True negatives (TN): the negative tuples that were correctly labeled by the classifier.

False positives (FP): the negative tuples that were incorrectly labeled as positive (e.g., tuples of class buys_computer = no for which the classifier predicted buys_computer = yes).

False negatives (FN): the positive tuples that were mislabeled as negative (e.g., tuples of class buys_computer = yes for which the classifier predicted buys_computer = no).
Class imbalance problem: the main class of interest is rare; the data set distribution reflects a significant majority of the negative class and a minority positive class. For example, in fraud detection applications, the class of interest (or positive class) is “fraud,” which occurs much less frequently than the negative “nonfraudulent” class.

For imbalanced data, the sensitivity and specificity measures can be used:

sensitivity = TP / P   (true positive rate, also called recall)
specificity = TN / N   (true negative rate)

where P and N are the numbers of positive and negative tuples, respectively. Precision, the fraction of tuples labeled positive that actually are positive, is TP / (TP + FP).
An alternative way to use precision and recall is to combine them into a single measure. This is the approach of the F measure (also known as the F1 score or F-score) and the F_beta measure:

F = (2 x precision x recall) / (precision + recall)

F_beta = ((1 + beta^2) x precision x recall) / (beta^2 x precision + recall)
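A hedged scikit-learn sketch of these metrics on illustrative labels (1 = positive class; the label vectors are made up for the example):

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                   # 3 4 2 1
print(accuracy_score(y_true, y_pred))   # (TP+TN)/(P+N) = 0.7
print(recall_score(y_true, y_pred))     # sensitivity = TP/P = 0.75
print(tn / (tn + fp))                   # specificity = TN/N ~ 0.667
print(precision_score(y_true, y_pred))  # TP/(TP+FP) = 0.6
print(f1_score(y_true, y_pred))         # F measure ~ 0.667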
Support Vector Machines

Support Vector Machines

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.

Picture two different categories that are classified using a decision boundary, or hyperplane.
Support Vector Machines - Example

Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm.

We first train our model with lots of images of cats and dogs so that it can learn their different features, and then test it with this strange creature.

The support vector machine creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each class.

On the basis of the support vectors, it will classify the creature as a cat.
Types of SVM

SVM can be of two types:

Linear SVM:
Linear SVM is used for linearly separable data: if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a linear SVM classifier.

Non-linear SVM:
Non-linear SVM is used for non-linearly separable data: if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm

Hyperplane:

There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features, the hyperplane is a straight line; if there are 3 features, the hyperplane is a 2-dimensional plane.

SVM creates the hyperplane with maximum margin, i.e., the maximum distance between the data points of the two classes.
Hyperplane and Support Vectors in the SVM algorithm

Support Vectors:

The data points or vectors that are closest to the hyperplane and which affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
How does SVM work?

Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has two tags (green and blue), and the dataset has two features, x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue.
Since it is a 2-D space, just by using a straight line we can easily separate these two classes. But multiple lines can separate them.
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane.

The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors.

The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.

The hyperplane with maximum margin is called the optimal hyperplane.
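A minimal scikit-learn sketch of a linear SVM on toy 2-D data (an illustration, not from the slides; the points are made up):

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 0.5],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])          # two tags, e.g. "blue" vs "green"

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)               # points closest to the hyperplane
print(clf.coef_, clf.intercept_)          # w and b of the hyperplane w.x + b = 0
print(clf.predict([[3.0, 2.0]]))          # classify a new point -> [0]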
How does SVM work?

Non-Linear SVM:
If data is linearly arranged, we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line.
To separate non-linear data points, we add a third dimension z, calculated as z = x^2 + y^2. With this added dimension, the data become linearly separable in 3-D space.
SVM now divides the dataset into classes with a separating plane in the 3-D space. Since we are in 3-D space, the boundary looks like a plane parallel to the x-axis. Converting it back to 2-D space with z = 1, the boundary becomes a circumference of radius 1 around the inner class of the non-linear data.
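In practice the extra dimension need not be added by hand; a kernel SVM learns the circular boundary directly. A hedged sketch on synthetic circular data:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)  # inside unit circle = 1

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))                        # close to 1.0 on this toy data
print(clf.predict([[0.1, 0.2], [1.8, 1.5]]))  # -> [1 0]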
SVM – Linear: a worked numerical example of constructing the maximum-margin hyperplane (presented graphically in the original slides).
Regression
 Regression analysis is a statistical method that helps us to analyze and
understand the relationship between two or more variables of interest
 Used to estimate the relationship between variables
 Predict the value of one variable (dependent variable) on the basis of
other variables (Independent variable)

Variables

Dependent variable: the variable that we are trying to understand or forecast, e.g., What is a person’s expected income? What is the probability that an applicant will default on a loan?

Independent variables: the factors that influence the analysis or target variable and provide us with information regarding the relationship of the variables with the target variable.
Types of Regression
Simple Regression

Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y. The dependent variable considered here is always a continuous variable. It is a predictive model used for finding the linear relationship between a dependent variable and one independent variable.
Multiple Regression

If the independent variables used to explain the dependent variable are multiple in number, then it is called multiple regression.
Linear Regression

Linear regression is the simplest form of regression, in which the data are modelled using a straight line:

y = α + β x

where α and β are regression coefficients specifying the y-intercept and slope of the line, respectively; y is the response variable and x is a predictor variable.

The coefficients can be estimated by the method of least squares:

β = Σ (x_i - x̄)(y_i - ȳ) / Σ (x_i - x̄)^2,    α = ȳ - β x̄
Example

Example 2: You have to examine the relationship between the age and price of used cars sold in the last year by a car dealership company. Find the price in dollars when the car age is 15 years.
Example

Find a linear regression equation for the following two sets of data:

x: 2  4  6  8
y: 3  7  5  10


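A quick NumPy check of the least-squares formulas on this data (the fitted line works out to y = 1.5 + 0.95x):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([3.0, 7.0, 5.0, 10.0])

beta = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
alpha = y.mean() - beta * x.mean()
print(round(alpha, 2), round(beta, 2))  # 1.5 0.95  ->  y = 1.5 + 0.95x

# np.polyfit returns the same line (highest-degree coefficient first)
print(np.polyfit(x, y, 1))              # [0.95 1.5]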
Logistic Regression
What is logistic regression?

Logistic regression is an example of supervised learning. It is used to calculate or predict the probability of a binary (yes/no) event occurring.

An example of logistic regression could be applying machine learning to determine whether a person is likely to be infected with COVID-19 or not.
In the example, the probability of a person being infected with COVID-19 could be based on the viral load, the symptoms, and the presence of antibodies, etc. Viral load, symptoms, and antibodies would be our factors (independent variables), which would influence our outcome (dependent variable).

In linear regression, the outcome is continuous and can be any possible value. In logistic regression, however, the predicted outcome is discrete and restricted to a limited set of values.
The three types of logistic regression

1. Binary logistic regression - when we have two possible outcomes, like our original example of whether a person is likely to be infected with COVID-19 or not.

2. Multinomial logistic regression - when we have multiple outcomes, say if we build out our original example to predict whether someone may have the flu, an allergy, a cold, or COVID-19.

3. Ordinal logistic regression - when the outcome is ordered, like if we build out our original example to also help determine the severity of a COVID-19 infection, sorting it into ordered severity levels.
Logistic vs Linear

Logistic regression is much like linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.


Logistic Regression
 It is used for predicting the categorical dependent variable using a given
set of independent variables.

Logistic Regression

Produces a result in a binary format, which is used to predict the outcome of a categorical dependent variable. The outcome is discrete/categorical, for example:
To predict whether an email is spam (1) or not (0)
Whether a tumor is malignant (1) or not (0)
Logistic Regression

In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function, which predicts two maximum values (0 or 1).

The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.

Logistic regression can provide probabilities and classify new data using both continuous and discrete datasets. (The “S”-shaped curve is the sigmoid curve.)
Sigmoid Function

The sigmoid function is a mathematical function used to map predicted values to probabilities. It maps any real value to a value within the range 0 to 1:

σ(z) = 1 / (1 + e^(-z))

The value of the logistic regression must be between 0 and 1 and cannot go beyond this limit, so it forms an “S”-shaped curve. The S-form curve is called the sigmoid function or the logistic function.

A threshold value defines the probability of either 0 or 1: values above the threshold tend to 1, and values below the threshold tend to 0.
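A tiny sketch of the sigmoid mapping (NumPy):

import numpy as np

def sigmoid(z):
    # Maps any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-6.0, 0.0, 6.0])))  # ~[0.0025 0.5 0.9975]
# With a threshold of 0.5: values above 0.5 -> class 1, below -> class 0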
Assumptions for Logistic Regression

The dependent variable must be categorical in nature.

The independent variables should not exhibit multicollinearity.
Logistic Regression Equation

The logistic regression equation can be obtained from the straight-line equation:

y = b0 + b1 x        (range: -infinity to +infinity)

Since a probability p must lie between 0 and 1, we work with the odds, p / (1 - p) (range: 0 to +infinity), and take their logarithm:

log(p / (1 - p)) = b0 + b1 x

Solving for p gives the logistic form, whose range is 0 to 1:

p = 1 / (1 + e^(-(b0 + b1 x)))
glm() function

In R, a logistic regression model is fitted with glm(formula, data, family), where:
◦ formula is the symbol presenting the relationship between the variables,
◦ data is the data set giving the values of these variables,
◦ family is an R object used to specify the details of the model; its value is binomial for logistic regression.
Types of Logistic Regression

Binomial: there can be only two possible types of the dependent variable, e.g., 0 or 1, Pass or Fail.

Multinomial: there can be 3 or more possible unordered types of the dependent variable, e.g., "cat", "dogs", or "sheep".

Ordinal: there can be 3 or more possible ordered types of the dependent variable, e.g., "low", "medium", or "high".
Implementing logistic regression
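The original slides implement this in R via glm(); as a hedged Python equivalent, a minimal scikit-learn sketch on made-up data (hours studied vs. pass/fail):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # fail (0) / pass (1)

model = LogisticRegression().fit(X, y)
print(model.predict([[1.2], [3.2]]))      # -> [0 1]
print(model.predict_proba([[2.2]]))       # probabilities for class 0 and 1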
Confusion Matrix

A confusion matrix is a table that is often used to describe the performance of a model (or “classifier”) on a set of test data for which the true values are known. The matrix compares the actual target values with those predicted by the machine learning model.
Techniques to Improve Classification Accuracy

Ensemble Method:

An ensemble for classification is a composite model, made up of a combination of classifiers. The individual classifiers vote, and a class label prediction is returned by the ensemble based on the collection of votes. Ensembles tend to be more accurate than their component classifiers.

The main principle behind the ensemble model is that a group of weak learners come together to form a strong learner.
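A brief scikit-learn sketch of majority voting by an ensemble of heterogeneous learners (illustrative data generated with make_classification):

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier([
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=1000)),
])  # majority ("hard") voting by default
print(ensemble.fit(X_tr, y_tr).score(X_te, y_te))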
Thank You
