CP4252 ML UNIT-III

The document discusses various machine learning techniques, focusing on unsupervised learning, clustering validity, recommendation systems, the EM algorithm, reinforcement learning, and model-based learning. It outlines the principles, advantages, disadvantages, and applications of these methods, emphasizing the importance of clustering validity and the role of feedback in reinforcement learning. Additionally, it highlights different types of recommendation systems and their real-world applications.

Uploaded by

SMILEY FF

UNIT – III

1].Unsupervised Learning

Unsupervised learning is a type of machine learning in which models are trained on an unlabelled dataset and left to find patterns in that data without any supervision.
The goal of unsupervised learning is to find the underlying structure of a dataset, group the data according to similarities, and represent the dataset in a compressed format.

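Unsupervised Algorithms:
1. K-means clustering
2. KNN (k-nearest neighbours); note that k-NN classification itself is supervised, and only the underlying nearest-neighbour search works without labels
3. Hierarchical clustering
4. Anomaly detection
5. Neural Networks
6. Principal Component Analysis
7. Apriori algorithm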
Working of Unsupervised Learning Algorithm
• We start with unlabelled input data, which means it is not categorized and the corresponding outputs are not given.
• This unlabelled input data is fed to the machine learning model in order to train it.
• The model first interprets the raw data to find hidden patterns, applying a suitable algorithm such as k-means clustering or hierarchical clustering.
• The chosen algorithm then divides the data objects into groups according to the similarities and differences between the objects.

Why use Unsupervised Learning

1. Unsupervised learning helps to extract useful insights from data.
2. It finds the hidden structure of data.
3. It works on unlabelled and uncategorized data, which is usually far more plentiful than labelled data.
4. In the real world, we do not always have input data with the corresponding output; such cases call for unsupervised learning.
Types of unsupervised machine learning:

Type 1: Clustering:
• Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group.
• Cluster analysis finds the commonalities between the data objects and categorizes them according to the presence or absence of those commonalities.
Type 2: Association:
• Association rule learning is an unsupervised learning method used for finding relationships between variables in a large database.
• It determines the sets of items that occur together in the dataset.
• Association rules can make a marketing strategy more effective.

Example:
1. Market Basket Analysis: people who buy item X (say, bread) also tend to purchase item Y (butter or jam).
2. Statistical data analysis
3. Social network analysis
4. Image segmentation
5. Anomaly detection
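
The bread and butter rule from the market basket example can be made concrete with two standard association-rule measures, support and confidence. The sketch below is only an illustration: the transactions and the helper names `support` and `confidence` are hypothetical and do not come from the notes.

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

bread_support = support({"bread"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(bread_support, rule_confidence)
```

A rule such as "bread → butter" is usually kept only if both its support and its confidence exceed thresholds chosen by the analyst.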
Advantages & Disadvantages:
Advantages of Unsupervised Learning:

• Unsupervised learning can be applied to more complex tasks than supervised learning because it does not need labelled input data.
• Unsupervised learning is often preferable because unlabelled data is easier to obtain than labelled data.

Disadvantages:
• Unsupervised learning is intrinsically more difficult than supervised learning because there is no corresponding output to learn from.
• The result of an unsupervised learning algorithm may be less accurate because the input data is not labelled and the algorithm does not know the exact output in advance.

2].Clustering validity

• Clustering validity refers to methods for evaluating the quality of clustering results, most often when no ground truth (i.e., class labels) is available.
• The goal is to measure how well the data has been clustered, i.e., how coherent the clusters are and how well separated they are from each other.
Why is it Important?
• In unsupervised learning, we don’t have labeled data.
• Hence, we need internal or external methods to validate how “good” our
clustering is.
• Helps in:
o Choosing the number of clusters (k)
o Comparing different clustering algorithms
o Avoiding overfitting or underfitting

Types of Clustering Validity Indices :

1. Internal Validation
Measures how well the clusters are formed based on:
• Compactness: how close data points are within a cluster
• Separation: how far apart clusters are from each other

Common Internal Indices:

Metric | Purpose | Good when...
Silhouette Coefficient | Measures both cohesion & separation | Value close to 1
Davies–Bouldin Index | Lower value = better clustering | Value close to 0
Dunn Index | High = well-separated & compact | Higher is better
Within-Cluster SSE | Sum of squared distances within cluster | Lower is better

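Silhouette Score Formula
For a point i:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

Where:
• a(i) : average distance from i to the other points in its own cluster (intra-cluster distance)
• b(i) : average distance from i to the points of the nearest other cluster
• s(i) ∈ [−1,1] ; close to 1 means better fit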
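
To make the definitions of a(i) and b(i) concrete, here is a minimal sketch that computes s(i) = (b(i) − a(i)) / max(a(i), b(i)) for every point. It assumes Euclidean distance, at least two clusters, and the common convention that a point alone in its cluster gets s(i) = 0; the toy points and the function name `silhouette` are illustrative assumptions.

```python
import math

def silhouette(points, labels):
    """Return s(i) for every point, using Euclidean distance."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Collect distances from point i to all other points, grouped by cluster.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        if own not in dists:  # point is alone in its cluster
            scores.append(0.0)
            continue
        a = sum(dists[own]) / len(dists[own])
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != own)
        scores.append((b - a) / max(a, b))
    return scores

# Hypothetical toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
scores = silhouette(points, [0, 0, 1, 1])
print([round(s, 3) for s in scores])
```

Averaging the per-point values gives the silhouette score of the whole clustering.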

2. External Validation
Compares clustering against known ground truth (if available).

Common External Indices:

Metric | Description
Rand Index (RI) | Measures similarity between predicted and actual clusterings
Adjusted Rand Index (ARI) | Rand Index adjusted for chance
F-Measure | Combines precision and recall
Jaccard Coefficient | Overlap between true and predicted clusters

External validation is only possible if ground truth labels are available.
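
As an illustration of an external index, the Rand Index can be computed by counting the pairs of points on which the two labelings agree (both put the pair in the same cluster, or both put it in different clusters). This is a minimal sketch with hypothetical label lists; it is not adjusted for chance.

```python
from itertools import combinations

def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree."""
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        agree += same_true == same_pred
        pairs += 1
    return agree / pairs

# Cluster names do not matter, only which points are grouped together.
ri_same_grouping = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ri_mixed_grouping = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(ri_same_grouping, ri_mixed_grouping)
```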

3. Relative Validation
• Compares multiple clustering results to choose the best one.
• Often used when tuning the number of clusters k (as in K-Means).

How to Perform Clustering Validation in Practice

1. Run the clustering algorithm (e.g., K-Means).
2. Compute internal validity scores (e.g., Silhouette, DBI).
3. Repeat for different numbers of clusters (k).
4. Choose the k that gives the best validation score.

Example Use in K-Means

• Run for k=2 to k=10
• For each k , compute:
o Silhouette Score
o Inertia (Within-cluster SSE)
• Plot and choose the "elbow point" or highest silhouette.
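
The procedure above can be sketched in plain Python for the inertia part. This is a simplified illustration, not a reference implementation: the toy points are hypothetical, and the centroids are initialised with a deterministic farthest-point rule (an assumption made here to avoid random seeds) rather than the random or k-means++ starts used by most libraries.

```python
import math

# Hypothetical toy data: three well-separated groups of 2-D points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]

def init_centroids(points, k):
    """Deterministic farthest-point start (an assumption, see lead-in)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def kmeans_inertia(points, k, iters=50):
    """Run K-Means and return the within-cluster SSE (inertia)."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
for k, sse in inertias.items():
    print(k, round(sse, 2))
```

On this toy data the inertia drops sharply up to k = 3 and then flattens, which is the "elbow" one would pick.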
3].Recommendation System

A recommendation system predicts user preferences and suggests items that the user is likely to interact with, purchase, or enjoy, such as:
• Products (Amazon)
• Movies (Netflix)
• Music (Spotify)
• Friends (Facebook)
• Ads (Google)

Core Idea
Recommendation systems analyze past behavior (likes, views, purchases) and
use data mining + machine learning to:
• Recommend similar items (item-based)
• Recommend based on similar users (user-based)
• Predict user ratings or preferences
Types of Recommendation Systems
1. Content-Based Filtering
• Recommends items similar to those the user liked before
• Based on item features and user profiles
• Uses:
o Keywords
o Categories
o Tags
Example:
If you watched Inception, it recommends Interstellar (same genre/director).

2. Collaborative Filtering
• Based on user-item interaction matrix
• Two types:
o User-based: Find users with similar tastes
o Item-based: Find items similar to what the user liked
Example:
"Users who watched Avengers also watched Iron Man"

3. Hybrid Methods
• Combine content-based and collaborative filtering
• Improves accuracy, handles cold-start problems better
Used in:
Netflix, Amazon, YouTube
4. Model-Based Filtering
• Use machine learning models to predict ratings or preferences
• Techniques:
o Matrix factorization (e.g., SVD)
o Deep learning (autoencoders, neural CF)
o Clustering models
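
The item-based collaborative filtering idea ("users who watched Avengers also watched Iron Man") can be sketched with cosine similarity between the columns of a user-item rating matrix. The ratings below are made up for illustration, and treating an unrated item as 0 is a simplifying assumption that real systems usually refine.

```python
import math

# Hypothetical ratings: one row per user, one column per item; 0 = not rated.
items = ["Avengers", "Iron Man", "Titanic"]
ratings = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
    [5, 5, 0],
]

def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def item_vector(j):
    """All users' ratings for item j (one column of the matrix)."""
    return [row[j] for row in ratings]

def most_similar(item):
    """Return the other item whose rating column is closest to this one."""
    j = items.index(item)
    scored = [
        (cosine(item_vector(j), item_vector(k)), items[k])
        for k in range(len(items)) if k != j
    ]
    return max(scored)[1]

print(most_similar("Avengers"))
```

A user-based variant would compare rows (users) instead of columns (items) with the same similarity function.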

Real-World Applications :
Platform | Recommendation
Amazon | Product recommendations
Netflix | Movies & series suggestions
Spotify | Music playlists
YouTube | Video recommendations
Facebook/Instagram | People to follow, content feed
Google Ads | Ad targeting based on preferences

4]. EM ALGORITHM

• In real-world applications of machine learning, it is very common that many relevant features exist but only a small subset of them is observable.
• The Expectation-Maximization (EM) algorithm can be used to estimate model parameters in the presence of latent variables (variables that are not directly observable and are instead inferred from the values of the other observed variables).
• This algorithm is the basis for many unsupervised clustering algorithms in machine learning.

Let us understand the EM algorithm in detail.

• Initially, a set of starting values for the parameters is chosen. A set of incomplete observed data is given to the system, with the assumption that the observed data comes from a specific model.
• The next step is the "Expectation" step, or E-step. In this step, we use the observed data and the current parameters to estimate (guess) the values of the missing or incomplete data. It is basically used to update the variables.
• The next step is the "Maximization" step, or M-step. In this step, we use the complete data generated in the preceding E-step to update the values of the parameters. It is basically used to update the hypothesis.
• In the fourth step, we check whether the values are converging. If they are, stop; otherwise repeat the E-step and the M-step until convergence occurs.
Algorithm:
1. Given a set of incomplete data, choose a set of starting parameters.
2. Expectation step (E-step): using the observed data and the current parameters, estimate (guess) the values of the missing data.
3. Maximization step (M-step): use the complete data generated by the E-step to update the parameters.
4. Repeat steps 2 and 3 until convergence.
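
As one possible instance of these steps, the sketch below fits a mixture of two one-dimensional Gaussians. Here the "missing data" is which component generated each point, and the E-step estimates it as a soft responsibility. The data, the starting means and the function names are hypothetical; the variance floor of 1e-6 and the stopping rule based on the change in the means are simplifying assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, init_means, max_iters=100, tol=1e-9):
    """EM for a two-component 1-D Gaussian mixture."""
    means = list(init_means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(max_iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        old_means = list(means)
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # floor to keep the variance strictly positive
            )
            weights[k] = nk / len(data)
        # Convergence check: stop when the means no longer move.
        if max(abs(m - o) for m, o in zip(means, old_means)) < tol:
            break
    return means, variances, weights

# Hypothetical data drawn around 1.0 and 5.0.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_two_gaussians(data, init_means=(0.0, 6.0))
print([round(m, 3) for m in means], [round(w, 3) for w in weights])
```

Different starting means can lead to a different solution, which reflects the local-optimum behaviour of EM.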

Usage of EM algorithm :
• It can be used to fill the missing data in a sample.
• It can be used as the basis of unsupervised learning of clusters.
• It can be used for the purpose of estimating the parameters of Hidden
Markov Model (HMM).
• It can be used for discovering the values of latent variables.
Advantages of EM algorithm :
• The likelihood is guaranteed not to decrease from one iteration to the next.
• The E-step and M-step are often fairly easy to implement for many problems.
• Solutions to the M-step often exist in closed form.

Disadvantages of EM algorithm :
• Convergence can be slow.
• It is only guaranteed to converge to a local optimum.
• It requires both the probabilities, forward and backward (numerical optimization requires only forward probability).

5].Reinforcement Learning

Introduction to Reinforcement Learning

• Reinforcement Learning is a feedback-based machine learning approach in which an agent learns which actions to perform by observing the environment and the results of its actions.
• For each correct action, the agent gets positive feedback, and for each incorrect action, the agent gets negative feedback or a penalty.
• The agent interacts with the environment and identifies the possible actions it can perform.
• The primary goal of an agent in reinforcement learning is to choose actions, based on the environment, that earn the maximum positive reward.
• In Reinforcement Learning, the agent learns automatically from feedback without any labelled data, unlike supervised learning.
• Since there is no labelled data, the agent is bound to learn from its experience only.
• Reinforcement Learning is used to solve a specific type of problem in which decision making is sequential and the goal is long-term, such as game playing, robotics, etc.

There are two types of reinforcement learning: positive and negative.

• Positive reinforcement learning is a recurrence of behaviour due to positive rewards.
• Rewards increase the strength and frequency of a specific behaviour.
• This encourages the agent to execute similar actions that yield the maximum reward.
• Similarly, in negative reinforcement learning, negative rewards are used as a deterrent to weaken a behaviour and to avoid it.
• Negative rewards decrease the strength and frequency of a specific behaviour.
• In a maze game, there may be a danger spot that leads to a loss.
• Negative rewards can be designed for such spots so that the agent does not visit them.
• Positive and negative rewards are simulated in reinforcement learning, say +10 for a positive reward and -10 for some danger or negative reward.
• Reinforcement learning is sometimes loosely described as semi-supervised, because reward signals give only partial feedback, but it is generally treated as a separate learning paradigm; it is used to model sequential decision-making processes.
Scope of reinforcement learning :

• What are the situations where reinforcement learning can be used?

• Consider the grid game shown in the figure, where a robot can move.
• Assume the starting node is E and the goal node is G; the game is about finding the shortest path from the start state to the goal state.
• Consider another grid game, as shown in the figure.
• In this grid game, the grey tile indicates danger, the black tile is a block, and the tile with diagonal lines is the goal.
• The aim is to start from, say, the bottom-left tile and use the actions left, right, up and down to reach the goal state.
• Reinforcement learning is highly suitable for solving problems like this, especially those with uncertainty.
• Reinforcement learning is not suitable for environments where complete information is available.
• For example, problems like object detection, face recognition and fraud detection are better solved using a classifier than by reinforcement learning.
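
A grid game of this kind can be sketched with tabular Q-learning, using +10 for the goal and -10 for the danger tile as suggested above. Everything specific here is an assumption made for illustration: the 3×3 layout, the positions of the goal and danger tiles, the omission of a block tile, and the values of GAMMA and ALPHA. To keep the example deterministic, the Q-learning update is applied to every state-action pair in repeated sweeps instead of letting the agent explore with random (for example epsilon-greedy) moves.

```python
ROWS, COLS = 3, 3
START, GOAL, DANGER = (2, 0), (0, 2), (1, 1)   # hypothetical layout
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA, ALPHA = 0.9, 0.5                        # discount factor, learning rate

def step(state, action):
    """Deterministic environment: returns (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), ROWS - 1)  # moving off the grid keeps the agent in place
    col = min(max(state[1] + dc, 0), COLS - 1)
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 10, True
    if nxt == DANGER:
        return nxt, -10, True
    return nxt, 0, False

states = [(r, c) for r in range(ROWS) for c in range(COLS)]
Q = {(s, a): 0.0 for s in states for a in ACTIONS}

for _sweep in range(200):
    for s in states:
        if s in (GOAL, DANGER):
            continue  # terminal tiles have no outgoing moves
        for a in ACTIONS:
            nxt, reward, done = step(s, a)
            best_next = 0.0 if done else max(Q[(nxt, b)] for b in ACTIONS)
            # Q-learning update: move Q(s, a) towards reward + gamma * max Q(s', .)
            Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

def greedy_path(start, max_steps=20):
    """Follow the highest-valued action from each state until a terminal tile."""
    path, s = [start], start
    for _move in range(max_steps):
        action = max(ACTIONS, key=lambda act: Q[(s, act)])
        s, _reward, done = step(s, action)
        path.append(s)
        if done:
            break
    return path

path = greedy_path(START)
print(path)
```

The learned values steer the agent around the danger tile because any move into it carries the -10 penalty, while moves towards the goal inherit a discounted share of the +10 reward.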
6]. Model-Based Learning

• Model-based learning refers to a class of machine learning approaches that involve building a mathematical or probabilistic model of the data and using that model to make predictions or decisions.
• A model captures the underlying patterns or relationships in the training data and can generalize to make predictions on new, unseen data.

Key Components :

Component | Description
Model | A mathematical function f(x,θ) with parameters θ
Learning | Estimating the best values for θ from training data
Prediction | Using the learned model to predict outputs for new inputs

How It Works:
1. Choose a model structure (e.g., linear, polynomial, probabilistic)
2. Train the model by estimating parameters from the data
3. Optimize model performance using a loss/cost function
4. Evaluate using validation data
5. Predict on new inputs using the trained model

Examples of Model-Based Learning :

Algorithm | Model Type | Notes
Linear Regression | Linear model | Predicts continuous output
Logistic Regression | Logistic (sigmoid) model | Predicts probabilities (binary class)
Naive Bayes | Probabilistic model | Assumes feature independence
Decision Trees | Rule-based model | Uses tree structure to model splits
Neural Networks | Non-linear parametric | Learns complex patterns

Advantages of Model-Based Learning
• Easy to interpret (especially linear models)
• Can generalize well to unseen data
• Theoretically sound (based on statistical principles)
• Flexible — supports both classification and regression

Disadvantages
• Requires correct model assumptions
• May overfit or underfit if model is too complex or too simple
• Sensitive to outliers and noise

Model-Based vs Instance-Based Learning

Feature | Model-Based Learning | Instance-Based Learning
Learning Approach | Builds abstract model | Memorizes training instances
Examples | Linear Regression, SVM, Naive Bayes | k-NN, RBF networks
Generalization | Uses fitted model to predict | Compares new input to saved examples
Computation | Fast at prediction time | Slow at prediction (computes distances)
Training Time | Longer | Almost none (lazy learning)

Example (Linear Regression)

Given data:
x = hours studied, y = exam score
Model:
y = θ0 + θ1 x
Training:
Use gradient descent to learn θ0 and θ1
Prediction:
Given x = 5, predict the exam score.
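
The linear regression example can be sketched as batch gradient descent on the mean squared error. The four data points are hypothetical (they lie exactly on score = 40 + 10 × hours, so the expected fit is easy to check), and the learning rate and number of epochs are assumptions picked to converge on this small dataset.

```python
# Hypothetical data: exam score = 40 + 10 * hours studied, with no noise.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [50.0, 60.0, 70.0, 80.0]

def fit(x, y, lr=0.05, epochs=5000):
    """Learn theta0 and theta1 for y = theta0 + theta1 * x by gradient descent."""
    theta0, theta1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
        # Gradients of the mean squared error with respect to each parameter.
        grad0 = 2 * sum(errors) / n
        grad1 = 2 * sum(e * xi for e, xi in zip(errors, x)) / n
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1

theta0, theta1 = fit(hours, scores)
predicted = theta0 + theta1 * 5  # predicted exam score for x = 5 hours
print(round(theta0, 3), round(theta1, 3), round(predicted, 3))
```

With real, noisy data the fitted line would not pass through every point, and the learning rate usually needs tuning or the features need scaling.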
