CP4252 ML UNIT-III
1].Unsupervised Learning
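Unsupervised Learning Algorithms:
1. K-means clustering
2. KNN (k-nearest neighbours); note that KNN classification itself is supervised, although nearest-neighbour search is also used inside unsupervised methods
3. Hierarchical clustering
4. Anomaly detection
5. Neural Networks (e.g., autoencoders)
6. Principal Component Analysis
7. Apriori algorithm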
Working of Unsupervised Learning Algorithm
• We start with unlabelled input data, meaning it is not categorized and no
corresponding outputs are given.
• This unlabelled input data is fed to the machine learning model in
order to train it.
• First, the model interprets the raw data to find hidden patterns, and then
a suitable algorithm is applied, such as k-means clustering or
hierarchical clustering.
• Once applied, the algorithm divides the data objects into groups
according to the similarities and differences between the objects.
Type 1: Clustering:
• Clustering is a method of grouping objects into clusters such that
objects with the most similarities remain in one group and have few or no
similarities with the objects of another group.
• Cluster analysis finds the commonalities between the data objects and
categorizes them according to the presence or absence of those
commonalities.
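The grouping idea above can be illustrated with a minimal k-means sketch (Lloyd's algorithm) in plain Python. The toy points, the function name `kmeans`, and the choice to initialise the centroids with the first k points are assumptions made for this illustration; practical implementations usually use random or k-means++ initialisation.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    # Assumption for reproducibility: the first k points are the initial centroids.
    centroids = list(points[:k])
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

# Toy data: two visually obvious groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Points near (1, 1) end up in one cluster and points near (8, 8) in the other, which matches the "similar objects remain in one group" description.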
Type 2: Association:
• Association rule learning is an unsupervised learning method used for
finding relationships between variables in a large database.
• It determines the sets of items that occur together in the dataset.
• Association rules can make marketing strategy more effective.
Examples:
1. Market basket analysis: people who buy item X (say, bread) also tend to
purchase item Y (butter or jam).
2. Statistical data analysis
3. Social network analysis
4. Image segmentation
5. Anomaly detection
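For the market basket case, the strength of a rule such as bread → butter is commonly measured by support and confidence. The sketch below computes both over a small, made-up list of transactions; the data and the function names are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "jam"},
    {"milk"},
    {"bread", "butter", "milk"},
]

sup = support(transactions, {"bread", "butter"})        # 3 of 5 baskets
conf = confidence(transactions, {"bread"}, {"butter"})  # 3 of the 4 bread baskets
print(sup, conf)
```

Algorithms such as Apriori search for all itemsets whose support exceeds a chosen threshold and then keep the rules with high confidence.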
Advantages & Disadvantages :
Advantages of Unsupervised Learning:
Disadvantages:
• Unsupervised learning is intrinsically more difficult than supervised
learning because the data has no corresponding outputs.
• The result of an unsupervised learning algorithm might be less accurate
because the input data is not labelled and the algorithm does not know
the exact output in advance.
2].Clustering validity
1. Internal Validation
Measures how well the clusters are formed based on:
• Compactness: how close data points are within a cluster
• Separation: how far apart clusters are from each other
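Both are combined in the silhouette coefficient of a point i:
$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$
Where:
• a(i): average distance from i to the other points in its own cluster (intra-cluster distance)
• b(i): average distance from i to the points of the nearest other cluster
• s(i) ∈ [−1, 1]; values close to 1 mean a better fit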
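Assuming a(i), b(i) and s(i) above refer to the silhouette coefficient, whose standard definition is s(i) = (b(i) − a(i)) / max(a(i), b(i)), it can be computed directly from a labelled set of points. The following sketch uses Euclidean distance and toy data; it assumes at least two clusters and no duplicate points, and it sets s(i) = 0 for singleton clusters, which is one common convention.

```python
import math

def silhouette(points, labels):
    """Return the silhouette coefficient s(i) of every point."""
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        # Collect distances from point i, grouped by the other point's cluster.
        dists = {}
        for j, (q, lj) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(lj, []).append(math.dist(p, q))
        if li not in dists:  # point i is alone in its cluster
            scores.append(0.0)
            continue
        a = sum(dists[li]) / len(dists[li])              # compactness
        b = min(sum(d) / len(d)                          # separation
                for l, d in dists.items() if l != li)
        scores.append((b - a) / max(a, b))
    return scores

points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = silhouette(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette(points, [0, 1, 0, 1])   # grouping that cuts across the gap
print(sum(good) / 4, sum(bad) / 4)
```

The natural grouping gives a mean score close to 1, while the poor grouping gives a negative mean, which is how the measure rewards compact, well-separated clusters.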
2. External Validation
Compares clustering against known ground truth (if available).
Metric | Description
Rand Index (RI) | Measures similarity between predicted and actual clusterings
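As a small illustration, the Rand Index can be computed by counting the pairs of points on which two clusterings agree, meaning both put the pair in the same cluster or both put it in different clusters. The label lists below are made up for the example.

```python
from itertools import combinations

def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two clusterings agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        agree += same_true == same_pred
        total += 1
    return agree / total

truth = [0, 0, 1, 1]
print(rand_index(truth, [1, 1, 0, 0]))  # 1.0, cluster names do not matter
print(rand_index(truth, [0, 0, 0, 1]))  # 0.5
```

Because only pair membership is compared, relabelling the clusters leaves the score unchanged.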
Recommendation systems aim to predict user preferences and suggest items that
the user is likely to interact with, purchase, or enjoy, such as:
• Products (Amazon)
• Movies (Netflix)
• Music (Spotify)
• Friends (Facebook)
• Ads (Google)
Core Idea
Recommendation systems analyze past behavior (likes, views, purchases) and
use data mining and machine learning to:
• Recommend similar items (item-based)
• Recommend based on similar users (user-based)
• Predict user ratings or preferences
Types of Recommendation Systems
1. Content-Based Filtering
• Recommends items similar to those the user liked before
• Based on item features and user profiles
• Uses:
o Keywords
o Categories
o Tags
Example:
If you watched Inception, it recommends Interstellar (same genre/director).
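A minimal sketch of content-based filtering follows, assuming each item is described by a set of tags and similarity is measured with the Jaccard index. The catalog and its tags are hypothetical, and real systems typically use richer features such as TF-IDF vectors.

```python
def jaccard(a, b):
    """Overlap between two tag sets: |A and B| / |A or B|."""
    return len(a & b) / len(a | b)

def recommend(liked, catalog, top_n=1):
    """Rank the other items by tag similarity to the liked item."""
    scores = {title: jaccard(catalog[liked], tags)
              for title, tags in catalog.items() if title != liked}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical item features.
catalog = {
    "Inception": {"sci-fi", "thriller", "nolan"},
    "Interstellar": {"sci-fi", "drama", "nolan"},
    "Titanic": {"romance", "drama"},
}

print(recommend("Inception", catalog))  # ['Interstellar']
```

Only item features and the user's own history are needed here, so no data from other users is required.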
2. Collaborative Filtering
• Based on user-item interaction matrix
• Two types:
o User-based: Find users with similar tastes
o Item-based: Find items similar to what user liked
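The item-based variant can be sketched by treating each item as a vector of the ratings it received and comparing items with cosine similarity. The rating data and helper names below are assumptions for illustration; unrated entries are treated as zero.

```python
import math

# Hypothetical user -> {item: rating} interaction matrix.
ratings = {
    "ann": {"Avengers": 5, "Iron Man": 4, "Titanic": 1},
    "bob": {"Avengers": 4, "Iron Man": 5},
    "cat": {"Avengers": 1, "Titanic": 5},
    "dan": {"Iron Man": 4, "Titanic": 2},
}

def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def most_similar(item):
    """Return the other item whose rating pattern is closest."""
    items = {i for r in ratings.values() for i in r} - {item}
    return max(items, key=lambda other: cosine(item_vector(item), item_vector(other)))

print(most_similar("Avengers"))  # Iron Man
```

The user-based variant is symmetric: compare the rows (users) instead of the columns (items) and recommend what the nearest users liked.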
Example:
"Users who watched Avengers also watched Iron Man"
3. Hybrid Methods
• Combine content-based and collaborative filtering
• Improves accuracy, handles cold-start problems better
Used in:
Netflix, Amazon, YouTube
4. Model-Based Filtering
• Use machine learning models to predict ratings or preferences
• Techniques:
o Matrix factorization (e.g., SVD)
o Deep learning (autoencoders, neural CF)
o Clustering models
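One way to illustrate the matrix factorization idea is to learn low-dimensional user and item factors by stochastic gradient descent on the observed ratings. Note that this is a gradient-based factorization, not a true SVD; the toy ratings, hyperparameters, and function names are assumptions chosen for the sketch.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=200, seed=0):
    """Learn factors so that dot(P[u], Q[i]) approximates rating r."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    se = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

# Hypothetical (user, item, rating) triples.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5),
           (2, 2, 5), (2, 0, 1), (3, 2, 4), (3, 1, 2)]

P0, Q0 = factorize(ratings, 4, 3, epochs=0)  # untrained factors
P, Q = factorize(ratings, 4, 3)              # trained factors
before, after = rmse(ratings, P0, Q0), rmse(ratings, P, Q)
print(before, after)
```

After training, the dot product of a user's factors with an unrated item's factors serves as the predicted rating for that pair.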
Real-World Applications :
Platform Recommendation
Usage of EM algorithm :
• It can be used to fill the missing data in a sample.
• It can be used as the basis of unsupervised learning of clusters.
• It can be used for the purpose of estimating the parameters of Hidden
Markov Model (HMM).
• It can be used for discovering the values of latent variables.
Advantages of EM algorithm :
• The likelihood is guaranteed not to decrease from one iteration to the next.
• The E-step and M-step are often pretty easy for many problems in terms
of implementation.
• Solutions to the M-steps often exist in the closed form.
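These properties can be seen in a small sketch of EM for a two-component, one-dimensional Gaussian mixture: the E-step computes responsibilities, the M-step has closed-form updates, and the recorded log-likelihood does not go down. The data, the min/max initialisation, and the small variance floor are assumptions made for this example.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, iters=20):
    # Crude initialisation (an assumption): means at the data extremes.
    mu, var, pi = [min(data), max(data)], [1.0, 1.0], [0.5, 0.5]
    history = []  # log-likelihood at the start of each iteration
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp, ll = [], 0.0
        for x in data:
            w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(w)
            ll += math.log(total)
            resp.append([wk / total for wk in w])
        history.append(ll)
        # M-step: closed-form weighted mean, variance, and mixing weight.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-6)
            pi[k] = nk / len(data)
    return mu, var, pi, history

data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.3, 4.7, 5.1, 4.9]
mu, var, pi, history = em_gmm(data)
print(mu, history[0], history[-1])
```

The component assignments are the latent variables here: they are never observed, yet the fitted means settle near the two visible groups of the data.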
Disadvantages of EM algorithm :
• Convergence can be slow.
• It is only guaranteed to converge to a local optimum.
• It requires both the forward and backward probabilities (numerical
optimization requires only the forward probability).
5].Reinforcement Learning
Key Components :
Component | Description
Model | A mathematical function f(x, θ) with parameters θ
Learning | Estimating the best values for θ from training data
Prediction | Using the learned model to predict outputs for new inputs
How It Works:
1. Choose a model structure (e.g., linear, polynomial, probabilistic)
2. Train the model by estimating parameters from the data
3. Optimize model performance using a loss/cost function
4. Evaluate using validation data
5. Predict on new inputs using the trained model
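The five steps above can be sketched with the simplest parametric model, a straight line f(x, θ) = slope · x + intercept, fitted by least squares. The training and validation numbers are made up, and the function names are assumptions for this illustration.

```python
def fit_line(xs, ys):
    """Steps 2 and 3: estimate theta = (slope, intercept) by minimising squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def predict(theta, x):
    """Step 5: apply the learned model f(x, theta) to an input."""
    slope, intercept = theta
    return slope * x + intercept

def mse(theta, xs, ys):
    """Mean squared error, used as the loss/cost function."""
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Step 1: the model structure is linear. Hypothetical data, roughly y = 2x + 1.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
val_x, val_y = [4.0, 5.0], [9.1, 10.9]

theta = fit_line(train_x, train_y)
val_error = mse(theta, val_x, val_y)  # Step 4: evaluate on held-out data
print(theta, val_error, predict(theta, 6.0))
```

A low error on the validation points suggests the linear assumption fits this data; a large gap between training and validation error would point to overfitting or a wrong model structure.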
Disadvantages
• Requires correct model assumptions
• May overfit or underfit if model is too complex or too simple
• Sensitive to outliers and noise