Introduction To Machine Learning-Presentation
The Challenge
Overfitting vs. underfitting
Machine Learning tasks
Machine Learning Tasks
Tools and techniques
• Supervised learning
– Regression: desired output is a continuous number
– Classification: desired output is a category
• Unsupervised learning
– Clustering: Grouping data
– Dimensionality reduction: Compressing data
– Association rule learning: If X then Y
Intro to Clustering
Distance
Types of Clustering
1. Connectivity-based clustering (hierarchical clustering): based on the idea that related
objects are closer to each other, so the objects can be organized into a hierarchy of clusters/groups.
– Useful when you want flexibility in how many clusters you ultimately want. For
example, imagine grouping items on an online marketplace like Etsy or Amazon.
– In a dendrogram, the y-axis marks the distance at which the clusters merge,
while the objects are placed along the x-axis.
– Algorithms can be agglomerative (start with each object as its own cluster and merge
them into larger clusters) or divisive (start with the complete data set and divide it into partitions).
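The agglomerative idea can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes 2-D points, Euclidean distance, and single linkage (nearest pair) as the merge rule, and the names `single_linkage` and `agglomerate` are hypothetical. The recorded merge distances are the heights a dendrogram would show on its y-axis.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Smallest pairwise Euclidean distance between two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[p] for p in points]   # start: every object is its own cluster
    history = []                       # (merge distance, merged cluster) pairs
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = single_linkage(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        history.append((d, merged))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

# Illustrative data: two tight pairs and one outlying point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
history = agglomerate(points)
for distance, cluster in history:
    print(f"merge at distance {distance:.2f}: {cluster}")
```

Cutting the hierarchy at a chosen distance gives the flexibility mentioned above: every merge recorded below the cut is kept, every merge above it is undone.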
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use
or distribution prohibited
Types of Clustering
2. Centroid-based clustering (e.g. K-Means clustering):
The objective is to find K clusters/groups. Each group
is defined by a centroid. The centroids are like the
heart of the cluster: they “capture” the points closest
to them and add them to the cluster.
– Large K produces smaller groups and a small K produces
larger groups
– K-Means uses Euclidean distances and is the most popular variant
– Other variants like K-medians and K-medoids use other
distance measures
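How a centroid "captures" points can be shown with a small sketch. It assumes 2-D points and Euclidean distance as in K-Means; the helper name `nearest_centroid` and the sample coordinates are made up for illustration.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]   # K = 2, values chosen for illustration

# Each point joins the cluster of its nearest centroid.
labels = [nearest_centroid(p, centroids) for p in points]
print(labels)
```

With more centroids (a larger K) each centroid captures fewer points, which is why a large K yields smaller groups.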
Clustering
Agglomerative
• Starts with each record as its own cluster
• Sequentially merges the 2 closest clusters, using distance as
the measure of similarity, to form a larger cluster.
• How would we measure distance between two
clusters?
Distance between clusters
• Single linkage – Minimum distance (nearest neighbor)
• Complete linkage – Maximum distance (farthest neighbor)
• Average linkage – Average of the distances between all pairs
• Centroid method – Combine the clusters with the minimum
distance between their centroids
• Ward’s method – Combine the clusters whose merger gives the
smallest increase in within-cluster variance
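The linkage rules above can be compared side by side with a small sketch. It assumes clusters are lists of 2-D points and Euclidean distance between objects; all function names are hypothetical. For Ward's method the sketch measures the increase in the within-cluster sum of squared distances, a common way to express the change in within-cluster variance.

```python
import math
from statistics import mean

def pairwise(a, b):
    """All Euclidean distances between a point of cluster a and a point of cluster b."""
    return [math.dist(p, q) for p in a for q in b]

def single_linkage(a, b):
    return min(pairwise(a, b))        # nearest pair

def complete_linkage(a, b):
    return max(pairwise(a, b))        # farthest pair

def average_linkage(a, b):
    return mean(pairwise(a, b))       # average over all pairs

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(mean(axis) for axis in zip(*cluster))

def centroid_distance(a, b):
    return math.dist(centroid(a), centroid(b))

def ward_increase(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    n_a, n_b = len(a), len(b)
    return n_a * n_b / (n_a + n_b) * centroid_distance(a, b) ** 2

# Illustrative clusters: two vertical pairs, four units apart.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(4.0, 0.0), (4.0, 2.0)]
for rule in (single_linkage, complete_linkage, average_linkage,
             centroid_distance, ward_increase):
    print(f"{rule.__name__}: {rule(a, b):.3f}")
```

In an agglomerative run, whichever rule is chosen is evaluated for every pair of current clusters, and the pair with the smallest value is merged.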
Distance between objects
Lloyd’s algorithm
1. Assume K Centroids
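The slide lists only the first step. As commonly described, Lloyd's algorithm then alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, until the centroids stop changing. The sketch below follows that common description rather than anything stated on the slide; the function name `lloyd`, the `max_iter` cap, the handling of empty clusters, and the starting centroids are all assumptions made for illustration.

```python
import math
from statistics import mean

def lloyd(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(centroids)        # step 1: assume K starting centroids
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            groups[k].append(p)
        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid (an assumption of this sketch).
        updated = [
            tuple(mean(axis) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:       # converged: nothing moved
            break
        centroids = updated
    return centroids, groups

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids, groups = lloyd(points, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids)
print(groups)
```

The result depends on the starting centroids, so in practice the procedure is often run from several different starts and the best result is kept.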
Choosing the optimal K
1. Assume K Centroids
Market Basket Analysis (or) Association Rules
Support, Confidence and Lift
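The three measures can be computed directly from a list of transactions. The sketch below uses the standard definitions for a rule "if X then Y": support is the share of transactions containing an itemset, confidence is support(X and Y) / support(X), and lift is confidence / support(Y). The basket data and function names are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Of the transactions containing X, the fraction that also contain Y."""
    # Assumes X occurs at least once; otherwise this divides by zero.
    return support(transactions, set(x) | set(y)) / support(transactions, x)

def lift(transactions, x, y):
    """Confidence of X -> Y relative to how often Y occurs overall."""
    # Assumes Y occurs at least once; otherwise this divides by zero.
    return confidence(transactions, x, y) / support(transactions, y)

# Hypothetical baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

x, y = {"bread"}, {"milk"}
print(f"support:    {support(transactions, x | y):.2f}")
print(f"confidence: {confidence(transactions, x, y):.2f}")
print(f"lift:       {lift(transactions, x, y):.2f}")
```

A lift above 1 suggests X and Y appear together more often than if they were independent, a lift below 1 suggests less often, and a lift near 1 suggests no association.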