
Introduction to Machine Learning

Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Learning from Data

• Can we learn about the world around us using data?


• Model building from data
– Take data as input
– Find patterns in the data
– Summarize the pattern in a mathematically precise way
• Machine learning automates this model building.

The Challenge

• Data unfortunately contains noise; if it did not, machine learning would be trivial!
• Think of Data = Information + Noise
• The challenge is to identify the information content and distill away the noise.
• To help do this, machine learning uses a train-and-test approach.
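The train-and-test approach can be sketched as a simple random split (a minimal NumPy illustration; the toy data, the 80/20 ratio and the seed are arbitrary choices, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# toy dataset: 100 examples with 3 features each, plus a binary label
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# shuffle the indices, then hold out 20% of the rows as a test set
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = idx[:split], idx[split:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```

The model is fit only on the training rows; the held-out test rows estimate how well it separated information from noise.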

Overfitting vs. Underfitting

• If the model we finish with ends up
– modeling the noise as well, we call it "overfitting" - bad for prediction!
– not modeling all the information, we call it "underfitting" - bad for prediction!
• The hope is that the model that does the best on testing data manages to capture all the information but leave out all the noise.

Machine Learning Tasks

1. Supervised learning: Building a mathematical model using data that contains both the inputs and the desired outputs (ground truth).
– Examples:
• Determining if an image contains a horse. The data would include images with and without horses (the input), and for each image a label (the output) indicating whether there is a horse in that image.
• Determining if a client might default on a loan
• Determining if a call center employee is likely to quit
– Since we have desired outputs, model performance can be
evaluated by comparisons.

Machine Learning Tasks

2. Unsupervised learning: Building a mathematical model using data that contains only inputs and no desired outputs.
– Used to find structure in the data, such as grouping or clustering of data points, to discover patterns and group the inputs into categories.
– Example: an advertising platform segments the population into smaller groups with similar demographics and purchasing habits, helping advertisers reach their target market with relevant ads.
– Since no labels are provided, there is no specific way to compare
model performance in most unsupervised learning methods.

Tools and techniques

• Supervised learning
– Regression: desired output is a continuous number
– Classification: desired output is a category
• Unsupervised learning
– Clustering: Grouping data
– Dimensionality reduction: Compressing data
– Association rule learning: If X then Y

Intro to Clustering

Clustering

• Clustering is an Unsupervised Learning Technique


• A cluster: a collection of objects that are similar
• The objective is to group similar data points together
– Segmenting customers into similar groups
– Automatically organizing similar files/emails into folders
• Simplifies data by reducing many data points into a few clusters

Distance

• To define "similarity" you need a measure of distance

• Examples of common distance measures
– Manhattan Distance
– Euclidean Distance
– Chebyshev Distance
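These three measures can be written out directly (a minimal NumPy sketch; the two example points are arbitrary):

```python
import numpy as np

def manhattan(a, b):
    # sum of absolute coordinate differences (city-block distance)
    return np.sum(np.abs(a - b))

def euclidean(a, b):
    # straight-line distance
    return np.sqrt(np.sum((a - b) ** 2))

def chebyshev(a, b):
    # largest single coordinate difference
    return np.max(np.abs(a - b))

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
# manhattan: 7.0, euclidean: 5.0, chebyshev: 4.0
```

Which measure to use depends on the data: Manhattan is less sensitive to single large differences, Chebyshev cares only about the worst one.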

Types of Clustering

1. Connectivity-based clustering (hierarchical clustering): based on the idea that related
objects are closer to each other. Can we then create a hierarchy of clusters/groups?

– Useful when you want flexibility in how many clusters you ultimately want. For
example, imagine grouping items on an online marketplace like Etsy or Amazon.

– In terms of outputs from the algorithm, in addition to cluster assignments you also build a tree (dendrogram) that tells you about the hierarchies between the clusters. You can then pick the number of clusters you want from this tree.

– In a dendrogram, the y-axis marks the distance at which the clusters merge,
while the objects are placed along the x-axis.

– Algorithms can be agglomerative (start with each object as its own cluster and merge them step by step) or divisive (start with the complete data set and divide it into partitions).

Types of Clustering
2. Centroid-based clustering (e.g. K-Means clustering): The objective is to find K clusters/groups. These groups are defined by creating a centroid for each group. The centroids are like the heart of the cluster: they "capture" the points closest to them and add them to the cluster.
– A large K produces smaller groups and a small K produces larger groups
– K-Means uses Euclidean distances and is the most popular
– Other variants like K-medians and K-medoids use other distance measures

Clustering

Data we will work with
– Customer Spend Data
• AVG_Mthly_Spend: The average monthly amount spent by customer
• No_of_Visits: The number of times a customer visited in a month
• Item Counts: Counts of Apparel, Fruits and Vegetables, and Staple items purchased

• Can we cluster similar customers together?


Connectivity Based: Hierarchical Clustering

• Hierarchical clustering techniques create clusters in a hierarchical, tree-like structure
• Any type of distance measure can be used as a measure of similarity
• The cluster tree output is called a dendrogram
• Techniques either start with individual objects and sequentially combine them (agglomerative), or start from one cluster of all objects and sequentially divide them (divisive)
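As a sketch of the agglomerative variant (assuming SciPy is available; the toy points and the choice of Ward linkage are illustrative, not prescribed by the slides), build the dendrogram and then cut it into two clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# toy data: two well-separated groups of three points each
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# agglomerative clustering; Z encodes the full merge tree (dendrogram)
Z = linkage(X, method="ward")

# cut the tree so that at most 2 clusters remain
labels = fcluster(Z, t=2, criterion="maxclust")
```

The same `Z` can be cut at a different `t` to get more or fewer clusters, which is exactly the flexibility the dendrogram provides.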

Agglomerative
• Starts with each object as a cluster of one record
• Sequentially merges the 2 closest clusters, using distance as the measure of similarity, to form a larger cluster
• How would we measure distance between two clusters?

Distance between clusters
• Single linkage – minimum distance, or nearest neighbor
• Complete linkage – maximum distance, or farthest neighbor
• Average linkage – average of the distances between all pairs
• Centroid method – combine the clusters with the minimum distance between their centroids
• Ward's method – combine the pair of clusters whose merger gives the smallest increase in within-cluster variance
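As a small illustration, the first four criteria can be computed directly for two toy clusters (a sketch; the point coordinates are made up):

```python
import numpy as np

def pairwise(A, B):
    # all Euclidean distances between points of cluster A and points of cluster B
    return np.array([np.linalg.norm(a - b) for a in A for b in B])

A = np.array([[0.0, 0.0], [0.0, 1.0]])   # cluster A
B = np.array([[3.0, 0.0], [4.0, 0.0]])   # cluster B

d = pairwise(A, B)
single   = d.min()    # single linkage: nearest pair
complete = d.max()    # complete linkage: farthest pair
average  = d.mean()   # average linkage: mean over all pairs
centroid = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))  # centroid method
```

By construction single <= average <= complete; which one to use changes the shape of the clusters an agglomerative algorithm prefers.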
Distance between objects

Centroid based: K-Means Clustering

• K-Means is probably the most used clustering technique

• Aims to partition the n observations into K clusters so as to minimize the within-cluster sum of squares (i.e. variance)

• Computationally less expensive compared to hierarchical techniques

• You have to pre-define K, the number of clusters

Lloyd's algorithm

1. Assume K centroids

2. Compute the squared Euclidean distance of each object to these K centroids. Assign each object to the closest centroid, forming clusters.

3. Compute the new centroid (mean) of each cluster based on the objects assigned to it.

4. Repeat 2 and 3 till convergence, usually defined as the point at which there is no movement of objects between clusters.
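The four steps above can be sketched in a few lines of NumPy (a minimal illustration, not production code; initializing the centroids as random data points is just one common choice):

```python
import numpy as np

def lloyd_kmeans(X, k, max_iter=100, seed=0):
    """Minimal sketch of Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # step 1: assume K centroids (here: K distinct random data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # step 2: squared Euclidean distance of every object to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # step 4: converged when no object moves between clusters
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # step 3: new centroid = mean of the objects assigned to each cluster
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

On two well-separated groups of points, `lloyd_kmeans(X, 2)` recovers the groups after a handful of iterations.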

Choosing the optimal K

• Usually subjective, based on striking a good balance between compression and accuracy

• The "elbow" method is commonly used
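The elbow method plots the within-cluster sum of squares (WSS) against K and picks the K where the curve stops dropping sharply. A sketch using a basic Lloyd's loop (the blob data, seeds and iteration counts are arbitrary choices for illustration):

```python
import numpy as np

def wss(X, k, n_iter=50, seed=0):
    # within-cluster sum of squares after a basic Lloyd's run (illustrative)
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        lab = d2.argmin(axis=1)
        C = np.array([X[lab == j].mean(axis=0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

# three tight, well-separated blobs: the WSS curve should drop sharply up to
# K = 3 and then flatten -- the "elbow"
rng = np.random.default_rng(1)
centers = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in centers])
curve = {k: wss(X, k) for k in range(1, 6)}
```

Inspecting `curve` shows the trade-off the slide describes: larger K always compresses less error, but past the elbow the gain is marginal.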

Market Basket Analysis (or Association Rules)

Market Baskets
– Transactions/Baskets

• Is it true that {Breakfast Cereals} -> {Bread}?

• How sure are you?

• Other patterns, like: if {A, B, …} then {C, …}?

Association Rules Learning
• Rule-based unsupervised learning:
– If X then Y, written as X -> Y
– X and Y can be sets of multiple items
• Market basket analysis is the term usually used when the context is transactions in retail/e-commerce.
• The rule X -> Y indicates that if you have all the items in X then you are more likely to have the items in Y as well. Of course, each rule might or might not hold in a given data set and hence has to be appropriately qualified.
• Other applications
– web usage mining
– intrusion detection, network traffic analysis
– bioinformatics, protein sequencing
– medical diagnosis
How good is a given Rule?
• {Breakfast Cereals}->{Bread}?
• If you think this is true
– Does it apply to a large number of transactions?
– Is it often correct?
– Are you sure it is not just a coincidence?
• Let's say, for example, the transactions looked like this:
– Total: 415
– BC (Breakfast Cereals): 54
– Bread: 90
– Bread and BC: 44
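Using these counts, the three standard rule-quality measures for {Breakfast Cereals} -> {Bread} follow directly, each answering one of the questions above:

```python
total = 415     # all transactions
n_bc = 54       # transactions containing Breakfast Cereals (X)
n_bread = 90    # transactions containing Bread (Y)
n_both = 44     # transactions containing both

support = n_both / total               # does it apply to many transactions?
confidence = n_both / n_bc             # is it often correct, given X?
lift = confidence / (n_bread / total)  # or is it just a coincidence?

# support ~ 0.106, confidence ~ 0.815, lift ~ 3.76
```

A lift well above 1, as here, means buying Bread given Breakfast Cereals is far more likely than buying Bread by chance alone.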

Support, Confidence and Lift

• Results of an actual analysis would list each rule with its support, confidence and lift values (table shown as an image in the original slides)

