Unit 2:
The Fundamentals of Machine Learning
Outline
• Machine Learning
  • Introduction
  • Types of machine learning
  • Challenges of Machine Learning
  • The Machine Learning Framework

Reference: Géron, Chapter 1
Problems with Traditional Programming
Traditional programming paradigm:
[Diagram: study the problem → write rules → evaluate → deploy, with an "analyze errors" loop back to studying the problem]
The Machine Learning Framework
▪ Instead of handcrafting rules, ML learns a model from training samples (data)
▪ One learning algorithm can be applied to different problems (see the sketch below)
[Diagram: collect data → training samples → train → evaluate → deploy, with an "analyze errors" loop]
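As a rough sketch of this workflow (assuming scikit-learn is available; the dataset is synthetic and the two algorithms are arbitrary illustrative choices), the same train/evaluate loop works regardless of which learning algorithm is plugged in:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# "Collect data": synthetic samples stand in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The same train/evaluate workflow works with different learning algorithms
for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier()):
    model.fit(X_train, y_train)           # train on the samples
    acc = model.score(X_test, y_test)     # evaluate before deploying
    print(type(model).__name__, acc)
```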
What is Machine Learning?
Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)
Why use Machine Learning?
▪ Some problems are too complex to solve with handcrafted rules
• An example: image classification
Why use Machine Learning?
[Diagram: study the problem → train ML algorithm → evaluate solution → deploy, with an "analyze errors" loop; the training samples and data can be updated over time]
Why use Machine Learning?
[Figure: machine learning benefits from lots of training samples]
Data in Machine Learning
▪ Numerical (quantitative) data
• Discrete (e.g. 1, 3, 8, -4, …)
• Continuous (e.g. 2.321, 0.2437, …)
▪ Categorical (qualitative) data
• Ordinal (e.g. low, medium, high)
• Nominal (e.g. red, blue, yellow)
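A small sketch of these four data types using pandas (an assumed tooling choice; the column names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "rooms":  [1, 3, 8],                # numerical, discrete
    "weight": [2.321, 0.2437, 1.5],     # numerical, continuous
    "level":  pd.Categorical(["low", "medium", "high"],
                             categories=["low", "medium", "high"],
                             ordered=True),                # categorical, ordinal
    "colour": pd.Categorical(["red", "blue", "yellow"]),   # categorical, nominal
})
print(df.dtypes)
```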
Data in Machine Learning
▪ Labelled data vs unlabelled data
• Labelled data: data that comes with a tag (e.g. a class name or target value)
• Unlabelled data: data that comes with no tag
Structured Data vs Unstructured Data
Structured Data
• Specific and stored in a predefined format
• Suitable for traditional machine learning
• The focus of this course

Unstructured Data
• A collection of varied types of data stored in their native formats (e.g. text, image, video, audio, …)
• Better results with deep learning techniques
Types of Machine Learning Algorithms
Supervised Learning
In supervised learning, the training data fed to the algorithm includes the desired solutions, called labels.
Supervised Learning Tasks
Classification
• Predicts discrete-valued output (e.g. present / not present)
• Example: object detection (images with car)

Regression
• Predicts continuous-valued output (e.g. house price)
• Example: housing price prediction
[Figure: price (RM x1000) vs. size for the housing data]
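A hedged sketch of the two task types with scikit-learn (the data below is made up, not the housing data from the figure):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: discrete output (0/1)
X_cls = np.array([[1.0], [2.0], [3.0], [4.0]])
y_cls = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[2.5]]))          # -> a discrete class label

# Regression: continuous output (e.g. price vs size)
X_reg = np.array([[500], [1000], [1500], [2000]])   # size
y_reg = np.array([100.0, 180.0, 260.0, 330.0])      # price (RM x1000)
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[1250]]))         # -> a continuous value
```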
Supervised Learning – Classification Applications
Income classification:
• Input: numerical and categorical data
• Output: 1 (income ≤ 50K), 0 (income > 50K) – discrete
• Features: age, workclass, marital-status, race, education, area, …
More Classification Applications
Supervised Learning – Regression Applications
Predict the house price in the Boston area (regression):
• Input: numerical and categorical data
• Output: house price (in USD ×1000) – continuous
• Features:
− CRIM: per capita crime rate by town
− ZN: proportion of residential land
− INDUS: proportion of non-retail business acres per town
− CHAS: Charles River dummy variable (= 1 if near river; 0 otherwise)
− NOX: nitric oxides concentration (parts per 10 million)
− RM: average number of rooms per dwelling
− AGE: proportion of units built prior to 1940
− DIS: weighted distance to employment centres
− RAD: index of accessibility to radial highways
− TAX: full-value property-tax rate per $10,000
− …
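A sketch of such a regression workflow. Note that the Boston housing dataset has been removed from recent scikit-learn releases, so this example substitutes the similar California housing dataset:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load a housing dataset (downloads on first call)
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42)

# Fit a linear model on the numerical features, then evaluate
model = LinearRegression().fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```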
A Simple Supervised Learning Example (1/6)
Task: predict salary from years of experience.
A Simple Supervised Learning Example (2/6)
[Figure: salary vs. years of experience, with the data split into a training set and a test set]
A Simple Supervised Learning Example (4/6)
A linear model y = θ₀ + θ₁x is fitted to the training set; the prediction error is the gap between the predicted and the actual salary.
[Figure: fitted line over the training data, with prediction errors marked]
A Simple Supervised Learning Example (5/6)
[Figure: salary vs. years of experience]
A Simple Supervised Learning Example (6/6)
[Figure: the fitted model y = θ₀ + θ₁x plotted over the salary vs. years of experience data]
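A minimal sketch of the whole example with NumPy (the salary figures are invented for illustration, not taken from the slides):

```python
import numpy as np

x = np.array([1, 2, 3, 5, 7, 10], dtype=float)        # years of experience
y = np.array([30, 35, 42, 55, 66, 90], dtype=float)   # salary (x1000)

# Least-squares fit of a degree-1 polynomial returns [θ1, θ0]
theta1, theta0 = np.polyfit(x, y, 1)
print(f"y = {theta0:.1f} + {theta1:.1f}·x")
print("predicted salary at 4 years:", theta0 + theta1 * 4)
```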
Supervised Learning Algorithms
▪ k-Nearest Neighbors
▪ Linear Regression
▪ Logistic Regression
▪ Support Vector Machines (SVMs)
▪ Decision Trees and Random Forests
▪ Neural Networks
Unsupervised Learning Task: Clustering
▪ Group similar samples into clusters without using labels
Unsupervised Learning Task: Anomaly detection
▪ Identify items, events or observations that do not conform to an expected pattern or to other items in a dataset
▪ Anomalies are also referred to as outliers, novelties or noise
[Figure: scattered data points with the outliers highlighted]
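One way to sketch this (the slide names no algorithm; IsolationForest is an assumed choice, and the data is synthetic):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(100, 2))   # expected pattern
outliers = np.array([[8.0, 8.0], [-9.0, 7.0]])           # do not conform
X = np.vstack([normal, outliers])

detector = IsolationForest(random_state=42).fit(X)
labels = detector.predict(X)          # +1 = normal, -1 = anomaly
print("anomalies found:", (labels == -1).sum())
```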
Unsupervised Learning Tasks and Algorithms
▪ Clustering
• k-Means
• Hierarchical Cluster Analysis (HCA)
• Expectation Maximization
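A minimal k-Means sketch with scikit-learn on synthetic unlabelled points (the blob positions are made up):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic blobs of unlabelled points
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # discovered group centres; no labels used
```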
Semi-Supervised Learning
▪ Partially labelled training data
[Table: samples with features (length, width, weight); only some rows have a label]
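A hedged sketch using scikit-learn's LabelPropagation (one possible semi-supervised algorithm; the slide does not name one, and the feature values below are invented):

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Features: length, width, weight (made-up values for illustration)
X = np.array([[10.0, 2.0, 0.5],
              [11.0, 2.2, 0.6],
              [20.0, 5.0, 3.0],
              [21.0, 5.1, 3.2],
              [10.5, 2.1, 0.55],   # unlabelled
              [20.5, 5.0, 3.1]])   # unlabelled
y = np.array([0, 0, 1, 1, -1, -1])  # -1 marks a missing label

model = LabelPropagation().fit(X, y)
print(model.transduction_)   # labels inferred for all samples
```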
Reinforcement Learning
The agent:
1. Observes the environment
2. Selects an action using its policy
3. Performs the action
4. Receives a reward or penalty
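A toy sketch of this loop: a two-armed bandit with an epsilon-greedy policy (a simplified RL setting where observing the environment, step 1, is trivial; all numbers are illustrative):

```python
import random

rewards = {0: 0.3, 1: 0.7}      # hidden win probability of each action
values = {0: 0.0, 1: 0.0}       # agent's running value estimates
counts = {0: 0, 1: 0}

for step in range(1000):
    # 2. select an action using the policy (epsilon-greedy)
    if random.random() < 0.1:
        action = random.choice([0, 1])
    else:
        action = max(values, key=values.get)
    # 3. perform the action; 4. receive a reward or penalty
    reward = 1.0 if random.random() < rewards[action] else 0.0
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print(values)   # estimates converge toward the true probabilities
```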
Differences between Machine Learning Types
Supervised learning: solves problems by mapping input to known output; used for regression and classification tasks.
Unsupervised learning: solves problems by discovering underlying patterns; used for clustering and association tasks.
Reinforcement learning: solves problems by trial and error; used for control and decision-making tasks.
Foundation Model & Self-Supervised Learning
▪ Conventionally, an AI model is trained on task-specific data to perform one very specific task.
▪ A new paradigm in AI has emerged called foundation models. Unlike traditional AI models, foundation models learn from massive datasets across different domains.
▪ Through self-supervised learning techniques, a foundation model teaches itself a broad scope of knowledge and an understanding of the world (general intelligence).
▪ Large language models (LLMs) such as OpenAI's GPT-4 and Google's PaLM are examples of foundation models.
▪ Foundation models can then be adapted to perform other tasks through fine-tuning or prompting.
• GPT -> ChatGPT, GPT -> Copilot, GPT -> Duolingo
▪ Foundation models require far more data and computing power to train.
Self-Supervised Learning
▪ Self-supervised learning is a machine learning process in which the model trains itself by predicting one part of the input from another part, in order to obtain useful representations and knowledge.
▪ The trained model can then help with downstream learning tasks.
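A toy illustration of the idea (not a real self-supervised model): the training targets are derived from the raw text itself, with no human labelling, by predicting the next word from the current one:

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat the cat ate the fish".split()

# Derive (input, target) pairs from the data itself: predict word i+1 from word i
next_word = defaultdict(Counter)
for current, nxt in zip(text, text[1:]):
    next_word[current][nxt] += 1

# The learned statistics can serve a downstream task: prediction
print(next_word["the"].most_common(1))   # most likely word after "the"
```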
How LLMs are trained
[Figure: self-supervised training pipeline for LLMs. Source: Borealis AI]
Challenges of Machine Learning
Poor-quality data
▪ Training data may contain errors, for example outliers
[Figure: data points with an outlier highlighted]
▪ Data cleaning: most data scientists spend significant time cleaning the data. For example:
• Fill in missing values
• Drop a column (feature) with many missing values or errors
• Remove rows (samples) with outliers
• Fix errors and formatting manually
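These steps sketched with pandas (the DataFrame and thresholds are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 29],
    "salary": [3.2, 3.8, 4.1, 120.0, np.nan],           # 120.0 is an outlier
    "notes":  [np.nan, np.nan, np.nan, "ok", np.nan],   # mostly missing
})

df["age"] = df["age"].fillna(df["age"].median())        # fill in missing values
df["salary"] = df["salary"].fillna(df["salary"].median())
df = df.drop(columns=["notes"])        # drop a column with many missing values
df = df[df["salary"] < 100]            # remove rows with outliers
print(df)
```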
Non-representative data
▪ Training data should be representative of the new cases that you want to generalize to.
▪ Consider fitting a linear model to the GDP dataset with and without 7 missing countries: the model fitted without the missing countries does not generalize well.
[Figure: linear fits to the GDP data with and without the missing countries]
Irrelevant features
▪ Selected features must be relevant to the task at hand. Irrelevant features in your data can decrease the accuracy of the models.
• For example, area or perimeter length are irrelevant features for classifying shapes such as circles and rectangles: f(area) = ?
• Features such as the signature or the number of corners are more suitable for classifying shapes.
Feature Engineering
Deep learning learns features automatically, but requires lots of training data.
Underfitting & Overfitting
▪ Underfitting (high bias) may happen when our model is over-simplified
or not expressive enough (high training error and high test error).
▪ Overfitting (high variance) may happen when our model is too complex
and fits too specifically to the training set, but it does not generalize
well to new data (low training error but high test error).
[Figure: underfitting and overfitting on a classification task; separate markers show the test data]
Underfitting & Overfitting
▪ Underfitting and overfitting in regression task
[Figure: underfit, good fit and overfit models on a regression dataset]
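A sketch of both regimes using polynomial regression on synthetic data (the degree choices and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 3, 30)
y = np.sin(x) + rng.normal(0, 0.1, x.shape)    # noisy samples of a true curve
x_test = np.linspace(0, 3, 100)
y_test = np.sin(x_test)

for degree in (1, 4, 15):   # too simple / reasonable / too complex
    p = np.polynomial.Polynomial.fit(x, y, degree)
    train_err = np.mean((p(x) - y) ** 2)
    test_err = np.mean((p(x_test) - y_test) ** 2)
    # Underfit: both errors high. Overfit: low train error, higher test error.
    print(f"degree {degree:2d}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```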
Hyperparameter Tuning
[Diagram: the learning algorithm is trained on the training set, hyperparameters are tuned using a validation set, and the tuned model predicts on the test set]
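A hedged sketch of this loop using scikit-learn's GridSearchCV, which handles the validation step via cross-validation while a held-out test set is kept aside (the dataset and parameter grid are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune the hyperparameter max_depth on the training data via CV folds
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={"max_depth": [2, 4, 8, None]}, cv=5)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))   # final evaluation
```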
Next: