04 Machine Learning Overview
Machine Learning
● Section Overview:
○ What is Machine Learning?
○ What is Deep Learning?
○ Difference between Supervised and Unsupervised Learning
○ Supervised Learning Process
○ Evaluating Performance
○ Overfitting
What is Machine Learning?
● Machine Learning
○ Automated analytical model building: methods that learn from data to build analytical models automatically.
● Neural Networks
○ A type of machine learning architecture modeled after biological neurons.
● Deep Learning
○ A neural network with more than one hidden layer.
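To make "more than one hidden layer" concrete, the following sketch runs a single forward pass through a tiny fully connected network in plain Python. The layer sizes, the weights, and the ReLU activation are hypothetical choices for illustration; nothing is trained here, the point is only the layered structure described above.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]     # one row of weights per neuron
Layer = Tuple[Matrix, Vector]  # (weights, biases)


def relu(values: Vector) -> Vector:
    """Rectified linear activation, applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs: Vector, weights: Matrix, biases: Vector) -> Vector:
    """Fully connected layer: each neuron computes a weighted sum of its inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs: Vector, layers: List[Layer]) -> Vector:
    """Run inputs through every layer; hidden layers apply ReLU, the output layer does not."""
    activations = inputs
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:  # every layer except the last is a hidden layer
            activations = relu(activations)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 2 hidden neurons -> 1 output.
network: List[Layer] = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, 0.0]),  # hidden layer 1
    ([[0.7, -0.5, 0.2], [0.3, 0.6, -0.1]], [0.0, 0.0]),         # hidden layer 2
    ([[1.0, 2.0]], [0.5]),                                      # output layer
]

hidden_layers = len(network) - 1
print("hidden layers:", hidden_layers)  # 2 hidden layers, so this network counts as "deep"
print("output:", forward([1.0, 2.0], network))
```

Removing the second hidden layer would leave an ordinary (shallow) neural network; the forward-pass code itself would not change.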
Supervised Learning Process
● Data Acquisition
● Data Cleaning
● Split the cleaned data into training data and test data.
● Model Training & Building on the training data.
● Model Testing on the test data.
○ Adjust the model parameters and repeat training and testing as needed.
● Model Deployment
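The steps above can be sketched end to end in plain Python. Everything here is a hypothetical stand-in: the records in `acquire_data`, the cleaning rule, the 70/30 split, and the use of a k-nearest-neighbours classifier (whose `k` plays the role of the adjustable model parameter) are assumptions for illustration, not part of the original material.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Tuple

Example = Tuple[float, str]  # (feature value, label)


def acquire_data() -> List[Tuple[Optional[float], str]]:
    """Data acquisition: hypothetical labeled records; None marks a missing feature value."""
    return [(1.0, "small"), (1.5, "small"), (None, "small"), (2.0, "small"),
            (2.2, "small"), (1.2, "small"), (6.0, "large"), (6.5, "large"),
            (None, "large"), (7.0, "large"), (7.5, "large"), (8.0, "large")]


def clean(records: List[Tuple[Optional[float], str]]) -> List[Example]:
    """Data cleaning: drop records whose feature value is missing."""
    return [(x, y) for x, y in records if x is not None]


def split(data: List[Example], test_fraction: float,
          seed: int) -> Tuple[List[Example], List[Example]]:
    """Shuffle, then split the cleaned data into training data and test data."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_knn(train: List[Example], k: int) -> Callable[[float], str]:
    """Model training & building: a k-nearest-neighbours classifier; k is the adjustable parameter."""
    def predict(x: float) -> str:
        nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def accuracy(model: Callable[[float], str], test: List[Example]) -> float:
    """Model testing: fraction of test examples the model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


data = clean(acquire_data())
train, test = split(data, test_fraction=0.3, seed=0)

# Adjust model parameters: try several values of k and keep the best-scoring one.
best_k, best_score = 1, -1.0
for k in (1, 3, 5):
    score = accuracy(train_knn(train, k), test)
    if score > best_score:
        best_k, best_score = k, score

# Model deployment: the final model is a function that can label new inputs.
deployed = train_knn(train, best_k)
print(f"best k = {best_k}, test accuracy = {best_score:.2f}")
print(deployed(1.8), deployed(7.2))
```

Choosing `k` by test accuracy mirrors the "adjust model parameters" loop in the process; in practice a separate validation set is often held out for this, so the test set is only used for the final check.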
● Overfitting
○ The model fits too closely to the noise in the data rather than to the underlying trend.
○ This often results in low error on the training set but high error on test/validation sets.
[Figure: data plotted against X, shown with a good model fit and with several overfitting fits]
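A minimal way to see overfitting numerically is to fit both a straight line and a polynomial that passes through every training point, then compare their training and test errors. The data, the least-squares line, and the Lagrange interpolating polynomial are illustrative assumptions; the pattern to look for is the one described above, with near-zero training error but much higher test error for the model that chases the noise.

```python
from typing import Callable, List, Tuple

Points = List[Tuple[float, float]]

# Hypothetical data: y = 2x + 1 plus a fixed amount of "noise" per point.
train = [(0, 1.3), (1, 2.6), (2, 5.5), (3, 6.7), (4, 9.4), (5, 10.5)]
test = [(0.5, 2.1), (1.5, 3.9), (2.5, 6.2), (3.5, 7.8), (4.5, 10.0)]


def fit_line(points: Points) -> Callable[[float], float]:
    """Ordinary least-squares straight line: too simple to chase the noise."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolating_polynomial(points: Points) -> Callable[[float], float]:
    """Lagrange polynomial through every training point: it reproduces the noise exactly."""
    def predict(x: float) -> float:
        total = 0.0
        for j, (xj, yj) in enumerate(points):
            basis = 1.0
            for m, (xm, _) in enumerate(points):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict


def mse(model: Callable[[float], float], points: Points) -> float:
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


line = fit_line(train)
poly = fit_interpolating_polynomial(train)
for name, model in (("line", line), ("degree-5 polynomial", poly)):
    print(f"{name}: train MSE = {mse(model, train):.3f}, test MSE = {mse(model, test):.3f}")
```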
● Underfitting
○ The model does not capture the underlying trend of the data and does not fit the data well enough.
○ This corresponds to low variance but high bias.
○ Underfitting is often the result of an excessively simple model.
[Figure: data plotted against X, with an underfitting model fit]
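Underfitting shows up differently: the error is high even on the training data. In this hypothetical sketch, a model that always predicts the training mean is compared with a simple proportional fit (y = w · x) on data with a clear upward trend; both the data and the two model choices are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Points = List[Tuple[float, float]]

# Hypothetical data with a clear upward trend (y is roughly 3x).
train = [(1, 3.2), (2, 5.9), (3, 9.1), (4, 12.2), (5, 14.8)]
test = [(1.5, 4.4), (2.5, 7.6), (3.5, 10.4), (4.5, 13.6)]


def fit_constant(points: Points) -> Callable[[float], float]:
    """Excessively simple model: ignore x and always predict the mean training target."""
    avg = sum(y for _, y in points) / len(points)
    return lambda x: avg


def fit_proportional(points: Points) -> Callable[[float], float]:
    """Least-squares line through the origin (y = w * x): simple, but able to follow the trend."""
    w = sum(x * y for x, y in points) / sum(x * x for x, _ in points)
    return lambda x: w * x


def mse(model: Callable[[float], float], points: Points) -> float:
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


constant = fit_constant(train)
proportional = fit_proportional(train)
for name, model in (("constant", constant), ("proportional", proportional)):
    print(f"{name}: train MSE = {mse(model, train):.3f}, test MSE = {mse(model, test):.3f}")
```

Because the constant model ignores X entirely, its predictions change little from one training sample to another (low variance), yet it is systematically off across the range of X (high bias).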
[Figure: error plotted against training time (epochs) for a good model and for a bad model]
Model Evaluation
● Once the model is trained, we evaluate it on the test data:
○ Pass a test image from X_test to the trained model and record its prediction (for example, DOG).
○ Compare the prediction to the correct label for that image from y_test.
○ If they match (DOG == DOG), the prediction is correct; if they differ (for example, DOG vs. CAT), it is incorrect.
● Accuracy
○ In classification problems, accuracy is the number of correct predictions made by the model divided by the total number of predictions.
○ For example, if the X_test set contained 100 images and our model correctly predicted 80 of them, accuracy is 80/100 = 0.8, or 80%.
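The accuracy calculation above takes only a few lines of Python. The 100-label test set below is hypothetical, constructed so that exactly 80 predictions are correct.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Number of correct predictions divided by the total number of predictions."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical test set of 100 images: the first 80 predictions are right, the last 20 wrong.
y_test = ["dog"] * 50 + ["cat"] * 50
predictions = y_test[:80] + ["dog" if y == "cat" else "cat" for y in y_test[80:]]
print(accuracy(predictions, y_test))  # 0.8
```

scikit-learn provides the same metric as `sklearn.metrics.accuracy_score`.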
○ Accuracy is useful when the target classes are well balanced. In our example, we would have roughly the same number of cat images as dog images.
○ Accuracy is not a good choice with unbalanced classes! Imagine we had 99 images of dogs and 1 image of a cat. If our model were simply a single line of code that always predicted dog, we would get 99% accuracy!
○ In this situation, we'll want to understand recall and precision.
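The "always predict dog" trap is easy to reproduce. This sketch assumes a hypothetical test set of 99 dog labels and 1 cat label, and a hypothetical `always_dog` model that ignores its input.

```python
# Hypothetical unbalanced test set: 99 dog images and 1 cat image.
y_test = ["dog"] * 99 + ["cat"]


def always_dog(_image) -> str:
    """A 'model' that is a single line of code: it ignores its input entirely."""
    return "dog"


predictions = [always_dog(None) for _ in y_test]
acc = sum(p == y for p, y in zip(predictions, y_test)) / len(y_test)
cats_found = sum(p == y == "cat" for p, y in zip(predictions, y_test))
print(f"accuracy = {acc:.2f}, cat images correctly found = {cats_found} of 1")
```

The high accuracy hides the fact that the one image of the minority class is never identified.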
● Recall
○ The ability of a model to find all the relevant cases within a dataset.
○ The precise definition of recall is the number of true positives divided by the number of true positives plus the number of false negatives.
● Precision
○ The ability of a classification model to identify only the relevant data points.
○ Precision is defined as the number of true positives divided by the number of true positives plus the number of false positives.
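Both definitions reduce to counting true positives (TP), false positives (FP), and false negatives (FN) for the class we care about. In this sketch the labels, predictions, and the choice of "cat" as the positive class are hypothetical, and returning 0.0 when a denominator is zero is our own convention (libraries handle that case in different ways).

```python
from typing import Sequence, Tuple


def confusion_counts(predictions: Sequence[str], labels: Sequence[str],
                     positive: str) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for one positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    return tp, fp, fn


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); returns 0.0 when there are no actual positives (our convention)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive (our convention)."""
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical results with "cat" as the positive class: 4 cats and 6 dogs.
labels = ["cat"] * 4 + ["dog"] * 6
predictions = ["cat", "cat", "cat", "dog"] + ["cat", "cat", "dog", "dog", "dog", "dog"]
tp, fp, fn = confusion_counts(predictions, labels, positive="cat")
print(f"TP={tp} FP={fp} FN={fn} recall={recall(tp, fn)} precision={precision(tp, fp)}")

# A model that always predicts dog, on 99 dogs and 1 cat, never finds the cat.
tp0, fp0, fn0 = confusion_counts(["dog"] * 100, ["dog"] * 99 + ["cat"], positive="cat")
print(f"always-dog recall for cat = {recall(tp0, fn0)}")
```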
● F1-Score
○ In cases where we want an optimal blend of precision and recall, we can combine the two metrics using what is called the F1 score.
○ The F1 score is the harmonic mean of precision and recall, taking both metrics into account in the following equation:
$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
○ We use the harmonic mean instead of a simple average because it punishes extreme values.
○ A classifier with a precision of 1.0 and a recall of 0.0 has a simple average of 0.5 but an F1 score of 0.
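The F1 formula and the 1.0/0.0 example can be checked directly. `f1` and `simple_average` are hypothetical helper names; returning 0.0 when precision and recall are both zero is an assumed convention that avoids dividing by zero.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R), taken as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def simple_average(precision: float, recall: float) -> float:
    """Arithmetic mean, shown only for comparison."""
    return (precision + recall) / 2


print(simple_average(1.0, 0.0), f1(1.0, 0.0))  # 0.5 0.0
# A lopsided classifier is pulled down toward its weaker metric.
print(simple_average(0.9, 0.1), f1(0.9, 0.1))
```

When precision and recall are equal, the harmonic mean and the simple average coincide; the gap only opens up as the two metrics diverge.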
[Figure: Confusion Matrix]
Evaluating Performance: Regression
● Clustering
○ Grouping together unlabeled data points into categories/clusters.
○ Data points are assigned to a cluster based on similarity.
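The slides do not name a specific clustering algorithm; k-means is one common choice, used here purely as an illustration. This sketch assigns each point to its nearest centroid (similarity measured as Euclidean distance) and then moves each centroid to the mean of its points, repeating until no assignment changes. The points and the "first k points" initialisation are assumptions, and `math.dist` needs Python 3.8 or later.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iters: int = 100) -> List[int]:
    """Lloyd's k-means: return a cluster index (0..k-1) for every point.

    Centroids start at the first k points so the example is deterministic.
    """
    centroids = list(points[:k])
    assignment: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins the nearest centroid (Euclidean distance).
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return assignment


# Hypothetical unlabeled points: no correct answers, just two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (2.0, 1.5), (9.0, 8.5)]
print(kmeans(points, k=2))  # [0, 1, 0, 1, 0, 1]
```

The cluster numbers themselves carry no meaning; only which points end up together does.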
● Anomaly Detection
○ Attempts to detect outliers in a dataset.
○ For example, fraudulent transactions on a credit card.
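One simple, non-authoritative way to flag outliers is a z-score rule: mark values that lie far from the mean, measured in standard deviations. The transaction amounts and the threshold of 3 below are hypothetical, and this is a sketch of the idea rather than a fraud-detection method.

```python
from statistics import mean, stdev
from typing import List


def zscore_outliers(values: List[float], threshold: float = 3.0) -> List[int]:
    """Indices of values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts: 20 routine purchases and one very large charge.
amounts = [20.0 + (i % 7) * 2.5 for i in range(20)] + [2500.0]
print(zscore_outliers(amounts))  # [20]
```

Note that an extreme value also inflates the standard deviation it is measured against, so with very few data points a single outlier can hide itself under this rule.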
● Dimensionality Reduction
○ Data processing techniques that reduce the number of features in a dataset, either for compression or to better understand underlying trends within a dataset.
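Principal component analysis (PCA) is one widely used dimensionality-reduction technique; the slides do not single out a method, so treat this as an illustrative choice. The sketch reduces hypothetical 2-D points to a single feature by projecting them onto the direction of greatest variance, using the closed-form top eigenvector of a 2 × 2 covariance matrix.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def pca_2d_to_1d(points: List[Point]) -> Tuple[List[float], Tuple[float, float], float]:
    """Project 2-D points onto their first principal component.

    Returns (1-D projections, unit direction vector, fraction of variance kept).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector: (b, lam - a) when b != 0, otherwise the dominant axis.
    if b != 0:
        vx, vy = b, lam - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    projections = [x * direction[0] + y * direction[1] for x, y in centered]
    kept = lam / (a + c) if a + c else 1.0
    return projections, direction, kept


# Hypothetical points lying close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
projections, direction, kept = pca_2d_to_1d(points)
print([round(p, 2) for p in projections], direction, f"{kept:.1%} of variance kept")
```

Because the points lie close to a line, almost all of the variance survives in one feature, which is the kind of underlying trend the reduction exposes.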
● Unsupervised Learning
○ It’s important to note that these are situations where we don’t have the correct answer for historical data!
○ This means evaluation is much harder and more nuanced!
Unsupervised Process
● Data Acquisition
● Data Cleaning, with the data split into training data and test data
● Model Training & Building
● Transformation
● Model Deployment