MLA TAB Lecture3
• In reality, every ML model makes some error.
• Learn better and better models, such that the overall model error gets smaller
and smaller; ideally, as small as possible!
Optimization
• In ML, we use optimization to minimize an error function of the ML model
Error function: y = f(x), where x = input, f = function, y = output
Optimizing the error function:
- Minimizing f means finding the input x that results in the lowest value f(x)
- Maximizing f means finding the x that gives the largest f(x)
Gradient Optimization
• Gradient: direction and rate of the fastest increase of a function.
It can be calculated with the partial derivatives of the function with respect
to each input variable in x:
∇f(x) = (∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ)
Because it has a direction, the gradient is a “vector”.
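As a quick illustration (not from the slides), a minimal sketch of the gradient as a vector of partial derivatives, approximated with central finite differences; the function and test point are made up for the example.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at x with central finite differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # Partial derivative of f with respect to x_i
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# f(x, y) = x^2 + 3y^2 has gradient (2x, 6y)
f = lambda v: v[0] ** 2 + 3 * v[1] ** 2
print(numerical_gradient(f, np.array([1.0, 2.0])))  # ~[ 2. 12.]
```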
Gradient Example
A one-dimensional function f(x), with gradient vector ∇f(x) = [df/dx] (plot omitted).
• Sign of the gradient shows the direction in which the
function increases: + means right and − means left
Global Minimum
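Gradient descent steps against the gradient, downhill, ideally until it reaches the global minimum; on non-convex functions it can instead settle in a local minimum. A minimal sketch with a made-up convex example, where the global minimum is reached:

```python
# We minimize f(x) = x^2, whose gradient is 2x, by repeatedly stepping
# against the gradient (the direction of fastest decrease).
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

x_min = gradient_descent(grad=lambda x: 2 * x, x0=5.0)
print(x_min)  # very close to 0, the global minimum of x^2
```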
Regression Models
Linear Regression
We use (linear) regression for
numerical value prediction.
Example: How does the price of a
house (target/outcome, y) relate to
its living area square footage
(feature/attribute, x)?
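A minimal sketch of this house-price example with made-up data, fitting price as a linear function of square footage via least squares:

```python
import numpy as np

# Made-up (sqft, price) data points; prices in thousands of dollars
sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
price = np.array([200.0, 280.0, 370.0, 450.0, 540.0])

# Least-squares fit of price = w * sqft + b
w, b = np.polyfit(sqft, price, deg=1)
print(f"price ~ {w:.3f} * sqft + {b:.1f}")
print("predicted price for 1800 sqft:", w * 1800 + b)
```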
LogLoss
• LogLoss (binary cross-entropy): LogLoss = −[ y·log(p) + (1 − y)·log(1 − p) ],
where y is the true class and p is the predicted probability of class 1.
• Example: for true class y = 1, a confident correct prediction p = 0.8
gives a much lower loss than p = 0.3.
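A small sketch of the LogLoss computation for the slide's example (true class y = 1), showing how much more heavily p = 0.3 is penalized than p = 0.8:

```python
import numpy as np

def log_loss(y, p):
    """Binary cross-entropy for true label y in {0, 1}, predicted P(class=1) = p."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(log_loss(1, 0.8))  # ~0.22: confident and correct, low loss
print(log_loss(1, 0.3))  # ~1.20: leaning the wrong way, much higher loss
```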
Ensemble Prediction
Boosting
Boosting method: build multiple weak models sequentially, each
subsequent model attempting to boost performance overall, by
overcoming/reducing the errors of the previous model.
[Diagram: Data 1 → Weak Model 1, Data 2 → Weak Model 2, …, combined into the Ensemble Prediction]
Gradient Boosting Machines (GBM)
Gradient Boosting Machines (GBM): Boosting trees
• Train a weak model on the given data, and make predictions with it
• Iteratively create a new model that learns to overcome the prediction errors of
the previous model (use the previous prediction error as the new target)
[Diagram: every weak model is trained on the same features but a new target:
Target 2 = Target 1 − Prediction 1, Target 3 = Target 2 − Prediction 2, …, up to Target N;
the final ensemble prediction is the sum of all the models' predictions.]
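A minimal sketch of this residual-fitting loop using shallow sklearn trees as the weak models and made-up data; the learning rate and tree depth are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up regression data
X = np.random.rand(200, 3)
y = 5 * X[:, 0] + np.sin(6 * X[:, 1]) + 0.1 * np.random.randn(200)

trees, lr = [], 0.1
residual = y.copy()                      # Target 1 is the original target
for _ in range(50):
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    residual -= lr * tree.predict(X)     # Target k+1 = Target k - Prediction k

# The final ensemble prediction is the (scaled) sum over all weak models
y_hat = lr * sum(t.predict(X) for t in trees)
print("training MSE:", np.mean((y - y_hat) ** 2))
```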
Artificial Neuron
[Diagram, built up over several slides: inputs are multiplied by weights,
summed, and passed through an activation function to produce the output.]
Neural Network/Multilayer Perceptron
• Artificial Neuron: captures mostly linear interactions in the data.
• Question: Can we use a similar approach to capture non-linear
interactions in the data?
[Diagram: Input Layer → Hidden Layer → Output Layer (3 weights)]
MultiLayer Network: Two layers (one hidden layer, output layer), with five
hidden neurons in the hidden layer, and one output neuron.
MultiLayer Network: Two layers (one hidden layer, output layer), with five
hidden neurons in the hidden layer, and three output neurons.
MultiLayer Network: Four layers (three hidden layers, output layer), with
five-three-two hidden neurons in the hidden layers, and two output neurons.
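A sketch of the first two-layer network above in MXNet Gluon (the framework used in the lecture notebook); the layer sizes follow the slide, while the tanh/sigmoid activation choices are assumptions:

```python
import mxnet as mx
from mxnet.gluon import nn

# One hidden layer with five neurons, one output neuron
net = nn.Sequential()
net.add(nn.Dense(5, activation='tanh'),     # hidden layer
        nn.Dense(1, activation='sigmoid'))  # output layer
net.initialize()

x = mx.nd.random.uniform(shape=(4, 3))  # batch of 4 samples with 3 features
print(net(x).shape)                     # (4, 1)
```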
Build and Train a Neural Network
We build a neural network for a binary classification task.
[Diagram: Input Layer → Output Layer, with weighted-sum input o(in) and activation output o(out)]
Activation Functions
• “How to get from a linear weighted-sum input to a non-linear output?”
• Logistic (sigmoid): σ(x) = 1 / (1 + exp(−x)). The most common activation
function; squashes input to (0, 1).
• Hyperbolic tangent (tanh): tanh(x) = (exp(x) − exp(−x)) / (exp(x) + exp(−x)).
Squashes input to (−1, 1).
• Rectified Linear Unit (ReLU): ReLU(x) = max(0, x). Popular activation
function; anything less than 0 results in zero activation.
• Derivatives of these functions are also important (for gradient descent).
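The three activations from the table and their derivatives, written out as a small NumPy sketch:

```python
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x):    return np.tanh(x)
def relu(x):    return np.maximum(0.0, x)

# Their derivatives, used when backpropagating gradients
def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```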
Output Activations/Functions
• “How to output/predict a result?”
• Typical choices (table condensed): regression → linear (identity) output;
binary classification → sigmoid; multiclass classification → softmax.
Forward Pass
[Diagram: the input data point (0.5, 0.1) is propagated through the network's
weighted sums and activations (example weights 0.15, 0.4, 0.4, 0.45),
producing the output o(out) = 0.61.]
For binary classification, we would classify this (0.5, 0.1) input data point as
class 1 (as 0.61 > 0.5).
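A minimal NumPy forward-pass sketch for a network of this shape; the exact wiring of the slide's diagram is not recoverable, so the weight assignments below are assumptions (they happen to produce an output near 0.61 for the input (0.5, 0.1)):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, 0.1])            # the input data point from the slide
W_hidden = np.array([[0.15, 0.4],   # assumed hidden-layer weights
                     [0.4,  0.45]])
w_out = np.array([0.4, 0.45])       # assumed output-neuron weights

h = sigmoid(W_hidden @ x)           # hidden activations
o = sigmoid(w_out @ h)              # network output o(out)
print(o)                            # ~0.61
print("class 1" if o > 0.5 else "class 0")  # 0.61 > 0.5 -> class 1
```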
Cost Functions
• “How to compare the outputs with the truth?”
• Typical choices (table condensed): regression → mean squared error;
binary classification → LogLoss (binary cross-entropy). J denotes the cost.
• The gradient of the cost with respect to the weights is what gradient
descent uses to train the network.
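As an illustration, a sketch of mean squared error as the cost for a linear model, with its gradient with respect to the weights driving a small gradient-descent loop (the data and learning rate are made up):

```python
import numpy as np

def mse_cost(w, X, y):
    """J(w): mean squared error of a linear model X @ w."""
    err = X @ w - y
    return np.mean(err ** 2)

def mse_grad(w, X, y):
    """Gradient of J with respect to the weights w."""
    err = X @ w - y
    return 2.0 * X.T @ err / len(y)

# Made-up data with known true weights
X = np.random.rand(100, 3)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
for _ in range(500):
    w -= 0.5 * mse_grad(w, X, y)   # gradient descent on the cost
print(w)                           # approaches w_true
```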
Dropout
• Regularization technique to prevent overfitting.
• Randomly removes (zeroes out) some nodes with a fixed probability during
training.
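A minimal sketch of (inverted) dropout applied to a matrix of activations; the drop probability is an illustrative choice:

```python
import numpy as np

def dropout(h, p_drop=0.5, training=True):
    """Inverted dropout: randomly zero activations during training and
    scale the survivors so the expected activation stays unchanged."""
    if not training:
        return h                 # dropout is disabled at inference time
    mask = np.random.rand(*h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

h = np.ones((2, 6))
print(dropout(h))  # roughly half the entries zeroed, the rest scaled to 2.0
```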
Why Neural Networks?
• Automatically extract useful features
from input data.
• In recent years, deep learning has
achieved state-of-the-art results in
many machine learning areas.
MLA-TAB-Lecture3-MXNet.ipynb
Putting it all together: Lecture 3
• In this notebook, we continue to work with our review dataset to
predict the target field
• The notebook covers the following tasks:
Exploratory Data Analysis
Splitting dataset into training and test sets
Data balancing, categorical encoding, text vectorization
Train a Neural Network
Check the performance metrics on the test set
MLA-TAB-Lecture3-Neural-Networks.ipynb
AutoML
AutoML
AutoML helps automate some of the tasks related to ML model
development and training, such as:
• Preprocessing and cleaning data
• Feature selection
• ML model selection
• Hyper-parameter optimization
AutoGluon
• Open-source AutoML Toolkit (AMLT) created by Amazon AI.
• Easy to use, with built-in applications.
AutoGluon
With AutoGluon, state-of-the-art ML results can be achieved in a few
lines of Python code.
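A minimal sketch of this workflow with AutoGluon's TabularPredictor; the CSV file names and the "target" label column are placeholders, not the lecture dataset's actual names:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")

# AutoGluon handles preprocessing, model selection, and ensembling itself
predictor = TabularPredictor(label="target").fit(train_data)
predictions = predictor.predict(test_data)
print(predictor.leaderboard(test_data))
```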
MLA-TAB-Lecture3-AutoGluon.ipynb
THANK YOU