
MACHINE LEARNING ACCELERATOR

Tabular Data – Lecture 3


Course Overview
Lecture 1: Introduction to ML; Model Evaluation (Train-Validation-Test, Overfitting); Exploratory Data Analysis; K Nearest Neighbors (KNN)
Lecture 2: Feature Engineering; Tree-based Models (Decision Tree, Random Forest); Hyperparameter Tuning; AWS AI/ML Services
Lecture 3: Optimization; Regression Models; Regularization; Boosting; Neural Networks; AutoML


Optimization
Optimization in Machine Learning
• We build and train ML models, hoping for:
Features → ML Model (Rules) → Target

• In reality … there is error:
Features → ML Model (Rules) → Prediction (≈ Target + error)

• Learn better and better models, such that overall model error gets smaller
and smaller … ideally, as small as possible!
Optimization
• In ML, use optimization to minimize an error function of the ML model
 Error function: y = f(x), where x = input, f = function, y = output
 Optimizing the error function:
- Minimizing means finding the input x that results in the lowest value f(x)
- Maximizing means finding the x that gives the largest f(x)
Gradient Optimization
• Gradient: direction and rate of the fastest increase of a function.
 It can be calculated with the partial derivatives of the function with respect
to each input variable in x: ∇f(x) = (∂f/∂x₁, …, ∂f/∂xₙ)
 Because it has a direction, the gradient is a “vector”.
Gradient Example
f(x), with gradient vector ∇f(x)
• The sign of the gradient shows the direction in which the function increases:
+ to the right and – to the left.

• As we go towards the bottom of the function, the gradient gets smaller
and becomes zero (i.e., the function can no longer change, can no longer
decrease: it has reached the minimum!)
Gradient Descent Method
• The Gradient Descent method uses gradients to find the minimum of a
function iteratively.
• Take steps (proportional to the gradient size) towards the minimum, in
the opposite direction of the gradient.

• Gradient Descent Algorithm:
 Start at an initial point x
 Update: x ← x − α·∇f(x), where α is the learning rate (step size)
Gradient Descent Method
[Figure: gradient descent steps starting from large initial values on either side and converging to the global minimum.]
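A minimal sketch of this update rule in Python (illustration only: the function f(x) = x², its gradient 2x, the starting point, and the learning rate are assumptions, not the example plotted on the slide):

def gradient(x):
    return 2 * x              # derivative of the illustrative function f(x) = x**2

x = 5.0                       # initial point
learning_rate = 0.1           # step size (alpha)
for step in range(50):
    x = x - learning_rate * gradient(x)   # step against the gradient
print(x)                      # ends up very close to 0.0, the global minimum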
Regression Models
Linear Regression
We use (linear) regression for
numerical value prediction.
Example: How does the price of a
house (target, outcome y) change
in relation to its square footage of
living space (feature, attribute x)?

* Data source: King County, WA Housing Info.
Multiple Linear Regression
Example: How does the price of a house (target, outcome y) change in relation
to its square footage of living space (feature x₁), its number of bedrooms (feature
x₂), its zip code (x₃), …? That is, using multiple features…

Using the multiple linear regression equation:
ŷ = w₀ + w₁x₁ + w₂x₂ + … + wₙxₙ

• Assuming all other variables stay the same, an increase of x₁ by 1 square
foot increases the price by w₁.
• Assuming all other variables stay the same, an increase of x₂ by 1
bedroom increases the price by w₂, and so on …
Linear Regression
The regression line ŷ = w₀ + w₁x is
defined by: w₀ (intercept), w₁ (slope).
The vertical offset for each data point
from the line is the error between y (the
true label) and ŷ (the prediction based on
x).
The best “line” (best w₀, w₁) minimizes the
sum of squared errors (SSE): SSE = Σᵢ (yᵢ − ŷᵢ)²
Fitting a Model: Gradient Descent
• For a Linear Regression model:
ŷ = w₀ + w₁x₁ + … + wₙxₙ,

with features x₁, …, xₙ, and parameters/weights w₀, w₁, …, wₙ

• Minimize the Mean Squared Error cost function:
J(w) = (1/m) Σᵢ (yᵢ − ŷᵢ)²
i: index; m: number of samples
yᵢ: output; ŷᵢ: model prediction

• Iteratively update parameters/weights with Gradient Descent:
wⱼ ← wⱼ − α ∂J/∂wⱼ
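As an illustration only (not the course notebook), a NumPy sketch of these updates for a single-feature linear regression on synthetic data:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)   # synthetic data: y ≈ 3x + 2

w0, w1 = 0.0, 0.0            # intercept and slope
alpha = 0.01                 # learning rate
m = len(x)
for _ in range(2000):
    y_hat = w0 + w1 * x
    error = y_hat - y
    grad_w0 = (2 / m) * error.sum()          # gradient of MSE w.r.t. w0
    grad_w1 = (2 / m) * (error * x).sum()    # gradient of MSE w.r.t. w1
    w0 -= alpha * grad_w0
    w1 -= alpha * grad_w1
print(w0, w1)                # approaches (2, 3)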


From Regression to Classification
Linear regression was useful when predicting continuous values.

Can we use a similar approach to solve classification problems?

The simplest classification problem is binary classification, where y ∈ {0, 1}.
Examples:
Email: Spam or Not Spam
Text: Positive or Negative product review
Image: Cat or Not Cat
Logistic Regression
Idea: We can apply the Sigmoid function to the linear regression output.
• The Sigmoid (Logistic) function σ(z) = 1 / (1 + e⁻ᶻ)
“squishes” values to the 0–1 range.

• Can define a “Decision boundary” at 0.5
- if σ(z) < 0.5, round down (class 0)
- if σ(z) ≥ 0.5, round up (class 1)
• Our regression equation becomes:
ŷ = σ(w₀ + w₁x₁ + … + wₙxₙ)
Log-Loss (Binary Cross-Entropy)
Log-Loss: A numeric value that measures the performance of a binary
classifier when the model output is a probability between 0 and 1:

LogLoss = −( y·log(p) + (1 − y)·log(1 − p) )

y: true class in {0, 1}, p: predicted probability of class 1, and log: (natural) logarithm

• As the output of Logistic Regression is between 0 and 1, Log-Loss is a
suitable cost function for Logistic Regression.
• To improve Logistic Regression model learning from data, minimize the Log-Loss.
Log-Loss (Binary Cross-Entropy)
Example: Let’s calculate the Log-Loss
for the following scenarios:

• true class y = 1, predicted p = 0.3: LogLoss = −log(0.3) ≈ 1.20
• true class y = 1, predicted p = 0.8: LogLoss = −log(0.8) ≈ 0.22

A better prediction gives a smaller Log-Loss.
[Figure: Log-Loss versus predicted probability, with p = 0.3 and p = 0.8 marked.]
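A quick check of these two values in Python (natural logarithm assumed):

import math

def log_loss(y_true, p):
    # binary cross-entropy for a single example
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

print(log_loss(1, 0.3))      # about 1.20  (poor prediction, larger loss)
print(log_loss(1, 0.8))      # about 0.22  (better prediction, smaller loss)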


Fitting a Model: Gradient Descent
• For a Logistic Regression model:
ŷ = σ(w₀ + w₁x₁ + … + wₙxₙ),

with features x₁, …, xₙ, and parameters/weights w₀, w₁, …, wₙ

• Minimize the LogLoss cost function:
J(w) = −(1/m) Σᵢ [ yᵢ·log(ŷᵢ) + (1 − yᵢ)·log(1 − ŷᵢ) ]
i: index; m: # samples
yᵢ: output
ŷᵢ: model prediction

• Iteratively update parameters/weights with Gradient Descent:
wⱼ ← wⱼ − α ∂J/∂wⱼ


Regularization
Regularization
Underfitting: Model too simple, fewer features,
smaller weights, weak learning.
Overfitting: Model too complex, too many features,
larger weights, weak generalization.
‘Good Fit’ Model: Compromise between fit and
complexity (drop features, reduce weights).

Regularization does both: it penalizes large weights,
sometimes reducing them all the way to zero!
Regularization
• Tune model complexity by adding a penalty score for complexity to the
cost function (think error function, minimizing towards best fit!):
 Regularized cost = Error (e.g., MSE or LogLoss) + α · Penalty

• Calibrate regularization strength by using a regularizer parameter, α

• Standard regularization types:
 L2 regularization (Ridge): Penalty = Σⱼ wⱼ²  (L2: popular choice)
 L1 regularization (LASSO): Penalty = Σⱼ |wⱼ|  (L1: useful for feature selection, since most weights shrink to 0, giving sparsity)
 Both L2 and L1 (ElasticNet)

• Note: Important to scale features first!
Regression in sklearn
LinearRegression: sklearn Linear Regression (and regularized variants)
LinearRegression()
Ridge(alpha=1.0), RidgeCV(alphas=(0.1, 1.0, 10.0), cv=5)
Lasso(alpha=1.0), LassoCV(cv=5)
ElasticNet(alpha=1.0, l1_ratio=0.5), ElasticNetCV(cv=5)

LogisticRegression: sklearn Logistic Regression (and regularization)

LogisticRegression(penalty='l2', C=1.0, l1_ratio=None)
LogisticRegressionCV(penalty='l2', Cs=10, cv=5)
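A short usage sketch on synthetic data (the dataset and parameter values are illustrative; features are scaled first, as noted above):

import numpy as np
from sklearn.linear_model import Ridge, Lasso, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=200)

ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
print(lasso[-1].coef_)       # L1 shrinks some coefficients exactly to 0 (sparsity)

y_cls = (y > y.mean()).astype(int)
clf = make_pipeline(StandardScaler(), LogisticRegression(penalty='l2', C=1.0)).fit(X, y_cls)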
Ensemble Methods: Boosting
Boosting
Boosting method: build multiple weak models sequentially, each
subsequent model attempting to boost overall performance by
overcoming/reducing the errors of the previous model.

Data 1 → Weak Model 1 → Prediction 1 (large error, far from target)
Data 2 → Weak Model 2 → Prediction 2 (still large error, far from target)
Data 3 → Weak Model 3 → Prediction 3 …

The individual predictions are combined into the Ensemble Prediction.
Gradient Boosting Machines (GBM)
Gradient Boosting Machines (GBM): Boosting trees
• Train a weak model on the given data, and make predictions with it
• Iteratively create a new model to learn to overcome prediction errors of the
previous model (use previous prediction error as new target)
Each tree is trained on the same features but on a new target, the previous model’s prediction error:
Target 2 = Target 1 − Prediction 1, Target 3 = Target 2 − Prediction 2, …

Tree 1 → Prediction 1, Tree 2 → Prediction 2, Tree 3 → Prediction 3, …, Tree N → Prediction N

Ensemble prediction = Prediction 1 + Prediction 2 + Prediction 3 + … + Prediction N
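A minimal sketch of this residual-fitting idea with sklearn decision trees (an illustration of the principle, not the library’s actual GBM implementation; the data and hyperparameters are made up):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)     # synthetic regression target

learning_rate = 0.1
trees = []
prediction = np.zeros_like(y)
for _ in range(100):
    residual = y - prediction                         # previous error = new target
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    trees.append(tree)
    prediction += learning_rate * tree.predict(X)     # add the weak model's contribution

def ensemble_predict(X_new):
    # ensemble prediction = sum of the (scaled) tree predictions
    return sum(learning_rate * t.predict(X_new) for t in trees)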


Gradient Boosting in Python
• sklearn GBM algorithms:
 GradientBoostingClassifier (Regressor)
 HistGradientBoostingClassifier (Regressor) – faster, experimental
• Additional third-party libraries provide computationally efficient alternative
GBM implementations, often with better results in practice:
 XGBoost (Extreme Gradient Boosting): efficient compute, memory
 LightGBM: much faster
 CatBoost (Category Gradient Boosting): fast, supports categoricals
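These libraries expose sklearn-style interfaces. A hedged sketch, assuming the third-party xgboost package is installed (the dataset and parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier          # third-party: pip install xgboost

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print(accuracy_score(y_test, preds))       # accuracy on the held-out split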
Gradient Boosting in sklearn
GradientBoostingClassifier: sklearn’s Gradient Boosting classifier
(there is also a Regressor version) - .fit(), .predict()

GradientBoostingClassifier(n_estimators=100, learning_rate = 0.1,


min_samples_split=2, min_samples_leaf=1, max_depth=3)

The full interface is larger.


Notice the mix of boosting-specific and tree-specific parameters.
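A brief usage sketch on a synthetic dataset (the hyperparameter values mirror the defaults shown above):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 min_samples_split=2, min_samples_leaf=1, max_depth=3)
gbm.fit(X_train, y_train)
print(gbm.predict(X_test)[:5])             # class predictions
print(gbm.score(X_test, y_test))           # accuracy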
Gradient Boosting in sklearn
HistGradientBoostingClassifier: sklearn’s histogram-based Gradient Boosting classifier,
inspired by LightGBM (there is also a Regressor version), in experimental stage - .fit(), .predict()

from sklearn.experimental import enable_hist_gradient_boosting


HistGradientBoostingClassifier(max_iter=100, learning_rate = 0.1,
max_leaf_nodes=31, min_samples_leaf=20, max_depth=None)

The full interface is larger.


Neural Networks
Looking back at Regression Models
Linear Regression*: Given inputs {x₁, …, xₙ}, predict ŷ:
ŷ = w₀ + w₁x₁ + … + wₙxₙ
[Diagram: Input (x₁, …, xₙ) → weights (w) → sum → Output ŷ]

* Basically assuming that the output depends only on
first-order interactions of the inputs
Looking back at Regression Models
Linear Regression*: Given inputs {x₁, …, xₙ}, predict ŷ:
ŷ = f(w₀ + w₁x₁ + … + wₙxₙ),
where f is the linear (identity) function: f(z) = z
[Diagram: Input → weights → sum → activation function f → Output]

* Linear activation function
Looking back at Regression Models
Logistic Regression*: Given inputs {x₁, …, xₙ}, predict ŷ, where ŷ ∈ {0, 1}:
ŷ = f(w₀ + w₁x₁ + … + wₙxₙ),
where f is the logistic function: f(z) = 1 / (1 + e⁻ᶻ)
[Diagram: Input → weights → sum → activation function f → Output]

* Non-linear activation function / binary classifier
Perceptron (Rosenblatt, 1957)
Perceptron*: Given inputs {x₁, …, xₙ}, predict ŷ, where ŷ ∈ {0, 1}:
ŷ = f(w₀ + w₁x₁ + … + wₙxₙ),
where f is the step function: f(z) = 1 if z ≥ 0, else 0
[Diagram: Input → weights → sum → activation function f → Output]

* Non-linear activation function / binary classifier
Artificial Neuron
Artificial Neuron*: Given inputs {x₁, …, xₙ}, predict ŷ:
ŷ = f(w₀ + w₁x₁ + … + wₙxₙ),
where f is a nonlinear activation
function (sigmoid, tanh, ReLU, …)
[Diagram: Input → weights → sum → activation function f → Output]

* Similar to how neurons in the brain function
Artificial Neuron
Artificial Neuron: Captures mostly
linear interactions in the data.

Question: Can we use a similar
approach to capture non-linear
interactions in the data?

[Figure: a single neuron applied to non-linearly separable data. Not a very good classifier.]
Neural Network/Multilayer Perceptron
Artificial Neuron: Captures mostly
linear interactions in the data.

Question: Can we use a similar
approach to capture non-linear
interactions in the data?

[Figure: a small network with a hidden layer (6 weights into the hidden layer, 3 weights into the output) on the same data. Much better!]
Neural Network/Multilayer Perceptron
Artificial Neuron: Captures mostly
linear interactions in the data.

Question: Can we use a similar
approach to capture non-linear
interactions in the data?

Neural Network/Multilayer Perceptron (MLP): Use more
Artificial Neurons, stacked in a layer!
[Diagram: Input Layer → Hidden Layer (6 weights) → Output Layer (3 weights)]
Neural Network/Multilayer Perceptron
• A neural network consists of input, hidden and output layers.
• Each layer is connected to the next layer.
• An activation function is applied on each hidden layer (and the output layer).
• More details
[Diagrams: Input Layer → Hidden Layer → Output Layer, shown once with 6 weights into the hidden layer and 3 into the output, and once with a wider hidden layer (12 weights in, 5 weights out).]
Neural Networks

• MultiLayer Network: Two layers (one hidden layer, one output layer), with five
hidden neurons in the hidden layer, and one output neuron.
• MultiLayer Network: Two layers (one hidden layer, one output layer), with five
hidden neurons in the hidden layer, and three output neurons.
• MultiLayer Network: Four layers (three hidden layers, one output layer), with
five-three-two hidden neurons in the hidden layers, and two output neurons.

More details
Build and Train a Neural Network

We build a neural network for a binary
classification task, with:

• no bias terms, for simplicity
• 2 inputs: x₁ = 0.5 and x₂ = 0.1
• 1 hidden layer with 2 neurons (h₁, h₂)
• 1 output neuron (o) in the output layer

[Diagram: Input Layer → Hidden Layer (h₁, h₂, each with in/out values) → Output Layer (o, with in/out values)]
Activation Functions
• “How to get from the linear weighted sum input to a non-linear output?”

Name / Function / Description:
• Logistic (sigmoid): σ(x) = 1 / (1 + e⁻ˣ). The most common activation function. Squashes input to (0, 1).
• Hyperbolic tangent (tanh): tanh(x). Squashes input to (-1, 1).
• Rectified Linear Unit (ReLU): max(0, x). Popular activation function. Anything less than 0 results in zero activation.
Derivatives of these functions are also important (gradient descent).
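These functions and their derivatives are short enough to write down directly; a small NumPy sketch:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                      # squashes to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)              # zero activation for negative inputs

def sigmoid_grad(x):                       # derivatives feed gradient descent
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    return (x > 0).astype(float)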
Output Activations/Functions
• “How to output/predict a result?”

Problem / Name / Description:
• Binary classification → Sigmoid: output a probability for each class, in (0, 1); logistic regression of the output of the last layer.
• Multi-class classification → Softmax: output a probability for each class, in (0, 1); the outputs sum to 1 (a probability distribution); training drives the target class value up and the others down.
• Regression → Linear / ReLU
Build and Train a Neural Network

We build a neural network for a binary
classification task, with:

• no bias terms, for simplicity
• 2 inputs: x₁ = 0.5 and x₂ = 0.1
• 1 hidden layer with 2 neurons (h₁, h₂)
• 1 output neuron (o) in the output layer
• All neurons have the sigmoid activation function: σ(z) = 1 / (1 + e⁻ᶻ)

[Diagram: Input Layer → Hidden Layer (h₁, h₂) → Output Layer (o)]
Forward Pass
With inputs x₁ = 0.5 and x₂ = 0.1 and the input-to-hidden weights shown on the slide
(0.15, 0.4, 0.25, 0.2), the hidden neurons’ net inputs are about 0.1 and 0.13; applying the
sigmoid gives hidden outputs of about 0.52 and 0.53.

With hidden-to-output weights 0.4 and 0.45, the output neuron’s net input is about 0.44,
and its sigmoid output is about 0.61.

For binary classification, we would classify this (0.5, 0.1) input data point as
class 1 (as 0.61 > 0.5).
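A NumPy sketch of this forward pass. The individual weight values are taken from the slide, but their exact pairing with the connections is an assumption, chosen so the intermediate values roughly match the slide’s numbers:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.1])                   # the two inputs
W_hidden = np.array([[0.15, 0.20],         # rows: inputs, columns: hidden neurons
                     [0.25, 0.40]])        # (assumed pairing of the slide's weights)
W_output = np.array([0.40, 0.45])          # hidden-to-output weights

h_out = sigmoid(x @ W_hidden)              # roughly [0.52, 0.53]
o_out = sigmoid(h_out @ W_output)          # roughly 0.61
print(o_out, "-> class", int(o_out > 0.5)) # class 1, since the output exceeds 0.5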
Cost Functions
• “How to compare the outputs with the truth?”

• Binary classification: Cross entropy for logistic (Log-Loss). Notation: m = training examples, p = prediction (probability), y = true class (1/yes, 0/no).
• Multi-class classification: Cross entropy for Softmax, with c = classes.
• Regression: Mean Squared Error. Notation: m = training examples, ŷ = prediction (numeric), y = true value.
Training Neural Networks
• Cost function is selected according to the problem: binary classification,
multi-class classification or regression.
• Update network weights by applying the gradient descent method and
backpropagation. More details

• Weight update formula:
w ← w − α · ∂E/∂w

E: cost; ∂E/∂w: gradient of the cost with respect to the weight w; α: learning rate
Dropout
• Regularization technique to prevent overfitting.
• Randomly removes some nodes with a fixed probability during training.

More details
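A bare-bones illustration of what dropout does to a layer’s activations during training (using the common “inverted dropout” scaling convention):

import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_prob=0.5):
    keep_mask = rng.random(activations.shape) > drop_prob   # keep each node with prob. 1 - drop_prob
    return activations * keep_mask / (1.0 - drop_prob)      # rescale so the expected value is unchanged

h = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout(h))    # roughly half of the entries are zeroed on each training pass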
Why Neural Networks?
• Automatically extract useful features
from input data.
• In recent years, deep learning has
achieved state-of-the-art results in
many machine learning areas.

• Three pillars of deep learning:


 Data
 Compute
 Algorithms
Build and Train Neural Networks
• How to build and use these ML models?
• Can it be this simple?
Dive into Deep Learning

E-book on Deep Learning by Amazon Scientists, available here: https://d2l.ai


Related chapters:
Chapter 3: Linear Neural Networks: https://d2l.ai/chapter_linear-networks/index.html
Chapter 4: Multilayer Perceptrons: https://d2l.ai/chapter_multilayer-perceptrons/index.html
MXNet Hands-on
• Open source Deep Learning Library to train
and deploy neural networks.
• With the Gluon interface, we can define and
train neural networks easily.

MLA-TAB-Lecture3-MXNet.ipynb
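As a taste of the Gluon interface, a hedged sketch of defining a small network (the layer sizes and initializer are illustrative; the notebook above is the authoritative version):

from mxnet import init, nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(64, activation='relu'),   # hidden layer with 64 units
        nn.Dense(1))                       # single output unit
net.initialize(init.Xavier())

x = nd.random.uniform(shape=(4, 10))       # a batch of 4 examples with 10 features
print(net(x).shape)                        # (4, 1)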
Putting it all together: Lecture 3
• In this notebook, we continue to work with our review dataset to
predict the target field
• The notebook covers the following tasks:
 Exploratory Data Analysis
 Splitting dataset into training and test sets
 Data balancing, categorical encoding, text vectorization
 Train a Neural Network
 Check the performance metrics on test set

MLA-TAB-Lecture3-Neural-Networks.ipynb
AutoML
AutoML
AutoML helps automate some of the tasks related to ML model
development and training, such as:
• Preprocessing and cleaning data
• Feature selection
• ML model selection
• Hyper-parameter optimization
AutoGluon
• Open source AutoML Toolkit (AMLT) created by Amazon AI.
• Easy to Use – Built-in Application

With AutoGluon, state-of-the-art ML results can be achieved in a few
lines of Python code.

MLA-TAB-Lecture3-AutoGluon.ipynb
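For reference, a typical AutoGluon tabular workflow looks roughly like the sketch below, assuming a recent AutoGluon release with the TabularPredictor API; the file and column names are placeholders, and the notebook above shows the actual dataset:

from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset('train.csv')                        # placeholder file name
predictor = TabularPredictor(label='target').fit(train_data)    # 'target' is a placeholder label column

test_data = TabularDataset('test.csv')                          # placeholder file name
predictions = predictor.predict(test_data)
print(predictor.leaderboard(test_data))                         # compare the models AutoGluon trained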
THANK YOU
