MLA TAB Lecture2
Numerical data
Tabular Raw Data
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
# LabelEncoder: encode a single categorical column (df is a pandas DataFrame)
le = LabelEncoder()
df['color'] = le.fit_transform(df['color'])
# OrdinalEncoder: encode several categorical columns at once
oe = OrdinalEncoder()
df[['color','size','classlabel']] = oe.fit_transform(df[['color','size','classlabel']])
Example: The following two sentences have similar meaning but may
seem quite different to a text classifier (e.g. a sentiment detector):
• “The countess (Rebecca) considers the boy to be quite naïve.”
• “countess rebecca considers boy naive”
Tokenization
• Tokenization: Splits text into small parts by white space and
punctuation.
Example:
Sentence: “I don’t like eggs.”
Tokens: “I”, “do”, “n’t”, “like”, “eggs”, “.”
Tokens will be used for further cleaning and vectorization.
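As a quick illustration, a minimal sketch using NLTK's word tokenizer (NLTK is one common choice, not the only one; assumes the nltk package is installed):

import nltk
nltk.download('punkt')   # tokenizer models (recent NLTK versions may also need 'punkt_tab')
from nltk.tokenize import word_tokenize

print(word_tokenize("I don't like eggs."))
# ['I', 'do', "n't", 'like', 'eggs', '.']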
Stop Words Removal
• Stop Words: Words that appear frequently in texts but contribute little to the overall meaning.
Common stop words: “a”, “the”, “so”, “is”, “it”, “at”, “in”, “this”, “there”, “that”,
“my”, “by”, “nor”
Example:
Is this a good list of stop words for a binary text classification of product reviews (positive or negative review)?
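A minimal sketch of stop word removal over a token list, using just the stop word list from this slide:

stop_words = {"a", "the", "so", "is", "it", "at", "in", "this", "there",
              "that", "my", "by", "nor"}
tokens = ["this", "is", "a", "great", "product", "."]
filtered = [t for t in tokens if t.lower() not in stop_words]
print(filtered)   # ['great', 'product', '.']

For sentiment tasks, words like “nor” (and negations in general) can carry useful signal, so a list like this one may be too aggressive — which is the point of the question above.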
Stemming
• Stemming: A set of rules to slice a string down to a substring that usually carries a more general meaning.
The goal is to remove word affixes (particularly suffixes) such as “s”, “es”, “ing”, “ed”, etc.
o “playing” → “play”
o “played” → “play”
o “plays” → “play”
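A minimal sketch with NLTK's PorterStemmer (one common stemmer; assumes nltk is installed):

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["playing", "played", "plays"]])
# ['play', 'play', 'play']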
Bag of Words in sklearn
[Bag-of-Words example: the sentence “It is a dog.” is encoded as a count vector over the vocabulary, e.g. 1 0 1 1 1 0 0 0 0]
CountVectorizer: sklearn text vectorizer, converts a collection of text documents to a matrix of token counts - .fit(), .transform()
from sklearn.feature_extraction.text import CountVectorizer
countVectorizer = CountVectorizer()
X = countVectorizer.fit_transform(sentences)  # sentences: a list of text documents
print(X.toarray())
TfidfVectorizer: sklearn text vectorizer, converts a collection of text
documents to a matrix of TF-IDF features - .fit(), .transform()
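As a quick illustration, a minimal TfidfVectorizer sketch (the sentences below are made up for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["It is a dog.", "It is not a dog.", "I don't like eggs."]
tfidfVectorizer = TfidfVectorizer()
X = tfidfVectorizer.fit_transform(sentences)      # sparse matrix of TF-IDF features
print(tfidfVectorizer.get_feature_names_out())    # learned vocabulary (sklearn >= 1.0)
print(X.toarray())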
MLA-TAB-Lecture2-Text-Processing.ipynb
Tree-based Models
Problem: Package Delivery Prediction
• Given this dataset, let’s predict whether the package arrives on time (‘ontime’).

Weather   Demand   Address      ontime
Sunny     High     Correct      No
Sunny     High     Misspelled   No
How to Learn a Decision Tree?
• Root Node: the start node of the tree (here ‘Weather’, branching into Sunny / Rainy).
• Top-down approach: Grow the tree from root node to leaf nodes.
*ID3 (Iterative Dichotomiser 3)
Decision Trees: Numerical Example
• Given this dataset, let’s predict the y class (1 vs. 2) using a Decision Tree.
• Iteratively split the dataset into subsets from a root node, such that the leaf nodes contain mostly one class (as pure as possible).

x1    x2    y
3.5   2     1
5     2.5   2
1     3     1
2     4     1
4     2     1
6     6     2
2     9     2
4     9     2
5     4     1
3     8     2
Decision Trees: Numerical Example
[Scatter plot of the dataset: x1 on the horizontal axis, x2 on the vertical axis; Class 1 and Class 2 points (same table as above).]
Decision Trees: Numerical Example
[Scatter plot of the dataset. Before any split, the root node contains all samples: Class: 1, 2.]
Decision Trees: Numerical Example
[Scatter plot with the first split drawn at x2 = 5.]
Tree so far:
x2 ≤ 5?
  Yes → subset still mixed (Class: 1, 2)
  No → leaf: Class = 2
What feature (x1 or x2) to use to split this subset, to best separate class 1 from class 2?
Decision Trees: Numerical Example
[Scatter plot with splits drawn at x2 = 5 and x1 = 4.5.]
Tree so far:
x2 ≤ 5?
  Yes → x1 ≤ 4.5?
    Yes → leaf: Class = 1
    No → subset still mixed (Class: 1, 2)
  No → leaf: Class = 2
What feature (x1 or x2) to use to split this subset, to best separate class 1 from class 2?
Decision Trees: Numerical Example
[Scatter plot with splits drawn at x2 = 5, x1 = 4.5 and x2 = 3.]
Final tree:
x2 ≤ 5?
  Yes → x1 ≤ 4.5?
    Yes → leaf: Class = 1
    No → x2 ≤ 3?
      Yes → leaf: Class = 2
      No → leaf: Class = 1
  No → leaf: Class = 2
Decision Trees: Example
Class: Yes, No — the full dataset contains [9+, 5-] (9 on-time, 5 not on-time). Sample rows:

Weather    Demand   Address      ontime
Sunny      High     Correct      No
Sunny      High     Misspelled   No
Overcast   High     Correct      Yes
Rainy      High     Correct      Yes
Overcast   High     Misspelled   Yes

What feature (‘Weather’, ‘Demand’ or ‘Address’) to use to split the dataset at the root node?

Candidate splits and the resulting subsets:
• Weather: Sunny [2+, 3-] (not too sure), Overcast [4+, 0-] (absolutely sure), Rainy [3+, 2-] (not too sure)
• Demand: High [3+, 4-] (not too sure), Normal [6+, 1-] (somewhat sure)
• Address: Correct [6+, 2-] (somewhat sure), Misspelled [3+, 3-] (not too sure)
How to Measure Uncertainty
[Figure: the ‘Weather’ split — Sunny [2+, 3-] (not too sure), Overcast [4+, 0-] (absolutely sure), Rainy [3+, 2-] (not too sure).]
Calculating Gini Impurity
[Figure: the same ‘Weather’ split — Sunny [2+, 3-], Overcast [4+, 0-], Rainy [3+, 2-] — used to compute the Gini impurity of each subset.]
Demand: High [3+, 4-], Normal [6+, 1-]; Address: Correct [6+, 2-], Misspelled [3+, 3-].
“Weather” has the highest gain of all, so we start the tree with the “Weather” feature as the root node!
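As a concrete check, a small sketch that computes the Gini impurity of the parent node [9+, 5-] and the Gini-based gain of the ‘Weather’ split, using the subset counts above (the ID3 Information Gain uses entropy in the same way):

def gini(pos, neg):
    total = pos + neg
    return 1.0 - (pos / total) ** 2 - (neg / total) ** 2

parent = gini(9, 5)                                              # ≈ 0.459
children = [(2, 3), (4, 0), (3, 2)]                              # Sunny, Overcast, Rainy
weighted = sum((p + n) / 14 * gini(p, n) for p, n in children)   # ≈ 0.343
print(parent - weighted)                                         # gain of the 'Weather' split ≈ 0.116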
Recap ID3 Algorithm
ID3 Algorithm*: Repeat
1. Select “the best feature” to split using Information Gain.
2. Separate the training samples according to the selected feature.
3. Stop if the samples are from a single class or if all features have been used, and note a leaf node.
4. Assign the leaf node the majority class of the samples in it.
[Diagram: the root node splits on the feature with the highest Information Gain (branches A, B, C); branch B ends in a Class 1 leaf; the nodes under A and C split again on the feature with the highest Information Gain given A / given C (branches D, E and F, G), each branch ending in a class leaf.]
*To build a Decision Tree Regressor: in step 1, replace Information Gain with Standard Deviation Reduction; in step 3, stop when the numerical values are homogeneous (standard deviation is zero) or all features have been used, and note a leaf node; in step 4, assign the leaf node the average value of its samples.
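A minimal sketch of the Standard Deviation Reduction criterion mentioned in the footnote (the target values below are made up for illustration):

import numpy as np

def std_reduction(parent_values, subsets):
    # parent standard deviation minus the size-weighted standard deviation of the subsets
    n = len(parent_values)
    weighted = sum(len(s) / n * np.std(s) for s in subsets)
    return np.std(parent_values) - weighted

y = np.array([10.0, 12.0, 11.0, 30.0, 32.0, 31.0])
print(std_reduction(y, [y[:3], y[3:]]))   # large reduction: each subset is nearly homogeneous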
Decision Trees in sklearn
DecisionTreeClassifier: sklearn Decision Tree classifier (there is also a
Regressor version) - .fit(), .predict()
DecisionTreeClassifier(criterion='gini',
                       max_depth=None, min_samples_split=2,
                       min_samples_leaf=1, class_weight=None)
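A minimal sketch fitting a DecisionTreeClassifier on the small x1/x2 dataset from the numerical example above (the thresholds sklearn learns may differ slightly from the hand-built tree):

from sklearn.tree import DecisionTreeClassifier, export_text

X = [[3.5, 2], [5, 2.5], [1, 3], [2, 4], [4, 2],
     [6, 6], [2, 9], [4, 9], [5, 4], [3, 8]]
y = [1, 2, 1, 1, 1, 2, 2, 2, 1, 2]

tree = DecisionTreeClassifier(criterion='gini', max_depth=3)
tree.fit(X, y)
print(export_text(tree, feature_names=['x1', 'x2']))   # text rendering of the learned splits
print(tree.predict([[5, 2.5]]))                        # predict the class of a single point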
Bagging and Random Forests in sklearn
[Figure: an ensemble of weak models (Weak Model 1, Weak Model 2, …) trained on bootstrap samples and combined by bagging.]
RandomForestClassifier(n_estimators=100,
                       max_samples=None, max_features='auto',
                       criterion='gini', max_depth=None, min_samples_split=2,
                       min_samples_leaf=1, class_weight=None)
BaggingClassifier(base_estimator=None, n_estimators=10,
max_samples=1.0, bootstrap=True)
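A minimal sketch of both ensembles on the same toy data (non-default hyperparameters are omitted; BaggingClassifier uses a Decision Tree as its default base estimator):

from sklearn.ensemble import RandomForestClassifier, BaggingClassifier

X = [[3.5, 2], [5, 2.5], [1, 3], [2, 4], [4, 2],
     [6, 6], [2, 9], [4, 9], [5, 4], [3, 8]]
y = [1, 2, 1, 1, 1, 2, 2, 2, 1, 2]

rf = RandomForestClassifier(n_estimators=100, criterion='gini')
rf.fit(X, y)
print(rf.predict([[5, 2.5]]))

bag = BaggingClassifier(n_estimators=10, max_samples=1.0, bootstrap=True)
bag.fit(X, y)
print(bag.predict([[5, 2.5]]))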
Grid Search in sklearn
[Figure: a grid over Hyperparameter 1 × Hyperparameter 2 values; every combination is evaluated.]
GridSearchCV(estimator, param_grid, scoring=None)
param_grid = {'max_depth': [5, 10, 50, 100, 250],
              'min_samples_leaf': [15, 20, 25, 30, 35]}
Total hyperparameter combinations: 5 x 5 = 25
[5, 15], [5, 20], [5, 25], [10, 15], …
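A minimal sketch wiring this param_grid into GridSearchCV around a Decision Tree (scoring, cv and the X_train/y_train names are illustrative assumptions):

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {'max_depth': [5, 10, 50, 100, 250],
              'min_samples_leaf': [15, 20, 25, 30, 35]}

grid = GridSearchCV(DecisionTreeClassifier(), param_grid,
                    scoring='accuracy', cv=5)   # 25 combinations, each cross-validated
grid.fit(X_train, y_train)                      # X_train, y_train assumed to be defined
print(grid.best_params_, grid.best_score_)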
Randomized Search in sklearn
RandomizedSearchCV: randomized search on hyperparameters
Chooses a fixed number (given by parameter n_iter) of random combinations of
hyperparameter values and only tries those.
Can sample from distributions (sampling with replacement is used), if at least one
parameter is given as a distribution.
[Figure: random combinations sampled over Hyperparameter 1 × Hyperparameter 2 values.]
RandomizedSearchCV(estimator, param_distributions,
n_iter=10, scoring=None)
# pipeline: an sklearn Pipeline (e.g. text vectorizer + classifier) defined earlier
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
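Putting the search and the pipeline together, a minimal sketch of RandomizedSearchCV drawing values from distributions over a text-vectorizer + tree pipeline (the pipeline steps, ranges and X_train/X_test names are illustrative assumptions):

from scipy.stats import randint
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

pipeline = Pipeline([('tfidf', TfidfVectorizer()),
                     ('tree', DecisionTreeClassifier())])
param_distributions = {'tree__max_depth': randint(5, 250),
                       'tree__min_samples_leaf': randint(15, 35)}

search = RandomizedSearchCV(pipeline, param_distributions, n_iter=10, scoring='accuracy')
search.fit(X_train, y_train)            # X_train: raw text documents, y_train: labels (assumed)
predictions = search.predict(X_test)    # uses the best found pipeline to predict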
Putting it all together
• In this notebook, we continue to work with our review dataset to predict
the target field
• The notebook covers the following tasks:
o Exploratory Data Analysis
o Splitting the dataset into training and test sets
o Categorical encoding and text vectorization
o Training a Decision Tree Classifier and hyperparameter tuning
o Checking the performance metrics on the test set
MLA-TAB-Lecture2-Trees.ipynb
AWS SageMaker
AWS SageMaker: Train and Deploy
SageMaker is an AWS service to easily build, train, tune and deploy ML
models: https://aws.amazon.com/sagemaker/
MLA-TAB-Lecture2-SageMaker.ipynb
AWS SageMaker GroundTruth
SageMaker GroundTruth: Data Labeling
• Machine learning can be applied in many different areas, so we usually deal with many different types of labels.
• We will use the SageMaker GroundTruth tool to label some sample data.
• GroundTruth allows users to create labeling tasks and assign them to
internal team members or outsource them.
SageMaker GroundTruth: Text Tasks
SageMaker GroundTruth: Image Tasks
SageMaker GroundTruth: Demo
Assume we will label these 5 images from our final project.
There are two classes: Software and Video game