
Machine Learning

BEGINNER → EXPERT

Copyright © 2022 TalentLabs Limited
© zepanalytics.com
Agenda
1. Introduction
2. Use Cases
3. Types of Learning
4. Essential Libraries
5. Feature Scaling
6. Regression Algorithms
7. Classification Algorithms
8. Clustering Algorithms
9. Association Rule Learning
10. Ensemble Techniques
11. Time Series Analysis
12. Dimensionality Reduction - Feature Engineering
13. Hyperparameter Optimization

Introduction

Introduction

● ML is a subset of AI.
● The name derives from the idea that it deals with the “construction & study of
systems that can learn from data”.

Biggest Confusion: AI vs ML vs DS vs DL

What is Machine Learning?

● Machine learning is a type of artificial intelligence (AI) that provides computers
with the ability to learn without being explicitly programmed.

● Machine learning focuses on the development of computer programs that can
teach themselves to grow and change when exposed to new data.

Why do we need Machine Learning?

• The volume of data collected is growing day by day.

• Data production in 2021 was 5,000% greater than in 2010.

• Every day, 2.5 quintillion bytes of data are created.

• Data is nearly doubling in size every two years.

• Knowledge discovery is needed to make sense of, and use, this data.

• Machine Learning is a technique in which computers learn from data to obtain insights and help in
knowledge discovery.

Types of Learning

Use Cases

Applications of ML

Health Care Use Cases

• Better Imaging & Diagnostic Techniques
• Disease Detection using Machine Learning
• Providing Personalized Treatment
• Preventing Medical Insurance Frauds
• Drug Discovery & Research
• Disease Detection using Deep Learning

Types of Learning

Types of Learning

1. Supervised
2. Unsupervised
3. Reinforcement

Learning Path

Getting your dream job

1. Internet presence – create profiles on Naukri, Monster, LinkedIn, Hirist, Glassdoor, Indeed, Stack Overflow,
etc.
2. Push reusable code/apps to GitHub and link it to your profile.
3. Write blogs and link them on your profiles.
4. Differentiator – focus on both the depth and the breadth of Data Science.
5. Profiles should contain all the relevant tech-stack keywords – e.g. Deep Learning, Machine Learning, NLP, Spark,
Flask, Kafka, NoSQL, Python, etc.
6. Headlines are important – Data Scientist, Machine Learning Engineer, etc.
7. Prefix/suffix with Lead/Architect/Head based on your years of experience and current role.
8. Projects are a must – show a good amount of relevant project experience solving industry-level use cases.
9. Participate in online hackathons and code challenges.

Essential Libraries

Essential Libraries

1. numpy: The matrix / numerical analysis layer at the bottom
2. scipy: Scientific computing utilities (linalg, FFT, signal/image processing...)
3. sklearn: Machine learning (our focus here)
4. matplotlib: Plotting and visualization
5. opencv: Computer vision
6. pandas: Data analysis
7. caffe, theano, minerva, tensorflow, keras: Deep neural networks
8. spyder: The front end (Scientific Python Development Environment)
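A minimal sketch of how the first few of these libraries are typically imported and used together (the aliases np, pd and plt are conventions, and the toy data is purely illustrative):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({"x": np.arange(10.0), "y": 2 * np.arange(10.0) + 1})
model = LinearRegression().fit(df[["x"]], df["y"])   # sklearn estimator API: fit(X, y)
print(model.coef_, model.intercept_)                 # learned slope and intercept
plt.scatter(df["x"], df["y"])                        # matplotlib for quick visual checks
plt.show()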

Feature Scaling

Train Test Split

The train_test_split() method is used to split our data into train and test sets.

First, we need to divide our data into features (X) and labels (y). The dataframe gets divided into X_train, X_test, y_train and y_test.
The X_train and y_train sets are used for training and fitting the model. The X_test and y_test sets are used to test whether the model
predicts the right outputs/labels. We can explicitly set the sizes of the train and test sets. It is suggested to keep the train set
larger than the test set.


Train set: The training dataset is the set of data used to fit the model, i.e. the data the model sees and learns from.

Test set: The test dataset is a held-out subset of the data, not used during training, that is used to give an unbiased evaluation of the
final model fit.

Validation set: A validation dataset is a sample of data held back from training that is used to estimate model performance
while tuning the model's hyperparameters.
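A minimal sketch of this split with scikit-learn (the 80/20 ratio, the toy arrays and random_state are illustrative choices, not requirements):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features (toy data)
y = np.array([0, 1] * 5)           # toy labels

# keep the train set larger than the test set; fix random_state so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)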

Feature Scaling

Sepal Length, Sepal Width → what are the magnitudes & units?

Many ML algorithms use Euclidean distance. Without feature scaling they neglect the units and focus only on the magnitudes, so the
Euclidean distance can vary significantly and be dominated by large-valued features → the output will be impacted. Common scaling
techniques (a sketch follows below):

1. Standardisation
2. Mean Normalization
3. Min-Max Scaling
4. Vector Scaling
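A minimal sketch of standardisation and min-max scaling with scikit-learn (the toy array is illustrative). In practice the scaler is fit on the training split only and then applied to the test split to avoid leakage:

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]])   # e.g. sepal length, sepal width

X_std = StandardScaler().fit_transform(X)   # zero mean, unit variance per column
X_mm = MinMaxScaler().fit_transform(X)      # rescale each column to [0, 1]
print(X_std.round(2))
print(X_mm.round(2))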

Regression Algorithms

Regression

Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent (target)
variable and independent variable(s) (predictors). This technique is used for forecasting, time series modelling and finding the
causal effect relationship between variables.

Let's go through various regression algorithms:

1. Linear Regression
2. Lasso Regression
3. Polynomial Regression
4. Stepwise Regression
5. Ridge Regression

Linear Regression

Simple linear regression is a statistical method that enables users to summarise and study the relationship
between two continuous (quantitative) variables. Linear regression is a linear model, i.e. a model that
assumes a linear relationship between the input variables (x) and the single output variable (y). Here, y can be
calculated from a linear combination of the input variables (x). When there is a single input variable (x), the
method is called simple linear regression. When there are multiple input variables, the procedure is referred to
as multiple linear regression.
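A minimal sketch of fitting a simple linear regression with scikit-learn (the synthetic data is illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))             # single input variable
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, 50)   # y ≈ b0 + b1*x plus noise

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)             # estimates of b0 and b1
print(model.predict([[5.0]]))                    # prediction for x = 5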

Multiple Linear Regression
The difference between simple linear regression and multiple linear regression is that multiple linear regression has more than one
independent variable, whereas simple linear regression has only one independent variable.

Simple Linear Regression → y = b0 + b1*x1

Multiple Linear Regression → y = b0 + b1*x1 + b2*x2 + .... + bn*xn

Where, y → dependent variable; x1, x2, ...., xn → independent variables

Multiple Linear Regression

5 methods of building models:

1. All-in
2. Backward Elimination → Stepwise
3. Forward Selection → Stepwise
4. Bidirectional Elimination → Stepwise
5. Score Comparison

Multiple Linear Regression

Backward Elimination → Stepwise

1. Select a significance level to stay in the model (say SL = 0.05)
2. Fit the full model with all possible predictors
3. Consider the predictor with the highest p-value; if p-value > SL, go to step 4, else FIN
4. Remove that predictor
5. Fit the model without this variable, then go back to step 3

Forward Selection → Stepwise

1. Select a significance level to enter the model (say SL = 0.05)
2. Fit all simple regression models and select the one with the lowest p-value
3. Keep that variable & fit all possible models with one extra predictor added to the one(s) you already have
4. Consider the predictor with the lowest p-value. If p < SL, go back to step 3, else FIN (keep the previous model)
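A rough sketch of the backward-elimination loop using statsmodels p-values (the 0.05 threshold and the toy data are assumptions for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                             # three candidate predictors
y = 2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)    # the third predictor is irrelevant

cols = list(range(X.shape[1]))
while True:
    model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    pvals = model.pvalues[1:]            # skip the intercept's p-value
    if pvals.max() <= 0.05:              # all remaining predictors are significant → FIN
        break
    cols.pop(int(np.argmax(pvals)))      # remove the predictor with the highest p-value
print("kept predictors:", cols)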

Polynomial Regression

The equation of Polynomial Regression is:
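In the same notation as the previous slides, a degree-n polynomial regression on a single input variable x1 fits:

y = b0 + b1*x1 + b2*x1^2 + .... + bn*x1^n

A minimal sketch of this in scikit-learn (degree 2 and the toy data are illustrative): polynomial features are generated first and a linear model is then fit on them.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(0, 5, 30).reshape(-1, 1)
y = 1 + 2 * X[:, 0] - 0.5 * X[:, 0] ** 2          # quadratic toy data

poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print(poly_model.predict([[2.0]]))                # ≈ 1 + 2*2 - 0.5*4 = 3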

LASSO and RIDGE Regression

• Regularisation helps avoid overfitting the data at the cost of some added error.

• LASSO (Least Absolute Shrinkage and Selection Operator) and Ridge regression are regularisation
techniques used for improving the robustness of the model.

• LASSO, or the L1 penalty, can also be used for dimensionality reduction.
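A minimal sketch comparing the Ridge (L2) and Lasso (L1) penalties in scikit-learn (the alpha values and toy data are illustrative; Lasso driving some coefficients to exactly zero is what makes it usable for dimensionality reduction):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=100)   # only 2 of the 5 features matter

ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty: shrinks coefficients towards zero
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty: can set coefficients to exactly zero
print(ridge.coef_.round(2))
print(lasso.coef_.round(2))           # irrelevant features tend to get 0 weights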

Regression Error Metrics

MAE, MSE, RMSE, MAPE etc.

• MSE (Mean Squared Error)
• RMSE (Root Mean Squared Error)
• MAE (Mean Absolute Error)
• MAPE (Mean Absolute Percentage Error)
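For reference, with n observations, actual values y_i and predictions ŷ_i, these metrics are defined as:

MAE = (1/n) * Σ |y_i − ŷ_i|
MSE = (1/n) * Σ (y_i − ŷ_i)²
RMSE = √MSE
MAPE = (100/n) * Σ |(y_i − ŷ_i) / y_i|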

Mean Absolute Error (MAE)

MAE: We just look at the absolute difference between the data and the model's predictions.

Lower MAE → Better model

Mean Squared Error (MSE)

MSE is going to be a much larger number because of the squaring, hence we can't compare it directly with MAE. This
ultimately means that outliers in our data will contribute a much higher total error in the MSE than they would in the
MAE. Similarly, our model is penalized more for making predictions that differ greatly from the corresponding actual
value. That is to say, large differences between actual and predicted values are punished more in MSE than in MAE.

Root Mean Squared Error (RMSE)

RMSE = Square root(MSE)

As the name suggests, it is the square root of the MSE. Because the MSE is squared, its units do not match
those of the original output. Researchers will often use RMSE to convert the error metric back into similar units,
making interpretation easier. Since the MSE and RMSE both square the residual, they are similarly affected by
outliers.
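A minimal sketch computing these error metrics with scikit-learn (the toy arrays are illustrative; mean_absolute_percentage_error requires a reasonably recent scikit-learn):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                                    # RMSE = square root of MSE
mape = mean_absolute_percentage_error(y_true, y_pred)
print(mae, mse, rmse, mape)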

Classification Algorithms

Classification

Predicting the category of new observations.

Let's go through various classification algorithms:

1. k-NN (k-nearest neighbours)
2. Decision Trees
3. Random Forest
4. Naive Bayes
5. Support Vector Machines
6. Logistic Regression

KNN (K-Nearest Neighbors)

Say we have a new point at (40, 60).

As we have 2 classes here, let's assume k = 3 (k is usually chosen as a multiple of the number of classes plus 1,
so that the vote cannot tie).

As we see 2 reds and 1 blue near the (40, 60) mark, we can assign the new data point to the red class.
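A minimal sketch of this idea with scikit-learn's KNeighborsClassifier (the training points and the query point (40, 60) are illustrative):

from sklearn.neighbors import KNeighborsClassifier

X_train = [[30, 65], [35, 55], [45, 58], [70, 20], [80, 30], [75, 25]]
y_train = ["red", "red", "red", "blue", "blue", "blue"]

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3
knn.fit(X_train, y_train)
print(knn.predict([[40, 60]]))              # majority class among the 3 nearest points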

Decision Tree

It works on the principle of identifying the best root node and building a tree that is traversed top-down
to identify the right classes.

Decision Tree
Age | Competition | Type     | Profit
Old | Yes         | Software | Down
Old | No          | Software | Down
Old | No          | Hardware | Down
Mid | Yes         | Software | Down
Mid | Yes         | Hardware | Down
Mid | No          | Hardware | Up
Mid | No          | Software | Up
New | Yes         | Software | Up
New | No          | Hardware | Up
New | No          | Software | Up
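A rough sketch of fitting a decision tree to this table with scikit-learn (one-hot encoding the categorical columns via pandas.get_dummies is one possible way to prepare the data):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "Age":         ["Old", "Old", "Old", "Mid", "Mid", "Mid", "Mid", "New", "New", "New"],
    "Competition": ["Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "No"],
    "Type":        ["Software", "Software", "Hardware", "Software", "Hardware",
                    "Hardware", "Software", "Software", "Hardware", "Software"],
    "Profit":      ["Down", "Down", "Down", "Down", "Down", "Up", "Up", "Up", "Up", "Up"],
})

X = pd.get_dummies(data[["Age", "Competition", "Type"]])   # one-hot encode the categoricals
y = data["Profit"]
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(tree.score(X, y))   # training accuracy on this tiny example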
Random Forest

Random Forest uses multiple learning algorithms to obtain a better predictive performance. It uses decision
trees as base learners.

Naive Bayes
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. It is not a single
algorithm but a family of algorithms where all of them share a common principle, i.e. every pair of features being
classified is independent of each other.

Support Vector Machines

An SVM model is basically a representation of the different classes separated by a hyperplane in a
multidimensional space. The hyperplane is generated iteratively by SVM so that the error can be minimized.
The goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH).

Logistic Regression

Logistic Regression is a supervised Machine Learning algorithm and, despite the word 'Regression', it is used
for binary classification. By binary classification, we mean that it can only categorize data as 1 (yes/success)
or 0 (no/failure). In other words, we can say that the Logistic Regression model predicts P(Y=1) as a
function of X.
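A minimal sketch with scikit-learn (the toy data is illustrative; predict_proba returns the model's estimate of P(Y=1)):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])   # e.g. hours studied
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                   # fail (0) / pass (1)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[4.5]]))          # predicted class, 0 or 1
print(clf.predict_proba([[4.5]]))    # [P(Y=0), P(Y=1)]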

Classification Metrics - Log Loss

Cost = −(y_act) ln(y_pred) − (1 − y_act) ln(1 − y_pred)

Cost = −ln(y_pred), where y_act = 1

Cost = −ln(1 − y_pred), where y_act = 0
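A minimal check of this formula against scikit-learn's log_loss (the probabilities are illustrative):

import numpy as np
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4]   # predicted P(Y=1) for each sample

manual = np.mean([-(yt * np.log(yp) + (1 - yt) * np.log(1 - yp))
                  for yt, yp in zip(y_true, y_prob)])
print(manual, log_loss(y_true, y_prob))   # the two values should match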

Confusion Matrix

Useful for measuring recall, precision, specificity, accuracy & the AUC-ROC curve.

False Positive: Type I Error
False Negative: Type II Error

Confusion Matrix

Output for threshold = 0.6:

y = 0, y_pred = 0.5 → output 0 (TN)
y = 1, y_pred = 0.9 → output 1 (TP)
y = 0, y_pred = 0.7 → output 1 (FP)
y = 1, y_pred = 0.7 → output 1 (TP)
y = 1, y_pred = 0.3 → output 0 (FN)
y = 0, y_pred = 0.4 → output 0 (TN)
y = 1, y_pred = 0.5 → output 0 (FN)

Totals: TP = 2, TN = 2, FP = 1, FN = 2 → Recall = 1/2, Precision = 2/3, Accuracy = 4/7
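A quick sketch reproducing these numbers with scikit-learn (the 0.6 threshold is applied manually, as in the table):

from sklearn.metrics import confusion_matrix, recall_score, precision_score, accuracy_score

y_true = [0, 1, 0, 1, 1, 0, 1]
y_prob = [0.5, 0.9, 0.7, 0.7, 0.3, 0.4, 0.5]
y_hat = [1 if p >= 0.6 else 0 for p in y_prob]   # apply the 0.6 threshold

print(confusion_matrix(y_true, y_hat))           # [[TN, FP], [FN, TP]] = [[2, 1], [2, 2]]
print(recall_score(y_true, y_hat),               # 0.5
      precision_score(y_true, y_hat),            # 0.667
      accuracy_score(y_true, y_hat))             # 0.571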

Classification Metrics - Area under ROC

Issues in Classification

There are many issues that occur while you solve a classification problem, such as:

1. Overfitting
2. Class Imbalance Problems

Clustering Algorithms

Clustering

Clustering methods are a Machine Learning technique that involves the grouping of data points. These methods are used
to find similarity as well as relationship patterns among data samples, and then cluster those samples into groups that
are similar based on features.

Types of Clustering Algorithms:

1. K-Means Clustering
2. Hierarchical Clustering
3. Mean-Shift Algorithm

K-means Clustering

Pseudo Code:

1. Choose the number of clusters, K
2. Select K random points as the centroids (not necessarily from the dataset)
3. Assign each data point to the nearest centroid → that forms K clusters
4. Compute & place the new centroid for each cluster
5. Reassign each data point to the closest centroid
6. If any reassignment took place, go to Step 4, else FINISH.
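A minimal sketch with scikit-learn's KMeans, which runs exactly this loop internally (the blob data and K = 3 are illustrative):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)   # toy 2-D data

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)   # final centroids
print(kmeans.labels_[:10])       # cluster assignment of the first 10 points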

Hierarchical Clustering

[Example hierarchy: Animal → Fish, Mammals (Dogs: Poodle, Pug, ...; Cats: ...) and Birds (Flying: ...; Flightless: Penguin, Emu, Kiwi)]

Hierarchical Clustering

Pseudo Code:

1. Make each data point a single-point cluster → N clusters
2. Take the two closest data points & make them 1 cluster → N − 1 clusters
3. Take the two closest clusters & make them 1 → N − 2 clusters
4. Repeat Step 3 until there is only one cluster.

Pros: HC shows all the possible linkages between clusters, & we understand the data better.
Cons: Can't handle big data.

Mean Shift Clustering

The Mean-Shift algorithm assigns data points to clusters iteratively by shifting points towards the region of
highest density of data points, i.e. the cluster centroid.

The difference between the K-Means algorithm and Mean-Shift is that the latter does not need the number of
clusters to be specified in advance, because the number of clusters is determined by the algorithm from the data.

Association Rule Learning

Association Rule Learning

From Wikipedia:

Association rule learning is a rule-based machine learning method for discovering interesting relations
between variables in large databases. It is intended to identify strong rules discovered in
databases using some measures of interestingness.

Association Rule Learning - Apriori

Support: This explains how important an itemset is, as measured by the proportion of transactions in which the itemset appears.
Support(Apple) = 4/8

Confidence: This says how likely item Y is purchased when item X is purchased, expressed as [X → Y].
Confidence(Apple → Beer) = Support(Apple, Beer) / Support(Apple) = (3/8) / (4/8) = (3/8) * 2 = 3/4

Lift: This says how likely item Y is purchased when item X is purchased, while controlling for how popular item Y is.
Lift(Apple → Beer) = Support(Apple, Beer) / (Support(Apple) * Support(Beer)) = (3/8) / (4/8 * 6/8) = (3/8) * (64/24) = 1
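A small sketch computing these measures directly from a list of transactions (the 8 toy baskets are chosen so that the Apple/Beer numbers above come out):

transactions = [
    {"apple", "beer", "rice"}, {"apple", "beer"}, {"apple", "beer", "rice"}, {"apple"},
    {"rice"}, {"beer"}, {"beer", "milk"}, {"beer", "rice", "milk"},
]

def support(items):
    # fraction of transactions that contain every element of `items`
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(x, y):
    return support(x | y) / support(x)

def lift(x, y):
    return support(x | y) / (support(x) * support(y))

apple, beer = {"apple"}, {"beer"}
print(support(apple))            # 0.5   (4/8)
print(confidence(apple, beer))   # 0.75  (3/4)
print(lift(apple, beer))         # 1.0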

Association Rule Learning - Eclat

ECLAT: Equivalence Class Clustering and bottom-up Lattice Traversal

How does the algorithm work?

• The basic idea is to use Transaction Id Set (tidset) intersections to compute the support value of a
candidate, avoiding the generation of subsets which do not exist in the prefix tree.
• It uses a depth-first search and hence does not use a lot of memory.
• It is computationally faster than Apriori.
• In Apriori we need to supply both Support and Confidence thresholds, but in Eclat we only need to give Support.

Ensemble Techniques

Ensemble Techniques

Ensemble Learning is the mechanism of using multiple algorithms together to obtain a better prediction than the
individual models.

Say we extract the features from a use case and check each model's accuracy or behaviour individually:

Features → Decision Trees → 69%
Features → SVM → 75%
Features → Logistic Regression → 65%

So why use Ensemble Learning?

1. Better Accuracy (Low Error)
2. Higher Consistency (Avoids Overfitting)
3. Reduces Bias & Variance Errors

Ensemble Techniques - Cont.

When and where do we use Ensemble Learning?

1. A single model overfits
2. The results are worth the extra training cost
3. It can be used for both classification & regression

Popular Ensemble Methods

1. Bootstrap Aggregation (Bagging)

Bagging Algorithms:
a. Bagged Decision Trees
b. Random Forest
c. Extra Trees

Ensemble Techniques - Cont.

2. Boosting

Boosting is used to create a collection of predictors. In this technique, learners are trained sequentially, with early
learners fitting simple models to the data, and the data is then analysed for errors. Consecutive trees (on random
samples) are fit, and at every step the goal is to improve the accuracy over the prior tree. When an input is
misclassified by a hypothesis, its weight is increased so that the next hypothesis is more likely to classify it
correctly. This process converts weak learners into a better-performing model.

Ensemble Techniques - Cont.

Bagging:

1. Bagged Decision Trees
   All features are used for splitting a node.
2. Random Forest
   A subset of features is chosen at random out of the total, and the best split feature from that subset is used to
   split each node in a tree.
3. Extra Trees

Boosting (see the sketch below):

1. AdaBoost
2. Stochastic Gradient Boosting
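A minimal sketch comparing a bagging ensemble and a boosting ensemble in scikit-learn (the dataset and hyperparameters are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

bagging = RandomForestClassifier(n_estimators=100, random_state=42)   # bagging of decision trees
boosting = AdaBoostClassifier(n_estimators=100, random_state=42)      # sequential weak learners

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())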

Time Series Analysis

Time Series Analysis

• Time series forecasting is an important area of machine learning that is often neglected.
• It is important because there are so many prediction problems that involve a time component, and these
problems are often neglected because it is this time component that makes time series problems more
difficult to handle.
• Before getting started with Time Series Analysis, let's get our basics clear on Anomaly Detection.

What is Time Series?

It is a series of observations taken at specified times, basically at equal intervals. It is used to predict future
values based on past observed values.

Components of Time Series

Trend: An increasing or decreasing value in the series.

Seasonality: A general systematic, linear or (most often) non-linear component that changes over time and does repeat.

Irregularity: The data in the time series follows a temporal sequence, but the measurements might not happen at a regular time interval.

Cyclic: A pattern exists when data exhibit rises & falls that are not of fixed period.

Testing TS Stationarity

Checks for Stationarity


There are many methods to check whether a time series is stationary or non-stationary.

1. Look at Plots: You can review a time series plot of your data and visually check if there are any
obvious trends or seasonality.
2. Summary Statistics: You can review the summary statistics for your data for seasons or random
partitions and check for obvious or significant differences.
3. Statistical Tests: You can use statistical tests to check if the expectations of stationarity are met
or have been violated.

You can split your time series into two (or more) partitions and compare the mean and variance of
each group. If they differ and the difference is statistically significant, the time series is likely non-
stationary.

Testing TS Stationarity

Checks for Stationarity

Statistical tests make strong assumptions about your data. They can only be used to inform the degree
to which a null hypothesis can be rejected or fail to be rejected. The result must be interpreted for a given
problem to be meaningful.

Nevertheless, they can provide a quick check and confirmatory evidence that your time series is
stationary or non-stationary.

ADF Test is otherwise known as unit root test.

Testing TS Stationarity

Checks for Stationarity


The null hypothesis of the test is that the time series can be represented by a unit root, that it is not stationary (has some
time-dependent structure). The alternate hypothesis (rejecting the null hypothesis) is that the time series is stationary.

● Null Hypothesis (H0): If failed to be rejected, it suggests the time series has a unit root, meaning it is non-stationary. It
has some time dependent structure.
● Alternate Hypothesis (H1): The null hypothesis is rejected; it suggests the time series does not have a unit root,
meaning it is stationary. It does not have time-dependent structure.

We interpret this result using the p-value from the test. A p-value below a threshold (such as 5% or 1%) suggests we reject the
null hypothesis (stationary), otherwise a p-value above the threshold suggests we fail to reject the null hypothesis (non-
stationary).

● p-value > 0.05: Fail to reject the null hypothesis (H0), the data has a unit root and is non-stationary.
● p-value <= 0.05: Reject the null hypothesis (H0), the data does not have a unit root and is stationary.
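A minimal sketch of the ADF test with statsmodels (the random-walk series is illustrative; adfuller's second return value is the p-value interpreted as above):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))   # a random walk: has a unit root, non-stationary

adf_stat, p_value = adfuller(series)[:2]
print(p_value)                             # > 0.05 → fail to reject H0 → non-stationary
print(adfuller(np.diff(series))[1])        # differencing once usually makes it stationary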

Algorithms

• ARIMA (AR, MA, ARMA, ARIMA)
• Facebook Prophet
• LSTMs
• Holt's Winter Exponential Smoothing
• GARCH
• SARIMA/SARIMAX
• VAR
• VARMA, etc.

Quick Link: https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/

ARIMA

AR → Autoregressive
MA → Moving Average
ARMA → Autoregressive Moving Average (no differencing)
ARIMA → Autoregressive Integrated Moving Average

ARIMA

There are different techniques to find the right parameters for ARIMA(p, d, q):

• ACF/PACF Plots
• Grid Search
• Auto ARIMA

Let's learn about these techniques in the next slides.

Autocorrelation (ACF) & Partial Autocorrelation (PACF)

1. p – The lag value where the PACF chart crosses the upper confidence interval for the first time. If you notice
closely, in this case p = 2.
2. q – The lag value where the ACF chart crosses the upper confidence interval for the first time. If you notice
closely, in this case q = 2.
Grid Search

ACF/PACF plots are traditional methods of obtaining the p & q values, and are sometimes misleading; hence we need to
perform a hyperparameter optimization step in Time Series Analysis to get the optimum p, d & q values.
Auto ARIMA

Grid Search techniques are manual ways; the same task can be achieved in a few lines of code, and with better
efficiency, using Auto ARIMA.
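A rough sketch of both routes: fitting a hand-chosen ARIMA(p, d, q) with statsmodels, and letting pmdarima's auto_arima search (p, d, q) automatically. The order (2, 1, 2) and the random-walk series are illustrative, and pmdarima is a separate third-party package:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))        # toy non-stationary series

manual = ARIMA(series, order=(2, 1, 2)).fit()   # p=2, d=1, q=2 chosen by hand
print(manual.forecast(steps=5))                 # next 5 predicted values

# pip install pmdarima -- searches over (p, d, q) and picks the best model by AIC
from pmdarima import auto_arima
auto = auto_arima(series, seasonal=False, suppress_warnings=True)
print(auto.order)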

Facebook Prophet

Quick Link: https://facebook.github.io/prophet/docs/quick_start.html#python-api

Features:
• Very fast
• An additive regression model where non-linear trends are fit with yearly, weekly, and
daily seasonality, plus holiday effects
• Robust to missing data & shifts in trend, and handles outliers automatically.
• Easy procedure to tweak & adjust forecasts while adding domain knowledge or business
insights.
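A minimal sketch of the Prophet API from the quick-start link above (the input DataFrame must have 'ds' and 'y' columns; the toy series is illustrative):

import pandas as pd
from prophet import Prophet   # pip install prophet

df = pd.DataFrame({
    "ds": pd.date_range("2022-01-01", periods=100, freq="D"),   # dates
    "y": range(100),                                            # toy values to forecast
})

m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=30)   # extend 30 days past the data
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())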

LSTMs

Quick Link: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Features: an LSTM cell is used in place of standard neural network layers, with:

1. Input gate
2. Forget gate
3. Output gate

Dimensionality Reduction -
Feature Engineering

Dimensionality Reduction

Two Techniques:

1. Feature Selection
   a. Wrapper methods
      i. Recursive feature elimination
      ii. Sequential feature selection
   b. Filtering methods: IG, Chi-Square, Correlation Coefficient
   c. Embedded methods: Decision Trees

2. Feature Extraction
   a. Principal Component Analysis
   b. Linear Discriminant Analysis
   c. Quadratic Discriminant Analysis
   d. Kernel PCA
Dimensionality Reduction

Principal Component Analysis

• Principal Component Analysis (PCA) is basically a dimensionality reduction algorithm, but it can
also be useful as a tool for visualization, noise filtering, feature extraction & engineering, and
much more.

• PCA creates a visualization of the data that minimizes residual variance in the least-squares
sense and maximizes the variance of the projection coordinates.
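A minimal sketch of PCA with scikit-learn on the Iris data, reducing 4 features to 2 components (standardising first is a common but optional step):

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)            # 150 samples, 4 features

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)           # project onto the top-2 principal components
print(X_2d.shape)                            # (150, 2)
print(pca.explained_variance_ratio_)         # share of variance kept by each component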

Dimensionality Reduction

Linear Discriminant Analysis

• LDA is like PCA, but it focuses on maximizing the separability among known categories.

• LDA is a supervised algorithm.

• It maximises the distance between classes while minimising the spread/scatter within each category.

• LDA has substantially lower variance, which can potentially lead to improved prediction performance.

Dimensionality Reduction

Quadratic Discriminant Analysis

• QDA (Quadratic Discriminant Analysis) is used to find a non-linear boundary between classes.

• The more separable the classes are and the more normal the distribution is, the better the
classification result will be for LDA and QDA.

Hyperparameter Optimization

Hyper Parameter Optimization

Machine Learning models are composed of two different types of parameters:

- Hyperparameters: all the parameters which can be arbitrarily set by the user before training starts
(e.g. the number of estimators in a Random Forest).
- Model parameters: parameters that are instead learned during model training (e.g. weights in Neural Networks or
Linear Regression).

The model parameters define how to use the input data to get the desired output and are learned at training
time. Hyperparameters, instead, determine how our model is structured in the first place.

Hyper Parameter Optimization

Tuning Machine Learning models is a type of optimization problem. We have a set of hyperparameters, and we aim
to find the right combination of their values which can help us find either the minimum (e.g. loss) or the
maximum (e.g. accuracy) of a function.
This can be particularly important when comparing how different Machine Learning models perform on a
dataset. In fact, it would be unfair, for example, to compare an SVM model with the best hyperparameters
against a Random Forest model which has not been optimized.

Hyper Parameter Optimization

The aim of hyperparameter optimization in machine learning is to find the hyperparameters of a given machine
learning algorithm that return the best performance as measured on a validation set.

f(x) – an objective score to minimize, such as RMSE or error rate, evaluated on the validation set;
x* – the set of hyperparameters that yields the lowest value of the score;
x – can take on any value in the domain X.

In simple terms, we want to find the model hyperparameters that yield the best score on the validation set
metric.
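Written compactly (a standard formulation, not verbatim from the slide): x* = argmin over x in X of f(x), i.e. the chosen hyperparameter setting x* is the point in the search domain X at which the validation objective f is smallest.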

Hyper Parameter Optimization

The following are four common methods of hyperparameter optimization for machine
learning, in order of increasing efficiency:

- Manual search
- Grid search
- Random search
- Bayesian model-based optimization

Hyper Parameter Optimization

Each time we try different hyperparameters, we have to train a model on the training data, make predictions on
the validation data, and then calculate the validation metric. With a large number of hyperparameters and
complex models such as ensembles or deep neural networks that can take days to train, this process quickly
becomes intractable to do by hand! Grid search and random search are slightly better than manual tuning
because we set up a grid of model hyperparameters and run the train-predict-evaluate cycle automatically in
a loop while we do more productive things (see the sketch below).
Grid and random search are completely uninformed by past evaluations and, as a result, often spend a
significant amount of time evaluating "bad" hyperparameters.
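A minimal sketch of grid search and random search with scikit-learn (the parameter grid and the Random Forest estimator are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, random_state=42)

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
grid.fit(X, y)                       # tries every combination in the grid
print(grid.best_params_, grid.best_score_)

rand = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_grid,
                          n_iter=5, cv=3, random_state=42)
rand.fit(X, y)                       # samples 5 random combinations
print(rand.best_params_, rand.best_score_)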

Hyper Parameter Optimization

For example, if we have the following graph, with a lower score being better, where does it make sense to concentrate
our search?
If you said below 200 estimators, then you already have the idea of Bayesian optimization! We want to focus on the most
promising hyperparameters, and if we have a record of evaluations, then it makes sense to use this information for
our next choice.

Random and grid search pay no attention to past results at all and would keep searching across the entire range of the
number of estimators, even though it's clear the optimal answer (probably) lies in a small region!

Random Forest Hyper Parameter

max_depth → the longest path between the root node and a leaf node

min_samples_split → tells each decision tree in the random forest the minimum required number of observations
in a given node in order to split it. Default value is 2.

max_terminal_nodes → a condition on the splitting of nodes in the tree that restricts the growth of the tree

n_estimators → the number of trees we should consider

max_samples → what fraction of the original dataset is given to any individual tree (bootstrap sample fraction)

max_features → how many features to use in each tree in a RF
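A minimal sketch of setting such hyperparameters on a Random Forest in scikit-learn (the specific values are illustrative; note scikit-learn spells the terminal-node limit max_leaf_nodes):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=42)

rf = RandomForestClassifier(
    n_estimators=200,        # number of trees
    max_depth=8,             # longest root-to-leaf path allowed
    min_samples_split=4,     # minimum observations in a node before it may be split
    max_features="sqrt",     # features considered at each split
    max_samples=0.8,         # bootstrap sample fraction given to each tree
    max_leaf_nodes=64,       # restricts the growth of each tree
    random_state=42,
).fit(X, y)
print(rf.score(X, y))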

Thank you

