
Module 4

Hypothesis Testing, Ensemble Methods, Bagging, AdaBoost, Gradient Boosting,
Clustering, K-means, K-medoids, Density-based, Hierarchical, Spectral.
Hypothesis testing
Hypothesis
Hypothesis: A hypothesis is a supposition or proposed explanation made on the basis
of limited evidence or assumptions.

It is a guess based on some known facts that has not yet been proven.
A good hypothesis is testable, so that it can be shown to be either true or false.
Hypotheses are statements about the given problem.

Hypothesis (h) in ML: It is the approximate function that best describes the target in
a supervised machine learning algorithm. It is based primarily on the data, as well as
the bias and restrictions applied to the data.

Example: Claiming that the average age of students in a class is 30, or that boys are
taller than girls, are statements we assume to be true but need a statistical method
to verify. Hypothesis testing gives us a mathematical way to conclude whether such
assumptions hold.

Hypothesis Testing: Hypothesis testing is a statistical method used to make a
decision about a population parameter on the basis of experimental (sample) data.
It starts from an assumption that we make about the population parameter.

It evaluates two mutually exclusive statements about a population to determine
which statement is best supported by the sample data.

Parameters of hypothesis testing


Null Hypothesis (H0): In statistics, the null hypothesis is a general statement or
default position that there is no relationship between two measured phenomena or
no difference among groups.

A null hypothesis is a type of statistical hypothesis which states that no statistically
significant effect exists in the given set of observations.
In other words, it is a basic assumption made on the basis of knowledge of the
problem.
It is also known as a conjecture and is used in quantitative analysis to test theories
about markets, investment, and finance in order to decide whether an idea is true or
false.

Example: A company's production is equal to 50 units per day (H0: μ = 50).


Alternative Hypothesis: An alternative hypothesis is a direct contradiction of the null
hypothesis, which means that if one of the two hypotheses is true, the other must be
false.

In other words, an alternative hypothesis is a type of statistical hypothesis which
states that some significant effect exists in the given set of observations.

Example: A company's production is not equal to 50 units per day (H1: μ ≠ 50).

Level of significance: It refers to the degree of significance at which we accept or
reject the null hypothesis. Since 100% certainty is not possible when accepting or
rejecting a hypothesis, we select a level of significance, denoted by α, which is
usually 0.05 (5%). This means the result should be reproducible with 95% confidence
in each sample drawn from the population.

P-value: The p-value measures the evidence against the null hypothesis. It is the
probability, computed under the assumption that the null hypothesis is true, of
obtaining data at least as extreme as the data actually observed.

The smaller the p-value, the stronger the evidence against the null hypothesis;
if the p-value is smaller than the chosen significance level α, the null hypothesis
is rejected. It is always expressed as a decimal, such as 0.035.
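
As an illustration of this decision rule, here is a minimal sketch using scipy's
one-sample t-test; the daily production figures are made-up values assumed purely
for the example, testing H0: μ = 50 against H1: μ ≠ 50 at α = 0.05.

    # Hypothetical daily production figures for a sample of days (assumed data).
    import numpy as np
    from scipy import stats

    production = np.array([48, 52, 47, 49, 53, 46, 50, 45, 48, 47])

    # One-sample t-test of H0: mean production = 50 units/day.
    t_stat, p_value = stats.ttest_1samp(production, popmean=50)

    alpha = 0.05
    if p_value < alpha:
        print(f"p = {p_value:.3f} < {alpha}: reject H0 (production differs from 50)")
    else:
        print(f"p = {p_value:.3f} >= {alpha}: fail to reject H0")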

Ensemble Methods
Ensemble method: An ensemble method is a machine learning technique that uses
the combined output of two or more models (weak learners) to solve a particular
computational intelligence problem.

Ensemble Model: An ensemble model is a machine learning model that combines the
predictions from two or more models.

Example: A Random Forest algorithm is an ensemble of many decision trees combined.

Simple Ensemble Techniques


Max Voting: In this technique, multiple models are used to make predictions for
each data point. The prediction from each model is considered a 'vote'. The
prediction made by the majority of the models is used as the final prediction.
The max voting method is generally used for classification problems.
Example: The result of max voting would be something like this:

          Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Rating         5             4             5             4             4              4
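
A minimal sketch of max (hard) voting for a single data point, taking the most
frequent prediction; the model outputs are the assumed ratings from the table above.
(scikit-learn's VotingClassifier with voting='hard' does the same across a dataset.)

    from collections import Counter

    # Class predictions (ratings) from five models for one data point - assumed values.
    predictions = [5, 4, 5, 4, 4]

    # Max voting: the most frequent prediction becomes the final prediction.
    final_prediction = Counter(predictions).most_common(1)[0][0]
    print(final_prediction)  # 4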

Averaging: Similar to the max voting technique, multiple predictions are made for
each data point in averaging. In this method, we take an average of predictions from
all the models and use it to make the final prediction. Averaging can be used for
making predictions in regression problems or while calculating probabilities for
classification problems.

Example: In the case below, the averaging method takes the average of all the
values, i.e. (5 + 4 + 5 + 4 + 4) / 5 = 4.4.

          Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Rating         5             4             5             4             4             4.4

The Weighted Average: In the weighted average ensemble method, data scientists
assign different weights to all the models in order to make a prediction, where the
assigned weight defines the relevance of each model.

Example: The result is calculated as [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) +
(4*0.18)] = 4.41.

          Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Weight        0.23          0.23          0.18          0.18          0.18
Rating         5             4             5             4             4             4.41
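
Both combinations can be written directly with numpy; the ratings and weights below
are the assumed values from the two tables above.

    import numpy as np

    ratings = np.array([5, 4, 5, 4, 4])                  # predictions from the five models
    weights = np.array([0.23, 0.23, 0.18, 0.18, 0.18])   # model weights (sum to 1)

    simple_average = ratings.mean()              # (5 + 4 + 5 + 4 + 4) / 5 = 4.4
    weighted_average = np.dot(weights, ratings)  # 4.41

    print(simple_average, weighted_average)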

Advanced Ensemble Methods


Bagging: Bagging is a method of ensemble modeling, which is primarily used to solve
supervised machine learning problems. It is generally completed in two steps as
follows:
Bootstrapping: It is a random sampling method that draws samples from the data
with replacement. Random data samples (bootstrap samples) are fed to the base
models, and a base learning algorithm is run on each sample to complete the
learning process.

Aggregation: This is a step that involves the process of combining the output
of all base models and, based on their output, predicting an aggregate result
with greater accuracy and reduced variance.

Example: In the Random Forest method, predictions from multiple decision trees are
ensembled in parallel. In regression problems, we use the average of these
predictions as the final output, whereas in classification problems, the majority-voted
class is selected as the predicted class.
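
A minimal bagging sketch with scikit-learn; the synthetic dataset and the number of
estimators are assumptions made only to show the workflow, and the default base
learner of BaggingClassifier is a decision tree.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split

    # Toy dataset (assumed) just to demonstrate the two bagging steps.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Bootstrapping: each base tree is trained on a random sample drawn with replacement.
    # Aggregation: the trees' predictions are combined by majority vote.
    bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
    bagging.fit(X_train, y_train)
    print("bagging accuracy:", bagging.score(X_test, y_test))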

Boosting : Boosting is an ensemble method that enables each member to learn from
the preceding member's mistakes and make better predictions for the future. Unlike
the bagging method, in boosting, all base learners (weak) are arranged in a
sequential format so that they can learn from the mistakes of their preceding
learner. Hence, in this way, all weak learners get turned into strong learners and
make a better predictive model with significantly improved performance.

Boosting is an ensemble modeling technique that attempts to build a strong classifier
from a number of weak classifiers. It is done by building models in series using weak
learners. First, a model is built from the training data. Then a second model is built
which tries to correct the errors of the first model. This procedure continues, and
models are added until either the complete training data set is predicted correctly or
the maximum number of models is reached.

Stacking: Stacking is one of the popular ensemble modeling techniques in machine
learning. Various weak learners are ensembled in a parallel manner in such a way
that, by combining them with a meta-learner, better predictions can be made for the
future. This ensemble technique works by feeding the combined predictions of
multiple weak learners into a meta-learner so that a better output prediction model
can be achieved.

In stacking, an algorithm takes the outputs of the sub-models as input and attempts
to learn how best to combine these input predictions to make a better output
prediction.

Stacking is also known as stacked generalization. The new model is stacked on top of
the others; this is why it is named stacking.
Blending: Blending is an approach similar to stacking, but with a specific
configuration: instead of k-fold cross-validation, it uses a holdout set to prepare the
out-of-sample predictions for the meta-model. In this method, the training dataset is
first split into a training set and a validation set, and the learner models are trained
on the training set. Predictions are then made on the validation set and the test set;
the validation-set predictions are used as features to build a new model (the
meta-model), which is later used to make the final predictions on the test set using
the prediction values as features.
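
A minimal stacking sketch using scikit-learn's StackingClassifier; the choice of base
learners, meta-learner, and toy data is assumed for illustration only.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import StackingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The weak learners are trained in parallel; their out-of-fold predictions
    # become the input features for the logistic-regression meta-learner.
    stack = StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("knn", KNeighborsClassifier())],
        final_estimator=LogisticRegression(),
        cv=5,
    )
    stack.fit(X_train, y_train)
    print("stacking accuracy:", stack.score(X_test, y_test))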

Bagging, AdaBoost, Gradient Boosting


AdaBoost: AdaBoost was the first really successful boosting algorithm developed
for the purpose of binary classification. AdaBoost is short for Adaptive Boosting and
is a very popular boosting technique that combines multiple “weak classifiers” into
a single “strong classifier”.

This is a type of ensemble technique where a number of weak learners are combined
to form a strong learner. Here, each weak learner is usually a decision stump (a tree
with just a single split and two terminal nodes) that is used to classify the
observations.
It was formulated by Yoav Freund and Robert Schapire, who won the 2003 Gödel
Prize for this work.

Algorithm:

Step 1: Initialise the dataset and assign an equal weight to each data point.
Step 2: Provide this as input to the model and identify the wrongly classified data
points.
Step 3: Increase the weights of the wrongly classified data points.
Step 4: If the required results are obtained, go to Step 5; otherwise go to Step 2.
Step 5: End.
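
A minimal AdaBoost sketch with scikit-learn, whose default base learner is a
decision stump; the synthetic data and the number of estimators are assumed values
for illustration.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each new stump focuses on the points that the previous stumps misclassified
    # (their weights are increased, as in Steps 2-3 above).
    ada = AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=0)
    ada.fit(X_train, y_train)
    print("AdaBoost accuracy:", ada.score(X_test, y_test))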
Gradient Boosting: Just like AdaBoost, Gradient Boosting combines a number of weak
learners to form a strong learner. Here, the residual of the current ensemble becomes
the target for the next classifier, on which a new tree is built; hence it is an additive
model. The residuals are captured step by step by the classifiers in order to capture
the maximum variance within the data, and this is controlled by introducing a
learning rate for the classifiers.

In this way, we slowly move in the right direction towards a better prediction (this is
done by computing the gradient of the loss and stepping along the negative gradient
to reduce it, hence the name Gradient Boosting, in line with gradient descent, where
the same logic is used). Thus, with a sufficient number of classifiers, we arrive at a
predicted value very close to the observed value. Gradient Boosting makes a new
prediction by simply adding up the (learning-rate-scaled) predictions of all the trees.

Algorithm:

Step 1: Train a decision tree.
Step 2: Apply the decision tree just trained to predict on the training data.
Step 3: Calculate the residuals of this decision tree and save the residual errors as
the new y.
Step 4: Repeat Steps 1 to 3 until the number of trees we set to train is reached.
Step 5: Make the final prediction by adding up the predictions of all the trees.
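
The sketch below follows these steps literally for a regression problem: each new tree
is fit to the residuals of the ensemble so far, and its prediction is added, scaled by
the learning rate. The toy data and hyper-parameters are assumed for illustration
(scikit-learn's GradientBoostingRegressor packages the same idea).

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    # Toy regression data (assumed): y = x^2 plus noise.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

    learning_rate = 0.1
    n_trees = 100

    prediction = np.full_like(y, y.mean())   # start from the mean prediction
    trees = []
    for _ in range(n_trees):
        residual = y - prediction            # Step 3: residuals become the new target
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residual)                # Step 1: train a tree on the residuals
        prediction += learning_rate * tree.predict(X)  # Step 5: add the scaled prediction
        trees.append(tree)

    print("training MSE:", np.mean((y - prediction) ** 2))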

Comparison between AdaBoost and Gradient Boost

i.   AdaBoost: an additive model where the shortcomings of previous models are
     identified by high-weight data points.
     Gradient Boost: an additive model where the shortcomings of previous models
     are identified by the gradient.

ii.  AdaBoost: the trees are usually grown as decision stumps.
     Gradient Boost: the trees are grown to a greater depth, usually ranging from
     8 to 32 terminal nodes.

iii. AdaBoost: each classifier has a different weight assigned to the final prediction
     based on its performance.
     Gradient Boost: all classifiers are weighted equally, and their predictive
     capacity is restricted with a learning rate to increase accuracy.

iv.  AdaBoost: gives weights to both classifiers and observations, thus capturing the
     maximum variance within the data.
     Gradient Boost: builds trees on the previous classifier's residuals, thus
     capturing the variance in the data.
