ML - Module 4
Hypothesis: A hypothesis is a statement about the given problem. It is just a guess based on some known facts and has not yet been proven. A good hypothesis is testable: testing it results in either true or false.
Hypothesis (h) in ML: It is defined as the approximate function that best describes the target in supervised machine learning algorithms. It is shaped primarily by the data, as well as by the bias and restrictions applied to the data.
Example: You claim that the class average (say, the average age of students) is 30, or that boys are taller than girls. These are statements we assume to be true but need a statistical way to prove; hypothesis testing gives us a mathematical conclusion about whether what we are assuming is actually true.
P-value: The p-value in statistics quantifies the evidence against a null hypothesis. In other words, the p-value is the probability that random chance alone generated the observed data, or something equally or more extreme, under the null hypothesis. The smaller the p-value compared with the chosen significance level, the stronger the evidence against the null hypothesis, and the null hypothesis can be rejected in testing; the larger the p-value, the weaker the evidence. It is always represented in decimal form, such as 0.035.
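For instance, a minimal Python sketch of testing the earlier claim that the class average is 30, assuming a small hypothetical sample and scipy:

from scipy import stats

# Hypothetical sample of observed values (e.g. student ages)
sample = [28, 31, 29, 34, 27, 30, 26, 33, 29, 28]

# Null hypothesis: the population mean is 30
t_stat, p_value = stats.ttest_1samp(sample, popmean=30)

# Reject the null hypothesis only if the p-value is below the chosen
# significance level (e.g. 0.05)
print(f"p-value = {p_value:.3f}")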
Ensemble Methods
Ensemble method: An ensemble method is a machine learning technique that combines the outputs of two or more models (weak learners) to solve a particular computational intelligence problem.
Max voting: In max voting, multiple models make a prediction for each data point, and the prediction given by the majority of models is taken as the final prediction.
Example: ratings 5, 4, 5, 4, 4 give a final prediction of 4 (the most frequent value).
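A minimal Python sketch of max voting on the ratings above:

from collections import Counter

# Predictions (ratings) from five models for the same data point
ratings = [5, 4, 5, 4, 4]

# Max voting: the most frequent prediction becomes the final prediction
final_prediction = Counter(ratings).most_common(1)[0][0]
print(final_prediction)  # 4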
Averaging: Similar to the max voting technique, multiple predictions are made for
each data point in averaging. In this method, we take an average of predictions from
all the models and use it to make the final prediction. Averaging can be used for
making predictions in regression problems or while calculating probabilities for
classification problems.
Example: in the below case, the averaging method takes the average of all the values.
Ratings 5, 4, 5, 4, 4 give an average of (5 + 4 + 5 + 4 + 4) / 5 = 4.4.
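A minimal Python sketch of the same calculation:

# Averaging: the mean of the five model predictions
ratings = [5, 4, 5, 4, 4]
average_prediction = sum(ratings) / len(ratings)
print(average_prediction)  # 4.4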
The Weighted Average: In the weighted average ensemble method, data scientists
assign different weights to all the models in order to make a prediction, where the
assigned weight defines the relevance of each model.
Example: ratings 5, 4, 5, 4, 4, each scaled by its model's weight, give a weighted average of 4.41.
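A minimal Python sketch of the weighted average; the weights below are an assumption chosen to reproduce the 4.41 above, whereas in practice they would reflect each model's relevance:

# Weighted average: each model's prediction scaled by its assigned weight
ratings = [5, 4, 5, 4, 4]
weights = [0.23, 0.23, 0.18, 0.18, 0.18]  # hypothetical weights summing to 1
weighted_prediction = sum(r * w for r, w in zip(ratings, weights))
print(round(weighted_prediction, 2))  # 4.41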
Aggregation: This is a step that involves the process of combining the output
of all base models and, based on their output, predicting an aggregate result
with greater accuracy and reduced variance.
Example: In the Random Forest method, predictions from multiple decision trees are ensembled in parallel. In regression problems, the average of these predictions is used as the final output, whereas in classification problems the majority-voted class is selected as the predicted class.
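A minimal sketch of this parallel ensembling using scikit-learn's RandomForestClassifier (the dataset here is synthetic, purely for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic toy dataset (hypothetical)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 100 decision trees are trained in parallel on bootstrap samples;
# their class votes are aggregated into the final prediction
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))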
Boosting: Boosting is an ensemble method that enables each member to learn from the preceding member's mistakes and make better predictions. Unlike the bagging method, in boosting all base (weak) learners are arranged sequentially so that each can learn from the mistakes of its preceding learner. In this way, the weak learners are combined into a strong learner, giving a predictive model with significantly improved performance.
Algorithm:
Step 1 : Initialise the dataset and assign equal weight to each of the data points.
Step 2 : Provide this as input to the model and identify the wrongly classified data
points.
Step 3 : Increase the weight of the wrongly classified data points.
Step 4 : if (got required results).
Goto step 5
else
Goto step 2
Step 5 : End
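A minimal sketch of these steps in Python, using decision stumps as the weak learners and labels in {-1, +1} (an AdaBoost-style weight update; the toy data are hypothetical):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data (hypothetical); labels must be -1 or +1 for this update rule
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1, -1])

n = len(y)
weights = np.full(n, 1.0 / n)        # Step 1: equal weight for every data point
stumps, alphas = [], []
for _ in range(5):                   # Steps 2-4: repeat until enough learners
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)           # Step 2: fit on weighted data
    pred = stump.predict(X)
    err = weights[pred != y].sum() / weights.sum()   # weighted error rate
    alpha = 0.5 * np.log((1 - err + 1e-10) / (err + 1e-10))
    weights *= np.exp(-alpha * y * pred)             # Step 3: boost weights of mistakes
    weights /= weights.sum()
    stumps.append(stump)
    alphas.append(alpha)

# The weak learners are combined into a strong learner via a weighted vote
final = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
print(final)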
Gradient Boosting: Just like AdaBoost, Gradient Boosting also combines a number of weak learners to form a strong learner. Here, the residual of the current classifier becomes the input for the next consecutive classifier, on which the next tree is built; hence it is an additive model. The residuals are captured in a step-by-step manner by the successive classifiers in order to capture the maximum variance within the data, and a learning rate is introduced to scale each classifier's contribution.
By this method, we slowly inch in the right direction towards a better prediction (this is done by computing the gradient of the loss and moving in the opposite, negative-gradient direction to reduce the loss; hence it is called Gradient Boosting, in line with Gradient Descent, where similar logic is employed). Thus, over a number of classifiers, we arrive at a predicted value very close to the observed value. Gradient Boosting makes a new prediction by simply adding up the predictions of all trees.
Algorithm:
Step 1: Make an initial prediction, e.g. the average of the target values, and compute the residuals (observed minus predicted).
Step 2: Build a decision tree on the residuals and add its learning-rate-scaled predictions to the current prediction.
Step 3: Calculate the residuals of this decision tree; save the residual errors as the new y.
Step 4: Repeat Steps 2-3 until the number of trees we set to train is reached.
Step 5: Make the final prediction by adding the initial prediction and the scaled outputs of all trees.
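A minimal sketch of these steps for a regression problem (the toy data, learning rate, and tree depth are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (hypothetical)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # Step 1: initial prediction = mean of y
trees = []
for _ in range(100):                     # Step 4: repeat for the chosen number of trees
    residual = y - prediction            # Step 3: residuals become the new target
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residual)                # Step 2: build a tree on the residuals
    prediction += learning_rate * tree.predict(X)   # additive, learning-rate-scaled update
    trees.append(tree)

# Step 5: a new prediction is the initial mean plus the sum of all scaled tree outputs
def predict(X_new):
    return y.mean() + learning_rate * sum(t.predict(X_new) for t in trees)

print(predict(X[:3]))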
AdaBoost vs Gradient Boosting:
ii. In AdaBoost, the trees are usually grown as decision stumps; in Gradient Boosting, the trees are grown to a greater depth, usually ranging from 8 to 32 terminal nodes.
iii. In AdaBoost, each classifier has a different weight assigned to the final prediction based on its performance; in Gradient Boosting, all classifiers are weighed equally and their predictive capacity is restricted with a learning rate to increase accuracy.
iv. AdaBoost gives weights to both classifiers and observations, thus capturing maximum variance within the data; Gradient Boosting builds trees on the previous classifier's residuals, thus capturing variance in the data.
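To make the comparison concrete, both variants are available in scikit-learn; a minimal sketch on a synthetic dataset (the hyper-parameters are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# AdaBoost: decision stumps by default, each weighted by its performance
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
# Gradient Boosting: deeper trees, each scaled equally by the learning rate
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)

print("AdaBoost CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
print("Gradient Boosting CV accuracy:", cross_val_score(gbm, X, y, cv=5).mean())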