MLQB Unit 3
PART A (2 MARKS)
• Answer: Voting in ensemble learning aims to combine the predictions of multiple models to
make a final decision, often using a majority vote.
13. In bagging, how are different subsets of the training data created for each model?
[CO3,K1]
• Answer: Different subsets are created through bootstrap sampling, where instances are randomly
selected with replacement from the original training data.
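For illustration, one bootstrap replicate can be drawn with a few lines of NumPy (a minimal sketch; the array names are only illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10)                                      # toy training set of 10 instances
idx = rng.choice(len(X), size=len(X), replace=True)    # sample indices with replacement
bootstrap_sample = X[idx]                              # some instances repeat, others are left out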
14. Provide an example scenario where boosting might be beneficial in ensemble learning.
[CO3,K2]
• Answer: Boosting is beneficial when dealing with weak learners, improving their performance
sequentially by giving more emphasis to misclassified instances.
15. How does stacking contribute to the diversity of models in an ensemble? [CO3,K1]
• Answer: Stacking leverages diverse base models by training a meta-model to combine their
predictions, capturing different perspectives and improving overall performance.
16. What is the main difference between supervised and unsupervised learning? [CO3,K1]
• Answer: In supervised learning, models are trained on labeled data with known outputs, while
unsupervised learning deals with unlabeled data, focusing on discovering patterns and structures.
17. Explain the concept of centroids in the K-means clustering algorithm. [CO3,K1]
• Answer: Centroids in K-means are the representative points for each cluster, calculated as the
mean of all data points assigned to that cluster.
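As a small worked example (toy values), the centroid is simply the component-wise mean of the cluster's points:

import numpy as np

cluster_points = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
centroid = cluster_points.mean(axis=0)                 # -> array([3., 4.])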
• Answer: The class of a new instance is determined by the majority class among its k-nearest
neighbors based on a distance metric.
19. How does GMM handle uncertainty in cluster assignments compared to K-means?
[CO3,K1]
• Answer: GMM assigns probabilities to data points belonging to each cluster, providing a more
nuanced and probabilistic view of cluster assignments compared to the hard assignments in K-
means.
Answer: EM is crucial for estimating the parameters of GMMs. It iteratively updates means,
covariances, and weights to maximize the likelihood of the data, making GMMs effective for
modeling complex distributions.
PART B [16 MARKS]
• Different Hyper-parameters: We can use the same learning algorithm but use it with different
hyper-parameters.
1. Multiexpert combination.
• Multiexpert combination methods have base-learners that work in parallel.
a) Global approach (learner fusion): given an input, all base-learners generate an output and
all these outputs are used; examples are voting and stacking.
b) Local approach (learner selection): in mixture of experts, there is a gating model, which
looks at the input and chooses one (or very few) of the learners as responsible for generating
the output.
Voting
• The simplest way to combine multiple classifiers is by voting, which corresponds to taking a
linear combination of the learners. Voting is an ensemble machine learning algorithm.
y_i = Σ_{j=1}^{L} w_j d_{ji}, where d_{ji} is the vote of learner j for class C_i and w_j is the weight of that vote,
and then we choose the class with the highest y_i.
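A minimal Python sketch of this weighted vote, assuming d[j][i] holds learner j's vote for class C_i and w holds the (normalized) learner weights; all values are illustrative:

import numpy as np

d = np.array([[0.9, 0.1],          # votes of L = 3 learners for K = 2 classes
              [0.4, 0.6],          # rows = learners, columns = classes
              [0.7, 0.3]])
w = np.array([0.5, 0.3, 0.2])      # non-negative weights that sum to 1

y = w @ d                          # y_i = sum_j w_j * d_ji for each class
predicted_class = int(np.argmax(y))    # choose the class with the highest y_i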
• One problem with ECOC (error-correcting output codes) is that because the code matrix W is set a priori, there is no
guarantee that the subtasks defined by the columns of W will be simple.
Ensemble Learning
• The idea of ensemble learning is to employ multiple learners and combine their predictions.
If we have a committee of M models with uncorrelated errors, then simply by averaging their predictions the
average error can be reduced by a factor of M (a short derivation of this claim is sketched after the list below).
1. Variance reduction: If the training sets are completely independent, it always helps to
average an ensemble, because this reduces variance without affecting bias (e.g., bagging)
and reduces sensitivity to individual data points.
2. Bias reduction: For simple models, an average of models has much greater capacity than a
single model. Averaging can therefore reduce bias substantially by increasing capacity, while
variance is controlled by fitting one component at a time (e.g., boosting).
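The factor-of-M claim above can be sketched as follows (a standard committee argument, assuming zero-mean errors that are uncorrelated across models). Write each model's prediction as y_m(x) = f(x) + ε_m(x); then, in LaTeX notation,

\[
y_{\mathrm{COM}}(x) = \frac{1}{M}\sum_{m=1}^{M} y_m(x),
\qquad
E_{\mathrm{COM}}
= \mathbb{E}\!\left[\Big(\frac{1}{M}\sum_{m=1}^{M}\epsilon_m\Big)^{2}\right]
= \frac{1}{M^{2}}\sum_{m=1}^{M}\mathbb{E}\!\left[\epsilon_m^{2}\right]
= \frac{1}{M}\,E_{\mathrm{AV}},
\]

where E_AV is the average squared error of the individual models. In practice the errors are correlated, so the reduction is smaller, but the committee error never exceeds E_AV.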
Bagging
• Bagging is also called bootstrap aggregating. Bagging and boosting are meta-algorithms
that pool decisions from multiple classifiers. Bagging creates an ensemble by repeatedly
resampling the training data at random.
Pseudocode:
1. Given training data (x1, y1), ..., (xm, ym)
2. For t = 1,..., T:
a. Form bootstrap replicate dataset St by selecting m random examples from the training set
with replacement.
b. Let ht be the result of training base learning algorithm on St.
3. Output combined classifier:
H(x) = majority(h1(x), ..., hT(x)).
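A minimal Python sketch of this pseudocode, using a decision tree as the base learning algorithm (the data set and parameter choices are only illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)        # toy training data
rng = np.random.default_rng(0)
T = 25
learners = []

for t in range(T):
    idx = rng.choice(len(X), size=len(X), replace=True)           # step 2a: bootstrap replicate S_t
    learners.append(DecisionTreeClassifier(random_state=t).fit(X[idx], y[idx]))  # step 2b: train h_t on S_t

votes = np.array([h.predict(X) for h in learners])                # shape (T, n_samples)
H = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)   # step 3: majority vote
print("training accuracy of the bagged ensemble:", (H == y).mean())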
Boosting
• Boosting is a very different method to generate multiple predictions (function estimates)
and combine them linearly. Boosting refers to a general and provably effective method of
producing a very accurate classifier by combining rough and moderately inaccurate rules of
thumb.
AdaBoost:
• AdaBoost, short for "Adaptive Boosting", is a machine learning meta-algorithm formulated
by Yoav Freund and Robert Schapire, who won the prestigious Gödel Prize in 2003 for their
work. It can be used in conjunction with many other types of learning algorithms to improve
their performance.
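As a brief illustration, AdaBoost is available in scikit-learn; a minimal usage sketch (parameter values are only for demonstration):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The default weak learner is a depth-1 decision tree (a "decision stump");
# each boosting round re-weights the training data to emphasize misclassified points.
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_tr, y_tr)
print("test accuracy:", ada.score(X_te, y_te))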
Stacking
• In stacking, the outputs of the base-learners are used as inputs to a second-level combiner (meta-learner), which is itself trained to produce the final prediction.
3. Use clustering in real-world scenarios for tasks like customer segmentation, image
segmentation, or document categorization. [CO3,K3]
The output from a clustering algorithm is basically a statistical description of the cluster
centroids with the number of components in each cluster.
• Cluster centroid: The centroid of a cluster is a point whose parameter values are the mean
of the parameter values of all the points in the cluster. Each cluster has a well-defined centroid.
• Distance: The distance between two points is taken as a common metric to assess the
similarity among the components of the population. The most commonly used distance measure is the
Euclidean metric, which defines the distance between two points x = (x1, ..., xn) and y = (y1, ..., yn) as
d(x, y) = sqrt(Σ_i (xi − yi)²).
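For example, with toy values:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])
d = np.sqrt(np.sum((x - y) ** 2))      # Euclidean distance; here d = 5.0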
• Strengths:
1. Efficient in computation.
2. Easy to implement.
• Weaknesses:
1. The number of clusters k must be specified in advance.
2. The result is sensitive to the initial choice of centroids and to noisy data and outliers.
3. It is applicable only when a mean is defined and is not suited to clusters with non-convex shapes.
K-Nearest Neighbour (K-NN) is one of the simplest machine learning algorithms, based on the
supervised learning approach.
• The K-NN algorithm assumes similarity between the new case/data and the available cases,
and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a
well-suited category using the K-NN algorithm.
• The K-NN algorithm can be used for regression as well as for classification, but it is
mostly used for classification problems.
Why Do We Need KNN?
• Suppose there are two categories, Category A and Category B, and we have a new data
point x1, which must belong to one of these categories. To solve this type of problem, we need
a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a
particular data point.
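A minimal illustration with scikit-learn's KNeighborsClassifier, where synthetic two-dimensional data stands in for Category A and Category B (all values are illustrative):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)),    # Category A clustered near (0, 0)
               rng.normal(3, 0.5, (20, 2))])   # Category B clustered near (3, 3)
y = np.array([0] * 20 + [1] * 20)              # 0 = Category A, 1 = Category B

knn = KNeighborsClassifier(n_neighbors=5)      # k = 5 nearest neighbours
knn.fit(X, y)

x1 = np.array([[2.5, 2.7]])                    # the new data point
print("predicted category:", knn.predict(x1)[0])   # majority class among its 5 neighbours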
5. Analyze the role of parameters in GMM, including the means, covariances, and
weights of the individual Gaussian components. [CO3,K4]
Gaussian Mixture Models (GMM) is a "soft" clustering algorithm, where each point probabilistically
"belongs" to all clusters. This is different from k-means, where each point belongs to exactly one
cluster.
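A short sketch of this soft assignment with scikit-learn's GaussianMixture (the two-cluster data is synthetic and purely illustrative):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X[:3]))    # soft assignments: each row gives per-cluster probabilities summing to 1
print(gmm.means_, gmm.weights_)    # fitted component means and mixture weights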
Expectation-maximization
• In the expectation (E) step, the current Gaussian parameters are used to compute, for each data point,
the probability (responsibility) that it belongs to each component of the mixture. In the maximization (M)
step, the means, covariances, and mixture weights of the components are re-estimated using these responsibilities.
• The goal of the algorithm is to find the parameter vector ϕ that maximizes the likelihood of
the observed values of X, L(ϕ | X).
• But in cases where this is not feasible, we associate the extra hidden variables Z and express the
underlying model using both, to maximize the likelihood of the joint distribution of X and Z, the
complete likelihood Lc(ϕ | X, Z).
• EM alternates between performing an expectation (E) step, which computes an expectation
of the likelihood by including the latent variables as if they were observed, and a maximization (M)
step, which computes the maximum likelihood estimates of the parameters by maximizing the
expected likelihood found in the E step.
• Expectation-Maximization (EM) is a technique used in point estimation. Given a set of
observable variables X and unknown (latent) variables Z, we want to estimate the parameters θ in a
model.
• EM is useful for several reasons: conceptual simplicity, ease of implementation, and the fact that
each iteration improves L(ϕ). The rate of convergence in the first few steps is typically quite good,
but can become excruciatingly slow as you approach a local optimum.
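To make the E and M steps concrete, here is a minimal one-dimensional, two-component EM sketch in Python (initial values and data are arbitrary; this is an illustration, not a full implementation):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1, 150)])   # observed data X

mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])  # initial guesses

for _ in range(50):
    # E step: responsibilities = posterior probability of each component for each point
    dens = np.stack([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(2)])
    resp = dens / dens.sum(axis=0)                     # shape (2, n); columns sum to 1

    # M step: re-estimate parameters to maximize the expected likelihood from the E step
    Nk = resp.sum(axis=1)
    mu = (resp * x).sum(axis=1) / Nk
    sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / Nk)
    pi = Nk / len(x)

print(mu, sigma, pi)   # should approach means near (-2, 3), std devs near 1, weights near 0.5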
7. Explain the concept of Ensemble Learning, highlighting the model combination schemes.
Discuss the advantages and limitations of Voting as a model combination scheme.
[CO3,K2]
Advantages of Voting:
• Simple to implement.
• Effective when combining diverse models.
Limitations of Voting:
• All learners are weighted equally unless the weights are carefully tuned, so a few strong models can be outvoted by many weak ones.
• It gives little benefit when the base models are highly correlated or make similar errors.
8.Compare and contrast the Bagging and Boosting ensemble techniques. Provide insights
into scenarios where each technique excels. [CO3,K4]
Answer: Bagging (Bootstrap Aggregating) and Boosting are both ensemble techniques.
Bagging: trains each base learner independently (in parallel) on a bootstrap sample of the training
data and combines them by voting or averaging; its main effect is variance reduction.
Boosting: trains base learners sequentially, re-weighting the training data so that later learners
concentrate on the instances misclassified earlier, and combines them by a weighted vote; its main
effect is bias reduction.
Scenarios: Bagging excels with high-variance, low-bias learners (e.g., deep decision trees) and is
relatively robust to noisy data; Boosting excels when only weak learners (e.g., decision stumps) are
available and the data is not too noisy, since noisy or mislabeled points receive ever-larger weights.
9.Explore the principles behind Stacking in ensemble learning. Discuss the process of
model stacking and the advantages it offers. [CO3,K4]
Answer: Stacking involves training multiple base models, and a meta-model is trained to combine
their predictions. The process includes:
• Training several diverse base learners on the training data.
• Generating their predictions (typically on held-out or cross-validated data) to form a new feature set.
• Training a meta-model (combiner) on these predictions to produce the final output.
Advantages of Stacking:
• It can combine heterogeneous models and learn how much to trust each one, often outperforming any single base model or simple voting.
• The meta-model can correct systematic errors made by the base learners.
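As an illustration, scikit-learn's StackingClassifier implements this process; the particular base learners and meta-learner below are chosen only for demonstration:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),   # meta-model trained on cross-validated base predictions
)
stack.fit(X_tr, y_tr)
print("test accuracy:", stack.score(X_te, y_te))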
10.Discuss the K-means clustering algorithm in unsupervised learning. Explain the steps
involved, including initialization and convergence. Highlight potential challenges in using
K-means. [CO3,K6]
Answer: K-means partitions the data into k clusters by (1) initializing k centroids (e.g., randomly
or with k-means++), (2) assigning each point to its nearest centroid, (3) recomputing each centroid
as the mean of its assigned points, and (4) repeating steps 2-3 until the assignments stop changing
(convergence).
Challenges:
• The number of clusters k must be chosen in advance.
• Results depend on the initial centroids; the algorithm may converge to a poor local optimum, so multiple restarts are common.
• It is sensitive to outliers and assumes roughly spherical clusters of similar size.
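A short illustration with scikit-learn's KMeans, showing the main controls for initialization and convergence (the data and parameter values are only for demonstration):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(5, 1, (100, 2)),
               rng.normal((0, 5), 1, (100, 2))])

km = KMeans(
    n_clusters=3,        # k must be chosen in advance
    init="k-means++",    # initialization strategy
    n_init=10,           # several restarts to mitigate a bad initial choice of centroids
    max_iter=300,        # repeat assign/update steps until convergence or this limit
    random_state=0,
).fit(X)

print(km.cluster_centers_)   # final centroids
print(km.inertia_)           # within-cluster sum of squares at convergence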