
UNIT – 3

ENSEMBLE TECHNIQUES AND UNSUPERVISED LEARNING

PART A (2 MARKS)

Q.1 Define unsupervised learning. [CO3,K1]


Ans.: In unsupervised learning, the network adapts purely in response to its inputs. Such
networks can learn to pick out structure in their input.
Q.2 Describe semi-supervised learning. [CO3,K1]
Ans.: Semi-supervised learning uses both labeled and unlabeled data to improve supervised
learning.
Q.3 Identify ensemble method. [CO3,K1]
Ans.: Ensemble methods are machine learning techniques that combine several base models in
order to produce one optimal predictive model. They combine the insights obtained from multiple
learning models to facilitate accurate and improved decisions.
Q.4 What is a cluster? [CO3,K1]
Ans.: A cluster is a group of objects that belong to the same class. In other words, similar objects
are grouped in one cluster and dissimilar objects are grouped in other clusters.
Q.5 Explain clustering. [CO3,K2]
Ans.: Clustering is a process of partitioning a set of data into a set of meaningful subclasses. Every
data point in a subclass shares a common trait. Clustering helps a user understand the natural
grouping or structure in a data set.
Q.6 What is Bagging? [CO3,K1]
Ans.: Bagging, also known as bootstrap aggregation, is an ensemble method that works by training
multiple models independently and combining them later to produce a stronger model.
Q.7 Recall boosting. [CO3,K1]
Ans.: Boosting refers to a group of algorithms that utilize weighted averages to turn weak learning
algorithms into stronger learning algorithms.
Q.8 What is K-Nearest Neighbour Methods? [CO3,K1]
Ans.: The K-Nearest Neighbor (KNN) method is a classical classification method that requires no
training effort and depends critically on the quality of the distance measure among examples.
The KNN classifier uses a distance function such as the Mahalanobis distance. A sample is classified
according to the majority vote of its K nearest training samples in the feature space. The distance
of a sample to its neighbours is defined using a distance function.
Q.9 Which are the performance factors that influence KNN algorithm? [CO3,K1]
Ans.: The performance of the KNN algorithm is influenced by three main factors:
1. The distance function or distance metric used to determine the nearest neighbors.
2. The decision rule used to derive a classification from the K-nearest neighbors.
3. The number of neighbors used to classify the new example.
Q.10 Recognize K-means clustering [CO3,K1]
Ans.: k-means clustering is a heuristic method in which each cluster is represented by the center of
the cluster. The k-means algorithm takes an input parameter, k, and partitions a set of n objects into
k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low.

11. Define the Expectation-Maximization (EM) algorithm in the context of Gaussian Mixture
Models (GMM). [CO3,K1]

• Answer: The EM algorithm is an iterative method for estimating parameters in GMMs. It
involves an expectation step (E-step) and a maximization step (M-step) to update parameters and
maximize the likelihood of the data.

12. What is the primary objective of voting in ensemble learning? [CO3,K1]

• Answer: Voting in ensemble learning aims to combine the predictions of multiple models to
make a final decision, often using a majority vote.

13. In bagging, how are different subsets of the training data created for each model?
[CO3,K1]

• Answer: Different subsets are created through bootstrap sampling, where instances are randomly
selected with replacement from the original training data.

14. Provide an example scenario where boosting might be beneficial in ensemble learning.
[CO3,K2]
• Answer: Boosting is beneficial when dealing with weak learners, improving their performance
sequentially by giving more emphasis to misclassified instances.

15. How does stacking contribute to the diversity of models in an ensemble? [CO3,K1]

• Answer: Stacking leverages diverse base models by training a meta-model to combine their
predictions, capturing different perspectives and improving overall performance.

16. What is the main difference between supervised and unsupervised learning? [CO3,K1]

• Answer: In supervised learning, models are trained on labeled data with known outputs, while
unsupervised learning deals with unlabeled data, focusing on discovering patterns and structures.

17. Explain the concept of centroids in the K-means clustering algorithm. [CO3,K1]

• Answer: Centroids in K-means are the representative points for each cluster, calculated as the
mean of all data points assigned to that cluster.

18. In KNN, how is the class of a new instance determined? [CO3,K1]

• Answer: The class of a new instance is determined by the majority class among its k-nearest
neighbors based on a distance metric.

19. How does GMM handle uncertainty in cluster assignments compared to K-means?
[CO3,K1]

• Answer: GMM assigns probabilities to data points belonging to each cluster, providing a more
nuanced and probabilistic view of cluster assignments compared to the hard assignments in K-
means.

20. What is the significance of the Expectation-Maximization (EM) algorithm in Gaussian
Mixture Models (GMM)? [CO3,K1]

Answer: EM is crucial for estimating the parameters of GMMs. It iteratively updates means,
covariances, and weights to maximize the likelihood of the data, making GMMs effective for
modeling complex distributions.
PART B [16 MARKS]

1. Explain Combining Multiple Learners in detail. [CO3,K2]

Combining Multiple Learners


• When designing a learning machine, we generally make choices such as the parameters of the
machine, the training data, and the representation. This implies some variance in performance.
For example, in a classification setting, we can use a parametric classifier or a multilayer
perceptron, and in the latter case we should also decide on the number of hidden units.

1. Generating Diverse Learners:

• Different Algorithms: We can use different learning algorithms to train different base-learners.
Different algorithms make different assumptions about the data and lead to different classifiers.

• Different Hyper-parameters: We can use the same learning algorithm but with different
hyper-parameters.

• Different Input Representations: Different representations make different characteristics
explicit, allowing better identification.

• Different Training Sets: Another possibility is to train different base-learners on different
subsets of the training set.

Model Combination Schemes

• Different methods are used for generating the final output from multiple base-learners:
multiexpert and multistage combination.

1. Multiexpert combination
• Multiexpert combination methods have base-learners that work in parallel.

a) Global approach (learner fusion): given an input, all base-learners generate an output and
all these outputs are used; voting and stacking are examples.
b) Local approach (learner selection): in a mixture of experts, there is a gating model which
looks at the input and chooses one (or very few) of the learners as responsible for generating
the output.
Voting
• The simplest way to combine multiple classifiers is by voting, which corresponds to taking a
linear combination of the learners. Voting is an ensemble machine learning algorithm.
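
For illustration, a minimal hard-voting sketch using scikit-learn's VotingClassifier; the synthetic dataset and the particular base classifiers are illustrative assumptions, not part of the original notes:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # synthetic data (assumed)

# Hard voting: each base classifier casts one vote and the majority class wins.
clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('dt', DecisionTreeClassifier(random_state=0)),
                ('knn', KNeighborsClassifier())],
    voting='hard')
clf.fit(X, y)
print(clf.predict(X[:5]))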

Error-Correcting Output Codes

• In Error-Correcting Output Codes (ECOC), the main classification task is defined in terms of a
number of subtasks that are implemented by the base-learners. The idea is that the original task of
separating one class from all other classes may be a difficult problem.

• The voting scheme is

y_i = Σ_j W_ij d_j (summing over the base-learners d_j),

and then we choose the class with the highest y_i.

• One problem with ECOC is that, because the code matrix W is set a priori, there is no
guarantee that the subtasks as defined by the columns of W will be simple.
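
For illustration, scikit-learn's OutputCodeClassifier applies the same idea with a randomly generated code matrix; the base estimator and dataset below are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)

# code_size controls the number of binary subtasks (columns of the code matrix)
# per class; each subtask is learned by a separate copy of the base estimator.
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=2, random_state=0)
ecoc.fit(X, y)
print(ecoc.predict(X[:5]))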

2. Apply Ensemble Learning to real-world classification and regression tasks, leveraging the
predictive power of multiple models. [CO3,K3]

Ensemble Learning

• The idea of ensemble learning is to employ multiple learners and combine their predictions.
If we have a committee of M models with uncorrelated errors, then simply by averaging them,
the average error of a model can be reduced by a factor of M.

• Ensemble learning is based on one of two basic observations:

1. Variance reduction: If the training sets are completely independent, it always helps to
average an ensemble, because this reduces variance without affecting bias (e.g., bagging)
and reduces sensitivity to individual data points.
2. Bias reduction: For simple models, the average of models has much greater capacity than a
single model. Averaging models can reduce bias substantially by increasing capacity, and
variance can be controlled by fitting one component at a time (as in boosting).
Bagging
• Bagging is also called bootstrap aggregating. Bagging and boosting are meta-algorithms
that pool decisions from multiple classifiers. Bagging creates ensembles by repeatedly
resampling the training data at random.

Pseudocode:
1. Given training data (x1, y1), ..., (xm, ym).
2. For t = 1, ..., T:
a. Form a bootstrap replicate dataset St by selecting m random examples from the training set
with replacement.
b. Let ht be the result of training the base learning algorithm on St.
3. Output the combined classifier:
H(x) = majority(h1(x), ..., hT(x)).
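
A minimal NumPy/scikit-learn sketch of this pseudocode, assuming decision trees as the base learner and integer class labels (function names are illustrative assumptions):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, T=25, random_state=0):
    # Step 2: train T base learners, each on a bootstrap replicate St of the data
    rng = np.random.default_rng(random_state)
    m = len(X)
    learners = []
    for _ in range(T):
        idx = rng.integers(0, m, size=m)          # m examples drawn with replacement
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    # Step 3: combine the base learners by majority vote (labels assumed to be 0..C-1)
    votes = np.array([h.predict(X) for h in learners])       # shape (T, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])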

Boosting
• Boosting is a very different method of generating multiple predictions (function estimates)
and combining them linearly. Boosting refers to a general and provably effective method of
producing a very accurate classifier by combining rough and moderately inaccurate rules of
thumb.

AdaBoost:
• AdaBoost, short for "Adaptive Boosting", is a machine learning meta-algorithm formulated
by Yoav Freund and Robert Schapire, who won the prestigious Gödel Prize in 2003 for their
work. It can be used in conjunction with many other types of learning algorithms to improve
their performance.
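
A short, hedged usage sketch with scikit-learn's AdaBoostClassifier; the synthetic dataset and parameter choices are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)   # synthetic data (assumed)

# AdaBoost boosts shallow decision trees by default, re-weighting the
# misclassified examples before each new round.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
print(cross_val_score(ada, X, y, cv=5).mean())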

Stacking

• Stacking, sometimes called stacked generalization, is an ensemble machine learning method
that combines multiple heterogeneous base or component models via a meta-model.
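
A brief sketch using scikit-learn's StackingClassifier, where a logistic regression meta-model combines two assumed heterogeneous base models on a synthetic dataset:

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)   # synthetic data (assumed)

stack = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=0)),
                ('svc', SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),  # meta-model trained on base predictions
    cv=5)                                  # out-of-fold predictions become meta-features
stack.fit(X, y)
print(stack.score(X, y))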

3. Use clustering in real-world scenarios for tasks like customer segmentation, image
segmentation, or document categorization. [CO3,K3]
The output from a clustering algorithm is basically a statistical description of the cluster
centroids with the number of components in each cluster.
• Cluster centroid: The centroid of a cluster is a point whose parameter values are the mean
of the parameter values of all the points in the cluster. Each cluster has a well defined centroid.

• Distance: The distance between two points is taken as a common metric to assess the
similarity among the components of a population. The commonly used distance measure is the
Euclidean metric, which defines the distance between two points

p = (p1, p2, ..., pk) and q = (q1, q2, ..., qk) as

d(p, q) = sqrt( Σ_{i=1}^{k} (p_i - q_i)^2 )
K-Means Algorithm Properties
1. There are always K clusters.
2. There is always at least one item in each cluster.
3. The clusters are non-hierarchical and they do not overlap.
4. Every member of a cluster is closer to its own cluster than to any other cluster.
The K-Means Algorithm Process
1. The dataset is partitioned into K clusters and the data points are randomly assigned to the
clusters resulting in clusters that have roughly the same number of data points.
2. For each data point:
a. Calculate the distance from the data point to each cluster.
b. If the data point is closest to its own cluster, leave it where it is.
c. If the data point is not closest to its own cluster, move it into the closest cluster.
3. Repeat the above step until a complete pass through all the data points results in no data
point moving from one cluster to another. At this point the clusters are stable and the clustering
process ends.
4. The choice of initial partition can greatly affect the final clusters that result, in terms of inter-
cluster and intracluster distances and cohesion.
• The K-means algorithm is iterative in nature. It converges, but only to a local minimum. It
works only for numerical data and is easy to implement.
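
A minimal NumPy sketch of the standard assign/update loop (Lloyd's algorithm); the function and variable names are illustrative assumptions:

import numpy as np

def kmeans(X, k, n_iters=100, random_state=0):
    rng = np.random.default_rng(random_state)
    # initialise the centroids as k randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # assignment step: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: recompute each centroid as the mean of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels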

• Advantages of K-Means Algorithm:

1. Efficient in computation.
2. Easy to implement.

• Weaknesses:

1. Applicable only when the mean is defined.
2. Need to specify K, the number of clusters, in advance.
3. Trouble with noisy data and outliers.
4. Not suitable for discovering clusters with non-convex shapes.

4. Explain Instance Based Learning: KNN(K-Nearest Neighbour) in detail. [CO3,K2]


Instance Based Learning: KNN


• K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the
supervised learning approach.

• The K-NN algorithm assumes similarity between the new case/data and the available cases,
and puts the new case into the category that is most similar to the available categories.

• The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a
well-suited category using the K-NN algorithm.

• The K-NN algorithm can be used for regression as well as for classification, but it is mostly
used for classification problems.
Why Do We Need KNN?
• Suppose there are two categories, category A and category B, and we have a new data
point x1 that may lie in either of these categories. To solve this kind of problem, we need a
K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a
particular data point.

How Does KNN Work?

• The working of K-NN can be explained using the following algorithm:

Step 1: Select the number K of neighbours.
Step 2: Calculate the Euclidean distance from the new point to the training points.
Step 3: Take the K nearest neighbours according to the calculated Euclidean distance.
Step 4: Among these K neighbours, count the number of data points in each category.
Step 5: Assign the new data point to the category for which the number of neighbours is
maximum.
Step 6: Our model is ready.
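
A small NumPy sketch of these steps for classifying a single new point; the function name and variables are illustrative assumptions:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the K nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: count the class labels among the neighbours and take the majority
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]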

5. Analyze the role of parameters in GMM, including the means, covariances, and
weights of the individual Gaussian components. [CO3,K4]


Gaussian Mixture Models and Expectation Maximization


• Gaussian Mixture Models perform "soft" clustering, where each point probabilistically
"belongs" to all clusters. This is different from k-means, where each point belongs to exactly
one cluster.
• The Gaussian mixture model is a probabilistic model that assumes all the data points are
generated from a mixture of Gaussian distributions with unknown parameters.
• Gaussian mixture models do not rigidly classify each and every instance into one class or the
other. The algorithm attempts to produce K Gaussian distributions that together account for
the entire training space. Every point can be associated with one or more distributions.
Consequently, the deciding factor is the probability that each point belongs to a certain
Gaussian distribution.
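
For illustration, scikit-learn's GaussianMixture exposes exactly these parameters (means, covariances, and mixing weights) along with the soft assignments; the synthetic data below is an assumption:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# two synthetic Gaussian blobs in 2-D (purely illustrative)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1.5, (100, 2))])

gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0).fit(X)
print(gmm.means_)             # means of the two Gaussian components
print(gmm.covariances_)       # covariance matrices of the components
print(gmm.weights_)           # mixing weights (sum to 1)
probs = gmm.predict_proba(X)  # soft assignments: P(component | point)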
• GMMs have a variety of real-world applications. Some of them are listed below.
a) Used for signal processing
b) Used for customer churn analysis
c) Used for language identification
d) Used in video game industry
e) Genre classification of songs

6. Explain Expectation Maximization in detail. [CO3,K2]

Expectation-maximization

• The expectation step is used to estimate, for each data point, how strongly it is associated with
each Gaussian component of the mixture. The maximization step, termed M, then re-estimates
the Gaussian parameters (means, covariances, and mixing weights) from these estimates.

• The Expectation-Maximization (EM) algorithm is used in maximum likelihood estimation where
the problem involves two sets of random variables, of which one, X, is observable and the other,
Z, is hidden.

• The goal of the algorithm is to find the parameter vector ϕ that maximizes the likelihood of
the observed values of X, L(ϕ | X).

• But in cases where this is not feasible, we associate the extra hidden variables Z and express the
underlying model using both, to maximize the likelihood of the joint distribution of X and Z, the
complete likelihood Lc(ϕ | X, Z).

• Expectation-maximization (EM) is an iterative method used to find maximum likelihood
estimates of parameters in probabilistic models, where the model depends on unobserved, also
called latent, variables.

• EM alternates between performing an expectation (E) step, which computes an expectation
of the likelihood by including the latent variables as if they were observed, and a maximization (M)
step, which computes the maximum likelihood estimates of the parameters by maximizing the
expected likelihood found in the E-step.
• Expectation-Maximization (EM) is a technique used in point estimation. Given a set of
observable variables X and unknown (latent) variables Z, we want to estimate the parameters Θ
of a model.

• EM is useful for several reasons: conceptual simplicity, ease of implementation, and the fact that
each iteration improves the log-likelihood l(Θ). The rate of convergence in the first few steps is
typically quite good, but can become excruciatingly slow as you approach local optima.
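
A compact NumPy/SciPy sketch of the E-step and M-step for a one-dimensional, two-component Gaussian mixture; the initial values and names are illustrative assumptions rather than a reference implementation:

import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_iters=50):
    # initial guesses (assumed) for the means, standard deviations and mixing weights
    mu = np.array([x.min(), x.max()])
    sigma = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iters):
        # E-step: responsibilities r[i, k] = P(component k | x_i) under current parameters
        dens = np.stack([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(2)], axis=1)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the responsibility-weighted data
        Nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
        pi = Nk / len(x)
    return mu, sigma, pi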

7. Explain the concept of Ensemble Learning, highlighting the model combination schemes.
Discuss the advantages and limitations of Voting as a model combination scheme.
[CO3,K2]

Answer: Ensemble Learning involves combining multiple models to improve overall
performance. Model combination schemes include Voting, Bagging, Boosting, and Stacking.
Voting aggregates predictions, and it can be Hard Voting (majority vote) or Soft Voting
(weighted average of probabilities).

Advantages of Voting:

• Simple to implement.
• Effective when combining diverse models.

Limitations of Voting:

• Assumes equal competence of all models.
• May not perform well if models are highly correlated.

8.Compare and contrast the Bagging and Boosting ensemble techniques. Provide insights
into scenarios where each technique excels. [CO3,K4]

Answer: Bagging (Bootstrap Aggregating) and Boosting are both ensemble techniques.

Bagging:

• Builds independent models in parallel.
• Reduces variance and overfitting.
• Suitable for unstable models.

Boosting:

• Builds models sequentially, giving more weight to misclassified instances.
• Reduces bias and focuses on difficult instances.
• Suitable for improving the performance of weak learners.

Scenarios:

• Bagging is effective when models are unstable or prone to overfitting.
• Boosting is beneficial when dealing with weak learners or when there is a need for higher
accuracy.

9.Explore the principles behind Stacking in ensemble learning. Discuss the process of
model stacking and the advantages it offers. [CO3,K4]

Answer: Stacking involves training multiple models, and a meta-model is trained to combine
their predictions. The process includes:

1. Training diverse base models on the same data.
2. Collecting predictions from base models.
3. Using the predictions as features to train a meta-model.

Advantages of Stacking:

• Captures diverse perspectives from base models.
• Adapts to complex patterns in data.
• Can outperform individual models and traditional ensembles.

10.Discuss the K-means clustering algorithm in unsupervised learning. Explain the steps
involved, including initialization and convergence. Highlight potential challenges in using
K-means. [CO3,K6]

Answer: K-means is a clustering algorithm:

1. Initialization: Select k centroids (initial cluster centers).
2. Assignment: Assign each data point to the nearest centroid, forming k clusters.
3. Update Centroids: Recalculate centroids based on the mean of data points in each cluster.
4. Repeat Assignment and Update: Iterate until convergence (no change in assignments).

Challenges:

• Sensitive to initial centroid selection (a common mitigation is sketched below).
• Assumes clusters are spherical and equally sized.
• May converge to local optima.
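
To mitigate the sensitivity to initial centroids noted above, a common approach, sketched here with scikit-learn on synthetic data (the dataset and parameters are illustrative assumptions), is k-means++ initialisation combined with several restarts, keeping the run with the lowest inertia:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)   # synthetic data (assumed)

# k-means++ spreads the initial centroids out; n_init runs the algorithm
# several times and keeps the solution with the lowest inertia.
km = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroids
print(km.inertia_)           # within-cluster sum of squared distances (lower is better)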
