
UNIT 4 COMPUTATIONAL LEARNING THEORY

Computational learning theory (CoLT) is a branch of AI concerned with applying mathematical methods to the design and analysis of computer learning programs. It involves using mathematical frameworks for the purpose of quantifying learning tasks and algorithms.

It seeks to use the tools of theoretical computer science to quantify learning problems. This includes
characterizing the difficulty of learning specific tasks.

Computational learning theory can be considered an extension of statistical learning theory, or SLT for short, that makes use of formal methods for the purpose of quantifying learning algorithms.

 Computational Learning Theory (CoLT): Formal study of learning tasks.


 Statistical Learning Theory (SLT): Formal study of learning algorithms.

This division of learning tasks vs. learning algorithms is arbitrary, and in practice, there is quite a large
degree of overlap between these two fields.

Computational learning theory is essentially a sub-field of artificial intelligence (AI) that focuses on
studying the design and analysis of machine learning algorithms.
Sample Complexity

The sample complexity of a machine learning algorithm represents the number of training samples that it needs in order to successfully learn a target function.

More precisely, the sample complexity is the number of training samples that we need to supply to the algorithm so that the function returned by the algorithm is within an arbitrarily small error of the best possible function, with probability arbitrarily close to 1.
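As a rough formalization (a sketch in standard PAC-learning notation, not stated in these notes), for a learning algorithm A, hypothesis class H, and a fixed input-output distribution D this reads:

\[
m_{H}(\varepsilon, \delta; D) \;=\; \min\Big\{\, m \;:\; \Pr_{S \sim D^{m}}\Big[\, \operatorname{err}_{D}\big(A(S)\big) \;\le\; \min_{h \in H} \operatorname{err}_{D}(h) + \varepsilon \,\Big] \;\ge\; 1 - \delta \,\Big\}
\]

where ε is the allowed excess error and 1 − δ is the required confidence. The two variants below differ in whether D is held fixed or the worst case over all D is taken.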

There are two variants of sample complexity:

 The weak variant fixes a particular input-output distribution;


 The strong variant takes the worst-case sample complexity over all input-output
distributions.
The no-free-lunch theorem, discussed below, proves that, in general, the strong sample complexity is infinite, i.e. there is no algorithm that can learn the globally optimal target function using a finite number of training samples.

However, if we are only interested in a particular class of target functions (e.g., only linear functions), then the sample complexity is finite, and it depends linearly on the VC dimension of the class of target functions.
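As a hedged illustration of this linear dependence (a standard PAC-learning bound, not stated explicitly in these notes), for a hypothesis class of VC dimension d in the realizable setting, on the order of

\[
m \;=\; O\!\left( \frac{d \log(1/\varepsilon) + \log(1/\delta)}{\varepsilon} \right)
\]

training samples suffice to reach error at most ε with probability at least 1 − δ, and a matching Ω((d + log(1/δ))/ε) lower bound shows that the linear dependence on d cannot be removed.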

VC Dimension
The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity of a hypothesis set to fit
different data sets. It was introduced by Vladimir Vapnik and Alexey Chervonenkis in the 1970s
and has become a fundamental concept in statistical learning theory. The VC dimension is a
measure of the complexity of a model, which can help us understand how well it can fit different
data sets.
The VC dimension of a hypothesis set H is the largest number of points that can be shattered by
H. A hypothesis set H shatters a set of points S if, for every possible labeling of the points in S,
there exists a hypothesis in H that correctly classifies the points. In other words, a hypothesis set
shatters a set of points if it can fit any possible labeling of those points.
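To make shattering concrete, here is a small brute-force check in Python (an illustrative construction, not part of the original notes). It uses 1-D threshold classifiers as the hypothesis set and shows that they can shatter any single point but not two points, so their VC dimension is 1:

from itertools import product

def shatters(points, hypotheses):
    """Return True if some hypothesis realizes every possible labeling of the points."""
    for labeling in product([0, 1], repeat=len(points)):
        realized = any(all(h(x) == y for x, y in zip(points, labeling))
                       for h in hypotheses)
        if not realized:
            return False
    return True

# Hypothesis set: 1-D threshold classifiers h_t(x) = 1 if x >= t else 0,
# over a small illustrative grid of thresholds.
H = [lambda x, t=t: int(x >= t) for t in (-2, -1, 0, 1, 2)]

print(shatters([0.5], H))       # True: any single point can be shattered
print(shatters([0.5, 1.5], H))  # False: the labeling (1, 0) cannot be realized
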
Bounds of the VC Dimension
The VC dimension provides both upper and lower bounds on the number of training examples required to achieve a given level of accuracy. Both bounds grow roughly linearly with the VC dimension: the upper bound on the required number of examples is linear in the VC dimension up to logarithmic factors in the accuracy and confidence parameters, and the lower bound is linear in the VC dimension as well.
Applications of the VC Dimension
The VC dimension has a wide range of applications in machine learning and statistics. For
example, it is used to analyze the complexity of neural networks, support vector machines, and
decision trees. The VC dimension can also be used to design new learning algorithms that are
robust to noise and can generalize well to unseen data.
Ensemble Learning

 Ensemble learning is a machine learning technique that combines the predictions from
multiple individual models to obtain a better predictive performance than any single
model. The basic idea behind ensemble learning is to leverage the wisdom of the crowd
by aggregating the predictions of multiple models, each of which may have its own
strengths and weaknesses. This can lead to improved performance and generalization.

 Ensemble methods can be thought of as compensating for weak individual learning algorithms: an ensemble is computationally more expensive than a single model, but it is often more effective than a single non-ensemble model that has gone through a great deal of training. This unit gives an overview of why ensemble learning matters and how it works, the different types of ensemble classifiers, advanced ensemble learning techniques, and some algorithms (such as Random Forest and XGBoost) that clarify the common ensemble classifiers and their uses in practice.

 Several individual base models (experts) are fitted to learn from the same data and
produce an aggregation of output based on which a final decision is taken. These base
models can be machine learning algorithms such as decision trees (mostly used), linear
models, support vector machines (SVM), neural networks, or any other model that is
capable of making predictions.

 The most commonly used ensemble techniques include Bagging, which is used to build Random Forest algorithms, and Boosting, which is used to build algorithms such as AdaBoost and XGBoost.

There are two techniques, given below, that are used to build ensembles of decision trees.
Bagging
Bagging is used when our objective is to reduce the variance of a decision tree. Here the concept is to create several subsets of data from the training sample, chosen randomly with replacement. Each subset of data is then used to train its own decision tree, so we end up with an ensemble of different models. The average of the predictions from the numerous trees is used, which is more robust than a single decision tree.
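As a minimal sketch of bagging, assuming scikit-learn decision trees and NumPy as the tooling (the notes do not name a library), each bootstrap sample trains its own tree and the predictions are combined by majority vote:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):                               # number of bootstrap rounds
    idx = rng.integers(0, len(X), size=len(X))    # sample rows with replacement
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Aggregate: majority vote over the ensemble (mean of 0/1 votes >= 0.5).
votes = np.stack([t.predict(X) for t in trees])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("training accuracy:", (ensemble_pred == y).mean())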

Random Forest is an extension of bagging. It takes one additional step: in addition to drawing a random subset of the data for each tree, it also makes a random selection of features rather than using all features to grow the trees. When we have numerous such random trees, the result is called a Random Forest.

These are the steps taken to implement a Random Forest:

 Consider a training data set with X observations and Y features. First, a sample is drawn randomly from the training data set with replacement.
 Each tree is grown to its largest possible extent.
 These steps are repeated for n trees, and the final prediction is based on the aggregation of the predictions from the n trees (a minimal sketch follows this list).

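A minimal Random Forest sketch, again assuming scikit-learn (an assumption; the notes do not name a library); here max_features controls the random subset of features tried at each split, which is the extra randomization Random Forest adds on top of bagging:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# n_estimators = number of trees grown on bootstrap samples;
# max_features="sqrt" = random subset of features tried at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X, y)
print("training accuracy:", forest.score(X, y))
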
Advantages of using the Random Forest technique:

 It handles higher-dimensional data sets very well.

 It handles missing values and maintains accuracy for missing data.

Disadvantages of using the Random Forest technique:

Since the final prediction is the average of the predictions from the subset trees, it will not give precise values for regression problems.

Boosting
Boosting is another ensemble procedure used to build a collection of predictors. In other words, we fit trees sequentially (usually on re-weighted or random samples), and at each step the objective is to reduce the net error left by the prior trees.

If a given input is misclassified by a hypothesis, its weight is increased so that the next hypothesis is more likely to classify it correctly. Combining the entire set of hypotheses at the end converts weak learners into better-performing models.
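A minimal boosting sketch, assuming scikit-learn's AdaBoost as an illustrative choice (the notes mention AdaBoost but give no code); each round re-weights the misclassified samples so that the next weak learner focuses on them:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# The default base learner is a depth-1 decision tree (a "stump"); boosting
# combines 50 such weak learners into a performance-weighted vote.
model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", model.score(X, y))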

Gradient Boosting is an expansion of the boosting procedure.

1. Gradient Boosting = Gradient Descent + Boosting


It utilizes a gradient descent algorithm that can optimize any differentiable loss function. The ensemble of trees is built one tree at a time, and the individual trees are summed successively: each new tree tries to recover the remaining loss (the difference between the actual and predicted values) left by the trees before it.
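A minimal from-scratch sketch of this idea for the squared loss, assuming scikit-learn regression trees as base learners (an illustrative assumption): each new tree is fitted to the current residuals, i.e. the negative gradient of the loss, and its prediction is added with a small learning rate:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

pred = np.zeros_like(y)              # start from a constant zero prediction
learning_rate = 0.1
trees = []
for _ in range(100):
    residual = y - pred              # negative gradient of the squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)   # add the new tree's contribution
    trees.append(tree)

print("training MSE:", np.mean((y - pred) ** 2))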

Advantages of using Gradient Boosting methods:

 It supports different loss functions.


 It works well with interactions.

Disadvantages of using Gradient Boosting methods:

 It requires cautious tuning of different hyper-parameters.

Bagging vs. Boosting

Bagging: Various training data subsets are randomly drawn with replacement from the whole training dataset.
Boosting: Each new subset contains the components that were misclassified by previous models.

Bagging: Attempts to tackle the over-fitting issue.
Boosting: Tries to reduce bias.

Bagging: If the classifier is unstable (high variance), we apply bagging.
Boosting: If the classifier is steady and straightforward (high bias), we apply boosting.

Bagging: Every model receives an equal weight.
Boosting: Models are weighted by their performance.

Bagging: The objective is to decrease variance, not bias.
Boosting: The objective is to decrease bias, not variance.

Bagging: It is the easiest way of combining predictions that belong to the same type.
Boosting: It is a way of combining predictions that belong to different types.

Bagging: Every model is constructed independently.
Boosting: New models are affected by the performance of the previously developed model.
