
MACHINE LEARNING

• Deep learning is a specific kind of machine learning. To understand deep learning well, one must have a solid understanding of the basic principles of machine learning.
LEARNING ALGORITHMS
• A machine learning algorithm is an algorithm that is able
to learn from data.
• “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Mitchell, 1997)
• Machine learning allows us to tackle tasks that are too difficult
to solve with fixed programs written and designed by human
beings.

Some of the most common machine learning tasks include the following:

 Classification:
• In this type of task, the computer program is asked to specify which of k categories some input belongs to. To solve this task, the learning algorithm is usually asked to produce a function f : Rⁿ → {1, . . . , k}.
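• A minimal sketch of such a function (Python with NumPy; the weights W and offsets b below are made-up illustration values): a linear classifier realizing f : Rⁿ → {1, . . . , k} as the argmax of k linear scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3                  # input dimension, number of categories
W = rng.normal(size=(k, n))  # hypothetical score weights, one row per class
b = rng.normal(size=k)       # hypothetical per-class offsets

def f(x):
    """Map x in R^n to a category in {1, ..., k} (1-indexed, as in the text)."""
    return int(np.argmax(W @ x + b)) + 1

print(f(rng.normal(size=n)))  # some category in {1, ..., k}
```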
 Classification with missing inputs:
• In order to solve the classification task, the
learning algorithm only has to define a single
function mapping from a vector input to a
categorical output.
• When some of the inputs may be missing,
rather than providing a single classification
function, the learning algorithm must learn a
set of functions.
 Regression:
• In this type of task, the computer program is asked to predict a numerical value given some input. To solve this task, the learning algorithm is asked to output a function f : Rⁿ → R.
• This type of task is similar to classification,
except that the format of output is different.
• These kinds of predictions are also used for
algorithmic trading.
 Transcription:
• In this type of task, the machine learning system is asked to observe a relatively unstructured representation of some kind of data and transcribe it into discrete, textual form; speech recognition and optical character recognition are typical examples.
 Machine translation:
• In a machine translation task, the input already
consists of a sequence of symbols in some language,
and the computer program must convert this into a
sequence of symbols in another language.
 Structured output:
• Structured output tasks involve any task
where the output is a vector (or other data
structure containing multiple values) with
important relationships between the different
elements.
• This is a broad category that subsumes the transcription and translation tasks described above, as well as many other tasks.
• Machine learning algorithms can be broadly
categorized as unsupervised or supervised by what
kind of experience they are allowed to have during
the learning process.
• Most of the learning algorithms discussed here can be understood as being allowed to experience an entire dataset.
 Unsupervised learning algorithms:
• These algorithms experience a dataset containing many features, then learn useful properties of the structure of this dataset.
 Supervised learning algorithms:
• These algorithms experience a dataset containing features, but each example is also associated with a label or target.
• The term supervised learning originates from
the view of the target y being provided by an
instructor or teacher who shows the machine
learning system what to do.
• Some machine learning algorithms do not just
experience a fixed dataset. For example,
reinforcement learning algorithms interact with
an environment, so there is a feedback loop
between the learning system and its experiences.
• One common way of describing a dataset is with a design matrix. A design matrix is a matrix containing a different example in each row; each column of the matrix corresponds to a different feature.
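• For instance (a toy sketch with made-up feature values), a design matrix holding m = 3 examples with n = 2 features each:

```python
import numpy as np

# Design matrix: one example per row, one feature per column.
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.9]])
m, n = X.shape
print(m, n)  # 3 2
```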
 Linear Regression:
• Our definition of a machine learning algorithm
as an algorithm that is capable of improving a
computer program’s performance at some
task via experience is somewhat abstract.
• To make this more concrete, we present an
example of a simple machine learning
algorithm: linear regression
• As the name implies, linear regression solves a
regression problem. In other words, the goal is
to build a system that can take a vector x ∈ Rⁿ
as input and predict the value of a scalar y ∈ R
as its output. In the case of linear regression,
the output is a linear function of the input.
• We define the output to be ŷ = wᵀx, where w ∈ Rⁿ is a vector of parameters.
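• A minimal sketch of linear regression on synthetic data (NumPy assumed; w_true and the noise level are made up): the parameters are fit by least squares, equivalent to solving the normal equations w = (XᵀX)⁻¹Xᵀy.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.normal(size=(m, n))          # design matrix: one example per row
w_true = np.array([2.0, -1.0, 0.5])  # made-up ground-truth parameters
y = X @ w_true + 0.1 * rng.normal(size=m)

# Least-squares fit (numerically safer than inverting X^T X directly).
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(x):
    return w_hat @ x                 # y_hat = w^T x

print(w_hat)                         # close to w_true
```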
 Regularization:
• Regularization is any modification we make to a learning
algorithm that is intended to reduce its generalization error
but not its training error.
• The behavior of our algorithm is strongly affected not just
by how large we make the set of functions allowed in its
hypothesis space, but by the specific identity of those
functions.
• More generally, we can regularize a model that learns a
function f(x; θ) by adding a penalty called a regularizer to
the cost function.
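• A common regularizer is L2 weight decay, which adds λ‖w‖² to the linear regression cost. A sketch (λ and the data are arbitrary illustration values): the penalized minimizer has the closed form w = (XᵀX + λI)⁻¹Xᵀy.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 10                 # few examples relative to features
X = rng.normal(size=(m, n))
y = rng.normal(size=m)

lam = 0.5                     # regularization strength (a hyper-parameter)
w = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
print(np.linalg.norm(w))      # increasing lam shrinks the weights
```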
 Hyper-parameters and Validation Sets:
• Most machine learning algorithms have several settings that we can use to control the behavior of the learning algorithm. These settings are called hyper-parameters.
• To choose them, we hold out a validation set from the training data; tuning hyper-parameters on the test set would contaminate our estimate of generalization error.
 Cross-Validation:
• Dividing the dataset into a fixed training set and a fixed test set can be problematic if it results in the test set being small, since a small test set gives a statistically uncertain estimate of the average test error.
• Cross-validation repeats the training and testing computation on different randomly chosen splits of the original dataset, as in the sketch below.
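• A sketch of k-fold cross-validation (NumPy only; the synthetic data and k = 5 are illustration choices): every example is used for testing exactly once, so the error estimate does not hinge on one small fixed test set.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 30, 5
X = rng.normal(size=(m, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=m)

indices = rng.permutation(m)
fold_errors = []
for fold in np.array_split(indices, k):
    train = np.setdiff1d(indices, fold)    # everything outside the held-out fold
    w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    fold_errors.append(np.mean((X[fold] @ w - y[fold]) ** 2))

print(np.mean(fold_errors))   # cross-validated estimate of test error
```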
 Estimators, Bias and Variance:
• The field of statistics gives us many tools that can be used to achieve the machine learning goal of solving a task not only on the training set but also generalizing beyond it. Foundational concepts such as parameter estimation, bias and variance are useful to formally characterize notions of generalization, underfitting and overfitting.
 Bias:
• The bias of an estimator is defined as bias(θ̂ₘ) = E(θ̂ₘ) − θ,
• where the expectation is over the data (seen as samples from a random variable) and θ is the true underlying value used to define the data-generating distribution. An estimator θ̂ₘ is said to be unbiased if bias(θ̂ₘ) = 0, which implies that E(θ̂ₘ) = θ. An estimator θ̂ₘ is said to be asymptotically unbiased if limₘ→∞ bias(θ̂ₘ) = 0, which implies that limₘ→∞ E(θ̂ₘ) = θ.
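• A worked check by simulation (parameters are arbitrary): the maximum-likelihood variance estimator of a Gaussian, which divides by m, is biased with bias −σ²/m, while dividing by m − 1 gives an unbiased estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials, sigma2 = 5, 200_000, 1.0
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, m))

biased = samples.var(axis=1, ddof=0)      # divides by m
unbiased = samples.var(axis=1, ddof=1)    # divides by m - 1

print(biased.mean() - sigma2)    # approx -sigma2/m = -0.2
print(unbiased.mean() - sigma2)  # approx 0
```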
 Variance and Standard Error:
• Another property of the estimator that we might want to consider is how much we expect it to vary as a function of the data sample. Just as we computed the expectation of the estimator to determine its bias, we can compute its variance. The variance of an estimator is simply the variance Var(θ̂), where the random variable is the training set.
• The square root of the variance is called the standard error, denoted SE(θ̂).
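• For the sample mean of m Gaussian draws, SE(μ̂) = σ/√m. A quick empirical check (sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials, sigma = 100, 50_000, 2.0

# Each row is one "training set"; its mean is one draw of the estimator.
means = rng.normal(0.0, sigma, size=(trials, m)).mean(axis=1)

print(means.std())         # empirical standard error of the mean
print(sigma / np.sqrt(m))  # analytic SE = 0.2
```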
 Bayesian Statistics:
• So far we have discussed frequentist statistics and
approaches based on estimating a single value of θ,
then making all predictions thereafter based on that
one estimate. Another approach is to consider all
possible values of θ when making a prediction. The
latter is the domain of Bayesian statistics.
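• A minimal sketch for one tractable case (a Bernoulli parameter θ with a Beta prior; the prior counts and observed data below are made up): rather than committing to a single estimate of θ, we keep the full posterior and average over it when predicting.

```python
a, b = 1.0, 1.0        # Beta(1, 1) prior: uniform over theta
heads, tails = 7, 3    # hypothetical observed flips

# Conjugacy: the posterior over theta is Beta(a + heads, b + tails).
a_post, b_post = a + heads, b + tails

# Posterior predictive P(next flip is heads) = E[theta | data].
print(a_post / (a_post + b_post))  # 8/12 ~ 0.667, versus the MLE 7/10 = 0.7
```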
 Support Vector Machines:
• One of the most influential approaches to supervised
learning is the support vector machine (Boser et al., 1992;
Cortes and Vapnik, 1995). This model is similar to logistic
regression in that it is driven by a linear function wᵀx + b.
• The training examples that carry nonzero weight in the learned decision function are known as support vectors.
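• A sketch using scikit-learn's SVC (assuming that library is available; the two-blob data is synthetic): after fitting with a linear kernel, the examples retained in clf.support_vectors_ are the support vectors mentioned above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(20, 2)),   # class 0 blob
               rng.normal(+2, 1, size=(20, 2))])  # class 1 blob
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_.shape)  # only a subset of examples are kept
print(clf.predict([[1.5, 2.0]]))   # class decided by the sign of w^T x + b
```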
 Unsupervised Learning Algorithms :
 Principal Components Analysis:
• We can also view PCA as an unsupervised learning algorithm
that learns a representation of data.
• PCA learns a linear projection that aligns the direction of greatest variance with the axes of the new space. (Left) The original data consist of samples of x. In this space, the variance might occur along directions that are not axis-aligned. (Right) The transformed data z = xᵀW now varies most along the axis z₁. The direction of second-most variance is now along z₂.
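• A sketch of PCA via the SVD of the centered design matrix (NumPy only; the correlated toy data is made up): the columns of W are the principal directions, and z = xᵀW has its largest variance along z₁ and second-largest along z₂.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0],
                                          [1.0, 0.5]])  # correlated features
Xc = X - X.mean(axis=0)                                 # center the data

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt.T             # columns: principal directions, sorted by variance
Z = Xc @ W           # transformed data z = x^T W

print(Z.var(axis=0)) # variances in decreasing order: z1 first, then z2
print(np.cov(Z.T))   # off-diagonals near zero: the new axes are decorrelated
```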
 k-means Clustering:
• Another example of a simple representation learning
algorithm is k -means clustering. The k-means
clustering algorithm divides the training set into k
different clusters of examples that are near each
other. We can thus think of the algorithm as providing
a k-dimensional one-hot code vector h representing
an input x. If x belongs to cluster i, then hi = 1 and all
other entries of the representation h are zero.
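• A sketch using scikit-learn's KMeans (assuming that library; the three blobs are synthetic): cluster the data, then build the k-dimensional one-hot code h described above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (-2, 0, 2)])

k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

H = np.eye(k)[labels]  # one row per example: h_i = 1 for its cluster, else 0
print(H[:3])           # one-hot codes for the first three examples
```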
