Machine Learning
Outline:
– Why learning?
– Supervised (error-based) learning
– Unsupervised learning
  • Competitive learning
What is Machine Learning?
• Machine learning denotes changes in a system that are adaptive, in the sense that they enable the system to perform the same task more effectively the next time.
Why machine learning?
• To learn a target function (a relation between input and output) that can be used to predict the value of an attribute, such as a discrete class label.
Types of Machine Learning
• Based on the nature of the training data, machine learning algorithms are classified into the following three categories: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning
• Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from input to output:
  Y = f(X)
Supervised learning
• It is called supervised learning because the process of an algorithm learning
from the training dataset can be thought of as a teacher supervising the learning
process.
Supervised Learning Example 1
Application Example of Supervised learning
• Suppose we have a dataset giving the living areas and prices of houses from Kathmandu (KTM):
Application Example of Supervised learning
• We can plot this data:
Application Example of Supervised learning
• Given data like this, we can learn to predict the price of other houses in KTM as a function of their living area; this type of learning is known as supervised learning, i.e., y = f(x). A minimal sketch follows below.
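As a concrete illustration, a least-squares line can be fitted to such data in a few lines of Python (a minimal sketch; the area and price figures are made-up values, not the dataset from the slide):

    # Fit price = w0 + w1 * area by ordinary least squares.
    areas  = [50, 80, 120, 160, 200]       # living area (illustrative)
    prices = [40, 65, 95, 130, 155]        # price (illustrative)

    n = len(areas)
    mean_x = sum(areas) / n
    mean_y = sum(prices) / n
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices))
          / sum((x - mean_x) ** 2 for x in areas))
    w0 = mean_y - w1 * mean_x

    print(f"learned f(x) = {w0:.2f} + {w1:.3f} * x")
    print("predicted price for area 100:", round(w0 + w1 * 100, 1))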
Unsupervised Learning
• Unsupervised learning is where you only have input data (X) and no corresponding output variables (targets/labels).
• The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
• It is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher.
• Algorithms are left to their own devices to discover and present the interesting structure in the data.
• E.g., clustering, as sketched below.
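A minimal sketch of clustering using a bare-bones k-means loop (the 1-D data points and k = 2 are illustrative assumptions):

    import random

    points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]   # no labels anywhere
    k = 2
    centers = random.sample(points, k)

    for _ in range(10):                        # a few refinement passes
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign each point to the nearest center
            j = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[j].append(p)
        for j in range(k):                     # move each center to its cluster mean
            if clusters[j]:
                centers[j] = sum(clusters[j]) / len(clusters[j])

    print("discovered centers:", centers)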
Unsupervised learning Example 1

Unsupervised learning Example 2
Reinforcement learning
• Reinforcement learning is learning behavior through trial-and-error interactions with an environment, as sketched below.
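A minimal sketch of this trial-and-error idea, using a simple action-value update on a two-action environment (the environment, rewards, and parameters are illustrative assumptions, not from the slides):

    import random

    true_reward = {"a": 0.3, "b": 0.7}        # hidden from the learner
    Q = {"a": 0.0, "b": 0.0}                  # estimated value of each action
    alpha, epsilon = 0.1, 0.2

    for _ in range(1000):
        # explore occasionally; otherwise exploit the best-looking action
        if random.random() < epsilon:
            a = random.choice(list(Q))
        else:
            a = max(Q, key=Q.get)
        r = 1.0 if random.random() < true_reward[a] else 0.0   # environment feedback
        Q[a] += alpha * (r - Q[a])            # move estimate toward observed reward

    print("learned action values:", Q)        # should come to prefer action 'b'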
Supervised learning
• Supervised learning algorithms are of two types:
• Classification:
  – To predict the outcome of a given sample where the output variable is in the form of categories (discrete). Examples include labels such as male and female, or sick and healthy.
• Regression:
  – To predict the outcome of a given sample where the output variable is in the form of real values (continuous). Examples include real-valued labels denoting the amount of rainfall or the height of a person.
Least Mean Square Regression Algorithm
• For given data, build the predicted output o close to the standard output y, i.e., t (target), as:
  f(x) = o = w0 + w1*x1 + … + wn*xn
• Train (adjust or choose) the wi's such that they minimize the squared error.
• Determine the output for a new input based on the line that has the minimum value of squared error.
Gradient Descent Learning Rule
• The gradient descent algorithm starts with some initial guess for the weights (coefficients) wi and repeatedly performs the update until convergence:
  wi = wi - l * ∂E[w]/∂wi
• Where E[w] is the squared error over the training set D:
  E[w] = (1/2) * Σ_{d in D} (t_d - o_d)^2
Contd…
Gradient-Descent(training_examples, l)
  Each training example is a pair of the form <(x1, …, xn), t>, where (x1, …, xn) is the vector of input values, t is the target output value, and l is the learning rate (e.g., 0.1).
  • Initialize each wi to some small random value
  • Until the termination condition is met, Do
    – Initialize each Δwi to zero
    – For each <(x1, …, xn), t> in training_examples, Do
      • Input the instance (x1, …, xn) to the linear unit and compute the output o
      • Calculate the change in each weight:
        Δwi = Δwi + l * (t - o) * xi
    – For each weight wi, Do
      • wi = wi + Δwi
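A direct Python rendering of this procedure (a minimal sketch; the toy dataset is an assumption, and the first component of each input is fixed at 1 so its weight acts as the intercept):

    def gradient_descent(training_examples, l=0.1, epochs=100):
        """training_examples: list of ((x1, ..., xn), t) pairs."""
        n = len(training_examples[0][0])
        w = [0.01] * n                        # small initial weights
        for _ in range(epochs):               # termination: fixed epoch count here
            delta = [0.0] * n                 # initialize each Δwi to zero
            for x, t in training_examples:
                o = sum(wi * xi for wi, xi in zip(w, x))   # linear unit output
                for i in range(n):
                    delta[i] += l * (t - o) * x[i]         # Δwi = Δwi + l*(t-o)*xi
            w = [wi + di for wi, di in zip(w, delta)]      # wi = wi + Δwi
        return w

    # Toy data following t = 1 + 2*x, with a constant 1 as the first input.
    data = [((1, 0), 1), ((1, 1), 3), ((1, 2), 5)]
    print(gradient_descent(data))             # approaches [1.0, 2.0]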
Contd…
• If we have more than one training example, there are two modes of gradient descent:
Contd…
• Batch mode gradient descent: for each weight wi, accumulate the update over the entire training set D:
  wi = wi + l * Σ_{d=1..m} (t_d - o_d) * x_{i,d}
Contd…
• Stochastic (incremental mode) gradient descent:
  – This algorithm repeatedly runs through the training set and updates the weights each time it encounters a training example, so it is dynamic.
  – Algorithm: for d = 1 to m, update each weight wi using the single example d:
    wi = wi + l * (t_d - o_d) * x_{i,d}
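The same toy problem in incremental form, updating after every single example (a sketch under the same assumptions as above):

    def sgd(training_examples, l=0.1, epochs=100):
        n = len(training_examples[0][0])
        w = [0.01] * n
        for _ in range(epochs):
            for x, t in training_examples:    # update immediately, per example
                o = sum(wi * xi for wi, xi in zip(w, x))
                w = [wi + l * (t - o) * xi for wi, xi in zip(w, x)]
        return w

    data = [((1, 0), 1), ((1, 1), 3), ((1, 2), 5)]
    print(sgd(data))                          # also approaches [1.0, 2.0]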
Contd…
• Comparison between batch gradient descent (BGD) and stochastic gradient descent (SGD):
  – BGD: a single step uses the entire training set; slower.
  – SGD: a single step uses only a single training example; faster.
Artificial Neural Network
• A neural network is composed of a number of nodes, or units, connected by links. Each link has a numeric weight associated with it.
• [Figure: a feed-forward network with an input layer (nodes 1, 2), a hidden layer (nodes 4, 5), and an output layer (node 6).]
• At each unit, the incoming values are multiplied by the corresponding link weights; these products are simply summed and fed through the transfer function f() to generate the unit's output, as sketched below.
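A single unit's computation can be sketched directly; f() is assumed here to be the sigmoid transfer function, and the numbers reuse node 4 of the back-propagation example later in these slides:

    import math

    def unit_output(inputs, weights, bias):
        """Weighted sum of the inputs plus bias, passed through sigmoid f()."""
        s = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-s))     # f(s) = 1 / (1 + e^(-s))

    print(unit_output([1, 0, 1], [0.2, 0.4, -0.5], -0.4))   # ~0.332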
Back propagation algorithm
• Back-propagation is a neural network learning algorithm. It learns by adjusting the weights so as to be able to predict the correct class label of the input.
• Method: repeat the following steps until the termination condition is satisfied.
1. Propagate the inputs forward. For each hidden or output unit j, compute its net input and output:
   I_j = Σ_k (O_k * W_kj) + θ_j
   O_j = 1 / (1 + e^(-I_j))
Contd…
2. Calculate the error:
   For the output layer:
     err_j = O_j * (1 - O_j) * (T_j - O_j)
   For a hidden layer:
     err_j = O_j * (1 - O_j) * Σ_k (err_k * W_jk)
3. Update the weights:
   W_ij(new) = W_ij(old) + l * O_i * err_j
4. Update the biases:
   θ_j(new) = θ_j(old) + l * err_j
Contd…
• Example: Sample calculations for learning by the back-propagation algorithm.
Contd…
• Step 1: Initialization
  – Initial input: X1 = 1, X2 = 0, X3 = 1
  – Bias values: θ4 = -0.4, θ5 = 0.2, θ6 = 0.1
  – Initial weights:
    W14 = 0.2, W15 = -0.3, W24 = 0.4, W25 = 0.1,
    W34 = -0.5, W35 = 0.2, W46 = -0.3, W56 = -0.2
  – Learning rate: l = 0.9; target output: t = 1.
Contd…
• Step 2: Repeat until the termination condition is met, i.e., the weights of two successive iterations are nearly equal.
• Step 2.1: For each training tuple X in D, calculate the output of each layer.
  – For the input layer, O_j = I_j:
    O1 = I1 = 1
    O2 = I2 = 0
    O3 = I3 = 1
Contd…
• For the hidden and output layers, O_j = 1 / (1 + e^(-I_j)):
  – I4 = W14*O1 + W24*O2 + W34*O3 + θ4 = 0.2 + 0 - 0.5 - 0.4 = -0.7, so O4 = 1/(1 + e^0.7) ≈ 0.332
  – I5 = W15*O1 + W25*O2 + W35*O3 + θ5 = -0.3 + 0 + 0.2 + 0.2 = 0.1, so O5 = 1/(1 + e^(-0.1)) ≈ 0.525
  – I6 = W46*O4 + W56*O5 + θ6 = -0.3(0.332) - 0.2(0.525) + 0.1 ≈ -0.105, so O6 = 1/(1 + e^0.105) ≈ 0.474
Contd…
• Calculation of the error at each node:
• For the output layer, err_j = O_j * (1 - O_j) * (T_j - O_j):
  err6 = O6 * (1 - O6) * (T6 - O6) = 0.474 * (1 - 0.474) * (1 - 0.474) = 0.1311
• For the hidden layer, err_j = O_j * (1 - O_j) * Σ_k (err_k * W_jk):
  err5 = O5 * (1 - O5) * (err6 * W56) = 0.525 * (1 - 0.525) * (0.1311 * (-0.2)) = -0.0065
  err4 = O4 * (1 - O4) * (err6 * W46) = 0.332 * (1 - 0.332) * (0.1311 * (-0.3)) = -0.0087
Contd…
• Update the weights: W_ij(new) = W_ij(old) + l * O_i * err_j
  1. W46(new) = W46(old) + l * O4 * err6 = -0.3 + 0.9 * 0.332 * 0.1311 ≈ -0.261
  2. W56(new) = W56(old) + l * O5 * err6 = -0.2 + 0.9 * 0.525 * 0.1311 ≈ -0.138
  …
  8. W35(new) = W35(old) + l * O3 * err5 = 0.2 + 0.9 * 1 * (-0.0065) ≈ 0.194
Contd…
• Update the bias values: θ_j(new) = θ_j(old) + l * err_j
  1. θ6(new) = θ6(old) + l * err6 = 0.1 + 0.9 * 0.1311 ≈ 0.218
  2. θ5(new) = θ5(old) + l * err5 = 0.2 + 0.9 * (-0.0065) ≈ 0.194
  3. θ4(new) = θ4(old) + l * err4 = -0.4 + 0.9 * (-0.0087) ≈ -0.408
• And so on until convergence! The whole pass is reproduced in the sketch below.
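The whole pass can be checked with a short script (a sketch mirroring the numbers above, with sigmoid units, l = 0.9, and target t = 1):

    import math

    def sig(x):
        return 1 / (1 + math.exp(-x))

    l, t = 0.9, 1                                   # learning rate and target
    O = {1: 1, 2: 0, 3: 1}                          # input-layer outputs O1..O3
    w = {(1, 4): 0.2, (1, 5): -0.3, (2, 4): 0.4, (2, 5): 0.1,
         (3, 4): -0.5, (3, 5): 0.2, (4, 6): -0.3, (5, 6): -0.2}
    b = {4: -0.4, 5: 0.2, 6: 0.1}                   # biases θ4, θ5, θ6

    # Forward pass
    O[4] = sig(sum(O[i] * w[(i, 4)] for i in (1, 2, 3)) + b[4])   # 0.332
    O[5] = sig(sum(O[i] * w[(i, 5)] for i in (1, 2, 3)) + b[5])   # 0.525
    O[6] = sig(O[4] * w[(4, 6)] + O[5] * w[(5, 6)] + b[6])        # 0.474

    # Backward pass: output error, then hidden errors
    err = {6: O[6] * (1 - O[6]) * (t - O[6])}                     # 0.1311
    err[5] = O[5] * (1 - O[5]) * err[6] * w[(5, 6)]               # -0.0065
    err[4] = O[4] * (1 - O[4]) * err[6] * w[(4, 6)]               # -0.0087

    # Updates: W_ij += l*O_i*err_j and θ_j += l*err_j
    for (i, j) in w:
        w[(i, j)] += l * O[i] * err[j]
    for j in b:
        b[j] += l * err[j]

    print(round(w[(4, 6)], 3), round(b[6], 3))      # -0.261  0.218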
Hebbian Learning Algorithm
1. Take the number of steps = the number of inputs (X).
2. In each step, update each weight in proportion to the product of its input and the output (the plain Hebb rule): wi(new) = wi(old) + xi * y. A sketch follows below.
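A minimal sketch, assuming the step update is the plain Hebb rule wi(new) = wi(old) + xi * y (the bipolar training data are illustrative):

    # Hebbian learning: strengthen a weight when its input and the output
    # are active together.
    inputs  = [[1, -1], [1, 1], [-1, -1]]     # bipolar inputs (assumed)
    targets = [1, 1, -1]                      # desired outputs (assumed)

    w = [0.0, 0.0]
    bias = 0.0
    for x, y in zip(inputs, targets):         # one step per training input
        for i in range(len(w)):
            w[i] += x[i] * y                  # wi(new) = wi(old) + xi * y
        bias += y                             # bias acts as a weight on input 1

    print("weights:", w, "bias:", bias)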
Competitive Learning Rule (Winner-takes-all)
Basic Concept of a Competitive Network:
• This network is just like a single-layer feed-forward network with feedback connections between the outputs.
• The feedback connections between outputs are of the inhibitory type, which means the competitors never support themselves.
Basic Concept of the Competitive Learning Rule:
• It is an unsupervised learning algorithm in which the output nodes compete with each other to represent the input pattern.
• Hence, the main concept is that during training, the output unit with the highest net input for a given input pattern will be declared the winner.
• Then only the winning neuron's weights are updated, and the rest of the neurons are left unchanged.
• Therefore, this rule is also called winner-takes-all.
Competitive learning algorithm
• For n inputs and m output units:
• Repeat until convergence: for each input vector x, find the winning output unit j and update only its weights:
  w_ij(new) = w_ij(old) + η * (x_i - w_ij)
• Where η is the learning rate, x_i is the i-th input, and w_ij is the weight from input i to output unit j. A sketch follows below.
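A minimal winner-takes-all sketch (the 2-D inputs, m = 2 output units, and η = 0.5 are illustrative assumptions):

    import random

    random.seed(0)
    inputs = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
    m = 2                                      # number of output units
    W = [[random.random(), random.random()] for _ in range(m)]
    eta = 0.5

    for _ in range(20):
        for x in inputs:
            # winner: the unit whose weight vector is closest to the input
            j = min(range(m),
                    key=lambda u: sum((xi - wi) ** 2 for xi, wi in zip(x, W[u])))
            # only the winner's weights move toward the input
            W[j] = [wi + eta * (xi - wi) for xi, wi in zip(x, W[j])]

    print("weight vectors (cluster prototypes):", W)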
Genetic Algorithm
Introduction to Genetic Algorithms
• Nature has always been a great source of inspiration to all mankind.
• A genetic algorithm (GA) is a search heuristic that is inspired by Charles Darwin's theory of natural evolution: the fittest individuals are selected for reproduction in order to produce the offspring of the next generation.
• GAs were developed by John Holland and his students and colleagues at the University of Michigan.
Notion of Natural Selection
• The process of natural selection starts with the selection of fittest
individuals from a population. They produce offspring which inherit the
characteristics of the parents and will be added to the next generation.
• If parents have better fitness, their offspring will be better than parents
and have a better chance at surviving.
• This process keeps on iterating and at the end, a generation with the
fittest individuals will be found.
• This notion can be applied to a search problem: we consider a set of solutions to a problem and select the best ones out of them.
Basic Terminology
• Population: set of individuals each representing a possible solution to a
given problem.
Nature to Computer Mapping
[Table: mapping of concepts from Nature to their Computer counterparts.]
Five phases are considered in a genetic algorithm:
• Initial population
• Fitness function
• Selection
• Crossover
• Mutation
Simple Genetic Algorithm

Simple_Genetic_Algorithm()
{
    Initialize the Population;
    Calculate Fitness Function;
    While(Fitness Value != Optimal Value)
    {
        Selection;      // Natural selection, survival of the fittest
        Crossover;      // Recombine pairs of parents
        Mutation;       // Randomly perturb offspring
        Calculate Fitness Function;
    }
}
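A runnable sketch of this loop on a toy problem (maximizing the number of 1-bits in a binary string; tournament selection, one-point crossover, and bit-flip mutation are assumed choices, not prescribed by the slide):

    import random

    def fitness(ind):                 # toy objective: count the 1-bits
        return sum(ind)

    def tournament(pop, k=3):         # pick the best of k random individuals
        return max(random.sample(pop, k), key=fitness)

    def crossover(p1, p2):            # one-point crossover
        c = random.randint(1, len(p1) - 1)
        return p1[:c] + p2[c:]

    def mutate(ind, rate=0.05):       # flip each bit with small probability
        return [1 - g if random.random() < rate else g for g in ind]

    n, size = 20, 30
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(size)]
    while max(fitness(i) for i in pop) < n:    # loop until the optimum appears
        pop = [mutate(crossover(tournament(pop), tournament(pop)))
               for _ in range(size)]

    print("best individual:", max(pop, key=fitness))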
Selection
• The idea of the selection phase is to select the fittest individuals and let them pass their genes on to the next generation.
• Two pairs of individuals (parents) are selected based on their fitness
scores. Individuals with high fitness have more chance to be selected for
reproduction.
• Selection of the parents can be done using any one of the following strategies:
– Fitness Proportionate Selection(Roulette Wheel Selection)
– Tournament Selection
– Rank Selection
Contd..
• Fitness Proportionate Selection (Roulette Wheel Selection):
  – The chance of selection is directly proportional to the fitness value. Since this chance is probabilistic rather than fixed, any individual can become a parent, as sketched below.
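A sketch of roulette-wheel selection (the fitness scores are illustrative); random.choices draws with probability proportional to the supplied weights:

    import random

    population = ["A", "B", "C", "D"]
    fitness    = [1.0, 3.0, 4.0, 2.0]      # selection shares: 10%, 30%, 40%, 20%
    parents = random.choices(population, weights=fitness, k=2)
    print(parents)                          # probabilistic: anyone can be picked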
Contd..
• Tournament Selection: randomly pick k individuals and select only the best one among them as a parent for the next generation.
Contd..
• Rank Selection is mostly used when the individuals in the population have very close fitness values. Close fitness values give each individual an almost equal share of the roulette-wheel pie chart, so every individual, no matter how fit relative to the others, has approximately the same probability of being selected as a parent. This in turn leads to a loss of selection pressure towards fitter individuals, making the GA make poor parent selections in such situations.
Contd..
• Rank Selection: every individual in the population is ranked according to its fitness, and higher-ranked individuals are preferred over lower-ranked ones.
Crossover
• Crossover is the most significant phase in a genetic algorithm. For
each pair of parents to be mated, a crossover point is chosen at
random from within the genes.
• For example, consider the crossover point to be 3 as shown below.
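In code, a crossover point of 3 splits and recombines two parents like this (the parent strings are illustrative):

    p1, p2 = list("AAAAAA"), list("BBBBBB")
    point = 3
    child1 = p1[:point] + p2[point:]        # AAABBB
    child2 = p2[:point] + p1[point:]        # BBBAAA
    print("".join(child1), "".join(child2))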
Contd..
• Offspring are created by exchanging the genes of the parents among themselves until the crossover point is reached.
• Mutation occurs to maintain diversity within the population and to prevent premature convergence.
Termination
• The algorithm terminates:
  – If the population has converged (it does not produce offspring that are significantly different from the previous generation); it is then said that the genetic algorithm has provided a set of solutions to our problem.
  – OR if the predefined number of generations is reached.
Issues for GA Practitioners
• Choosing basic implementation issues:
  – Representation
  – Termination criteria
Benefits of Genetic Algorithms
• The concept is easy to understand
• Modular, separate from the application
When to Use a GA
• Alternate solutions are too slow or overly complicated
Thank You !