Machine Learning

The document discusses various types of machine learning, including supervised, unsupervised, and reinforcement learning, explaining their definitions and applications. It details algorithms such as gradient descent and backpropagation used in supervised learning, along with examples of how these methods can predict outcomes based on input data. Additionally, it covers the structure of artificial neural networks and the process of training them through weight adjustments.


Unit 5: Learning LH 6

02/01/2025
Contd…
• Outline:
– Why learning?
– Supervised (error-based) learning
• Gradient descent learning: Least Mean Square, Back Propagation algorithm
• Stochastic learning
– Unsupervised learning
• Hebbian learning algorithm
• Competitive learning
– Reinforcement learning (output based)
– Genetic algorithms: operators


What is learning?
• Learning is the process of acquiring new or modifying existing knowledge, behaviors, skills, values, or preferences.
• Evidence that learning has occurred may be seen in changes in behavior from simple to complex.

What is Machine learning?
• Machine learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task more effectively the next time.
• Like human learning from past experience, a computer system learns from data, which represent some “past experiences” of an application domain.
• Therefore, machine learning gives computers the ability to learn without being explicitly programmed.

Why machine learning?
• To learn a target function (the relation between input and output) that can be used to predict the values of a discrete class attribute,
– e.g., male or female, high-risk or low-risk, etc.
• To model the underlying structure or distribution in the data in order to learn more about the data.
• To learn behavior through trial-and-error interactions with a dynamic environment.

Types of Machine Learning
• Based on the training set, machine learning algorithms are classified into the following three categories:

Supervised learning
• Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from input to output:
Y = f(X)
• Learning stops when the algorithm achieves an acceptable level of performance.
• Then, when you have new input data (x), you can predict the output variable (Y) for that data.
Supervised learning
• It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process.
• We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher.
Supervised Learning Example 1

Application Example of Supervised learning
• Suppose we have a dataset giving the living areas and prices of houses in Kathmandu (KTM):

Application Example of Supervised learning
• We can plot this data:

Application Example of Supervised learning
• Given data like this, we can learn to predict the price of other houses in KTM as a function of their living area; such learning is known as supervised learning.
• Price of house = f(area of house)
i.e., y = f(x)

Unsupervised Learning
• Unsupervised learning is where you only have input data (X) and no corresponding output variables (targets/labels).
• The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
• These are called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher.
• Algorithms are left to their own devices to discover and present the interesting structure in the data.
• E.g., Clustering

Unsupervised learning Example1

Unsupervised learning example 2

Reinforcement learning
• Is learning behavior through trial-and-error interactions with a dynamic environment.
• Is learning how to act in order to maximize a reward (encouragement).
• Reinforcement learning emphasizes learning from feedback that evaluates the learner's performance without providing standards of correctness in the form of behavioral targets.
• Examples: learning to ride a bicycle, game playing, etc.

Contd…

Supervised learning Algorithms
• Classification:
– To predict the outcome of a given sample where the output variable is in the form of categories (discrete). Examples include labels such as male and female, or sick and healthy.
• Regression:
– To predict the outcome of a given sample where the output variable is in the form of real values (continuous). Examples include real-valued labels denoting the amount of rainfall or the height of a person.

Contd…

Least Mean Square Regression algorithm
• For the given data, build the predicted output O close to the standard output y, i.e., t (target), as:
• f(x) = O = w0 + w1·x1 + … + wn·xn
• Train (adjust or choose) the wi's such that they minimize the squared error:
– E[w0,…,wn] = ½ Σi=1 to m (ti − oi)²
• Determine the output for a new input based on the line which has the minimum value of the squared error.

Gradient Descent Learning Rule
• The gradient descent algorithm starts with some initial guess for the weights (coefficients) wi and repeatedly performs the update until convergence:
• wi = wi − l · ∂E/∂wi
• Where,
– l is the learning rate, or step size, or rate of weight adjustment.
– Gradient: ∇E[w] = [∂E/∂w0, …, ∂E/∂wn]
∂E/∂wi = ∂/∂wi ½ Σd=1 to m (td − od)²
       = ∂/∂wi ½ Σd=1 to m (td − Σj wj·xdj)²
       = Σd=1 to m (td − od)(−xdi)


Contd…
• For a single training example the update rule is:
• wi = wi + l·(t − o)·xi
• This is called the LMS update rule.

Contd…
Gradient-Descent(training_examples, l)
Each training example is a pair of the form <(x1,…,xn), t>, where (x1,…,xn) is the vector of input values, t is the target output value, and l is the learning rate (e.g., 0.1).
• Initialize each wi to some small random value
• Until the termination condition is met, Do
– Initialize each Δwi to zero
– For each <(x1,…,xn), t> in training_examples Do
• Input the instance (x1,…,xn) to the linear function and compute the output o
• Calculate the change in weight:
– Δwi = Δwi + l·(t − o)·xi
– For each weight wi Do
• wi = wi + Δwi
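The pseudocode above can be rendered as a short Python sketch (function and variable names such as `gradient_descent` and `lr` are my own, not from the slides):

```python
def gradient_descent(examples, lr=0.1, epochs=100):
    """examples: list of ((x1, ..., xn), t) pairs.
    Learns weights for the linear function o = sum(w_i * x_i)."""
    n = len(examples[0][0])
    w = [0.0] * n                                      # initial weights (zeros for simplicity)
    for _ in range(epochs):
        delta = [0.0] * n                              # accumulated weight changes this pass
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))   # compute the linear output o
            for i in range(n):
                delta[i] += lr * (t - o) * x[i]        # Δw_i = Δw_i + l·(t − o)·x_i
        w = [wi + di for wi, di in zip(w, delta)]      # w_i = w_i + Δw_i
    return w
```

For example, training on the consistent system {(1 → 2), (2 → 4)} drives the single weight toward 2, since the target function is y = 2x.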

Contd…
• If we have more than one training example:
• There are two ways to modify this gradient descent algorithm for a training set of more than one example:
– Batch gradient descent
– Stochastic (incremental) gradient descent
Contd…
• Batch mode: gradient descent
– This method looks at every example in the entire training set on every step, so it is static.
– Algorithm:
Repeat until convergence
wi = wi + l·Σd=1 to m (td − od)·xdi   over the entire data D, for each weight
Contd…
• Stochastic (incremental mode) gradient descent
– This algorithm repeatedly runs through the training set, updating the weights each time it encounters a training example, so it is dynamic.
– Algorithm:
for (i = 1 to m)
wi = wi + l·(t − o)·xi   for each weight
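A minimal Python sketch of the incremental update; shuffling the examples each pass is an assumption of mine (common practice, not stated on the slide):

```python
import random

def sgd(examples, lr=0.1, epochs=100, seed=0):
    """Stochastic gradient descent: update the weights after every
    single training example instead of after a full pass."""
    rng = random.Random(seed)
    examples = list(examples)                          # copy so we don't reorder the caller's data
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        rng.shuffle(examples)                          # visit examples in random order
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n):
                w[i] += lr * (t - o) * x[i]            # update immediately, per example
    return w
```

Each step costs one example rather than the whole set, which is the point of the BGD/SGD comparison on the next slide.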

Contd…
• Comparison between batch gradient descent (BGD) and stochastic gradient descent (SGD):

BGD                                          SGD
A single step uses the entire training set   A single step uses only a single training example
So each step is a costly operation           It does not matter how large m is
Slower                                       Faster
Not preferred when m is large                Preferred when m is large

Artificial Neural Network
• A neural network is composed of a number of nodes, or units, connected by links. Each link has a numeric weight associated with it.
[Figure: a network with an input layer (nodes 1, 2), a hidden layer (nodes 4, 5), and an output layer (node 6)]
• Artificial neural networks are programs designed to solve problems by trying to mimic the structure and the function of our nervous system.
Artificial neural network model:
• Inputs to the network are represented by the mathematical symbols xn.
• Each of these inputs is multiplied by a connection weight wn:
sum = w1·x1 + w2·x2 + … + wn·xn
• These products are simply summed and fed through the transfer function f() to generate the output.
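This weighted-sum-and-transfer computation can be sketched as a single Python function; using the sigmoid as the transfer function is an assumption here, chosen to match the back-propagation slides later in this unit:

```python
import math

def neuron_output(x, w, bias=0.0):
    """Single artificial neuron: weighted sum of inputs fed through
    a sigmoid transfer function f(s) = 1 / (1 + e^-s)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + bias    # sum = w1*x1 + ... + wn*xn (+ bias)
    return 1.0 / (1.0 + math.exp(-s))                  # output = f(sum)
```

For instance, with inputs (1, 0, 1), weights (0.2, 0.4, −0.5), and bias −0.4 (the values used for node 4 in the worked example below), the output is ≈ 0.332.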

Back propagation algorithm
• Back propagation is a neural network learning algorithm. It learns by adjusting the weights so as to be able to predict the correct class label of the input.
Input: D, a training data set and the associated class labels
l = learning rate (normally 0.0–1.0)
Output: a trained neural network.
Method:
Step 1: Initialize all weights and biases in the network.
Step 2: While the termination condition is not satisfied:
For each training tuple X in D

Contd…
2.1 Calculate output:
For the input layer:
Oj = Ij
For the hidden layer and output layer:
Ij = Σk Ok·Wkj + θj
Oj = 1 / (1 + e^(−Ij))
Contd…
2.2 Calculate error:
For the output layer:
errj = Oj·(1 − Oj)·(Tj − Oj)
For the hidden layer:
errj = Oj·(1 − Oj)·Σk (errk·Wjk)
2.3 Update weights:
Wij(new) = Wij(old) + l·Oi·errj
2.4 Update biases:
θj = θj + l·errj
Contd…
• Example: sample calculations for learning by the back-propagation algorithm.
• The figure above shows a multilayer feed-forward neural network. Let the learning rate be 0.9. The initial weight and bias values of the network are given in the table below, along with the first training tuple, X = (1, 0, 1), whose class label is 1.

Contd…
• Step 1: Initialization
– Initial input:
X1 = 1, X2 = 0, X3 = 1
– Bias values:
θ4 = −0.4, θ5 = 0.2, θ6 = 0.1
– Initial weights:
W14 = 0.2, W15 = −0.3, W24 = 0.4, W25 = 0.1, W34 = −0.5, W35 = 0.2, W46 = −0.3, W56 = −0.2
Contd…
• Step 2: Termination condition: the weights of two successive iterations are nearly equal, or a user-defined number of iterations is reached.
• Step 2.1: For each training tuple X in D, calculate the output of each layer.
– For the input layer, Oj = Ij:
• O1 = I1 = 1
• O2 = I2 = 0
• O3 = I3 = 1
Contd…
• For the hidden layer and output layer:
1. I4 = O1·W14 + O2·W24 + O3·W34 + θ4
      = 1·0.2 + 0·0.4 + 1·(−0.5) + (−0.4)
      = −0.7
   O4 = 1 / (1 + e^(−I4)) = 1 / (1 + e^0.7) = 0.332
2. I5 = O1·W15 + O2·W25 + O3·W35 + θ5
      = 1·(−0.3) + 0·0.1 + 1·0.2 + 0.2
      = 0.1
   O5 = 1 / (1 + e^(−I5)) = 1 / (1 + e^(−0.1)) = 0.525
Contd…
3. I6 = O4·W46 + O5·W56 + θ6
      = 0.332·(−0.3) + 0.525·(−0.2) + 0.1
      = −0.105
   O6 = 1 / (1 + e^(−I6)) = 1 / (1 + e^0.105) = 0.474
Contd…
• Calculation of the error at each node:
• For the output layer, errj = Oj·(1 − Oj)·(Tj − Oj):
err6 = O6·(1 − O6)·(T6 − O6)
     = 0.474·(1 − 0.474)·(1 − 0.474)
     = 0.1311
• For the hidden layer, errj = Oj·(1 − Oj)·Σk (errk·Wjk):
err5 = O5·(1 − O5)·(err6·W56)
     = 0.525·(1 − 0.525)·(0.1311·(−0.2))
     = −0.0065
err4 = O4·(1 − O4)·(err6·W46)
     = 0.332·(1 − 0.332)·(0.1311·(−0.3))
     = −0.0087
Contd…
• Update the weights: Wij(new) = Wij(old) + l·Oi·errj
1. W46(new) = W46(old) + l·O4·err6
            = −0.3 + 0.9·0.332·0.1311
            = −0.261
2. W56(new) = W56(old) + l·O5·err6
            = −0.2 + 0.9·0.525·0.1311
            = −0.138
3. W14(new) = W14(old) + l·O1·err4
            = 0.2 + 0.9·1·(−0.0087)
            = 0.192
Contd…
4. W24(new) = W24(old) + l·O2·err4
            = 0.4 + 0.9·0·(−0.0087)
            = 0.4
5. W34(new) = W34(old) + l·O3·err4
            = −0.5 + 0.9·1·(−0.0087)
            = −0.508
6. W15(new) = W15(old) + l·O1·err5
            = −0.3 + 0.9·1·(−0.0065)
            = −0.306
Contd…
7. W25(new) = W25(old) + l·O2·err5
            = 0.1 + 0.9·0·(−0.0065)
            = 0.1
8. W35(new) = W35(old) + l·O3·err5
            = 0.2 + 0.9·1·(−0.0065)
            = 0.194

Contd…
• Update the bias values: θj = θj + l·errj
1. θ6 = θ6(old) + l·err6
     = 0.1 + 0.9·0.1311
     = 0.218
2. θ5 = θ5(old) + l·err5
     = 0.2 + 0.9·(−0.0065)
     = 0.194
3. θ4 = θ4(old) + l·err4
     = (−0.4) + 0.9·(−0.0087)
     = −0.408
And so on until convergence!
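The hand calculations above can be reproduced with a short Python script; the results agree with the slide values to three decimal places:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Initial values from the example: X = (1, 0, 1), target t = 1, learning rate l = 0.9
x1, x2, x3, t, l = 1.0, 0.0, 1.0, 1.0, 0.9
w14, w15, w24, w25, w34, w35, w46, w56 = 0.2, -0.3, 0.4, 0.1, -0.5, 0.2, -0.3, -0.2
b4, b5, b6 = -0.4, 0.2, 0.1

# Forward pass (step 2.1)
o4 = sigmoid(x1*w14 + x2*w24 + x3*w34 + b4)   # I4 = -0.7  -> O4 ~ 0.332
o5 = sigmoid(x1*w15 + x2*w25 + x3*w35 + b5)   # I5 =  0.1  -> O5 ~ 0.525
o6 = sigmoid(o4*w46 + o5*w56 + b6)            # I6 ~ -0.105 -> O6 ~ 0.474

# Backward pass (step 2.2): errors
err6 = o6 * (1 - o6) * (t - o6)               # ~  0.1311
err5 = o5 * (1 - o5) * (err6 * w56)           # ~ -0.0065
err4 = o4 * (1 - o4) * (err6 * w46)           # ~ -0.0087

# Weight and bias updates (steps 2.3 and 2.4), a few representative ones
w46 += l * o4 * err6                          # ~ -0.261
w56 += l * o5 * err6                          # ~ -0.138
w14 += l * x1 * err4                          # ~  0.192
b6  += l * err6                               # ~  0.218
```

Running this is a quick way to check the arithmetic on the preceding slides.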
Hebbian Learning Algorithm
1. Take number of steps = number of inputs (X).
2. In each step:
I. Calculate net input = weight (W) × input (X)
II. Calculate net output (O) = f(net input)
III. Calculate change in weight = learning rate × O × X
IV. New weight = old weight + change in weight
3. Repeat step 2 until the terminating condition is satisfied.
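A minimal Python sketch of these steps for scalar inputs; taking f to be the identity is my assumption, since the slide does not fix a transfer function:

```python
def hebbian_learning(inputs, w=0.0, lr=1.0):
    """One pass of Hebbian learning over a sequence of scalar inputs.
    The transfer function f is assumed to be the identity, so O = net input."""
    for x in inputs:             # one step per input (step 1 and 2)
        net = w * x              # I.  net input = weight * input
        o = net                  # II. output O = f(net input), with f = identity
        w += lr * o * x          # III/IV. change = lr * O * X; new weight = old + change
    return w
```

Note the self-reinforcing character of the rule: each update pushes the weight further in the direction of correlated input/output activity.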

Competitive Learning Rule (Winner-takes-all)
Basic Concept of a Competitive Network:
• This network is just like a single-layer feed-forward network with feedback connections between the outputs.
• The feed-forward connections are excitatory.
• The feedback connections between outputs are inhibitory (shown by dotted lines), which means the competitors never support themselves.

Basic Concept of the Competitive Learning Rule:
• It is an unsupervised learning algorithm in which the output nodes compete with each other to represent the input pattern.
• Hence, the main concept is that during training, the output unit with the highest net input for a given input is declared the winner.
• Then only the winning neuron's weights are updated, and the rest of the neurons are left unchanged.
• Therefore, this rule is also called winner-takes-all.

Competitive learning algorithm
• For n inputs and m output units:
• Repeat until convergence:
• Calculate the net input of each output unit.
• The outputs of the network are determined by a “winner-takes-all” competition: only the output node receiving the maximal net input will output 1, while all others output 0.
• Calculate the change in weight for the winning unit.
• Update the winning unit's weights.
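A runnable sketch of this loop for numeric input vectors; the winner's update rule Δw = lr·(x − w), which pulls the winning weight vector toward the input, is a standard choice assumed here because the slide's formulas did not survive extraction:

```python
import random

def competitive_learning(data, m=2, lr=0.5, epochs=10, seed=0):
    """Winner-takes-all competitive learning: m output units compete for
    each input vector; only the winner's weight vector moves toward it."""
    rng = random.Random(seed)
    n = len(data[0])
    w = [[rng.random() for _ in range(n)] for _ in range(m)]          # random initial weights
    for _ in range(epochs):
        for x in data:
            nets = [sum(wj[i] * x[i] for i in range(n)) for wj in w]  # net input per output unit
            winner = max(range(m), key=lambda j: nets[j])             # winner outputs 1, rest 0
            for i in range(n):
                w[winner][i] += lr * (x[i] - w[winner][i])            # update the winner only
    return w
```

With repeated presentations, each winning unit's weight vector converges toward the inputs it keeps winning, so the units come to represent clusters in the data.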
Genetic Algorithm

Introduction to Genetic Algorithms
• Nature has always been a great source of inspiration to all mankind.
• A genetic algorithm is a search heuristic that is inspired by Charles Darwin's theory of natural evolution:
“Select The Best, Discard The Rest”
• i.e., the fittest individuals are selected for reproduction in order to produce the offspring of the next generation.
• GAs were developed by John Holland and his students and colleagues at the University of Michigan.
Notion of Natural Selection
• The process of natural selection starts with the selection of the fittest individuals from a population. They produce offspring which inherit the characteristics of the parents and are added to the next generation.
• If the parents have better fitness, their offspring will be better than the parents and have a better chance of surviving.
• This process keeps on iterating, and at the end a generation with the fittest individuals will be found.
• This notion can be applied to a search problem: we consider a set of solutions to a problem and select the best ones out of them.

Basic Terminology
• Population: a set of individuals, each representing a possible solution to a given problem.
• Gene: a solution to the problem is represented as a set of parameters; these parameters are known as genes.
• Chromosome: genes joined together to form a string of values, called a chromosome.
• Fitness Function: measures the quality of a solution. It is problem dependent.

Nature to Computer Mapping

Nature         Computer
Population     Set of solutions
Individual     Solution to a problem
Fitness        Quality of a solution
Chromosome     Encoding for a solution
Gene           Part of the encoding of a solution
Reproduction   Crossover

Five phases are considered in a genetic algorithm: initial population, fitness function, selection, crossover, and mutation.

Simple Genetic Algorithm
Simple_Genetic_Algorithm()
{
    Initialize the Population;
    Calculate Fitness Function;
    While (Fitness Value != Optimal Value)
    {
        Selection;  // Natural selection, survival of the fittest
        Crossover;  // Reproduction, propagate favorable characteristics
        Mutation;   // Mutation
        Calculate Fitness Function;
    }
}
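A runnable Python version of this loop might look like the sketch below. The OneMax fitness function (count of 1-bits), tournament selection of size 3, and the specific parameters are illustrative assumptions, not prescribed by the slides:

```python
import random

def genetic_algorithm(n_bits=16, pop_size=20, mut_rate=0.05, max_gens=200, seed=0):
    """Simple GA on the OneMax problem: maximize the number of 1-bits.
    Uses tournament selection, single-point crossover, bit-flip mutation."""
    rng = random.Random(seed)

    def fitness(ind):
        return sum(ind)                                # optimal value is n_bits

    # Initialize the Population
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(max_gens):
        best = max(pop, key=fitness)
        if fitness(best) == n_bits:                    # Fitness Value == Optimal Value
            return best

        def select():                                  # tournament of size 3
            return max(rng.sample(pop, 3), key=fitness)

        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()                # Selection
            cut = rng.randint(1, n_bits - 1)           # Crossover (single point)
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < mut_rate else b
                     for b in child]                   # Mutation (bit flip)
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

Each phase of the pseudocode above maps onto one commented step of the loop.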
Initialization of Population
• There are two primary methods to initialize a population in a GA:
• Random Initialization
• Heuristic Initialization
• There are several things to be kept in mind when dealing with a GA population:
– The diversity of the population should be maintained, otherwise it might lead to premature convergence.
– The population size should not be kept very large as it can cause a GA to slow down, while a smaller population might not be enough for a good mating pool. Therefore, an optimal population size needs to be decided by trial and error.
Fitness Function
• The fitness function takes a candidate solution to the problem as input and produces as output how fit an individual is (the ability of an individual to compete with other individuals).
• A fitness function should possess the following characteristics:
– The fitness function should be sufficiently fast to compute.
– It must quantitatively measure how fit a given solution is, or how fit the individuals that can be produced from the given solution are.

Selection
• The idea of the selection phase is to select the fittest individuals and let them pass their genes to the next generation.
• Two pairs of individuals (parents) are selected based on their fitness scores. Individuals with high fitness have more chance of being selected for reproduction.
• Selection of the parents can be done using any one of the following strategies:
– Fitness Proportionate Selection (Roulette Wheel Selection)
– Tournament Selection
– Rank Selection

Contd..
• Fitness Proportionate Selection (Roulette Wheel Selection):
– The chance of selection is directly proportional to the fitness value. This chance is not fixed, so any individual can become a parent.
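A sketch of roulette wheel selection in Python (the helper name and the guard against floating-point rounding are mine; non-negative fitness values are assumed):

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Fitness-proportionate selection: an individual's chance of being
    picked is fitness / total fitness."""
    total = sum(fitnesses)
    spin = rng.uniform(0, total)          # spin the wheel
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit                 # each individual owns a slice of size `fit`
        if spin <= cumulative:
            return individual
    return population[-1]                 # guard against float rounding at the edge
```

Fitter individuals own larger slices of the wheel, so they are chosen more often, but no individual's selection is guaranteed.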

Contd..
• Tournament Selection: take k individuals at random and select only one among the k to become a parent of the next generation.

Contd..
• Rank Selection: mostly used when the individuals in the population have very close fitness values. Under fitness-proportionate selection, close fitness values give each individual an almost equal share of the pie chart, so each individual, no matter how fit relative to the others, has approximately the same probability of getting selected as a parent. This in turn leads to a loss of selection pressure towards fitter individuals, making the GA make poor parent selections in such situations.

Contd..
• Rank Selection: every individual in the population is ranked according to fitness. The higher-ranked individuals are preferred over the lower-ranked ones.

Crossover
• Crossover is the most significant phase in a genetic algorithm. For
each pair of parents to be mated, a crossover point is chosen at
random from within the genes.
• For example, consider the crossover point to be 3 as shown below.

Contd..
• Offspring are created by exchanging the genes of parents among
themselves until the crossover point is reached.

• The new offspring are added to the population
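The gene exchange described above can be sketched as follows (a minimal illustration; list-valued chromosomes are assumed):

```python
def single_point_crossover(p1, p2, point):
    """Exchange genes between two parents at the crossover point:
    each offspring keeps one parent's genes up to the point,
    then takes the other parent's genes from there on."""
    child1 = p1[:point] + p2[point:]
    child2 = p2[:point] + p1[point:]
    return child1, child2
```

For example, with crossover point 3, parents 111111 and 000000 produce offspring 111000 and 000111.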


Mutation
• Now if you think in the biological sense, do the children produced have the same traits as their parents? The answer is NO. During their growth, there are some changes in the genes of the children which make them different from their parents.
• This process is known as mutation, which may be defined as a random tweak in the chromosome.
• This implies that some of the bits in the bit string can be flipped.
• Mutation occurs to maintain diversity within the population and prevent premature convergence.
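A bit-flip mutation sketch (the per-bit mutation rate parameter and function name are my assumptions):

```python
import random

def bit_flip_mutation(chromosome, rate=0.1, rng=random):
    """Randomly flip bits in a bit-string chromosome, each with
    probability `rate`, to maintain diversity in the population."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]
```

The rate is usually kept small: too high and the search degenerates into random wandering; too low and the population can converge prematurely.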
Termination
• The algorithm terminates:
– If the population has converged (does not produce offspring which are significantly different from the previous generation); then it is said that the genetic algorithm has provided a set of solutions to our problem.
– OR if the predefined number of generations is reached.

Issues for GA Practitioners
• Choosing basic implementation issues:
– representation
– population size, mutation rate, …
– selection, deletion policies
– crossover, mutation operators
• Termination criteria

Benefits of Genetic Algorithms
• Concept is easy to understand
• Modular, separate from application
• Supports multi-objective optimization
• Good for “noisy” environments
• Always an answer; answer gets better with time
• Easy to exploit previous or alternate solutions
• Flexible building blocks for hybrid applications
• Substantial history and range of use

When to Use a GA
• Alternate solutions are too slow or overly complicated
• Need an exploratory tool to examine new approaches
• Problem is similar to one that has already been successfully solved by using a GA
• Want to hybridize with an existing solution
• Benefits of the GA technology meet key problem requirements

Thank You !
