
Artificial Intelligence

Artificial Neural Networks

Prof. Ahmed Sultan Al-Hegami
Professor of Artificial Intelligence and Intelligent Information Systems
Sana'a University

AI Prof. Ahmed Sultan Al-Hegami 1
Artificial Neural Networks

AI Prof. Ahmed Sultan Al-Hegami 2


Concept Learning

Learning systems differ in how they represent concepts:

[Figure: training examples are fed to different learners. Backpropagation produces a neural network; C4.5 and CART produce decision trees; FOIL and ILP produce rules such as X ∧ Y → Z.]
AI Prof. Ahmed Sultan Al-Hegami 3
Neural Networks
 Networks of processing units (neurons) with
  connections (synapses) between them
 Large number of neurons: about 10^12
 Large connectivity: about 10^5 connections per neuron
 Parallel processing
 Distributed computation/memory
 Robust to noise, failures
AI Prof. Ahmed Sultan Al-Hegami 4


A new sort of computer

 What are (everyday) computer systems good at... and not so good at?

 Good at:
   Rule-based systems: doing what the programmer wants them to do
 Not so good at:
   Dealing with noisy data
   Dealing with unknown environment data
   Massive parallelism
   Fault tolerance
   Adapting to circumstances
AI Prof. Ahmed Sultan Al-Hegami 5
Neural networks to the rescue

 Neural network: an information processing paradigm inspired by
  biological nervous systems, such as our brain
 Structure: a large number of highly interconnected processing
  elements (neurons) working together
 Like people, they learn from experience (by example)
AI Prof. Ahmed Sultan Al-Hegami 6


Neural networks to the rescue

 Neural networks are configured for a specific application, such as
  pattern recognition or data classification, through a learning process
 In a biological system, learning involves adjustments to the synaptic
  connections between neurons
 The same is true for artificial neural networks (ANNs)
AI Prof. Ahmed Sultan Al-Hegami 7


History of Neural Networks
 1943: McCulloch and Pitts proposed a model of a neuron --> Perceptron
  (read [Mitchell, section 4.4])
 1960s: Widrow and Hoff explored Perceptron networks (which they called
  "Adalines") and the delta rule.
 1962: Rosenblatt proved the convergence of the perceptron training rule.
 1969: Minsky and Papert showed that the Perceptron cannot deal with
  nonlinearly-separable data sets---even those that represent simple
  functions such as XOR.
 1970-1985: Very little research on Neural Nets
 1986: Invention of Backpropagation [Rumelhart and McClelland, but also
  Parker and earlier on: Werbos] which can learn from nonlinearly-
  separable data sets.
 Since 1985: A lot of research in Neural Nets!

AI Prof. Ahmed Sultan Al-Hegami 8


Where can neural network systems help?

 When we can't formulate an algorithmic solution.
 When we can get lots of examples of the behavior we require
  ('learning from experience').
 When we need to pick out the structure from existing data.
AI Prof. Ahmed Sultan Al-Hegami 9


Inspiration from Neurobiology

 A neuron: a many-inputs / one-output unit
 Output can be excited or not excited
 Incoming signals from other neurons determine whether the
  neuron shall excite ("fire")
 Output subject to attenuation in the synapses, which are
  junction parts of the neuron
AI Prof. Ahmed Sultan Al-Hegami 10
Real vs Artificial Neurons

[Figure: a biological neuron (dendrites, cell body, axon, synapse) shown next to an artificial threshold unit with inputs x0 ... xn, weights w0 ... wn, and output o]

Threshold unit:

  o = 1 if Σ(i = 0 to n) wi xi > 0, and 0 otherwise

  (x0 = 1 is a constant input, so w0 plays the role of a threshold term.)

AI Prof. Ahmed Sultan Al-Hegami 11
Perceptrons
 Basic unit of many neural networks
 Basic operation
   Input: a vector of real values
   Calculates a linear combination of the inputs
   Output
     1 if the result is greater than some threshold
     0 otherwise
AI Prof. Ahmed Sultan Al-Hegami 12


Perceptron (cont.)

 Input values -> Linear weighted sum -> Threshold

 Given real-valued inputs x1 through xn, the output o(x1, …, xn) computed by the
  perceptron is:
    o(x1, …, xn) = 1 if w0 + w1x1 + … + wnxn > 0
                   -1 otherwise
  where each wi is a real-valued constant, or weight
AI Prof. Ahmed Sultan Al-Hegami 13
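To make the threshold rule concrete, here is a minimal Python sketch. The function name and the sample weights are illustrative assumptions, not taken from the slides; only the 1/-1 output rule comes from this slide.

```python
# Minimal sketch of the perceptron output rule o(x1, ..., xn), using the
# 1/-1 output convention from this slide. Names and weights are illustrative.

def perceptron_output(weights, inputs):
    """weights = [w0, w1, ..., wn] (w0 is the bias term), inputs = [x1, ..., xn]."""
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s > 0 else -1

# Example: with w0 = -0.5 and w1 = w2 = 1, the unit behaves like OR on 0/1
# inputs (output -1 stands for "false").
print(perceptron_output([-0.5, 1.0, 1.0], [0, 0]))   # -1
print(perceptron_output([-0.5, 1.0, 1.0], [0, 1]))   # 1
```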


Learning
 From experience: examples / training data
 The strength of a connection between two neurons is stored as a
  weight value for that specific connection
 Learning the solution to a problem = changing the connection weights
AI Prof. Ahmed Sultan Al-Hegami 14


Perceptron Learning Rule
 It's a single-unit network
 Change each weight by an amount proportional to the difference
  between the desired output and the actual output:

    Wi new = Wi old + α (ODesired - O) Xi

  where α is the learning rate, ODesired is the desired output,
  O is the actual output, and Xi is the input on connection i
AI Prof. Ahmed Sultan Al-Hegami 15
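A minimal sketch of this update rule follows. The 0/1 threshold output and the absence of a bias weight are assumptions chosen to match the OR training table on the next slides.

```python
# Sketch of the perceptron learning rule: Wi_new = Wi_old + alpha * (O_desired - O) * Xi.
# Assumes a 0/1 threshold output and no bias weight, as in the OR example that follows.

def threshold_output(weights, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def perceptron_update(weights, x, o_desired, alpha=1.0):
    o = threshold_output(weights, x)
    return [w + alpha * (o_desired - o) * xi for w, xi in zip(weights, x)]

# One update on the pattern x = (0, 1) with desired output 1, starting from zero weights:
print(perceptron_update([0.0, 0.0], (0, 1), 1))   # [0.0, 1.0]
```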


Linearly Separable Pattern
Classification

AI Prof. Ahmed Sultan Al-Hegami 16


Non-Linearly Separable
Pattern Classification

AI Prof. Ahmed Sultan Al-Hegami 17


Assume Boolean (0/1) input values…
Implementing OR

[Figure: a single threshold unit with inputs x1 and x2, weights W1 and W2, a summation node Σ, and output o]
AI Prof. Ahmed Sultan Al-Hegami 18


Assume Boolean (0/1) input values…
Implementing OR
X1 X2 O desired
0 0 0
0 1 1
1 0 1
1 1 1

Truth Table of OR

AI Prof. Ahmed Sultan Al-Hegami 19


Training Steps in Perceptron

X1  X2  W1 old  W2 old  O desired  O  Error  W1 new  W2 new

0   0   0       0       0          0  0      0       0
0   1   0       0       1          0  1      0       1
1   0   0       1       1          0  1      1       1
1   1   1       1       1          1  0      1       1
0   0   1       1       0          0  0      1       1
0   1   1       1       1          1  0      1       1

[Figure: the learned decision boundary in the (x1, x2) plane, separating the positive (+) from the negative (-) examples of OR]
AI Prof. Ahmed Sultan Al-Hegami 20
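The table above can be reproduced with a short training loop. The sketch below assumes α = 1, no bias weight, and an output of 1 only when the weighted sum is strictly positive; these conventions are inferred from the table rather than stated explicitly.

```python
# Sketch reproducing the perceptron training table for OR (alpha = 1, no bias,
# output 1 if the weighted sum is strictly greater than 0, else 0).

def output(w, x):
    return 1 if w[0] * x[0] + w[1] * x[1] > 0 else 0

or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # OR truth table
w = [0.0, 0.0]
for epoch in range(2):                      # the table shows one and a half passes
    for x, desired in or_data:
        o = output(w, x)
        error = desired - o
        w = [wi + 1.0 * error * xi for wi, xi in zip(w, x)]
        print(x, desired, o, error, w)
# The weights converge to w = [1, 1], which classifies all four OR patterns correctly.
```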
Activation Functions
 Each neuron in the network receives one or more inputs.
 An activation function is applied to the inputs, which determines
  the output of the neuron – the activation level.
 Examples: the identity function f(x) = x, and the sigmoid function

    f(x) = 1 / (1 + e^-x),   where e = 2.718…

AI Prof. Ahmed Sultan Al-Hegami 21
Problems

 Perceptrons can only perform accurately with linearly separable classes
  [Figure: a linearly separable data set in the (x1, x2) plane]
 ANN research was put on hold for 20 years.
 Solution: additional (hidden) layers of neurons, the MLP architecture
 Able to solve non-linear classification problems such as XOR
  [Figure: a non-linearly separable data set in the (x1, x2) plane]
AI Prof. Ahmed Sultan Al-Hegami 22
Solution: Use a Multi-layer Perceptron

 Feed-back Networks
 Feed-Forward Neural Networks
  Also known as:
    the Multi-layer Perceptron, or
    the Back-Propagation Neural Network
AI Prof. Ahmed Sultan Al-Hegami 23


Multi-layer Perceptrons
 Each input-layer neuron connects to all neurons in the hidden layer.
 The neurons in the hidden layer connect to all neurons in the output layer.

[Figure: a 3-2-1 feed-forward network. Input layer: Node 1 (input 1.0), Node 2 (input 0.4), Node 3 (input 0.7). Hidden layer: Node i and Node j, with weights W1i, W2i, W3i and W1j, W2j, W3j from the input nodes. Output layer: Node k, with weights Wik and Wjk from the hidden nodes.]
AI Prof. Ahmed Sultan Al-Hegami 24
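A forward pass through this 3-2-1 network might look like the sketch below. The slide only gives the inputs (1.0, 0.4, 0.7) and the weight names; the numeric weight values and the sigmoid activation are assumptions made for illustration.

```python
import math

# Sketch of a forward pass through the 3-2-1 network in the figure.
# Weight values are invented for illustration; only the structure
# (fully connected input -> hidden -> output) comes from the slide.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

inputs = [1.0, 0.4, 0.7]                  # Node 1, Node 2, Node 3
w_hidden = {"i": [0.2, -0.1, 0.4],        # W1i, W2i, W3i (assumed values)
            "j": [0.7, -1.2, 1.2]}        # W1j, W2j, W3j (assumed values)
w_output = {"i": 1.5, "j": -0.3}          # Wik, Wjk (assumed values)

# Hidden-layer activations, then the single output node k
hidden = {name: sigmoid(sum(w * x for w, x in zip(ws, inputs)))
          for name, ws in w_hidden.items()}
node_k = sigmoid(sum(w_output[name] * o for name, o in hidden.items()))
print(hidden, node_k)
```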


Neural Nets
 Pro: More general than perceptrons
 Not restricted to linear discriminants
 Multiple outputs: one classification each
 Con: No simple, guaranteed training
procedure
 Use greedy, hill-climbing procedure to train
 “Gradient descent”, “Backpropagation”

AI Prof. Ahmed Sultan Al-Hegami 25


Neural Net Training
 Goal:
 Determine how to change weights to get correct
output
 Large change in weight to produce large reduction in
error
 Approach:
 Compute actual output: o
 Compare to desired output: d
 Determine effect of each weight w on error = d-o
 Adjust weights
AI Prof. Ahmed Sultan Al-Hegami 26
Backpropagation

 Multilayer neural networks learn in the same way


as perceptrons.
 However, there are many more weights, and it is
important to assign credit (or blame) correctly
when changing weights.
 Backpropagation networks use the sigmoid activation function, as it is
  easy to differentiate: f'(x) = f(x)(1 - f(x))

AI Prof. Ahmed Sultan Al-Hegami 27


Backpropagation

 Greedy, Hill-climbing procedure


 Weights are parameters to change
 Slow
 Back propagation: Computes current output,
works backward to correct error

AI Prof. Ahmed Sultan Al-Hegami 28


Back propagation
 Desired output of the training examples
 Error = difference between actual & desired

output
 Change weight relative to error size

 Calculate output layer error , then propagate

back to previous layer


 Improved performance, very common!

AI Prof. Ahmed Sultan Al-Hegami 29


Training Method

[Figure: input layer (outputs Oi) → weights Wij → hidden layer (outputs Oj) → weights Wjk → output layer (outputs Ok)]
AI Prof. Ahmed Sultan Al-Hegami 30


Notations

 We use the following notation:

   T (target): the desired (target) output
   O (output): the output of every neuron at any layer
   f: the activation function
   η: the learning rate
   W: a weight
   δ: the error signal
AI Prof. Ahmed Sultan Al-Hegami 31


Training Method
 Step 1: start at the output layer
   Calculate the summation of the signals that enter each output neuron (N):
     Nk = ∑j (Wjk Oj) ------------------------ (1)
   This value passes through the neuron's activation function, so the output
   of every output neuron is:
     Ok = 1/(1 + e^-Nk) = f(Nk) --------------------- (2)
   (This is the actual output of the network, which is compared with the
   desired output to obtain the error.)
 Step 2: Compute the error value (δ) as follows:
     δk = (tk – Ok) f'(Nk)
        = (tk – Ok) Ok (1 – Ok) --------------------- (3)
   Update the weights between the output and hidden layers (each weight
   changes according to its contribution to this error):
     Wjk ← Wjk + η δk Oj ---------------------- (4)
AI Prof. Ahmed Sultan Al-Hegami 32


 Step 3: at the hidden-layer neurons,
   Repeat the above process as follows:
   Compute the error in this layer:
     δj = Oj (1 – Oj) ∑k Wjk δk --------------------- (5)
   Update the weights between the input and hidden layers (each weight
   changes according to its contribution to this error):
     Wij ← Wij + η δj Oi -------------------- (6)
 These 3 steps are repeated many times over all inputs until the error of
  the network reaches a minimum, at which point the training process
  STOPS and the network is considered trained. (A minimal sketch of one
  such training step is given after this slide.)
AI Prof. Ahmed Sultan Al-Hegami 33
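As referenced above, here is a minimal sketch of one training step implementing equations (1) to (6) for a single-hidden-layer network. The list-of-lists weight layout and the function names are implementation choices; only the update equations come from the slides.

```python
import math

# Sketch of one training step (equations 1-6) for a single-hidden-layer network.
# O_i: input-layer outputs, W_ij[i][j]: input->hidden weights,
# W_jk[j][k]: hidden->output weights, t: target vector, eta: learning rate.

def f(n):                                        # sigmoid activation
    return 1.0 / (1.0 + math.exp(-n))

def train_step(O_i, W_ij, W_jk, t, eta):
    n_hidden, n_out = len(W_jk), len(t)
    # Step 1: forward pass through hidden and output layers (equations 1 and 2)
    O_j = [f(sum(W_ij[i][j] * O_i[i] for i in range(len(O_i)))) for j in range(n_hidden)]
    O_k = [f(sum(W_jk[j][k] * O_j[j] for j in range(n_hidden))) for k in range(n_out)]
    # Step 2: output-layer error and hidden->output weight update (equations 3 and 4)
    delta_k = [(t[k] - O_k[k]) * O_k[k] * (1 - O_k[k]) for k in range(n_out)]
    for j in range(n_hidden):
        for k in range(n_out):
            W_jk[j][k] += eta * delta_k[k] * O_j[j]
    # Step 3: hidden-layer error and input->hidden weight update (equations 5 and 6)
    delta_j = [O_j[j] * (1 - O_j[j]) * sum(W_jk[j][k] * delta_k[k] for k in range(n_out))
               for j in range(n_hidden)]
    for i in range(len(O_i)):
        for j in range(n_hidden):
            W_ij[i][j] += eta * delta_j[j] * O_i[i]
    return O_k
```

Called repeatedly over all training examples, a step like this drives the network error toward a minimum, as the detailed example that follows illustrates by hand.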


A Detailed Example
 The network to be trained:

[Figure: input layer (x1, x2) → hidden layer (h1, h2) → output layer (O).
Input-to-hidden weights: W11 (x1→h1), W12 (x1→h2), W21 (x2→h1), W22 (x2→h2).
Hidden-to-output weights: W10 (h1→O), W20 (h2→O).]
AI Prof. Ahmed Sultan Al-Hegami 34


A Detailed Example
•The input/output used for training:
X1 X2 Target (t)
0 0 0
0 1 1
1 0 1
1 1 1
We select η = 1 as the learning rate for simplicity

AI Prof. Ahmed Sultan Al-Hegami 35


 We assume random weights and use the first row of the I/O table:

x1  x2  t  W11  W12  W21  W22  W10  W20
0   0   0  1    0    0    1    1    1

We also use the following notation:

 hi1: total input to the 1st cell in the hidden layer
 hi2: total input to the 2nd cell in the hidden layer
 hO1: output of the 1st cell in the hidden layer
 hO2: output of the 2nd cell in the hidden layer
 N: total input to the cell of the output layer
 O: the actual output of the network
AI Prof. Ahmed Sultan Al-Hegami 36


 We obtain the following:
   hi1 = W11 x1 + W21 x2
       = (1)(0) + (0)(0) = 0
   hi2 = W12 x1 + W22 x2
       = (0)(0) + (1)(0) = 0

   hO1 = 1/(1 + e^-hi1) ------------(1)
       = 1/(1 + e^-0) = 0.5

   hO2 = 1/(1 + e^-hi2) ------------(2)
       = 1/(1 + e^-0) = 0.5

 Using the first step of the algorithm, the total input entering the output
 cell is:
   N = W10 hO1 + W20 hO2 ------------------(3)
     = (1)(0.5) + (1)(0.5) = 1

 Therefore the actual output of the network is:
   O = 1/(1 + e^-N)
     = 1/(1 + e^-1) = 0.73106 (which is far from the desired (target) output).
AI Prof. Ahmed Sultan Al-Hegami 37


As the actual output is far from the target, we have to modify the weights
so that the output moves closer to the target. To determine the error in
the result, we use step 2 of the algorithm:
   δO = (t – O) O (1 – O)
      = (0 – 0.73106)(0.73106)(1 – 0.73106)
      = -0.14373
Using this error value, we can update the weights between the hidden and
output layers with equation (4) of step 2 of the algorithm:
   W10 ← W10 + η δO hO1
       = 1 + (1)(-0.14373)(0.5) = 0.92813
   W20 ← W20 + η δO hO2
       = 1 + (1)(-0.14373)(0.5) = 0.92813
(At this point we back-propagate from the output layer to the hidden layer,
and in the same fashion, propagate to the input layer.)
AI Prof. Ahmed Sultan Al-Hegami 38


Next we determine the error contributed by the hidden layer, using equation (5)
of step 3 of the algorithm:
   δh1 = hO1 (1 – hO1) W10 δO
       = (0.5)(1 – 0.5)(0.92813)(-0.14373)
       = -0.03335
   δh2 = hO2 (1 – hO2) W20 δO
       = (0.5)(1 – 0.5)(0.92813)(-0.14373)
       = -0.03335

Using these error values, we can update the weights between the input and
hidden layers with equation (6) of step 3 of the algorithm:
   W11 ← W11 + η δh1 x1
       = 1 + (1)(-0.03335)(0) = 1
   W12 ← W12 + η δh2 x1
       = 0 + (1)(-0.03335)(0) = 0
   W21 ← W21 + η δh1 x2
       = 0 + (1)(-0.03335)(0) = 0
   W22 ← W22 + η δh2 x2
       = 1 + (1)(-0.03335)(0) = 1

Notice that these weights have not changed; this is expected, since both
inputs in this training example are ZERO.
AI Prof. Ahmed Sultan Al-Hegami 39


The following table shows the results after training the network only once:

x1  x2  t  W11  W12  W21  W22  W10      W20
0   0   0  1    0    0    1    0.92813  0.92813
AI Prof. Ahmed Sultan Al-Hegami 40
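The first training iteration can be checked numerically with the short sketch below; it follows the slide's convention of using the already-updated W10 and W20 when computing the hidden-layer errors.

```python
import math

# Sketch checking the first training iteration (x1 = x2 = 0, t = 0, eta = 1)
# against the numbers reported above.

def f(x):
    return 1.0 / (1.0 + math.exp(-x))

x1, x2, t, eta = 0, 0, 0, 1.0
W11, W12, W21, W22, W10, W20 = 1.0, 0.0, 0.0, 1.0, 1.0, 1.0

hO1 = f(W11 * x1 + W21 * x2)         # 0.5
hO2 = f(W12 * x1 + W22 * x2)         # 0.5
O = f(W10 * hO1 + W20 * hO2)         # 0.73106
dO = (t - O) * O * (1 - O)           # -0.14373
W10 += eta * dO * hO1                # 0.92813
W20 += eta * dO * hO2                # 0.92813
dh1 = hO1 * (1 - hO1) * W10 * dO     # -0.03335 (uses the updated W10, as on the slide)
dh2 = hO2 * (1 - hO2) * W20 * dO     # -0.03335
W11 += eta * dh1 * x1                # weights unchanged: both inputs are zero
W12 += eta * dh2 * x1
W21 += eta * dh1 * x2
W22 += eta * dh2 * x2
print(round(O, 5), round(dO, 5), round(W10, 5), round(W20, 5))
# 0.73106 -0.14373 0.92813 0.92813
```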


 Now we consider the second ROW of the target table and continue the
  training process using the same steps, with the following training data:
    x1 = 0, x2 = 1, t = 1
 Using the weights obtained in the previous stage of training, we obtain:
   hi1 = W11 x1 + W21 x2
       = (1)(0) + (0)(1) = 0
   hi2 = W12 x1 + W22 x2
       = (0)(0) + (1)(1) = 1

   hO1 = 1/(1 + e^-hi1)
       = 1/(1 + e^-0) = 0.5
   hO2 = 1/(1 + e^-hi2)
       = 1/(1 + e^-1) = 0.73106

 Using the first step of the algorithm, the total input entering the output cell is:
   N = W10 hO1 + W20 hO2 ------------------(3)
     = (0.92813)(0.5) + (0.92813)(0.73106) = 1.1426

 Therefore the actual output of the network is:
   O = 1/(1 + e^-N)
     = 1/(1 + e^-1.1426) = 0.7582 (which is far from the desired (target) output).
AI Prof. Ahmed Sultan Al-Hegami 41


As the actual output is far from the target, we have to modify the weights
so that the output moves closer to the target. To determine the error in
the result, we use step 2 of the algorithm:
   δO = (t – O) O (1 – O)
      = (1 – 0.7582)(0.7582)(1 – 0.7582)
      = 0.04435
Using this error value, we can update the weights between the hidden and
output layers with equation (4) of step 2 of the algorithm:
   W10 ← W10 + η δO hO1
       = 0.92813 + (1)(0.04435)(0.5) = 0.95030
   W20 ← W20 + η δO hO2
       = 0.92813 + (1)(0.04435)(0.73106) = 0.96056
(At this point we back-propagate from the output layer to the hidden layer,
and in the same fashion, propagate to the input layer.)

AI Prof. Ahmed Sultan Al-Hegami 42


Next we determine the error contributed by the hidden layer, using equation (5)
of step 3 of the algorithm:
   δh1 = hO1 (1 – hO1) W10 δO
       = (0.5)(1 – 0.5)(0.9503)(0.04435)
       = 0.01054
   δh2 = hO2 (1 – hO2) W20 δO
       = (0.73106)(1 – 0.73106)(0.96056)(0.04435)
       = 0.00838

Using these error values, we can update the weights between the input and
hidden layers with equation (6) of step 3 of the algorithm:
   W11 ← W11 + η δh1 x1
       = 1 + (1)(0.01054)(0) = 1
   W12 ← W12 + η δh2 x1
       = 0 + (1)(0.00838)(0) = 0
   W21 ← W21 + η δh1 x2
       = 0 + (1)(0.01054)(1) = 0.01054
   W22 ← W22 + η δh2 x2
       = 1 + (1)(0.00838)(1) = 1.00838
AI Prof. Ahmed Sultan Al-Hegami 43


The following table shows the results after training the network the second time:

x1  x2  t  W11  W12  W21      W22      W10     W20
0   1   1  1    0    0.01054  1.00838  0.9503  0.96056

AI Prof. Ahmed Sultan Al-Hegami 44


The training process has to be repeated many times until we obtain the
MINIMUM error. The following table shows the resulting weights after
training the network approximately 1000 times. As the comparison table
on the next slide shows, the actual outputs are then very close to the
desired (target) outputs.

W11      W12     W21      W22     W10       W20
-3.5402  4.0244  -3.5248  4.5814  -11.9103  4.6940
AI Prof. Ahmed Sultan Al-Hegami 45


The comparison of the actual and desired (target) outputs is shown in the
table below:

X1  X2  Target (t)  Output (O)

0   0   0           0.0264
0   1   1           0.9867
1   0   1           0.9863
1   1   1           0.9908
AI Prof. Ahmed Sultan Al-Hegami 46
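For completeness, the trained weights can be plugged back into the network to verify the comparison table. The sketch below assumes the same network layout as in the worked example (W10 and W20 are the hidden-to-output weights, and no bias terms are used).

```python
import math

# Sketch evaluating the trained weights from the table above on the four
# OR patterns; the outputs should match the comparison table (to rounding).

def f(x):
    return 1.0 / (1.0 + math.exp(-x))

W11, W12, W21, W22 = -3.5402, 4.0244, -3.5248, 4.5814
W10, W20 = -11.9103, 4.6940

def network(x1, x2):
    hO1 = f(W11 * x1 + W21 * x2)
    hO2 = f(W12 * x1 + W22 * x2)
    return f(W10 * hO1 + W20 * hO2)

for (x1, x2), t in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]:
    print(x1, x2, t, round(network(x1, x2), 4))
# roughly 0.0264, 0.9867, 0.9863, 0.9908
```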


Evolving networks
 Continuous process of:
   Evaluating the output
   Adapting the weights ("learning")
   Taking new inputs
 As the ANN evolves, the weights settle into a stable state while the
  neurons keep operating: the network has 'learned' how to deal with
  the problem
AI Prof. Ahmed Sultan Al-Hegami 47


Where are NN used?

 Recognizing and matching complicated, vague, or incomplete patterns
 Data is unreliable
 Problems with noisy data:
   Prediction
   Classification
   Data association
   Filtering
   Planning
AI Prof. Ahmed Sultan Al-Hegami 48
Applications

 Prediction: learning from past experience


 pick the best stocks in the market
 predict weather
 identify people with cancer risk
 Classification
 Image processing
 Predict bankruptcy for credit card companies
 Risk assessment

AI Prof. Ahmed Sultan Al-Hegami 49


Applications

 Recognition
 Pattern recognition: SNOOPE (bomb detector in
U.S. airports)
 Character recognition
 Handwriting: processing checks
 Data association
 Not only identify the characters that were scanned
but identify when the scanner is not working
properly
AI Prof. Ahmed Sultan Al-Hegami 50
Applications

 Data Filtering
e.g. take the noise out of a telephone signal, signal
smoothing
 Planning
 Unknown environments
 Sensor data is noisy
 Fairly new approach to planning

AI Prof. Ahmed Sultan Al-Hegami 51


Strengths of a Neural Network

 Power: models complex functions; nonlinearity is built into the network
 Ease of use:
   Learns by example
   Very little user domain-specific expertise needed
 Intuitively appealing: based on a model of biology; will it lead to
  genuinely intelligent computers/robots?

 Neural networks cannot do anything that cannot be done using traditional
  computing techniques, BUT they can do some things which would otherwise
  be very difficult.
AI Prof. Ahmed Sultan Al-Hegami 52
General Advantages
 Advantages
 Adapt to unknown situations
 Robustness: fault tolerance due to network
redundancy
 Autonomous learning and generalization
 Disadvantages
 Not exact
 Large complexity of the network structure

AI Prof. Ahmed Sultan Al-Hegami 53


Status of Neural Networks

 Most of the reported applications are


still in research stage
 No formal proofs, but they seem to

have useful applications that work

AI Prof. Ahmed Sultan Al-Hegami 54


Conclusions

 Simulation based on neurons in the brain
 Perceptrons (single neuron)
   Guaranteed to find a linear discriminant
   IF one exists (problem: XOR)
 Neural nets (multi-layer perceptrons)
   Very general
   Backpropagation training procedure
AI Prof. Ahmed Sultan Al-Hegami 55
