Artificial Neural Networks (slides by Tan, Steinbach, Karpatne, Kumar)

The document discusses artificial neural networks and machine learning problems. It introduces neural networks as complex non-linear models that can handle problems with large numbers of features that are non-linearly separable. The document describes how artificial neural networks are modeled after biological neurons and can be used for tasks like image recognition. It explains the basic perceptron model and learning rule, and how multi-layer neural networks allow for non-linear separability through the use of hidden layers and activation functions beyond the sign function.


Intelligent Systems

Artificial Neural Networks

Slides are by: Tan, Steinbach, Karpatne, Kumar

02/14/2018 Introduction to Data Mining, 2nd Edition 1


Machine Learning Problem

• Non-linear classification: complex non-linear hypotheses are needed even with only 2 features.

• With a large number of features (e.g. n = 100), including all quadratic features has complexity O(n²) ≈ n²/2, i.e. about 5,000 extra features.
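As a quick worked check of that count (my own arithmetic, not from the slides): with n = 100 raw features, the number of distinct quadratic terms xi·xj with i ≤ j is n(n + 1)/2 = 100 · 101 / 2 = 5050, which is on the order of n²/2 = 5000.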

02/14/2018 Introduction to Data Mining, 2nd Edition 2


Machine Learning Problem

• What is this?
• You see this: [image]

02/14/2018 Introduction to Data Mining, 2nd Edition 3


Computer Vision: Car Detection

• Car detection: the model is trained on labeled pictures.

02/14/2018 Introduction to Data Mining, 2nd Edition 4


02/14/2018 Introduction to Data Mining, 2nd Edition 5
Neuron in the brain

[Figure: a biological neuron; the dendrites receive the input signals, the axon carries the output.]

02/14/2018 Introduction to Data Mining, 2nd Edition 6


Artificial Neural Networks (ANN)

[Figure: inputs X1, X2, X3 feed a black box that produces the output Y.]

  X1  X2  X3   Y
   1   0   0  -1
   1   0   1   1
   1   1   0   1
   1   1   1   1
   0   0   1  -1
   0   1   0  -1
   0   1   1   1
   0   0   0  -1

Output Y is 1 if at least two of the three inputs are equal to 1.

02/14/2018 Introduction to Data Mining, 2nd Edition 7


Artificial Neural Networks (ANN)
X0 1
 0   0 . 4 ( bias unit )
Input
nodes Black box
X 1 X2 X 3 Y
1 0 0 -1 Output
1 0 1 1
X1 0.3 node
1 1 0 1
0.3 hY
 (x )
1 1 1 1
0 0 1 -1
X2 

0 1 0 -1
0 1 1 1 X3 0.3 t=0.4
0 0 0 -1

h ( x )  sign ( 0 . 3 X 1  0 . 3 X 2  0 . 3 X 3  0 . 4 )
1 if x  0
where sign ( x )  
 1 if x  0

02/14/2018 Introduction to Data Mining, 2nd Edition 8
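A minimal Python check of this model (my own illustration, not code from the slides; the tie-break sign(0) = +1 is an assumption). It evaluates h(x) on the eight rows of the table above and confirms that it reproduces Y:

# Evaluate h(x) = sign(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4) on the truth table above.
def sign(v):
    return 1 if v >= 0 else -1   # assumed tie-break: sign(0) = +1

data = [  # (X1, X2, X3, Y)
    (1, 0, 0, -1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1),
    (0, 0, 1, -1), (0, 1, 0, -1), (0, 1, 1, 1), (0, 0, 0, -1),
]

for x1, x2, x3, y in data:
    h = sign(0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4)
    print((x1, x2, x3), "->", h, "OK" if h == y else "MISMATCH")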


Artificial Neural Networks (ANN)

• The model is an assembly of inter-connected nodes and weighted links.

• The output node sums up each of its input values according to the weights of its links.

[Figure: Perceptron model. Inputs X1, X2, X3 are connected to the output node by links with weights w1, w2, w3; the output node applies the threshold t and produces h(x).]

  h(x) = sign( Σ_{i=1..d} wi Xi + w0 X0 )

       = sign( Σ_{i=0..d} wi Xi )
02/14/2018 Introduction to Data Mining, 2nd Edition 9
Perceptron

• Single-layer network
  – Contains only the input and output nodes

• Activation function: h(x) = sign(w · x)

• Applying the model is straightforward:

  h(x) = sign( 0.3 X1 + 0.3 X2 + 0.3 X3 - 0.4 )

  where sign(x) = +1 if x >= 0, and -1 if x < 0

  – Example: X1 = 1, X2 = 0, X3 = 1  =>  y = sign(0.3 + 0.3 - 0.4) = sign(0.2) = 1
02/14/2018 Introduction to Data Mining, 2nd Edition 10
Perceptron Learning Rule

• Initialize the weights (w0, w1, ..., wd)

• Repeat
  – For each training example (xi, yi):
      • Compute f(w(k), xi)
      • Update the weights:

        w(k+1) = w(k) + λ [ yi - f(w(k), xi) ] xi

• Until a stopping condition is met

02/14/2018 Introduction to Data Mining, 2nd Edition 11
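Below is a minimal Python sketch of this training loop (an illustration, not code from the slides). The epoch count, the sign(0) = +1 tie-break, and the use of exact fractions (to avoid float round-off exactly at the decision boundary) are my assumptions; with λ = 0.1 and the three-input dataset from the earlier slides it should reproduce the per-epoch weights shown on the "Example of Perceptron Learning" slide below.

# Perceptron learning rule: w(k+1) = w(k) + lam * (y - f(w(k), x)) * x
# Exact rational arithmetic avoids float round-off at the sign(0) boundary.
from fractions import Fraction

def sign(v):
    return 1 if v >= 0 else -1                      # assumed tie-break: sign(0) = +1

def perceptron_train(examples, lam=Fraction(1, 10), epochs=6):
    w = [Fraction(0)] * 4                           # (w0, w1, w2, w3); w0 is the bias weight
    for epoch in range(1, epochs + 1):
        for x, y in examples:
            xb = (1,) + x                           # prepend the bias input X0 = 1
            f = sign(sum(wi * xi for wi, xi in zip(w, xb)))
            w = [wi + lam * (y - f) * xi for wi, xi in zip(w, xb)]
        print("epoch", epoch, [float(wi) for wi in w])
    return w

# Dataset from the earlier slides: Y = 1 iff at least two of X1, X2, X3 are 1.
examples = [((1, 0, 0), -1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
            ((0, 0, 1), -1), ((0, 1, 0), -1), ((0, 1, 1), 1), ((0, 0, 0), -1)]
perceptron_train(examples)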


Perceptron Learning Rule

• Weight update formula:

  w(k+1) = w(k) + λ [ yi - f(w(k), xi) ] xi        (λ : learning rate)

• Intuition:
  – Update the weight based on the error  e = yi - f(w(k), xi)
  – If y = f(x, w), e = 0: no update needed
  – If y > f(x, w), e = 2: the weight must be increased so that f(x, w) will increase
  – If y < f(x, w), e = -2: the weight must be decreased so that f(x, w) will decrease
02/14/2018 Introduction to Data Mining, 2nd Edition 12
Example of Perceptron Learning

• Update rule:  w(k+1) = w(k) + λ [ yi - f(w(k), xi) ] xi,   with  Y = sign( Σ_{i=0..d} wi Xi )  and  λ = 0.1

Training data:

  X1  X2  X3   Y
   1   0   0  -1
   1   0   1   1
   1   1   0   1
   1   1   1   1
   0   0   1  -1
   0   1   0  -1
   0   1   1   1
   0   0   0  -1

Weights after each update during the first epoch:

  Step   w0    w1    w2    w3
   0      0     0     0     0
   1    -0.2  -0.2    0     0
   2      0     0     0    0.2
   3      0     0     0    0.2
   4      0     0     0    0.2
   5    -0.2    0     0     0
   6    -0.2    0     0     0
   7      0     0    0.2   0.2
   8    -0.2    0    0.2   0.2

Weights at the end of each epoch:

  Epoch  w0    w1    w2    w3
   0      0     0     0     0
   1    -0.2    0    0.2   0.2
   2    -0.2    0    0.4   0.2
   3    -0.4    0    0.4   0.2
   4    -0.4   0.2   0.4   0.4
   5    -0.6   0.2   0.4   0.2
   6    -0.6   0.4   0.4   0.2

02/14/2018 Introduction to Data Mining, 2nd Edition 13


Perceptron Learning Rule

• Since f(w, x) is a linear combination of the input variables, the decision boundary is linear.

• For nonlinearly separable problems, the perceptron learning algorithm will fail because no linear hyperplane can separate the data perfectly.

02/14/2018 Introduction to Data Mining, 2nd Edition 14


General Structure of ANN

[Figure: a feed-forward network with inputs x1 ... x5, an input layer, a hidden layer, and an output layer. A zoomed-in view of neuron i shows inputs I1, I2, I3 with weights wi1, wi2, wi3, the weighted sum Si, a threshold t, and an activation function g(Si) that produces the output Oi.]

• Training an ANN means learning the weights of the neurons.
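To make this structure concrete, here is a small forward-pass sketch (my own illustration: the 5-3-1 layer layout is inferred from the figure, and the weights, biases, and the sigmoid activation mentioned later in these slides are made-up choices):

# Forward pass through a tiny feed-forward network: 5 inputs -> 3 hidden -> 1 output.
# For neuron i: S_i = sum_j w_ij * input_j + bias_i, and O_i = g(S_i).
import math

def g(s):                                    # activation function (sigmoid, as an example)
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weights, biases):
    # weights[i][j] is the weight from input j to neuron i of this layer
    return [g(sum(w * x for w, x in zip(weights[i], inputs)) + biases[i])
            for i in range(len(weights))]

x = [0.5, 1.0, 0.0, 1.0, 0.2]                # x1 ... x5 (made-up values)
hidden = layer(x, weights=[[0.1] * 5, [0.2] * 5, [-0.3] * 5], biases=[0.0, 0.1, -0.1])
output = layer(hidden, weights=[[0.4, -0.2, 0.6]], biases=[0.05])
print(hidden, output)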

02/14/2018 Introduction to Data Mining, 2nd Edition 15


Nonlinearly Separable Data

XOR Data

y = x1 XOR x2

  x1  x2   y
   0   0  -1
   1   0   1
   0   1   1
   1   1  -1

02/14/2018 Introduction to Data Mining, 2nd Edition 16


Multilayer Neural Network

• An artificial neural network has a more complex structure than that of a perceptron model.
  – Hidden layers: intermediary layers between the input and output layers.
  – The network may use types of activation functions other than the sign function.
  – Examples of other activation functions include the sigmoid (logistic) and hyperbolic tangent functions (see the sketch below).
  – These activation functions allow the hidden and output nodes to produce output values that are nonlinear in their input parameters.
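A small sketch of these activation functions (my own illustration, not from the slides):

# Common ANN activation functions: sign, sigmoid (logistic), and hyperbolic tangent.
import math

def sign(s):
    return 1 if s >= 0 else -1

def sigmoid(s):                      # squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-s))

def tanh(s):                         # squashes any real input into (-1, 1)
    return math.tanh(s)

for s in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(s, sign(s), round(sigmoid(s), 3), round(tanh(s), 3))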

02/14/2018 Introduction to Data Mining, 2nd Edition 17


Artificial Neural Networks (ANN)

• Various types of neural network topology:
  – single-layered network (perceptron) versus multi-layered network
  – feed-forward versus recurrent network

• Various types of activation functions f:

  h(x) = f( Σ_i wi Xi )

02/14/2018 Introduction to Data Mining, 2nd Edition 18


Multi-layer Neural Network

• A multi-layer neural network can solve any type of classification task involving nonlinear decision surfaces.

[Figure: the XOR data, separated by combining hyperplanes in a hidden layer; σ denotes a sigmoid activation function used by the nodes.]
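As one concrete, hand-built illustration of this idea (the specific weights are my own choice, not from the slides), a two-layer network with sign units reproduces the XOR labels:

# A hand-built two-layer network for XOR (labels in {-1, +1}).
# Hidden unit h1 fires only for (x1=1, x2=0); h2 fires only for (x1=0, x2=1).
# The output fires if either hidden unit fires.
def sign(v):
    return 1 if v >= 0 else -1

def xor_net(x1, x2):
    h1 = sign(x1 - x2 - 0.5)
    h2 = sign(x2 - x1 - 0.5)
    return sign(h1 + h2 + 1.0)

for x1, x2, y in [(0, 0, -1), (1, 0, 1), (0, 1, 1), (1, 1, -1)]:
    print(x1, x2, xor_net(x1, x2), "expected", y)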

02/14/2018 Introduction to Data Mining, 2nd Edition 19


Learning Multi-layer Neural Network

• Can we apply the perceptron learning rule to each node, including the hidden nodes?
  – The perceptron learning rule computes the error term e = y - f(w, x) and updates the weights accordingly.

• Problem: how do we determine the true value of y for the hidden nodes?
  – Approximate the error in the hidden nodes by the error in the output nodes.

• Problems with this approach:
  – It is not clear how an adjustment in the hidden nodes affects the overall error.
  – There is no guarantee of convergence to an optimal solution.

02/14/2018 Introduction to Data Mining, 2nd Edition 20


Learning the ANN model (Multilayer)

• The goal of the ANN learning algorithm is to determine a set of weights w that minimizes the total sum of squared errors:

  E(w) = (1/2) Σ_{i=1..N} ( yi - ŷi )²

• The sum of squared errors depends on w because the predicted class ŷ is a function of the weights assigned to the hidden and output nodes.

• A simple error surface is typically encountered when ŷ is a linear function of its parameters w,
  – i.e. the error function becomes quadratic in its parameters, and a global minimum solution can be easily found.
02/14/2018 Introduction to Data Mining, 2nd Edition 21
Learning the ANN model (Multilayer)

• In most cases, the output of an ANN is a nonlinear function of its parameters because of the choice of its activation functions (e.g., the sigmoid or tanh function).

• As a result, it is no longer straightforward to derive a solution for w that is guaranteed to be globally optimal.

• Greedy algorithms, such as those based on the gradient descent method, have been developed to solve the optimization problem efficiently.

02/14/2018 Introduction to Data Mining, 2nd Edition 22


Learning the ANN model (Multilayer)

• The weight update formula used by the gradient descent method can be written as follows:

  wj <- wj - λ ∂E(w)/∂wj

  where λ is the learning rate.

• The second term states that the weight should be increased in a direction that reduces the overall error term.

• For hidden nodes, the computation is not trivial because it is difficult to assess their error term without knowing what their output values should be.

• A technique known as back-propagation has been developed to address this problem (a small gradient descent sketch for a single output unit follows below).
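For intuition only, here is a minimal gradient descent sketch for a single sigmoid output unit trained on squared error (my own illustration of the update wj <- wj - λ ∂E/∂wj; it is not the full multi-layer back-propagation procedure, and the toy dataset and learning rate are assumptions):

# Gradient descent for one sigmoid unit: o = sigmoid(w . x), E(w) = 1/2 * sum (y - o)^2.
# Update: w_j <- w_j - lam * dE/dw_j, where dE/dw_j = -sum (y - o) * o * (1 - o) * x_j.
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Toy dataset with a bias input of 1 prepended; targets in [0, 1] for the sigmoid output.
data = [((1, 0, 0), 0.0), ((1, 0, 1), 1.0), ((1, 1, 0), 1.0), ((1, 1, 1), 1.0)]
w = [0.0, 0.0, 0.0]
lam = 0.5

for step in range(200):
    grad = [0.0, 0.0, 0.0]
    for x, y in data:
        o = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j in range(3):
            grad[j] += -(y - o) * o * (1 - o) * x[j]
    w = [wj - lam * gj for wj, gj in zip(w, grad)]

print([round(wj, 2) for wj in w])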

02/14/2018 Introduction to Data Mining, 2nd Edition 23


Design Issues in ANN

• Number of nodes in the input layer
  – One input node per binary/continuous attribute
  – k or log2(k) nodes for each categorical attribute with k values (see the encoding sketch below)

• Number of nodes in the output layer
  – One output node for a binary class problem
  – k nodes for a k-class problem

• Number of nodes in the hidden layer

• Initial weights and biases: random assignment is usually acceptable.

• Training examples with missing values should be removed or replaced with their most likely values.
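A small sketch of the two input encodings for a categorical attribute (my own illustration; the attribute and its k = 4 values are hypothetical):

# Encoding a categorical attribute with k = 4 values as input nodes.
import math

values = ["red", "green", "blue", "yellow"]          # hypothetical attribute values
k = len(values)

# Option 1: k nodes (one-hot encoding), one node per value.
def one_hot(v):
    return [1 if v == u else 0 for u in values]

# Option 2: ceil(log2(k)) nodes, using the binary representation of the value's index.
n_bits = math.ceil(math.log2(k))
def binary_code(v):
    idx = values.index(v)
    return [(idx >> b) & 1 for b in reversed(range(n_bits))]

print(one_hot("blue"))       # [0, 0, 1, 0]  -> 4 input nodes
print(binary_code("blue"))   # [1, 0]        -> 2 input nodes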

02/14/2018 Introduction to Data Mining, 2nd Edition 24


Design Issues in ANN

• Number of nodes in the hidden layer:
  – Start from a fully connected network with a sufficiently large number of nodes and hidden layers, and then repeat the model-building procedure with a smaller number of nodes.
  – Alternatively, instead of repeating the model-building procedure, we could remove some of the nodes and repeat the model-evaluation procedure to select the right model complexity.

• Initial weights and biases: random assignment

• Training examples with missing values should be removed or replaced with their most likely values.

02/14/2018 Introduction to Data Mining, 2nd Edition 25


Characteristics of ANN

• Multilayer ANNs are universal approximators but can suffer from overfitting if the network is too large.

• Gradient descent may converge to a local minimum. One way to escape from a local minimum is to add a momentum term to the weight update formula (a sketch follows below).

• Model building can be very time consuming, but testing can be very fast.

• ANNs can handle redundant attributes because the weights are learnt automatically.

• They are sensitive to noise in the training data.

• Missing attribute values are difficult to handle.

02/14/2018 Introduction to Data Mining, 2nd Edition 26
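A minimal sketch of a momentum term added to the gradient descent update (my own illustration; the momentum coefficient μ, the learning rate, and the example error function are assumptions, not values from the slides):

# Gradient descent with momentum: each step keeps a fraction of the previous step,
# which helps the weights roll through shallow local minima and flat regions.
def train_with_momentum(grad, w, lam=0.1, mu=0.9, steps=200):
    """grad(w) returns the gradient of the error at w; lam is the learning rate,
    mu is the momentum coefficient."""
    velocity = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        velocity = [mu * v - lam * gj for v, gj in zip(velocity, g)]
        w = [wj + vj for wj, vj in zip(w, velocity)]
    return w

# Example: minimize E(w) = (w0 - 3)^2 + (w1 + 1)^2, whose gradient is (2(w0 - 3), 2(w1 + 1)).
print(train_with_momentum(lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)], [0.0, 0.0]))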
