Module 8 (ANN1)

This module discusses several machine learning clustering and classification algorithms: k-means clustering, radial basis function (RBF) networks, probabilistic neural networks (PNN), and self-organizing maps (SOM). It describes how each algorithm works: how k-means assigns data points to centroids so as to minimize a distance function, how PNN uses Gaussian basis functions to classify data, and how SOM uses competitive learning on a neural network to project high-dimensional data onto a lower-dimensional display. Issues such as local minima and "empty" neurons in SOM are also covered.

Module 8

k-means
RBF networks
PNN
SOM
Non-Hierarchical Cluster Analysis: k-means

[Figure: data points forming two clusters, A and B; + marks each cluster centroid]

1. Select the number of clusters K (centroids)
2. Calculate the centroids µk of the partitions Mk
3. Assign the cluster members x to the centroids
4. Minimize the distance function

   D = Σ_{k=1}^{K} Σ_{x ∈ Mk} (x − µk)²  →  min

http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/kmeans.html
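The steps above translate almost line-for-line into NumPy. A minimal sketch (my own illustrative code, not from the original module; it assumes no cluster goes empty during the loop):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. select k initial centroids (here: k distinct data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 3. assign every x to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2. recompute the centroid mu_k of each partition M_k
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. D = sum_k sum_{x in M_k} (x - mu_k)^2 stops decreasing
        #    once the centroids stop moving
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels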
Example: k-means Clustering

Data

ID  x1  x2
A1   1   1
A2   2   1
B1   4   5
B2   5   7
B3   7   7

[Figure: the five points in the (x1, x2) plane with the two centroids (+) and the cluster boundary (classifier) separating group A from group B]

• 2 centroids (k = 2)
• Euclidean distance
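Continuing the sketch above on this table (a hypothetical run; with a sensible initialization it should recover the intended grouping):

X = np.array([[1, 1], [2, 1], [4, 5], [5, 7], [7, 7]], dtype=float)
centroids, labels = kmeans(X, k=2)
# Expected partition: {A1, A2} and {B1, B2, B3}, with centroids near
# (1.5, 1.0) and (5.33, 6.33), i.e., the means of the two groups.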
k-Nearest Neighbors

• Pick the k nearest objects to a reference point
• Problem: reasonable choice of k

[Figure: neighborhoods of a reference point in the (x1, x2) plane for k = 3 and k = 6]
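A minimal sketch of the k-NN decision rule (illustrative names; classification by majority vote among the k nearest objects):

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_ref, k=3):
    d = np.linalg.norm(X_train - x_ref, axis=1)  # Euclidean distances
    nearest = np.argsort(d)[:k]                  # indices of the k nearest objects
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]            # majority class

As the figure suggests, k = 3 and k = 6 can yield different votes for the same reference point, which is exactly the choice-of-k problem.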
Kernel-based Nearest Neighbors

• Pick the nearest objects to a reference point according to a kernel function
• Problem: reasonable choice of the kernel & its parameters

Kernel function:

f(A, B) → scalar value
f(A, B) = 0 if A = B
f(A, B) > 0 if A ≠ B

Gaussian kernel Φ:

   Φj(x) = exp( −‖x − µj‖² / (2σj²) )

[Figure: two Gaussian kernels in the (x1, x2) plane, centered at µA (width σA) and µB (width σB)]
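The Gaussian kernel as code (a direct transcription of the formula above; note that Φ is a similarity, maximal at x = µ, whereas the f(A, B) properties above describe a distance-like function that is zero only at A = B):

import numpy as np

def gaussian_kernel(x, mu, sigma):
    # Phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2))
    return np.exp(-np.sum((x - mu) ** 2) / (2.0 * sigma ** 2))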

Kernel Discrimination Methods


Probabilistic Neural Network (PNN)

One standardized Gaussian basis function is placed at the location of each training pattern (xj = µj):

   y_class = f_class(x) = 1 / (M (2π)^(D/2) σj^D) · Σ_{j ∈ CLASS} exp( −‖x − x_class,j‖² / (2σj²) )

(M: number of patterns of the class; D: input dimension)

Optional: softmax output, a smoothed version of "winner-take-all":

   z_class = exp(y_class) / Σ_{k=1}^{C} exp(y_k)

[Figure: network with inputs x1 … xd, basis functions Φ1 … ΦM, and outputs y1 … yc]
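A compact PNN sketch combining the class scores with the optional softmax (my own illustrative code; it assumes one shared width σ instead of per-pattern widths σj):

import numpy as np

def pnn_scores(X_train, y_train, x, sigma=1.0):
    y_train = np.asarray(y_train)
    D = X_train.shape[1]
    norm = (2 * np.pi) ** (D / 2) * sigma ** D   # standardization factor
    classes = np.unique(y_train)
    f = {}
    for c in classes:
        Xc = X_train[y_train == c]               # one Gaussian per pattern (mu_j = x_j)
        kern = np.exp(-np.sum((Xc - x) ** 2, axis=1) / (2 * sigma ** 2))
        f[c] = kern.sum() / (len(Xc) * norm)     # f_class(x), averaged over M patterns
    # optional softmax: a smoothed "winner-take-all" over the class outputs
    v = np.array([f[c] for c in classes])
    z = np.exp(v) / np.exp(v).sum()
    return f, dict(zip(classes, z))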
Probabilistic Neural Network (PNN)

Example: discriminating cytoplasmic proteins (class 1, f_class1(x)) from secreted proteins (class 2, f_class2(x)).

[Figure: estimated densities P(x) of the two classes over the properties ⟨lipophilicity⟩ and ⟨volume⟩; the decision boundary runs where f_class1(x) = f_class2(x)]
Radial Basis Function (RBF) Network

Output: a weighted sum of M basis functions Φ:

   y_k(x) = Σ_{j=1}^{M} w_kj Φj(x) + w_k0

Gaussian basis function Φ:

   Φj(x) = exp( −‖x − µj‖² / (2σj²) ) = exp( −(x − µj)ᵀ(x − µj) / (2σj²) )

Standardized Gaussian Φ:

   Φj(x) = 1 / ((2π)^(D/2) σj^D) · exp( −‖x − µj‖² / (2σj²) ),   D: dimension of x

[Figure: network with inputs x1 … xd, basis functions Φ1 … ΦM, weights w, and outputs y1 … yc]
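A forward-pass sketch of this network (illustrative; the centres mus could come from a k-means run, the widths set by hand):

import numpy as np

def rbf_forward(x, mus, sigmas, W, w0):
    # Phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2)), one value per basis function
    phi = np.exp(-np.sum((mus - x) ** 2, axis=1) / (2 * sigmas ** 2))
    # y_k(x) = sum_j W[k, j] * Phi_j(x) + w0[k]
    return W @ phi + w0

With the basis functions fixed, the output weights W can be fitted by linear least squares.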
The Self-Organizing Map (SOM)
Data Analysis by Self-Organizing Maps
(Kohonen networks)

[Figure: high-dimensional data X projected onto a two-dimensional SOM grid]
Properties of Kohonen Networks

• Projection of a high-dimensional space
• Self-organized feature extraction and cluster formation
• Non-linear, topology-preserving mapping

Other Projection Techniques

• Principal Component Analysis (linear)
• Projection to Latent Structures (linear, non-linear)
• Encoder networks (non-linear)
• Sammon mapping (non-linear)
Architecture of Kohonen Networks

[Figure: inputs x1 … x4, fully connected by weights w to an (A × B) neuron array, here A = 6, B = 5; each output neuron emits 1 or 0]

One neuron fires at a time.
Neighborhood Definition in Kohonen Networks

[Figure: square and hexagonal neuron arrays, each showing the central (active) neuron, its first neighborhood, and its second neighborhood]
Toroidal Topology of 2D-Kohonen Maps

An “endless plane”
Competitive Learning
1. Randomly select an input pattern x
2. Determine the “winner” neuron (competitive stage)
   i* ← arg min_i Σ_{j=1}^{dim} (xj − wij)²,   i = 1, 2, …, n

3. Update network weights (cooperative stage)

   wij(new) = wij(old) + η xj   if i ∈ N_i*   (followed by normalization of w)
   wij(new) = wij(old)          if i ∉ N_i*
4. Go to step 1 or terminate
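One pass through steps 2-3 in NumPy (a sketch; here the neighborhood N_i* is taken to be all neurons within a given lattice radius of the winner):

import numpy as np

def competitive_step(W, grid, x, eta, radius):
    # W: (n_neurons, dim) weight matrix; grid: (n_neurons, 2) lattice coordinates
    # 2. competitive stage: winner i* minimizes the summed squared distance
    i_star = np.argmin(np.sum((W - x) ** 2, axis=1))
    # 3. cooperative stage: update all neurons in the winner's neighborhood
    in_hood = np.linalg.norm(grid - grid[i_star], axis=1) <= radius
    W[in_hood] += eta * x                       # w_new = w_old + eta * x
    W[in_hood] /= np.linalg.norm(W[in_hood], axis=1, keepdims=True)  # normalization of w
    return W, i_star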


Scaling Functions ("connection kernel")

• Neighborhood (N) correction:

   h(t, r, s) = exp( −d1(r, s)² / (2σ(t)²) )

   σ(t) = σ_ini (σ_fin / σ_ini)^(t / t_max)

• Time-dependent learning rate:

   η(t) = η_ini (η_fin / η_ini)^(t / t_max)

[Figure: neighborhood kernels h plotted over the lattice positions r and s: broad and narrow Gaussians, and the "Mexican hat" kernel (not used in SOM)]
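The connection kernel and both decay schedules, transcribed directly from the formulas above (d1 is the lattice distance between neurons r and s; grid holds the lattice coordinates):

import numpy as np

def sigma_t(t, t_max, sigma_ini, sigma_fin):
    # sigma(t) = sigma_ini * (sigma_fin / sigma_ini) ** (t / t_max)
    return sigma_ini * (sigma_fin / sigma_ini) ** (t / t_max)

def eta_t(t, t_max, eta_ini, eta_fin):
    # eta(t) = eta_ini * (eta_fin / eta_ini) ** (t / t_max)
    return eta_ini * (eta_fin / eta_ini) ** (t / t_max)

def h(t, r, s, grid, t_max, sigma_ini, sigma_fin):
    # h(t, r, s) = exp(-d1(r, s)^2 / (2 sigma(t)^2))
    d1 = np.linalg.norm(grid[r] - grid[s])
    return np.exp(-d1 ** 2 / (2 * sigma_t(t, t_max, sigma_ini, sigma_fin) ** 2))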
Vectorial Representation of Competitive Learning

[Figure: weight vectors of neurons 1 and 2 on the unit sphere in (x1, x2) space, rotating toward the input data as learning time progresses]
SOM Adaptation to Probability Distributions

[Figure: snapshots of the SOM lattice at t = 0, 100, 200, 300, 400, and 500 learning steps, progressively unfolding to match the input probability distribution]

[Figure: the resulting Voronoi tessellation of the input space by the neurons, e.g., cells A and B]
SOM - Issues

• Neighboring neurons code for neighboring input positions, but the inverse does not hold
• Best results when input dimensionality = lattice dimensionality
• Neighborhood decay & shape can trap the map in local minima
• Problems with capturing the fine structure of the input space (oversampling of low-probability regions)
• "Dead" or "empty" neurons
• Features are not invariant to, e.g., translations of the input signal
Mapping Chemical Space: “Drugs” and “Nondrugs”

120-dimensional data, Ghose & Crippen parameters
5,000 drugs, 5,000 nondrugs (Sadowski & Kubinyi, 1998)
Visualizing Combinatorial Libraries (UGI)
[Scheme: Ugi four-component reaction: an isocyanide (R1-NC), an aldehyde (R2-CHO), an amine (R3-NH2), and a carboxylic acid (R4-COOH) react in MeOH at room temperature to give the peptide-like products]

Thrombin binding assay: actives with IC50 < 10 µM

[Figure: the library projected by PCA (PC1 vs. PC2) and mapped onto a 7 × 7 Kohonen map, with the assay actives highlighted]
Self-organizing neural network demos:

1) University of Bochum
   http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/DemoGNG/GNG

2) SOMMER
   Link on the modlab software page: www.modlab.de
