Self-Organizing Map
[Figure: Mexican hat neighborhood function]
4.2 Grid Topologies
Rectangular grid: Neurons arranged on a square lattice, each with 4 immediate neighbors.
Hexagonal grid: Neurons arranged with 6 immediate neighbors, which gives better topological properties.
How It Works: Training Algorithm
Here's a step-by-step explanation of how SONNs (like SOM) learn:
Step 1: Initialize
Each neuron in the map is assigned a random weight vector (same size as input vector).
Step 2: Input a Data Vector
Present an input vector X to the network.
Step 3: Find Best Matching Unit (BMU)
Compute the Euclidean distance between X and all the neuron weight vectors.
The neuron with the smallest distance is the BMU (winner).
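Steps 1-3 above can be sketched in plain NumPy. The map size (10x10) and input dimensionality (3) are illustrative assumptions, not fixed by the algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: initialize a 10x10 map of random weight vectors,
# each the same size as the input vector (3 components here).
weights = rng.random((10, 10, 3))

# Step 2: present an input vector X to the network.
X = np.array([0.9, 0.1, 0.1])

# Step 3: Euclidean distance from X to every neuron's weight vector;
# the neuron with the smallest distance is the BMU (winner).
dists = np.linalg.norm(weights - X, axis=2)
bmu_row, bmu_col = np.unravel_index(np.argmin(dists), dists.shape)
print("BMU position:", (bmu_row, bmu_col))
```

The later training steps (weight update and neighborhood decay) then pull the BMU and its neighbors toward X.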
Example Detection:
DDoS attack sessions may generate features that land far from typical web browsing
sessions.
Port scanning behavior (many small, fast sessions) may cluster in a different zone.
Advantages of SONNs in Intrusion Detection:
No prior knowledge of attack types is needed.
Visual clustering allows the admin to explore traffic patterns.
Unsupervised learning makes it scalable and adaptable to evolving threats.
Visualization (Example):
You could visualize the SOM as a 2D heatmap:
[ ] [ ] [ ] [ ] [ ] [!] [ ] [ ]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
[ ] [ ] [N] [N] [N] [ ] [ ] [ ]
[ ] [ ] [N] [N] [N] [ ] [ ] [ ]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
[N] = Cluster of normal traffic
[!] = Isolated node with anomalous (potentially malicious) traffic
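One common way to turn this picture into a detector is to score each session by its distance to the best matching unit (its quantization error): sessions far from every neuron land in "isolated" territory. The sketch below assumes an already-trained map and a hypothetical threshold tuned on normal traffic; the feature values are made up for illustration:

```python
import numpy as np

def quantization_error(weights, x):
    """Distance from input x to its best matching unit on the trained map."""
    dists = np.linalg.norm(weights - x, axis=2)
    return dists.min()

# Toy stand-in for a trained map: weights clustered near normal-traffic features.
rng = np.random.default_rng(1)
weights = 0.5 + 0.05 * rng.standard_normal((8, 8, 4))

normal_session = np.array([0.5, 0.5, 0.5, 0.5])
odd_session = np.array([5.0, 0.0, 9.0, 0.1])  # e.g. many small, fast connections

threshold = 1.0  # assumption: in practice tuned on held-out normal traffic
for name, s in [("normal", normal_session), ("odd", odd_session)]:
    err = quantization_error(weights, s)
    print(name, "anomalous" if err > threshold else "ok")
```

Sessions flagged this way correspond to the `[!]` cells on the heatmap above.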
4. Example: Clustering Colors
4.1 Task
Cluster RGB color vectors into a 2D map to group similar colors together.
4.2 Implementation
Input: 3D RGB vectors, e.g., [255, 0, 0] for red.
Map: 10x10 SOM grid.
Result: Each neuron learns to represent a color. Similar colors are grouped together.
4.3 Visualization
Imagine a 10x10 grid where:
Top-left shows red shades.
Bottom-right shows blue-green shades.
Neighboring colors are perceptually similar.
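The color-clustering task can be implemented with a minimal SOM training loop. The iteration count, decay schedules, and initial radius below are illustrative choices, not prescribed values; colors are scaled from 0-255 to [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(42)

# 10x10 map of 3-dimensional (RGB) weight vectors in [0, 1].
grid_h, grid_w = 10, 10
weights = rng.random((grid_h, grid_w, 3))

# Training data: 500 random RGB color vectors.
colors = rng.random((500, 3))

# Grid coordinates of every neuron, for the neighborhood computation.
rows, cols = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")

n_iter = 2000
sigma0, lr0 = 3.0, 0.5
for t in range(n_iter):
    x = colors[rng.integers(len(colors))]
    # Find the BMU: the neuron whose weights are closest to x.
    d = np.linalg.norm(weights - x, axis=2)
    br, bc = np.unravel_index(np.argmin(d), d.shape)
    # Linearly decaying learning rate and neighborhood radius.
    frac = t / n_iter
    sigma = sigma0 * (1 - frac) + 0.5
    lr = lr0 * (1 - frac) + 0.01
    # Gaussian neighborhood centered on the BMU, measured on the grid.
    grid_dist2 = (rows - br) ** 2 + (cols - bc) ** 2
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))
    # Pull every neuron's weights toward x, weighted by the neighborhood.
    weights += lr * h[..., None] * (x - weights)
```

After training, rendering `weights` directly as a 10x10 RGB image shows smooth patches of similar colors, with neighboring neurons holding perceptually similar shades.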
5. Mathematical Insights
5.1 Distance Metrics
Euclidean distance:
d(X, W_j) = sqrt( Σ_i (x_i − w_{ji})² )
5.2 Neighborhood Function
Gaussian:
h_{bj}(t) = exp( −‖r_b − r_j‖² / (2σ(t)²) )
where:
o r_b and r_j are the positions of the BMU and of neuron j on the grid.
o σ(t) is the neighborhood radius, which shrinks as training proceeds.
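A quick numerical check of the Gaussian neighborhood: with σ = 1, the BMU itself gets weight 1, an adjacent neuron about 0.61, and a neuron two grid steps away diagonally almost nothing. The grid coordinates are arbitrary examples:

```python
import math

def neighborhood(rb, rj, sigma):
    """Gaussian neighborhood h_bj = exp(-||rb - rj||^2 / (2 sigma^2))."""
    d2 = (rb[0] - rj[0]) ** 2 + (rb[1] - rj[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))

print(neighborhood((2, 2), (2, 2), 1.0))  # BMU itself: 1.0
print(neighborhood((2, 2), (2, 3), 1.0))  # one step away: ~0.61
print(neighborhood((2, 2), (4, 4), 1.0))  # two steps diagonally: ~0.018
```

This is why only neurons near the BMU are pulled strongly toward the input, which is what produces the topology-preserving layout.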
o Result: After training, the SOM can be analyzed to identify distinct customer
segments. For instance, some neurons may correspond to high-spending
customers, while others correspond to price-sensitive buyers.
4. Feature Mapping
Example: Mapping High-dimensional Sensor Data to a 2D Grid
Scenario: A robot uses multiple sensors to measure various environmental features, and
we want to map this high-dimensional data to a 2D grid.
o Task: Use a SOM to map high-dimensional sensor inputs (e.g., temperature,
humidity, distance, light intensity) to a 2D grid of neurons.
o Process: As the SOM trains, it learns to organize the sensor data into a topology-preserving map. Neurons in the same region of the map may correspond to similar sensor data (e.g., regions with low temperature and high humidity).
o Result: The 2D map can be used to identify specific environmental patterns, such
as zones of high humidity or areas with particular light intensity, which can assist
in robot navigation or environment monitoring.
5. Data Visualization
Example: Visualizing High-dimensional Data of Wine Types
Scenario: Visualizing the relationships between different types of wines based on their
chemical composition.
o Task: Apply SOM to the chemical features of various wines, which typically
involve measurements like alcohol content, acidity, pH, and phenolic compounds.
o Process: After training the SOM on this data, the neurons in the SOM grid will
represent different types of wines. Similar wines will be grouped close together
on the map, preserving the relationship between their chemical properties.
o Result: The SOM can be visualized as a 2D map where wines with similar
chemical properties are located near each other, helping researchers or
winemakers identify patterns and trends.
These examples illustrate how Self-Organizing Neural Networks can be applied across various
domains for pattern recognition, dimensionality reduction, clustering, feature mapping, and
data visualization. The key strength of SONNs lies in their ability to process and reveal patterns
in data without the need for labeled information, making them a versatile tool for exploratory
data analysis and unsupervised learning tasks.
Advantages and Limitations
Advantages
Topology preservation: Similar inputs map to nearby neurons.
No need for labeled data
Dimensionality reduction: High-dimensional input is visualized in 1D or 2D.
Intuitive visualization: Especially useful for exploring unknown data.
Limitations
Sensitive to initialization