Lecture UnsupervisedML_SOM

The document discusses Self-Organizing Maps (SOM), an unsupervised machine learning method that enables dimensional reduction and visualization of complex datasets, particularly useful in material informatics. It outlines the SOM algorithm, its advantages over other methods like K-Means and PCA, and how it can be combined with these methods for enhanced data analysis. Additionally, it introduces various implementations of SOM, including augmented SOMPY and MiniSOM, and highlights their applications in materials research.

Uploaded by

cepem13540

Unsupervised ML SOM and how to choose a Data Science method

Quick review of the methods we learned

> Statistical analysis
> Supervised ML
– Linear regression
– NN
– KNN
– Decision Tree
– SVM
> Unsupervised ML
– K-Means Clustering
– PCA
– … Why another one?

Why another method: SOM

> Demonstrate some data science methods that are not widely used or well known, but can be very useful for materials informatics studies
> Introduce a method I have used and find well suited to the unique character of many materials study applications
> Demonstrate how various data science methods can be used together to drive improved results
> Demonstrate a few projects that use the same method, so that we can understand a method from the user's point of view

What is a Self-Organizing Map (SOM)

> An unsupervised ML method
> Dimensionality reduction, enabling powerful visualizations of the data:
– K-Means does clustering, but neither dimensionality reduction nor visualization
– PCA does dimensionality reduction and enables visualization to a certain level (it is not applicable if the first 3 principal components do not represent the data well); however, it does not perform clustering. Besides, the visualization does not keep the original topographic information.
> Gives some insight into how the data is clustered in high dimensions
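
One way to check the PCA caveat above is to measure how much of the variance the first 3 principal components capture. The sketch below is only an illustration: the synthetic dataset, the seed, and the helper name `explained_variance_ratio` are assumptions, not part of the lecture material.

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of the total variance carried by each principal component."""
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)
# Hypothetical dataset: 200 samples, 10 uncorrelated features.
data = rng.normal(size=(200, 10))

ratio = explained_variance_ratio(data)
cumulative = np.cumsum(ratio)
print("variance captured by the first 3 components:",
      round(float(cumulative[2]), 3))
```

When this number is low, as it is for the uncorrelated features used here, a 3D PCA plot hides most of the structure in the data, which is the situation the slide warns about.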

What is SOM

> You can think of SOM as an artificial neural network with a single neuronal layer, whose neurons are arranged in a two-dimensional matrix.
– The 2D matrix can be seen as a position map that captures the characteristics of the data
> Merits of SOM
– Effective in training on big datasets
– Since the map is a 2D matrix, the result can be visualized
– Keeps the topography of the original data
– Can present the Euclidean distance between data points

Algorithm of SOM

– Normalization: normalize the input data so that all features are distributed in a more balanced way
– Initialization: each (x,y) position in the map is assigned one weight per input neuron (feature), which gives every map position its own weight vector
– Iteration:
> Choose a sample from the dataset
> Calculate the Euclidean distance between that sample and each weight vector
> The (x,y) position “closest” to the sample is declared the Best Matching Unit (BMU)
> The weight vector of the BMU is adjusted to match the sample more closely. The amount of adjustment (learning) decreases as the iterations proceed.
> The weight vectors of the BMU’s neighbors are also adjusted, to a lesser extent. The number of neighbors and how much they are adjusted depend on the hyperparameters and on the number of iterations.
– Convergence:
> Maximum number of iterations
> Monitoring of the topological error
– Reference: https://link.springer.com/article/10.1007/BF00337288
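
The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the lecture: min-max scaling stands in for the normalization step, the linear decay schedule and the Gaussian neighborhood are one possible choice, and the function names, map size, and synthetic two-cluster dataset are made up for the example.

```python
import numpy as np

def normalize(data):
    """Min-max scale every feature to [0, 1] (assumed normalization)."""
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    return (data - lo) / span

def train_som(data, rows, cols, n_iter, lr0=0.5, sigma0=None, seed=0):
    rng = np.random.default_rng(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialization: one random weight vector per (x, y) map position.
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(n_iter):
        decay = 1.0 - t / n_iter            # assumed linear decay schedule
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.3)    # keep the neighborhood width positive
        sample = data[rng.integers(len(data))]
        # Best Matching Unit: node whose weight vector is closest to the sample.
        dists = np.linalg.norm(weights - sample, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighborhood: the BMU moves most, distant nodes barely move.
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        influence = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * influence[..., None] * (sample - weights)
    return weights

def quantization_error(data, weights):
    """Mean distance between each sample and its BMU weight vector."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

rng = np.random.default_rng(1)
# Hypothetical dataset: two clusters in a 3-feature space.
raw = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)),
                 rng.normal(5.0, 1.0, size=(100, 3))])
data = normalize(raw)

initial = train_som(data, 5, 5, n_iter=0)      # untrained map, same seed
trained = train_som(data, 5, 5, n_iter=2000)
print("quantization error before:", round(quantization_error(data, initial), 3))
print("quantization error after: ", round(quantization_error(data, trained), 3))
```

The quantization error is used here only as a simple progress check; the slide lists the maximum number of iterations and the topological error as the convergence criteria.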

Self-Organizing Map (SOM)

How does it work?

$x_i = (a_i,\ b_i,\ c_i,\ d_i,\ e_i,\ f_i)^{\mathsf{T}}$

Two-dimensional mesh structure

Each connection can deform

[Figure: 2D mesh of nodes numbered 1 to 12, with the input features a to f placed around it]

Self-Organizing Map (SOM) Algorithm

> Dragging nodes
> “Flattening a crumpled paper”

U-matrix and how to use it to get insights for clustering

> After training, the nodes in the 2D map are not evenly distributed: adjacent points on the map might not be similar to each other in the higher-dimensional space.
> The U-matrix uses the concept of a heat map to illustrate the distance between neighboring nodes in Euclidean space
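
A U-matrix can be computed directly from the trained weight vectors. The sketch below assumes a rectangular map with 4-connected neighbors and uses hand-made weights (two flat regions with a sharp jump between them) in place of a trained SOM; the function name `u_matrix` is an illustrative choice.

```python
import numpy as np

def u_matrix(weights):
    """Mean Euclidean distance from each node to its grid neighbors."""
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            # 4-connected neighbors on a rectangular grid (assumed topology).
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[nr, nc]))
            umat[r, c] = np.mean(dists)
    return umat

# Hypothetical "trained" weights: left half near 0, right half near 1.
weights = np.zeros((4, 6, 3))
weights[:, 3:, :] = 1.0

umat = u_matrix(weights)
print(np.round(umat, 2))
```

High values form a ridge along the two columns where the weights jump, and the flat regions on either side stay at zero. On a real map such ridges suggest cluster boundaries.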

Using SOM in conjunction with other methods

> Since SOM is a dimensionality reduction method, for smaller datasets you can initialize the SOM map using the first 2 principal components, essentially the 2D PCA map
> K-Means can also be run on the same dataset, and the corresponding clusters can be visualized on the SOM map
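
PCA initialization can be sketched as laying the map nodes out on the plane spanned by the first 2 principal components. The version below is an assumption-laden illustration: the choice to spread the nodes over one standard deviation on each side of the mean, the function name `pca_init_weights`, and the synthetic data are all made up for the example and may differ from what augmented SOMPY does.

```python
import numpy as np

def pca_init_weights(data, rows, cols):
    """Place the map nodes on a regular grid in the PC1-PC2 plane."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal directions, ordered by variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    std = s[:2] / np.sqrt(len(data) - 1)   # spread of the data along PC1, PC2
    a = np.linspace(-1.0, 1.0, rows)       # assumed extent: +/- 1 std
    b = np.linspace(-1.0, 1.0, cols)
    return (mean
            + a[:, None, None] * std[0] * vt[0]
            + b[None, :, None] * std[1] * vt[1])

rng = np.random.default_rng(0)
# Hypothetical dataset with different spread along each feature.
data = rng.normal(size=(100, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

weights = pca_init_weights(data, rows=5, cols=7)
print(weights.shape)
```

Starting from this layout, the map is already unfolded along the two directions of largest variance, so training mostly has to bend it toward the data rather than untangle a random start.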

K-Means clustering and U-Matrix
They can be compared to validate the results!

> SOM can provide a means to visualize K-Means!
> If the boundaries match well, then the training is successful
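
A K-Means overlay can be sketched by clustering the dataset, sending every sample to its Best Matching Unit, and letting each node take the majority cluster label of the samples it receives. Everything in the block is illustrative: the small K-Means routine with farthest-point initialization, the hand-made weights standing in for a trained map, and the two synthetic clusters are assumptions.

```python
import numpy as np

def kmeans(data, k, n_iter=20):
    # Assumed farthest-point initialization, so the sketch is deterministic.
    centers = [data[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels

def cluster_overlay(weights, data, labels, k):
    """Majority K-Means label per map node; -1 marks nodes with no samples."""
    rows, cols, _ = weights.shape
    votes = np.zeros((rows, cols, k), dtype=int)
    for sample, label in zip(data, labels):
        d = np.linalg.norm(weights - sample, axis=-1)
        r, c = np.unravel_index(np.argmin(d), d.shape)
        votes[r, c, label] += 1
    overlay = votes.argmax(axis=-1)
    overlay[votes.sum(axis=-1) == 0] = -1
    return overlay

rng = np.random.default_rng(0)
# Hypothetical dataset: two well-separated clusters in 2 features.
data = np.vstack([rng.normal(0.2, 0.05, size=(50, 2)),
                  rng.normal(0.8, 0.05, size=(50, 2))])

# Hand-made weights standing in for a trained 3 x 6 map.
rows, cols = 3, 6
r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
weights = np.stack([c / 5 + 0.02 * (r - 1), c / 5 - 0.02 * (r - 1)], axis=-1)

labels = kmeans(data, k=2)
overlay = cluster_overlay(weights, data, labels, k=2)
print(overlay)
```

The boundary between the two labels in `overlay` can then be compared with the ridges of the U-matrix, which is the validation idea described on the slide.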

Different implementations of SOM

> SOM is just an algorithm; there are many packages that implement it
> We will introduce
– An augmented version of SOMPY, a version our group has contributed to
– MiniSOM

The uniqueness and functions of augmented SOMPY

https://github.com/DataScienceUWMSE/SOM

> Uses PCA for initialization and includes a K-Means clustering overlay
> “Heat maps” provide a way to visualize each feature after training
> The projection function helps users find additional correlations or patterns among features, including for categorical data

“Heat map” concept

> Map each node’s weight for one input variable onto the 2D map
> The number of heat maps equals the number of input variables
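
In terms of the weight array, each heat map is one slice along the feature axis. The sketch below assumes weights stored as a `(rows, cols, n_features)` array; the feature names, the hand-made weights, and the helper names are hypothetical.

```python
import numpy as np

def component_planes(weights, feature_names):
    """Return one 2D heat map (rows x cols) per input variable."""
    return {name: weights[:, :, k] for k, name in enumerate(feature_names)}

def plane_correlation(plane_a, plane_b):
    """Pearson correlation between two heat maps, node by node."""
    return float(np.corrcoef(plane_a.ravel(), plane_b.ravel())[0, 1])

rows, cols = 4, 6
r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
# Hypothetical trained weights for three made-up features.
weights = np.stack([c / 5.0,              # "density" grows left to right
                    0.1 + 0.8 * c / 5.0,  # "modulus" follows the same trend
                    r / 3.0],             # "cost" varies top to bottom
                   axis=-1)

planes = component_planes(weights, ["density", "modulus", "cost"])
print(len(planes), planes["density"].shape)
print(round(plane_correlation(planes["density"], planes["modulus"]), 3))
```

Heat maps that look alike, such as the first two here, point to features that vary together across the map. Each plane can be passed to any image-plotting routine for display.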

Example of utilizing the heat map in materials research
Example 1 Granta Data Set: Experimental Commercial Materials Property Dataset
> The training data set contains 398 commercial materials and 21 numerical properties

Example of utilizing the heat map in materials research
Example 1 Granta Data Set: Experimental Commercial Materials Property Dataset (continued)

Information projection concept

> Overlay one specific data property onto the SOM; even categorical values can be used
> Easily identify patterns
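
Projection can be sketched as grouping a property's values by the node each sample maps to. The example below uses a categorical property and a majority vote per node; the 2 x 2 weights, the five samples, the class names, and the function names are hypothetical and only illustrate the idea, not the projection function of augmented SOMPY.

```python
from collections import Counter

import numpy as np

def project_property(weights, data, values):
    """Collect the property values of the samples mapped to each node."""
    hits = {}
    for sample, value in zip(data, values):
        d = np.linalg.norm(weights - sample, axis=-1)
        r, c = np.unravel_index(np.argmin(d), d.shape)
        hits.setdefault((int(r), int(c)), []).append(value)
    return hits

def majority_label(hits):
    """Most frequent categorical value per node."""
    return {node: Counter(vals).most_common(1)[0][0]
            for node, vals in hits.items()}

# Hand-made weights standing in for a trained 2 x 2 map.
weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [1.0, 1.0]]])
data = np.array([[0.1, 0.0], [0.0, 0.1], [0.9, 1.0], [1.0, 0.9], [0.9, 0.1]])
classes = ["metal", "metal", "polymer", "polymer", "ceramic"]

node_labels = majority_label(project_property(weights, data, classes))
print(node_labels)
```

For a numerical property the same grouping can be summarized with a mean per node instead of a majority vote.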

Example of utilizing the projection function in materials research

Example 1 Granta Data Set: Experimental Commercial Materials Property Dataset (continued), finding what makes the outliers unique

Example of utilizing the projection function in materials research

Example 2 OPV materials study using an experimental dataset

Reference: Y. Huang, J. Phys. Chem. C 2020, 124, 12871−12882

> The dataset includes 1203 donor polymers of Donor-Acceptor pairs, with properties related to the proficiency of the charge transfer.

Molecular Descriptors

Python packages for molecular descriptors

> There are Python tools that extract molecular structural or geometrical information from a notation of the molecule, such as SMILES (Simplified Molecular-Input Line-Entry System)
> We will introduce Mordred (covered in the hands-on session)

The advantage of using MiniSOM

> SOMPY is not as easy to use as the other packages introduced in this class.
– The augmented SOMPY has contributions from a few Materials Science researchers in our group, including your TA Jimin, Qian
> MiniSOM is relatively easy to use, well documented and constantly maintained, and has the basic implementation of the SOM algorithm

What MiniSOM provides

> It has:
– The core implementation of SOM
– Visualization
– U-Matrix (“distance map” in MiniSOM)
– Projection of a given feature onto the SOM

> It doesn’t have:
– PCA initialization
– Heat map generation for each feature
– K-Means clustering
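
A minimal MiniSOM session might look like the sketch below. It assumes the `minisom` package is installed (the import is guarded so the snippet still loads without it), and the random dataset, map size, and hyperparameter values are placeholders rather than recommendations from the lecture.

```python
import numpy as np

try:
    from minisom import MiniSom   # pip install minisom
except ImportError:               # keep the sketch loadable without the package
    MiniSom = None

rng = np.random.default_rng(0)
# Hypothetical dataset: 150 samples, 4 features, already scaled to [0, 1].
data = rng.random((150, 4))

if MiniSom is not None:
    som = MiniSom(6, 6, data.shape[1], sigma=1.5, learning_rate=0.5,
                  neighborhood_function="gaussian", random_seed=0)
    som.random_weights_init(data)        # start from randomly picked samples
    som.train_random(data, 1000)         # 1000 iterations, random sample order
    umat = som.distance_map()            # the U-matrix ("distance map")
    bmu = som.winner(data[0])            # (x, y) of the Best Matching Unit
    print(umat.shape, bmu, round(som.quantization_error(data), 3))
```

`som.winner` is the building block for projecting a feature or a label onto the map: call it for every sample and aggregate the values per node.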

Hyperparameters of SOM

> Length of the input vectors (the number of properties)
> Map size, the most important one
> Map topology – rectangular or hexagonal
– Important in defining the notion of “neighbors”
> Sigma – spread of the neighborhood function
> Learning rate – initial learning rate, which decreases with the number of iterations
> Decay function – defines how much the learning rate and sigma decrease with the number of iterations
> Neighborhood function – defines how much the neighbors of the BMU are affected at each iteration (e.g. Gaussian, bubble, …)
> Activation distance function (e.g. Euclidean distance)
> Initialization method – random or PCA
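
To make sigma, the neighborhood function, and the decay function concrete, the sketch below evaluates a Gaussian and a bubble neighborhood on a rectangular 5 x 5 map and one possible decay schedule. The function names, the specific decay formula, and the parameter values are assumptions for illustration; packages may define these differently.

```python
import numpy as np

def gaussian_neighborhood(rows, cols, bmu, sigma):
    """Influence falls off smoothly with grid distance from the BMU."""
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def bubble_neighborhood(rows, cols, bmu, sigma):
    """Full influence inside a square of half-width sigma, none outside."""
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    inside = (np.abs(r - bmu[0]) <= sigma) & (np.abs(c - bmu[1]) <= sigma)
    return inside.astype(float)

def asymptotic_decay(initial, t, max_iter):
    """One possible decay: the value drops to a third by the last iteration."""
    return initial / (1.0 + t / (max_iter / 2.0))

gauss = gaussian_neighborhood(5, 5, bmu=(2, 2), sigma=1.0)
bubble = bubble_neighborhood(5, 5, bmu=(2, 2), sigma=1.0)
schedule = [asymptotic_decay(0.5, t, 100) for t in (0, 50, 100)]

print(np.round(gauss, 2))
print(bubble)
print(schedule)
```

The same decay function is typically applied to both the learning rate and sigma, so the map makes large, broad adjustments early and small, local ones late in training.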
Hands-on session and HW for this week
