Non-Negative Matrix Factorization


Non-negative matrix factorization

• Lee & Seung (1999)


• like principal components (SVD), but data and components are
assumed to be non-negative
• Model
X ≈ WH
where X is n × p, W is n × r, H is r × p, r ≤ p.
• we assume X_{ij}, W_{ij}, H_{ij} ≥ 0.
• criterion: maximize

      L(W, H) = \sum_{i=1}^{n} \sum_{u=1}^{p} \left[ X_{iu} \log(WH)_{iu} − (WH)_{iu} \right]

This is the log-likelihood for the model X_{iu} ∼ Poisson((WH)_{iu}).
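To see why, note that the Poisson log-density of a single entry is

      \log p(X_{iu}) = X_{iu} \log(WH)_{iu} − (WH)_{iu} − \log(X_{iu}!);

summing over i and u, and dropping the \log(X_{iu}!) terms, which do not
depend on W or H, gives L(W, H).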



The following alternating algorithm (Lee & Seung 2001) converges
to a local maximum of L(W, H):

      w_{ik} ← w_{ik} \frac{\sum_{j=1}^{p} h_{kj} x_{ij}/(WH)_{ij}}{\sum_{j=1}^{p} h_{kj}}
                                                                          (1)
      h_{kj} ← h_{kj} \frac{\sum_{i=1}^{N} w_{ik} x_{ij}/(WH)_{ij}}{\sum_{i=1}^{N} w_{ik}}

This can be viewed as an instance of the MM algorithm (see text) and
of iterative proportional scaling for log-linear models.
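A minimal NumPy sketch of the updates in (1) — the function and variable
names are my own, and a small constant eps guards against division by zero:

    import numpy as np

    def nmf_poisson(X, r, n_iter=200, eps=1e-9, seed=0):
        """Lee-Seung multiplicative updates for the Poisson criterion.

        X: (n, p) non-negative data matrix; returns W (n, r), H (r, p).
        """
        rng = np.random.default_rng(seed)
        n, p = X.shape
        W = rng.uniform(0.1, 1.0, size=(n, r))
        H = rng.uniform(0.1, 1.0, size=(r, p))
        for _ in range(n_iter):
            WH = W @ H + eps
            # w_ik <- w_ik * [sum_j h_kj x_ij/(WH)_ij] / [sum_j h_kj]
            W *= (X / WH) @ H.T / (H.sum(axis=1) + eps)
            WH = W @ H + eps
            # h_kj <- h_kj * [sum_i w_ik x_ij/(WH)_ij] / [sum_i w_ik]
            H *= W.T @ (X / WH) / (W.sum(axis=0)[:, None] + eps)
        return W, H

Because each update multiplies the current value by a non-negative ratio,
W and H remain non-negative throughout; no projection step is needed.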


Example

[Figure omitted: the NMF, VQ, and PCA face decompositions of Lee &
Seung (1999); the caption follows.]

Figure 1. Non-negative matrix factorization (NMF) learns a parts-based
representation of faces, whereas vector quantization (VQ) and principal
components analysis (PCA) learn holistic representations. The three
learning methods were applied to a database of m = 2,429 facial images,
each consisting of n = 19 × 19 pixels, and constituting an n × m matrix
V. All three find approximate factorizations of the form V ≈ WH, but
with three different types of constraints on W and H, as described more
fully in the main text and methods. As shown in the 7 × 7 montages,
each method has learned a set of r = 49 basis images. Positive values are
illustrated with black pixels and negative values with red pixels. A
particular instance of a face, shown at top right, is approximately
represented by a linear superposition of basis images. The coefficients of
the linear superposition are shown next to each montage, in a 7 × 7 grid,
and the resulting superpositions are shown on the other side of the
equality sign. Unlike VQ and PCA, NMF learns to represent faces with a
set of basis images resembling parts of faces.

Big problem!

See Donoho & Stodden (2004), "When does non-negative matrix
factorization give a correct decomposition into parts?", Advances in
Neural Information Processing Systems 17.
• columns of W are not required to be orthogonal, as in principal
components
• the solution is not unique (even when X = WH holds exactly):
one can choose as columns of W any vectors lying in the gap between
the coordinate axes and the data (see the worked example below)
• this limits its utility in practice
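A tiny worked example (my own, for illustration): take two data points
(2, 1) and (1, 2) as the rows of

      X = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.

With r = 2 there are many exact non-negative factorizations X = WH:

      X = I_2 \cdot X
        = X \cdot I_2
        = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}
          \cdot \tfrac{1}{8} \begin{pmatrix} 5 & 1 \\ 1 & 5 \end{pmatrix},

so the columns of W can be the coordinate axes, the data vectors
themselves, or vectors anywhere in the gap between them.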


Example

[Figure omitted: the data cloud with two alternative choices of basis
vectors, labeled W1 and W2.]

Archetypal Analysis

• This method, due to Cutler & Breiman (1994), approximates
data points by prototypes that are themselves linear
combinations of data points. In this sense it has a similar flavor
to K-means clustering.
• However, rather than approximating each data point by a
single nearby prototype, archetypal analysis approximates each
data point by a convex combination of a collection of
prototypes. The use of a convex combination forces the
prototypes to lie on the convex hull of the data cloud. In this
sense, the prototypes are "pure," or "archetypal."


Archetypal Analysis (ctd)

• The N × p data matrix X is modeled as

      X ≈ WH                                                        (2)

where W is N × r and H is r × p.
• We assume that w_{ik} ≥ 0 and \sum_{k=1}^{r} w_{ik} = 1 ∀i. Hence the N
data points (rows of X) in p-dimensional space are represented
by convex combinations of the r archetypes (rows of H).
• We also assume that

      H = BX                                                        (3)

where B is r × N, with b_{ki} ≥ 0 and \sum_{i=1}^{N} b_{ki} = 1 ∀k.
• Thus the archetypes themselves are convex combinations of the
data points.

• Using both (2) and (3), we minimize

      J(W, B) = ‖X − WH‖² = ‖X − WBX‖²                              (4)

over the weights W and B.
• This function is minimized in an alternating fashion, with each
separate minimization involving a convex optimization. The
overall problem is not convex, however, and so the algorithm
converges to a local minimum of the criterion (a small
computational sketch follows).
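One way to organize the alternation in NumPy/SciPy — a sketch under my
own naming, not Cutler & Breiman's exact implementation. The sum-to-one
constraints are enforced approximately through a large penalty row
appended to a non-negative least squares problem:

    import numpy as np
    from scipy.optimize import nnls

    def simplex_ls(A, b, lam=100.0):
        """min_v ||A v - b||^2 subject to v >= 0 and sum(v) = 1,
        approximated by NNLS with a heavy penalty row for the sum."""
        A_aug = np.vstack([A, lam * np.ones((1, A.shape[1]))])
        b_aug = np.append(b, lam)
        v, _ = nnls(A_aug, b_aug)
        return v

    def archetypes(X, r, n_iter=50, seed=0):
        """Alternating minimization of J(W, B) = ||X - W B X||^2."""
        N, p = X.shape
        rng = np.random.default_rng(seed)
        # initialize each archetype at a randomly chosen data point
        B = np.zeros((r, N))
        B[np.arange(r), rng.choice(N, size=r, replace=False)] = 1.0
        H = B @ X
        for _ in range(n_iter):
            # W-step: express each data point as a convex
            # combination of the current archetypes (rows of H)
            W = np.array([simplex_ls(H.T, x) for x in X])
            # B-step: find the best unconstrained archetypes, then
            # re-express each as a convex combination of data points
            H_free = np.linalg.lstsq(W, X, rcond=None)[0]
            B = np.array([simplex_ls(X.T, h) for h in H_free])
            H = B @ X
        return W, B, H

Each sub-problem here is a constrained least squares fit, and hence
convex, matching the description above; the joint problem in (W, B)
remains non-convex.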


The next Figure shows an example with simulated data in two
dimensions. The top panels display the results of archetypal
analysis, while the bottom panels show the results from K-means
clustering. In order to best reconstruct the data from convex
combinations of the prototypes, it pays to locate the prototypes on
the convex hull of the data. This is seen in the top panels of the
Figure and is the case in general, as proven by Cutler & Breiman
(1994). K-means clustering, shown in the bottom panels, chooses
prototypes in the middle of the data cloud.


[Figure omitted: panels showing 2, 4, and 8 prototypes; caption below.]


Archetypal analysis (top panels) and K-means clustering (bottom panels)
applied to 50 data points drawn from a bivariate Gaussian distribution.
The colored points show the positions of the prototypes in each case.


Relation to K-means clustering and NMF

• We can think of K-means clustering as a special case of the
archetypal model, in which each row of W has a single one and
the rest of the entries are zero (a small example follows this
list).
• Notice also that the archetypal model (2) has the same general
form as the non-negative matrix factorization model X ≈ WH.
However, the two models are applied in different settings, and
have somewhat different goals. Non-negative matrix
factorization aims to approximate the columns of the data
matrix X, and the main output of interest is the columns of
W, representing the primary non-negative components in the
data.
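A small illustration of the K-means special case (my own): with N = 3
points, the first two in cluster 1 and the third in cluster 2,

      W = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix},
      \qquad
      B = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix},

so H = BX stacks the two cluster means, and WH replaces each data
point by the mean of its cluster.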


Relation to K-means clustering and NMF (ctd)

• Archetypal analysis focuses instead on the approximation of
the rows of X using the rows of H, which represent the
archetypal data points.
• Non-negative matrix factorization also assumes that r ≤ p.
With r = p, we can get an exact reconstruction simply by
choosing W to be the data X with columns scaled so that they
sum to 1 (see the construction after this list). In contrast,
archetypal analysis requires r ≤ N, but allows r > p.
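Explicitly (a one-line construction, assuming every column sum
c_j = \sum_i X_{ij} is positive): take

      W = X \, \mathrm{diag}(c_1, …, c_p)^{-1}, \qquad
      H = \mathrm{diag}(c_1, …, c_p),

so that WH = X exactly, with W, H ≥ 0 and each column of W summing
to 1.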


The next Figure shows the results of archetypal analysis applied to
the database of 3's discussed earlier. The three rows in the Figure
are the resulting archetypes from three runs, specifying two, three
and four archetypes, respectively. As expected, the algorithm has
produced extreme 3's both in size and shape.

[Figure omitted: the archetype montages; caption below.]

Archetypal analysis applied to the database of digitized 3's. The rows in
the figure show the resulting archetypes from three runs, specifying two,
three and four archetypes, respectively.


References

Cutler, A. & Breiman, L. (1994), 'Archetypal analysis', Technometrics
36(4), 338–347.
Lee, D. D. & Seung, H. S. (1999), 'Learning the parts of objects by
non-negative matrix factorization', Nature 401, 788–791.
Lee, D. D. & Seung, H. S. (2001), 'Algorithms for non-negative matrix
factorization', in Advances in Neural Information Processing Systems
(NIPS 2001).
