Tighter Upper Bound of Real Log
Canonical Threshold of Non-negative
Matrix Factorization and its Application
to Bayesian Inference
Naoki Hayashi* (TokyoTech, Dept. of MCS)
Sumio Watanabe (TokyoTech, Dept. of MCS)
2017/11/28 IEEE SSCI 2017 FOCI, Hawaii
Slide
• This slide is available at
http://watanabe-www.math.dis.titech.ac.jp/~nhayashi/pdf/hayashi1039.pdf
Index
• Introduction
• Main Theorem
• Discussion
• Conclusion
• (Appendix: Sketch of Proof)
1. INTRODUCTION
Index
• Introduction
– Non-negative Matrix Factorization
– Real Log Canonical Threshold
– Research Goal
• Main Theorem
• Discussion
• Conclusion
• (Appendix: Sketch of Proof)
NMF has been applied
• Non-negative Matrix Factorization (NMF) has been applied to many fields.
• E.g.:
– Purchase basket data → Consumer analysis
– Image, sound, … → Signal processing
– Text data → Text mining
– Microarray data → Bioinformatics
↑ Knowledge/Structure discovery
NMF: data → knowledge
Suffering
• NMF has a hierarchical structure.
• HOWEVER, the likelihood cannot be approximated by a Gaussian function.
• Traditional statistics (AIC, BIC) cannot be used.
Hierarchical structure causes non-identifiability:
XY = (XP)(P⁻¹Y) for some P ≠ I with X, Y, XP, P⁻¹Y ≥ 0.
• Example: with X = [[1,3],[1,3],[1,4]], Y = [[1,1,4],[5,1,4]], and P = [[2,−3],[1,2]]:
XY = (XP)(P⁻¹Y), where XP = [[5,3],[5,3],[6,5]] ≥ 0 and P⁻¹Y = (1/7)[[17,5,20],[9,1,4]] ≥ 0,
and both products equal [[16,4,16],[16,4,16],[21,5,20]].
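The non-identifiability example above can be verified directly; a minimal sketch (our illustration, using the matrices from the slide):

```python
import numpy as np

# The two non-negative factorizations from the slide: XY = (XP)(P^{-1}Y).
X = np.array([[1, 3], [1, 3], [1, 4]], dtype=float)
Y = np.array([[1, 1, 4], [5, 1, 4]], dtype=float)
P = np.array([[2, -3], [1, 2]], dtype=float)

XP = X @ P                      # [[5,3],[5,3],[6,5]]       -- still non-negative
PinvY = np.linalg.inv(P) @ Y    # (1/7)*[[17,5,20],[9,1,4]] -- still non-negative

assert (XP >= 0).all() and (PinvY >= 0).all()
# Both factorizations reproduce the same data matrix W = XY:
assert np.allclose(X @ Y, XP @ PinvY)
print(X @ Y)                    # [[16,4,16],[16,4,16],[21,5,20]]
```

So one observed matrix corresponds to more than one non-negative parameter pair, which is exactly why Gaussian approximations of the likelihood fail.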
→ One matrix corresponds to (at least) multiple factorization pairs in NMF.
In addition:
• NMF strongly depends on the initial value.
• It suffers from many local minima.
– It seldom reaches the global minimum.
Learning Theory of NMF
• NMF has been used for "data → knowledge".
• Its mathematical properties are unknown:
– Learning theory has not yet been established.
– Prediction accuracy has not yet been clarified.
→ No guarantee for the correctness of numerical calculation.
→ No method for theoretical hyperparameter tuning.
→ Constructing this theory is an important problem.
Index
• Introduction
– Non-negative Matrix Factorization
– Real Log Canonical Threshold
– Research Goal
• Main Theorem
• Discussion
• Conclusion
• (Appendix: Sketch of Proof)
When does RLCT appear?
• In general [Watanabe, 2001]:
– Let n be the sample size.
– The Bayesian generalization error G_n has the asymptotic behavior
E[G_n] = λ/n + o(1/n).
• The learning coefficient λ depends on the model.
• λ is called the real log canonical threshold (RLCT).
Error: Bayes << Freq.
• In hierarchical structure models, the Bayesian λ is smaller than the frequentist and maximum-a-posteriori ones [Watanabe, 2001 and 2009].
• Bayesian inference is effective for reducing the generalization error.
• We consider the Bayesian inference framework.
– Bayesian inference for NMF has been proposed [Cemgil, 2009]. (Remark: that work only treats the discrete case.)
RLCT of NMF is unknown
• NMF has been used for "data → knowledge".
• Its mathematical properties are unknown:
– Learning theory has not been established.
– Prediction accuracy has not been clarified.
↑ This means that the RLCT of NMF has not been clarified.
Def. RLCT
• The RLCT is characterized as a learning coefficient.
• It is defined through the largest pole of the following complex function:
ζ(z) = ∫ K(w)^z φ(w) dw,
where K is the KL divergence from the true distribution to the learning machine and φ is the prior.
• A statistical model selection method that uses RLCTs has been proposed [Drton et al., 2017].
– The method of [Drton et al., 2017] is known as sBIC (singular BIC).
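As a toy illustration of this definition (our own example, not from the slides): for a one-dimensional model with K(w) = w² on [0, 1] and uniform prior φ(w) = 1, the zeta function has the closed form 1/(2z + 1), so its largest pole is z = −1/2 and the RLCT is λ = 1/2.

```python
import sympy as sp

w = sp.symbols('w', positive=True)
z = sp.symbols('z', positive=True)   # integrate in the convergent region first

# Toy singular setup: K(w) = w**2 on [0, 1], uniform prior phi(w) = 1.
zeta = sp.simplify(sp.integrate((w**2)**z, (w, 0, 1)))  # -> 1/(2*z + 1)

# Analytic continuation: 1/(2z + 1) has a single pole at z = -1/2,
# so the RLCT of this toy model is lambda = 1/2.
s = sp.symbols('s')                  # unconstrained symbol for pole analysis
pole = sp.solve(sp.denom(sp.together(zeta.subs(z, s))), s)[0]
lam = -pole
print(lam)                           # 1/2
```

The NMF zeta function below is of the same type, but with K replaced by the squared error ‖XY − AB‖² over the parameter matrices.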
Index
• Introduction
– Non-negative Matrix Factorization
– Real Log Canonical Threshold
– Research Goal
• Main Theorem
• Discussion
• Conclusion
• (Appendix: Sketch of Proof)
Research Goal
• Constructing a learning theory of NMF:
→ focus on the theoretical generalization error
→ focus on the RLCT of NMF
• Recently, we derived an upper bound of the RLCT [Hayashi et al., 2017].
• We used an algebraic-geometrical method (resolution of singularities).
Research Goal
• Constructing a learning theory of NMF:
→ focus on the theoretical generalization error
→ focus on the RLCT of NMF
• In this research, we newly derive the exact value of the RLCT of NMF in the case rank ≤ 2.
• Using this exact value, we make the upper bound tighter than the previous one.
2. MAIN THEOREM
Index
• Introduction
• Main Theorem
– Bayesian Framework of NMF
– Main Result
• Discussion
• Conclusion
• (Appendix: Sketch of Proof)
Formalizing and Setting
• Data matrices: W^n = (W_1, …, W_n); each M × N (× n).
– For generality, we treat not only n = 1 but also n > 1.
(Figure: the M × N data matrix W [Kohjima et al. 2016, modified].)
• True factorization: A (M × H_0), B (H_0 × N).
• Learner factorization: X (M × H), Y (H × N).
(Figure: W ≈ AB with inner dimension H_0 [Kohjima et al. 2016, modified].)
• What is the Bayesian framework for this setting?
Formalizing and Setting
• Notation for probability density functions (PDFs):
– q(W): true distribution,
– p(W|X, Y): learning machine,
– p*(W): predictive distribution,
whose domains (the data space) are Euclidean space;
– φ(X, Y): prior distribution,
– p(X, Y|W^n): posterior distribution given the data,
whose domains (the parameter space) are compact subsets of Euclidean space.
Formalizing and Setting
• Assume
q(W) ∝ exp(−(1/2)‖W − AB‖²),
p(W|X, Y) ∝ exp(−(1/2)‖W − XY‖²),
and that the prior φ is strictly positive and bounded in a neighborhood of (A, B).
• Remark: Poisson and exponential distributions can also be treated [Hayashi et al., 2017].
Bayesian Framework
• The posterior is defined by
p(X, Y|W^n) = (1/Z_n) ∏_{i=1}^{n} p(W_i|X, Y) φ(X, Y),
where Z_n is the normalizing constant.
• The predictive distribution is defined by
p*(W) = ∫ p(W|X, Y) p(X, Y|W^n) dX dY.
Bayesian Framework
• The Bayesian generalization error is defined as the KL divergence from the true to the predictive distribution:
G_n = ∫ q(W) log( q(W) / p*(W) ) dW.
• This depends on the training data, so it is a random variable.
• Its expectation over the data has the asymptotic behavior
E[G_n] = λ/n + o(1/n).
Index
• Introduction
• Main Theorem
– Bayesian Framework of NMF
– Main Result
• Discussion
• Conclusion
• (Appendix: Sketch of Proof)
Def. RLCT of NMF
• The RLCT of NMF is defined as minus the maximum pole of the following zeta function:
ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY.
• ζ(z) can be analytically continued to the entire complex plane, and its poles are negative rational numbers.
• The largest pole of ζ(z) equals −λ; this λ is the RLCT of NMF.
(Figure: the poles of ζ(z) lie on the negative real axis of ℂ; the largest one is z = −λ.)
Main Theorem
• The RLCT of NMF λ satisfies the following inequality:
λ ≤ (1/2) [ (H − H_0) min(M, N) + H_0 (M + N − 2) + δ(H_0) ],
where
δ(H_0) = 1 if H_0 ≡ 1 (mod 2), and 0 otherwise.
The equality holds if H = H_0 = 1 or 2, or if H_0 = 0.
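The bound is easy to evaluate numerically; a minimal sketch (the function name and interface are ours, not from the paper):

```python
def rlct_upper_bound(M, N, H, H0):
    """Upper bound on the RLCT of NMF from the Main Theorem:
    lambda <= (1/2) * ((H - H0) * min(M, N) + H0 * (M + N - 2) + delta(H0)),
    where delta(H0) = 1 if H0 is odd and 0 otherwise."""
    delta = 1 if H0 % 2 == 1 else 0
    return 0.5 * ((H - H0) * min(M, N) + H0 * (M + N - 2) + delta)

# Equality cases from the theorem: H = H0 = 1 or 2, or H0 = 0,
# where the bound coincides with the exact RLCT.
print(rlct_upper_bound(3, 3, 1, 1))  # 0.5 * (0*3 + 1*4 + 1) = 2.5
```

For instance, with M = N = 3 and H = H_0 = 1 (an equality case), the RLCT is exactly 2.5.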
3. DISCUSSION
Index
• Introduction
• Main Theorem
• Discussion
– Tightness
– Theoretical Application
– Numerical Experiment and Conjecture
• Conclusion
• (Appendix: Sketch of Proof)
Tighter than previous
• Main Theorem shows an upper bound of the
RLCT of NMF.
• We have derived another bound of it in
previous research.
• How tight is the new bound?
Tighter than previous
• In previous work:
λ ≤ λ_prv = (1/2) [ (H − H_0) min(M, N) + H_0 (M + N − 1) ].
• In this paper:
λ ≤ λ_new = (1/2) [ (H − H_0) min(M, N) + H_0 (M + N − 2) + δ(H_0) ].
• Comparing them, the bound improves by
λ_prv − λ_new = (1/2) (H_0 − δ(H_0)) ≥ 0.
• The more complex the true distribution, the tighter the new bound becomes.
Index
• Introduction
• Main Theorem
• Discussion
– Tightness
– Theoretical Application
– Numerical Experiment and Conjecture
• Conclusion
• (Appendix: Sketch of Proof)
Bound of Error
• The Main Theorem gives an upper bound of the Bayesian generalization error via
E[G_n] = λ/n + o(1/n).
• Concretely, we have
E[G_n] ≤ (1/2n) [ (H − H_0) min(M, N) + H_0 (M + N − 2) + δ(H_0) ] + o(1/n).
– This guarantees the accuracy!
• For which distributions can we bound the error?
Robustness on Dist.
• The Main Theorem assumes that the data matrix is normally distributed around the product of the parameter matrices:
q(W) ∝ N(W | AB),
p(W|X, Y) ∝ N(W | XY).
• Can the Main Theorem be used for other distributions as well?
Robustness on Dist.
• In prior work [Hayashi et al., 2017], we proved that the same zeta function applies to the Poisson and exponential distributions. Even if
q(W) ∝ Poi(W | AB), p(W|X, Y) ∝ Poi(W | XY), or
q(W) ∝ Expo(W | AB), p(W|X, Y) ∝ Expo(W | XY),
we can use
ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY.
Robustness on Dist.
• The above result follows from the fact that the I-divergence and the Itakura–Saito divergence have the same RLCT as the squared error:

Distribution: Normal | Poisson | Exponential
Similarity: Sq. error | I-divergence | IS-divergence

→ Same RLCTs, so we can use ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY.
→ The Main Theorem is thus attained for these distributions as well.
Index
• Introduction
• Main Theorem
• Discussion
– Tightness
– Theoretical Application
– Numerical Experiment and Conjecture
• Conclusion
• (Appendix: Sketch of Proof)
Numerical Experiment
• We carried out experiments to estimate the exact value of the RLCT.
– (Additionally) to compare with the RLCT of reduced rank regression (RRR), i.e., non-restricted matrix factorization.
• The posterior cannot be derived analytically.
→ Markov chain Monte Carlo (MCMC)
– We used the Metropolis–Hastings method.
Non-negative Rank
• We made artificial data and set up the following cases:
– 1. The exact value of the RLCT of NMF is known.
– 2. The exact value is unknown and rank = rank+.
– 3. The exact value is unknown and rank ≠ rank+.
• rank+: the minimal inner dimension of an NMF.
– This is called the non-negative rank.
– In general, rank+ ≥ rank holds.
– If min{rows, columns} ≤ 3 or rank ≤ 2, then rank = rank+.
– There is a non-negative matrix such that rank < rank+.
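A concrete instance of the last bullet (a standard example from the literature, not from the slides, commonly attributed to Cohen and Rothblum): the 4 × 4 matrix below has rank 3 but non-negative rank 4.

```python
import numpy as np

# A classical non-negative matrix whose non-negative rank exceeds its
# ordinary rank: rank(V) = 3, but no non-negative factorization V = XY
# with inner dimension 3 exists, so rank+(V) = 4 (a known result; we
# only verify the ordinary rank numerically here).
V = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)

print(np.linalg.matrix_rank(V))  # 3
```

The rank deficiency comes from the linear relation (row 1) − (row 2) − (row 3) + (row 4) = 0, which no non-negative rank-3 factorization can reproduce.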
Condition of Experiments
• Sample size n = 200 (parameter dimension ≤ 50).
• Number of data sets D = 100.
→ We empirically estimated the RLCT using
E[G_n] = λ/n + o(1/n) → λ ≈ n E[G_n] ≈ (n/D) Σ_{j=1}^{D} G_n^{(j)}.
• MCMC sample size K = 1,000.
– Burn-in = 20,000, thinning = 20, i.e., 40,000 sampling iterations.
• For calculating G_n, we generated T = 20,000 test data points from the true distribution.
• Total: 100 × (40,000 + 1,000 × 20,000) ≈ O(DKT).
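The pipeline above can be sketched in miniature (our illustration: all names, sizes, and the proposal scale are assumptions, the sizes are far smaller than the slide's n = 200, K = 1,000, T = 20,000, and the D-dataset average is reduced to a single dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Scaled-down illustrative setting: learner rank H, true rank H0.
M, N, H, H0 = 2, 2, 1, 1
n, T = 50, 1000
A = rng.uniform(0.5, 1.5, (M, H0))
B = rng.uniform(0.5, 1.5, (H0, N))
AB = A @ B

def loglik(W, XY):
    # Gaussian observation model W_i ~ N(XY, I), summed over the n data matrices.
    return -0.5 * np.sum((W - XY) ** 2)

W_train = AB + rng.standard_normal((n, M, N))

# Random-walk Metropolis-Hastings over (X, Y); proposals leaving the
# non-negative orthant are rejected (flat prior on the orthant).
X = rng.uniform(0.0, 2.0, (M, H))
Y = rng.uniform(0.0, 2.0, (H, N))
cur = loglik(W_train, X @ Y)
samples = []
for it in range(4000):
    Xp = X + 0.05 * rng.standard_normal(X.shape)
    Yp = Y + 0.05 * rng.standard_normal(Y.shape)
    if (Xp >= 0).all() and (Yp >= 0).all():
        prop = loglik(W_train, Xp @ Yp)
        if np.log(rng.uniform()) < prop - cur:
            X, Y, cur = Xp, Yp, prop
    if it % 20 == 0:                      # thinning
        samples.append(X @ Y)
samples = np.array(samples)

# Monte Carlo estimate of G_n = KL(q || p*) on test data; the Gaussian
# normalizing constants cancel in the log-ratio.
W_test = AB + rng.standard_normal((T, M, N))
def log_predictive(W):
    ll = -0.5 * np.sum((W[None] - samples) ** 2, axis=(1, 2))
    return np.logaddexp.reduce(ll) - np.log(len(samples))
log_q = -0.5 * np.sum((W_test - AB) ** 2, axis=(1, 2))
log_p = np.array([log_predictive(W) for W in W_test])
G_n = np.mean(log_q - log_p)
lam_hat = n * G_n   # single-dataset analogue of (n/D) * sum_j G_n^(j)
print(lam_hat)
```

With the slide's D = 100 datasets, `lam_hat` would be averaged across datasets to reduce the Monte Carlo noise before comparing with the theoretical λ.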
Numerical Result
Legend (r: true rank):
– λ_N: numerically calculated value,
– λ: the exact value for NMF,
– λ_B: the upper bound for NMF,
– λ_R: the exact value for RRR.
(Table: λ_N compared with λ, λ_B, and λ_R for each case.)
• The numerical results equal the theoretical value: λ_N = λ. The numerical calculation is correct!
• If rank = rank+, the numerical results equal the RRR case: λ_N = λ_R. It seems that if rank = rank+, then the RLCT of NMF satisfies λ = λ_R.
• If rank ≠ rank+, the numerical results are larger than the RRR case: λ_N > λ_R.
Conjecture (from the paper): if rank = rank+, then the RLCT of NMF equals that of RRR; if rank ≠ rank+, then it is larger than that of RRR.
4. CONCLUSION
Index
• Introduction
• Main Theorem
• Discussion
• Conclusion
• (Appendix: Sketch of Proof)
Conclusion
• (Main contribution) We mathematically improved the upper bound of the RLCT of NMF.
– This made the bound on the generalization error of Bayesian NMF tighter.
• (Minor contribution) We carried out experiments and proposed a conjecture about the exact value of the RLCT:
– rank = rank+ ⇒ RLCT of NMF = RLCT of RRR;
– rank ≠ rank+ ⇒ RLCT of NMF > RLCT of RRR.
APPENDIX: SKETCH OF
PROOF FOR MAIN THEOREM
Sketch of Proof
• We had already derived the exact value in the cases H_0 = 0 and H = H_0 = 1.
• We newly clarified the exact value in the case H = H_0 = 2 by considering the dimension of an algebraic subvariety in the parameter space.
• We bound the RLCT in the case H = H_0 by using the exact values clarified above.
• We bound the RLCT in the general case by using the above results.
IEEESSCI2017-FOCI4-1039

  • 1. Tighter Upper Bound of Real Log Canonical Threshold of Non-negative Matrix Factorization and its Application to Bayesian Inference Naoki Hayashi* (TokyoTech, Dept. of MCS) Sumio Watanabe (TokyoTech, Dept. of MCS) 12017/11/28 IEEE SSCI 2017 FOCI, Hawaii
  • 2. Slide • This slide is available at http://watanabe-www.math.dis.titech.ac.jp/~nhayashi /pdf/hayashi1039.pdf 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 2
  • 3. Index • Introduction • Main Theorem • Discussion • Conclusion • (Appendix: Sketch of Proof) 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 3
  • 4. 1. INTRODUCTION 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 4
  • 5. Index • Introduction – Non-negative Matrix Factorization – Real Log Canonical Threshold – Research Goal • Main Theorem • Discussion • Conclusion • (Appendix: Sketch of Proof) 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 5
  • 6. NMF has been applied • Non-negative Matrix Factorization (NMF) has been applied to many field • E. g. – Purchase basket data → Consumer analysis – Image, sound,… → Signal processing – Text data → Text mining – Microarray data → Bioinformatics ↑ Knowledge/Structure Discovery NMF: data → knowledge 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 6
  • 7. Suffering • NMF has hierarchical structure • Likelihood cannot be approximated by Gaussian function • Traditional statistics cannot be used 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 7 HOWEVER AIC BIC
  • 8. Suffering • NMF has hierarchical structure • Likelihood cannot be approximated by Gaussian function • Traditional statistics cannot be used Hierarchical structure causes non-identifibility : 𝑿𝒀 = 𝑿𝑷𝑷−𝟏 𝒀; 𝐟𝐨𝐫 ∃𝑷 ≠ 𝑰; 𝑿, 𝒀, 𝑿𝑷, 𝑷−𝟏 𝒀 ≥ 𝟎 𝟏 𝟑 𝟏 𝟑 𝟏 𝟒 𝟏 𝟏 𝟒 𝟓 𝟏 𝟒 = 𝟏 𝟑 𝟏 𝟑 𝟏 𝟒 𝟐 −𝟑 𝟏 𝟐 𝟐 −𝟑 𝟏 𝟐 −𝟏 𝟏 𝟏 𝟒 𝟓 𝟏 𝟒 = 𝟏 𝟕 𝟓 𝟑 𝟓 𝟑 𝟔 𝟓 𝟏𝟕 𝟓 𝟐𝟎 𝟗 𝟏 𝟒 = 𝟏𝟔 𝟒 𝟏𝟔 𝟏𝟔 𝟒 𝟏𝟔 𝟐𝟏 𝟓 𝟐𝟎 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 8 HOWEVER AIC BIC
  • 9. Suffering • NMF has hierarchical structure • Likelihood cannot be approximated by Gaussian function • Traditional statistics cannot be used Hierarchical structure causes non-identifibility : 𝑿𝒀 = 𝑿𝑷𝑷−𝟏 𝒀; 𝐟𝐨𝐫 ∃𝑷 ≠ 𝑰; 𝑿, 𝒀, 𝑿𝑷, 𝑷−𝟏 𝒀 ≥ 𝟎 𝟏 𝟑 𝟏 𝟑 𝟏 𝟒 𝟏 𝟏 𝟒 𝟓 𝟏 𝟒 = 𝟏 𝟑 𝟏 𝟑 𝟏 𝟒 𝟐 −𝟑 𝟏 𝟐 𝟐 −𝟑 𝟏 𝟐 −𝟏 𝟏 𝟏 𝟒 𝟓 𝟏 𝟒 = 𝟏 𝟕 𝟓 𝟑 𝟓 𝟑 𝟔 𝟓 𝟏𝟕 𝟓 𝟐𝟎 𝟗 𝟏 𝟒 = 𝟏𝟔 𝟒 𝟏𝟔 𝟏𝟔 𝟒 𝟏𝟔 𝟐𝟏 𝟓 𝟐𝟎 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 9 HOWEVER AIC BIC One matrix, (at least) pairs about the NMF
  • 10. Suffering • NMF has hierarchical structure • Likelihood cannot be approximated by Gaussian function • Traditional statistics cannot be used 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 10 HOWEVER • Strongly depending on initial value • Suffering from many local minima – It seldom reaches to the global minimum. In Addition + AIC BIC
  • 11. Learning Theory of NMF • NMF has been used for ``data → knowledge’’ 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 11
  • 12. • NMF has been used for ``data → knowledge’’ • Mathematical property is unknown – Learning theory has not been yet established – Prediction accuracy has not been yet clarified No guarantee for correctness of numerical calculation No method for theoretical hyperparameter tuning 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 12 Learning Theory of NMF
  • 13. • NMF has been used for ``data → knowledge’’ • Mathematical property is unknown – Learning theory has not been yet established – Prediction accuracy has not been yet clarified 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 13 Constructing its theory is an important problem Learning Theory of NMF
  • 14. Index • Introduction – Non-negative Matrix Factorization – Real Log Canonical Threshold – Research Goal • Main Theorem • Discussion • Conclusion • (Appendix: Sktech of Proof) 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 14
  • 15. • In general [Watanabe, 2001] – Let n be the sample size – Bayesian generalization error 𝑮 𝒏 has an asymptotic behavior: 𝔼 𝑮 𝒏 = 𝝀 𝒏 + 𝒐 𝟏 𝒏 • Learning coefficient 𝝀 depends on the model • 𝝀 is called real log canonical threshold (RLCT) 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 15 When does RLCT appear?
  • 16. Error: Bayes<<Freq. • In hierarchical structure model, Bayesian 𝝀 is smaller than frequentist’s one and maximum posterior one [Watanabe,2001 and 2009] • Bayesian inference is effective for reducing the generalization error • We consider Bayesian inference framework – Bayesian inference for NMF has been proposed [Cemgil, 2009] Rem: ← is only discrete 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 16
  • 17. RLCT of NMF is unknown • NMF has been used for ``data → knowledge’’ • Mathematical property is unknown – Learning theory has not been established – Prediction accuracy has not been clarified ↑ means that the RLCT of NMF has not been clarified 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 17
  • 18. Def. RLCT • RLCT is characterized as a learning coefficient • It is defined by the largest pole of the following complex function: 𝜻 𝒛 = න𝑲 𝒘 𝒛 𝝋 𝒘 𝒅𝒘, where 𝑲 is KL-divergence from true distribution to learning machine and 𝝋 is prior. • A statistical model selection method that uses RLCTs has been proposed [Drton, et al. 2017] 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 18
  • 19. Def. RLCT • RLCT is characterized as a learning coefficient • It is defined by the largest pole of the following complex function: 𝜻 𝒛 = න𝑲 𝒘 𝒛 𝝋 𝒘 𝒅𝒘, where 𝑲 is KL-divergence from true distribution to learning machine and 𝝋 is prior. • A statistical model selection method that uses RLCTs has been proposed [Drton, et al. 2017] 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 19 known as sBIC (singular BIC)
  • 20. Index • Introduction – Non-negative Matrix Factorization – Real Log Canonical Threshold – Research Goal • Main Theorem • Discussion • Conclusion • (Appendix: Sktech of Proof) 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 20
  • 21. Research Goal • Constructing learning theory of NMF →focus theoretical generalization error →focus RLCT of NMF • Recently, we derived an upper bound of RLCT [Hayashi, et. al. 2017] • We used algebraic geometrical method (singularity resolution) 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 21
  • 22. Research Goal • Constructing learning theory of NMF →focus theoretical generalization error →focus RLCT of NMF • In this research, we newly derive the exact value of the RLCT of NMF in the case rank ≦ 2 • Using the above exact value, we make the upper bound tighter than previous one 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 22
  • 23. 2. MAIN THEOREM 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 23
  • 24. Index • Introduction • Main Theorem – Bayesian Framework of NMF – Main Result • Discussion • Conclusion • (Appendix: Sketch of Proof) 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 24
  • 25. Formalizing and Setting • Data matrices: 𝑾 𝒏 = 𝑾 𝟏, … , 𝑾 𝒏 ; 𝑴 × 𝑵(× 𝒏) – For general, we treat not only n=1 but also n>1. 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 25 [Kohjima et al. 2016/6, modified] 𝑾𝑴 𝑵
  • 26. Formalizing and Setting • Data matrices: 𝑾 𝒏 = 𝑾 𝟏, … , 𝑾 𝒏 ; 𝑴 × 𝑵(× 𝒏) – For general, we treat not only n=1 but also n>1. • True factorization: 𝑨; 𝑴 × 𝑯 𝟎, 𝑩; 𝑯 𝟎 × 𝑵 • Learner factorization: 𝑿; 𝑴 × 𝑯, 𝒀; 𝑯 × 𝑵 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 26 [Kohjima et al. 2016/6, modified] 𝑾𝑴 𝑵 𝑯 𝟎 𝑵 𝑯 𝟎 𝑴 𝑨 𝑩
  • 27. Formalizing and Setting • Data matrices: 𝑾 𝒏 = 𝑾 𝟏, … , 𝑾 𝒏 ; 𝑴 × 𝑵(× 𝒏) – For general, we treat not only n=1 but also n>1. • True factorization: 𝑨; 𝑴 × 𝑯 𝟎, 𝑩; 𝑯 𝟎 × 𝑵 • Learner factorization: 𝑿; 𝑴 × 𝑯, 𝒀; 𝑯 × 𝑵 • What is the Bayesian framework of ↑? 2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 27 [Kohjima et al. 2016/6, modified] 𝑾𝑴 𝑵 𝑯 𝟎 𝑵 𝑯 𝟎 𝑴 𝑨 𝑩
  • 28. Formalizing and Setting • Notation of probability density functions (PDFs) – q(W): true distribution, – p(W|X, Y): learning machine, – p*(W): predictive distribution; the domains of these (in the data W) are Euclidean spaces, – φ(X, Y): prior distribution, – p(X, Y|W^n): posterior distribution given the data; the domains of these (in the parameters X, Y) are compact subsets of Euclidean space.
  • 29. Formalizing and Setting • Assume q(W) ∝ exp( −(1/2)‖W − AB‖² ), p(W|X, Y) ∝ exp( −(1/2)‖W − XY‖² ), and that the prior φ is strictly positive and bounded in a neighborhood of (A, B). • Remark: Poisson and exponential distributions can also be treated [Hayashi, et al. 2017].
  • 30. Bayesian Framework • The posterior is defined by p(X, Y|W^n) = (1/Z_n) ∏_{i=1}^{n} p(W_i|X, Y) φ(X, Y), where Z_n is the normalizing constant. • The predictive distribution is defined by p*(W) = ∫ p(W|X, Y) p(X, Y|W^n) dX dY.
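The predictive distribution above has no closed form in general; a common way to approximate it is to average the likelihood over posterior samples (X_k, Y_k). The sketch below is my own illustration, not the authors' procedure: `predictive_density` is a hypothetical name, and the unnormalized Gaussian likelihood matches the assumption on slide 29.

```python
import numpy as np

def predictive_density(w, samples):
    """Monte Carlo approximation of p*(W) from posterior samples (X_k, Y_k)."""
    vals = []
    for X, Y in samples:
        resid = w - X @ Y
        # unnormalized Gaussian likelihood p(W | X, Y) ∝ exp(-||W - XY||^2 / 2)
        vals.append(np.exp(-0.5 * np.sum(resid ** 2)))
    return np.mean(vals)
```

With a single posterior sample equal to the true factors, the density is maximal at W = XY and decays as W moves away from it.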
  • 31. Bayesian Framework • The Bayesian generalization error is defined as the KL divergence from the true to the predictive distribution: G_n = ∫ q(W) log( q(W) / p*(W) ) dW. • This depends on the training data, so it is a random variable. • Its expectation over the data has the asymptotic behavior E[G_n] = λ/n + o(1/n).
  • 32. Index • Introduction • Main Theorem – Bayesian Framework of NMF – Main Result • Discussion • Conclusion • (Appendix: Sketch of Proof)
  • 33. Def. RLCT of NMF • The RLCT of NMF is defined as minus the maximum pole of the following zeta function: ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY.
  • 34. Def. RLCT of NMF • The RLCT of NMF is defined as minus the maximum pole of the following zeta function: ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY. • ζ(z) can be analytically continued to the entire complex plane and its poles are negative rational numbers. • The largest pole of ζ(z) equals −λ; this λ is the RLCT of NMF.
  • 35. Def. RLCT of NMF (Figure: the poles of ζ(z) lie on the negative real axis of ℂ, with the largest pole at z = −λ.)
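As a minimal worked example of how a pole of such a zeta function yields an RLCT (my addition, not from the slides): take a one-parameter model with K(w) = w² on [0, 1] and a uniform prior. Then

```latex
\zeta(z) = \int_0^1 (w^2)^z \, dw = \int_0^1 w^{2z} \, dw = \frac{1}{2z+1},
```

which extends meromorphically to all of ℂ with a single pole at z = −1/2, so λ = 1/2, matching the regular-model value d/2 with parameter dimension d = 1.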
  • 36. Main Theorem • The RLCT of NMF λ satisfies the following inequality: λ ≤ (1/2){ (H − H_0) min{M, N} + H_0 (M + N − 2) + δ(H_0) }, where δ(H_0) = 1 if H_0 ≡ 1 (mod 2) and δ(H_0) = 0 otherwise. The equality holds if H = H_0 = 1 or 2, or if H_0 = 0.
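The bound above is straightforward to evaluate numerically. The sketch below directly transcribes the formula; the function names are my own choices.

```python
def delta(h0):
    # delta(H0) = 1 when H0 is odd (H0 ≡ 1 mod 2), 0 otherwise
    return 1 if h0 % 2 == 1 else 0

def rlct_upper_bound(m, n, h, h0):
    """Upper bound on the RLCT of NMF from the Main Theorem."""
    return 0.5 * ((h - h0) * min(m, n) + h0 * (m + n - 2) + delta(h0))
```

For instance, in the equality case H = H_0 = 1 with M = N = 2, the bound gives λ = (M + N − 1)/2 = 1.5.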
  • 37. 3. DISCUSSION
  • 38. Index • Introduction • Main Theorem • Discussion – Tightness – Theoretical Application – Numerical Experiment and Conjecture • Conclusion • (Appendix: Sketch of Proof)
  • 39. Tighter than previous • The Main Theorem gives an upper bound of the RLCT of NMF. • We had derived another bound of it in previous work. • How tight is the new bound?
  • 40. Tighter than previous • In previous work, λ ≤ λ_prv = (1/2){ (H − H_0) min{M, N} + H_0 (M + N − 1) }. • In this paper, λ ≤ λ_new = (1/2){ (H − H_0) min{M, N} + H_0 (M + N − 2) + δ(H_0) }. • Comparing them, the improvement is λ_prv − λ_new = (1/2)( H_0 − δ(H_0) ) ≥ 0. • The more complex the true distribution (the larger H_0), the tighter the new bound becomes.
  • 41. Index • Introduction • Main Theorem • Discussion – Tightness – Theoretical Application – Numerical Experiment and Conjecture • Conclusion • (Appendix: Sketch of Proof)
  • 42. Bound of Error • The Main Theorem yields an upper bound of the Bayesian generalization error via E[G_n] = λ/n + o(1/n). • Concretely, we have E[G_n] ≤ (1/(2n)){ (H − H_0) min{M, N} + H_0 (M + N − 2) + δ(H_0) } + o(1/n). – This gives a guarantee of accuracy! • For which distributions can we bound the error?
  • 43. Robustness on Dist. • The Main Theorem assumes that the data matrix is normally distributed around the product of the factor matrices: q(W) ∝ N(W | AB), p(W|X, Y) ∝ N(W | XY). • Can the Main Theorem be used for other distributions as well?
  • 44. Robustness on Dist. • In the prior work [Hayashi, et al. 2017], we proved that the same zeta function applies to the Poisson and exponential distributions: even if q(W) ∝ Poi(W | AB), p(W|X, Y) ∝ Poi(W | XY), or q(W) ∝ Expo(W | AB), p(W|X, Y) ∝ Expo(W | XY), we can still use ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY.
  • 45. Robustness on Dist. • The above result follows from the fact that the I-divergence and the Itakura-Saito (IS) divergence have the same RLCT as the squared error: Normal ↔ squared error, Poisson ↔ I-divergence, Exponential ↔ IS divergence. In every case we can use ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY (same RLCTs).
  • 46. Robustness on Dist. • Hence the Main Theorem also holds for the Poisson and exponential cases.
  • 47. Index • Introduction • Main Theorem • Discussion – Tightness – Theoretical Application – Numerical Experiment and Conjecture • Conclusion • (Appendix: Sketch of Proof)
  • 48. Numerical Experiment • We carried out experiments to estimate the exact value of the RLCT. – (+) to compare with the RLCT of reduced rank regression (RRR), i.e., non-restricted matrix factorization. • The posterior cannot be derived analytically. → Markov chain Monte Carlo (MCMC) – We used the Metropolis-Hastings method.
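A minimal sketch of a Metropolis-Hastings sampler for the NMF posterior, under assumptions of my own: the problem sizes, the box-uniform prior, the proposal scale, and the unit noise variance are all illustrative choices, not the experiment's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not the paper's): M x N data, inner dimension H, n samples.
M, N, H, n = 3, 3, 2, 50
A = rng.uniform(0, 1, (M, H))
B = rng.uniform(0, 1, (H, N))
W = A @ B + 0.1 * rng.standard_normal((n, M, N))  # n noisy observations of AB

def log_posterior(X, Y):
    # Gaussian log-likelihood plus a flat prior on the non-negative box [0, 10].
    if (X < 0).any() or (Y < 0).any() or (X > 10).any() or (Y > 10).any():
        return -np.inf
    return -0.5 * np.sum((W - X @ Y) ** 2)

def metropolis_hastings(steps=500, scale=0.05):
    X = rng.uniform(0, 1, (M, H))
    Y = rng.uniform(0, 1, (H, N))
    lp = log_posterior(X, Y)
    samples = []
    for _ in range(steps):
        Xp = X + scale * rng.standard_normal((M, H))  # random-walk proposal
        Yp = Y + scale * rng.standard_normal((H, N))
        lpp = log_posterior(Xp, Yp)
        if np.log(rng.uniform()) < lpp - lp:  # accept with prob min(1, ratio)
            X, Y, lp = Xp, Yp, lpp
        samples.append((X.copy(), Y.copy()))
    return samples
```

In the actual experiment the chain was run much longer and thinned (burn-in 20,000, thin 20), as slide 50 states.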
  • 49. Non-negative Rank • We made artificial data and set up the following cases: – 1. the exact value of the RLCT of NMF is known. – 2. the exact value is unknown and rank = rank+. – 3. the exact value is unknown and rank ≠ rank+. • rank+ : the minimal inner dimension of an exact NMF – this is called the non-negative rank. – In general, rank+ ≧ rank holds. – If min{rows, columns} ≦ 3 or rank ≦ 2, then rank = rank+. – There exist non-negative matrices with rank < rank+.
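A standard example of the last point (my addition, not from the slides) is the 4×4 circulant 0/1 matrix whose zero pattern forms a 4-cycle: its ordinary rank is 3, while its non-negative rank is known to be 4. The sketch below only verifies the ordinary rank numerically; the non-negative rank claim is not checked here.

```python
import numpy as np

# 0/1 matrix whose zeros form a 4-cycle; rank 3, but non-negative rank 4.
C = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

print(np.linalg.matrix_rank(C))  # ordinary rank: 3
```

The rank deficiency comes from the alternating row relation r1 − r2 + r3 − r4 = 0, which uses negative coefficients and therefore does not reduce the non-negative rank.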
  • 50. Condition of Experiments • Sample size n = 200 (parameter dimension ≦ 50) • The number of data sets D = 100 → we empirically estimated the RLCT using E[G_n] = λ/n + o(1/n): λ ≈ n E[G_n] ≈ (n/D) Σ_{j=1}^{D} G_n^{(j)}. • MCMC sample size K = 1,000 – Burn-in = 20,000, thin = 20, i.e., 40,000 sampling iterations. • For calculating G_n, we generated T = 20,000 test data points from the true distribution. • Total cost: 100 × (40,000 + 1,000 × 20,000) ≈ O(DKT)
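The empirical estimator on this slide amounts to a one-liner; the generalization-error values below are made-up placeholders of mine, not the experiment's results.

```python
import numpy as np

def empirical_rlct(gen_errors, n):
    """lambda ≈ n * (average generalization error over D independent data sets)."""
    return n * float(np.mean(gen_errors))

# hypothetical per-data-set generalization errors G_n^(j) for n = 200
g = [0.0151, 0.0149, 0.0152, 0.0148]
print(empirical_rlct(g, n=200))  # ≈ 3.0
```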
  • 51. Numerical Result • Notation: λ_N: numerically calculated value; λ: the exact value for NMF; λ_B: the upper bound for NMF; λ_R: the exact value for RRR; r: true rank. (Results table omitted.)
  • 52. Numerical Result • Where the exact value is known, the numerical results equal the theoretical value: λ_N = λ. The numerical calculation is correct! (Notation as on the previous slide; results table omitted.)
  • 53. Numerical Result • If rank = rank+, then the numerical results equal the RRR case: λ_N = λ_R. It seems that if rank = rank+, then the RLCT of NMF satisfies λ = λ_R. (Results table omitted.)
  • 54. Numerical Result • If rank ≠ rank+, then the numerical results are larger than the RRR case: λ_N > λ_R. This suggests that if rank ≠ rank+, then the RLCT of NMF is larger than in the RRR case: λ > λ_R. (Results table omitted.)
  • 55. Conjecture • (Screenshot of the conjecture statement from the paper; figure omitted.)
  • 56. 4. CONCLUSION
  • 57. Index • Introduction • Main Theorem • Discussion • Conclusion • (Appendix: Sketch of Proof)
  • 58. Conclusion • (Main contribution) We mathematically improved the upper bound of the RLCT of NMF. – This makes the bound of the generalization error in Bayesian NMF tighter. • (Minor contribution) We carried out experiments and suggested a conjecture about the exact value of the RLCT: – rank = rank+ ⇒ RLCT of NMF = RLCT of RRR. – rank ≠ rank+ ⇒ RLCT of NMF > RLCT of RRR.
  • 59. APPENDIX: SKETCH OF PROOF FOR MAIN THEOREM
  • 60. Sketch of Proof • We had already derived the exact value in the cases H_0 = 0 and H = H_0 = 1. • We newly clarified the exact value in the case H = H_0 = 2 by considering the dimension of an algebraic subvariety in the parameter space. • We bound the RLCT in the case H = H_0 by using this exact value. • We bound the RLCT in the general case by using the above results.