Week12_PCA_BayesianInference_before_lecture
Week 12
1
Recap: Bias and Variance
• Underfitting: high bias and low variance
• Overfitting: low bias and high variance
2
Recap: Overfitting and Underfitting
• Simple Model
• High Bias
• Complex Model
• High Variance
3
Recap: Overfitting and Underfitting
4
Overfitting and Underfitting - Examples
5
Regularization to prevent overfitting
• Dimensionality Reduction
[Figure: results with k = 200, k = 50, and k = 2 retained dimensions]
6
Dimensionality Reduction
7
Outline
• Examples
8
Carry-on Questions
9
Unsupervised Learning
• Clustering
10
Unsupervised Learning
• Clustering
• Dimensionality Reduction
11
Dimensionality Reduction
12
Dimensionality Reduction
13
Example: Reduce Data from 2D to 1D
[Scatter plot: one feature measured in inches, the other in cm]
14
Example: Reduce Data from 2D to 1D
2D to 1D
15
Example: Reduce Data from 2D to 1D
2D to 1D
16
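A minimal sketch of the 2D-to-1D idea (not from the slides; the data here are hypothetical): when the same quantity is recorded both in cm and in inches, the two features are almost perfectly correlated, so a single projected coordinate captures nearly all of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)
length_cm = rng.uniform(150, 200, size=100)               # hypothetical measurements in cm
length_in = length_cm / 2.54 + rng.normal(0, 0.5, 100)    # same quantity in inches, with noise

X = np.column_stack([length_cm, length_in])
X = X - X.mean(axis=0)                                    # center the data

# Direction of largest variance (first principal direction)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v = Vt[0]                                                 # unit vector in R^2

z = X @ v                                                 # 1D representation of each point
print("Fraction of variance kept:", z.var() / X.var(axis=0).sum())
```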
Example: Reduce Data from 3D to 2D
• Inefficient algorithm
18
Dimensionality Reduction
• Prevent Overfitting
• Better Visualization
19
Principal Component Analysis
• Unsupervised learning.
20
Principal Component Analysis
21
Principal Component Analysis
23
Principal Component Analysis
24
Example: PCA
25
Example: PCA
26
Example: PCA
[Figure: projecting the data onto different directions 𝒗 — one direction gives larger projected variance, the other gives smaller projected variance]
27
Example: PCA
28
Example: PCA
[Figure: projecting the data onto different directions 𝒗 — one direction gives larger projected MSE, the other gives smaller projected MSE]
29
Derivation of PCA - Optional
$\mathbf{z}_i = (\mathbf{v}^T \mathbf{x}_i)\,\mathbf{v}$
30
Derivation of PCA - Optional
$\mathbf{z}_i = (\mathbf{v}^T \mathbf{x}_i)\,\mathbf{v}$
31
Derivation of PCA - Optional
32
Derivation of PCA - Optional
$\mathrm{Var} = \frac{1}{N}\sum_{i=1}^{N} z_i^{\,2} = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{v}^T \mathbf{x}_i)^2$
33
Derivation of PCA - Optional
$\mathrm{Var} = \frac{1}{N}\sum_{i=1}^{N} z_i^{\,2} = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{v}^T \mathbf{x}_i)^2$
34
Derivation of PCA - Optional
$\max_{\|\mathbf{v}\|_2 = 1} \; \frac{1}{N}\sum_{i=1}^{N} (\mathbf{v}^T \mathbf{x}_i)^2$
35
Derivation of PCA - Optional
$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^T \\ \vdots \\ \mathbf{x}_i^T \\ \vdots \\ \mathbf{x}_N^T \end{bmatrix} \in \mathbb{R}^{N \times d}$
$N$: number of samples, $d$: feature dimension
36
Derivation of PCA - Optional
$\max_{\|\mathbf{v}\|_2 = 1} \; \frac{1}{N}\sum_{i=1}^{N} (\mathbf{v}^T \mathbf{x}_i)^2 = \max_{\|\mathbf{v}\|_2 = 1} \; \frac{1}{N}\,\mathbf{v}^T \mathbf{X}^T \mathbf{X}\,\mathbf{v}$
37
Derivation of PCA - Optional
$\max_{\|\mathbf{v}\|_2 = 1} \; \frac{1}{N}\sum_{i=1}^{N} (\mathbf{v}^T \mathbf{x}_i)^2 = \max_{\|\mathbf{v}\|_2 = 1} \; \frac{1}{N}\,\mathbf{v}^T \mathbf{X}^T \mathbf{X}\,\mathbf{v}$
• Here $\frac{1}{N}\sum_{i=1}^{N} (\mathbf{v}^T \mathbf{x}_i)^2 = \frac{1}{N}\,\mathbf{v}^T \mathbf{X}^T \mathbf{X}\,\mathbf{v}$ is the variance of the projected data.
• The optimal $\mathbf{v}^*$ is the first unit weight vector for PCA.
38
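The slides move from this objective directly to the SVD-based algorithm; for completeness, here is a standard Lagrange-multiplier argument (not on the slides) for why the maximizer is the top eigenvector of $\frac{1}{N}\mathbf{X}^T\mathbf{X}$, i.e. the first right singular vector of $\mathbf{X}$:
$\mathcal{L}(\mathbf{v}, \lambda) = \frac{1}{N}\,\mathbf{v}^T \mathbf{X}^T \mathbf{X}\,\mathbf{v} - \lambda\,(\mathbf{v}^T \mathbf{v} - 1)$
$\frac{\partial \mathcal{L}}{\partial \mathbf{v}} = \frac{2}{N}\,\mathbf{X}^T \mathbf{X}\,\mathbf{v} - 2\lambda\,\mathbf{v} = \mathbf{0} \;\;\Rightarrow\;\; \frac{1}{N}\,\mathbf{X}^T \mathbf{X}\,\mathbf{v} = \lambda\,\mathbf{v}$
Substituting back, the objective value equals $\lambda$, so the maximum is attained at the unit eigenvector with the largest eigenvalue.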
Derivation of PCA - Optional
39
Derivation of PCA - Optional
• Algorithm:
40
Derivation of PCA - Optional
• Algorithm:
1. Apply SVD to the data matrix $\mathbf{X}$, as $\mathrm{svd}(\mathbf{X}) = \mathbf{U} \begin{bmatrix} \boldsymbol{\Sigma} \\ \mathbf{0} \end{bmatrix} \mathbf{V}^T$.
2. Take the first $k$ columns of $\mathbf{V}$ as the unit weight vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ and project each data point:
$\mathbf{V}_k^T \mathbf{x}_i = \begin{bmatrix} \mathbf{v}_1^T \mathbf{x}_i \\ \vdots \\ \mathbf{v}_k^T \mathbf{x}_i \end{bmatrix}$
41
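A minimal NumPy sketch of the algorithm above (assuming the rows of X are already centered); function and variable names are illustrative, not from the slides.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (N x d, assumed centered) onto the first k
    principal directions obtained from the SVD of X."""
    # Thin SVD: X = U @ diag(S) @ Vt, rows of Vt are the unit weight vectors
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T            # d x k matrix of the first k right singular vectors
    Z = X @ Vk               # N x k projected coordinates: z_i = V_k^T x_i
    X_rec = Z @ Vk.T         # N x d reconstruction: sum_j (v_j^T x_i) v_j
    return Z, X_rec

# Example usage with random data (for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)       # center before PCA
Z, X_rec = pca_project(X, k=2)
```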
Derivation of PCA - Example
• Calculate the first unit weight vector 𝒗, and the first Principal Component.
1. Data matrix: $\mathbf{X} = \begin{bmatrix} 4 & 6 & 10 \\ 3 & 10 & 13 \\ -2 & -6 & -8 \end{bmatrix}$
2. SVD: $\mathbf{X} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T$
[Figure annotation: sample variance, with rotations, scales, etc.]
43
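The slides work this example out by hand; a quick numerical check is possible with NumPy (a sketch — note the sign of a singular vector is arbitrary):

```python
import numpy as np

# Data matrix from the example
X = np.array([[ 4,  6, 10],
              [ 3, 10, 13],
              [-2, -6, -8]], dtype=float)

U, S, Vt = np.linalg.svd(X)
v1 = Vt[0]            # first unit weight vector (first right singular vector)
pc1 = X @ v1          # first principal component: scores v1^T x_i for each row
print("v1 =", v1)
print("first PC =", pc1)
```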
PCA Example 1
Reconstructions:
[Figure: original image x and reconstructions using k = 2, 10, 50, and 200 principal components]
44
PCA Example 2
• Image Compression
Original Image
46
PCA Example 2
• Image Compression
PCA compression: 144D → 16D
47
PCA Example 2
• Image Compression
[Figure: grid of 12×12-pixel image patches]
48
PCA Example 2
• Image Compression
PCA compression: 144D → 6D
49
PCA Example 2
• Image Compression
PCA compression: 144D → 1D
50
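A sketch of the patch-based compression illustrated in this example, assuming the image has been split into flattened 12×12 patches (144 dimensions each); names are illustrative, and k can be set to 16, 6, or 1 as in the slides.

```python
import numpy as np

def compress_patches(patches, k):
    """patches: (N, 144) array of flattened 12x12 image patches.
    Keep only k principal components and return codes plus reconstruction."""
    mean = patches.mean(axis=0)
    Xc = patches - mean                      # center the patches
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k].T                            # 144 x k basis of principal directions
    Z = Xc @ Vk                              # compressed codes (N x k)
    patches_rec = Z @ Vk.T + mean            # reconstructed patches (N x 144)
    return Z, patches_rec
```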
Carry-on Questions
• Apply SVD to the data matrix. Select the unit weight vector $\mathbf{v}_k$ from the $\mathbf{V}$ matrix, and calculate $(\mathbf{v}_k^T \mathbf{x}_i)\,\mathbf{v}_k$.
51
Bayesian Inference
52
Outline
• Bayes’ Theorem
• Naïve Bayes
• Examples
53
Carry-on Questions
54
From deterministic to probabilistic learning
• Deterministic model: always gives you the same output for the same input.
55
From deterministic to probabilistic learning
• Deterministic inference: gives you the same output for the same input.
• Bayesian Inference
56
Basic Concepts in Probability Theory
57
Basic Concepts in Probability Theory
• h = A hypothesis or an event.
59
Basic Concepts in Probability Theory
• h = A hypothesis or an event.
60
Basic Concepts in Probability Theory
61
Basic Concepts in Probability Theory
• h = A hypothesis or an event.
• Conditional probability $P(A \mid B)$: the probability of $A$ given that $B$ has occurred.
• Joint probability $P(A, B)$: the probability that both $A$ and $B$ occur.
• $P(A, B) = P(A \mid B) \cdot P(B)$
63
Bayes’ Theorem
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
• Since $P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$, therefore:
$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}$
64
Bayes’ Theorem
65
Bayes’ Theorem
$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$
• Some terminology we are going to use: $P(h)$ is the prior, $P(D \mid h)$ is the likelihood, $P(D)$ is the evidence, and $P(h \mid D)$ is the posterior.
66
Bayes’ Theorem
$h_{\mathrm{MAP}} = \arg\max_{h} P(h \mid D) = \arg\max_{h} P(D \mid h)\,P(h)$
67
Example Questions
68
Example Questions
69
Example Questions
• Quiz 3: Our IE4483 class is attended by students from both EEE and IEM.
Only 50% of the IEM students and 30% of the EEE students pass the
exam. Given that 60% of the entire class are EEE students, what is the
percentage of IEM students amongst those who pass the exam?
70
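One way to set up this computation with Bayes' theorem and the law of total probability (a sketch; the probabilities come directly from the quiz statement):

```python
# Bayes' theorem applied to Quiz 3 (values taken from the quiz statement)
p_iem, p_eee = 0.4, 0.6            # 60% of the class are EEE students
p_pass_iem, p_pass_eee = 0.5, 0.3  # pass rates per group

p_pass = p_pass_iem * p_iem + p_pass_eee * p_eee   # total probability of passing
p_iem_given_pass = p_pass_iem * p_iem / p_pass     # Bayes' theorem
print(p_iem_given_pass)  # ~0.526, i.e. about 52.6% of those who pass are IEM
```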
Example Questions
• You’re a contestant on a game show. You see three closed doors, and
behind one of them is a prize. You choose one door, and the host opens
one of the other doors and reveals that there is no prize behind it. Then
he offers you a chance to switch to the remaining door. Should you take
it?
http://en.wikipedia.org/wiki/Monty_Hall_problem 71
Naïve Bayes
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
72
Naïve Bayes
$P(a_1, a_2 \mid v_j) = P(a_1 \mid v_j)\,P(a_2 \mid v_j)$
73
Naïve Bayes
• Mathematical Form:
$P(a_1, \dots, a_d \mid v_j) = P(a_1 \mid v_j) \cdots P(a_d \mid v_j)$
$P(a_1, \dots, a_d \mid v_j) = \prod_{i=1}^{d} P(a_i \mid v_j)$
74
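Combining this factorization with Bayes' theorem gives the usual Naïve Bayes decision rule (standard form, used in the Play Tennis example below); the evidence $P(a_1, \dots, a_d)$ does not depend on $v_j$ and can be dropped from the argmax:
$v_{\mathrm{NB}} = \arg\max_{v_j} P(v_j \mid a_1, \dots, a_d) = \arg\max_{v_j} P(v_j) \prod_{i=1}^{d} P(a_i \mid v_j)$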
Naïve Bayes - Example
• Play Tennis:
• Classify the new instance $\langle S, C, H, S \rangle$, where the last S = Strong (Wind).
76
Bayes’ Theorem
$P(A, B) = P(A \mid B) \cdot P(B)$
77
Bayes’ Theorem
• Bayes’ Theorem:
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
78
Naïve Bayes - Example
• Play Tennis:
• Thus, $P(\text{yes}, \langle S, C, H, S \rangle) = \frac{9}{14} \times 0.00823 = \mathbf{0.0053}$
79
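A sketch of the same computation in code: 0.00823 is the product of the conditional probabilities $P(a_i \mid \text{yes})$ from the slide, and 9/14 is the prior $P(\text{yes})$. The "no" class score would be computed the same way from its own counts, and the class with the larger score is predicted.

```python
# Naive Bayes score for PlayTennis = yes given the instance <S, C, H, S>
p_yes = 9 / 14                       # prior: 9 of the 14 training days are "yes"
likelihood_yes = 0.00823             # product of P(a_i | yes) from the slide
score_yes = p_yes * likelihood_yes   # unnormalized posterior, ~0.0053
print(score_yes)
# The "no" score is p_no * prod_i P(a_i | no); compare the two scores to classify.
```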
Naïve Bayes - Example
• Play Tennis:
80
Carry-on Questions
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
$P(a_1, \dots, a_d \mid v_j) = \prod_{i=1}^{d} P(a_i \mid v_j)$
81
What we have learned
• Bayes’ Theorem
• Naïve Bayes
82