CS464_Ch6_FeatureExtraction
Introduction to
Machine Learning
• Example:
– Given 53 blood and urine test results (features) from 65
people
– How can we visualize these measurements?
Data Visualization
• Is there a representation better than the coordinate
axes?
• Is it really necessary to show all 53 dimensions?
– What if there are strong correlations between the
features?
• How can we find the smallest subspace of the 53-dimensional feature space that retains most of the information in the original data?
Lower-Dimensional Projections
• Rather than picking a subset of the features, we can obtain new features by combining the existing features x1 … xn
z1 = w0^(1) + Σi wi^(1) xi
…
zk = w0^(k) + Σi wi^(k) xi
• New features are linear combinations of old ones
• Reduces dimension when k < n
• Let's consider how to do this in the unsupervised setting: just X, no Y
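The linear combination above can be sketched in a few lines of NumPy; the data, the number of original features n = 4, and the number of new features k = 2 are made-up values for illustration.

```python
import numpy as np

# Hypothetical data: 100 samples with n = 4 original features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# k = 2 new features: weights w_i^(k) (one row per new feature) and offsets w_0^(k)
W = rng.normal(size=(2, 4))
w0 = rng.normal(size=2)

# Each z_k = w_0^(k) + sum_i w_i^(k) x_i, computed for all samples at once
Z = X @ W.T + w0
print(Z.shape)  # (100, 2): dimension reduced from 4 to 2
```

PCA is one particular way of choosing the weight rows W: as the directions of greatest variance in X.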
Data Compression
Reduce data from 2D to 1D
[Figure: heights measured in inches vs. cm, projected onto one dimension]
Country     z1    z2
Canada      1.6   1.2
China       1.7   0.3
India       1.6   0.2
Russia      1.4   0.5
Singapore   0.5   1.7
USA         2.0   1.5
…           …     …
Andrew Ng Coursera slide
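The 2D-to-1D compression above can be sketched with plain NumPy: find the direction of greatest variance of centered data via the eigenvectors of the covariance matrix, then project onto it. The height values below are hypothetical; the two features (inches and cm) are almost perfectly correlated, so one component captures nearly all the variance.

```python
import numpy as np

# Hypothetical heights: the cm column is 2.54 * inches plus small noise
inches = np.array([60.0, 62.0, 65.0, 68.0, 70.0, 72.0])
cm = inches * 2.54 + np.array([0.5, -0.3, 0.2, -0.4, 0.1, -0.1])
X = np.column_stack([inches, cm])

Xc = X - X.mean(axis=0)                 # center the data
cov = Xc.T @ Xc / (len(X) - 1)          # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
u1 = eigvecs[:, -1]                     # direction of greatest variance

z = Xc @ u1                             # 1D representation of each sample
explained = eigvals[-1] / eigvals.sum()
print(f"variance explained by 1 component: {explained:.4f}")
```

Because inches and cm are redundant, the single number z per person preserves essentially all of the original information.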
Data represented in two dimensions
The face becomes recognizable around the 7th or 8th image, though the reconstruction is still not perfect.
Reconstruction 1
In this next image, we show a similar picture, but with each additional face representing an additional 8 principal components.
You can see that it takes a rather large number of images before the picture looks
totally correct.
Source: https://www.cs.princeton.edu/~cdecoro/eigenfaces/
Reconstruction 2
However, in this next image, we show results on a dataset that excludes all images with glasses or unusual lighting conditions.
The point to keep in mind is that each new image represents one additional principal component. As you can see, the reconstruction converges extremely quickly.
Original Image
Reconstruction Error vs PCA Dimensions
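The reconstruction-error curve can be sketched on synthetic data: most of the variance is placed in a 3-dimensional subspace, so the error drops sharply until k reaches 3 and then levels off at the noise floor. All sizes and scales here are made-up for illustration.

```python
import numpy as np

# Synthetic 12D data whose variance lives mostly in a 3D subspace
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3)) * np.array([5.0, 3.0, 2.0])
basis = np.linalg.qr(rng.normal(size=(12, 3)))[0]  # orthonormal 12x3 basis
X = latent @ basis.T + 0.1 * rng.normal(size=(200, 12))  # plus small noise

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = principal directions

errors = []
for k in range(1, 13):
    X_hat = (Xc @ Vt[:k].T) @ Vt[:k]   # project onto the top-k components
    errors.append(np.mean((Xc - X_hat) ** 2))

print([round(e, 3) for e in errors])   # sharp drop until k = 3, then flat
```

This mirrors the eigenface plots: past the number of dominant directions, extra components only chase noise.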
PCA Compression: 144D => 60D
PCA Compression: 144D => 16D
PCA Compression: 144D => 6D
PCA Compression: 144D => 3D
PCA: a useful preprocessing step
• Helps reduce computational complexity
• Caveats:
– Directions of greatest variance may not be the most informative (i.e., may not carry the greatest classification power).
Problematic Dataset for PCA
PCA summary
Acknowledgements
• Aarthi Singh, Andrew Ng, Barnabás Póczos