INF2008 Lecture09

This document discusses Principal Component Analysis (PCA) in the context of machine learning, focusing on its importance for dimensionality reduction and real-world applications. It outlines the steps involved in performing PCA, including standardizing data, computing the covariance matrix, and interpreting eigenvalues and eigenvectors. The document also provides a motivating example related to food choices to illustrate how PCA simplifies complex data into key factors.

INF2008: Machine Learning

Unsupervised Learning (II)


Principal Component Analysis

Week 09
Learning Objectives
• By the end of this lecture, students should be able to:
• Understand the Need for PCA
• Explain why dimensionality reduction is useful.
• Identify real-world applications where PCA is used.
• Perform PCA Step by Step
• Interpret PCA Results
• Explain what principal components represent in the context of the dataset.
• Analyze how much variance is captured by different principal components.
• Interpret scatter plots of PCA-transformed data.
• Compare Custom PCA with Sklearn Implementation
• Validate results by comparing custom calculations with sklearn's PCA.
Motivating Example: PCA and Understanding Food Choices
• Imagine you are trying to eat healthier, but you’re overwhelmed by the amount of information on food labels.

• Every food item has a lot of numbers:


• Calories
• Carbohydrates
• Sugar
• Fat
• Protein
• Sodium (Salt)

• That’s a lot of information to think about for every meal! Hence it is useful to simplify these numbers down to just a few key factors.
Motivating Example: PCA and Understanding Food Choices
• Step 1: Finding the Important Patterns
• You realize that some of these numbers tend to be related:
• Foods high in carbohydrates are often high in sugar too (like cakes and soft drinks).
• Foods high in fat are often high in calories (like fried food).
• Foods with high sodium usually don’t have much sugar (like potato chips or processed meats).
• Instead of worrying about all these numbers separately, you could group them into a few key factors that summarize
the most important health aspects!

• Step 2: Reducing Complexity


• Let’s say you have three main health concerns:
• Energy intake → This combines calories, carbs, and sugar (because they all contribute to energy levels).
• Heart health → This combines fat and sodium (since both impact cardiovascular health).
• Protein intake → Since protein works differently, it can remain separate.

• Now, instead of tracking six different numbers, you’ve reduced everything to just three key factors.

• This is exactly what PCA does with complex data—it finds patterns and reduces the number of things you need to
focus on while keeping the most important information intact.
Motivating Example: PCA and Understanding Food Choices
• Step 3: Applying This Knowledge
• If you need more energy, you can focus on the Energy intake factor instead of separately checking calories, carbs, and
sugar.
• If you're watching your heart health, you focus on the Heart health factor instead of stressing over fat and sodium
separately.
• If you need more muscle recovery, you check the Protein intake factor.

• What is the advantage of doing this? You make smarter and faster decisions without getting lost in too much detail.

• Just like we grouped similar food components together, PCA takes large amounts of raw data (like all the food label
numbers) and finds the most meaningful patterns. It then reduces the complexity so we can focus on the key factors that
matter most.
Components in PCA
• In PCA, components are the original individual variables (features) in the dataset.

• In our food example, these are the detailed nutritional values for each food:
• Calories
• Carbohydrates
• Sugar
• Fat
• Protein
• Sodium (Salt)

• Each of these components contributes differently to a food’s overall nutritional profile, just like how different variables
contribute to a dataset.
Principal Components in PCA
• PCA finds patterns in the data and creates Principal Components (PCs), which are new, combined variables that capture the
most important information in a simplified way.

• How the Principal Components are Formed:

• PCA analyzes the relationships among components and finds which ones tend to vary together.

• For example:
• Calories, Carbohydrates, and Sugar are often high together (think of soft drinks and cakes).
• Fat and Sodium tend to be high together (think of fried food and processed meat).
• Protein behaves independently (e.g., chicken breast is high in protein but low in sugar and carbs).
Step 1: Standardize the Data
Food Item Calories Carbohydrates Sugar Fat Sodium Protein
Chicken 165.0 0.0 0.0 3.6 74.0 31.0
Apple 95.0 25.0 19.0 0.3 2.0 0.5
Cake 350.0 50.0 35.0 15.0 300.0 5.0

Formula: X_scaled = (X − X̄) / σ

Food Item Original Value Mean Std Dev Scaled Value
Chicken 165.0 203.3333 107.57426375 -0.35634
Apple 95.0 203.3333 107.57426375 -1.00706
Cake 350.0 203.3333 107.57426375 1.3634

Calories_scaled column: -0.35634, -1.00706, 1.3634
Test Yourself 1: Calculate the scaled values for sugar
Food Item Calories Carbohydrates Sugar Fat Sodium Protein
Chicken 165.0 0.0 0.0 3.6 74.0 31.0
Apple 95.0 25.0 19.0 0.3 2.0 0.5
Cake 350.0 50.0 35.0 15.0 300.0 5.0

Formula: X_scaled = (X − X̄) / σ

Std Dev (σ) per feature:
Calories Carbohydrates Sugar Fat Sodium Protein
107.57426375 20.41241452 14.30617582 6.2976186 126.95756071 13.44329655

Food Item Original Value Mean Std Dev Scaled Value
Chicken
Apple
Cake
Test Yourself 1: Calculate the scaled values for sugar
Food Item Calories Carbohydrates Sugar Fat Sodium Protein
Chicken 165.0 0.0 0.0 3.6 74.0 31.0
Apple 95.0 25.0 19.0 0.3 2.0 0.5
Cake 350.0 50.0 35.0 15.0 300.0 5.0

Formula: X_scaled = (X − X̄) / σ

Food Item Original Value Mean Std Dev Scaled Value
Chicken 0.0 18 14.30617582 -1.2582
Apple 19.0 18 14.30617582 0.0699
Cake 35.0 18 14.30617582 1.188298

Full standardized dataset:
Food Item Calories_scaled Carbohydrates_scaled Sugar_scaled Fat_scaled Sodium_scaled Protein_scaled
Chicken -0.35634 -1.22474 -1.2582 -0.42873 -0.40433 1.400946
Apple -1.00706 0 0.0699 -0.95274 -0.97145 -0.86784
Cake 1.363399 1.224745 1.188298 1.381475 1.375788 -0.5331
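A minimal NumPy sketch of this standardization step, assuming the three-item food table above (array and variable names are illustrative):

```python
import numpy as np

# Three food items x six features: Calories, Carbs, Sugar, Fat, Sodium, Protein
X = np.array([
    [165.0,  0.0,  0.0,  3.6,  74.0, 31.0],   # Chicken
    [ 95.0, 25.0, 19.0,  0.3,   2.0,  0.5],   # Apple
    [350.0, 50.0, 35.0, 15.0, 300.0,  5.0],   # Cake
])

# Standardize each column: subtract the column mean, divide by the (population) std dev
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.round(5))   # first column ≈ [-0.35634, -1.00706, 1.3634], matching the table
```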
Step 2: Compute the Covariance Matrix
• The covariance matrix calculates how nutritional values are connected to one another.
Calories_scaled Carbohydrates_scaled Sugar_scaled Fat_scaled Sodium_scaled Protein_scaled
-0.35634 -1.22474 -1.2582 -0.42873 -0.40433 1.400946
-1.00706 0 0.0699 -0.95274 -0.97145 -0.86784
1.363399 1.224745 1.188298 1.381475 1.375788 -0.5331

• C = (1/m) · X_scaledᵀ X_scaled

• C is an n×n covariance matrix
• m is the number of samples (food items)
• n is the number of features (calories, carbs, sugar, etc.)

Calories_scaled Carbohydrates_scaled Sugar_scaled Fat_scaled Sodium_scaled Protein_scaled


Calories_scaled 1 0.702082 0.666027 0.99858 0.999379 -0.11736
Carbohydrates_scaled 0.702082 1 0.998778 0.739014 0.726732 -0.78957
Sugar_scaled 0.666027 0.998778 1 0.704813 0.691891 -0.81894
Fat_scaled 0.99858 0.739014 0.704813 1 0.999837 -0.17009
Sodium_scaled 0.999379 0.726732 0.691891 0.999837 1 -0.15227
Protein_scaled -0.11736 -0.78957 -0.81894 -0.17009 -0.15227 1
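Continuing the sketch above (reusing X_scaled), the covariance matrix can be computed as shown below; since the data is standardized, this is also the correlation matrix:

```python
# Covariance matrix of the standardized data: C = (1/m) * X_scaled^T * X_scaled
m = X_scaled.shape[0]
C = (X_scaled.T @ X_scaled) / m
print(C.round(3))                                  # 6x6; e.g. Calories vs Fat ≈ 0.999

# Equivalent result via NumPy's built-in covariance (population normalization)
C_check = np.cov(X_scaled, rowvar=False, bias=True)
```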
Step 2: Compute the Covariance Matrix (Interpretation)
Covariance (C) Calories_scaled Carbohydrates_scaled Sugar_scaled Fat_scaled Sodium_scaled Protein_scaled
Calories_scaled 1 0.70 0.67 1 1 -0.12
Carbohydrates_scaled 0.70 1 1 0.74 0.73 -0.79
Sugar_scaled 0.67 1 1 0.70 0.69 -0.82
Fat_scaled 1 0.74 0.70 1 1 -0.17
Sodium_scaled 1 0.73 0.70 1 1 -0.15
Protein_scaled -0.12 -0.79 -0.82 -0.17 -0.15 1

• The covariance matrix tells us how different features (Calories, Carbohydrates, Sugar, etc.) relate to each other:
• Positive values indicate a positive relationship (i.e., when one value increases, the other also tends to increase).
• Negative values indicate an inverse relationship (i.e., when one value increases, the other tends to decrease).
• Values close to zero suggest little or no linear relationship between the features.

• Observations from the Covariance Matrix:


• Calories & Fat have a very strong positive correlation (0.999) → This means foods high in calories tend to be high in fat.
• Carbohydrates & Sugar have a strong correlation (0.999) → This makes sense since many high-carb foods also contain
sugar.
• Protein has negative correlations with most features → This suggests that high-protein foods (like Chicken) are less
related to high-carb or high-fat foods.
Test Yourself 2:
• My dataset has 200 rows and each row has 5 attributes. It can be decomposed by PCA to capture 95% of the variance in about 1 minute.

• What is the size of the covariance matrix?

• 5 by 5, 200 by 200 or 10 by 10?


Test Yourself 2:
• My dataset has 200 rows and each row has 5 attributes. It can be decomposed by PCA to capture 95% of the variance in about 1 minute.

• What is the size of the covariance matrix?

• 5 by 5, 200 by 200 or 10 by 10?

• The size of the covariance matrix in PCA is determined by the number of features (attributes), not the number of samples
(rows).

• Since your dataset has 5 attributes, the covariance matrix will be 5 × 5 because it captures the variance and relationships
between the attributes.
Step 3: Compute Eigenvalues and Eigenvectors
• The next step computes the eigenvectors and eigenvalues of the covariance matrix through a procedure known as eigendecomposition.

𝐶𝑊 = 𝜆𝑊

• C is the covariance matrix calculated from the last step.

• W is the eigenvector matrix (each column is a principal component direction).

• λ (eigenvalues) indicate how much variance each principal component explains.

Eigenvectors (principal component directions):
Feature PC1 PC2 PC3
Calories 0.4227 0.3601 -0.466
Carbohydrates 0.44288 -0.2755 0.53903
Sugar 0.43458 -0.3135 -0.5096
Fat 0.43303 0.32008 0.47912
Sodium 0.42967 0.33374 -0.0483
Protein -0.2534 0.69448 0.02775

Eigenvalues and explained variance:
PC Dir Eigenvalue Variance (%)
PC1 (Energy) 4.52950863 75.49%
PC2 (Heart) 1.47049137 24.50%
PC3 (Protein) 0 0%

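A minimal sketch of the eigendecomposition, continuing the NumPy example above:

```python
# Eigendecomposition of the covariance matrix: C w = lambda w
eigvals, eigvecs = np.linalg.eigh(C)           # eigh is appropriate for symmetric matrices

# Sort eigenpairs by descending eigenvalue so PC1 explains the most variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals[:3].round(4))                    # ≈ [4.5295, 1.4705, 0.0]
print((eigvals / eigvals.sum()).round(4))      # explained variance ratio ≈ [0.7549, 0.2450, 0, ...]
# Note: eigenvector signs are arbitrary, so columns of eigvecs may be flipped
# relative to the loadings shown in the slide tables.
```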

Step 3: Compute Eigenvalues and Eigenvectors (Interpretation)
Eigenvectors (principal component directions):
Feature PC1 PC2 PC3
Calories 0.4227 0.3601 -0.466
Carbohydrates 0.44288 -0.2755 0.53903
Sugar 0.43458 -0.3135 -0.5096
Fat 0.43303 0.32008 0.47912
Sodium 0.42967 0.33374 -0.0483
Protein -0.2534 0.69448 0.02775

Eigenvalues and explained variance:
PC Dir Eigenvalue Variance (%)
PC1 (Energy) 4.52950863 75.49%
PC2 (Heart) 1.47049137 24.50%
PC3 (Protein) 0 0%

Principal Component 1 (PC1)
• Strongly influenced by:
• Calories (0.4227)
• Carbohydrates (0.44288)
• Sugar (0.43458)
• Fat (0.43303)
• Sodium (0.42967)
• Weak influence from:
• Protein (-0.2534)

PC1 represents a general measure of energy-dense foods. Calories, Carbohydrates, Sugar, Fat, and Sodium all contribute positively, meaning they increase along PC1. Protein has a smaller negative contribution, suggesting that foods higher in protein tend to have lower energy density.
Test Yourself 3:
3.1 What does a high score in PC1 indicate?
a) The food item is high in Protein but low in Sugar and Carbohydrates.
b) The food item is high in Calories, Carbohydrates, Sugar, Fat, and Sodium.
c) The food item is mostly water and contains minimal nutrients.
d) The food item is primarily rich in Sodium but low in Calories.

3.2 A food item that is high in Protein but low in Sugar and Carbohydrates is likely to have…
a) A high score on PC1
b) A low score on PC1
c) A high score on PC2
d) A low score on PC2
Test Yourself 3:
3.1 What does a high score in PC1 indicate?
a) The food item is high in Protein but low in Sugar and Carbohydrates.
b) The food item is high in Calories, Carbohydrates, Sugar, Fat, and Sodium.
c) The food item is mostly water and contains minimal nutrients.
d) The food item is primarily rich in Sodium but low in Calories.

Answer: B (PC1 represents overall energy density, so a high score means a food is high in Calories, Carbs, Sugar, Fat, and
Sodium.)

3.2 A food item that is high in Protein but low in Sugar and Carbohydrates is likely to have…
a) A high score on PC1
b) A low score on PC1
c) A high score on PC2
d) A low score on PC2

Answer: C (PC2 contrasts Protein-rich foods against Carbohydrate/Sugar-rich foods, so a high-PC2 score means a food is rich in
Protein.)
Step 4: Compute Principal Components
• In this step, we project the standardized data onto the eigenvectors (principal component directions).

• What this means is that we use the following formula:


• Principal Components = Standardized Data × Eigenvectors
• Standardized Data = The scaled data from Step 1
• Eigenvectors = The principal component directions from Step 3.

• Each row in the result represents a transformed data point in the new PCA space.

• Each row of 6 features will be converted into the PC numbers (PC1, PC2 and PC3)

• Instead of representing food items by Calories, Carbs, Sugar, etc., they are now represented in terms of PC1, PC2, and
PC3.
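A minimal sketch of this projection, continuing the NumPy example above (eigenvector signs are arbitrary, so columns may be flipped relative to the slide values):

```python
# Project the standardized data onto the top three principal component directions
W = eigvecs[:, :3]                 # 6x3 matrix whose columns are PC1, PC2, PC3
scores = X_scaled @ W              # 3x3: each food item expressed as (PC1, PC2, PC3)
print(scores.round(4))             # Chicken's PC1 ≈ -1.954 (possibly with a flipped sign)
```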
Step 4: Compute Principal Components
Calories_scaled Carbohydrates_scaled Sugar_scaled Fat_scaled Sodium_scaled Protein_scaled
Chicken -0.35634 -1.22474 -1.2582 -0.42873 -0.40433 1.400946
Apple -1.00706 0 0.0699 -0.95274 -0.97145 -0.86784
Cake 1.363399 1.224745 1.188298 1.381475 1.375788 -0.5331

Multiply (×) the scaled data above by the eigenvector matrix:

Feature PC1 PC2 PC3
Calories 0.4227 0.3601 -0.466
Carbohydrates 0.44288 -0.2755 0.53903
Sugar 0.43458 -0.3135 -0.5096
Fat 0.43303 0.32008 0.47912
Sodium 0.42967 0.33374 -0.0483
Protein -0.2534 0.69448 0.02775

Worked example: Chicken's PC1 score
 Calories Carbohydrates Sugar Fat Sodium Protein Sum (PC1)
Scaled values -0.35634 -1.22474 -1.2582 -0.42873 -0.40433 1.400946
× PC1 loadings 0.4227 0.44288 0.43458 0.43303 0.42967 -0.2534
Products -0.15062 -0.54241 -0.54679 -0.18565 -0.17373 -0.355 -1.95421
Test Yourself 4:
Calories_scaled Carbohydrates_scaled Sugar_scaled Fat_scaled Sodium_scaled Protein_scaled
Chicken -0.35634 -1.22474 -1.2582 -0.42873 -0.40433 1.400946
Apple -1.00706 0 0.0699 -0.95274 -0.97145 -0.86784
Cake 1.363399 1.224745 1.188298 1.381475 1.375788 -0.5331

Multiply (×) the scaled data above by the eigenvector matrix:

Feature PC1 PC2 PC3
Calories 0.4227 0.3601 -0.466
Carbohydrates 0.44288 -0.2755 0.53903
Sugar 0.43458 -0.3135 -0.5096
Fat 0.43303 0.32008 0.47912
Sodium 0.42967 0.33374 -0.0483
Protein -0.2534 0.69448 0.02775

Compute Apple's PC2 score:
 Calories Carbohydrates Sugar Fat Sodium Protein Sum (PC2)
Apple PC2 ? ? ? ? ? ? ?
Test Yourself 4:
Calories_scaled Carbohydrates_scaled Sugar_scaled Fat_scaled Sodium_scaled Protein_scaled
Chicken -0.35634 -1.22474 -1.2582 -0.42873 -0.40433 1.400946
Apple -1.00706 0 0.0699 -0.95274 -0.97145 -0.86784
Cake 1.363399 1.224745 1.188298 1.381475 1.375788 -0.5331

Multiply (×) the scaled data above by the eigenvector matrix:

Feature PC1 PC2 PC3
Calories 0.4227 0.3601 -0.466
Carbohydrates 0.44288 -0.2755 0.53903
Sugar 0.43458 -0.3135 -0.5096
Fat 0.43303 0.32008 0.47912
Sodium 0.42967 0.33374 -0.0483
Protein -0.2534 0.69448 0.02775

Worked answer: Apple's PC2 score
 Calories Carbohydrates Sugar Fat Sodium Protein Sum (PC2)
Scaled values -1.00706 0 0.0699 -0.95274 -0.97145 -0.86784
× PC2 loadings (rounded) 0.36 -0.276 -0.314 0.32 0.334 0.694
Products -0.36264 0 -0.02191 -0.30495 -0.32421 -0.6027 -1.61642
Step 4: Calculated Principal Components
Calories_scaled Carbohydrates_scaled Sugar_scaled Fat_scaled Sodium_scaled Protein_scaled
Chicken -0.35634 -1.22474 -1.2582 -0.42873 -0.40433 1.400946
Apple -1.00706 0 0.0699 -0.95274 -0.97145 -0.86784
Cake 1.363399 1.224745 1.188298 1.381475 1.375788 -0.5331

Multiply (×) the scaled data above by the eigenvector matrix:

Feature PC1 PC2 PC3
Calories 0.4227 0.3601 -0.466
Carbohydrates 0.44288 -0.2755 0.53903
Sugar 0.43458 -0.3135 -0.5096
Fat 0.43303 0.32008 0.47912
Sodium 0.42967 0.33374 -0.0483
Protein -0.2534 0.69448 0.02775

Result (one row per food item):

Food Item PC1 PC2 PC3
Chicken -1.95416 1.304314 -4.25E-16
Apple -1.00539 -1.61642 6.86903772e-17
Cake 2.959554 0.31211 7.26546624e-17


Step 5: Analysis of Principal Components
Each point in the scatter plot represents a data row (e.g., a food item). Each food item is plotted against only two principal components at a time.

The visualization of the variance can help you see how much variance each principal component captures.
Code Analysis (I)

https://colab.research.google.com/drive/1VlOo4BL3V4aV14i15sXUiePy4EKi6oHW?usp=sharing
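A minimal sketch of the comparison described in the learning objectives, assuming the X_scaled array and manual scores from the earlier sketches:

```python
# Sanity check: compare the manual PCA above with scikit-learn's implementation
from sklearn.decomposition import PCA

pca = PCA(n_components=3)
sk_scores = pca.fit_transform(X_scaled)            # sklearn re-centers internally (mean is already ~0)

print(pca.explained_variance_ratio_.round(4))      # ≈ [0.7549, 0.2450, 0.0]
print(sk_scores.round(4))                          # matches the manual scores up to a sign flip per column
```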
INF2008: Machine Learning
Unsupervised Learning (II)
t-SNE

Week 09
Learning Objectives
• By the end of this section, learners should be able to:
• Understand the Motivation Behind t-SNE
• Explain why t-SNE is used for dimensionality reduction.
• Critically Analyze the Behavior of t-SNE.
t-SNE Simple Intuition.
• Step 1: Measuring Probabilities in the Marble Bag (High-Dimensional Space)
• Imagine each data point is a marble in a bag: some marbles feel really similar (same size, texture), while others feel very different.
• A soft probability rule:
• If two marbles are similar, the probability of them being neighbors is high.
• If two marbles are far apart, the probability is much lower.
• Mathematically, this is done using a Gaussian distribution (a bell curve), where distances between marbles are
transformed into probabilities.

• Step 2: Random 2D Initialization


• Take all the marbles and drop them randomly on a flat table (2D space).

• Step 3: Mapping High Dimensional Space to Low Dimensional 2D Space.

• Step 4: Shift the Marbles (Optimization by minimizing KL divergence)


• If two marbles were close in the bag but are far on the table → Pull them together.
• If two marbles were far in the bag but are too close on the table → Push them apart.

• Step 5: The Marbles Settle Down (Final Visualization)


• After enough shifting (gradient descent over multiple iterations), the marbles stop moving.
t-SNE Small Image Example.
• Let’s work through an example with five 3×3 images.
• Each image will be flattened into a 9-dimensional vector:
• ImgX (p1,p2,...,p9)

Table showing the flattened pixel values.


P1 P2 P3 P4 P5 P6 P7 P8 P9
Img1 102 179 92 14 106 71 188 20 102
Img2 121 210 214 74 202 87 116 99 103
Img3 151 130 149 52 1 87 235 157 37
Img4 129 191 187 20 160 203 57 21 252
Img5 235 88 48 218 58 254 169 255 219
Step 1: Compute Pairwise Distances

• Note that t-SNE does not work directly with the raw high-dimensional data.

• Rather, we use distance measures such as pairwise Euclidean distances.

• So from the 5x9 matrix we started off with, we will instead create a 5x5 distance matrix.

• We will use the distance between Img1 and Img2 as an example.
Table showing the distance metric between Img1 and Img2.
P1 P2 P3 P4 P5 P6 P7 P8 P9
Img1 102 179 92 14 106 71 188 20 102
Img2 121 210 214 74 202 87 116 99 103
Img1-Img2 -19 -31 -122 -60 -96 -16 72 -79 -1
(Img1-Img2)² 361 961 14884 3600 9216 256 5184 6241 1
Sum of Squares 40704
Root Sum of Squares 201.7523

• Final 5 x 5 Distance Metric is below:

Table showing the pairwise distance metrics.


Img1 Img2 Img3 Img4 Img5
Img1 0 201.7523 214.4271 264.4542 417.8397
Img2 201.7523 0 272.5638 225.9557 406.6915
Img3 214.4271 272.5638 0 376.5833 353.269
Img4 264.4542 225.9557 376.5833 0 402.199
Img5 417.8397 406.6915 353.269 402.199 0
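A minimal NumPy sketch of this pairwise-distance computation (array names are illustrative):

```python
import numpy as np

# Five 3x3 images, each flattened into a 9-dimensional vector
imgs = np.array([
    [102, 179,  92,  14, 106,  71, 188,  20, 102],   # Img1
    [121, 210, 214,  74, 202,  87, 116,  99, 103],   # Img2
    [151, 130, 149,  52,   1,  87, 235, 157,  37],   # Img3
    [129, 191, 187,  20, 160, 203,  57,  21, 252],   # Img4
    [235,  88,  48, 218,  58, 254, 169, 255, 219],   # Img5
], dtype=float)

# Pairwise Euclidean distances: 5x9 data -> 5x5 symmetric matrix with zero diagonal
diff = imgs[:, None, :] - imgs[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
print(D.round(4))        # D[0, 1] ≈ 201.7523, D[0, 2] ≈ 214.4271
```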
Test Yourself 1: Calculate the distance between Img1 and Img3
Table showing the flattened pixel values.
P1 P2 P3 P4 P5 P6 P7 P8 P9
Img1 102 179 92 14 106 71 188 20 102
Img2 121 210 214 74 202 87 116 99 103
Img3 151 130 149 52 1 87 235 157 37
Img4 129 191 187 20 160 203 57 21 252
Img5 235 88 48 218 58 254 169 255 219

Table showing the distance metric between Img1 and Img3.


P1 P2 P3 P4 P5 P6 P7 P8 P9
Img1 102 179 92 14 106 71 188 20 102
Img3 151 130 149 52 1 87 235 157 37
Img1-Img3
(Img1-Img3)²
Sum of Squares
Root Sum of Squares
Test Yourself 1: Calculate the distance between Img1 and Img3
Table showing the flattened pixel values.
P1 P2 P3 P4 P5 P6 P7 P8 P9
Img1 102 179 92 14 106 71 188 20 102
Img2 121 210 214 74 202 87 116 99 103
Img3 151 130 149 52 1 87 235 157 37
Img4 129 191 187 20 160 203 57 21 252
Img5 235 88 48 218 58 254 169 255 219

Table showing the distance metric between Img1 and Img3.


P1 P2 P3 P4 P5 P6 P7 P8 P9
Img1 102 179 92 14 106 71 188 20 102
Img3 151 130 149 52 1 87 235 157 37
Img1-Img3 -49 49 -57 -38 105 -16 -47 -137 65
(Img1-Img3)² 2401 2401 3249 1444 11025 256 2209 18769 4225
Sum of Squares 45979
Root Sum of Squares 214.4271
Step 2: Convert Euclidean Distances into Similarities.
• Now that we have the pairwise distance matrix, we need to convert these distances into probabilities that represent how
"likely" two images are neighbors.

• Pj∣i is the conditional probability that point j is a neighbor of point i in the original high-dimensional space.

• The formula for Pj∣i is:

P(j|i) = exp(−d_ij² / (2σ_i²)) / Σ_{k≠i} exp(−d_ik² / (2σ_i²))

• The formula looks nightmarish but is actually quite easy to understand.

• d_ij is the distance between image i and image j that we calculated in Step 1 (it is squared inside the exponential).

• σ_i is the Gaussian bandwidth (standard deviation) associated with point i.

• The value of σ_i is chosen based on a measure called perplexity.


Step 2: What is Perplexity?
• Low Perplexity (~5-10) → Very Local Focus
• Each point only considers very close neighbors.
• t-SNE focuses on fine details but might fragment global structure.
• Good for small clusters, but may over-cluster.
• If perplexity is too low, the model may overfit local noise.

• Medium Perplexity (~30-50) → Balanced


• t-SNE considers a reasonable number of neighbors.
• Preserves both local and some global structure.

• High Perplexity (~100+) → Global Focus


• t-SNE considers farther points as well.
• Preserves global structure but may lose fine details.
• Can cause large clusters to blend together.

• Low perplexity results in a small σ_i (low variance).


Dataset Size Recommended Perplexity
Small (~100 points) 5-30 (focus on local structure)
Medium (~1000 points) 30-50 (balanced)
Large (~10,000+ points) 50-100 (global structure)
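As a practical aside, a minimal sketch of how perplexity is typically set when calling scikit-learn's t-SNE (the library interface, not the manual walkthrough that follows); note that perplexity must be smaller than the number of samples:

```python
from sklearn.manifold import TSNE

# Hypothetical call for a medium-sized dataset; `data` is an (n_samples, n_features) array
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
# embedding = tsne.fit_transform(data)       # returns an (n_samples, 2) array for plotting
```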
Step 2: Calculating Probabilities

P(j|i) = exp(−d_ij² / (2σ_i²)) / Σ_{k≠i} exp(−d_ik² / (2σ_i²))

Table showing the pairwise distance metrics.
 Img1 Img2 Img3 Img4 Img5
Img1 0 201.7523 214.4271 264.4542 417.8397
Img2 201.7523 0 272.5638 225.9557 406.6915
Img3 214.4271 272.5638 0 376.5833 353.269
Img4 264.4542 225.9557 376.5833 0 402.199
Img5 417.8397 406.6915 353.269 402.199 0

• Assume we want to calculate P(2|1) (1st row, second column).

• We would need d12 = 201.7523 and row 1 of the distance matrix.

• Let's assume a fixed σ1 = 100.

• Compute the exponential weight:

exp(−d12² / (2σ1²)) = exp(−(201.7523)² / (2 × 100²))
                    = exp(−40703.9905 / 20000)
                    = exp(−2.0352)
                    ≈ 0.13065

Exponential Unnormalized.
 Img1 Img2 Img3 Img4 Img5 Sum
Img1 0.00000 0.13065 0.10036 0.03029 0.00016 0.261474
Img2 0.13065 0.00000 0.02437 0.07786 0.00026 0.23314
Img3 0.10036 0.02437 0.00000 0.00083 0.00195 0.127513
Img4 0.03029 0.07786 0.00083 0.00000 0.00031 0.109297
Img5 0.00016 0.00026 0.00195 0.00031 0.00000 0.002675

P(2|1) = 0.13065 / 0.261474 = 0.49968

Normalized Probabilities.
 Img1 Img2 Img3 Img4 Img5
Img1 0 0.49968 0.38384 0.11586 0.00062
Img2 0.56041 0 0.10451 0.33398 0.0011
Img3 0.78709 0.19109 0 0.00653 0.01529
Img4 0.27717 0.7124 0.00762 0 0.00281
Img5 0.06047 0.09574 0.72897 0.11482 0

Img1 and Img2 have a 50% chance of being neighbors.


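A minimal sketch of this calculation, reusing the distance matrix D from the earlier sketch and the fixed σ = 100 assumption:

```python
# Conditional probabilities P(j|i), assuming a fixed sigma = 100 for every point
sigma = 100.0
weights = np.exp(-(D ** 2) / (2 * sigma ** 2))    # Gaussian kernel on the distance matrix D
np.fill_diagonal(weights, 0.0)                    # a point is never its own neighbor
P = weights / weights.sum(axis=1, keepdims=True)  # normalize each row to sum to 1
print(P.round(5))                                 # P[0, 1] ≈ 0.49968, P[0, 2] ≈ 0.38384
```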
Test Yourself 2: Calculate P(3|1)

P(j|i) = exp(−d_ij² / (2σ_i²)) / Σ_{k≠i} exp(−d_ik² / (2σ_i²))

Table showing the pairwise distance metrics.
 Img1 Img2 Img3 Img4 Img5
Img1 0 201.7523 214.4271 264.4542 417.8397
Img2 201.7523 0 272.5638 225.9557 406.6915
Img3 214.4271 272.5638 0 376.5833 353.269
Img4 264.4542 225.9557 376.5833 0 402.199
Img5 417.8397 406.6915 353.269 402.199 0

exp(−d13² / (2σ1²)) = ?

P(3|1) = ?

Img1 and Img3 have an x% chance of being neighbors.

Exponential Unnormalized.
 Img1 Img2 Img3 Img4 Img5 Sum
Img1 0.00000 0.13065 0.10036 0.03029 0.00016 0.261474
Img2 0.13065 0.00000 0.02437 0.07786 0.00026 0.23314
Img3 0.10036 0.02437 0.00000 0.00083 0.00195 0.127513
Img4 0.03029 0.07786 0.00083 0.00000 0.00031 0.109297
Img5 0.00016 0.00026 0.00195 0.00031 0.00000 0.002675

Normalized Probabilities.
 Img1 Img2 Img3 Img4 Img5
Img1 0 0.49968 0.38384 0.11586 0.00062
Img2 0.56041 0 0.10451 0.33398 0.0011
Img3 0.78709 0.19109 0 0.00653 0.01529
Img4 0.27717 0.7124 0.00762 0 0.00281
Img5 0.06047 0.09574 0.72897 0.11482 0
Test Yourself 2: Calculate P(3|1)

Table showing the pairwise distance metrics.
 Img1 Img2 Img3 Img4 Img5
Img1 0 201.7523 214.4271 264.4542 417.8397
Img2 201.7523 0 272.5638 225.9557 406.6915
Img3 214.4271 272.5638 0 376.5833 353.269
Img4 264.4542 225.9557 376.5833 0 402.199
Img5 417.8397 406.6915 353.269 402.199 0

P(j|i) = exp(−d_ij² / (2σ_i²)) / Σ_{k≠i} exp(−d_ik² / (2σ_i²))

exp(−d13² / (2σ1²)) = exp(−(214.4271)² / (2 × 100²))
                    = exp(−45978.98 / 20000)
                    = exp(−2.298949)
                    ≈ 0.10036

P(3|1) = 0.10036 / 0.261474 = 0.38384

Img1 and Img3 have a 38% chance of being neighbors.

Exponential Unnormalized.
 Img1 Img2 Img3 Img4 Img5 Sum
Img1 0.00000 0.13065 0.10036 0.03029 0.00016 0.261474
Img2 0.13065 0.00000 0.02437 0.07786 0.00026 0.23314
Img3 0.10036 0.02437 0.00000 0.00083 0.00195 0.127513
Img4 0.03029 0.07786 0.00083 0.00000 0.00031 0.109297
Img5 0.00016 0.00026 0.00195 0.00031 0.00000 0.002675

Normalized Probabilities.
 Img1 Img2 Img3 Img4 Img5
Img1 0 0.49968 0.38384 0.11586 0.00062
Img2 0.56041 0 0.10451 0.33398 0.0011
Img3 0.78709 0.19109 0 0.00653 0.01529
Img4 0.27717 0.7124 0.00762 0 0.00281
Img5 0.06047 0.09574 0.72897 0.11482 0
Step 3: Generate random (x, y) coordinates for each points in 2D space.
• We generate random (x, y) coordinates for each of our 5 points in 2D space:
• Yi∼N(0,0.01)

• where Yi is the 2D position for point i.

• This ensures a small random spread around zero, allowing gradual movement during optimization.
Random Points in 2D Space.
X (2D) Y (2D)
Img1 49.67142 -13.8264
Img2 64.76885 152.303
Img3 -23.4153 -23.4137
Img4 157.9213 76.74347
Img5 -46.9474 54.256
Step 4: Compute Q_ij (Low-Dimensional Similarities)

Random Points in 2D Space.
 X (2D) Y (2D)
Img1 49.67142 -13.8264
Img2 64.76885 152.303
Img3 -23.4153 -23.4137
Img4 157.9213 76.74347
Img5 -46.9474 54.256

Squared Euclidean Distance
 Img1 Img2 Img3 Img4 Img5
Img1 0 27826.92 5433.589 19920.94 13970.42
Img2 27826.92 0 38652.8 14386.61 22093.74
Img3 5433.589 38652.8 0 42914.43 6586.342
Img4 19920.94 14386.61 42914.43 0 42476.88
Img5 13970.42 22093.74 6586.342 42476.88 0

Q_ij = (1 + ||Y_i − Y_j||²)⁻¹ / Σ_{k≠l} (1 + ||Y_k − Y_l||²)⁻¹

Compute the squared Euclidean distance:
||Y1 − Y2||² = (x2 − x1)² + (y2 − y1)²
            = (49.67142 − 64.76885)² + (−13.8264 − 152.303)²
            = (−15.0974)² + (−166.1294)²
            = 227.9326 + 27598.98
            = 27826.92

So, the squared distance is 27826.92.

Numerator = (1 + ||Y1 − Y2||²)⁻¹
          = (1 + 27826.92)⁻¹
          = 3.59351E-05

Denominator = 0.001362 (working shown later)

Q12 = 3.59351E-05 / 0.001362 = 0.026384

Q normalized.
 Img1 Img2 Img3 Img4 Img5
Img1 0 0.026384 0.135101 0.036855 0.052551
Img2 0.026384 0 0.018995 0.051031 0.03323
Img3 0.135101 0.018995 0 0.017108 0.111459
Img4 0.036855 0.051031 0.017108 0 0.017285
Img5 0.052551 0.03323 0.111459 0.017285 0
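A minimal sketch of the Q computation under the same random 2D initialization (the denominator sums over all ordered pairs k ≠ l):

```python
# Low-dimensional similarities Q_ij with the Student-t kernel, using the
# randomly initialized 2D coordinates Y from Step 3
Y = np.array([
    [ 49.67142,  -13.8264 ],   # Img1
    [ 64.76885,  152.303  ],   # Img2
    [-23.4153,   -23.4137 ],   # Img3
    [157.9213,    76.74347],   # Img4
    [-46.9474,    54.256  ],   # Img5
])

diff2d = Y[:, None, :] - Y[None, :, :]
D2 = (diff2d ** 2).sum(axis=-1)        # squared Euclidean distances in 2D
num = 1.0 / (1.0 + D2)                 # (1 + d^2)^-1 kernel
np.fill_diagonal(num, 0.0)
Q = num / num.sum()                    # normalize over all ordered pairs k != l
print(Q.round(6))                      # Q[0, 1] ≈ 0.026384, Q[0, 2] ≈ 0.135101
```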
Test Yourself 3: Calculate Q13

Random Points in 2D Space.
 X (2D) Y (2D)
Img1 49.67142 -13.8264
Img2 64.76885 152.303
Img3 -23.4153 -23.4137
Img4 157.9213 76.74347
Img5 -46.9474 54.256

Squared Euclidean Distance
 Img1 Img2 Img3 Img4 Img5
Img1 0 27826.92 5433.589 19920.94 13970.42
Img2 27826.92 0 38652.8 14386.61 22093.74
Img3 5433.589 38652.8 0 42914.43 6586.342
Img4 19920.94 14386.61 42914.43 0 42476.88
Img5 13970.42 22093.74 6586.342 42476.88 0

Q_ij = (1 + ||Y_i − Y_j||²)⁻¹ / Σ_{k≠l} (1 + ||Y_k − Y_l||²)⁻¹

Compute the squared Euclidean distance:
||Y1 − Y3||² = (x3 − x1)² + (y3 − y1)² = ?

Numerator = (1 + ||Y1 − Y3||²)⁻¹ = ?

Denominator = 0.001362 (working shown later)

Q13 = ? / 0.001362 = ?

Q normalized.
 Img1 Img2 Img3 Img4 Img5
Img1 0 0.026384 0.135101 0.036855 0.052551
Img2 0.026384 0 0.018995 0.051031 0.03323
Img3 0.135101 0.018995 0 0.017108 0.111459
Img4 0.036855 0.051031 0.017108 0 0.017285
Img5 0.052551 0.03323 0.111459 0.017285 0
Test Yourself 3: Calculate Q13

Random Points in 2D Space.
 X (2D) Y (2D)
Img1 49.67142 -13.8264
Img2 64.76885 152.303
Img3 -23.4153 -23.4137
Img4 157.9213 76.74347
Img5 -46.9474 54.256

Squared Euclidean Distance
 Img1 Img2 Img3 Img4 Img5
Img1 0 27826.92 5433.589 19920.94 13970.42
Img2 27826.92 0 38652.8 14386.61 22093.74
Img3 5433.589 38652.8 0 42914.43 6586.342
Img4 19920.94 14386.61 42914.43 0 42476.88
Img5 13970.42 22093.74 6586.342 42476.88 0

Q_ij = (1 + ||Y_i − Y_j||²)⁻¹ / Σ_{k≠l} (1 + ||Y_k − Y_l||²)⁻¹

Compute the squared Euclidean distance:
||Y1 − Y3||² = (x3 − x1)² + (y3 − y1)²
            = (49.67142 − (−23.4153))² + (−13.8264 − (−23.4137))²
            = (73.08672)² + (9.5873)²
            = 5341.669 + 91.91632
            = 5433.585

So, the squared distance is 5433.585.

Numerator = (1 + ||Y1 − Y3||²)⁻¹
          = (1 + 5433.585)⁻¹
          = 0.000184

Denominator = 0.001362 (working shown later)

Q13 = 0.000184 / 0.001362 = 0.1351

Q normalized.
 Img1 Img2 Img3 Img4 Img5
Img1 0 0.026384 0.135101 0.036855 0.052551
Img2 0.026384 0 0.018995 0.051031 0.03323
Img3 0.135101 0.018995 0 0.017108 0.111459
Img4 0.036855 0.051031 0.017108 0 0.017285
Img5 0.052551 0.03323 0.111459 0.017285 0
Step 5: Updating 2D Positions Using Gradient Descent.
Pij, Normalized Probabilities.
 Img1 Img2 Img3 Img4 Img5
Img1 0 0.49968 0.38384 0.11586 0.00062
Img2 0.56041 0 0.10451 0.33398 0.0011
Img3 0.78709 0.19109 0 0.00653 0.01529
Img4 0.27717 0.7124 0.00762 0 0.00281
Img5 0.06047 0.09574 0.72897 0.11482 0

Q normalized.
 Img1 Img2 Img3 Img4 Img5
Img1 0 0.026384 0.135101 0.036855 0.052551
Img2 0.026384 0 0.018995 0.051031 0.03323
Img3 0.135101 0.018995 0 0.017108 0.111459
Img4 0.036855 0.051031 0.017108 0 0.017285
Img5 0.052551 0.03323 0.111459 0.017285 0

x_j x1 - x_j P_1j (Reference) Q_1j P_1j - Q_1j (P_1j - Q_1j) * Q_1j (P_1j - Q_1j) * Q_1j * (x1 - x_j)
Img1 49.67142 0 0 0 0 0 0
Img2 64.76885 -15.0974 0.49968 0.026384 0.473296 0.012488 -0.18853
Img3 -23.4153 73.08672 0.38384 0.135101 0.248739 0.033605 2.456071
Img4 157.9213 -108.25 0.11586 0.036855 0.079005 0.002912 -0.31519
Img5 -46.9474 96.61882 0.00062 0.052551 -0.05193 -0.00273 -0.26368
Sum 1.688668
Sum * 4 6.75

dX1 = 4 Σ_{j≠1} (P_1j − Q_1j) Q_1j (x1 − xj)

X1_new = X1_old − learning_rate × dX1

Step 5: Updating 2D Positions Using Gradient Descent.
dX1 = 4 Σ_{j≠1} (P_1j − Q_1j) Q_1j (x1 − xj)

• The 4 comes from the derivative of KL divergence with respect to the Student’s t-distribution kernel.

• What Does 𝑃1𝑗 − 𝑄1𝑗 Mean:


• Pij (High-D Similarities):
• This represents how close two points should be in 2D based on their original high-dimensional structure.
• If Pij ​is large, the two points should be close.
• Qij (Low-D Similarities):
• This represents how close the points actually are in 2D.
• If Qij​ is too small, they are too far apart in 2D.
• If Qij​ is too large, they are too close in 2D.

• If Pij > Qij


• The points should be closer → Pull together

• If Pij < Qij


• The points should be further apart → Push apart
Step 5: Updating 2D Positions Using Gradient Descent.
dX1 = 4 Σ_{j≠1} (P_1j − Q_1j) Q_1j (x1 − xj)

• If Q1j is large, the points are already close, so we move them less to prevent overcorrection.

• Why Multiply by (𝑥1−𝑥𝑗)?

• This term determines the direction of movement for 𝑥1


• The (𝑥1−𝑥𝑗) term ensures movement is proportional to distance.

• If two points are very far apart, the movement is larger.

• If two points are already close, the movement is smaller.
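A minimal sketch of one such update, reusing the P, Q, and Y arrays from the earlier sketches (the learning rate is an illustrative assumption, not a value given in the slides):

```python
# One gradient-descent update of the 2D positions, following the slide's
# simplified gradient: dY_i = 4 * sum_j (P_ij - Q_ij) * Q_ij * (Y_i - Y_j)
learning_rate = 10.0                                    # illustrative value only

PQ = (P - Q) * Q                                        # (P_ij - Q_ij) * Q_ij, elementwise
grad = 4.0 * (PQ[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
Y_new = Y - learning_rate * grad                        # points move to better match P

print(grad[0].round(2))                                 # x-component of dX1 ≈ 6.75, as in the worked table
```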
