
Introduction to Data Science

Dr. Irfan Yousuf


Department of Computer Science (New Campus)
UET, Lahore
(Week 14; April 22 - 26, 2024)
Outline
• Eigenvectors
• Dimensionality Reduction
• Principal Component Analysis
• Singular Value Decomposition
Data Preprocessing
Transformation
• In geometry, transformation refers to the movement of
objects in the coordinate plane.
• In mathematics, a transformation is a function f (usually with
some geometrical meanings) that maps a set X to itself.
Linear Transformation
• A linear transformation is one in which a straight line
before the transformation results in a straight line after the
transformation.
Eigenvectors and Eigenvalues
• In linear algebra, an eigenvector or characteristic vector of
a linear transformation is a nonzero vector that changes at
most by a scalar factor when that linear transformation is
applied to it.

• The corresponding eigenvalue, often denoted by λ (lambda), is the factor by which the eigenvector is scaled.
Eigenvectors and Eigenvalues

• An eigenvector is a vector whose direction remains unchanged when a linear transformation is applied to it. ("Eigen" means "own" or "characteristic" in German.)
• The transformation in this case is a simple scaling with factor
2 in the horizontal direction and factor 0.5 in the vertical
direction, such that the transformation matrix A is defined as:
A = [ 2     0
      0   0.5 ]

Eigenvectors and Eigenvalues
• When multiplying a matrix by a vector produces another vector in the same (or opposite) direction, scaled by some scalar multiple (the eigenvalue), that vector is called an eigenvector of the matrix.
• The diagram shows that x is an eigenvector of matrix A because the vector Ax points in the same (or opposite) direction as x.
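A minimal NumPy check of these definitions, using the scaling matrix A = [[2, 0], [0, 0.5]] described above; the variable names are illustrative.

```python
import numpy as np

# Scaling transformation from the slides: factor 2 horizontally, 0.5 vertically
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])

# np.linalg.eig returns the eigenvalues and the eigenvectors (as columns)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # [2.  0.5]
print(eigenvectors)   # columns [1, 0] and [0, 1]

# Verify A x = lambda x for each eigenpair
for lam, x in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ x, lam * x)
```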
Eigenvectors in Data Science
• The concepts of eigenvectors and eigenvalues are used to determine a set of important variables (in the form of vectors), together with their scale along different dimensions (the key dimensions based on variance), for analyzing data more effectively.
When you look at a picture (the data) and identify it as a tiger, what key pieces of information (dimensions / principal components) do you use to recognize it as a tiger? Is it not information about the body, face, legs, etc.?
Eigenvectors in Data Science
• Predicting stock prices
• This is a machine learning / predictive analytics problem. Here the dependent variable is the stock price, and there is a large number of independent variables on which the stock price depends.
• Can we use the information stored in these variables to extract a smaller set of variables (features) to train the models and make predictions, while ensuring that most of the information contained in the original variables is retained?
Dimensionality Reduction
• Dimensionality reduction is the process of reducing the
number of random variables under consideration, by
obtaining a set of principal variables.
• The number of input variables or features for a dataset is
referred to as its dimensionality.
• More input features often make a predictive modeling task
more challenging to model.
• Sometimes, most of these features are correlated and hence redundant.
Dimensionality Reduction Methods
• The various methods used for dimensionality reduction
include:
• Principal Component Analysis
• Singular Value Decomposition
Principal Component Analysis
• Principal Component Analysis, or PCA, is a dimensionality-
reduction method that is often used to reduce the
dimensionality of large data sets, by transforming a large set
of variables into a smaller one that still contains most of the
information of the large set.
• Reducing the number of variables of a data set naturally
comes at the expense of accuracy, but the trick in
dimensionality reduction is to trade a little accuracy for
simplicity.
• So, to sum up, the idea of PCA is simple — reduce the
number of variables of a data set, while preserving as
much information as possible.
Principal Component Analysis
In what direction can we find more information?
• Statistically, PCA finds lines, planes and hyper-planes in the
K-dimensional space that approximate the data as well as
possible in the least squares sense.
• A line or plane that is the least squares approximation of a set
of data points makes the variance of the coordinates on the
line or plane as large as possible.
• Eigenvectors of the data's covariance matrix are the directions of maximum spread or variance of the data.
How PCA Works?
• Consider a matrix X with N rows (aka "observations") and K
columns (aka "variables").
• For this matrix, we construct a variable space with as many
dimensions as there are variables, i.e., K.
• Each variable represents one coordinate axis.
How PCA Works?
• In the next step, each observation (row) of the X-matrix is
placed in the K-dimensional variable space.
How PCA Works?
• Next, mean-centering involves the subtraction of the variable
averages from the data. The vector of averages corresponds
to a point in the K-space.

In the mean-centering procedure, we first compute the variable averages. This vector of averages is interpretable as a point (shown in red in the original figure) in the variable space.
How PCA Works?
• The subtraction of the averages from the data corresponds to
a re-positioning of the coordinate system, such that the
average point now is the origin.
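In code, assuming X is a NumPy array with one observation per row, this re-positioning is a single subtraction (a minimal sketch):

```python
import numpy as np

# Illustrative data: 3 observations, 2 variables
X = np.array([[2.0, 4.0],
              [4.0, 8.0],
              [6.0, 6.0]])

# Subtract each variable's average so the average point becomes the origin
X_centered = X - X.mean(axis=0)
print(X_centered.mean(axis=0))   # approximately [0. 0.]
```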
How PCA Works?
• The first principal component (PC1) is the line in the K-
dimensional variable space that best approximates the data in
the least squares sense.
• This line goes through the average point. Each observation
(yellow dot) may now be projected onto this line in order to
get a coordinate value along the PC-line. This new coordinate
value is also known as the score.
How PCA Works?
• Usually, one principal component is insufficient to model the systematic variation of a data set. Thus, a second principal component (PC2) is calculated.
• PC2 is also represented by a line in the K-dimensional variable space that is orthogonal to the first PC (PC1). This line also passes through the average point and improves the approximation of the X-data as much as possible.
How PCA Works?
• Two PCs form a plane. This plane is a window into the
multidimensional space, which can be visualized graphically.
Each observation may be projected onto this plane, giving a
score for each.
PCA Working Example
• Standardize the dataset.
• Calculate the covariance matrix for the features in the
dataset.
• Calculate the eigenvalues and eigenvectors for the
covariance matrix.
• Sort the eigenvalues and their corresponding eigenvectors.
• Pick the top k eigenvalues and form a matrix from their eigenvectors.
• Transform the original matrix (a NumPy sketch of these steps follows this list).
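Below is a minimal NumPy sketch of these six steps, assuming X is an array with observations in rows and features in columns; the data values and k = 2 are purely illustrative, not the numbers from the worked example.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components (observations in rows)."""
    # Step 1: standardize each feature (zero mean, unit standard deviation)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2: covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)

    # Step 3: eigenvalues and eigenvectors (eigh, since covariance is symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Step 4: sort eigenpairs by decreasing eigenvalue
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # Step 5: keep the eigenvectors of the k largest eigenvalues
    W = eigenvectors[:, :k]

    # Step 6: transform the original (standardized) matrix
    return X_std @ W

# Illustrative data: 5 observations, 4 features
X = np.array([[4.0, 2.0, 0.60, 3.1],
              [4.2, 2.1, 0.59, 3.2],
              [3.9, 2.0, 0.58, 3.0],
              [4.3, 2.1, 0.62, 3.4],
              [4.1, 2.2, 0.63, 3.3]])
print(pca(X, k=2))   # 5 x 2 matrix of scores
```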
PCA Working Example: Data
• Dataset with 4 features and 5 observations

https://medium.com/analytics-vidhya/understanding-principle-component-analysis-pca-step-by-
step-e7a4bb4031d9
PCA Working Example: Step 1
• First, we need to standardize the dataset and for that, we
need to calculate the mean and standard deviation for each
feature.
PCA Working Example: Step 2
• Calculate the covariance matrix for the whole dataset.

• Variance is a measure of the variability or spread in a set of data.
• Covariance is a measure of the joint variability of two random variables. It is a measure of the relationship between two random variables: the metric evaluates to what extent the variables change together.
PCA Working Example: Step 2
• Calculate the covariance matrix for the whole dataset.

• The diagonal entries of the covariance matrix are the variances, and the other entries are the covariances.
PCA Working Example: Step 2
• Calculate the covariance matrix for the whole dataset.
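As a hedged illustration of this step (the array below is a stand-in, not the standardized dataset from the slides), NumPy's np.cov computes the covariance matrix directly:

```python
import numpy as np

X_std = np.array([[ 1.0, -0.5],
                  [-1.0,  0.5],
                  [ 0.0,  0.0]])

# rowvar=False: each column is a variable, each row is an observation
cov = np.cov(X_std, rowvar=False)
print(cov)   # diagonal entries = variances, off-diagonal entries = covariances
```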
PCA Working Example: Step 3
• Calculate the eigenvalues and eigenvectors of the covariance matrix.
PCA Working Example: Step 4
• Sort the eigenvalues and their corresponding eigenvectors. (In this example they are already sorted.)
PCA Working Example: Step 5
• Pick the top k (here k = 2) eigenvalues and form a matrix of their eigenvectors.
PCA Working Example: Step 6
• Transform the original matrix.
PCA: Important Notes
• PCA tries to summarize as much information as possible in the first PC, then as much of the remainder as possible in the second, and so on.
• PCs do not have an interpretable meaning, since each is a linear combination of the original features.
• Eigenvectors of the covariance matrix are the directions of the axes along which there is the most variance.
Advantages of PCA
• Removes correlated features
• Speeds up learning algorithms
• Reduces overfitting
• Improves visualization
Disadvantages of PCA
• Less interpretable
• Data standardization is necessary
• Loss of Information
Principal Component Analysis

Implementation
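The implementation slides are not reproduced here, so the following is only a sketch of a typical scikit-learn PCA pipeline; the Iris dataset and the choice of two components are illustrative assumptions, not necessarily what was used in the lecture.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# Standardize the features before applying PCA
X_std = StandardScaler().fit_transform(X)

# Keep the first two principal components
pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)

print(scores.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)   # share of variance captured by each PC
```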
Recommendation Systems
• Recommender systems are systems designed to recommend things to the user based on different factors.
• These systems predict the products that are most likely to be of interest to the user.
• A recommender system deals with a large volume of information by filtering the most important information based on the data provided by the user and other factors that capture the user's preferences and interests.
Types of Recommender Systems
Collaborative Filtering
• The collaborative filtering technique works by building a database (a user-item matrix) of users' preferences for items.
• It then matches users with similar interests and preferences by calculating similarities between their profiles to make recommendations. Such users form a group called a neighborhood.
• A user gets recommendations for items that they have not rated before but that were already rated positively by users in their neighborhood (a small sketch follows below).
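A minimal sketch of this idea on a made-up user-item ratings matrix (0 means "not rated"); cosine similarity stands in for whichever similarity measure a real system would use.

```python
import numpy as np

# Rows = users, columns = items; 0 means "not rated"
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Similarity of user 0 to every user (high-similarity users form the neighborhood)
sims = np.array([cosine_sim(R[0], R[u]) for u in range(len(R))])
print(np.round(sims, 2))

# Predict user 0's rating of item 2 from the users who did rate it,
# weighted by their similarity to user 0
item = 2
raters = [u for u in range(len(R)) if u != 0 and R[u, item] > 0]
pred = sum(sims[u] * R[u, item] for u in raters) / sum(sims[u] for u in raters)
print(round(pred, 2))
```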
Types of Collaborative Filtering
Matrix Factorization
• Matrix factorization is used to factorize a matrix, i.e., to find two (or more) matrices such that, when you multiply them, you get back the original matrix.
• Matrix factorization can be used to discover features underlying the interactions between two different kinds of entities.
• One obvious application is to predict ratings in collaborative filtering, in other words, to recommend items to users (a small sketch follows below).
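A minimal sketch of that idea, assuming a small ratings matrix R and plain stochastic gradient descent on the observed entries; the matrix values, the number of latent features k, and the hyperparameters are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ratings matrix: rows = users, columns = items, 0 = unknown
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

k = 2                                             # number of latent features
P = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factors
lr, reg = 0.01, 0.02                              # learning rate, regularization

# Gradient descent on the observed (non-zero) entries only
for _ in range(5000):
    for u, i in zip(*R.nonzero()):
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

# P @ Q.T reproduces the known ratings and fills in predictions for the zeros
print(np.round(P @ Q.T, 2))
```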
Singular Value Decomposition
• When we have millions of users and/or items, computing
pairwise correlations is expensive and slow.
• Is it possible to reduce the size of the ratings matrix?
• Is it really critical that we store every single item?
• For example, suppose a user liked a movie and its sequels (e.g., The Terminator, The Matrix). Do we really need to have a column in our ratings matrix for each individual movie?
• What if the set of movies could be represented using only k
latent features where each feature corresponds to a category
(e.g. genre) with common characteristics?
Singular Value Decomposition
• Similarly, the users could be represented using k
dimensions.
• We can think of each feature as a demographic group (e.g.
age, occupation) with similar preferences.
Singular Value Decomposition
• Suppose we had a ratings matrix with m users and n items.
We can factor the matrix into two other matrices P and Q,
and a diagonal matrix Sigma.
Singular Value Decomposition
• The key piece of information here is that each value on the diagonal of Sigma represents how important the associated dimension is in terms of expressing the original ratings matrix.
• If we sort those values in descending order and pick the top k (user-defined) features, we can obtain the best approximation of the ratings matrix by multiplying the truncated matrices back together (a sketch of this follows).
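A hedged NumPy sketch of that truncation; the ratings matrix and k are made up for illustration.

```python
import numpy as np

R = np.array([[5, 4, 1, 1],
              [4, 5, 1, 0],
              [1, 1, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Full SVD: R = U @ diag(s) @ Vt, with s already sorted in descending order
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the top-k singular values / latent features
k = 2
R_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_approx, 2))   # best rank-k approximation of R
```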
Singular Value Decomposition
• The Singular Value Decomposition (SVD) of a matrix is a
factorization of that matrix into three matrices.

• It has some interesting algebraic properties and conveys


important geometrical and theoretical insights about linear
transformations.
Singular Value Decomposition (SVD)

A = U Σ Vᵀ, where U and V have orthonormal columns and Σ is a diagonal matrix of non-negative singular values.

Rank: the maximal number of linearly independent columns of A.
SVD: Example

        Data  Information  Retrieval  Brain  Lungs
CS1       1        1           1        0      0
CS2       2        2           2        0      0
CS3       1        1           1        0      0
CS4       5        5           5        0      0
MD1       0        0           0        2      2
MD2       0        0           0        3      3
MD3       0        0           0        1      1
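The same matrix can be decomposed with NumPy as a quick check (a sketch, not part of the original slides): only two singular values come out non-zero, matching the two underlying concepts (the "CS" topic and the "MD" topic).

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 3))             # only two singular values are non-zero
print(np.linalg.matrix_rank(A))   # rank = 2
```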
Summary
• Eigenvectors
• Dimensionality Reduction
• Principal Component Analysis
• Singular Value Decomposition
