
LINEAR ALGEBRA

Linear algebra & its application in IT

SUBMITTED BY AHMAD SAEED

SUBMITTED TO SIR MUZZAM ALI

BITM-F18-022

Linear Algebra
A branch of mathematics that is concerned with mathematical structures closed
under the operations of addition and scalar multiplication and that includes the theory of systems
of linear equations, matrices, determinants, vector spaces, and linear transformations.

Applications of linear algebra


Some applications of linear algebra are listed below:

• Loss functions
• Regularization
• Covariance Matrix
• Support Vector Machine Classification
• Principal Component Analysis (PCA)

• Singular Value Decomposition (SVD)

• Word Embeddings

• Latent Semantic Analysis

• Image Representation as Tensors

• Convolution and Image Processing

❖ Loss functions
A loss function is an application of the vector norm in linear algebra. The norm of a vector is, simply put, its magnitude. There are many types of vector norms; I will quickly explain two of them:

• L1 Norm: Also known as the Manhattan Distance or Taxicab Norm. The L1 norm is the distance you would travel from the origin to the vector if the only permitted directions are parallel to the axes of the space.

In this 2D space, you could reach the vector (3, 4) by traveling 3 units along the x-axis and then 4 units parallel to the y-axis. Or you could travel 4 units along the y-axis first and then 3 units parallel to the x-axis. In either case, you would travel a total of 7 units.

• L2 Norm: Also known as the Euclidean Distance. The L2 norm is the shortest, straight-line distance of the vector from the origin. This distance is calculated using the Pythagorean theorem (I can see the old math concepts flickering on in your mind!). For the vector (3, 4), it is the square root of (3^2 + 4^2), which is equal to 5.
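As a rough illustration, the following Python snippet (using NumPy) computes both norms for the vector (3, 4) from the example above:

import numpy as np

v = np.array([3, 4])

# L1 norm: sum of absolute components (the "taxicab" distance)
l1 = np.linalg.norm(v, ord=1)   # |3| + |4| = 7

# L2 norm: straight-line (Euclidean) distance from the origin
l2 = np.linalg.norm(v, ord=2)   # sqrt(3^2 + 4^2) = 5

print(l1, l2)   # 7.0 5.0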

❖ Regularization
Regularization is a very important concept in data science. It’s a
technique we use to prevent models from overfitting. Regularization is actually another
application of the norm.

A model is said to overfit when it fits the training data too well. Such a model does not perform well on new data because it has learned even the noise in the training data, so it cannot generalize to data it has not seen before.

Regularization penalizes overly complex models by adding the norm of the weight
vector to the cost function. Since we want to minimize the cost function, we will need
to minimize this norm. This causes unrequired components of the weight vector to
reduce to zero and prevents the prediction function from being overly complex.
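As a minimal sketch of this idea, the function below adds the squared L2 norm of the weight vector to a mean squared error cost; the names X, y, w and lam are purely illustrative (design matrix, targets, weights and regularization strength), not part of any particular library:

import numpy as np

def ridge_cost(X, y, w, lam):
    # Data-fit term: mean squared error of the linear prediction X @ w
    residual = X @ w - y
    mse = np.mean(residual ** 2)
    # Penalty term: squared L2 norm of the weights, scaled by lam
    penalty = lam * np.sum(w ** 2)
    return mse + penalty

Minimizing this cost pushes unneeded weight components toward zero, which is exactly the effect described above.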

❖ Covariance Matrix
Bivariate analysis is an important step in data exploration: we want to study the relationship between pairs of variables. Covariance and correlation are measures used to study the relationship between two continuous variables.

Covariance indicates the direction of the linear relationship between the variables. A positive
covariance indicates that an increase or decrease in one variable is accompanied by the same
in another. A negative covariance indicates that an increase or decrease in one is accompanied
by the opposite in the other.
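A small NumPy example (with made-up data) shows how the covariance matrix of two variables is computed and read:

import numpy as np

# Two continuous variables observed together (illustrative values)
x = np.array([2.1, 2.5, 3.6, 4.0, 4.8])
y = np.array([8.0, 10.2, 12.1, 13.5, 15.0])

cov = np.cov(x, y)      # 2 x 2 covariance matrix
print(cov)
# cov[0, 0] and cov[1, 1] are the variances of x and y;
# cov[0, 1] is positive here, so x and y tend to increase together.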

❖ Support Vector Machine Classification


Support Vector Machines are among the most common classification algorithms and regularly produce impressive results. They are an application of the concept of vector spaces in linear algebra.

Support Vector Machine, or SVM, is a discriminative classifier that works by finding a decision
surface. It is a supervised machine learning algorithm.

In this algorithm, we plot each data item as a point in an n-dimensional space (where n is the number of features) with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyperplane that best separates the two classes, i.e., the one with the maximum margin.
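A short sketch using scikit-learn (assuming it is installed) fits a linear SVM on synthetic two-class data and recovers the separating hyperplane:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes in a 2-D feature space (synthetic data)
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# A linear SVM finds the maximum-margin hyperplane between the classes
clf = SVC(kernel="linear")
clf.fit(X, y)

print(clf.coef_, clf.intercept_)   # parameters of the separating hyperplane
print(clf.predict([[0.0, 5.0]]))   # classify a new point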
❖ Dimensionality Reduction
You will often work with datasets that have hundreds or even thousands of variables; that is just how the industry functions. Is it practical to look at each variable and decide which ones are more important?

That doesn’t really make sense. We need to bring down the number of variables to perform any
sort of coherent analysis. This is what dimensionality reduction is. Now, let’s look at two
commonly used dimensionality reduction methods here.

❖ Principal Component Analysis (PCA)


Principal Component Analysis, or PCA, is an unsupervised
dimensionality reduction technique. PCA finds the directions of maximum variance and projects
the data along them to reduce the dimensions.

Without going into the math, these directions are the eigenvectors of the covariance matrix of
the data.

Eigenvectors of a square matrix are special non-zero vectors whose direction does not change when they are multiplied by the matrix; the transformation only scales them by a factor (the corresponding eigenvalue).
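The following NumPy sketch (on synthetic data) makes this concrete: it builds the covariance matrix, takes its eigenvectors, and projects the data onto the two directions with the largest variance:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # 200 samples, 5 features (synthetic)
X_centered = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix give the principal directions
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort the directions by explained variance and keep the top 2
order = np.argsort(eigenvalues)[::-1]
top2 = eigenvectors[:, order[:2]]

X_reduced = X_centered @ top2              # projection onto 2 principal components
print(X_reduced.shape)                     # (200, 2)

In practice, a library routine such as scikit-learn's PCA class does the same job in a couple of lines.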

❖ Singular Value Decomposition


In my opinion, Singular Value Decomposition (SVD) is
underrated and not discussed enough. It is an amazing technique of matrix decomposition with
diverse applications. I will try and cover a few of them in a future article.

For now, let us talk about SVD in Dimensionality Reduction. Specifically, this is known
as Truncated SVD.

• We start with the large m x n numerical data matrix A, where m is the number of rows (observations) and n is the number of features
• Decompose it into three matrices, A = U S V^T, where the diagonal matrix S holds the singular values in decreasing order
• Keep only the k largest singular values (and the corresponding columns of U and V) to get a rank-k, lower-dimensional representation of the data, as in the sketch below
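A minimal NumPy sketch of truncated SVD on a synthetic matrix (the shapes and the choice of k are illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))         # m x n data matrix (synthetic)

# Full SVD: A = U @ diag(S) @ Vt, with singular values S in decreasing order
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD: keep only the k largest singular values
k = 5
A_reduced = U[:, :k] * S[:k]           # k-dimensional representation of each row
A_approx = A_reduced @ Vt[:k, :]       # rank-k approximation of A

print(A_reduced.shape, A_approx.shape)   # (100, 5) (100, 20)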

❖ Word Embeddings
Machine learning algorithms cannot work with raw textual data.
We need to convert the text into some numerical and statistical features to create model inputs.
There are many ways of engineering features from text data, such as:

1. Meta attributes of a text, like word count, special character count, etc.
2. NLP attributes of text using Parts-of-Speech tags and Grammar Relations like the
number of proper nouns
3. Word Vector Notations or Word Embeddings
Word embeddings are a way of representing words as low-dimensional vectors of numbers while preserving their context in the document. These representations are obtained by training a neural network on a large amount of text, called a corpus. They also help in analyzing semantic and syntactic similarity among words.
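As a hedged illustration, the snippet below trains tiny word embeddings with the gensim library (assuming gensim 4.x is installed; the corpus and the vector_size and window settings are toy values, far smaller than anything used in practice):

from gensim.models import Word2Vec

# A toy corpus: each document is a list of tokens
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "prince", "is", "a", "royal"],
]

# Train small embeddings on the toy corpus
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, seed=0)

print(model.wv["king"].shape)                 # a 50-dimensional vector
print(model.wv.similarity("king", "queen"))   # cosine similarity of two words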

❖ Latent Semantic Analysis (LSA)


What is your first thought when you hear this group of words: “prince, royal, king, noble”? Although they are different words, they are closely related in meaning.

Now, consider the following sentences:

• The pitcher of the home team seemed out of form


• There is a pitcher of juice on the table for you to enjoy

The word ‘pitcher’ has different meanings based on the other words in the two sentences. It
means a baseball player in the first sentence and a jug of juice in the second.

Both these sets of words are easy for us humans to interpret, thanks to years of experience with the language. But what about machines? This is where the NLP concept of topic modeling comes into play: Latent Semantic Analysis applies truncated SVD to a term-document matrix to uncover the latent topics that relate words and documents.
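A compact sketch of LSA with scikit-learn (assuming it is installed): build a TF-IDF term-document matrix and apply truncated SVD to it, so each document is described by a few latent "topic" dimensions. The documents below are toy examples:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

documents = [
    "The pitcher of the home team seemed out of form",
    "There is a pitcher of juice on the table for you to enjoy",
    "The king and the prince are royal and noble",
]

# Term-document matrix weighted by TF-IDF
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(documents)

# Truncated SVD of this matrix is the core of Latent Semantic Analysis
lsa = TruncatedSVD(n_components=2, random_state=0)
topics = lsa.fit_transform(X)

print(topics.shape)   # (3, 2): each document expressed in 2 latent dimensions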

❖ Image Representation as Tensors


A computer does not process images the way humans do. As mentioned earlier, machine learning algorithms need numerical features to work with.

A digital image is made up of small indivisible units called pixels. Consider, for example, a grayscale image of the digit zero made of 8 x 8 = 64 pixels. Each pixel has a value in the range 0 to 255: a value of 0 represents a black pixel and 255 a white pixel.
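A small NumPy example (the pixel values are made up, not an actual digit) shows how such an image is stored as a matrix, and how a colour image becomes a 3-dimensional tensor:

import numpy as np

# An 8 x 8 grayscale "image" as a matrix of pixel intensities (0-255)
image = np.zeros((8, 8), dtype=np.uint8)
image[2:6, 2:6] = 255              # a white square on a black background

print(image.shape)                 # (8, 8): one intensity value per pixel
print(image.min(), image.max())    # 0 (black) and 255 (white)

# A colour image adds a third axis for the RGB channels, making it a tensor
rgb = np.zeros((8, 8, 3), dtype=np.uint8)
print(rgb.shape)                   # (8, 8, 3)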

❖ Convolution and Image Processing


2D Convolution is a very important operation in image
processing. It consists of the below steps:

1. Start with a small matrix of weights, called a kernel or a filter


2. Slide this kernel over the 2D input data, performing element-wise multiplication with the patch of the input it currently covers
3. Add the obtained values and put the sum in a single output pixel

The operation can seem a bit complex, but it is widely used for image processing tasks such as sharpening, blurring, and edge detection. We just need to choose the right kernel for the task we are trying to accomplish.
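The sketch below implements these three steps directly in NumPy (strictly speaking it is cross-correlation, since the kernel is not flipped, which is how "convolution" is usually implemented in image processing; the image data and the sharpening kernel are illustrative):

import numpy as np

def convolve2d(image, kernel):
    # Naive 'valid' convolution: slide the kernel over the image,
    # multiply element-wise and sum each window into one output pixel.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(float)
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=float)   # a common sharpening kernel

print(convolve2d(image, kernel).shape)   # (6, 6) output for an 8 x 8 input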
