
Linear Algebra

The document provides an overview of linear algebra concepts essential for machine learning, particularly deep learning. It explains the definitions and properties of scalars, vectors, matrices, and tensors, including their notation and operations such as addition and transposition. Additionally, it introduces the concept of broadcasting in the context of adding matrices and vectors.


1. Linear Algebra

A good understanding of linear algebra is essential for understanding and working with many machine learning algorithms, especially deep learning algorithms.
1.1 Scalars, Vectors, Matrices and Tensors
The study of linear algebra involves several types of mathematical objects:
● Scalars: A scalar is just a single number, in contrast to most of the other objects studied in linear algebra, which are usually arrays of multiple numbers. We write scalars in italics and usually give them lower-case variable names. When we introduce them, we specify what kind of number they are. For example, we might say "Let s ∈ R be the slope of the line," while defining a real-valued scalar, or "Let n ∈ N be the number of units," while defining a natural number scalar.

● Vectors: A vector is an array of numbers. The numbers are arranged in order. We can identify each individual number by its index in that ordering. Typically we give vectors lower-case names written in bold typeface, such as x. The elements of the vector are identified by writing its name in italic typeface, with a subscript. The first element of x is x1, the second element is x2, and so on. We also need to say what kinds of numbers are stored in the vector. If each element is in R, and the vector has n elements, then the vector lies in the set formed by taking the Cartesian product of R n times, denoted as Rn. When we need to explicitly identify the elements of a vector, we write them as a column enclosed in square brackets:

x = [x1, x2, ..., xn]T

We can think of vectors as identifying points in space, with each element giving the coordinate along a different axis. Sometimes we need to index a set of elements of a vector. In this case, we define a set containing the indices and write the set as a subscript. For example, to access x1, x3 and x6, we define the set S = {1, 3, 6} and write xS. We use the − sign to index the complement of a set. For example, x−1 is the vector containing all elements of x except for x1, and x−S is the vector containing all of the elements of x except for x1, x3 and x6.
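As an illustrative sketch (not part of the original text), the indexing conventions above map naturally onto NumPy; note that NumPy indices are 0-based, so x1 in the text corresponds to x[0]:

```python
import numpy as np

# A vector x in R^6 (NumPy indices start at 0, so x1 in the text is x[0])
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# Index a set of elements: S = {1, 3, 6} in the text's 1-based notation
S = [0, 2, 5]
x_S = x[S]                      # the subvector xS: elements x1, x3, x6

# Complement indexing, x−S: all elements of x except those in S
mask = np.ones(len(x), dtype=bool)
mask[S] = False
x_not_S = x[mask]               # elements x2, x4, x5
```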

● Matrices: A matrix is a 2-D array of numbers, so each element is identified by two indices instead of just one. We usually give matrices upper-case variable names with bold typeface, such as A. If a real-valued matrix A has a height of m and a width of n, then we say that A ∈ Rm×n. We usually identify the elements of a matrix using its name in italic but not bold font, and the indices are listed with separating commas. For example, A1,1 is the upper left entry of A and Am,n is the bottom right entry. We can identify all of the numbers with vertical coordinate i by writing a ":" for the horizontal coordinate. For example, Ai,: denotes the horizontal cross section of A with vertical coordinate i. This is known as the i-th row of A. Likewise, A:,i is the i-th column of A. When we need to explicitly identify the elements of a matrix, we write them as an array enclosed in square brackets:

A = [A1,1 A1,2; A2,1 A2,2]

Sometimes we may need to index matrix-valued expressions that are not just a single letter. In this case, we use subscripts after the expression, but do not convert anything to lower case. For example, f(A)i,j gives element (i, j) of the matrix computed by applying the function f to A.
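The row, column, and entry notation above can be sketched with NumPy slicing (a sketch, not from the original text; again NumPy is 0-based, so A1,1 corresponds to A[0, 0]):

```python
import numpy as np

# A real-valued matrix A in R^(2x3)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

row_first = A[0, :]        # A1,: in the text's notation -- the first row
col_first = A[:, 0]        # A:,1 -- the first column
top_left = A[0, 0]         # A1,1, the upper left entry
bottom_right = A[-1, -1]   # Am,n, the bottom right entry
```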

● Tensors: In some cases we will need an array with more than two axes. In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. We denote a tensor named "A" with this typeface: A. We identify the element of A at coordinates (i, j, k) by writing Ai,j,k.

One important operation on matrices is the transpose. The transpose of a matrix is the mirror image of the matrix across a diagonal line, called the main diagonal, running down and to the right, starting from its upper left corner. See Fig. 2.1 for a graphical depiction of this operation. We denote the transpose of a matrix A as AT, and it is defined such that

(AT)i,j = Aj,i

Vectors can be thought of as matrices that contain only one column. The transpose of a vector is therefore a matrix with only one row. Sometimes we define a vector by writing out its elements in the text inline as a row matrix, then using the transpose operator to turn it into a standard column vector, e.g., x = [x1, x2, x3]T.
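As a quick sketch of the transpose definition (an illustration added here, not part of the original text):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # shape (3, 2)

At = A.T                     # transpose: At[i, j] == A[j, i], shape (2, 3)

# A vector written inline as a row matrix, transposed into a column vector
x = np.array([[1.0, 2.0, 3.0]]).T   # shape (3, 1)
```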

A scalar can be thought of as a matrix with only a single entry. From this, we can see that a scalar is its own transpose: a = aT. We can add matrices to each other, as long as they have the same shape, just by adding their corresponding elements: C = A + B where Ci,j = Ai,j + Bi,j. We can also add a scalar to a matrix or multiply a matrix by a scalar, just by performing that operation on each element of a matrix: D = a · B + c where Di,j = a · Bi,j + c.

In the context of deep learning, we also use some less conventional notation. We allow the addition of a matrix and a vector, yielding another matrix: C = A + b, where Ci,j = Ai,j + bj. In other words, the vector b is added to each row of the matrix. This shorthand eliminates the need to define a matrix with b copied into each row before doing the addition. This implicit copying of b to many locations is called broadcasting.
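These elementwise operations can be sketched directly in NumPy (an added illustration, not from the original text), where + and * on arrays apply per element:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[10.0, 20.0], [30.0, 40.0]])

# Matrix addition: C[i, j] == A[i, j] + B[i, j]
C = A + B

# Scalar multiply and scalar add: D[i, j] == a * B[i, j] + c
a, c = 2.0, 1.0
D = a * B + c
```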
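Broadcasting as described above is exactly what NumPy does when adding a matrix and a vector of compatible shape (a sketch added for illustration, not part of the original text):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])     # shape (2, 3)
b = np.array([10.0, 20.0, 30.0])   # shape (3,)

# b is implicitly "copied" into each row of A: C[i, j] == A[i, j] + b[j]
C = A + b
```

No explicit tiling of b is needed; NumPy stretches the vector across the matrix rows without materializing the copies.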
