Maths$Stats_NOTES.docx

The document covers various topics in Linear Algebra, Calculus, and Statistics, including vector properties, matrix operations, eigenvalues, derivatives, integration, and probability distributions. It explains concepts such as vector addition, gradient descent, hypothesis testing, and different statistical tests like z-test, t-test, and chi-square test. Additionally, it highlights the applications of these mathematical concepts in machine learning and data analysis.

Linear Algebra

Video:1

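Magnitude = scalar, i.e. a numeric value (size only)

Direction = where the quantity points; a vector such as velocity has both magnitude and direction
Example: 40 m/s south
Magnitude is 40 m/s and direction is south.

More examples: 4i+9j = 4 m/s in the x-direction and 9 m/s in the y-direction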


Video:2
Vector Addition:

Vector Subtraction:

Note-

Vector— A

Vector— –A

To get the negative of a vector, reverse its direction; the magnitude stays the same.


Video:3
Properties:
1. Commutative: u+v = v+u (where u and v are vectors)
2. Associative: u+(v+w) = (u+v)+w (where u,v,w are vectors)
3. Identity: u+0 = u
4. Additive Inverse: u+(-u) = 0

Note:

A 2-D vector looks like 2i+3j, i.e. 2 units in the x-direction and 3 units in the y-direction.

A 3-D vector looks like 2i+3j+5k, i.e. 2 units in the x-direction, 3 units in the y-direction and
5 units in the z-direction.

More examples:
A vector of magnitude 5 can be 3i+4j or 4i+3j, since sqrt(3² + 4²) = 5. Magnitude alone does not
determine the vector.
Video:4
Vector Addition:

General Method:

Note: When the angle between two vectors of magnitudes a and b is 90 degrees, the magnitude of their sum is sqrt(a² + b²).
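A rough sketch of the general case, assuming the general method refers to the parallelogram law: for magnitudes a and b with angle θ between the vectors, the magnitude of the sum is sqrt(a² + b² + 2ab·cos θ). At 90 degrees the cosine term vanishes, which gives the note above. The function name `resultant_magnitude` is illustrative only.

```python
import math


def resultant_magnitude(a, b, theta_deg):
    """Magnitude of the sum of two vectors with magnitudes a and b
    and an angle of theta_deg degrees between them."""
    theta = math.radians(theta_deg)
    return math.sqrt(a * a + b * b + 2 * a * b * math.cos(theta))


# Perpendicular vectors of magnitude 3 and 4 give a resultant of about 5.
print(resultant_magnitude(3, 4, 90))
# Parallel vectors simply add: 3 + 4 = 7.
print(resultant_magnitude(3, 4, 0))
```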
Video:5
Dot Product:

Note:
To find the projection of any vector on another use this:
Video 6&7: Python functions of dot product.
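The functions from the videos are not reproduced in these notes, so the following is only a minimal sketch using plain Python lists. It assumes the projection meant above is the vector projection of u onto v, (u·v / v·v)·v. All function names here are illustrative.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Magnitude (length) of a vector."""
    return math.sqrt(dot(u, u))


def angle_between(u, v):
    """Angle between two non-zero vectors, in degrees."""
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def project(u, v):
    """Vector projection of u onto a non-zero vector v."""
    scale = dot(u, v) / dot(v, v)
    return [scale * b for b in v]


u = [4, 9]          # 4i + 9j
x_axis = [1, 0]     # unit vector i
print(dot(u, x_axis))      # 4
print(project(u, x_axis))  # [4.0, 0.0]
print(angle_between([1, 0], [0, 1]))  # about 90 degrees
```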

Video:8
L1 Regularization - Lasso Regression
L2 Regularization - Ridge Regression
Note: Refer to the python functions of L1 and L2 Norms.
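The referenced functions are not included in these notes. As a minimal sketch: the L1 norm is the sum of absolute values (the penalty used by Lasso) and the L2 norm is the square root of the sum of squares (Ridge penalizes its square). The names `l1_norm` and `l2_norm` are illustrative.

```python
import math


def l1_norm(v):
    """Sum of absolute values of the components."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Euclidean length: square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in v))


w = [3, -4]  # made-up weight vector
print(l1_norm(w))  # 7
print(l2_norm(w))  # 5.0
```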

Video:9
When two vectors are orthogonal, the angle between them is 90 degrees
and their dot product is 0.
(Apply this concept for changing the co-ordinate system)

Video:10
Linear Independent Vectors

Video:11
Basis Vector
If every vector in a given space can be written as a linear combination of
some set of vectors, and those vectors are linearly independent of each other,
then the vectors in that set are called BASIS vectors for the space.
(Note: Basis vectors are not unique)
Video:12
Matrices Introduction
Note: Similarly you can find the other outputs.

Video:13
Type of Matrices:

Note: Matrices Multiplication


Properties of Matrices Multiplication:

Note: Trick to find size of Matrix


Video:14
Rank of Matrix:
The maximum number of linearly independent columns (or rows) of a matrix is called the rank of
the matrix. The rank cannot exceed the smaller of the number of rows and the number of columns.
Note: Please refer to some more examples to find rank of a matrix.
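One way to compute the rank by hand is Gaussian elimination: reduce the matrix to row echelon form and count the pivot rows. The sketch below does that with exact fractions to avoid rounding issues. The function name `matrix_rank` and the example matrix are made up for illustration.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero entry in this column at or below row `rank`.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the entries below the pivot.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


A = [[1, 2, 3],
     [2, 4, 6],   # 2 x the first row, so it adds nothing to the rank
     [1, 0, 1]]
print(matrix_rank(A))  # 2
```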

Video: 15
Null space for matrix:
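The null space of a matrix A consists of all the vectors B such that AB = 0. The zero vector
always satisfies this, so the interest is in the non-zero solutions. It can also be thought of as
the solution set of AB = 0, where A is a known matrix of size m x n and B is the unknown of size
n x 1 (or of size n x k when k such solutions are collected as columns).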
Video: 16 &17
Linear Equation
Note: Refer to some more examples

Video: 20
Vector Line Equation and Projection

General Equations

Video: 21
Hyper-Planes

What is hyperplane used for?

Hyperplanes are often used in classification algorithms such as support vector machines
(SVMs) to separate data points belonging to different classes; in linear regression, the
fitted model is itself a hyperplane. They are also used in clustering algorithms to
identify clusters of data points in the input space.
Video: 22
Eigen Values and Eigen Vectors

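General Form: Av = λv, where A is a square matrix, v is a non-zero eigenvector and λ is the
corresponding eigenvalue.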

Eigenvalues are a special set of scalar values associated with a system of linear equations,
usually written as a matrix equation. Eigenvalues are also termed characteristic roots. An
eigenvector is a non-zero vector that is changed at most by a scalar factor (its eigenvalue)
when the linear transformation is applied.

Eigen Vectors:
An eigenvector is a vector associated with a system of linear equations. The eigenvector of
a matrix is also known as a latent vector, proper vector, or characteristic vector. Eigenvectors
are defined only for square matrices.

Applications in Machine Learning:


Eigenvalues and eigenvectors are used to reduce the number of dimensions and to compress
data. Many algorithms, such as PCA, rely on eigenvalues and eigenvectors to reduce
dimensionality.
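To make the idea concrete, here is a minimal sketch for a 2 x 2 symmetric matrix [[a, b], [b, d]] (the kind of matrix a covariance matrix in PCA is), using the closed-form solution of the characteristic equation. The function name `eigen_2x2_symmetric` and the example matrix are illustrative; in practice a linear algebra library would normally be used.

```python
import math


def eigen_2x2_symmetric(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]].

    Returns a list of (eigenvalue, unit eigenvector), largest eigenvalue first.
    """
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(float(a), (1.0, 0.0)), (float(d), (0.0, 1.0))]
        return sorted(pairs, key=lambda p: p[0], reverse=True)
    mean = (a + d) / 2
    spread = math.sqrt(((a - d) / 2) ** 2 + b * b)
    pairs = []
    for lam in (mean + spread, mean - spread):
        # The first row of (A - lam*I) v = 0 reads (a - lam)*x + b*y = 0,
        # which is satisfied by v = (b, lam - a).
        x, y = b, lam - a
        length = math.hypot(x, y)
        pairs.append((lam, (x / length, y / length)))
    return pairs


a, b, d = 2, 1, 2  # the matrix [[2, 1], [1, 2]]
for lam, (x, y) in eigen_2x2_symmetric(a, b, d):
    Av = (a * x + b * y, b * x + d * y)   # matrix times eigenvector
    lam_v = (lam * x, lam * y)            # eigenvalue times eigenvector
    print(lam, Av, lam_v)                 # the two vectors should match
```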
Video: 23
Properties of Eigen values and Eigen vectors
CALCULUS

Video: 1 to 3:
Functions
General Equation: y = F(x), where x = independent variable
and y = dependent variable

General form to find the derivative:

For example, for y = x²:
y’ = 2x
The derivative represents the slope of the curve at a given point.
y’’ = 2 (second derivative)
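A quick numerical check of these values, assuming the function being differentiated is y = x². The sketch approximates the first and second derivatives with central differences; the helper names and step sizes are illustrative choices.

```python
def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def square(x):
    return x * x


# At x = 3 the slope 2x should be close to 6, and y'' close to 2.
print(derivative(square, 3.0))
print(second_derivative(square, 3.0))
```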
Advanced Derivative Concept:
Note: In general, we find these derivatives to locate minimum and maximum
values. This matters in machine learning because most ML algorithms use
gradient descent, which looks for the point where the sum of errors is
minimum.
Video: 4
Continuous function
Continuous functions are functions that have no breaks, jumps or holes anywhere in their domain
or in a given interval. Their graphs won't contain any vertical asymptotes or other
discontinuities within that interval.
Example:
Video: 5
Integration:
Integration is the inverse of differentiation. In other words, if you reverse the process of
differentiation, you are doing integration. The following example shows it: y = x² => dy/dx =
2x. So, ∫ (dy/dx) dx = ∫ 2x dx = x² + C, where C is a constant of integration. ∫ and dx go hand
in hand and indicate integration of the function with respect to x.
Example:
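As a numerical check of ∫ 2x dx = x², the definite integral of 2x from 0 to 3 should equal 3² − 0² = 9. The sketch below uses the trapezoid rule; the function name `trapezoid` and the interval are illustrative choices.

```python
def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f from a to b with n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def two_x(x):
    return 2 * x


# Should be very close to 9, matching x² evaluated between 0 and 3.
print(trapezoid(two_x, 0.0, 3.0))
```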
Video: 6
Maxima and Minima:
Video: 7
Gradient Descent:
Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable
function. Gradient descent in machine learning is simply used to find the values of a function's
parameters (coefficients) that minimize a cost function as far as possible.
(Used mostly in Linear regression and Logistic regression and Deep Learning to minimize the
loss function and cost functions)

WHY GRADIENT DESCENT?


Gradient Descent is an iterative algorithm that solves optimization problems using only first
derivatives (first-order information). Since it is designed to find a local minimum of a
differentiable function, gradient descent is widely used in machine learning to find the
parameters that minimize the model's cost function.

Application:
In gradient descent we need to move step by step (in small steps) in order to reach the
minimum point.
Example of a simple gradient descent:
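The original example is not reproduced in these notes, so here is a minimal sketch on a made-up function, f(x) = (x − 3)², whose derivative is 2(x − 3) and whose minimum is at x = 3. The learning rate and number of steps are arbitrary illustrative values.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)  # small step downhill
    return x


def grad_f(x):
    """Derivative of f(x) = (x - 3)**2."""
    return 2 * (x - 3)


minimum = gradient_descent(grad_f, start=0.0)
print(minimum)  # very close to 3
```

With a learning rate that is too large (here, above 1.0) the steps overshoot and the iterates move away from the minimum, which is why small steps are used.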

Statistics and probability


Mean, Median, Mode:

Population and sample mean:


A sample is a subset from the population.
Population Variance & Sample variance:
Measure of dispersion
Standard Deviation:

Example:
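A small sketch of the measures above using Python's standard `statistics` module. The data values are made up for illustration. Population formulas divide by N, while sample formulas divide by n − 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

mean = statistics.mean(data)            # sum / count = 40 / 8
median = statistics.median(data)        # middle of the sorted values
mode = statistics.mode(data)            # most frequent value

pop_var = statistics.pvariance(data)    # population variance (divide by N)
sample_var = statistics.variance(data)  # sample variance (divide by n - 1)
pop_std = statistics.pstdev(data)       # square root of population variance
sample_std = statistics.stdev(data)     # square root of sample variance

print(mean, median, mode)
print(pop_var, sample_var)
print(pop_std, sample_std)
```

For this data the mean is 5, the median 4.5, the mode 4, the population variance 4 and the population standard deviation 2; the sample variance is 32/7, slightly larger because of the n − 1 divisor.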
Probability:
Probability is the likelihood or chance of an event occurring. For example, the probability of
flipping a coin and it being heads is ½, because there is 1 way of getting a head and the total
number of possible outcomes is 2 (a head or tail). We write P(heads) = ½ .

Type of Events:
Example:

Conditional Probability:
Conditional probability is the likelihood of an event or outcome
occurring, given that a previous event or outcome has occurred. It is
written P(B|A) and calculated as P(A and B) / P(A). Equivalently, the
probability of both events occurring is found by multiplying the
probability of the preceding event by the updated probability of the
succeeding, or conditional, event: P(A and B) = P(A) × P(B|A).

Cheat sheet:

Note:
Example:
Marginal Probability:
Marginal probability is the probability of an event occurring, p(A). It
may be thought of as an unconditional probability: it is not conditioned
on another event. Example: the probability that a card drawn is red
(p(red) = 0.5). Another example: the probability that a card drawn is a
4 (p(four)=1/13).
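The card examples can be checked by counting over a standard 52-card deck. The sketch below also shows a conditional probability, P(king | face card), as an added illustration; the helper `prob` and the event functions are hypothetical names.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def prob(event, given=None):
    """P(event), or P(event | given) when a condition is supplied."""
    space = [card for card in DECK if given is None or given(card)]
    hits = [card for card in space if event(card)]
    return Fraction(len(hits), len(space))


def is_red(card):
    return card[1] in ("hearts", "diamonds")


def is_four(card):
    return card[0] == "4"


def is_face(card):
    return card[0] in ("J", "Q", "K")


def is_king(card):
    return card[0] == "K"


print(prob(is_red))                  # 1/2  (marginal)
print(prob(is_four))                 # 1/13 (marginal)
print(prob(is_king, given=is_face))  # 1/3  (conditional)
```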
Normal Distribution:
Normal distribution, also known as the Gaussian distribution, is a
probability distribution that is symmetric about the mean, showing that
data near the mean are more frequent in occurrence than data far
from the mean. In graphical form, the normal distribution appears as a
"bell curve".

Mean, Median and Mode lie on the central black line.


Example:

Important Note:
Binomial Distribution:
The binomial probability of k successes in n trials is calculated by
multiplying three terms: the number of ways to choose k successes out
of n trials, the probability of success raised to the power k, and the
probability of failure raised to the power n − k (the number of trials
minus the number of successes).

Poisson Distribution:
A Poisson distribution is a discrete probability distribution. It gives the
probability of an event happening a certain number of times (k) within
a given interval of time or space. The Poisson distribution has only
one parameter, λ (lambda), which is the mean number of events.
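Both distributions can be written as short probability mass functions. This is a minimal sketch using the standard formulas P(k) = C(n, k)·p^k·(1 − p)^(n − k) for the binomial and P(k) = e^(−λ)·λ^k / k! for the Poisson. The function names and example numbers are illustrative.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Exactly 2 heads in 4 fair coin flips: 6 * 0.5**4
print(binomial_pmf(2, 4, 0.5))  # 0.375
# No events in an interval that averages 2 events
print(poisson_pmf(0, 2.0))
```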

Central Limit Theorem:


The central limit theorem says that the sampling distribution of the
mean will be approximately normal, as long as the sample size is
large enough and the population has a finite variance. Whether the
population has a normal, Poisson, binomial, or other such distribution,
the sampling distribution of the mean will be approximately normal.
The three rules of the central limit theorem are as follows:
● The data should be sampled randomly.
● The samples should be independent of each other.
● The sample size should be sufficiently large but not exceed 10% of the population.
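A small simulation can illustrate the theorem. The sketch below draws many samples from an exponential distribution (strongly skewed, with mean 1 and standard deviation 1) and looks at the distribution of the sample means. The sample size, number of samples and seed are arbitrary illustrative choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n independent draws from an exponential(1) population."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])


n = 50
means = [sample_mean(n) for _ in range(2000)]

mean_of_means = statistics.fmean(means)   # expected to be near 1
spread_of_means = statistics.stdev(means) # expected to be near 1 / sqrt(n)

print(mean_of_means)
print(spread_of_means, 1 / n ** 0.5)
```

The sample means cluster around the population mean, and their spread shrinks like σ/√n, even though the individual draws are far from normal.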
Hypothesis:
Hypothesis testing is a statistical procedure that examines a sample to determine
whether the results hold for the population. The test allows two explanations for
the data: the null hypothesis or the alternative hypothesis. If the sample result is
consistent with the null hypothesis (for example, the sample mean is close to the
hypothesized population mean), we fail to reject the null hypothesis. This does not
prove it true.
Types:
Application of hypothesis testing:

Degree of freedom:
The degrees of freedom of a test statistic determine the critical value
of the hypothesis test. The critical value is calculated from the null
distribution and is a cut-off value used to decide whether to reject the
null hypothesis.
Application:
Z-Test:
A z-test is a statistical test to determine whether two population means
are different when the variances are known and the sample size is
large. A z-test is a hypothesis test in which the z-statistic follows a
normal distribution. A z-statistic, or z-score, is a number representing
the result from the z-test.
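As a minimal sketch of a two-sample z-test with known population standard deviations: z = (x̄₁ − x̄₂) / sqrt(σ₁²/n₁ + σ₂²/n₂), with a two-sided p-value taken from the standard normal distribution. The function name and all the numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2):
    """Return (z statistic, two-sided p-value) for two sample means."""
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: sample means 52 and 50, known sigma 5, 50 items each.
z, p = two_sample_z_test(52.0, 50.0, 5.0, 5.0, 50, 50)
print(z, p)  # z is about 2.0 and p is a little under 0.05
```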

Application:

T-Test:
A t test is a statistical test that is used to compare the means of two
groups. It is often used in hypothesis testing to determine whether a
process or treatment actually has an effect on the population of
interest, or whether two groups are different from one another.
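A minimal sketch of the pooled (equal-variance) two-sample t statistic, t = (x̄₁ − x̄₂) / (s_p·sqrt(1/n₁ + 1/n₂)) with n₁ + n₂ − 2 degrees of freedom. The standard library has no t-distribution, so the result is compared with the tabulated two-sided 5% critical value for 8 degrees of freedom (2.306). The data and names are made up.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for two independent samples,
    assuming both groups share the same variance."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.fmean(a) - statistics.fmean(b)) / standard_error
    return t, n1 + n2 - 2


group_a = [1, 2, 3, 4, 5]   # made-up measurements
group_b = [3, 4, 5, 6, 7]

t, df = pooled_t_statistic(group_a, group_b)
T_CRITICAL_5PCT_DF8 = 2.306  # two-sided 5% critical value, from a t-table
print(t, df)
print(abs(t) > T_CRITICAL_5PCT_DF8)  # False: fail to reject at the 5% level
```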
Chi-Square Test:
A chi-square test is a statistical test used to compare observed results with expected
results. The purpose of this test is to determine if a difference between observed data
and expected data is due to chance, or if it is due to a relationship between the
variables you are studying.
Application:

Chi-Square practical:
Working of chi-square test:
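The worked example from the original is not reproduced here. As a minimal sketch of how the statistic is computed, χ² = Σ (observed − expected)² / expected, applied to a made-up coin experiment (60 heads and 40 tails in 100 flips, with 50 and 50 expected for a fair coin). The result is compared with the tabulated 5% critical value for 1 degree of freedom (3.841).

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [60, 40]   # made-up counts: heads, tails
expected = [50, 50]   # counts expected under a fair coin

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1            # categories minus one
CHI2_CRITICAL_5PCT_DF1 = 3.841    # from a chi-square table

print(chi2, df)                       # 4.0 1
print(chi2 > CHI2_CRITICAL_5PCT_DF1)  # True: reject the null at the 5% level
```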
