Maths & Stats Notes
Video: 1
Vector Subtraction:
Note: To subtract one vector from another, add its negative: A − B = A + (−B). The vector −A has the same magnitude as A but points in the opposite direction.
Note:
A 2-D vector such as 2i + 3j represents 2 units in the x-direction and 3 units in the y-direction.
A 3-D vector such as 2i + 3j + 5k represents 2 units in the x-direction, 3 units in the y-direction, and 5 units in the z-direction.
Example:
Suppose a vector has magnitude 5; it could be 3i + 4j or 4i + 3j, since √(3² + 4²) = √(4² + 3²) = 5.
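A quick pure-Python check of the magnitude example above (the helper name `magnitude` is just illustrative, not from the videos):

```python
import math

def magnitude(components):
    # |v| = square root of the sum of squared components
    return math.sqrt(sum(c * c for c in components))

# Both 3i + 4j and 4i + 3j have magnitude 5
m1 = magnitude([3, 4])
m2 = magnitude([4, 3])
```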
Video: 4
Vector Addition:
General Method:
Note: When the angle between two vectors is 90 degrees, the magnitude of their resultant is √(a² + b²).
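A small sketch of the general rule (law of cosines: |R| = √(a² + b² + 2ab·cosθ)), which reduces to √(a² + b²) at 90 degrees; the function name is illustrative:

```python
import math

def resultant_magnitude(a, b, angle_deg):
    # |R| = sqrt(a^2 + b^2 + 2ab*cos(theta)) for vectors of
    # magnitudes a and b separated by angle theta
    theta = math.radians(angle_deg)
    return math.sqrt(a * a + b * b + 2 * a * b * math.cos(theta))

# At 90 degrees cos(theta) = 0, so this reduces to sqrt(a^2 + b^2)
r = resultant_magnitude(3, 4, 90)
```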
Video: 5
Dot Product:
Note:
To find the projection of a vector a onto another vector b, use the dot product: the scalar projection is (a · b) / |b|, and the vector projection is ((a · b) / |b|²) b.
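A minimal sketch of the vector projection formula in plain Python (helper names are illustrative):

```python
def dot(u, v):
    # dot product: sum of elementwise products
    return sum(x * y for x, y in zip(u, v))

def projection(a, b):
    # vector projection of a onto b: (a . b / |b|^2) * b
    scale = dot(a, b) / dot(b, b)
    return [scale * x for x in b]

# Projecting (2, 3) onto the x-axis keeps only the x-component
p = projection([2, 3], [1, 0])
```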
Videos 6 & 7: Python functions for the dot product.
Video: 8
L1 Regularization - Lasso Regression
L2 Regularization - Ridge Regression
Note: Refer to the Python functions for the L1 and L2 norms.
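A stdlib-only sketch of the two norms; in practice `numpy.linalg.norm(v, ord=1)` and `numpy.linalg.norm(v, ord=2)` compute the same quantities:

```python
import math

def l1_norm(v):
    # L1 norm: sum of absolute values (the penalty in Lasso regression)
    return sum(abs(x) for x in v)

def l2_norm(v):
    # L2 norm: square root of the sum of squares (the penalty in Ridge regression)
    return math.sqrt(sum(x * x for x in v))

v = [3, -4]
l1 = l1_norm(v)
l2 = l2_norm(v)
```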
Video: 9
When two vectors are orthogonal, the angle between them is 90 degrees, and their dot product is zero.
(Apply this concept when changing the co-ordinate system.)
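A one-line check of orthogonality via the dot product (a sketch; the helper name is illustrative):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Orthogonal vectors have zero dot product (the angle between them is 90 degrees)
is_orthogonal = dot([1, 2], [-2, 1]) == 0
```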
Video: 10
Linearly Independent Vectors
Video: 11
Basis Vectors
If every vector in a given space can be written as a linear combination of some set of vectors, and those vectors are independent of each other, then those vectors are called BASIS vectors for the space.
(Note: Basis vectors are not unique.)
Video: 12
Matrices Introduction
Note: Similarly, you can find the other outputs.
Video: 13
Types of Matrices:
Video: 15
Null space of a matrix:
The null space of a matrix A consists of all vectors b such that Ab = 0. It can also be thought of as the solution set of Ab = 0, where A is a known matrix of size m × n and b is the vector (or n × k matrix) to be found.
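A hand-verified sketch: for the singular matrix below (its second row is twice the first), the vector b = (2, −1) satisfies Ab = 0, so it lies in the null space. The matrix and helper are illustrative:

```python
def mat_vec(A, x):
    # multiply matrix A (given as a list of rows) by vector x
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 2],
     [2, 4]]

# 1*2 + 2*(-1) = 0 and 2*2 + 4*(-1) = 0, so b spans the null space of A
b = [2, -1]
result = mat_vec(A, b)
```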
Video: 16 & 17
Linear Equations
Note: Refer to some more examples.
Video: 20
Vector Line Equation and Projection
General Equations
Video: 21
Hyper-Planes
General Form: wᵀx + b = 0
Eigen Values:
Eigenvalues are the special set of scalar values associated with a system of linear equations, most often a matrix equation; they are also termed characteristic roots. For each eigenvalue there is a non-zero vector that is changed by at most a scalar factor when the linear transformation is applied.
Eigen Vectors:
An eigenvector is a non-zero vector associated with a set of linear equations. The eigenvector of a matrix is also known as a latent vector, proper vector, or characteristic vector. Eigenvectors are defined with reference to a square matrix.
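A hand-checked sketch of the defining property Av = λv for a small 2×2 matrix (the matrix and eigenpairs are an illustrative example; `numpy.linalg.eig` computes these automatically in practice):

```python
def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# For A = [[2, 1], [1, 2]] the eigenpairs are:
#   lambda = 3 with eigenvector (1, 1)
#   lambda = 1 with eigenvector (1, -1)
A = [[2, 1],
     [1, 2]]
v1, lam1 = [1, 1], 3
v2, lam2 = [1, -1], 1

check1 = mat_vec(A, v1)  # should equal lam1 * v1 = [3, 3]
check2 = mat_vec(A, v2)  # should equal lam2 * v2 = [1, -1]
```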
Video: 1 to 3:
Functions
General Equation: y = F(x), where x is the independent variable and y is the dependent variable.
Derivative example, for y = x²:
y' = 2x
The derivative represents the slope of the function at a given point.
y'' = 2 (the second, or double, derivative)
Advanced Derivative Concept:
Note: In general, we compute these derivatives to find minimum and maximum values in machine learning (most ML algorithms use gradient descent, which finds the point where the sum of errors is minimum).
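The slope interpretation can be checked numerically with a central-difference approximation (a sketch; for y = x² the exact derivative is y' = 2x, so the slope at x = 3 should be about 6):

```python
def derivative(f, x, h=1e-6):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2   # y = x^2, so y' = 2x
slope_at_3 = derivative(f, 3)
```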
Video: 4
Continuous function
Continuous functions are functions with no breaks throughout their domain or a given interval. Their graphs contain no asymptotes or discontinuities.
Example:
Video: 5
Integration:
Integration is the inverse of differentiation: if you reverse the process of differentiation, you are doing integration. For example, y = x² gives dy/dx = 2x, so ∫ (dy/dx) dx = ∫ 2x dx = x² + C. The symbols ∫ and dx go hand in hand and indicate integration of the function with respect to x.
Example:
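The inverse relationship can be seen numerically: integrating the derivative 2x from 0 to 3 should recover x² evaluated at 3, i.e. 9. A midpoint-rule sketch (the helper name is illustrative):

```python
def integrate(f, a, b, n=100000):
    # midpoint-rule approximation of the definite integral of f on [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# integral of 2x from 0 to 3 is x^2 evaluated at 3, i.e. 9
area = integrate(lambda x: 2 * x, 0, 3)
```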
Video: 6
Maxima and Minima:
Video: 7
Gradient Descent:
Gradient descent is an optimization algorithm for finding a local minimum of a differentiable function. In machine learning, gradient descent is used to find the values of a function's parameters (coefficients) that minimize a cost function.
(Used mostly in linear regression, logistic regression, and deep learning to minimize the loss and cost functions.)
Application:
In gradient descent we need to move step by step (in small steps) in order to reach the minimum point.
Example of a simple gradient descent:
Example:
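A simple gradient descent can be sketched in a few lines: repeatedly step opposite the gradient until the minimum is reached. The target function f(x) = (x − 4)², with gradient 2(x − 4) and minimum at x = 4, is an illustrative choice:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # take repeated small steps in the direction opposite the gradient
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# minimize f(x) = (x - 4)^2, whose gradient is 2(x - 4); minimum at x = 4
x_min = gradient_descent(lambda x: 2 * (x - 4), x0=0.0)
```

The learning rate `lr` controls the step size: too large and the iterates overshoot, too small and convergence is slow.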
Probability:
Probability is the likelihood or chance of an event occurring. For example, the probability of flipping a coin and it landing heads is ½, because there is 1 way of getting a head and the total number of possible outcomes is 2 (a head or a tail). We write P(heads) = ½.
Types of Events:
Example:
Conditional Probability:
Conditional probability is the likelihood of an event or outcome occurring given that a previous event or outcome has occurred: P(A|B) = P(A and B) / P(B). Equivalently, the joint probability is found by multiplying the probability of the preceding event by the conditional probability of the succeeding event: P(A and B) = P(B) × P(A|B).
Cheat sheet:
Note:
Example:
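The formula P(A|B) = P(A and B) / P(B) worked through on a standard deck of cards (an illustrative example, not from the videos): A = "card is a king", B = "card is a face card".

```python
# 12 face cards in a 52-card deck; the 4 kings are all face cards
p_b = 12 / 52
p_a_and_b = 4 / 52

# P(A|B) = P(A and B) / P(B) = 4/12 = 1/3
p_a_given_b = p_a_and_b / p_b
```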
Marginal Probability:
Marginal probability is the probability of an event occurring, P(A); it may be thought of as an unconditional probability, since it is not conditioned on another event. Example: the probability that a card drawn is red is P(red) = 0.5. Another example: the probability that a card drawn is a 4 is P(four) = 1/13.
Normal Distribution:
Normal distribution, also known as the Gaussian distribution, is a
probability distribution that is symmetric about the mean, showing that
data near the mean are more frequent in occurrence than data far
from the mean. In graphical form, the normal distribution appears as a
"bell curve".
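The bell curve comes from the normal density formula, f(x) = e^(−(x−μ)²/2σ²) / (σ√(2π)). A stdlib sketch showing the symmetry about the mean and the peak at the mean:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # density of the normal (Gaussian) distribution
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

peak = normal_pdf(0)          # the curve peaks at the mean
symmetric = normal_pdf(1) == normal_pdf(-1)  # symmetric about the mean
```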
Important Note:
Binomial Distribution:
The binomial distribution gives the probability of exactly k successes in n independent trials. It is calculated by multiplying the number of ways to choose the successes, C(n, k), by the probability of success raised to the power of the number of successes and the probability of failure raised to the power of the difference between the number of trials and the number of successes: P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ.
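The formula above directly in Python (coin-flip numbers are an illustrative example):

```python
import math

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# probability of exactly 2 heads in 4 fair coin flips: 6/16 = 0.375
prob = binomial_pmf(2, 4, 0.5)
```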
Poisson Distribution:
A Poisson distribution is a discrete probability distribution. It gives the
probability of an event happening a certain number of times (k) within
a given interval of time or space. The Poisson distribution has only
one parameter, λ (lambda), which is the mean number of events.
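The Poisson probability mass function, P(X = k) = λᵏ e^(−λ) / k!, sketched in stdlib Python (λ = 2 is an illustrative choice):

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) = lam^k * e^(-lam) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

# with a mean of 2 events per interval, the chance of seeing exactly 0 events
p0 = poisson_pmf(0, 2)
```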
Degrees of Freedom:
The degrees of freedom of a test statistic determine the critical value of a hypothesis test. The critical value is calculated from the null distribution and is a cut-off used to decide whether to reject the null hypothesis.
Application:
Z-Test:
A z-test is a statistical test to determine whether two population means
are different when the variances are known and the sample size is
large. A z-test is a hypothesis test in which the z-statistic follows a
normal distribution. A z-statistic, or z-score, is a number representing
the result from the z-test.
Application:
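A sketch of a one-sample z-test: z = (x̄ − μ) / (σ / √n), compared against the two-tailed 5% critical value of 1.96. The sample numbers are illustrative:

```python
import math

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    # z = (x_bar - mu) / (sigma / sqrt(n))
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

# sample of 100 with mean 52, against population mean 50 and known sd 10
z = z_statistic(52, 50, 10, 100)
reject_at_5pct = abs(z) > 1.96  # two-tailed critical value at alpha = 0.05
```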
T-Test:
A t-test is a statistical test used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups differ from one another.
Chi-Square Test:
A chi-square test is a statistical test used to compare observed results with expected
results. The purpose of this test is to determine if a difference between observed data
and expected data is due to chance, or if it is due to a relationship between the
variables you are studying.
Application:
Chi-Square practical:
Working of chi-square test:
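The working of the test statistic itself can be sketched as the sum over categories of (observed − expected)² / expected; the coin-flip counts below are an illustrative example:

```python
def chi_square_stat(observed, expected):
    # chi-square statistic: sum over cells of (O - E)^2 / E
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 60 heads and 40 tails in 100 flips, versus the expected 50/50 split
stat = chi_square_stat([60, 40], [50, 50])
```

The statistic is then compared against the chi-square critical value for the appropriate degrees of freedom (here 1) to decide whether the deviation is due to chance.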