mean-variance
The mean of a random variable (also called its expected value or first moment) is
defined as
E(X) = ∑ₓ x P(X = x)
For example, if X is the number of spots that show up on the roll of a fair die, then
E(X) = (1/6)·1 + (1/6)·2 + (1/6)·3 + (1/6)·4 + (1/6)·5 + (1/6)·6 = 3.5
The mean tells us what the value of X will be on average: sometimes the value will be
larger, sometimes it will be smaller, but the total positive and negative fluctuations will
tend to cancel out over time.
The mean of X has the same units as X itself: e.g., if X is the height of a person in cm,
then E(X) is measured in cm as well.
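As a concrete illustration, here is a minimal NumPy sketch (the sample size and seed are arbitrary) that computes the die-roll mean both from the definition and by averaging simulated rolls:

```python
import numpy as np

# Exact mean from the definition: sum of value times probability.
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)
exact_mean = np.sum(values * probs)          # 3.5

# Monte Carlo estimate: average many simulated rolls.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)     # integers 1..6
estimated_mean = rolls.mean()                # close to 3.5

print(exact_mean, estimated_mean)
```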
Scatter plot
In low dimensions we can visualize our data with a scatter plot: we take a sample from
P (X) and plot each of the samples X1 , X2 , … as a single point.
The mean is the center of mass of the scatter plot. While there might not be any
samples exactly on the mean, the displacements of the individual samples away from
the mean tend to add up to zero. (The average displacement would be exactly zero in an
infinitely large sample.)
Linearity of expectation
One of the most useful properties for working with means is linearity of expectation: for
a real-valued random variable X and constants a, b,
E(aX + b) = a E(X) + b
That is, expectation is a linear function from random variables to real numbers, so that we can
pass linear functions into or out of expectations.
For example, since the expected number of spots on a die roll is 3.5, the expected value
of three times the number of spots is 10.5.
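A quick sketch of this identity for the die roll, with the illustrative constants a = 3 and b = 0:

```python
import numpy as np

# Distribution of a fair die roll.
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

a, b = 3, 0                                   # illustrative constants
lhs = np.sum((a * values + b) * probs)        # E(aX + b) computed directly
rhs = a * np.sum(values * probs) + b          # a E(X) + b
print(lhs, rhs)                               # both 10.5
```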
We can do this with more complicated expressions too. For example, the solution to the
normal equations XXᵀw = Xyᵀ depends linearly on y when X is fixed. So if y is a
random variable, then the solution w is also a random variable, and we can write
XXᵀ E(w) = X E(yᵀ).
Variance
The mean tells us what value our random variable will take on average, but it ignores
how much the variable fluctuates around this average. There are a number of ways to
quantify this fluctuation — that is, to answer how far a random variable will typically be
from its mean. The most widely used ways are a pair of related measurements called the
variance and the standard deviation.
The variance of X is defined as
V(X) = E((X − E(X))²)
In words, the variance is the expected squared difference between a random variable
and its mean.
For example, suppose X is a fair coin flip: it takes values 0 and 1 with equal probability.
The mean is E(X) = 1/2. The difference between X and E(X) is either +1/2 or −1/2, with
equal probability; squaring and averaging tells us that V(X) = 1/4.
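The same computation, written out as a short sketch:

```python
import numpy as np

# Fair coin flip: values 0 and 1 with probability 1/2 each.
values = np.array([0.0, 1.0])
probs = np.array([0.5, 0.5])

mean = np.sum(values * probs)                 # 0.5
var = np.sum((values - mean) ** 2 * probs)    # expected squared difference: 0.25
print(mean, var)
```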
The units of variance are the square of the units of the original random variable. For
example, the variance of a height might be measured in cm². This is not very intuitive.
So, it's common to report the standard deviation instead: this is the square root of the
variance, and it is often written σ(X). The standard deviation has the same units as X
and E(X).
The standard deviation is a pretty good match for our intuitive idea of how far a random
variable tends to be from its mean. Its main flaw is that it is sensitive to outliers: that is, if
there is a small probability of encountering a measurement that is very far from the
mean, that will have an outsized effect on the standard deviation.
For example, let's make a low-probability change to our fair coin flip from above.
Suppose that X takes values 0 or 1 with probability 49.95% each; but with the remaining
probability of 0.1%, it takes value 100. The low-probability event causes the mean of X
to move somewhat away from 0.5, to around 0.6. But the variance is now around 10
instead of 1/4 — a much larger change than the mean.
This sort of sensitivity means that it can be difficult to estimate the variance accurately.
In the example above, if we take fewer than 1000 samples of X , we have a good chance
of not even seeing the large value, and thinking that X is an ordinary fair coin.
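A simulation sketch of this example (the sample sizes and seed are arbitrary): with a large sample the mean and variance come out near 0.6 and 10, while a small sample often misses the value 100 entirely and looks like a fair coin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Modified coin: 0 or 1 with probability 49.95% each, 100 with probability 0.1%.
values = np.array([0.0, 1.0, 100.0])
probs = np.array([0.4995, 0.4995, 0.001])

big = rng.choice(values, size=1_000_000, p=probs)
print(big.mean(), big.var())      # roughly 0.6 and 10

small = rng.choice(values, size=500, p=probs)
print(small.mean(), small.var())  # often looks like an ordinary fair coin
```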
It's common to centralize a random variable by subtracting off its mean, so that the
result has mean (approximately) zero. After centralizing, it's also common to
(approximately) divide by the standard deviation, so that the standard deviation is
(approximately) 1. This is called standardizing or z-scoring the random variable.
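For a sample, z-scoring just means subtracting the sample mean and dividing by the sample standard deviation; a sketch (the distribution used here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=3.0, size=10_000)   # arbitrary mean and spread

centered = x - x.mean()          # centralize: subtract the sample mean
z = centered / x.std()           # standardize / z-score

print(z.mean(), z.std())         # approximately 0 and 1
```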
Both of these transformations are sometimes called normalizing the random variable;
this is a less-specific name that encompasses any processing that's intended to remove
some kind of idiosyncrasy in a random variable.
Moments
We can measure lots of additional properties of a random variable by looking at E(f (X))
for different functions f . Such expectations are called moments of X .
For example, if
Q(X) = 1 if X ∈ [3, 4], and 0 otherwise,
then E(Q(X)) = P(X ∈ [3, 4]) tells us how likely X is to land in the interval [3, 4]. Or, if
Q(X) = e^{itX}
for some value of t, then E(Q(X)) tells us whether X has a periodicity of length 2π/t:
this moment will be far from zero if there is some value x such that X tends to take
values close to x, x ± 2π/t, x ± 4π/t, …, and close to zero if not (i.e., if X is approximately
aperiodic).
The polynomials are a common and useful source of functions to use in defining
moments. The expectations of the monomials X, X², X³, … are common enough to
have special names: they are called the first, second, third, … moments.
We can recognize from the definitions that the mean is the first moment. It's common to
remove the mean (centralize the random variable) before taking second, third, and
higher moments, so that we get E((X − E(X))²), E((X − E(X))³), and so forth. These
are called central moments, and we can recognize from the definitions that the variance
is the second central moment.
The third central moment is called skew, and it measures the symmetry of a distribution:
if X has positive skew it means that large positive values of X are more likely than large
negative ones, while negative skew means the reverse. The fourth central moment is
called kurtosis, and it measures the balance between small and large values of X : if we
keep the variance fixed, a high kurtosis means that we put higher probability on very
large values of X while compensating by making small values smaller.
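A sketch of estimating these central moments from a sample; the exponential distribution is just an arbitrary right-skewed example, and the moments are left unnormalized, matching the definitions above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)   # a right-skewed distribution

centered = x - x.mean()
variance = np.mean(centered ** 2)   # second central moment
skew = np.mean(centered ** 3)       # third central moment: positive here
kurtosis = np.mean(centered ** 4)   # fourth central moment

print(variance, skew, kurtosis)
```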
High order polynomial moments are even more sensitive to outliers than the variance is.
That makes it even harder to estimate them from data. But, if we know lots of moments,
that tells us a lot about a random variable.
Exercise: what is the variance of a biased coin flip, in terms of the probability p of
seeing 1? What about skew or kurtosis?
The covariance of two random variables X and Y is defined as
Cov(X, Y) = E((X − E(X)) (Y − E(Y)))
If the covariance is positive, it means that positive values of X tend to co-occur with
positive values of Y , and negative values of X tend to co-occur with negative values of
Y . If the covariance is negative, it means the reverse: positive values of X are seen with
negative values of Y on average, and vice versa.
If we standardize X and Y , then their covariance is also called the correlation of X and
Y:
Corr(X, Y) = E( (X − E(X))/σ(X) · (Y − E(Y))/σ(Y) )
Correlation is always in the range [−1, 1]: a correlation of +1 means that Y is a linear
function of X with positive slope, while a correlation of −1 means that Y is a linear
function of X with negative slope.
Correlation and covariance are not the same; for example, covariance can be bigger
than +1 or smaller than −1. We haven't defined independence yet, but we will see later
that it is separate from both covariance and correlation. If two variables are
independent, they will have zero covariance and zero correlation, but the reverse is not
true.
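A sketch contrasting the two, using NumPy's built-in estimators (the variables here are an arbitrary noisy linear relationship):

```python
import numpy as np

rng = np.random.default_rng(0)

# Y is a noisy, scaled copy of X.
x = rng.normal(size=10_000)
y = 5.0 * x + rng.normal(size=10_000)

cov_xy = np.cov(x, y)[0, 1]         # covariance: can fall well outside [-1, 1]
corr_xy = np.corrcoef(x, y)[0, 1]   # correlation: always in [-1, 1]
print(cov_xy, corr_xy)              # roughly 5 and 0.98
```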
Exercise: give an example of two random variables that are perfectly dependent (i.e.,
one is a function of the other) but have zero correlation.
If X is a random variable that takes values in some vector space V , then we define E(X)
exactly as before:
E(X) = ∑ₓ x P(X = x)
If our vector space is Rn , this is equivalent to taking the mean componentwise. (This
follows from linearity of expectation.)
The variance of a vector-valued random variable X, also called its covariance matrix, is
defined as
V(X) = E((X − E(X)) (X − E(X))ᵀ)
or, expanding elementwise,
V(X)ᵢⱼ = E((Xᵢ − E(Xᵢ)) (Xⱼ − E(Xⱼ)))
By comparing this formula to the scalar case, we can see that the diagonal elements of
V (X) are the variances of the individual components of X , and the off-diagonal
elements are the covariances of pairs of components of X . We'll also sometimes refer to
V (X) as just the variance of X , with the understanding that V (X) is a scalar if X is a
scalar, and a matrix if X is a vector.
Note that the covariance matrix is always symmetric — we can see this from the above
expression, or by noting that the covariance of Xi and Xj doesn't depend on which way
we order them.
Also note that the covariance matrix is always positive semidefinite: from the definition
of variance and linearity of expectation, for any vector a we have
aᵀ V(X) a = E((aᵀ(X − E(X)))²) ≥ 0
If X takes values in some inner product space V instead of Rn , we define the variance to be a
linear operator on V : let Y = X − E(X) and set V (X) z = E(Y ⟨Y , z⟩) for each z ∈ V . This
reduces to the above definition if V = Rn .
The simplest possible covariance matrix is the identity, V (X) = I . This means that each
individual coordinate has variance 1, and any two coordinates are uncorrelated. We can
make a sample with variance I by sampling each coordinate independently and
standardizing. NumPy provides a convenient method for doing so:
numpy.random.randn(m, n) makes an m × n matrix of independent random variables
with mean zero and variance 1. If we take each column of this matrix as one of our
samples, we will have the variance matrix equal to I as desired.
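A sketch of this, checking the sample mean and covariance (the sample size is arbitrary):

```python
import numpy as np

m, n = 2, 100_000
X = np.random.randn(m, n)      # each column is one sample with mean 0, covariance I

print(X.mean(axis=1))          # close to the zero vector
print(np.cov(X))               # rows are coordinates; close to the 2x2 identity
```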
By looking at the plot, we can see some information about how the samples tend to vary.
This information shows up as well in the covariance matrix. In the sample shown above,
the covariance matrix is the identity, which results in a perfectly symmetric blob of
points around the origin.
In this section we're showing scatter plots for samples that have zero mean. If the mean were
nonzero they would look the same except for being shifted.
If V (X) is a diagonal matrix, then this sort of stretching or squashing of the individual
axes means that our data distribution will look basically like an ellipsoid, with the axes of
the ellipsoid pointing along the coordinate axes. The diameter of the ellipsoid in each
direction will be proportional to the standard deviation in that direction (the square root
of the variance). Here is a plot where the horizontal axis has variance 5 and the vertical
axis has variance 1.
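A sketch of generating such a sample: scale each coordinate of a variance-I sample by the desired standard deviation (√5 horizontally, 1 vertically):

```python
import numpy as np

rng = np.random.default_rng(0)

std_devs = np.sqrt(np.array([5.0, 1.0]))                # standard deviations per axis
X = std_devs[:, None] * rng.standard_normal((2, 10_000))

print(np.cov(X))     # approximately diag(5, 1)
```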
If we apply a fixed linear transformation A to X, the covariance matrix transforms as
V(AX) = A V(X) Aᵀ
We can prove this identity directly from the definitions.
Exercise: do so. (Hint: assume without loss of generality that E(X) = 0; then use the
definition of variance V(X) = E(XXᵀ).)
There are a lot of uses for this identity. For example, we can use it to generate samples
with any desired covariance matrix Σ, given a factorization of Σ. Say we have the
Cholesky factorization Σ = LLᵀ: we start by generating samples X with covariance I. If
we then transform them to Y = LX, we have
V(Y) = L V(X) Lᵀ = L I Lᵀ = Σ
as desired.
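A sketch of this recipe with NumPy's Cholesky routine; the target Σ below is an arbitrary symmetric positive definite matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])                # arbitrary target covariance
L = np.linalg.cholesky(Sigma)                 # Sigma = L @ L.T

X = rng.standard_normal((2, 100_000))         # samples with covariance I
Y = L @ X                                     # transformed samples

print(np.cov(Y))                              # close to Sigma
```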
SVD
Putting all of the above together, we can now describe what a scatter plot looks like for a
general covariance matrix. To do so, we'll use the singular value decomposition of V (X):
V(X) = U S Uᵀ
Here U is square and orthonormal, and S is diagonal and nonnegative. Note that we've
used the form of the SVD for symmetric positive semidefinite matrices, where we have
the same orthonormal matrix on both the left and the right.
Applying the identity V(AX) = A V(X) Aᵀ with A = Uᵀ gives
V(UᵀX) = Uᵀ V(X) U = (UᵀU) S (UᵀU) = S
That is, the covariance of Y = UᵀX is diagonal. That means we understand the shape of
the distribution of Y: its scatter plot will look like an axis-parallel ellipsoid, with the
diameter along axis i proportional to √Sᵢᵢ (the standard deviation along that axis).
So, the scatter plot of X will look like a rotated (and maybe reflected) version of the
scatter plot of Y: that is, it will be an ellipsoid, but not necessarily axis-parallel any more.
Each column Uᵢ will point along one axis of the ellipsoid, and the spread of the ellipsoid
along the direction Uᵢ will be proportional to √Sᵢᵢ.
The vectors Uᵢ are called the singular vectors of the covariance matrix. The variances Sᵢᵢ are
called the singular values. For a symmetric PSD matrix, the singular values and singular vectors
are also called the eigenvalues and eigenvectors — though eigenvectors and singular vectors are
different for general matrices. So you will often see a distribution described in terms of the
eigenvectors of its covariance matrix.
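A sketch of recovering the ellipsoid axes from data: estimate the covariance of a sample and take its SVD; the columns of U are the axis directions, and the square roots of the singular values give the spread along each axis (the sample below reuses the arbitrary Σ from the Cholesky sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples with an arbitrary, non-diagonal covariance.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])
L = np.linalg.cholesky(Sigma)
X = L @ rng.standard_normal((2, 100_000))

U, S, _ = np.linalg.svd(np.cov(X))   # V(X) ≈ U @ diag(S) @ U.T

print(U)            # columns: axes of the ellipsoid
print(np.sqrt(S))   # spread along each axis
```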