Covariance Matrix

The covariance matrix is a square matrix that represents the variance and covariance between pairs of elements in a dataset. It is symmetric, positive semi-definite, and essential for stochastic modeling and principal component analysis. The document explains the definition, properties, formulas, and examples of calculating covariance matrices.

Uploaded by

paulrajarshi7

The covariance matrix is a matrix that holds the covariance values between pairs of elements
of a random vector. It is also called the variance-covariance matrix because the variance of
each element appears along the main diagonal of the matrix.

A covariance matrix is always a square matrix. Furthermore, it is positive semi-definite and
symmetric. This matrix is very useful in stochastic modelling and principal component
analysis. In this article, we will learn about the variance-covariance matrix, its formula,
examples, and various important properties associated with it.

What is Covariance Matrix?

The covariance matrix is a square matrix that displays the variance of each dataset and the
covariance between each pair of datasets. Variance is a measure of dispersion and can be
defined as the spread of data around the mean of the given dataset. Covariance is calculated
between two variables and measures how the two variables vary together.

Covariance Matrix Definition

The variance-covariance matrix is defined as a square matrix where the diagonal elements
represent the variances and the off-diagonal elements represent the covariances. The covariance
between two variables can be positive, negative, or zero. A positive covariance indicates
that the two variables tend to increase or decrease together, whereas a negative covariance
shows that one tends to decrease as the other increases. If two variables do not vary
together, they have zero covariance.

Covariance Matrix Example

Suppose there are two datasets, X = {3, 2} and Y = {7, 4}. The sample variance of X is 0.5
and the sample variance of Y is 4.5. The sample covariance between X and Y is 1.5. The
covariance matrix is expressed as follows:

$$\begin{pmatrix} 0.5 & 1.5 \\ 1.5 & 4.5 \end{pmatrix}$$

A detailed description of how to find the variance-covariance matrix is covered below.

Covariance Matrix Formula


$$\begin{pmatrix} \mathrm{Var}(X_1) & \cdots & \mathrm{Cov}(X_1, X_n) \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \cdots & \mathrm{Var}(X_n) \end{pmatrix}$$

To determine the covariance matrix, the formulas for variance and covariance are required.
Depending upon the type of data available, the variance and covariance can be found for both sample
data and population data. These formulas are given below.

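Population Variance:

$$\operatorname{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$

Population Covariance:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)}{n}$$

Sample Variance:

$$\operatorname{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Sample Covariance:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$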

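$\mu$ = mean of the population data ($\mu_x$ and $\mu_y$ are the means of x and y).

$\bar{x}$ = mean of the sample data ($\bar{y}$ is defined likewise for y).

n = number of observations in the dataset.

$x_i$ = observations in dataset x ($y_i$ likewise for dataset y).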
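
The four formulas differ only in their divisor, so they can be sketched as two small Python functions with a flag. The function names and the `sample` parameter are illustrative choices, not part of the original text; the data are the two-point datasets X = {3, 2} and Y = {7, 4} from the example earlier in this article.

```python
def variance(x, sample=True):
    """Variance of x: divide by n - 1 for a sample, by n for a population."""
    n = len(x)
    mean_x = sum(x) / n
    squared_devs = sum((xi - mean_x) ** 2 for xi in x)
    return squared_devs / (n - 1 if sample else n)


def covariance(x, y, sample=True):
    """Covariance of x and y, using the same divisor rule as variance()."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross_devs = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross_devs / (n - 1 if sample else n)


X = [3, 2]
Y = [7, 4]

# Sample formulas (divide by n - 1)
print(variance(X), variance(Y), covariance(X, Y))  # 0.5 4.5 1.5

# Population formulas (divide by n)
print(variance(X, sample=False), covariance(X, Y, sample=False))  # 0.25 0.75
```

With only two observations the choice of divisor doubles the result, which is why it matters to state whether data are treated as a sample or as a population.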

Using these formulas, the general form of a variance covariance matrix is given as follows:

$$\begin{pmatrix} \mathrm{Var}(X_1) & \cdots & \mathrm{Cov}(X_1, X_n) \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \cdots & \mathrm{Var}(X_n) \end{pmatrix}$$

Covariance Matrix 2 × 2

A 2 × 2 matrix is one which has 2 rows and 2 columns. It is computed for 2 attributes in a
dataset. The formula for a 2 × 2 covariance matrix is given as follows:

$$\begin{pmatrix} \operatorname{var}(X) & \operatorname{cov}(X, Y) \\ \operatorname{cov}(X, Y) & \operatorname{var}(Y) \end{pmatrix}$$

Covariance Matrix 3 × 3

If there are 3 attributes in a dataset, or three separate single-attribute datasets x, y, and z,
then the formula to find the 3 × 3 covariance matrix is given below:

$$\begin{pmatrix} \mathrm{Var}(X) & \mathrm{Cov}(X, Y) & \mathrm{Cov}(X, Z) \\ \mathrm{Cov}(X, Y) & \mathrm{Var}(Y) & \mathrm{Cov}(Y, Z) \\ \mathrm{Cov}(X, Z) & \mathrm{Cov}(Y, Z) & \mathrm{Var}(Z) \end{pmatrix}$$
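
The 2 × 2 and 3 × 3 layouts are instances of one pattern: entry (i, j) holds the covariance of variable i with variable j, and the covariance of a variable with itself is its variance. A minimal Python sketch of that pattern follows. The helper names `sample_cov` and `covariance_matrix` are hypothetical, and the input is the three-column sample data from Example 2 of this article.

```python
def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def covariance_matrix(variables):
    """Return the k x k sample covariance matrix for a list of k variables."""
    # Row i, column j holds cov(variable i, variable j); the diagonal is the variance.
    return [[sample_cov(vi, vj) for vj in variables] for vi in variables]


X = [15, 35, 20, 14, 28]
Y = [12.5, 15.8, 9.3, 20.1, 5.2]
Z = [50, 55, 70, 65, 80]

for row in covariance_matrix([X, Y, Z]):
    print([round(value, 3) for value in row])
```

Passing two variables instead of three gives the 2 × 2 case; the function itself does not change.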

How to Calculate Covariance Matrix?

The number of variables determines the dimension of a variance-covariance matrix. For example, if
there are two variables (or datasets), the covariance matrix will be 2 × 2.
Suppose the math and science scores of 3 students are given as follows:

Student   Math (X)   Science (Y)
1         92         80
2         60         30
3         100        70

The steps to calculate the covariance matrix for the sample are given below:

• Step 1: Find the mean of one variable (X). This can be done by dividing the sum of
all observations by the number of observations. Thus, (92 + 60 + 100) / 3 = 84
• Step 2: Subtract the mean from all observations; (92 - 84), (60 - 84), (100 - 84)
• Step 3: Take the sum of the squares of the differences obtained in the previous step.
(92 - 84)² + (60 - 84)² + (100 - 84)².
• Step 4: Divide this value by one less than the number of observations (n - 1) to get the
sample variance of the first variable (X). var(X) = [(92 - 84)² + (60 - 84)² + (100 - 84)²] / (3 - 1) = 448
• Step 5: Repeat steps 1 to 4 to find the variances of all variables. Using these steps,
var(Y) = 700.
• Step 6: Choose a pair of variables (X and Y).
• Step 7: Subtract the mean of the first variable (X) from all observations; (92 - 84), (60
- 84), (100 - 84).
• Step 8: Repeat step 7 for the second variable (Y); (80 - 60), (30 - 60), (70 - 60).
• Step 9: Multiply the corresponding observations. (92 - 84)(80 - 60), (60 - 84)(30 -
60), (100 - 84)(70 - 60).
• Step 10: Add these values and divide them by (n - 1) to get the covariance. cov(x, y)
= cov(y, x) = [(92 - 84)(80 - 60) + (60 - 84)(30 - 60) + (100 - 84)(70 - 60)] / (3 - 1) =
520.
• Step 11: Repeat steps 6 to 10 for different pairs of variables.
• Step 12: Now using the general formula for covariance matrix arrange these values in
matrix form. Thus, the variance covariance matrix for the example is given as

$$\begin{pmatrix} 448 & 520 \\ 520 & 700 \end{pmatrix}$$

The same steps can be followed while calculating the covariance matrix for a population. The only
difference is that the population variance and covariance formulas will be applied.
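
The twelve steps above can be traced in a few lines of Python. This is only a sketch of the hand calculation for the three students' scores; the variable names are illustrative, and the comments map each line back to the step numbers.

```python
math_scores = [92, 60, 100]     # X
science_scores = [80, 30, 70]   # Y
n = len(math_scores)

# Steps 1-2 (and 7-8): means and deviations from the mean
mean_x = sum(math_scores) / n                  # 84.0
mean_y = sum(science_scores) / n               # 60.0
dev_x = [x - mean_x for x in math_scores]      # [8.0, -24.0, 16.0]
dev_y = [y - mean_y for y in science_scores]   # [20.0, -30.0, 10.0]

# Steps 3-5: sum of squared deviations divided by (n - 1)
var_x = sum(d ** 2 for d in dev_x) / (n - 1)   # 448.0
var_y = sum(d ** 2 for d in dev_y) / (n - 1)   # 700.0

# Steps 9-10: sum of products of paired deviations divided by (n - 1)
cov_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y)) / (n - 1)  # 520.0

# Step 12: arrange the values in matrix form
matrix = [[var_x, cov_xy],
          [cov_xy, var_y]]
print(matrix)  # [[448.0, 520.0], [520.0, 700.0]]
```

Replacing `(n - 1)` with `n` in the three divisions gives the population version described above.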

Properties of Covariance Matrix

The covariance matrix is a very important tool used by data scientists to understand and analyze
multivariate data. Listed below are the various properties of this matrix that make it
extremely useful.

• A covariance matrix is always a square matrix. This means that the number of rows of
the matrix will be equal to the number of columns.
• The matrix is symmetric. If M is the covariance matrix, then $M^T = M$.
• It is positive semi-definite. If M is the covariance matrix, u is any real column vector,
and $u^T$ is its transpose, then $u^T M u \geq 0$.
• All eigenvalues of the variance covariance matrix are real and non-negative.
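
These properties can be spot-checked numerically on the 2 × 2 matrix from the student-score calculation. The sketch below is not a proof: it tests the quadratic form on a handful of arbitrarily chosen vectors (an assumption of this example) and uses the closed-form eigenvalues of a symmetric 2 × 2 matrix, computed from its trace and determinant.

```python
import math

M = [[448.0, 520.0],
     [520.0, 700.0]]

# Symmetry: M equals its transpose
transpose = [list(column) for column in zip(*M)]
is_symmetric = (M == transpose)


def quadratic_form(u, m):
    """Compute u^T m u for a vector u and a square matrix m."""
    size = len(u)
    return sum(u[i] * m[i][j] * u[j] for i in range(size) for j in range(size))


# Positive semi-definiteness spot check on a few arbitrary vectors
test_vectors = [(1, 0), (0, 1), (1, -1), (-3, 2)]
forms = [quadratic_form(u, M) for u in test_vectors]

# Eigenvalues of a symmetric 2 x 2 matrix: roots of t^2 - trace*t + det = 0
trace = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
root = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = ((trace - root) / 2, (trace + root) / 2)

print(is_symmetric)  # True
print(forms)         # [448.0, 700.0, 108.0, 592.0]
print(eigenvalues)   # roughly 38.95 and 1109.05, both non-negative
```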

Important Notes on Covariance Matrix

• The covariance matrix depicts the variance of datasets and covariance of a pair of
datasets in matrix format.
• The diagonal elements represent the variance of a dataset and the off-diagonal terms
give the covariance between a pair of datasets.
• The variance covariance matrix is always square, symmetric, and positive semi-
definite.
• The general formula to represent a covariance matrix is

$$\begin{pmatrix} \mathrm{Var}(x_1) & \cdots & \mathrm{Cov}(x_1, x_n) \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}(x_n, x_1) & \cdots & \mathrm{Var}(x_n) \end{pmatrix}$$

Examples of Covariance Matrix

Example 1: Find the population covariance matrix for the following table.

Score Age
68 29
60 26
58 30
40 35

Solution: The formula for population variance is

$$\operatorname{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$

μx = 56.5, n = 4

var(x) = [(68 - 56.5)² + (60 - 56.5)² + (58 - 56.5)² + (40 - 56.5)²] / 4 = 104.75

μy = 30, n = 4

var(y) = [(29 - 30)² + (26 - 30)² + (30 - 30)² + (35 - 30)²] / 4 = 10.5

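The formula for population covariance is

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{4} (x_i - \mu_x)(y_i - \mu_y)}{4}$$

cov(x, y) = [(11.5)(-1) + (3.5)(-4) + (1.5)(0) + (-16.5)(5)] / 4 = -108 / 4 = -27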

The variance covariance matrix is given as follows:


$$\begin{pmatrix} 104.75 & -27 \\ -27 & 10.5 \end{pmatrix}$$
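
Example 1 can be cross-checked with a short Python sketch that applies the population formulas (division by n) to the Score and Age columns. The helper names are illustrative and not taken from the original text.

```python
def population_variance(x):
    """Population variance: sum of squared deviations divided by n."""
    mu = sum(x) / len(x)
    return sum((xi - mu) ** 2 for xi in x) / len(x)


def population_covariance(x, y):
    """Population covariance: sum of paired deviation products divided by n."""
    mu_x, mu_y = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / len(x)


score = [68, 60, 58, 40]
age = [29, 26, 30, 35]

matrix = [
    [population_variance(score), population_covariance(score, age)],
    [population_covariance(age, score), population_variance(age)],
]
print(matrix)  # [[104.75, -27.0], [-27.0, 10.5]]
```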

Example 2: Find the covariance matrix for the following sample data.

X Y Z
15 12.5 50
35 15.8 55
20 9.3 70
14 20.1 65
28 5.2 80

Solution: The sample variance formula is

$$\operatorname{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Substituting the values of observations for each variable in this formula we get,

n = 5, $\bar{X}$ = 22.4, var(X) = 321.2 / (5 - 1) = 80.3

$\bar{Y}$ = 12.58, var(Y) = 132.148 / 4 = 33.037

$\bar{Z}$ = 64, var(Z) = 570 / 4 = 142.5

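$$\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{5} (x_i - 22.4)(y_i - 12.58)}{5 - 1} = -13.865$$

$$\mathrm{Cov}(X, Z) = \frac{\sum_{i=1}^{5} (x_i - 22.4)(z_i - 64)}{5 - 1} = 14.25$$

$$\mathrm{Cov}(Y, Z) = \frac{\sum_{i=1}^{5} (y_i - 12.58)(z_i - 64)}{5 - 1} = -39.525$$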

The covariance matrix is

$$\begin{pmatrix} 80.3 & -13.865 & 14.25 \\ -13.865 & 33.037 & -39.525 \\ 14.25 & -39.525 & 142.5 \end{pmatrix}$$
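
The same matrix can be obtained in one pass by centering every column and summing products of the centered columns. This is an equivalent formulation of the sample formulas rather than the procedure used in the solution above, and the variable names are illustrative.

```python
# Each row is one observation of (X, Y, Z) from the table in Example 2
rows = [
    (15, 12.5, 50),
    (35, 15.8, 55),
    (20, 9.3, 70),
    (14, 20.1, 65),
    (28, 5.2, 80),
]
n = len(rows)        # number of observations
k = len(rows[0])     # number of variables

# Column means
means = [sum(row[j] for row in rows) / n for j in range(k)]

# Centered data: subtract each column's mean from its observations
centered = [[row[j] - means[j] for j in range(k)] for row in rows]

# cov[i][j] = sum over observations of centered_i * centered_j, divided by n - 1
cov = [
    [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
    for i in range(k)
]

for line in cov:
    print([round(value, 3) for value in line])
```

Up to floating-point rounding, the printed rows should agree with the matrix derived by hand.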

Example 3: How will you interpret the covariance matrix given below?

$$\begin{array}{c|ccc} & X & Y & Z \\ \hline X & 500 & 320 & -40 \\ Y & 320 & 340 & 0 \\ Z & -40 & 0 & 800 \end{array}$$

Solution: The variance covariance matrix can be interpreted as follows:

1) The diagonal elements 500, 340 and 800 indicate the variance in datasets X, Y and Z
respectively. Y shows the lowest variance whereas Z displays the highest variance.

2) The covariance for X and Y is 320. As this is a positive number, it means that when X
increases (or decreases), Y also tends to increase (or decrease).

3) The covariance for X and Z is -40. As it is a negative number, it implies that when X
increases, Z tends to decrease, and vice versa.

4) The covariance for Y and Z is 0. This means that there is no linear relationship
between the two datasets.
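
The reading of Example 3 follows a mechanical rule: compare the diagonal entries to rank the variances, and look at the sign of each off-diagonal entry. A small Python sketch of that rule is given below; the `describe` helper and its wording are assumptions made for illustration.

```python
labels = ["X", "Y", "Z"]
M = [
    [500, 320, -40],
    [320, 340, 0],
    [-40, 0, 800],
]

# Diagonal entries are the variances of X, Y and Z
variances = {labels[i]: M[i][i] for i in range(len(labels))}
lowest = min(variances, key=variances.get)
highest = max(variances, key=variances.get)


def describe(cov):
    """Translate the sign of a covariance into a plain-language statement."""
    if cov > 0:
        return "tend to move in the same direction"
    if cov < 0:
        return "tend to move in opposite directions"
    return "show no linear relationship"


# Off-diagonal entries above the diagonal cover each pair once (M is symmetric)
relations = {}
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        relations[(labels[i], labels[j])] = describe(M[i][j])

print(lowest, highest)  # Y Z
for pair, text in relations.items():
    print(pair, text)
```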

------------------------X------------------------
