STA3005 Exploratory Data Analysis Notes

This document provides an introduction to multivariate data analysis and exploratory data analysis techniques. It discusses multivariate data which has multiple variables and observations. Examples of multivariate data sets are provided including foreign exchange rates and a data set with variables X1, X2, and X3. Key concepts introduced include the mean vector, sample variance, covariances, and the sample variance-covariance matrix. Properties of matrix algebra and multivariate normal distributions are also covered.

Uploaded by

Ajani McPherson

STA3005

MULTIVARIATE DATA ANALYSIS

Exploratory Data Analysis


Multivariate data take the form

          Variable 1   Variable 2   Variable 3   ...   Variable p
Item 1    X11          X12          X13          ...   X1p
Item 2    X21          X22          X23          ...   X2p
Item 3    X31          X32          X33          ...   X3p
.         .            .            .            ...   .
.         .            .            .            ...   .
Item n    Xn1          Xn2          Xn3          ...   Xnp
Example: Foreign exchange rates for a bank, with variables Cash buy, Cheque buy and Sell

Country   Cash buy   Cheque buy   Sell
US        125.95     139.61       150.09
GBP       183.87     189.94       197.65
CAD       104.26     108.12       112.86
EUR       164.13     167.82       178.04
KYD       157.35     163.90       178.16

Etc.
Example 2

Take a different multivariate data set with variables X1, X2 and X3 but only 4 items of data.

Variables
X1   X2   X3
7    10   12
5    15   18
10   12   20
6    11   10
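Example 2 (continued)

The mean vector is the vector containing the mean of each of the variables.
Mean of X1 is (7+5+10+6)/4 = 7
Mean of X2 is (10+15+12+11)/4 = 12
Mean of X3 is (12+18+20+10)/4 = 15
The mean vector is represented by x-bar in bold type or, for the purpose of handwritten notes, x-bar with a line below it.
That is, $\bar{\mathbf{x}} = \begin{pmatrix} 7 \\ 12 \\ 15 \end{pmatrix}$ or $\underline{\bar{x}} = \begin{pmatrix} 7 \\ 12 \\ 15 \end{pmatrix}$.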
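The mean vector of Example 2 can be checked column by column. This is a minimal sketch in plain Python; the names `data` and `mean_vector` are illustrative and do not come from the notes.

```python
# Example 2 data: each row is one item, columns are X1, X2, X3.
data = [
    [7, 10, 12],
    [5, 15, 18],
    [10, 12, 20],
    [6, 11, 10],
]


def mean_vector(rows):
    """Return the list of column means of a rectangular list of rows."""
    n = len(rows)
    # zip(*rows) transposes the data so each tuple is one variable.
    return [sum(col) / n for col in zip(*rows)]


x_bar = mean_vector(data)
print(x_bar)  # [7.0, 12.0, 15.0]
```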

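Also,
the sample variance may be found for each variable.

For X1 the variance may be represented by:

$S_{11}$ or $S_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{i1}-\bar{x}_1)^2 = \frac{(7-7)^2+(5-7)^2+(10-7)^2+(6-7)^2}{3} = \frac{14}{3} \approx 4.7$

Similarly, for variable X2, $S_{22} = \frac{14}{3} \approx 4.7$, but for variable X3, $S_{33} = \frac{68}{3} \approx 22.7$.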
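The sample variances of the three variables in Example 2 can be verified with Python's standard library, whose `statistics.variance` uses the divisor n - 1. This is only a sketch; the variable names are illustrative.

```python
from statistics import variance  # sample variance, divisor n - 1

# Columns of the Example 2 data set.
x1 = [7, 5, 10, 6]
x2 = [10, 15, 12, 11]
x3 = [12, 18, 20, 10]

s11, s22, s33 = variance(x1), variance(x2), variance(x3)
print(round(s11, 1), round(s22, 1), round(s33, 1))  # 4.7 4.7 22.7
```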


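The covariances are calculated by
$\mathrm{Cov}(X_1,X_2) = S_{12} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{i1}-\bar{x}_1)(x_{i2}-\bar{x}_2)$
$= \frac{(0)(-2)+(-2)(3)+(3)(0)+(-1)(-1)}{3} = -\frac{5}{3} \approx -1.7$

$\mathrm{Cov}(X_1,X_3) = S_{13} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{i1}-\bar{x}_1)(x_{i3}-\bar{x}_3)$
$= \frac{(0)(-3)+(-2)(3)+(3)(5)+(-1)(-5)}{3}$
$= \frac{14}{3} \approx 4.7$

Similarly, $\mathrm{Cov}(X_2,X_3) = S_{23} = \frac{20}{3} \approx 6.7$
But generally, $\mathrm{Cov}(X_j,X_k) = \mathrm{Cov}(X_k,X_j)$. Therefore, $S_{21} = S_{12} \approx -1.7$ and $S_{31} = S_{13} \approx 4.7$.
And finally, $S_{32} = S_{23} \approx 6.7$.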
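All of these values make up the sample variance-covariance matrix S.
The variance-covariance matrix is given by
$S = \begin{pmatrix} S_{11} & S_{12} & S_{13} \\ S_{21} & S_{22} & S_{23} \\ S_{31} & S_{32} & S_{33} \end{pmatrix} \approx \begin{pmatrix} 4.7 & -1.7 & 4.7 \\ -1.7 & 4.7 & 6.7 \\ 4.7 & 6.7 & 22.7 \end{pmatrix}$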
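The full sample variance-covariance matrix for Example 2 can be computed in one pass. The sketch below uses plain Python and the divisor n - 1; the function name `sample_cov_matrix` is hypothetical.

```python
# Example 2 data: each row is one item, columns are X1, X2, X3.
data = [
    [7, 10, 12],
    [5, 15, 18],
    [10, 12, 20],
    [6, 11, 10],
]


def sample_cov_matrix(rows):
    """Sample variance-covariance matrix with divisor n - 1."""
    n = len(rows)
    cols = list(zip(*rows))              # one tuple per variable
    means = [sum(c) / n for c in cols]   # the mean vector
    p = len(cols)
    return [
        [
            sum((cols[j][i] - means[j]) * (cols[k][i] - means[k])
                for i in range(n)) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


S = sample_cov_matrix(data)
for row in S:
    print([round(v, 1) for v in row])
# [4.7, -1.7, 4.7]
# [-1.7, 4.7, 6.7]
# [4.7, 6.7, 22.7]
```

The variances sit on the diagonal and the matrix is symmetric, so only six distinct values need to be computed by hand.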
Properties of Matrix Algebra and Random Vectors related to multivariate data
Many of these properties are covered in MAT1043 Linear Algebra, the prerequisite module.

Important properties
The population variance-covariance matrix, represented by upper-case sigma ($\Sigma$), is a symmetric matrix and is taken to be non-singular. This makes it ideal for the mathematical manipulations used in multivariate statistical methods.
The variance-covariance matrix is the star actor on which the entire module is based.
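Suppose $\Sigma = (\sigma_{jk})$ for $p$ variables.
The diagonal elements form the variance matrix $V = \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{pp})$, consisting of the variances of the individual variables.
The standard deviation matrix $V^{1/2} = \mathrm{diag}(\sqrt{\sigma_{11}}, \ldots, \sqrt{\sigma_{pp}})$ is the diagonal matrix with the square root of each diagonal element.

The standard deviation matrix multiplied by itself gives the variance matrix: $V^{1/2} V^{1/2} = V$.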
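The correlation coefficient for two variables $X_j$ and $X_k$ is given by

$\rho_{jk} = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}}\,\sqrt{\sigma_{kk}}}$.
Similarly the correlation matrix is given by

$\rho = (V^{1/2})^{-1}\,\Sigma\,(V^{1/2})^{-1}$,

where $(V^{1/2})^{-1}$ is the inverse of the standard deviation matrix $V^{1/2}$, found by taking the reciprocal of each diagonal element.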
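The same construction can be applied to a sample matrix. As an illustration only, the sketch below applies it to the sample variance-covariance matrix of Example 2 (entered as exact fractions 14/3, -5/3, and so on), which gives the sample correlation matrix. Using S in place of the population matrix is an assumption made here for the example, and the names `matmul`, `D_inv` and `R` are hypothetical.

```python
import math

# Sample variance-covariance matrix of Example 2 (divisor n - 1).
S = [
    [14 / 3, -5 / 3, 14 / 3],
    [-5 / 3, 14 / 3, 20 / 3],
    [14 / 3, 20 / 3, 68 / 3],
]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


p = len(S)
# Inverse of the standard deviation matrix: a diagonal matrix holding
# the reciprocal of the square root of each diagonal element of S.
D_inv = [[1 / math.sqrt(S[i][i]) if i == j else 0.0 for j in range(p)]
         for i in range(p)]

R = matmul(matmul(D_inv, S), D_inv)
for row in R:
    print([round(v, 3) for v in row])
# [1.0, -0.357, 0.454]
# [-0.357, 1.0, 0.648]
# [0.454, 0.648, 1.0]
```

Each off-diagonal entry equals the covariance divided by the product of the two standard deviations, so the matrix form and the pairwise formula agree.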
The variance-covariance matrix is a positive definite matrix, for which the eigenvalues are all strictly positive and the eigenvectors can be chosen to be mutually perpendicular. As a result, such a matrix has a square root matrix.
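One standard way to see why the square root matrix exists is the spectral decomposition. In the sketch below, $\lambda_i$ and $e_i$ denote the eigenvalues and the corresponding unit-length eigenvectors; these symbols are assumed here for illustration and are not taken from the notes.

```latex
\Sigma = \sum_{i=1}^{p} \lambda_i \, e_i e_i' ,
\qquad
\Sigma^{1/2} = \sum_{i=1}^{p} \sqrt{\lambda_i} \, e_i e_i' ,
\qquad
\Sigma^{1/2} \Sigma^{1/2}
  = \sum_{i=1}^{p} \sum_{j=1}^{p} \sqrt{\lambda_i \lambda_j} \, e_i (e_i' e_j) e_j'
  = \sum_{i=1}^{p} \lambda_i \, e_i e_i'
  = \Sigma .
```

The double sum collapses because $e_i' e_j = 0$ for $i \neq j$ and $e_i' e_i = 1$, and the square roots are real because every $\lambda_i$ is strictly positive.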


Multivariate Normal Distribution

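The multivariate normal density function is the matrix form analogous to the univariate normal density function. For a $p$-variate vector $\mathbf{x}$ with mean vector $\boldsymbol{\mu}$ and variance-covariance matrix $\Sigma$ it is given by

$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right\}$,

where $|\Sigma|$ is the determinant of $\Sigma$.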
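To make the density concrete, the sketch below evaluates the standard multivariate normal density for the special case p = 2, where the determinant and inverse of the covariance matrix can be written out by hand. The function name `bivariate_normal_pdf` and the test values are assumptions made for illustration.

```python
import math


def bivariate_normal_pdf(x, mu, sigma):
    """Density of a bivariate normal with mean mu and 2x2 covariance sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    # Inverse of a 2x2 matrix via the adjugate formula.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)' Sigma^{-1} (x - mu).
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    p = 2
    return math.exp(-q / 2) / ((2 * math.pi) ** (p / 2) * math.sqrt(det))


# At the mean of a standard bivariate normal the density is 1 / (2 * pi).
peak = bivariate_normal_pdf([0, 0], [0, 0], [[1, 0], [0, 1]])
print(peak)  # about 0.159
```

With p = 1 the same expression reduces to the familiar univariate normal density, which is the sense in which the matrix form is analogous.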
Multivariate parameter estimates

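Under multivariate normal sampling, the maximum likelihood estimates of the population mean vector $\boldsymbol{\mu}$ and the population variance-covariance matrix $\Sigma$ are the sample mean vector $\bar{\mathbf{x}}$ and $\frac{n-1}{n} S$ respectively, where S is the sample variance-covariance matrix (divisor n-1). For large n the factor (n-1)/n is close to 1, so S is commonly used as the estimate of $\Sigma$.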
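Under the normal likelihood, the maximising estimate of the covariance matrix uses the divisor n, so it equals (n-1)/n times S when S is computed with the divisor n - 1. The sketch below shows the size of that difference for the four items of Example 2; the name `sigma_hat` is hypothetical.

```python
n = 4  # number of items in Example 2

# Sample variance-covariance matrix of Example 2 (divisor n - 1).
S = [
    [14 / 3, -5 / 3, 14 / 3],
    [-5 / 3, 14 / 3, 20 / 3],
    [14 / 3, 20 / 3, 68 / 3],
]

# Rescale every entry from divisor n - 1 to divisor n.
sigma_hat = [[(n - 1) / n * v for v in row] for row in S]
for row in sigma_hat:
    print([round(v, 2) for v in row])
# [3.5, -1.25, 3.5]
# [-1.25, 3.5, 5.0]
# [3.5, 5.0, 17.0]
```

With only four items the two versions differ by a quarter, but the gap shrinks quickly as n grows.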
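If $X_1, X_2, \ldots, X_n$ are a random sample from a $p$-variate normal distribution with mean $\boldsymbol{\mu}$ and covariance $\Sigma$ then

1. $\bar{X}$ is distributed as $N_p(\boldsymbol{\mu}, \frac{1}{n}\Sigma)$
2. $(n-1)S$ is distributed as a Wishart random matrix with n-1 degrees of freedom
3. $\bar{X}$ and $S$ are independent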
