Lecture 21
Multivariate Analysis:
Data Analysis:

          Variable 1   Variable 2   Variable 3   ...   Variable p
Obs 1     x_11         x_12         x_13         ...   x_1p
Obs 2     x_21         x_22         x_23         ...   x_2p
Obs 3     x_31         x_32         x_33         ...   x_3p
...       ...          ...          ...          ...   ...
Obs n     x_n1         x_n2         x_n3         ...   x_np
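Notation: $x_{jk}$ = measurement of the $k$-th variable on the $j$-th item.

In matrix notation: the $j$-th observation is $\mathbf{x}_j = (x_{j1}, \ldots, x_{jp})'$, and

\[
\mathbf{X} =
\begin{pmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1p} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2p} \\
x_{31} & x_{32} & x_{33} & \cdots & x_{3p} \\
\vdots & \vdots & \vdots &        & \vdots \\
x_{n1} & x_{n2} & x_{n3} & \cdots & x_{np}
\end{pmatrix}
\quad \text{or} \quad
\mathbf{X} =
\begin{pmatrix}
\mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n'
\end{pmatrix}.
\]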
Descriptive statistics:
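Sample means:
\[
\bar{x}_k = \frac{1}{n} \sum_{j=1}^{n} x_{jk}, \qquad k = 1, \ldots, p.
\]
Sample variances:
\[
s_k^2 = \frac{1}{n-1} \sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2, \qquad k = 1, \ldots, p.
\]
Sample covariances:
\[
s_{ik} = \frac{1}{n-1} \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k), \qquad i, k = 1, \ldots, p.
\]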
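As a minimal sketch of the descriptive statistics above, the following pure-Python code computes the sample means and the sample covariances (with the $n-1$ divisor) for a small data matrix stored as a list of rows. The function names `column_means` and `sample_covariance` and the toy data are illustrative assumptions, not part of the lecture.

```python
def column_means(X):
    """Sample mean of each variable; X is a list of n rows of length p."""
    n = len(X)
    p = len(X[0])
    return [sum(row[k] for row in X) / n for k in range(p)]


def sample_covariance(X):
    """p x p matrix of sample covariances s_ik, using the 1/(n-1) divisor."""
    n = len(X)
    p = len(X[0])
    means = column_means(X)
    return [
        [
            sum((row[i] - means[i]) * (row[k] - means[k]) for row in X) / (n - 1)
            for k in range(p)
        ]
        for i in range(p)
    ]


# Toy data (hypothetical): n = 4 observations on p = 3 variables.
X = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 0.1],
    [3.0, 6.0, 0.9],
    [4.0, 8.0, 0.3],
]

means = column_means(X)
S = sample_covariance(X)

# The diagonal entries S[k][k] are the sample variances s_k^2.
variances = [S[k][k] for k in range(len(S))]
print(means)
print(variances)
```

The diagonal of the covariance matrix reproduces the sample variances, since $s_{kk} = s_k^2$.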
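Sample correlation coefficients:
\[
r_{ik} = \frac{s_{ik}}{s_i s_k}
= \frac{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)}
{\sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2} \, \sqrt{\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2}},
\]
where $s_{ik}$ is the sample covariance of variables $i$ and $k$, and $s_i$, $s_k$ are the sample standard deviations.

Remarks on Correlation:
1. $-1 \le r_{ik} \le 1$.
2. $r_{ik}$ measures the strength of linear association.
3. $r_{ik}$ is scale invariant: it is unchanged when either variable is shifted or multiplied by a positive constant.
4. $r_{ik}$ is referred to as Pearson's correlation coefficient. To measure general dependence (including nonlinear dependence), one can use Kendall's tau or Spearman's rho.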
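The correlation formula and the scale-invariance remark can be checked numerically. The sketch below is an illustration under assumed toy data; `pearson_r` is a hypothetical helper name. It computes $r$ from the sums of squares and cross-products, then recomputes it after shifting and rescaling both variables.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # The 1/(n-1) factors cancel between numerator and denominator.
    return sxy / math.sqrt(sxx * syy)


# Toy data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)

# Scale invariance: shift and positively rescale both variables.
r_scaled = pearson_r([100.0 * a + 7.0 for a in x], [0.5 * b for b in y])

# A perfectly monotone but nonlinear relation gives r below 1.
r_nonlinear = pearson_r(x, [a ** 2 for a in x])

print(round(r, 6), round(r_scaled, 6), round(r_nonlinear, 6))
```

The last value illustrates why $r_{ik}$ is described as a measure of linear association only: $y = x^2$ is a perfect monotone relation on these points, yet its Pearson correlation is strictly less than 1.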
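Kendall's tau:
Let $F$ be a continuous bivariate cumulative distribution function (CDF) of the random vector $\mathbf{x} = (x_1, x_2)'$. Let $(X_1, X_2)$ and $(X_1', X_2')$ be independent random pairs with distribution $F$. Then Kendall's tau is
\[
\begin{aligned}
\tau &= \Pr[(X_1 - X_1')(X_2 - X_2') > 0] - \Pr[(X_1 - X_1')(X_2 - X_2') < 0] \\
     &= 2 \Pr[(X_1 - X_1')(X_2 - X_2') > 0] - 1 \\
     &= 4 \int_{\mathbb{R}^2} F \, dF - 1.
\end{aligned}
\]
The second equality holds because $F$ is continuous, so ties occur with probability zero and the two probabilities sum to one.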
Kendall's $\tau$ is also known as the Kendall rank correlation coefficient. It measures the similarity of the orderings of the data when ranked by each of the quantities. For a sample of $n$ pairs $(X_i, Y_i)$, it is computed as
\[
\tau = \frac{n_c - n_d}{\frac{1}{2} n(n-1)},
\]
where $n_c$ is the number of concordant pairs and $n_d$ the number of discordant pairs in the data set. The denominator is the total number of pairs in the data. A high $\tau$ indicates that most pairs are concordant, that is, the two rankings are consistent. A pair is concordant if $\operatorname{sign}(X_2 - X_1) = \operatorname{sign}(Y_2 - Y_1)$, where $\operatorname{sign}(d) = -1, 0, 1$ for $d < 0$, $d = 0$, $d > 0$, respectively. A pair is discordant if $\operatorname{sign}(X_2 - X_1) = -\operatorname{sign}(Y_2 - Y_1)$. Based on the sign function, Kendall's $\tau$ can be rewritten as
\[
\tau = \frac{\sum_{i<j} \operatorname{sign}(X_j - X_i) \, \operatorname{sign}(Y_j - Y_i)}{n(n-1)/2}.
\]
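The two sample formulas for Kendall's tau can be sketched directly. This is a brute-force $O(n^2)$ illustration that assumes no ties. The names `sign`, `concordance_counts`, and `kendall_tau`, and the toy data, are illustrative assumptions.

```python
from itertools import combinations


def sign(d):
    """Return -1, 0, or 1 for d < 0, d == 0, d > 0."""
    return (d > 0) - (d < 0)


def concordance_counts(x, y):
    """Count concordant and discordant pairs over all i < j."""
    nc = nd = 0
    for i, j in combinations(range(len(x)), 2):
        s = sign(x[j] - x[i]) * sign(y[j] - y[i])
        if s > 0:
            nc += 1
        elif s < 0:
            nd += 1
    return nc, nd


def kendall_tau(x, y):
    """Kendall's tau via the sign-function form of the formula."""
    n = len(x)
    total = sum(
        sign(x[j] - x[i]) * sign(y[j] - y[i])
        for i, j in combinations(range(n), 2)
    )
    return total / (n * (n - 1) / 2)


# Toy data (hypothetical): 10 pairs in total, 2 of them discordant.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

nc, nd = concordance_counts(x, y)
tau = kendall_tau(x, y)
print(nc, nd)  # prints "8 2"

# Tau depends only on orderings, so a monotone increasing transform
# of y (here cubing) leaves it unchanged.
tau_cubed = kendall_tau(x, [v ** 3 for v in y])
```

With no ties, $(n_c - n_d) / \binom{n}{2}$ and the sign-function sum agree, which is the equivalence stated above.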
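Spearman's rho:
Let $F$ be a continuous bivariate cumulative distribution function (CDF) of the random vector $\mathbf{x} = (x_1, x_2)'$. Let $F_1$ and $F_2$ be the two marginal CDFs. Assume $(X_1, X_2) \sim F$. Then Spearman's rho is the correlation of $F_1(X_1)$ and $F_2(X_2)$. Since $F_1(X_1)$ and $F_2(X_2)$ are uniform $U(0, 1)$, their means are $\frac{1}{2}$ and their variances are $\frac{1}{12}$, so
\[
\rho_S = 12 \, \operatorname{Cov}\bigl(F_1(X_1), F_2(X_2)\bigr) = 12 \, E[F_1(X_1) F_2(X_2)] - 3.
\]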
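Spearman's rho is also a rank correlation coefficient. For data $(X_i, Y_i)_{i=1}^{n}$, let $(R_i, S_i)_{i=1}^{n}$ be the corresponding rank pairs and $d_i = R_i - S_i$ be the difference between the ranks. With no ties (which holds with probability one when $F$ is continuous),
\[
\bar{R} = \bar{S} = \frac{n+1}{2},
\qquad
\frac{1}{n} \sum_{i=1}^{n} \Bigl(R_i - \frac{n+1}{2}\Bigr)^2 = \frac{n^2 - 1}{12},
\]
so the correlation of the ranks is
\[
\frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{n(n^2 - 1)/12}.
\]
Moreover,
\[
\sum_{i=1}^{n} d_i^2
= \sum_{i=1}^{n} (R_i - \bar{R})^2 + \sum_{i=1}^{n} (S_i - \bar{S})^2 - 2 \sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S}).
\]
Therefore, the most common form of the Spearman coefficient of rank correlation is obtained as
\[
R = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}.
\]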
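To illustrate the derivation, the sketch below computes the Spearman coefficient in two ways: from the rank differences $d_i$ and as the ordinary correlation of the ranks. It assumes no ties. The helper names `ranks`, `spearman_from_d`, and `spearman_from_ranks`, and the toy data, are illustrative assumptions.

```python
import math


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman_from_d(x, y):
    """Spearman coefficient via 1 - 6 * sum(d_i^2) / (n (n^2 - 1))."""
    n = len(x)
    R, S = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(R, S))
    return 1 - 6 * d2 / (n * (n * n - 1))


def spearman_from_ranks(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    n = len(x)
    R, S = ranks(x), ranks(y)
    mean_rank = (n + 1) / 2  # common mean of both rank vectors
    num = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(R, S))
    srr = sum((a - mean_rank) ** 2 for a in R)
    sss = sum((b - mean_rank) ** 2 for b in S)
    return num / math.sqrt(srr * sss)


# Toy data (hypothetical), no ties.
x = [10.5, 3.2, 7.7, 8.1, 1.0]
y = [2.0, 0.5, 3.5, 1.5, 1.0]

rho_d = spearman_from_d(x, y)
rho_ranks = spearman_from_ranks(x, y)
print(ranks(x), ranks(y))  # prints "[5, 2, 3, 4, 1] [4, 1, 5, 3, 2]"
print(round(rho_d, 6), round(rho_ranks, 6))
```

Here $\sum d_i^2 = 8$ and $n = 5$, so the shortcut gives $1 - 48/120 = 0.6$, matching the correlation of the ranks.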