Coherence, Phase and Cross-Correlation
Spring 2007
This lab will cover quite a lot of material. You will use both time domain methods and frequency domain methods to study connections between two data sets. You will use the acf command from previous labs to estimate cross-correlation functions and the spectrum command to estimate the coherence between two data sets. You will also construct approximate confidence bands for these estimates. In bivariate analyses you may want to describe a connection between two data sets. Can the inclusion of one data set help describe the other data set and make predictions more reliable? How much of the stochastic behavior of one data set can be explained by the other? In the second part of the lab we will look into how to model this kind of data. In the last part of the lab, we will estimate the transfer function for an AR(1) and an MA(1) process constructed from a known innovation series, and compare the estimates to the true transfer functions. You will learn that a phase plot of the estimated transfer function can tell you, among other things, whether one data set is leading or lagging the other.
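$$\hat{c}_{X,Y}(\tau) = \frac{1}{T} \sum_{t=1}^{T-\tau} (x(t) - \bar{x})(y(t+\tau) - \bar{y})$$

when $\tau \ge 0$, and

$$\hat{c}_{X,Y}(\tau) = \frac{1}{T} \sum_{t=-\tau+1}^{T} (x(t) - \bar{x})(y(t+\tau) - \bar{y})$$

when $\tau < 0$. Construct two WN processes where the second is the same as the first but lagged 3 time units. Estimate the cross-correlation function and state the result.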
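As a sanity check on the estimator above, the two sums can be coded directly and compared with R's built-in ccf. This is only a sketch: it assumes the convention that lag $\tau$ pairs x(t) with y(t + $\tau$), and the helper name cross_cov and the simulated series are made up for illustration. R documents ccf(x, y) at lag k as estimating the relation between x[t+k] and y[t], so under the assumed convention the call is ccf(y, x).

```r
# Hypothetical helper: sample cross-covariance of x(t) with y(t + tau),
# coded directly from the two sums (divisor T at every lag).
cross_cov <- function(x, y, tau) {
  n  <- length(x)
  xc <- x - mean(x)
  yc <- y - mean(y)
  if (tau >= 0) {
    idx <- seq_len(n - tau)      # t = 1, ..., T - tau
  } else {
    idx <- seq(1 - tau, n)       # t = -tau + 1, ..., T
  }
  sum(xc[idx] * yc[idx + tau]) / n
}

set.seed(1)
x <- rnorm(200)
y <- c(0, 0, x[1:198]) + rnorm(200)   # made-up example: y follows x two steps later

lags   <- -5:5
direct <- sapply(lags, function(k) cross_cov(x, y, k))

# ccf(y, x) at lag k estimates cov(y[t + k], x[t]), i.e. the same pairing as cross_cov.
builtin <- ccf(y, x, lag.max = 5, type = "covariance", plot = FALSE)

# The largest value should sit at the lag built into the simulation.
peak_lag <- lags[which.max(direct)]
```

Swapping the argument order in ccf mirrors the lags around zero, which is the same effect the lab asks you to look for when you swap the series in acf(ts.union(...)).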
    e<-rnorm(300)
    e1<-ts(e[1:250])
    e2<-ts(e[4:253])
    acf(ts.union(e1,e2))

You clearly see the lag of e1 on e2 in the plot (e1 takes on the same value as e2 3 time units later). What happens if you use acf(ts.union(e2,e1))? The conclusion is to be careful and consistent with what you consider your dependent or output signal and what your input signal is. From now on let's consider e2 (the leading series) to be the input to some system and e1 what we get out from the system.

The dashed lines indicating confidence bands in the acf plot are based on WN processes as before. This may grossly underestimate the variance of the cross-correlation estimates! If there is a lot of structure in the individual data sets, the white noise confidence bands for the cross-correlation estimate can be very misleading. You can try this yourself on two independent data sets that we introduce some structure to by low-pass filtering them. Use an MA(q) model for some q as a low-pass filter and create two independent time series. Now estimate their cross-correlation function.

    e1f<-filter(rnorm(250),filter=?)
    e2f<-filter(rnorm(250),filter=?)
    acf(ts.union(ts(e1f),ts(e2f)))

(Note: you have to remove NA values in e1f and e2f before using acf.) As you can see, even though the data sets are independent of each other there are some significant sample cross-correlation estimates. The reason is that the confidence bands are underestimating the variance of the cross-correlation estimate. You could try prewhitening the data before estimating the cross-correlation function (do this and plot the result). You could also try bootstrapping. There are several bootstrap programs posted on the class homepage. Download cfdirect.q, statboot.q and FDJ2.q and try them out. (Review the handout for more details: the direct method, the stationary bootstrap and the frequency domain jackknife.)

    cfband(e2f,e1f)
    FDJ2(e2f,e1f)
    statboot(.05,500,e2f,e1f)

Read the header part of each function to understand what the different input defaults are. Try changing some defaults (e.g. lagmax, p, B). Comment.
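The low-pass and prewhitening steps described above could look roughly like the following. The lab leaves the filter open, so the 5-point moving average is an assumption, as is the choice to prewhiten each series with its own AR fit of fixed order 8. Other filters and other prewhitening schemes are equally reasonable.

```r
set.seed(2)
n <- 250
w <- rep(1/5, 5)     # assumed low-pass filter (5-point moving average)

# Two independent white-noise series pushed through the same smoother.
e1f <- stats::filter(rnorm(n), filter = w)
e2f <- stats::filter(rnorm(n), filter = w)

# The centred filter leaves NA at both ends; keep only the complete part.
keep <- !is.na(e1f) & !is.na(e2f)
e1f  <- as.numeric(e1f)[keep]
e2f  <- as.numeric(e2f)[keep]

# Prewhiten each series with its own AR(p) fit (p = 8 is an arbitrary choice).
p  <- 8
r1 <- ar(e1f, aic = FALSE, order.max = p)$resid
r2 <- ar(e2f, aic = FALSE, order.max = p)$resid
ok <- !is.na(r1) & !is.na(r2)          # the first p residuals are NA
r1 <- r1[ok]
r2 <- r2[ok]

# White-noise band, as drawn by the acf plot at the 95 percent level.
band <- function(m) 1.96 / sqrt(m)

raw_ccf <- ccf(e1f, e2f, lag.max = 20, plot = FALSE)
pw_ccf  <- ccf(r1,  r2,  lag.max = 20, plot = FALSE)

# Number of lags whose estimate falls outside the white-noise band.
n_raw <- sum(abs(raw_ccf$acf) > band(length(e1f)))
n_pw  <- sum(abs(pw_ccf$acf)  > band(length(r1)))
```

Comparing n_raw with n_pw over several seeds gives a feel for how often the white-noise band is exceeded before and after prewhitening. The counts vary from run to run, so no particular values should be expected.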
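$$I_{X,Y}(\lambda) = \frac{1}{2\pi T} \Big( \sum_{s=0}^{T-1} e^{-i\lambda s} X(s) \Big) \Big( \sum_{t=0}^{T-1} e^{i\lambda t} Y(t) \Big)$$

However, if we write $J_X(\lambda) = \sum_{s=0}^{T-1} e^{-i\lambda s} X(s)$, and $J_Y(\lambda)$ likewise, then $I_{X,Y}(\lambda) = J_X(\lambda) \overline{J_Y(\lambda)} / (2\pi T)$ and

$$\frac{\mathrm{Mod}(I_{X,Y}(\lambda))}{(I_{X,X}(\lambda) I_{Y,Y}(\lambda))^{1/2}} = \frac{\mathrm{Mod}(J_X(\lambda))\,\mathrm{Mod}(J_Y(\lambda))}{\mathrm{Mod}(J_X(\lambda))\,\mathrm{Mod}(J_Y(\lambda))} = 1$$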
I.e. the estimated coherence would always be 1. To get a better estimate we smooth the periodograms and estimate the coherence as above but using the smoothed versions. You can get the function cohplot1 from the class home page. This will plot the coherence and 95 percent confidence bands. We went through the approximate asymptotic distribution of coherence and phase estimates in class. To be brief, we apply a variance stabilizing transform (arctanh) to the estimated coherence. The transformed estimate is then (approximately) asymptotically normally distributed, with variance gg/2 where gg = 2/v (v is the smoothing parameter). The phase is estimated similarly using smoothed periodograms. The function phaplot1, also on the class home page, will plot the phase spectrum for you with confidence bands. Plot the coherence and the phase spectrum for the time series e1 and e2 and comment.

    f<-spectrum(ts.union(e1,e2), spans=c(3,3))
    cohplot1(f)
    phaplot1(f)

Notice that the coherence is very close to 1 for all frequencies. The phase plot may look a little funny. I have restricted the plot to the limits $-\pi$ to $\pi$. Between the jump points you can see that the phase plot has a slope. Calculate what the slope is.

    plot(f$freq,f$phase)

Divide the slope you get by $2\pi$. What is the result? State why.

Let's use the coherence and phase plot on another data set. Simulate two data sets that are unrelated but have the same frequency component, phase shifted by some amount. Plot the coherence and phase spectrum. Where are the confidence bands narrow? What is the phase shift between the series for that frequency?

    t<-1:250
    y1<-ts(8*sin(2*pi*0.15*t)+rnorm(250))
    y2<-ts(8*sin(2*pi*0.15*t-1)+rnorm(250))
    plot(y1)
    plot(y2)
    f<-spectrum(ts.union(y2,y1),spans=c(3,3))
    cohplot1(f)
    phaplot1(f)
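Two points from this part can be checked numerically: that the unsmoothed coherence ratio is identically 1, and that the slope of the phase between the jump points recovers the delay between e1 and e2. The sketch below is one way to do it, not the code behind cohplot1 or phaplot1. The unwrapping step and the straight-line fit are choices made here for illustration, and the series are re-simulated so the block runs on its own.

```r
set.seed(3)
e  <- rnorm(300)
e1 <- ts(e[1:250])      # output: same values as e2, three steps later
e2 <- ts(e[4:253])      # input (leading series)

# (1) Raw periodograms: the coherence ratio is 1 at every frequency.
n   <- length(e1)
Jx  <- fft(e1 - mean(e1))
Jy  <- fft(e2 - mean(e2))
Ixy <- Jx * Conj(Jy) / (2 * pi * n)
Ixx <- Mod(Jx)^2 / (2 * pi * n)
Iyy <- Mod(Jy)^2 / (2 * pi * n)
# Frequency 0 is dropped: after removing the means both transforms are ~0 there.
raw_coh <- Mod(Ixy)[-1] / sqrt(Ixx[-1] * Iyy[-1])

# (2) Smoothed estimate: unwrap the phase, then fit a line against frequency.
f    <- spectrum(ts.union(e1, e2), spans = c(3, 3), plot = FALSE)
freq <- f$freq                                  # cycles per time unit
phi  <- f$phase[, 1]                            # radians, wrapped to (-pi, pi]
dphi <- (diff(phi) + pi) %% (2 * pi) - pi       # wrap increments into [-pi, pi)
phi_unwrapped <- phi[1] + c(0, cumsum(dphi))

slope <- unname(coef(lm(phi_unwrapped ~ freq))[2])
delay <- slope / (2 * pi)    # in time units; the sign depends on the order of the series
```

Because f$freq is in cycles per time unit, a pure delay of d time units gives a phase that is linear in frequency with slope of magnitude $2\pi d$, which is why dividing by $2\pi$ returns the delay.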
    ysa<-ts(ys[3:250])
    xsa1<-ts(xs[2:249])
    xsa2<-ts(xs[1:248])
    mod<-arima(ysa,order=c(1,0,0),xreg=cbind(xsa1,xsa2))
    tsdiag(mod)
    mod$coef

As you can see, I have re-aligned the series ys with the lags of xs it depends on. Check the values of the estimated coefficients. In general, if you want to fit an ARMAX model to your data you will construct a matrix of lagged values of the input signal. Let's say your study of the series and the acf, and possibly a fit of a multivariate AR, has led you to believe that this model might be appropriate:

$$Y(t) = a_1 Y(t-1) + e(t) + b_1 e(t-1) + c_0 X(t) + \dots + c_l X(t-l)$$

You would start by setting up the matrix xmat as follows:

    xmat<-cbind(xs[(l+1):length(xs)],xs[l:(length(xs)-1)],...,xs[1:(length(xs)-l)])

Now you can fit the model using arima as above:

    mod<-arima(ys[(l+1):length(ys)],order=c(p,0,q),xreg=xmat)

Of course, you can also take differences, do seasonal adjustment, include longer MA and AR components, etc.
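The lagged-input matrix can also be built without typing each column, for example with embed. The sketch below runs on simulated data, since the lab's xs and ys are not reproduced here. The coefficients 2 and -1, the AR(1) noise and the column names are all made up for illustration. Note that R's arima with xreg fits a regression with ARMA errors, which is the model used to simulate this example.

```r
set.seed(4)
n     <- 500
xs    <- rnorm(n)                                    # simulated input series
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n))

# Simulated output: depends on lags 1 and 2 of xs, plus AR(1) noise.
ys <- numeric(n)
ys[3:n] <- 2 * xs[2:(n - 1)] - 1 * xs[1:(n - 2)] + noise[3:n]

l    <- 2                                 # largest input lag to include
xmat <- embed(xs, l + 1)                  # each row holds xs[t], xs[t-1], ..., xs[t-l]
colnames(xmat) <- paste0("lag", 0:l)

# Drop the first l values of ys so that its rows line up with xmat.
mod <- arima(ys[(l + 1):n], order = c(1, 0, 0), xreg = xmat)
est <- round(mod$coef, 2)
```

The estimates in est for lag1 and lag2 should land near the simulated values, and the one for lag0 near zero, since ys was built without a lag-0 term.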
, where $R^2$ is the coherence and $L$ is the smoothing parameter of the spectra. As you can see, if the coherence is near 1 the variance of the estimate is small. The estimated phase of A is given by the phase of the cross-spectrum. The variance of the estimated phase of A is the same as the variance of log(Mod(A)).

Let's estimate some transfer functions. An MA(1) process can be obtained from an innovation process like this:

$$Y(t) = e(t) + \theta e(t-1)$$

The transfer function relating the series Y to e is

$$A(\lambda) = 1 + \theta e^{-i\lambda}$$

Simulate an innovation process e and the MA(1) process with parameter $\theta = 0.8$. Estimate the transfer function and compare to the true A as stated here. The function esttransfer(output,input) can be found on the section home page. It estimates the transfer function using the smoothed spectra and then plots the logarithm of the modulus of A, and the phase, with confidence bands.

    e<-ts(rnorm(251))
    xt<-ts(filter(e,filter=c(1,.8))[1:250])
    # align e with xt (the default two-sided filter is shifted one step forward)
    e<-ts(e[2:251])
    acf(ts.union(xt,e))
    f<-spectrum(ts.union(xt,e),spans=c(3,3))
    cohplot1(f)
    phaplot1(f)
    # true transfer function on the estimated frequency grid
    lam<-f$freq*2*pi
    AA<-1+0.8*exp(-1i*lam)
    mA<-esttransfer(xt,e)
    plot(f$freq,log(Mod(AA)))
    plot(f$freq,Arg(AA))

The coherence between the two series is near 1 for almost all frequencies, so the confidence band of the estimated transfer function is quite narrow.

Let's try another process. Simulate an AR(1) process with parameter $\phi = 0.8$. The transfer function is now

$$A(\lambda) = \frac{1}{1 - 0.8 e^{-i\lambda}}$$

Estimate the transfer function of the AR process as above and compare to the true A. Now change the sample size and/or the coefficient value of the AR process. What happens to the transfer function in these settings?

Now we will use the series e1 and e2 from above. e2 was the input series and e1 the output. Estimate the transfer function in this case. What do the gain and phase plots look like? How about the series y1 and y2? What do the plots look like? Comment. What are the gain and phase of the estimated transfer function for frequencies where the coherence is high? Explain, and make sure you get the right units. (Hint 1: $2\pi$, log. Hint 2: Use p<-locator() to get the coordinates of a point in a plot.) Verify these results with the theoretical transfer functions under the simulation models.
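If esttransfer is not at hand, the gain and phase of the MA(1) transfer function can be approximated from what spectrum already returns. The sketch below is not the esttransfer function and draws no confidence bands. It assumes the output series is placed first in the bivariate series, and it relies on the coh component being the squared coherency, so that the modulus of the cross-spectrum is sqrt(coh * f_yy * f_xx). The sample size and spans are arbitrary choices.

```r
set.seed(5)
n     <- 2000
theta <- 0.8
e <- rnorm(n + 1)
y <- e[-1] + theta * e[-(n + 1)]     # Y(t) = e(t) + theta * e(t - 1)
x <- e[-1]                           # input, aligned with the output

# Output first, input second: the phase is then that of f_yx.
f <- spectrum(ts(cbind(y, x)), spans = c(7, 7), plot = FALSE)
f_yy <- f$spec[, 1]
f_xx <- f$spec[, 2]
coh  <- f$coh[, 1]                   # squared coherency

gain_hat  <- sqrt(coh * f_yy / f_xx) # |f_yx| / f_xx
phase_hat <- f$phase[, 1]            # radians

# True transfer function on the same frequency grid.
lam    <- 2 * pi * f$freq
A_true <- 1 + theta * exp(-1i * lam)

log_gain_err <- log(gain_hat) - log(Mod(A_true))
phase_err    <- phase_hat - Arg(A_true)
```

Plotting log(gain_hat) and phase_hat against f$freq, with log(Mod(A_true)) and Arg(A_true) overlaid, gives the same kind of comparison the exercise asks for. Replacing A_true with 1/(1 - 0.8*exp(-1i*lam)) and simulating an AR(1) output adapts the sketch to the second process.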