Kalman Filtering: Part I. Instructor: Istvan Szunyogh. Class: February 23, 2007
Recommended Readings

Geir Evensen, 2006: Data Assimilation: The Ensemble Kalman Filter. Springer, 280 pages. A nice handbook that also provides a good summary of the history.

Cautionary notes:
- There have been many important developments since the book was completed (most likely in early 2005).
- When considering the computational cost of the alternative computational algorithms, the book does not really consider that the algorithms are usually implemented on parallel computers (as is the case for an operational NWP model).
- A little too much credit is claimed by the author; this limits the value of the book only as a source on the history.

Brian Hunt, Eric Kostelich and Istvan Szunyogh, 2007: Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Kalman Filter. Physica D. Available from the Weather-Chaos web page.
Mathematical Formulation, following Brian Hunt (the most elegant formulation I am aware of)

The Analysis Problem

Consider a system governed by the ordinary differential equation

$$\frac{dx}{dt} = F(t, x), \qquad (1)$$

where $x$ is an $m$-dimensional vector representing the state of the system at a given time. Suppose we are given a set of (noisy) observations of the system made at various times. We want to determine which trajectory $\{x(t)\}$ of (1) best fits the observations. For any given $t$, this trajectory gives an estimate of the system state at time $t$.
Notation

Let us assume that the observations are the result of measuring quantities that depend on the system state in a known way, with Gaussian measurement errors.
An observation at time $t_j$ is a triple $(y_j^o, H_j, R_j)$, where $y_j^o$ is a vector of observed values, and $H_j$ and $R_j$ describe the relationship between $y_j^o$ and $x(t_j)$:

$$y_j^o = H_j(x(t_j)) + \varepsilon_j,$$

where $\varepsilon_j$ is a Gaussian random variable with mean 0 and covariance matrix $R_j$.

Here, a perfect model is assumed: the observations are based on a trajectory of (1), and our problem is simply to infer which trajectory produced the observations. In a real application, the observations come from a trajectory of the physical system for which (1) is only a model.
The maximum likelihood estimate

We seek the trajectory that best fits the observations at times $t_1 < t_2 < \dots < t_n$. The likelihood of a trajectory $x(t)$ is proportional to

$$\prod_{j=1}^{n} \exp\left(-[y_j^o - H_j(x(t_j))]^T R_j^{-1} [y_j^o - H_j(x(t_j))]\right),$$

since the observational errors are normally distributed and are assumed to be independent at the different observation times. The most likely trajectory is the one that maximizes this expression. Equivalently, the most likely trajectory is the one that minimizes the cost function

$$J^o(\{x(t)\}) = \sum_{j=1}^{n} [y_j^o - H_j(x(t_j))]^T R_j^{-1} [y_j^o - H_j(x(t_j))]. \qquad (2)$$

Thus, the most likely trajectory is also the one that best fits the observations in a least-squares sense.
Replacing the Trajectory with the State at a Particular Time

Equation (2) expresses the cost $J^o$ as a function of the trajectory $\{x(t)\}$. To minimize the cost, it is more convenient to write it as a function of the system state at a particular time $t$. Let $M_{t,t'}$ be the map that propagates a solution of (1) from time $t$ to time $t'$. Then

$$J_t^o(x) = \sum_{j=1}^{n} [y_j^o - H_j(M_{t,t_j}(x))]^T R_j^{-1} [y_j^o - H_j(M_{t,t_j}(x))] \qquad (3)$$

expresses the cost in terms of the system state $x$ at time $t$. To estimate the state at time $t$, we attempt to minimize $J_t^o$.
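To make the definition concrete, here is a minimal numerical sketch of evaluating $J_t^o$ for a linear model, where $M_{t,t_j}$ is a repeated application of a one-step model matrix. The matrix A, the operators H and R, and all numbers are illustrative assumptions, not part of the lecture.

import numpy as np

rng = np.random.default_rng(0)
m = 2                                   # state dimension
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # one-step linear model matrix (assumed)
H = np.eye(m)                           # observe the full state
R = 0.01 * np.eye(m)                    # observation-error covariance
Rinv = np.linalg.inv(R)

def cost(x0, obs):
    """J_t^o(x0): sum of [y_j - H M x0]^T R^{-1} [y_j - H M x0] over observation times."""
    J, x = 0.0, np.array(x0, dtype=float)
    for y in obs:                       # obs[j] is taken one model step after the previous
        x = A @ x                       # propagate the state: M_{t,t_j} x0
        d = y - H @ x                   # observation-minus-model departure
        J += d @ Rinv @ d
    return J

truth = np.array([1.0, 0.5])
obs, x = [], truth.copy()
for _ in range(5):                      # synthetic observations of a true trajectory
    x = A @ x
    obs.append(H @ x + rng.multivariate_normal(np.zeros(m), R))

print(cost(truth, obs))                 # small: the truth fits the observations well
print(cost(truth + 1.0, obs))           # large: a perturbed state fits poorly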
Remarks

In practice, the observations do not all have to be collected at $t_n$. In a typical implementation, at $t_n$ we assimilate all observations that were collected at times $t$ in the window $t_n - \Delta t/2 < t < t_n + \Delta t/2$, where $\Delta t = t_j - t_{j-1}$, $j = 2, \dots, n$.

For a nonlinear model, there is no guarantee that a unique minimum exists. Even if a minimum exists, evaluating $J_t^o$ is apt to be computationally expensive, and minimizing it may be impractical. But if both the model and the observation operators $H_j$ are linear, the minimization is quite tractable, because $J_t^o$ is then quadratic. Furthermore, one can compute the minimum by an iterative method, namely the Kalman Filter (Kalman 1960; Kalman and Bucy 1961).
Linear Scenario: the Kalman Filter

In the linear scenario, we can write $M_{t,t'}(x) = M_{t,t'}\, x$ and $H_j(x) = H_j\, x$, where $M_{t,t'}$ and $H_j$ are matrices. We now describe how to perform a forecast step from time $t_{n-1}$ to time $t_n$, followed by an analysis step at time $t_n$, in such a way that if we start with the most likely system state given the observations up to time $t_{n-1}$, we end up with the most likely state given the observations up to time $t_n$.
The estimate of the state and the uncertainty at $t_{n-1}$

Suppose the analysis at time $t_{n-1}$ has produced a state estimate $x^a_{n-1}$ and an associated covariance matrix $P^a_{n-1}$. In probabilistic terms, $x^a_{n-1}$ and $P^a_{n-1}$ represent the mean and covariance of a Gaussian probability distribution that represents the relative likelihood of the possible system states given the observations from time $t_1$ to $t_{n-1}$. Algebraically, what we assume is that for some constant $c$,

$$\sum_{j=1}^{n-1} [y_j^o - H_j M_{t_{n-1},t_j}\, x]^T R_j^{-1} [y_j^o - H_j M_{t_{n-1},t_j}\, x] = [x - x^a_{n-1}]^T (P^a_{n-1})^{-1} [x - x^a_{n-1}] + c. \qquad (4)$$

In other words, the analysis at time $t_{n-1}$ has completed the square to express the part of the quadratic cost function $J^o_{t_{n-1}}$ that depends on the observations up to that time as a single quadratic form plus a constant. The Kalman Filter determines $x^a_n$ and $P^a_n$ such that an analogous equation holds at time $t_n$.
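The completing-the-square step can be checked numerically. The scalar sketch below is my illustration, not from the lecture: the products $h_j m_j$, the observations $y_j$, and the variances $r_j$ are arbitrary assumptions. It verifies that the sum of observation quadratics equals a single quadratic form plus a constant, for every $x$.

import numpy as np

rng = np.random.default_rng(1)
h_m = rng.normal(size=4)         # scalar products h_j * m_j mapping x to each y_j
y = rng.normal(size=4)           # scalar observations
r = rng.uniform(0.5, 2.0, 4)     # scalar observation-error variances

def lhs(x):                      # left-hand side of (4), scalar case
    return np.sum((y - h_m * x) ** 2 / r)

# Complete the square: the x^2 and x coefficients give (P^a)^{-1} and x^a.
Pa_inv = np.sum(h_m ** 2 / r)
xa = np.sum(h_m * y / r) / Pa_inv
c = lhs(xa)                      # the leftover constant

for x in rng.normal(size=3):     # the identity in (4) holds for every x
    assert np.isclose(lhs(x), (x - xa) ** 2 * Pa_inv + c)
print(xa, 1.0 / Pa_inv)          # analysis mean and variance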
The Kalman Filter I

We propagate the analysis state estimate $x^a_{n-1}$ and its covariance matrix $P^a_{n-1}$ using the forecast model to produce a background state estimate $x^b_n$ and covariance $P^b_n$ for the next analysis:

$$x^b_n = M_{t_{n-1},t_n}\, x^a_{n-1}, \qquad (5)$$

$$P^b_n = M_{t_{n-1},t_n}\, P^a_{n-1}\, M^T_{t_{n-1},t_n}. \qquad (6)$$
Under a linear model, a Gaussian distribution of states at one time propagates to a Gaussian distribution at any other time, and the equations above describe how the model propagates the mean and covariance of such a distribution.
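A minimal sketch of this forecast step, under the assumption that the linear model matrix $M$ is available explicitly; the function and variable names are mine.

import numpy as np

def forecast(xa, Pa, M):
    """One forecast step: propagate the analysis mean and covariance."""
    xb = M @ xa          # (5): background mean
    Pb = M @ Pa @ M.T    # (6): background covariance
    return xb, Pb

M = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed model matrix
xb, Pb = forecast(np.array([1.0, 0.5]), 0.1 * np.eye(2), M)
print(xb, Pb, sep="\n")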
The Kalman Filter II

Next, we want to rewrite the cost function $J^o_{t_n}$ given by (3) in terms of the background state estimate and the observations at time $t_n$. (This step is often formulated as applying Bayes' Rule to the corresponding probability density functions.) In (4), $x$ represents a system state at time $t_{n-1}$. In our expression for $J^o_{t_n}$, we want $x$ to represent a system state at time $t_n$. Using (5) and (6) yields that part of the cost function at $t_n$ that reflects the effect of the observations collected up to $t_{n-1}$:

$$\sum_{j=1}^{n-1} [y_j^o - H_j M_{t_n,t_j}\, x]^T R_j^{-1} [y_j^o - H_j M_{t_n,t_j}\, x] = [x - x^b_n]^T (P^b_n)^{-1} [x - x^b_n] + c.$$

It follows that the total cost function at $t_n$ is

$$J^o_{t_n}(x) = [x - x^b_n]^T (P^b_n)^{-1} [x - x^b_n] + [y_n^o - H_n x]^T R_n^{-1} [y_n^o - H_n x] + c, \qquad (7)$$

where the second term reflects the effect of the observations collected at $t_n$.
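For concreteness, this is how the total cost (7), minus the constant $c$, might be evaluated; the helper name J_total and its arguments are my illustration.

import numpy as np

def J_total(x, xb, Pb, y, H, R):
    """Background term plus observation term of (7), without the constant c."""
    db = x - xb                  # departure from the background
    do = y - H @ x               # departure from the observations
    return db @ np.linalg.solve(Pb, db) + do @ np.linalg.solve(R, do)

The analysis step described next finds the minimizer of this quadratic in closed form.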
The Kalman Filter III

To complete the data assimilation cycle, we determine the state estimate $x^a_n$ and its covariance $P^a_n$ so that

$$J^o_{t_n}(x) = [x - x^a_n]^T (P^a_n)^{-1} [x - x^a_n] + c'$$

for some constant $c'$. Equating the terms of degree 2, we get

$$(P^a_n)^{-1} = (P^b_n)^{-1} + H^T_n R^{-1}_n H_n. \qquad (8)$$

Equating the terms of degree 1, we get

$$x^a_n = P^a_n \left[ (P^b_n)^{-1} x^b_n + H^T_n R^{-1}_n y^o_n \right]. \qquad (9)$$

The last equation in some sense (consider, for example, the case where $H_n$ is the identity matrix) expresses the analysis state estimate as a weighted average of the background state estimate and the observations, weighted according to the inverse covariance of each.
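A direct sketch of (8) and (9), affordable only when the explicit inverses are cheap (small state dimension); all names and numbers are assumptions for illustration.

import numpy as np

def analysis(xb, Pb, y, H, R):
    Rinv = np.linalg.inv(R)
    Pa = np.linalg.inv(np.linalg.inv(Pb) + H.T @ Rinv @ H)  # (8)
    xa = Pa @ (np.linalg.solve(Pb, xb) + H.T @ Rinv @ y)    # (9)
    return xa, Pa

xb, Pb = np.array([1.0, 0.5]), 0.2 * np.eye(2)
H = np.array([[1.0, 0.0]])        # observe only the first state component
R, y = np.array([[0.01]]), np.array([1.2])
xa, Pa = analysis(xb, Pb, y, H, R)
print(xa)                         # the observed component is pulled strongly toward y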
The Kalman Filter IV

Equations (8) and (9) can be written in many different but equivalent forms. Using (8) to eliminate $(P^b_n)^{-1}$ from (9) yields

$$x^a_n = x^b_n + P^a_n H^T_n R^{-1}_n (y_n^o - H_n x^b_n) = x^b_n + K_n (y_n^o - H_n x^b_n). \qquad (10)$$

The matrix $K_n = P^a_n H^T_n R^{-1}_n$ is called the Kalman gain. It multiplies the difference between the observations at time $t_n$ and the values predicted by the background state estimate to yield the increment between the background and analysis state estimates. Rearranging (8) yields

$$P^a_n = (I + P^b_n H^T_n R^{-1}_n H_n)^{-1} P^b_n = (I - K_n H_n) P^b_n. \qquad (11)$$

This expression is better than the previous one from a practical point of view, since it does not require inverting $P^b_n$.
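The same analysis in the gain form: following the first form of (11), $P^a_n$ is obtained by solving a linear system rather than inverting $P^b_n$, and the assert checks the second form of (11). Again a toy-dimension sketch with my own names and numbers.

import numpy as np

def analysis_gain(xb, Pb, y, H, R):
    Rinv = np.linalg.inv(R)
    I = np.eye(len(xb))
    Pa = np.linalg.solve(I + Pb @ H.T @ Rinv @ H, Pb)  # (11), first form
    K = Pa @ H.T @ Rinv                                # Kalman gain (10)
    xa = xb + K @ (y - H @ xb)                         # (10)
    return xa, Pa, K

xb, Pb = np.array([1.0, 0.5]), 0.2 * np.eye(2)
H, R, y = np.array([[1.0, 0.0]]), np.array([[0.01]]), np.array([1.2])
xa, Pa, K = analysis_gain(xb, Pb, y, H, R)
assert np.allclose(Pa, (np.eye(2) - K @ H) @ Pb)       # (11), second form
print(xa)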
Many approaches to data assimilation for nonlinear problems are based on the Kalman Filter, or at least on minimizing a cost function similar to (7). At a minimum, a nonlinear model forces a change in the forecast equations (5) and (6), while nonlinear observation operators $H_n$ force a change in the analysis equations (10) and (11).

The Extended Kalman Filter (see, for example, Jazwinski 1970) computes $x^b_n = M_{t_{n-1},t_n}(x^a_{n-1})$ using the nonlinear model, but computes $P^b_n$ using the linearization $\mathbf{M}_{t_{n-1},t_n}$ of $M_{t_{n-1},t_n}$ around $x^a_{n-1}$. The analysis then uses the linearization $\mathbf{H}_n$ of $H_n$ around $x^b_n$.
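A rough sketch of one Extended Kalman Filter forecast step, under two assumptions of mine: the nonlinear model is available as a plain function, and its linearization is approximated by finite differences (the lecture does not prescribe how $\mathbf{M}$ is obtained). All names are illustrative.

import numpy as np

def ekf_forecast(xa, Pa, model, eps=1e-6):
    """Propagate the mean with the nonlinear model and the covariance
    with a finite-difference approximation of the model's linearization."""
    xb = model(xa)                           # nonlinear propagation of the mean
    m = len(xa)
    M = np.empty((m, m))
    for i in range(m):
        dx = np.zeros(m)
        dx[i] = eps
        M[:, i] = (model(xa + dx) - xb) / eps  # i-th column of the Jacobian
    Pb = M @ Pa @ M.T                        # covariance propagated linearly
    return xb, Pb

model = lambda x: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * np.sin(x[0])])
xb, Pb = ekf_forecast(np.array([1.0, 0.5]), 0.1 * np.eye(2), model)
print(xb, Pb, sep="\n")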
Difficulties with the Implementation of the Extended Kalman Filter

It is not easy to linearize the dynamics for a complex, high-dimensional model, such as a global weather prediction model. The number of model variables $m$ is several million, and as a result the $m \times m$ matrix inverse required by the analysis cannot be performed in a reasonable amount of time. The use of the linear evolution equations can lead to an unbounded linear instability (see chapter 4.2.3 in Evensen 2006).
Practical Implementations at the NWP Centers

Approaches used in operational weather forecasting generally eliminate, for pragmatic reasons, the time iteration of the Kalman Filter.

NCEP/NWS: data assimilation is done every 6 hours with a 3D-VAR method, in which the background covariance $P^b_n$ is replaced by a constant matrix $B$. The 3D-VAR cost function also includes a nonlinear observation operator $H_n$, and is minimized numerically to produce the analysis state estimate $x^a_n$.

The 4D-VAR method (e.g., Le Dimet and Talagrand 1986; Talagrand and Courtier 1987) used by the European Centre for Medium-Range Weather Forecasts uses a cost function that includes a constant-covariance background term as in 3D-VAR, together with a sum like (2) accounting for the observations collected over a 12-hour time span.
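In the spirit of the 3D-VAR description above, here is a toy sketch: a fixed background covariance $B$, a nonlinear observation operator, and numerical minimization of the cost. Everything here, including the quadratic observation operator and all numbers, is an illustrative assumption rather than any center's actual configuration.

import numpy as np
from scipy.optimize import minimize

B = 0.2 * np.eye(2)                    # constant background-error covariance
R = np.array([[0.01]])
Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

def Hop(x):                            # an assumed nonlinear observation operator
    return np.array([x[0] ** 2])

def J(x, xb, y):                       # 3D-VAR cost: background term + observation term
    db, do = x - xb, y - Hop(x)
    return db @ Binv @ db + do @ Rinv @ do

xb = np.array([1.0, 0.5])              # background state estimate
y = np.array([1.3])                    # observed value
xa = minimize(J, xb, args=(xb, y)).x   # analysis = numerical minimizer of the cost
print(xa)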