
Data Filtering, Smoothing, and Prediction

Kalman Filter + Particle Filter


Relation to This Course
 The famous Kalman filter (KF) is based on parametric estimation
 The more advanced particle filter (PF) is based on density estimation (non-parametric estimation)
 Both use the Bayesian framework we just discussed
Problem Statement
 A recurring theme in many online analysis and prediction tasks
 How can information
 from different sources
 with different accuracy (corrupted by noise)
 and possibly time varying
be integrated?
 Estimating the position of a line from multiple sample points
 Estimating shape using information from multiple sensors
 Estimating a moving robot's location from sensor data and dead reckoning

Computer Vision and Image Analysis


Complication
 There are many more examples where
 not all data (observations) are gathered at the same time
 not all data (observations) are equally reliable
 the estimated quantities (state) change over time
 the state distribution may not be single-peaked (the case for the PF)
Progression - KF
 We will progress through a number of scenarios
 Static state, observation data available all at
once, of the same quality
 Static state, observation data available all at
once, of different quality
 Static state, observation data not all at once, of
the same or different quality
 Dynamic state, observation data not all at once,
of the same or different quality
General Principles
 The Bayesian principle underlies all of the analysis
 Two things to remember (because we will use them over and over again)
 Data should be trusted based on their expected accuracy
 Weighted sum based on covariance
 States should be trusted based on their ability to explain the sensor observations
 Covariance can change
Simplest Case
 Two (or multiple) measurements with the
same or different uncertainty
 States are directly measured

Batch:
$$\hat{x} = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} x_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} x_2$$

Iterative (gain $\times$ innovation):
$$\hat{x} = x_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} (x_2 - x_1)$$

Equal uncertainty ($\sigma^2 = \sigma_1^2 = \sigma_2^2$): $\hat{x} = \frac{x_1 + x_2}{2}$

Resulting uncertainty:
$$\frac{1}{\sigma^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}, \qquad \sigma^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2} \le \min(\sigma_1^2, \sigma_2^2)$$
Some Important Intuition
 Information is good
 Variance will always decrease
 All information can and should be used
 In the worst case, totally uncertain information is simply ignored (given zero weight)
 Information integration can be incremental
 In terms of innovation
 Properly weighted innovation
 Not all data at once, no need to save all past data
Linear Least Squares
 The second simplest of all formulations
 States (X) are not directly measured
 Observations (B) or measurements relate to the state linearly
 Observations are equally reliable
 gathered at the same time

$$A_{m \times n} X_{n \times 1} = B_{m \times 1} \;\Rightarrow\; A^T_{n \times m} A_{m \times n} X_{n \times 1} = A^T_{n \times m} B_{m \times 1}$$
$$X_{n \times 1} = (A^T_{n \times m} A_{m \times n})^{-1} A^T_{n \times m} B_{m \times 1}$$

m: number of constraints (observations)
n: number of parameters (states)
if m < n: multiple solutions
if m = n: exact solution
if m > n: least-squares solution of $B_{m \times 1} = A_{m \times n} X_{n \times 1} + e_{m \times 1}$
e: noise (assumed white and Gaussian)
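As an illustrative sketch (not from the slides), the normal-equation solution can be computed with NumPy; the matrices below are made up for the example.

    import numpy as np

    # Over-determined system: m = 4 observations, n = 2 states.
    A = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0],
                  [1.0, -1.0]])
    B = np.array([1.1, 2.0, 3.2, -0.9])

    # Normal equations: X = (A^T A)^{-1} A^T B
    X = np.linalg.solve(A.T @ A, A.T @ B)

    # np.linalg.lstsq gives the same least-squares solution, more stably.
    X_check, *_ = np.linalg.lstsq(A, B, rcond=None)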
Weighted Least Square
 Slightly more complicated
 data are not equally reliable
 gathered at the same time

$$W_{m \times m} A_{m \times n} X_{n \times 1} = W_{m \times m} B_{m \times 1}$$
$$(W A)^T W A \, X = (W A)^T W B$$
$$\begin{aligned}
X_{n \times 1} &= \big((W A)^T W A\big)^{-1} (W A)^T W B \\
&= (A^T W^T W A)^{-1} A^T W^T W B \\
&= (A^T C A)^{-1} A^T C B \qquad (C = W^T W) \\
&= L B
\end{aligned}$$
$$\hat{X} = L B$$

Compare with the unweighted case (before): $X_{n \times 1} = (A^T_{n \times m} A_{m \times n})^{-1} A^T_{n \times m} B_{m \times 1}$
Weighted Least Square (cont.)
 Weights (W) do not appear directly but only indirectly through C
 What is the right choice of weights?
 It can be shown that the right weights are inversely proportional to the standard deviation in the scalar case and to the covariance in the vector case
 This makes sense: the larger the uncertainty, the less you should trust the data
BLUE
(best linear unbiased estimator)
 C = V⁻¹ gives the BLUE (V: covariance of the "noise" in the measurements)
 The matrix operator L is certainly linear
 Unbiased means that the expected error is neither positive nor negative: $E(X - \hat{X}) = 0$
 Equivalently, an unbiased estimator L must be a left inverse of A:
$$E(X - \hat{X}) = E(X - LB) = E(X - LAX - Le) = E[(I - LA)X] = 0 \;\Rightarrow\; I = LA$$
 Non-square matrices can have multiple left inverses
Proof of Optimality
 Unbiased: all unbiased operators satisfy $LA = I$; a particular one is
$$L_o = (A^T V^{-1} A)^{-1} A^T V^{-1}, \qquad L_o A = (A^T V^{-1} A)^{-1} A^T V^{-1} A = I$$
 Optimality: for any unbiased L,
$$X - \hat{X} = X - LB = X - L(AX + e) = (I - LA)X - Le = -Le$$
$$P = E[(X - \hat{X})(X - \hat{X})^T] = E[Le(Le)^T] = L E[ee^T] L^T = LVL^T$$
Write $L = L_o + (L - L_o)$:
$$P = \big(L_o + (L - L_o)\big) V \big(L_o + (L - L_o)\big)^T = L_o V L_o^T + (L - L_o) V L_o^T + L_o V (L - L_o)^T + (L - L_o) V (L - L_o)^T$$
The cross terms vanish:
$$\begin{aligned}
(L - L_o) V L_o^T &= (L - L_o) V \big[(A^T V^{-1} A)^{-1} A^T V^{-1}\big]^T \\
&= (L - L_o) V (V^{-1})^T A \big[(A^T V^{-1} A)^{-1}\big]^T \\
&= (L - L_o) V V^{-1} A (A^T V^{-1} A)^{-1} \\
&= (L - L_o) A (A^T V^{-1} A)^{-1} = (LA - L_o A)(A^T V^{-1} A)^{-1} \\
&= (I - I)(A^T V^{-1} A)^{-1} = 0
\end{aligned}$$
So $P = L_o V L_o^T + (L - L_o) V (L - L_o)^T \succeq L_o V L_o^T$, and $L_o$ is optimal.
Final Equations

xˆ  Lo B  A V A T 1
1
A TV 1B

ˆ )( X  X
P  E[( X  X ˆ )T ]  E[( X  LB )( X  LB )T ]
 E[( X  L o AX  L o e' )( X  L o AX  L oe' )T ]
 E[L oe' (L o e' )T ]  L o E[e' e'T ]L o  L o VL o
T T

 ( A T V 1A ) 1 A T V 1V[( A T V 1A) 1 A T V 1 ]T


 ( A T V 1A ) 1 A T V 1VV 1A ( A T V 1A ) 1
 ( A T V 1A ) 1 A T V 1A ( A T V 1A ) 1
 ( A T V 1A ) 1
Recursive Least Squares
 More complicated
 data are not equally reliable (the same reliability is a special case)
 not gathered at the same time
 but all for the same state
 How can we build estimates recursively, without recomputing everything from scratch?

$$X_0 = (A_0^T V_0^{-1} A_0)^{-1} A_0^T V_0^{-1} B_0 = P_0 A_0^T V_0^{-1} B_0, \qquad P_0^{-1} = A_0^T V_0^{-1} A_0$$

If the noise is uncorrelated over time: $V = \begin{bmatrix} V_0 & 0 \\ 0 & V_1 \end{bmatrix}$

$$P_1^{-1} = \begin{bmatrix} A_0 \\ A_1 \end{bmatrix}^T \begin{bmatrix} V_0 & 0 \\ 0 & V_1 \end{bmatrix}^{-1} \begin{bmatrix} A_0 \\ A_1 \end{bmatrix} = A_0^T V_0^{-1} A_0 + A_1^T V_1^{-1} A_1 = P_0^{-1} + A_1^T V_1^{-1} A_1$$

$$\begin{aligned}
X_1 &= P_1 \begin{bmatrix} A_0 \\ A_1 \end{bmatrix}^T \begin{bmatrix} V_0^{-1} B_0 \\ V_1^{-1} B_1 \end{bmatrix} = P_1 (A_0^T V_0^{-1} B_0 + A_1^T V_1^{-1} B_1) \\
&= P_1 (P_0^{-1} X_0 + A_1^T V_1^{-1} B_1) && \text{using } X_0 = P_0 A_0^T V_0^{-1} B_0 \\
&= P_1 (P_1^{-1} X_0 - A_1^T V_1^{-1} A_1 X_0 + A_1^T V_1^{-1} B_1) && \text{using } P_1^{-1} = P_0^{-1} + A_1^T V_1^{-1} A_1 \\
&= X_0 + P_1 A_1^T V_1^{-1} (B_1 - A_1 X_0)
\end{aligned}$$

gain $\times$ innovation
Final Equations

1 1 1
 Pi 1  A i Vi A i
T
Pi
1
X i  X i 1  Pi A i Vi (B i  A i X i 1 )
T
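For concreteness, here is a small Python sketch of this recursive update (an illustrative implementation, not from the slides); it works with the inverse covariance directly.

    import numpy as np

    def rls_update(X_prev, P_prev, A_i, B_i, V_i):
        """One recursive least-squares step: fold in the new observations B_i = A_i X + e_i."""
        Vi_inv = np.linalg.inv(V_i)
        P_inv = np.linalg.inv(P_prev) + A_i.T @ Vi_inv @ A_i   # P_i^{-1} = P_{i-1}^{-1} + A_i^T V_i^{-1} A_i
        P_i = np.linalg.inv(P_inv)
        innovation = B_i - A_i @ X_prev
        X_i = X_prev + P_i @ A_i.T @ Vi_inv @ innovation        # gain * innovation
        return X_i, P_i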
Dynamic States
 The state evolves over time
 Two mechanisms
 Observation: noise white and Gaussian
 State propagation: noise white and Gaussian

$$A_i X_i = B_i \;\rightarrow\; A_i X_i + e_1 = B_i$$
$$F_i X_i = X_{i+1} \;\rightarrow\; F_i X_i + e_2 = X_{i+1}$$

Stacking all constraints (shown here up to $X_2$), with the state-propagation rows optionally weighted by a constant $c$:

$$\begin{bmatrix} A_0 & & \\ F_0 & -I & \\ & A_1 & \\ & F_1 & -I \\ & & A_2 \end{bmatrix} \begin{bmatrix} X_0 \\ X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} B_0 \\ 0 \\ B_1 \\ 0 \\ B_2 \end{bmatrix}
\qquad
\begin{bmatrix} A_0 & & \\ cF_0 & -cI & \\ & A_1 & \\ & cF_1 & -cI \\ & & A_2 \end{bmatrix} \begin{bmatrix} X_0 \\ X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} B_0 \\ 0 \\ B_1 \\ 0 \\ B_2 \end{bmatrix}$$
Dynamic States
 Each time instance
 Add one column of unknowns (x_i)
 Add one row of constraints A_i x_i = B_i
 Solution
 Gauss said least squares
 Kalman said recursive
 Kalman wins
 Do remember that x_0, x_1, x_2, etc. are all affected by new data b_2
 x_0 and x_1 given b_0, b_1, b_2: a smoothing problem
 x_2 given b_0, b_1, b_2: a filtering problem
Kalman’s Iterative Formulation
 To understand it, you actually need to remember just two things
 Rule 1: Linear operations on Gaussian random variables remain Gaussian
 Rule 2: Linear combinations of jointly Gaussian random variables are also Gaussian

$$Y = AX: \qquad m_y = A m_x, \qquad P_{yy} = A P_{xx} A^T$$
$$Z = AX + BY + C: \qquad m_z = A m_x + B m_y + C, \qquad P_{zz} = A P_{xx} A^T + A P_{xy} B^T + B P_{yx} A^T + B P_{yy} B^T$$

X: states
Y: observations
Z: prediction based on states + observations
A, B, C: linear prediction mechanism (from X, Y to Z)
P: covariance matrix
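A quick NumPy sketch of Rule 1 and Rule 2 (illustrative only; the numbers are invented):

    import numpy as np

    # Rule 1: Y = A X with X ~ N(m_x, P_xx)
    m_x = np.array([1.0, 2.0])
    P_xx = np.array([[0.5, 0.1],
                     [0.1, 0.3]])
    A = np.array([[2.0, 0.0],
                  [1.0, 1.0]])
    m_y = A @ m_x
    P_yy = A @ P_xx @ A.T

    # Rule 2: Z = A X + B Y + C for jointly Gaussian X, Y (here P_xy = P_xx A^T because Y = A X)
    B = np.eye(2)
    C = np.array([0.0, -1.0])
    P_xy = P_xx @ A.T
    m_z = A @ m_x + B @ m_y + C
    P_zz = A @ P_xx @ A.T + A @ P_xy @ B.T + B @ P_xy.T @ A.T + B @ P_yy @ B.T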
More Rules
 Rule 3: Any portion of a Gaussian random vector is still Gaussian

$$Z = \begin{bmatrix} X \\ Y \end{bmatrix}, \qquad m_z = \begin{bmatrix} m_x \\ m_y \end{bmatrix}, \qquad P_{zz} = \begin{bmatrix} P_{xx} & P_{xy} \\ P_{yx} & P_{yy} \end{bmatrix}$$
Intuition
 The initial state estimate is Gaussian
 The state propagation mechanism is linear
 Propagation of the state over time is corrupted by Gaussian noise
 The sensor measurement is linearly related to the state
 The sensor measurement is also corrupted by Gaussian noise
 The updated state estimate is therefore again Gaussian
Kalman Filter Properties
 For a linear system with white Gaussian errors, the Kalman filter is the "best" estimate based on all previous measurements
 For non-linear systems, optimality is 'qualified' (EKF, SKF, etc.)
 It doesn't need to store all previous measurements and reprocess all data at each time step
Graphic Illustration
 When the noise is white and uncorrelated
 Starting out as a Gaussian process, the evolution will stay a Gaussian process

$$\dot{X} = FX + BU + Gw$$
$$Z(t_i) = H(t_i) X(t_i) + v(t_i)$$

[Figure: timeline showing the state estimate $\hat{X}(t_{i-1}^+)$ propagated to $\hat{X}(t_i^-)$ between measurement times, then updated to $\hat{X}(t_i^+)$ at measurement time $t_i$]
Math Details
 If the Gaussian assumption holds, all we need to derive are the mechanisms for propagating the mean and variance, using the now familiar update equation
 New = old + gain * innovation
 Goal: determine the right gain expression

$$\hat{X}_i^+ = \hat{X}_i^- + K_i (Z_i - H_i \hat{X}_i^-)$$
Starting Condition

X i 1  Φi X i  w i
z i  Hi X i  v i
Qi i j
E (w i w )  
T

i  j
j
0
R i i j
E (v i v )  
T

i  j
j
0
E (w i vTj )  0
State Propagation

ˆ Φ X
X ˆ
i 1 i i

 ˆ  )(X   X
Pi1  E (X i 1  X i 1 i 1
ˆ  )T
i 1


 E (Φi X i  w i  Φi Xˆ  )(Φ X   w  Φ X
i i i i i
ˆ  )T
i 

 E (Φ ( X   X
i
ˆ  )  w )(Φ (X   X
i i i
ˆ  )  w )T
i i i i 
 E Φ (X
i

i
ˆ  )(X   X
X i i i i 
ˆ  )T ΦT  E (w wT )
i j

 Φi Pi ΦTi  Qi
State Update

$$\hat{X}_{i+1}^+ = \hat{X}_{i+1}^- + K_{i+1} (Z_{i+1} - H_{i+1} \hat{X}_{i+1}^-)$$
$$K_{i+1} = P_{i+1}^- H_{i+1}^T (H_{i+1} P_{i+1}^- H_{i+1}^T + R_{i+1})^{-1}$$
$$P_{i+1}^+ = (I - K_{i+1} H_{i+1}) P_{i+1}^-$$
Conceptual Overview

 Lost on a 1-dimensional line (imagine that you are guessing your position by looking at the stars using a sextant)
 Position: y(t)
 Assume Gaussian-distributed measurements
Conceptual Overview
[Figure: Gaussian density over position; state space = position, measurement = position]

 Sextant measurement at $t_1$: mean = $z_1$ and variance = $\sigma_{z_1}^2$ (the sextant is not perfect)
 Optimal estimate of position: $\hat{y}(t_1) = z_1$
 Variance of error in estimate: $\sigma_x^2(t_1) = \sigma_{z_1}^2$
 Boat in the same position at time $t_2$: predicted position is $z_1$
Conceptual Overview
[Figure: two Gaussians over position - the prediction $\hat{y}^-(t_2)$ (state, by looking at the stars at $t_2$) and the measurement $z(t_2)$ using GPS]

 So we have the prediction $\hat{y}^-(t_2)$
 GPS measurement at $t_2$: mean = $z_2$ and variance = $\sigma_{z_2}^2$
 Need to correct the sextant-based prediction using the GPS measurement to get $\hat{y}(t_2)$
 Closer to the more trusted measurement: should we do linear interpolation?
Conceptual Overview
[Figure: the prediction $\hat{y}^-(t_2)$, the measurement $z(t_2)$, and the corrected optimal estimate $\hat{y}(t_2)$ lying between them with a narrower Gaussian]

 The Kalman filter helps you fuse the measurement and the prediction on the basis of how much you trust each (I would trust the GPS more than the sextant)
 The corrected mean is the new optimal estimate of position (basically you've 'updated' the sextant-based prediction using the GPS)
 The new variance is smaller than either of the previous two variances
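As a toy illustration of this 1-D fusion (the numbers are invented for the example), the scalar Kalman update reduces to the two-measurement formula from earlier:

    # Sextant-based prediction at t2 and GPS measurement at t2 (made-up values).
    y_pred, var_pred = 50.0, 9.0      # prediction: less trusted (larger variance)
    z_gps, var_gps = 54.0, 1.0        # GPS: more trusted (smaller variance)

    gain = var_pred / (var_pred + var_gps)      # 0.9: lean toward the GPS
    y_est = y_pred + gain * (z_gps - y_pred)    # 53.6, closer to the GPS
    var_est = (1.0 - gain) * var_pred           # 0.9, smaller than both 9.0 and 1.0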
More Example
 Suppose you have a hydrologic model that predicts the river water level every hour (using the usual inputs).
 You know that your model is not perfect and you don't trust it 100%. So you want to send someone to check the river level in person.
 However, the river level can only be checked once a day around noon, not every hour.
 Furthermore, the person who measures the river level cannot be trusted 100% either.
 So how do you combine both outputs of the river level (from the model and from the measurement) so that you get a 'fused' and better estimate? Kalman filtering.
Graphically speaking

Navigation using PF
 Autonomous Land Vehicle (ALV), Google's Self-Driving Car, etc.
 One important requirement: track the position of the vehicle
 Kalman filter: a loop of
 (Re)initialization
 Prediction
 Observation
 Correction
Interesting YouTube Videos
 Introduction to Autonomous Vehicle
 Introduction to Robot Localization
 Introduction to Particle Filters
 Example of Probabilistic Localization
 Example of Probabilistic Localization Using
Particle Filters
 Monte Carlo Localization Formulation for Vehicle
Localization
 Particle Filters Algorithms
Navigation
 Hypothesis and verification
 A classic approach like the Kalman filter maintains a single hypothesis
 A newer approach like the particle filter maintains multiple hypotheses (Monte Carlo sampling of the state space)
Single Hypothesis
 If the "distraction" (noise) is white and Gaussian
 the state-space probability profile remains Gaussian (a single dominant mode)
 we evolve and track the mean, not a whole distribution
Multi-Hypotheses
 The distribution can have multiple modes
 Sample the probability distribution with
“importance” rating
 Evolve the whole distribution, instead of
just the mean
Key - Bayes Rule

$$P(s_i \mid o) = \frac{p(o, s_i)}{p(o)} = \frac{p(o \mid s_i) P(s_i)}{p(o)} \propto p(o \mid s_i) P(s_i)$$

s: state
o: observation

 In the daytime, if some animal runs in front of you on the bike path, you know exactly what it is ($p(o \mid s_i)$ is sufficient)
 At night, if some animal runs in front of you on the bike path, you can hardly distinguish the shape ($p(o \mid s_i)$ is low for all cases), but you know it is probably a squirrel, not a lion, because of $P(s_i)$
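A tiny numeric illustration of this rule (the numbers are invented): even when the likelihoods are similar at night, the prior dominates.

    # States: squirrel vs. lion; observation: a blurry shape at night.
    prior = {"squirrel": 0.999, "lion": 0.001}            # P(s_i)
    likelihood = {"squirrel": 0.2, "lion": 0.3}           # p(o | s_i), both low at night

    unnorm = {s: likelihood[s] * prior[s] for s in prior} # p(o | s_i) P(s_i)
    total = sum(unnorm.values())                          # p(o)
    posterior = {s: v / total for s, v in unnorm.items()} # P(s_i | o): squirrel wins easily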
Initialization: before any observation or measurement

Observation: after seeing a door

P(s): probability of the state
P(o|s): probability of the observation given the current state

Prediction: the internal mechanism saying that the robot moves right

Correction: the prediction is weighted by confirmation with the observation

[Figure: one iteration of the particle filter loop - particles + weights, apply controls (prediction), incorporate measurements (weight update), normalize the total weights, resample to get new particles + weights]
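To make the loop concrete, here is a minimal 1-D particle filter sketch in Python (an illustrative toy, not from the slides; the motion and sensor models are invented):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1000
    particles = rng.uniform(0.0, 10.0, size=N)   # initialization: spread over the corridor
    weights = np.full(N, 1.0 / N)

    def pf_step(particles, weights, control, measurement, motion_noise=0.1, sensor_noise=0.5):
        # Prediction: move every particle by the control input, plus motion noise
        particles = particles + control + rng.normal(0.0, motion_noise, size=particles.size)
        # Correction: weight each particle by how well it explains the measurement
        likelihood = np.exp(-0.5 * ((measurement - particles) / sensor_noise) ** 2)
        weights = weights * likelihood
        weights /= weights.sum()                  # normalize the total weight
        # Resampling: draw new particles in proportion to their weights
        idx = rng.choice(particles.size, size=particles.size, p=weights)
        return particles[idx], np.full(particles.size, 1.0 / particles.size)

    particles, weights = pf_step(particles, weights, control=1.0, measurement=3.2)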
