11-755 Machine Learning for Signal Processing
Prediction and Estimation, Part II
Class 18, 1 Nov 2012

Recap: An automotive example
- Determine automatically, by only listening to a running automobile, if it is: idling; or travelling at constant velocity; or accelerating; or decelerating.
- Assume (for illustration) that we only record the energy level (SPL) of the sound.
The Model!
- The SPL is measured once per second.
[Figure: a four-state model with states Idling, Accelerating, Cruising and Decelerating; each state has an output distribution over SPL (P(x|idle), P(x|accel), ...) centred at a different level (45, 60, 65 and 70 dB appear in the figure), with transition probabilities (0.25, 0.33, 0.5) on the arcs.]

The state-space model
- Assuming all transitions from a state are equally probable.
[Figure: the transition matrix over the states I, A, C, D, with rows such as (0, 1/3, 1/3, 1/3), and a uniform initial state distribution (0.25, 0.25, 0.25, 0.25).]

Overall procedure
PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T | S_{T-1})
UPDATE: P(S_T | x_{0:T}) = C · P(S_T | x_{0:T-1}) P(x_T | S_T);  then T = T+1 and repeat.
- Predict the distribution of the state at T; update the distribution of the state at T after observing x_T.
- At T = 0 the predicted state distribution is the initial state probability.
- At each time T, the current estimate of the distribution over states considers all observations x_0 ... x_T.
Prediction with HMMs
- The prediction + update procedure is identical to the forward computation for HMMs, to within a normalizing constant.
- This is a natural outcome of the Markov nature of the model.
[Figure: the model unrolled over time, with observations O_{t-3}, O_{t-2}, O_{t-1}, O_t, O_{t+1}.]
- At t, you have some beliefs for the states.
- You predict the beliefs for the state at t+1.
- And update these after observing O_{t+1}.

Estimating the state
T = T+1
PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T | S_{T-1})
UPDATE: P(S_T | x_{0:T}) = C · P(S_T | x_{0:T-1}) P(x_T | S_T)
ESTIMATE: Estimate(S_T) = argmax_{S_T} P(S_T | x_{0:T})
- Predict the distribution of the state at T; update it after observing x_T.
- The state is estimated from the updated distribution.
- The updated distribution, not the state estimate, is what is propagated forward in time.
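The loop above translates directly into code. A minimal sketch in Python/numpy (function and variable names are ours, not from the lecture): a predict/update/estimate recursion for a discrete-state model, which is the forward algorithm up to the normalizing constant C.

```python
import numpy as np

def track_states(initial, transition, obs_likelihood, observations):
    """initial[j] = P(S_0 = j); transition[i, j] = P(S_T = j | S_{T-1} = i);
    obs_likelihood(x)[j] = P(x | S = j)."""
    predicted = initial.copy()            # at T=0, prediction = initial probability
    estimates = []
    for x in observations:
        updated = predicted * obs_likelihood(x)    # P(S_T|x_{0:T-1}) P(x_T|S_T)
        updated /= updated.sum()                   # normalize: the constant C
        estimates.append(int(np.argmax(updated)))  # Estimate(S_T)
        predicted = transition.T @ updated         # propagate the distribution
    return estimates
```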
Predicting the next observation
PREDICT STATE: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T | S_{T-1})
PREDICT OBSERVATION: P(x_T | x_{0:T-1}) = Σ_{S_T} P(x_T | S_T) P(S_T | x_{0:T-1})
UPDATE: P(S_T | x_{0:T}) = C · P(S_T | x_{0:T-1}) P(x_T | S_T);  then T = T+1 and repeat.
- The probability distribution for the observation at the next time is a mixture of the per-state observation distributions.
- The actual observation can be predicted from P(x_T | x_{0:T-1}).

Continuous state system
s_t = f(s_{t-1}, ε_t)
o_t = g(s_t, γ_t)
- The state is a continuous-valued parameter that is not directly seen: e.g. the position of the navlab, or of the star.
- The observations depend on the state and are the only way of knowing about the state: sensor readings (for the navlab) or the recorded image (for the telescope).
Discrete vs. Continuous State Systems
Prediction at time t: P(s_t | O_{0:t-1}) = ∫ P(s_{t-1} | O_{0:t-1}) P(s_t | s_{t-1}) ds_{t-1}
Update after O_t:     P(s_t | O_{0:t}) = C · P(s_t | O_{0:t-1}) P(O_t | s_t)

Special case: Linear Gaussian model
s_t = A_t s_{t-1} + ε_t   (a linear state dynamics equation)
o_t = B_t s_t + γ_t       (a linear observation equation)
- The probability of the state driving term ε is Gaussian, sometimes viewed as a driving term with mean m plus additive zero-mean noise:
P(ε) = (2π)^{-d/2} |Θ|^{-1/2} exp(−½ (ε − m)^T Θ^{-1} (ε − m))
- The probability of the observation noise γ is Gaussian:
P(γ) = (2π)^{-d/2} |Φ|^{-1/2} exp(−½ γ^T Φ^{-1} γ)
- A_t, B_t and the Gaussian parameters are assumed known; they may vary with time.
The Linear Gaussian model (KF)
P(s_0) = Gaussian(s; s̄_0, R_0)                        (a priori)
P(s_t | s_{t-1}) = Gaussian(s_t; m + A_t s_{t-1}, Θ)   (transition probability)
P(o_t | s_t) = Gaussian(o_t; B_t s_t, Φ)               (state output probability)
- Iterative prediction and update then keep every distribution Gaussian:
P(s_t | o_{0:t-1}) = Gaussian(s; s̄_t, R_t),  P(s_t | o_{0:t}) = Gaussian(s; ŝ_t, R̂_t)

The Kalman filter
s_t = A_t s_{t-1} + ε_t,  o_t = B_t s_t + γ_t
Prediction:
  s̄_t = m + A_t ŝ_{t-1}
  R_t = A_t R̂_{t-1} A_t^T + Θ
Update:
  K_t = R_t B_t^T (B_t R_t B_t^T + Φ)^{-1}
  ŝ_t = s̄_t + K_t (o_t − B_t s̄_t)
  R̂_t = (I − K_t B_t) R_t
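These recursions map one-to-one onto code. A minimal numpy sketch (constant A, B and noise covariances Θ, Φ; names are illustrative, not from the lecture):

```python
import numpy as np

def kalman_filter(observations, A, B, Theta, Phi, s0, R0, m=None):
    """Kalman recursions for s_t = m + A s_{t-1} + eps, o_t = B s_t + gam.
    Theta, Phi: state and observation noise covariances."""
    if m is None:
        m = np.zeros(len(s0))
    s_hat, R_hat = s0, R0
    means = []
    for o in observations:
        # Prediction
        s_bar = m + A @ s_hat
        R = A @ R_hat @ A.T + Theta
        # Update
        K = R @ B.T @ np.linalg.inv(B @ R @ B.T + Phi)
        s_hat = s_bar + K @ (o - B @ s_bar)
        R_hat = (np.eye(len(s_bar)) - K @ B) @ R
        means.append(s_hat)
    return means
```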
Linear Gaussian Model
[Figure: the iterative prediction/update chain, with the a priori P(s), the transition probability P(s_t | s_{t-1}) and the state output probability P(O_t | s_t) all Gaussian:]
P(s_0) = P(s)
P(s_0 | O_0) = C · P(s_0) P(O_0 | s_0)
P(s_1 | O_0) = ∫ P(s_0 | O_0) P(s_1 | s_0) ds_0
P(s_1 | O_{0:1}) = C · P(s_1 | O_0) P(O_1 | s_1)
P(s_2 | O_{0:1}) = ∫ P(s_1 | O_{0:1}) P(s_2 | s_1) ds_1
P(s_2 | O_{0:2}) = C · P(s_2 | O_{0:1}) P(O_2 | s_2)
- All distributions remain Gaussian.

Problems
s_t = f(s_{t-1}, ε_t)
o_t = g(s_t, γ_t)
- f() and/or g() may not be nice linear functions: conventional Kalman update rules are no longer valid.
- ε and/or γ may not be Gaussian: Gaussian-based update rules are no longer valid.
The problem with non-linear functions
s_t = f(s_{t-1}, ε_t),  o_t = g(s_t, γ_t)
Prediction: P(s_t | o_{0:t-1}) = ∫ P(s_{t-1} | o_{0:t-1}) P(s_t | s_{t-1}) ds_{t-1}
Update:     P(s_t | o_{0:t}) = C · P(s_t | o_{0:t-1}) P(o_t | s_t)
- Estimation requires knowledge of P(o|s): difficult to estimate for nonlinear g(); even if it can be estimated, it may not be tractable within the update loop.
- Estimation also requires knowledge of P(s_t | s_{t-1}): difficult for nonlinear f(), and it may not be amenable to closed-form integration.
The problem with nonlinearity
o_t = g(s_t, γ_t)
- The PDF P(o_t | s_t) may not have a closed form. By the change-of-variables formula,
P(o_t | s_t) = Σ_{γ : g(s_t, γ) = o_t} P(γ) / |J_{g(s_t, ·)}(o_t)|
where the Jacobian J_{g(s_t, ·)} collects the derivatives ∂o_t(i)/∂γ(j) of each observation component with respect to each noise component.
- Even if a closed form exists initially, it will typically become intractable very quickly.
Example: a simple nonlinearity
o = γ + log(1 + exp(s))
- P(o|s) = ? Assume γ is Gaussian: P(γ) = Gaussian(γ; m, Φ). Then
P(o | s) = Gaussian(o; m + log(1 + exp(s)), Φ)

UPDATE: At T = 0
- Assume the initial probability P(s) is Gaussian: P(s_0) = P_0(s) = Gaussian(s; s̄, R).
Update:
P(s_0 | o_0) = C · P(o_0 | s_0) P(s_0)
             = C · Gaussian(o_0; m + log(1 + exp(s_0)), Φ) · Gaussian(s_0; s̄, R)
Prediction for T = 1
- Trivial, linear state transition equation: s_t = s_{t-1} + ε_t, with P(ε) = Gaussian(ε; 0, Θ), so P(s_t | s_{t-1}) = Gaussian(s_t; s_{t-1}, Θ).
- The updated distribution at T = 0, written out (plotted in the figure for m = 0, s̄ = 1, R = 1):
P(s_0 | o_0) = C · exp(−½ (m + log(1 + exp(s_0)) − o)^T Φ^{-1} (m + log(1 + exp(s_0)) − o)) · exp(−½ (s_0 − s̄)^T R^{-1} (s_0 − s̄))
= Not Gaussian.
- Prediction:
P(s_1 | o_0) = ∫ P(s_0 | o_0) P(s_1 | s_0) ds_0
= C ∫ exp(−½ (m + log(1 + exp(s_0)) − o)^T Φ^{-1} (m + log(1 + exp(s_0)) − o)) · exp(−½ (s_0 − s̄)^T R^{-1} (s_0 − s̄)) · exp(−½ (s_1 − s_0)^T Θ^{-1} (s_1 − s_0)) ds_0
= Intractable.
Update at T = 1 and later
Update at T = 1: P(s_t | o_{0:t}) = C · P(s_t | o_{0:t-1}) P(o_t | s_t). Intractable.
Prediction for T = 2: P(s_t | o_{0:t-1}) = ∫ P(s_{t-1} | o_{0:t-1}) P(s_t | s_{t-1}) ds_{t-1}. Intractable.

The state prediction equation
s_t = f(s_{t-1}, ε_t)
- Similar problems arise for the state prediction equation: P(s_t | s_{t-1}) may not have a closed form.
- Even if it does, it may become intractable within the prediction and update equations, particularly the prediction equation, which includes an integration operation.
Simplifying the problem: Linearize
o = log(1 + exp(s))
[Figure: the function o = log(1 + exp(s)) with its tangent drawn at several different operating points.]
- The tangent at any point is a good local approximation if the function is sufficiently smooth.
Linearizing the observation function
o = g(s),  P(s) = Gaussian(s; s̄, R)
o ≈ g(s̄) + J_g(s̄)(s − s̄)
- A simple first-order Taylor series expansion; J_g() is the Jacobian matrix (for a scalar state it is simply the derivative).
- The expansion is around the a priori (or predicted) mean of the state.
- Where P(s) is small, the approximation error is large; but most of the probability mass of s is in low-error regions.
UPDATE
o ≈ g(s̄) + J_g(s̄)(s − s̄),  P(γ) = Gaussian(γ; 0, Φ)
- The observation PDF is then Gaussian:
P(o | s) = Gaussian(o; g(s̄) + J_g(s̄)(s − s̄), Φ)
- With P(s) = Gaussian(s; s̄, R) and P(s | o) = C · P(o | s) P(s):
P(s | o) = C · Gaussian(o; g(s̄) + J_g(s̄)(s − s̄), Φ) · Gaussian(s; s̄, R)
P(s | o) = Gaussian(s; s̄ + R J_g(s̄)^T (J_g(s̄) R J_g(s̄)^T + Φ)^{-1} (o − g(s̄)),
                       (I − R J_g(s̄)^T (J_g(s̄) R J_g(s̄)^T + Φ)^{-1} J_g(s̄)) R)
- Gaussian!! (Note: this is actually only an approximation.)

Prediction?
s_t = f(s_{t-1}) + ε_t
Prediction
s_t = f(s_{t-1}) + ε_t,  P(ε) = Gaussian(ε; 0, Θ)
- Again, direct use of f() can be disastrous. Solution: linearize, around the mean of the updated distribution of s at t−1:
s_t ≈ f(ŝ_{t-1}) + J_f(ŝ_{t-1})(s_{t-1} − ŝ_{t-1}) + ε_t
- The state transition probability is now
P(s_t | s_{t-1}) = Gaussian(s_t; f(ŝ_{t-1}) + J_f(ŝ_{t-1})(s_{t-1} − ŝ_{t-1}), Θ)
- With P(s_{t-1} | o_{0:t-1}) = Gaussian(s_{t-1}; ŝ_{t-1}, R̂_{t-1}), which should be Gaussian, the predicted state probability is
P(s_t | o_{0:t-1}) = ∫ Gaussian(s_{t-1}; ŝ_{t-1}, R̂_{t-1}) · Gaussian(s_t; f(ŝ_{t-1}) + J_f(ŝ_{t-1})(s_{t-1} − ŝ_{t-1}), Θ) ds_{t-1}
P(s_t | o_{0:t-1}) = Gaussian(s_t; f(ŝ_{t-1}), J_f(ŝ_{t-1}) R̂_{t-1} J_f(ŝ_{t-1})^T + Θ)
- Gaussian!! (This is actually only an approximation.)

The linearized prediction/update
s_t = f(s_{t-1}) + ε_t,  o_t = g(s_t) + γ_t
- Given: two nonlinear functions, for state update and observation generation.
- Note: the equations are deterministic nonlinear functions of the state variable, but linear functions of the noise. Nonlinear functions of stochastic noise are slightly more complicated to handle.
Linearized Prediction and Update
Prediction for time t:
  s̄_t = f(ŝ_{t-1})
  R_t = J_f(ŝ_{t-1}) R̂_{t-1} J_f(ŝ_{t-1})^T + Θ
  P(s_t | o_{0:t-1}) = Gaussian(s_t; s̄_t, R_t)
Update at time t:
  ŝ_t = s̄_t + R_t J_g(s̄_t)^T (J_g(s̄_t) R_t J_g(s̄_t)^T + Φ)^{-1} (o_t − g(s̄_t))
  R̂_t = (I − R_t J_g(s̄_t)^T (J_g(s̄_t) R_t J_g(s̄_t)^T + Φ)^{-1} J_g(s̄_t)) R_t
  P(s_t | o_{0:t}) = Gaussian(s_t; ŝ_t, R̂_t)
- Writing A_t = J_f(ŝ_{t-1}) and B_t = J_g(s̄_t), these are exactly the Kalman filter recursions.
The Extended Kalman filter
- Identical to the Kalman filter, with the Jacobians of f() and g() standing in for A_t and B_t, and the nonlinear functions used where means are needed:
  A_t = J_f(ŝ_{t-1}),  B_t = J_g(s̄_t)
Prediction:
  s̄_t = f(ŝ_{t-1})           (Kalman filter: s̄_t = A_t ŝ_{t-1} + m)
  R_t = A_t R̂_{t-1} A_t^T + Θ
Update:
  K_t = R_t B_t^T (B_t R_t B_t^T + Φ)^{-1}
  ŝ_t = s̄_t + K_t (o_t − g(s̄_t))   (Kalman filter: ŝ_t = s̄_t + K_t (o_t − B_t s̄_t))
  R̂_t = (I − K_t B_t) R_t
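One EKF step, sketched in numpy under the assumptions above (f, g and functions returning their Jacobians are supplied by the caller; the names are ours):

```python
import numpy as np

def ekf_step(s_hat, R_hat, o, f, g, Jf, Jg, Theta, Phi):
    """One extended Kalman filter step.  f, g: nonlinear state and
    observation functions; Jf, Jg: functions returning their Jacobians;
    Theta, Phi: state and observation noise covariances."""
    # Prediction: linearize f around the previous updated mean
    A = Jf(s_hat)
    s_bar = f(s_hat)
    R = A @ R_hat @ A.T + Theta
    # Update: linearize g around the predicted mean
    B = Jg(s_bar)
    K = R @ B.T @ np.linalg.inv(B @ R @ B.T + Phi)
    s_new = s_bar + K @ (o - g(s_bar))
    R_new = (np.eye(len(s_bar)) - K @ B) @ R
    return s_new, R_new
```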
EKFs
- EKFs are probably the most commonly used algorithm for tracking and prediction. Most systems are nonlinear; specifically, the relationship between state and observation is usually nonlinear.
- The approach can be extended to include nonlinear functions of noise as well.
- The term "Kalman filter" often simply refers to an extended Kalman filter in most contexts. But...

EKFs have limitations
- If the non-linearity changes too quickly with s, the linear approximation is invalid: the filter can become unstable.
- The estimate is often biased: the true function lies entirely on one side of the approximation.
- Various extensions have been proposed: invariant extended Kalman filters (IEKF), unscented Kalman filters (UKF).
A different problem: Non-Gaussian PDFs
s_t = f(s_{t-1}) + ε_t,  o_t = g(s_t) + γ_t
- We have assumed so far that P_0(s) is Gaussian or can be approximated as Gaussian, that P(ε) is Gaussian, and that P(γ) is Gaussian.
- This has a happy consequence: through the iterative chain
P(s_0 | O_0) = C P(s_0) P(O_0 | s_0),  P(s_1 | O_0) = ∫ P(s_0 | O_0) P(s_1 | s_0) ds_0,  P(s_1 | O_{0:1}) = C P(s_1 | O_0) P(O_1 | s_1), ...
all distributions remain Gaussian.
- But when any of these are not Gaussian, the results are not so happy.
A simple case
o = B s + γ,  P(γ) = Σ_{i=0}^{1} w_i Gaussian(γ; m_i, Φ_i)
- P(γ) is a mixture of only two Gaussians, and o is a linear function of s (non-linear functions would be linearized anyway).
- P(o|s) is also a Gaussian mixture:
P(o_t | s_t) = P_γ(o_t − B s_t) = Σ_{i=0}^{1} w_i Gaussian(o_t; m_i + B s_t, Φ_i)
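Evaluating such a mixture observation likelihood is straightforward; a minimal scalar sketch (the function name and argument layout are ours):

```python
import numpy as np

def gmm_obs_likelihood(o, s, B, weights, means, variances):
    """P(o|s) = sum_i w_i Gaussian(o; m_i + B*s, var_i), for scalar o and s."""
    w = np.asarray(weights)
    mu = np.asarray(means) + B * s      # each component mean shifts by B*s
    var = np.asarray(variances)
    return float(np.sum(w * np.exp(-0.5 * (o - mu) ** 2 / var)
                        / np.sqrt(2 * np.pi * var)))
```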
When distributions are not Gaussian
[Figure sequence: the iterative prediction/update chain, with the a priori P(s), the transition probability P(s_t | s_{t-1}) and the state output probability P(O_t | s_t) now non-Gaussian:
P(s_0) = P(s)
P(s_0 | O_0) = C P(s_0) P(O_0 | s_0)
P(s_1 | O_0) = ∫ P(s_0 | O_0) P(s_1 | s_0) ds_0
P(s_1 | O_{0:1}) = C P(s_1 | O_0) P(O_1 | s_1)
P(s_2 | O_{0:1}) = ∫ P(s_1 | O_{0:1}) P(s_2 | s_1) ds_1
P(s_2 | O_{0:2}) = C P(s_2 | O_{0:1}) P(O_2 | s_2) ...]
- When P(O_t | s_t) has more than one Gaussian, after only a few time steps we have too many Gaussians for comfort: every update multiplies the number of mixture components in P(s_t | O_{0:t}).

Related Topic: How to sample from a Distribution?
- "Sampling from a distribution P(x; G) with parameters G" means generating random numbers such that the distribution of a large number of generated numbers is P(x; G), where the parameters of the distribution are G.
- Many algorithms exist to generate RVs from a variety of distributions. Generation from a uniform distribution is well studied; uniform RVs are used to sample from multinomial distributions. For other distributions, most commonly one transforms a uniform RV to the desired distribution.
Sampling a multinomial
- Given a multinomial over N symbols, with probability of the i-th symbol = P(i), we want to randomly generate symbols from this distribution. This can be done by sampling from a uniform distribution:
- Segment the range (0,1) according to the probabilities P(i) (the P(i) terms will sum to 1.0).
[Figure: the unit interval divided into segments of lengths P(1), P(2), P(3), ..., P(N), with cumulative boundaries P(1), P(1)+P(2), P(1)+P(2)+P(3), ..., 1.0.]
- Randomly generate a number from a uniform distribution (Matlab: "rand" generates a number between 0 and 1 with uniform probability).
- If the number falls in the i-th segment, select the i-th symbol.
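The same procedure as a minimal Python/numpy sketch (the function name is ours):

```python
import numpy as np

def sample_multinomial(p, rng=np.random.default_rng()):
    """Draw one symbol index from the probabilities p (summing to 1.0):
    segment (0,1) by the cumulative sums of p, draw a uniform number,
    and report which segment it falls in."""
    u = rng.uniform()                            # like Matlab's rand
    return int(np.searchsorted(np.cumsum(p), u))
```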
Related Topic: Sampling from a Gaussian
- Many algorithms exist. Simplest: add many samples from a uniform RV. The sum of 12 RVs, each uniform in (0,1), is approximately Gaussian with mean 6 and variance 1:
x = m + s (Σ_{i=1}^{12} r_i − 6)   for a scalar Gaussian with mean m and standard deviation s.
- Matlab: x = m + randn*s ("randn" draws from a Gaussian of mean 0, variance 1).
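The 12-uniform trick as a one-line sketch (names are ours; in practice one would just use randn or its numpy equivalent):

```python
import numpy as np

def crude_gaussian(m, s, rng=np.random.default_rng()):
    """Approximate N(m, s^2): the sum of 12 uniforms has mean 6, variance 1."""
    return m + s * (rng.uniform(size=12).sum() - 6.0)
```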
Related Topic: Sampling from a Gaussian (multivariate)
- For a multivariate (d-dimensional) Gaussian with mean m and covariance Θ:
  - Compute the eigenvalue matrix L and eigenvector matrix E of Θ: Θ = E L E^T.
  - Generate d zero-mean, unit-variance numbers x_1 .. x_d and arrange them in a vector X = [x_1 ... x_d]^T.
  - Multiply X by the square root of L and by E, and add m: Y = m + E sqrt(L) X.

Sampling from a Gaussian Mixture
P(X) = Σ_i w_i Gaussian(X; m_i, Θ_i)
- Select a Gaussian by sampling the multinomial distribution of weights: j ~ multinomial(w_1, w_2, ...).
- Then sample from the selected Gaussian, Gaussian(X; m_j, Θ_j).
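A compact numpy sketch of both steps (names are ours; np.linalg.eigh gives the eigendecomposition of the symmetric covariance):

```python
import numpy as np

def sample_gaussian(m, Theta, rng):
    """Y = m + E sqrt(L) X, with Theta = E L E^T."""
    L, E = np.linalg.eigh(Theta)        # eigenvalues L, eigenvectors E
    x = rng.standard_normal(len(m))     # d zero-mean, unit-variance numbers
    return m + E @ (np.sqrt(L) * x)

def sample_gmm(weights, means, covs, rng=np.random.default_rng()):
    """Pick a component from the multinomial of weights, then sample it."""
    j = rng.choice(len(weights), p=weights)
    return sample_gaussian(means[j], covs[j], rng)
```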
The problem of the exploding distribution
- Returning to our problem: the complexity of the distribution increases exponentially with time. This is a consequence of having a continuous state space: only Gaussian PDFs propagate without increase of complexity.
- Discrete-state systems do not have this problem: the number of states in an HMM stays fixed. However, discrete state spaces are too coarse.
- Solution: combine the two concepts. Discretize the state space dynamically.
Discrete approximation to a distribution
- A PDF can be approximated as a uniform probability distribution over randomly drawn samples. Since each sample represents approximately the same probability mass (1/M if there are M samples):
P(x) ≈ (1/M) Σ_{i=0}^{M-1} δ(x − x_i)

Discrete approximation: Random sampling
- A large enough collection of randomly drawn samples from a distribution will approximately quantize the space of the random variable into equiprobable regions: we obtain more random samples from high-probability regions and fewer samples from low-probability regions.
Note: Properties of a discrete distribution
P(x) = (1/M) Σ_{i=0}^{M-1} δ(x − x_i)
- The product of a discrete distribution with another distribution is simply a weighted discrete distribution:
P(x) P(y|x) = (1/M) Σ_{i=0}^{M-1} P(y | x_i) δ(x − x_i),   or, with general weights, Σ_i w_i P(y | x_i) δ(x − x_i)
- The integral of the product is a mixture distribution:
∫ P(x) P(y|x) dx = Σ_i w_i P(y | x_i)

Discretizing the state space
- At each time, discretize the predicted state space:
P(s_t | o_{0:t-1}) ≈ (1/M) Σ_{i=0}^{M-1} δ(s_t − s_i),  where the s_i are randomly drawn samples from P(s_t | o_{0:t-1})
- Propagate the discretized distribution.
Particle Filtering
[Figure sequence, assuming that we only generate FOUR samples from each predicted distribution:
sample:  the a priori P(s_0) = P(s) is replaced by four samples drawn from it
update:  P(s_0 | O_0) = C P(s_0) P(O_0 | s_0), a discrete distribution over the samples
predict: P(s_1 | O_0) = ∫ P(s_0 | O_0) P(s_1 | s_0) ds_0, again a continuous distribution
sample:  four samples drawn from P(s_1 | O_0)
update:  P(s_1 | O_{0:1}) = C P(s_1 | O_0) P(O_1 | s_1)
predict: P(s_2 | O_{0:1}) = ∫ P(s_1 | O_{0:1}) P(s_2 | s_1) ds_1
and so on.]
Particle Filtering
- Discretize the state space at the prediction step, by sampling the continuous predicted distribution. If appropriately sampled, all generated samples may be considered to be equally probable: sampling results in a discrete uniform distribution.
- The update step updates the distribution of the quantized state space, resulting in a discrete non-uniform distribution.
- The predicted state distribution for the next time instant will again be continuous, and must be discretized again by sampling.
- At any step, the current state distribution will not have more components than the number of samples generated at the previous sampling step: the complexity of distributions remains constant.
Particle Filtering
s_t = f(s_{t-1}) + ε_t,  o_t = g(s_t) + γ_t,  with noise densities P_ε() and P_γ()
Prediction at time t: P(s_t | O_{0:t-1}) = ∫ P(s_{t-1} | O_{0:t-1}) P(s_t | s_{t-1}) ds_{t-1}
Update at time t:     P(s_t | O_{0:t}) = C · P(s_t | O_{0:t-1}) P(O_t | s_t)
- At t = 0, sample the initial state distribution:
P(s_0) ≈ (1/M) Σ_{i=0}^{M-1} δ(s_0 − s_i^0),  where s_i^0 ~ P_0(s)
- Update the state distribution with the observation:
P(s_t | o_{0:t}) = C Σ_{i=0}^{M-1} P_γ(o_t − g(s_i^t)) δ(s_t − s_i^t),  with C = 1 / Σ_{i=0}^{M-1} P_γ(o_t − g(s_i^t))
- The number of mixture components in the predicted distribution is governed by the number of samples in the discrete distribution: by drawing a small number (100-1000) of samples at each time instant, all distributions are kept manageable.
Particle Filtering
- Predict the state distribution at t:
P(s_t | o_{0:t-1}) = C Σ_{i=0}^{M-1} P_γ(o_{t-1} − g(s_i^{t-1})) P_ε(s_t − f(s_i^{t-1}))
- Sample the predicted state distribution at t:
P(s_t | o_{0:t-1}) ≈ (1/M) Σ_{i=0}^{M-1} δ(s_t − s_i^t),  where s_i^t ~ P(s_t | o_{0:t-1})
- Update the state distribution at t:
P(s_t | o_{0:t}) = C Σ_{i=0}^{M-1} P_γ(o_t − g(s_i^t)) δ(s_t − s_i^t),  with C = 1 / Σ_{i=0}^{M-1} P_γ(o_t − g(s_i^t))
- Predict the state distribution at the next time, and repeat.
Estimating a state
- The algorithm gives us a discrete updated distribution over states:
P(s_t | o_{0:t}) = C Σ_{i=0}^{M-1} P_γ(o_t − g(s_i^t)) δ(s_t − s_i^t)
- The actual state can be estimated as the mean of this distribution:
ŝ_t = C Σ_{i=0}^{M-1} s_i^t P_γ(o_t − g(s_i^t))
- Alternately, it can be the most likely sample:
ŝ_t = s_j^t,  where j = argmax_i P_γ(o_t − g(s_i^t))

Simulations with a Linear Model
s_t = s_{t-1} + ε_t,  o_t = s_t + x_t
- ε_t has a Gaussian distribution with zero mean and known variance.
- x_t has a mixture Gaussian distribution with known parameters.
Simulation:
- Generate a state sequence s_t from the model.
- Generate a sequence of x_t from the model, with one x_t term for every s_t term.
- Generate the observation sequence o_t from s_t and x_t.
- Attempt to estimate s_t from o_t.
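Putting the predict/sample/update/estimate steps together, a minimal sketch (names and structure are ours; f, g and the density gamma_pdf are assumed vectorized over the particle array, and the state is scalar):

```python
import numpy as np

def particle_filter(observations, sample_prior, f, eps_sampler, g, gamma_pdf, M=500):
    """Sample/update/predict loop; returns the weighted-mean state estimates."""
    rng = np.random.default_rng()
    particles = sample_prior(M)                   # s_i^0 ~ P_0(s)
    estimates = []
    for o in observations:
        # Update: weight each sample by P_gamma(o - g(s_i))
        w = gamma_pdf(o - g(particles))
        w /= w.sum()                              # the normalizing constant C
        estimates.append(np.sum(w * particles))   # mean of the updated distribution
        # Predict + sample: draw components by weight, push through the dynamics
        idx = rng.choice(M, size=M, p=w)
        particles = f(particles[idx]) + eps_sampler(M)
    return estimates
```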
Simulation: Synthesizing data
- Generate the state sequence according to s_t = s_{t-1} + ε_t, where ε_t is Gaussian with mean 0 and variance 10.
- Generate the observation sequence from the state sequence according to o_t = s_t + x_t, where x_t is mixture Gaussian with parameters:
  Means = [-4, 0, 4, 8, 12, 16, 18, 20]
  Variances = [10, 10, 10, 10, 10, 10, 10, 10]
  Mixture weights = [0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125]
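Synthesizing this data is a direct transcription of the recipe above; a sketch in Python/numpy (the sequence length T is our choice):

```python
import numpy as np

rng = np.random.default_rng()
T = 200                                            # sequence length (our choice)
means = np.array([-4, 0, 4, 8, 12, 16, 18, 20])
# State sequence: s_t = s_{t-1} + eps_t, eps_t ~ N(0, 10)
s = np.cumsum(rng.normal(0.0, np.sqrt(10), T))
# x_t: 8-component mixture, equal weights 0.125, all variances 10
comp = rng.integers(len(means), size=T)            # uniform component choice
x = rng.normal(means[comp], np.sqrt(10))
# Observations: o_t = s_t + x_t
o = s + x
```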
SIMULATION: TIME = 1
[Figures, combined for more compact representation: the predicted state distribution at time = 1 (predict step); the sampled version of the predicted state distribution (sample step); the updated distribution after the observation (update step).]