UKF ppt
• Here,
x – state vector
f – process model
h – observation model
u – control input
z – observation vector
w – process/control noise
v – measurement/observation noise
• The most famous estimator for such systems is the Kalman Filter, which is optimal for linear systems.
• However, most real-world systems are non-linear in nature.
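The system description above can be made concrete with a toy non-linear model. The 1D constant-velocity dynamics and the range-only sensor below are illustrative assumptions, not taken from the slides; they only show how f, h, u, w and v fit together:

```python
import numpy as np

# Hypothetical example: x = [position, velocity], range-only sensor.
# The dynamics and sensor are assumptions for illustration only.

def f(x, u, w, dt=0.1):
    """Process model: constant-velocity motion with control acceleration u."""
    pos, vel = x
    return np.array([pos + vel * dt,        # integrate position
                     vel + u * dt + w])     # integrate velocity + process noise w

def h(x, v):
    """Observation model: noisy range to the origin."""
    pos, _ = x
    return abs(pos) + v                     # range + measurement noise v

x1 = f(np.array([1.0, 2.0]), u=0.0, w=0.0)  # one prediction step
z1 = h(x1, v=0.0)                           # one observation
```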
UNSCENTED KALMAN FILTER
• Instead of handling the non-linearity with Jacobians, as the EKF does, the UKF represents the current state as a probability distribution with a mean and covariance.
• A set of “sigma points” are drawn from the
probability distribution.
• The UKF is also not completely accurate. It can still diverge to an incorrect state when trying to combine data from multiple sensors.
UKF FLOW
The formalisation of the UKF for the discrete system mentioned above is as follows:
• Define an augmented state vector xa, of length M, that concatenates the process/control noise and measurement noise terms with the state variables as:
• The augmented state vector and associated augmented state covariance, P a, are initialised
with:
• Where the initial augmented mean is the expected value of the initial (regular) state and Pk is the (regular) state covariance.
• The current augmented state and covariance are used to generate the set of sigma points, X, using:
• Where (m) and (c) denote whether the weight is used for a mean calculation or a covariance calculation, and beta is used to incorporate prior knowledge of the distribution around the mean.
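The sigma-point and weight calculation above can be sketched as follows. This is the common scaled (Van der Merwe) form; the alpha and kappa defaults are typical values and an assumption, since the slides do not give them:

```python
import numpy as np

# Sketch of scaled sigma-point generation for an augmented state xa
# (length M) with covariance Pa.  alpha/kappa defaults are assumptions.

def sigma_points(xa, Pa, alpha=1e-3, beta=2.0, kappa=0.0):
    M = len(xa)
    lam = alpha**2 * (M + kappa) - M
    S = np.linalg.cholesky((M + lam) * Pa)      # matrix 'square-root'
    X = np.empty((2 * M + 1, M))
    X[0] = xa
    for i in range(M):
        X[1 + i] = xa + S[:, i]                 # plus-direction points
        X[1 + M + i] = xa - S[:, i]             # minus-direction points
    Wm = np.full(2 * M + 1, 0.5 / (M + lam))    # (m) mean weights
    Wc = Wm.copy()                              # (c) covariance weights
    Wm[0] = lam / (M + lam)
    Wc[0] = Wm[0] + (1 - alpha**2 + beta)       # beta injects prior knowledge
    return X, Wm, Wc
```

The weights sum to one for the mean, so the sigma-point cloud reproduces the original mean and covariance exactly.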
THE PREDICT STEP
• The predict step begins with the sigma points being propagated through the system
model:
• The parameters passed to the process model are highly application-specific: the values change depending on the application.
THE UPDATE STEP
• For the update step, the sigma points that were updated in the predict step are
propagated through the observation model:
• The mean and covariance of the observation-transformed sigma points are calculated:
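The weighted mean and covariance of transformed sigma points, used in both the predict and update steps, can be sketched as below. The array shapes and the names Wm/Wc are assumptions carried over from the sigma-point generation:

```python
import numpy as np

# Sketch: Z is (2M+1, n), one transformed sigma point per row (the output
# of f or h); Wm/Wc are the mean/covariance weights.

def unscented_moments(Z, Wm, Wc):
    z_mean = Wm @ Z                   # weighted mean of the sigma points
    R = Z - z_mean                    # 'sigma point residuals'
    cov = (Wc[:, None] * R).T @ R     # weighted sum of outer products
    return z_mean, cov
```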
UPDATE CONTD.
• Followed by the cross-covariance:
• Where z is the current set of observations, and the current covariance is updated with:
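The cross-covariance, Kalman gain and corrected estimate described above can be sketched together. The residual names Xr/Zr (predict and update 'sigma point residuals') are illustrative, and the explicit inverse is for clarity only:

```python
import numpy as np

# Sketch of the UKF update.  Xr/Zr are the predict/update 'sigma point
# residuals' (sigma points minus their mean); Wc are covariance weights.

def ukf_update(x_prior, P_prior, z, z_mean, Pzz, Xr, Zr, Wc):
    Pxz = (Wc[:, None] * Xr).T @ Zr       # cross-covariance
    K = Pxz @ np.linalg.inv(Pzz)          # Kalman gain
    x = x_prior + K @ (z - z_mean)        # corrected state estimate
    P = P_prior - K @ Pzz @ K.T           # corrected covariance
    return x, P
```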
HW/SW CODESIGN
• The predict and update steps are highly application-specific. For example, consider a 3D rigid-body dynamics model of a multi-rotor micro-UAV which accounts for gravity and air resistance. Changing to a different system would require the predict model to be updated to reflect the new system.
• The same applies to the update step: if the sensors change, the observation parameters must change with them, making the update step equally application-specific.
• For greater efficiency, the application-independent parts are implemented in hardware, while the application-specific parts are implemented in software, which is easier to develop and is portable.
• There are three ways of implementing the design on hardware:
1. Serial design
2. Parallel design
3. Pipeline design
SERIAL DESIGN
• The Serial design strategy is to minimise the area and power consumption
as much as possible with the intent to include the design into a greater
SoC.
• As such, the design forgoes one of the main benefits of hardware
implementations: wide parallelism.
• The UKF algorithm can be logically divided into two parts: the predict step
(for the Serial design, we consider sigma points generation as part of the
predict step) and the update step.
• The algorithm must first be initialised with an augmented state estimate
and covariance before any calculations can begin.
STATE MACHINE OF SERIAL DESIGN
• The Serial design IP core has five top-level states: an idle state, two initialisation states and
one state each for the two parts of the UKF.
PREDICT STEP
• The predict step for the Serial design generates the new set of sigma points and calculates the a priori state estimate.
• A block diagram of the predict step architecture is shown below:
• The predict step begins by using the current augmented state vector and covariance to
calculate new sigma points.
• To calculate the new set of sigma points, first the matrix 'square-root' of the current
augmented covariance must be calculated.
TRIANGULAR EQUATION SOLVER
• In addition to the matrix 'square-root', the Cholesky Decomposition is also used in the Kalman gain calculation, which involves a matrix inversion.
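The triangular-solve path described here can be sketched in a few lines: factor the observation covariance with a Cholesky Decomposition, then apply forward elimination and back substitution rather than forming an explicit inverse. The function name is illustrative:

```python
import numpy as np

# Sketch of the matrix right 'divide' K = Pxz @ inv(Pzz) via triangular
# solves, as used in the Kalman gain calculation.

def solve_via_cholesky(Pzz, Pxz):
    L = np.linalg.cholesky(Pzz)        # Pzz = L @ L.T, L lower-triangular
    Y = np.linalg.solve(L, Pxz.T)      # forward elimination:  L Y = Pxz^T
    K = np.linalg.solve(L.T, Y).T      # back substitution:    L^T X = Y
    return K
```

Avoiding the explicit inverse is both cheaper and numerically better behaved, which is why the hardware reuses the triangular equation solver here.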
MATRIX MULTIPLY-ADD
• The matrix multiply-add data path is a standard element-wise multiplication and
accumulation.
• The element-wise calculation is given by:
• The elements of the matrix to be added, C, can simply be injected into the accumulation
directly, instead of performing an additional matrix addition after a matrix multiplication.
• The data path of the same is shown below:
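The element-wise multiply-accumulate with C injected as the accumulator's initial value can be sketched as below; the explicit loops mirror the hardware datapath rather than aiming for speed:

```python
import numpy as np

# Sketch of D = A @ B + C where C seeds the accumulator, so no separate
# matrix-addition pass is needed after the multiplication.

def matrix_multiply_add(A, B, C):
    rows, inner = A.shape
    cols = B.shape[1]
    D = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            acc = C[i, j]                    # inject C into the accumulation
            for k in range(inner):
                acc += A[i, k] * B[k, j]     # multiply-accumulate
            D[i, j] = acc
    return D
```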
CALCULATING MEAN AND COVARIANCE
• Calculating the mean and covariance of the transformed sigma points are very similar operations, so both can be calculated by the same datapath.
• Calculation of mean from the predict step is given as:
• To avoid the subtraction operation, we can assume Xi= so the covariance reduces to:
UPDATE STEP
• The update step corrects the a priori state estimate with a set of observations to generate the
new state estimate.
• Many of the calculations in the update step are very similar to the predict step. The block
diagram is shown below:
CONTD.
• The update step starts with the copying of the current sensor
observations.
• Similar to the predict step, the observation mean is used to
calculate the update ‘sigma point residuals' (subtract) before
the covariances are calculated.
• The observation covariance is calculated with the update 'sigma point residuals', and the cross-covariance is calculated with both the predict and update 'sigma point residuals'.
• After the current state estimate and covariance are
calculated, they are, like the predict step, written back to the
processor memory buffer as well as, respectively, the
augmented state and covariance local memory blocks.
PARALLEL DESIGN
• The Parallel design reintroduces the main benefit of hardware implementations: wide
parallelism.
• This design strategy uses far more resources than the Serial design, but also increases performance.
• The design does so by encapsulating certain parts of the major datapaths into a sub-module
called a processing element (PE), then uses multiple instances of these PEs in parallel, allowing
multiple elements of an algorithm to be calculated at once.
• Two new modules are introduced: a memory 'prefetch', which fetches data from a serial memory block and places it into parallel memory blocks, and a memory 'serialiser', which collects results from the parallel scheme and outputs them in serial fashion.
TOP LEVEL DESIGN
• Instead of having digital control lines, the control register has been incorporated into the
memory map of the memory buffer.
• Instead of a simple FIFO, the memory buffer for the Parallel design has a proper internal
memory map to ensure the control information and data is coherent between the processor
and the IP core.
STATE MACHINE OF PARALLEL DESIGN
• The IP core is controlled by a state machine which has 5 states: idle, init, sig_gen, predict and
update.
• During the init state, the processor initialises the internal memory of the IP core with initial
values for the augmented state and covariance.
CONTD.
• The sig_gen state handles the calculation of the latest set of
sigma points.
• After the new sigma points have been propagated through
the predict model, the predict state uses the transformed
sigma points to calculate the a priori state and covariance.
• Similarly, the update state uses the update transformed sigma
points to calculate the current state and covariance.
• The predict and update steps may be performed together if
valid observations are available, or independently as required.
SIGMA POINTS GENERATION
• The sig_gen module takes the matrix 'square-root' of the augmented covariance, multiplies the result by a weighting matrix, and then adds the augmented state column-wise.
• The main difference from the Serial design is the need to introduce a memory prefetch as
well as a memory serialiser module.
TRISOLVE
• For the Parallel design, the fused multiply-add module and feedback FIFO have been encapsulated to form a processing element which can be instantiated multiple times in parallel.
MATRIX MULTIPLY-ADD
• The entire datapath from the Serial design has been enclosed as one processing element and
additional PEs are added to handle calculations in parallel.
PREDICT STEP
• The architecture for the predict step is shown below:
• The processor may initiate a predict step once it has placed valid transformed sigma points into
the memory buffer.
• The sigma point residuals are once again calculated first before the covariance calculation.
• Memory serialisers are necessary after the mean and covariance calculation as the memory
buffer is a serial memory.
UPDATE STEP
• The update step for the Parallel design is very similar to the update step in the Serial design.
• First, the prefetch module converts the transformed sigma points into a parallel memory
structure.
• The mean and 'sigma point residuals' are calculated, then used to calculate the observation
covariance.
• The update 'sigma point residuals' are also combined with the predict 'sigma point residuals',
which were calculated during the predict step, to calculate the cross covariance between the two
system models.
PIPELINE DESIGN
• The Pipeline design combines the main benefit of hardware implementations, wide parallelism, with a 'high-level' pipeline to increase performance even further.
• This design strategy uses the most resources but also has the highest performance in terms
of algorithm throughput.
• It has three major steps: sig_gen, predict, update.
• The top-level block diagram of the Pipeline design is shown below:
STAGES OF UKF PIPELINE
• The sig_gen step contains two large matrix operations: trisolve and the matrix multiply-add.
• It is broken into two stages, which form the first two stages of the pipeline.
• The third stage is the software 'stage' where the processor propagates the sigma points
through the system models.
• The final two stages are simply for the predict and update steps.
SIGMA POINTS GENERATION
• To start the sig_gen module, the processor must first place the current augmented state and
covariance estimate into the memory buffer.
• The first stage (sig_gen (a)) contains the matrix 'square-root' and a prefetch module to hold the augmented state vector.
• The second stage (sig_gen (b)) contains just the matrix multiply-add.
• The sig_gen module is able to accept new data (i.e. the augmented state and covariance of another
UKF instance) once the first stage is completed.
PREDICT STEP
• The functionality of the predict step in the Pipeline design is also very similar to the Parallel design
except that the a priori state estimate and covariance are output to a FIFO.
• This is because these values are necessary during calculations in the update step and there are no
longer any local memory blocks for the augmented state and covariance; once the values are output
into the FIFO, the predict module can continue with the next UKF instance.
• The processor may initiate a predict step once it has placed a valid set of transformed sigma points into
the memory buffer.
• The processor must propagate the sigma points, generated from the sig_gen module, through both the
predict model as well as the update model as both the predict and update steps are calculated
together in succession.
UPDATE STEP
• The update module is functionally similar to the Parallel design but has some key practical differences since none of the hardware is reused.
• As an example application, consider a multi-rotor UAV performing SLAM. The UAV linear and angular rates are controlled via the inputs:
CONTD.
• Where uxyz is the desired linear motion along the x, y and z axes, the second input is the desired angular motion about the roll, pitch and yaw rotational axes, and the remaining terms are the zero-mean Gaussian control noise.
• Landmarks in the environment are represented using the inverse depth parameterisation. For the i-th landmark Li:
• Where xi, yi, zi are the co-ordinates of the UAV, in the world frame, when the landmark was first seen; alpha and beta are the azimuth and elevation to the landmark, respectively, when it was first seen.
• rho is the inverse depth (i.e. rho = 1/d, where d is the distance to the landmark).
• The inverse depth parameterisation provides low linearisation errors at low parallax and has the
ability to represent any distance from the system immediately.
• Features effectively at infinity are normally unstable, requiring additional processing to treat or discard those sensor readings; in this parameterisation, however, the inverse depth is simply treated as zero.
SENSOR MODEL
• The only sensor used here is a pinhole camera, fixed to the front of the UAV with its aperture aligned perpendicular to the UAV x/roll axis.
• The sensor readings are simply the camera/image frame co-ordinates to the landmark.
• The camera model, giving the co-ordinates of some point P in the environment in the camera frame, is:
• Where fu and fv are the distances from the centre of the aperture of the camera to the centre of the image plane; xP, yP, zP are the Cartesian co-ordinates of the point P in the UAV frame; and I is a zero-mean Gaussian noise term.
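Since the camera looks along the UAV x/roll axis, xP acts as the depth in a pinhole projection. The sketch below assumes that convention (the exact sign and axis conventions are assumptions, not stated in the slides):

```python
# Hedged sketch of the pinhole camera model: image co-ordinates of a
# point (xP, yP, zP) in the UAV frame, with the camera looking along +x.
# Axis/sign conventions are assumptions for illustration.

def project(p_uav, fu, fv, noise=0.0):
    xP, yP, zP = p_uav
    u = fu * (yP / xP) + noise     # horizontal image co-ordinate
    v = fv * (zP / xP) + noise     # vertical image co-ordinate
    return u, v
```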
PREDICT MODEL
• The predict model uses a dead reckoning model and the control inputs to predict the motion of the
UAV.
• The positions of known landmarks are also tracked so let the state vector be:
• Where p = [px; py; pz] is the Cartesian position of the UAV in the world frame.
• The predict model, f is then:
UPDATE MODEL
• The update model uses new measurements of one of the landmarks to update the state of both
the UAV and that landmark.
• The observation model ‘h’ is given by:
• Where vk is the observation noise and xL, yL, zL are the co-ordinates of the landmark in the world frame, calculated via:
• where Li,xyz is the Cartesian position of the UAV in the world frame when the i-th landmark was first
seen and pk-1 is the a priori estimate of the position of the UAV in the world frame.
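Recovering the landmark's world-frame position from the inverse depth parameterisation can be sketched as below: the landmark lies at distance 1/rho from the first-seen position, along the unit ray given by the azimuth and elevation. The direction-vector convention is a common choice and an assumption, not necessarily the slides' exact one:

```python
import numpy as np

# Hedged sketch: landmark world position from inverse depth.  m(alpha,
# beta) is a unit ray; the angle convention below is an assumption.

def landmark_world(L_xyz, alpha, beta, rho):
    m = np.array([np.cos(beta) * np.cos(alpha),   # ray from azimuth alpha
                  np.cos(beta) * np.sin(alpha),   # and elevation beta
                  np.sin(beta)])
    return np.asarray(L_xyz) + m / rho            # first-seen pos + (1/rho) m
```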
CONTD.
• As the UAV moves around, the set of tracked landmarks changes. If a new landmark is detected, the state vector must be expanded and initialised with the new information.
• Adding a new landmark to the tracking is done by passing the current state and observation to an
inverse sensor model, h-1, given by:
• where lx, ly, lz are the co-ordinates of the newly detected landmark in the world frame.
• New landmarks are detected when observations cannot be associated with existing known landmarks.
SIMULATION MODEL
• The initial augmented state vector is:
• The length of the state vector is 7 + 6n where n is the number of known features, the number
of observation variables is 2 and the augmented state vector has 15 + 6n variables.
• The maximum number of landmarks considered in this simulation is 3 (i.e. n = 3) so the
maximum state and augmented state vector has 25 and 33 variables respectively.
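The vector-size bookkeeping above can be checked with a short sketch (7 UAV pose variables plus 6 per landmark, with 6 control-noise and 2 observation-noise variables in the augmented vector):

```python
# Sketch of the state-vector sizes from the slide: state = 7 + 6n,
# augmented = 15 + 6n (state + 6 control noise + 2 measurement noise).

def vector_lengths(n_landmarks):
    state = 7 + 6 * n_landmarks        # UAV pose + inverse-depth landmarks
    augmented = 15 + 6 * n_landmarks   # state + noise terms
    return state, augmented
```

For n = 3 this gives the 25 and 33 variables quoted above.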
• The control noise terms are modelled with covariances [0.002812, 0.004349, 0.002248] m·s⁻¹ and [0.01993, 0.03476, 0.03223] rad·s⁻¹ respectively.
•The latency of the algorithm depends on the number of landmarks that are visible at any
given time step.
• For the SLAM solution, however, multiple update steps need to be performed depending on how many observations were made in a single time step; in some cases, no observations of landmarks were made and so no update step was performed.
•In addition to this, because the update step updates the augmented state and covariance,
subsequent update steps require the sigma points to be re-sampled.
•In this example, the predict state and augmented state vector would only need to have 7
and 13 variables respectively (only including the UAV pose and control noise but not the
measurement noise)
•While the update state and augmented state vector would have 13 and 15 variables
respectively (including the UAV pose, landmark position and measurement noise).
• Segmenting the UKF in this way benefits the hardware/software approach more than microprocessor-based approaches since, with appropriate choices of processing elements, the time complexity of each UKF instance could be reduced even further.
IMPLEMENTATION OF HW/SW CODESIGN
• All three variants of the HW/SW codesign were implemented for a wide range of parameters in order to demonstrate the flexibility and effectiveness of the design.
• Implementations for three example applications are
presented: an expanded implementation of the nanosatellite
application, a theoretical implementation which features a
large number of observation variables, and an
implementation where alternative parameterisation schemes
for the number of PEs were explored.
ANALYSIS OVERVIEW
• For all implementations described in this chapter, synthesis and implementation runs were targeted at the Zynq-7000 XC7Z045 at a target frequency of 100 MHz.
• Resource utilisation of the device by the IP core is reported by Vivado post-implementation.
• The power analysis is done via the Xilinx Power Estimator (XPE) post-implementation; all
power estimates exclude the device static power dissipation and the processing system
power draw.
• The execution time (latency) for any hardware part is measured via behavioural simulation in
Vivado Simulator, assuming a clock frequency of 100 MHz; this assumption was validated
post-implementation for all designs.
• The entire IP core utilises synchronous logic and is on a single clock domain which makes
confirming the proper distribution of the assumed clock signals, in this case 100 MHz,
relatively straightforward.
• The execution time (latency) of any software part is measured via the ARMv7 Performance
Monitor Unit (PMU) which counts processor clock cycles between two epochs; because the
number of processor clock cycles to perform a given task can vary, each measurement was
conducted at least 10 times and the average latency measured is reported here.
EXAMPLE APPLICATION: NANOSATELLITES
• The initial numbers of PEs were chosen to be multiples of the number of augmented state
variables so that the major datapaths remained data efficient.
• If the number of PEs is not a multiple of the size of the matrix, then the last iteration of the
calculations will not have enough data to fill all the PEs making the datapath slightly
inefficient.
CONTD.
• Synthesis results for the Pipeline design can be seen below.
• The Pipeline design uses a huge amount of resources compared to the Serial and Parallel
designs.
• The Pipeline 2 PE implementation uses nearly the same amount of resources as the Parallel
10 PE implementation.
• This most likely makes the Pipeline infeasible on low-end devices, although in mid-range
devices the design could still potentially be part of a SoC for low numbers of processing
elements.
POWER CONSUMPTION
• A power consumption breakdown for the hardware IP core of the Serial and Parallel designs
is shown below :
• The power consumption of the Serial design is reasonably low, due to the area-efficiency design goals and the heavy utilisation of the FPGA clock-enable resources to disable modules that are not currently in use.
CONTD.
• A power consumption breakdown of the IP core for the Pipeline design is shown below:
• The power consumption of the Pipeline design is much larger than the Serial and Parallel
designs.
• However, the performance gains of the Pipeline design may outweigh the downsides in
power consumption, especially for a constellation.
TIMING ANALYSIS
• A breakdown of the execution time (latency) of different modules for the Serial and Parallel designs is shown below:
• The design spends a large amount of the time propagating the sigma points through the two
system models.
• For the hardware part, the majority of time is spent in the sig_gen step. The two modules in
the sig_gen step, the triangular linear equations solver and the matrix multiply-add, are both
large matrix operations which scale with the number of augmented state variables.
CONTD.
• A breakdown of the time spent in different modules for the Pipeline design is shown below:
CONTD.
• A timing diagram for the whole pipeline is shown below:
• There is additional overhead when writing the augmented state/covariance into the memory buffer at the start of each UKF instance, and when reading the current state estimate from the memory buffer at the end of each UKF instance, so the overall latency ends up being roughly 10 times the longest stage.
EXAMPLE APPLICATION: LARGE NUMBER OF OBSERVATION VARIABLES
• This application is presented to explore what happens when there are more observation variables than state variables, i.e. for Mobs > Mstate.
• Since the update step is generally the most complex
sub-module in any variant of the HW/SW codesign,
increasing the number of observation variables may
have a disproportionate impact on the
implementation of the IP core.
SYNTHESIS RESULT
• Synthesis results for the Serial design and a range of processing elements for the Parallel design are shown below:
• Resource usage is dominated by the number of processing elements rather than by changes in the number of state or observation variables.
• The additional power usage, in this example, appears to be entirely from the BRAMs.
• The update step does use more memory than either the sig_gen or the predict steps which
means that increasing the number of observation variables leads to these memories being
larger and could be why this implementation has a slightly higher power consumption.
CONTD.
• The power estimate for the Pipeline design is shown below:
• Unlike the Parallel design, the Pipeline design only shows increases in power consumption for
the 5+ processing element cases; however, the majority of increases are in the signals, logic
and DSPs rather than the BRAMs.
• The Pipeline design does not use as much memory as the Serial/Parallel designs because
many of the intermediate products need not be stored and so the increase in power
consumption may simply be from the increase in activity in the update step.
TIMING ANALYSIS
• The latency across each step for the Serial and Parallel designs is shown below:
• The IP core now spends roughly the same amount of time in the sig_gen and update steps,
likely because of the trisolve module.
• Overall, the IP core for this implementation is slightly slower than the IP core in the
nanosatellite implementation.
CONTD.
• The latency across each step for the Pipeline design is shown below:
• As with the Serial and Parallel designs, the increase in observation variables causes the cost of the update step to outweigh the reduction in augmented state variables.
LATENCY: UKF STEPS
• A closer look at the latency of each of the sig_gen,
predict and update steps is presented in this section.
• It can be seen in previous implementations that the
IP core as a whole suffers from diminishing returns as
the number of processing elements increases.
• For example, in the nanosatellite application, going from the Serial design to the 2 PE Parallel design reduces the execution time by 90 µs, but adding another 8 PEs to implement the 10 PE case only reduces the execution time by a further 60 µs.
SIGMA POINTS GENERATION
• The figure below shows a graph of the latency of the sig_gen step versus the number of
processing elements.
• The first thing to note is that the latency of the trisolve module barely changes with increasing numbers of processing elements.
• The Cholesky Decomposition cannot be effectively parallelised, so instantiating additional processing elements for this module appears to be a waste of resources in the sig_gen step.
• Conversely, the matrix multiply-add greatly benefits from the additional processing elements.
• The trisolve module therefore remains the main bottleneck in the sig_gen datapath regardless of how many processing elements are instantiated.
PREDICT STEP
• The figure shows a graph of the latency of the predict step versus the number of processing
elements.
•It can be seen that none of the modules in the predict datapath
disproportionately cause any congestion; furthermore, all modules
appear to benefit from additional processing elements.
•For inefficient processing element numbers, i.e. a non-multiple of
the state variables, additional processing elements actually slightly
increase the latency.
•However, the total latency of the predict step is much lower than
the other two steps.
•Even though the predict step benefits from additional processing
elements, it may not be necessary to use them since the other steps
take much longer in terms of overall latency anyway.
UPDATE STEP
• The figure below shows a graph of the latency of the update step versus the number of processing elements.
•As with the predict step, additional processing elements reduce the
latency of every module in the update step.
•Unlike the sig_gen step, the trisolve module here actually does
decrease in latency when additional processing elements are used.
•This is most likely due to the fact that the trisolve module here is
used for the matrix right 'divide'; i.e. the Cholesky Decomposition
followed by forward elimination then back substitution.
•Although the Cholesky Decomposition cannot be effectively
parallelised, the forward elimination and back substitution can be,
meaning those operations benefit from additional processing
elements.
LATENCY: AUGMENTED STATE VARIABLES
• The increase in augmented state variables, state variables and observation variables have
different impacts on each of the steps for the IP core.
• In this section, an implementation exploring the effect these variables have on the latency of
the design is examined.
• Consider an application with an even split between the number of state and observation variables and perfect system models (i.e. Mstate = M/2, Mobs = M/2).
• A graph of the latency versus the number of augmented state variables for the Serial design is
shown below:
CONTD.
• The Cholesky Decomposition in both steps, as well as the large matrix multiplication for sigma
points generation, dominate the execution time, especially as the state vector gets larger.
• A graph of the latency of each step versus the number of augmented state variables for the 5
PE case is shown below:
CONTD.
• The number of augmented state variables was capped at a much lower level compared to the previous
image in order to show some of the small effects of the processing elements more clearly.
• In both cases, and as seen in the previous implementations, the sig_gen step takes the longest out of the
three steps.
• The increase in the sig_gen step's latency also rises faster than the other two steps.
• A graph of the latency of each step versus the number of augmented state variables for 10 PE is shown
below:
•Small dips in the overall latency can be seen in both cases.
•This is because the parallelisation scheme of many of the modules discussed previously is
most efficient when the number of processing elements is some multiple of the size of the
matrix being calculated.
• For example, consider the matrix multiply-add: if the row size of the matrix to be multiplied is 10, and 10 processing elements are used, then the calculation requires only one iteration, as each processing element calculates one row.
•If the row size of the matrix to be multiplied is 11-20, then the number of iterations
necessary is 2.
•Thus, for matrices of size 11-19, the module is now somewhat inefficient, since not all
processing elements are used every iteration.
•In the ‘Latency vs. augmented state variables for the Parallel design (5 PE)’ figure, small
dips can be seen at every multiple of 5 for the total latency and the sig_gen curves. This is
likely because of the large matrix multiply-add during the sig_gen step.
•In the ‘Latency vs. augmented state variables for the Parallel design (10 PE)’ figure,
although there is a very obvious dip at M = 20, after that the curves are more or less smooth.
•As the augmented state vector grows much larger than the number of processing elements,
the impact of the parallelisation becomes smaller.
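The iteration-count argument behind these latency dips can be sketched directly; the helper names are illustrative:

```python
import math

# Sketch: with P processing elements, a row dimension R takes
# ceil(R / P) iterations; efficiency drops when R is not a multiple of P.

def iterations(R, P):
    return math.ceil(R / P)

def pe_efficiency(R, P):
    # fraction of PE slots doing useful work across all iterations
    return R / (iterations(R, P) * P)
```

For P = 10, row sizes 10 and 20 give one and two fully-utilised iterations, while row size 11 also needs two iterations but leaves nearly half the PE slots idle, which matches the dips at multiples of the PE count.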
• The figure below shows the 10 processing element case for much larger augmented state vectors, where the complexity, at O(M^2.5), is not quite as poor as the Serial design's.
•The Figure below shows the latency for the 20 processing element case with two power series
fits for augmented state variables lower and higher than the number of processing elements.
•There is an increase in complexity as the augmented state vector passes the 20 mark
and at these low numbers of augmented state variables (compared to the number of
processing elements), the complexity itself is even less than quadratic.