There are two stages of RDM: the warm-up stage and the rolling stage. In the warm-up stage, the model handles the initial boundary condition by generating from white noise, as shown in the first row of Figure 2 (left), and denoises it to produce one clean element and partially denoised future elements in the sliding window, as shown in the bottom row. Once it reaches the temporally correlated noise stage (bottom row of Figure 2, left), RDM takes a few denoising steps for the next-step prediction, shown in Figure 2 (right). This requires the model to train on two tasks, where β controls the training task distribution; for each task, RDM designs an associated function g for calculating the local diffusion time τ_w given τ and the window index w. In addition, we can condition on n clean observations within the sliding window. g is defined for the warm-up and rolling stages as

g_warm-up(τ, w) := max(min(w/W + τ, 1.0), 0.0),   (7)

g_rolling(τ, w) := max(min((w + τ − n)/(W − n), 1.0), 0.0),   (8)

where n and W are application-dependent hyperparameters.
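To make Eqs. (7) and (8) concrete, the sketch below implements both schedules and the resulting two-stage sampler loop. It assumes the Figure 2 convention that τ_w = 0 denotes a clean element and τ_w = 1 pure noise, with the global time τ swept from 1 to 0 during denoising; the denoise callable, helper names, and array shapes are ours, not the paper's API:

import numpy as np

def g_warmup(tau, w, W):
    # Eq. (7): local diffusion time of window slot w in the warm-up stage.
    return float(np.clip(w / W + tau, 0.0, 1.0))

def g_rolling(tau, w, W, n):
    # Eq. (8): local diffusion time in the rolling stage; the first n
    # window slots hold clean observations and stay clipped at 0.
    return float(np.clip((w + tau - n) / (W - n), 0.0, 1.0))

def road_rollout(window, denoise, n_denoise, W, n, num_steps, rng):
    # window: [A, W, 3] array initialized with white noise.
    # denoise: stand-in for one reverse-diffusion update taking the
    # window and the per-slot local diffusion times.
    #
    # Warm-up stage: sweep the global time from all-noise (tau = 1)
    # down to the staircase of local times (tau = 0), Figure 2 (left).
    for tau in np.linspace(1.0, 0.0, n_denoise):
        window = denoise(window, [g_warmup(tau, w, W) for w in range(W)])
    # Rolling stage: a few denoising steps per simulation step move
    # every slot one notch down the staircase, Figure 2 (right).
    rollout = []
    for _ in range(num_steps):
        for tau in np.linspace(1.0, 0.0, n_denoise):
            window = denoise(window, [g_rolling(tau, w, W, n) for w in range(W)])
        rollout.append(window[:, n])  # slot n is now fully denoised
        fresh = rng.standard_normal(window[:, :1].shape)  # new far-future slot
        window = np.concatenate([window[:, 1:], fresh], axis=1)
    return rollout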
Related Work
Traffic Simulation with Diffusion Models
Predicting the motion of road users is a critical task for autonomous vehicle driving or simulation. For this reason, the number of methods which have attempted to model traffic behavior is vast. The literature contains a variety of techniques for modelling the distribution of driving behavior, including mixture models (Chai et al. 2019; Cui et al. 2019; Nayakanti et al. 2023), variational autoencoders (Ścibior et al. 2021; Suo et al. 2021), and generative adversarial networks (Zhao et al. 2019).
Our work builds upon recent methods which model driving behavior using diffusion models. In CTG (Zhong et al. 2023), the authors model the motion of each agent in the scene independently with a Diffuser-based (Janner et al. 2022) diffusion model. The authors of (Chang et al. 2023) also model agent motions via diffusion, with a focus on controllability. By contrast, most other diffusion-based traffic models model entire traffic scenes. This includes MotionDiffuser (Jiang et al. 2023), Scenario Diffusion (Pronovost et al. 2023), and SceneDM (Guo et al. 2023), which all diffuse the joint motion of all agents in the scene.
Figure 2: Rolling Diffusion Model. Columns represent sequence timesteps and rows represent diffusion timesteps. Circles are
shown in white if the corresponding sequence timestep is fully denoised; black if the sequence timestep is pure noise; and grey
if in between. During the denoising process, the SNR for each element in the rolling window depends on the local diffusion
time τw which can be calculated using Eq. (7) or Eq. (8), depending on whether it is in the warm-up or rolling stage.
Our work builds directly on that of DJINN (Niedoba et al. 2024), which utilizes a transformer-based network to generate joint traffic scenarios based on a variable set of agent state observations. Crucially, due to the expensive computational cost of diffusion model sampling, only CTG (Zhong et al. 2023) utilizes its model for closed-loop scenario simulation. Twice per second, they incorporate new state observations and resample trajectories for each agent. By comparison, our method does not require iterative replanning, greatly improving simulation speed.

Methods
Problem Formulation
We refer to the motion of A agents across T discrete times in an environment M as a traffic scenario. Formally, we define the scenario as x ∈ R^{A×T×3}, where we represent the state of each agent a ∈ A at time t ∈ T as the combination of its 2D position and 1D orientation. We introduce a probabilistic planner π_sim which jointly predicts the future states for all agents, conditioned on static map information M and previously observed agent states x_obs ∈ R^{A×t_obs×3}.
A more difficult form of this planning problem is closed-loop traffic simulation. In closed-loop simulation, one agent, known as the ego agent a_ego, is typically controlled by a standalone motion planner π_ego, which may be a black box and which may cause the ego agent to drive very differently from any agent in the training data, making it not amenable to accurate prediction by the traffic scenario planner π_sim. At each time step t, the standalone motion planner π_ego plans the single next step for the ego agent given the entire history of the scenario x_{0:t} and the map M. The closed-loop traffic simulation problem is to model the behavior of every other agent in the scene, including potential interactions with the ego agent. Since the state of the ego agent is neither controllable nor known in advance, the traffic scenario planner π_sim must continually update its plan so that it remains conditioned on the most recent state and actions of the ego agent.

Replanning with a joint prediction model
Our baseline planner relies on a conditional diffusion model p(x_{t_obs:T} | x_{0:t_obs}, M, c), which jointly predicts the scenario for all agents in the scene up to time T given the map M and additional conditioning information c. Although diffusing the joint states of all agents is a flexible way of modelling the distribution of traffic scenarios, the model does not respond to ego agent trajectories which deviate from modelled behavior. To mitigate this, one option is to regenerate the traffic scenario after each simulator step to incorporate new ego agent state observations. We select DJINN (Niedoba et al. 2024) as our conditional diffusion model, and we denote this method of iterative planning as DJINN-MPC, as it resembles a traditional model predictive control loop. This allows the scenario simulation planner π_sim to adjust its predictions at every simulation step in response to the standalone ego agent in the scene.

Diffusion based autoregressive model (Diff-AR)
One key drawback of DJINN-MPC is that we must fully diffuse a new traffic scenario at every simulator step, at significant cost. As an alternative, one can train a diffusion-based autoregressive model as the simulation planner, which factorizes the conditional probability as

p(x_{t_obs:T} | x_{0:t_obs}, M, c) = ∏_{t=t_obs}^{T−1} p(x_t | x^0_{0:t−1}, M, c).

Given the past observations, the model only predicts one subsequent step. In practice, the history of past observations x^0_{0:t−1} is truncated to a fixed length. Compared to the previous method, Diff-AR is slightly more efficient, as it only denoises the single-step future from scratch. However, Diff-AR cannot anticipate other agents' long-term behaviors beyond the immediate next step, which is important for effective planning in many traffic scenarios.
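To make this factorization concrete, a minimal sketch of the resulting rollout loop follows; sample_next_step is a hypothetical stand-in for a full reverse-diffusion pass, not an interface from the paper:

def diff_ar_rollout(x_obs, M, c, T, sample_next_step, history_len=10):
    # Diff-AR: denoise exactly one future step per simulator step.
    # x_obs: list of clean per-timestep scene states, each of shape [A, 3].
    # sample_next_step(history, M, c): assumed to run the full reverse
    # diffusion process from white noise and return one denoised step.
    # DJINN-MPC would instead re-diffuse an entire multi-step plan here
    # and keep only its first step, which is far more expensive.
    history = list(x_obs)
    while len(history) < T:
        context = history[-history_len:]  # truncated history, as noted above
        history.append(sample_next_step(context, M, c))
    return history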
Rolling ahead autoregressive model
We propose a rolling diffusion based model (RoAD) for traffic scenario planning based on RDM. We start by providing an overview of our autoregressive traffic planner, then discuss some details of our design choices for the diffusion process and the model, along with our updated objective function.
We utilize a sliding window of length W, which is much smaller than the scenario length T. This sliding window includes t_obs clean observations for all agents. Within this window, only the (t_obs+1)-th state is fully denoised at each scenario time for all agents, while the remainder of the sequence undergoes partial denoising. At the next simulation step, we then shift the sliding window and repeat the previous process. By focusing on a smaller window and selectively denoising, our approach maintains computational efficiency while preserving the ability to adaptively plan for the immediate future.
We follow the design choices from EDM (Karras et al. 2022) in designing our diffusion process. Given a local diffusion time τ_w during training, our σ_τ is a continuous version of the sampling noise schedule in EDM,

σ_τ = (σ_max^{1/ρ} + τ(σ_min^{1/ρ} − σ_max^{1/ρ}))^ρ,

where we keep the default hyperparameter choices for σ_max, σ_min, and ρ from (Karras et al. 2022). We apply the Heun 2nd-order sampler at prediction time with the same hyperparameters reported in EDM, and we refer the reader to EDM for the detailed denoising algorithms.
As we are interested in modelling a joint traffic planner for all agents in the scene, we diffuse in a global coordinate frame. We adopt the map representation from (Niedoba et al. 2024), where M is represented as an unordered set of polylines, each polyline describing lane centers and normalized in length and scale to match the agent states. Our model, built on a transformer-based architecture, utilizes a feature tensor of shape [A, W, F] to process agent trajectories and map information. It embeds noisy and observed states, temporal indices, and the local diffusion step τ_w into high-dimensional vectors with feature dimension F. We apply per-agent positional embeddings to these feature vectors, which are then fed into a series of transformer blocks that perform self-attention over the time and agent dimensions, as well as cross-attention with the map features.
Rather than denoising the sequence one element at a time as in Eq. (6), our transformer architecture jointly predicts the score for all noisy states in the window. Denote by x^W the sliding window of interest. Our score estimator D_θ takes in x^W_τ, τ, and the map M, along with additional conditioning information c that includes the dimensions of each agent. Our updated objective function is

E_{x^W_0, τ, x^W_τ} [ ω(τ) ‖D_θ(x^W_τ, M, c, τ) − x^W_0‖₂² ].   (9)

Note that x^W_τ is sampled from the RDM forward process defined in Eq. (4), which contains states with noise levels according to τ_w. While the local diffusion time τ_w depends on the window index w and the global diffusion time τ, all agents in the scene have a consistent local diffusion time τ_w. Therefore, our score estimator D_θ takes a vector τ = {τ_w}_{w=0}^{W} to reflect the temporal correlation of the different noise levels in x^W_τ. The weighting term ω is likewise vector-valued: it takes the vector τ as input and assigns a different weight according to each τ_w.
While our rolling ahead autoregressive model is efficient for long traffic scenario planning, the partially denoised future plan affects the reactivity of our model. In traffic simulation, such degradation may cause a higher collision rate with the uncontrolled ego agent in the scene. The reactivity of the model depends on the SNR of the future states. We empirically evaluate the reactivity of our model compared to the AR baseline in our experiments section.

Conditioning Augmentation
We have empirically found that noise conditioning augmentation, as described by (Ho et al. 2022b), is essential for all models operating in an autoregressive manner. This augmentation is critical for autoregressive human motion generation (Yin et al. 2023), cascaded diffusion models for class-conditional generation, and super-resolution video generation (Ho et al. 2022a). Noise conditioning augmentation enhances the model's robustness against generated noise, which serves as observations for subsequent predictions (Ho et al. 2022b), and it mitigates the risk of the model overfitting to its autoregressive nature. In the context of traffic simulation, this augmentation aids in generating smooth trajectories and ensures that the model does not ignore other conditioning factors, such as the presence of other agents and, importantly, the map M of the environment. Previous work on diffusion-based traffic simulation (Zhang et al. 2023; Chang et al. 2023) circumvents the issue of noisy observations by relying on a kinematic model to produce smooth trajectories; however, our autoregressive traffic simulation planner does not require such a kinematic model.
We follow (Ho et al. 2022b) in employing conditioning augmentation for our rolling ahead traffic planner, with an important modification. During training, given a sampled training segment x^W of length W with n observations x_obs within this segment, we apply Gaussian noise augmentation to x_obs, where the noise level σ_τca is sampled uniformly between σ_min and σ_max. Unlike (Ho et al. 2022b), we found that jointly predicting all elements within the sliding window, including the noised observations, is essential for our application. At test time, we apply Gaussian noise with σ_min to our observations for a minimal level of augmentation.
From a broader perspective, conditioning augmentation addresses a well-understood issue in imitation learning with autoregressive-style methods, where at test time the model must condition on samples that it produced earlier. Throughout a roll-out, the distribution of these samples may shift so that they appear out-of-distribution relative to the training data. One solution to this problem allows the model to learn from its own mistakes using a differentiable simulator (Ścibior et al. 2021). In the diffusion model context, though, this requires sampling from the reverse process, which is expensive. We denote this type of augmentation, where the noise originates from the model's predictions, as reverse process conditioning augmentation. Existing work (Ho et al. 2022b) on cascaded diffusion models has achieved comparable performance through both reverse process conditioning augmentation and forward process conditioning augmentation for high-resolution image generation conditioned on a low-resolution image. Therefore, we opt for the more efficient forward process conditioning augmentation approach.
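A minimal sketch of the training-time noising described above, assuming the EDM default hyperparameters σ_min = 0.002, σ_max = 80, and ρ = 7 (Karras et al. 2022); the helper names and array shapes are ours:

import numpy as np

SIGMA_MIN, SIGMA_MAX, RHO = 0.002, 80.0, 7.0  # EDM defaults (Karras et al. 2022)

def sigma_tau(tau):
    # Continuous version of the EDM sampling noise schedule; tau = 0
    # maps to sigma_max and tau = 1 to sigma_min.
    return (SIGMA_MAX ** (1 / RHO)
            + tau * (SIGMA_MIN ** (1 / RHO) - SIGMA_MAX ** (1 / RHO))) ** RHO

def augment_observations(x_win, n, rng):
    # Forward-process conditioning augmentation on a training window.
    # x_win: [A, W, 3] window whose first n sequence slots hold clean
    # observations; only those slots are corrupted, at a level drawn
    # uniformly from [sigma_min, sigma_max]. At test time the noise
    # level is fixed to sigma_min for minimal augmentation.
    sigma_ca = rng.uniform(SIGMA_MIN, SIGMA_MAX)
    x_aug = x_win.copy()
    x_aug[:, :n, :] += sigma_ca * rng.standard_normal(x_aug[:, :n, :].shape)
    return x_aug, sigma_ca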
Experiments
We evaluate our rolling ahead scene generation model (RoAD) on the INTERACTION dataset (Zhan et al. 2019), which contains 16.5 hours of driving records across 11 locations. Our baselines include an autoregressive diffusion model (AR), which takes observations of length 10 and predicts the next-step future for all agents. Another baseline, DJINN, is a scene generation model that takes 10 observations and jointly predicts the next 30 steps at 10 Hz for all agents in a one-shot manner. We have also trained a version of DJINN that predicts 10 future steps ahead jointly for all agents (DJINN-10). As our RoAD model with window size 20 (RoAD-20) predicts 10 partially denoised future steps as well, we believe DJINN-10 provides a reasonable comparison. DJINN-10 (MPC-X) is a variant of DJINN-10 that has been trained with conditioning augmentation and deployed in an MPC style, enabling us to replan after executing X steps of predictions for all agents.
We first compare RoAD with AR, DJINN, and DJINN-10 (MPC) using standard scene-level displacement metrics such as minSceneADE and minSceneFDE to demonstrate the quality of samples generated by RoAD. We then assess the reactivity of DJINN, DJINN-10 (MPC-1), AR, and RoAD with an adversarial agent, which is not controlled by the scene generation model, by measuring the collision rates with the adversarial agent.

Implementation Details We adopt the same transformer architecture from DJINN (Niedoba et al. 2024) for all of our models. We apply a conditional augmentation ratio of 0.2 for AR and DJINN-10 and 0.5 for RoAD, as we found that a higher conditional augmentation ratio for AR and DJINN-10 results in worse performance. We train our RoAD planner with observation length 10 and task ratio β = 0.1.

Evaluation Metrics We measure the accuracy of our generated trajectories with standard displacement metrics. To measure joint motion forecasting quality, we follow (Ngiam et al. 2021) in reporting minSceneADE and minSceneFDE. Both metrics capture the minimum joint displacement error for all agents across 6 joint traffic scenario samples. To measure per-agent motion forecasting performance, we report the miss rate: the rate of agents for which none of the six predicted trajectories has a final displacement error of less than 2 meters. To measure the reactivity of each model, we report the collision rate: the number of collisions divided by the total number of simulated scenarios.

Motion Forecasting We compare RoAD with AR and DJINN-10 on the motion forecasting task using the validation set of the INTERACTION dataset (Zhan et al. 2019). We generate three seconds of driving behavior at 10 Hz, conditioned on one second of observations. We consider the performance of DJINN as the upper bound for this task, since DJINN is trained only at this fixed time horizon and is not an autoregressive model by nature. Following (Niedoba et al. 2024), displacement metrics are calculated by generating 24 samples for each scenario and fitting a 6-component Gaussian mixture model to cover all future modes. DJINN-10 (MPC-1) achieves slightly better results than AR but performs worse than RoAD due to larger accumulated errors from replanning at each simulation step.
RoAD models with window sizes of 15 (RoAD-15) and 20 (RoAD-20) achieved lower displacement metrics than AR models, as RoAD also considers the noisy future steps beyond the next immediate one. Additionally, RoAD models exhibit a slightly lower miss rate. We also observed that RoAD models with larger window sizes demonstrate better displacement metrics. Displacement metrics are one indicator of the quality of the generated samples; we show qualitatively in Figure 3 that RoAD reconstructs the ground truth trajectories, marked in grey, better than AR.

Reactivity The RoAD models efficiently roll out long scenarios by partially denoising future states. However, this limits their ability to adapt to perturbations, such as an agent controlled by a different motion planner while being observed by our model. This is a typical setup in closed-loop simulation. To evaluate this, we test the RoAD models' adaptation to an adversarial agent. We select one agent per scene and control it using its replay log, slowing it down so that it reaches only half its trajectory by the end of the simulation. The simulation runs for 40 time steps at 10 Hz, given initial 10-step observations, which makes the performance of DJINN one-shot a lower bound, since it is blind to the adversarial agent during the simulation.
In total, we select 1,440 scenes from the INTERACTION validation set, focusing on the six locations with the largest number of scenarios. Scenes with a low number of participants are filtered out, as agents in these scenarios are less likely to interact. We take three samples per scenario for each model, then report the average collision rate in Table 2. In addition, we report the prediction time for a single sample on an RTX 2080 Ti GPU in Table 2 for each model to highlight the efficiency of our RoAD models.
DJINN-10 (MPC-1) achieves the lowest collision rate compared to the other models while having the longest total prediction time. AR reduces the collision rate by 3x compared to the lower bound (DJINN). DJINN-10 (MPC-1) achieving better results than AR aligns with our expectations, as DJINN-10 (MPC-1) can look ahead ten steps into the future, whereas AR predicts only one step ahead. Our proposed RoAD-15 performs close to AR, reducing the collision rate by over 2.5x compared to DJINN while requiring half the prediction time of AR and DJINN-10 (MPC-1).
As an ablation, we measured the collision rate for RoAD-20 in the reactivity experiment. We observed that increasing the window size can reduce the prediction time further at the cost of a slightly higher collision rate. Figure S1 in the Supplementary Materials shows an example where RoAD-20 fails to react while RoAD-15 avoids a collision in the same scenario. This trade-off provides practitioners with the flexibility to decide whether reactivity or computational efficiency is more important for their simulation needs.
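For reference, one plausible implementation of the scene-level displacement metrics defined above; the array shapes are our own convention, and (Ngiam et al. 2021) remains the authoritative definition:

import numpy as np

def min_scene_metrics(samples, gt):
    # samples: [K, A, T, 2] predicted (x, y) positions for all agents.
    # gt:      [A, T, 2] ground-truth positions.
    # The displacement error is averaged jointly over agents (and over
    # time for ADE) within each sample, then minimized over samples.
    err = np.linalg.norm(samples - gt[None], axis=-1)  # [K, A, T]
    scene_ade = err.mean(axis=(1, 2))                  # joint ADE per sample
    scene_fde = err[:, :, -1].mean(axis=1)             # joint FDE per sample
    return scene_ade.min(), scene_fde.min()

# e.g. the minimum over K = 6 joint samples, as in the protocol above
rng = np.random.default_rng(0)
ade, fde = min_scene_metrics(rng.normal(size=(6, 4, 30, 2)),
                             rng.normal(size=(4, 30, 2)))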
Figure 3: From top to bottom row: AR, RoAD-20. By looking ahead beyond the subsequent step, the pedestrian marked with a red dot, controlled by the RoAD-20 planner, avoids colliding with the vehicle. Brown circles highlight the interaction region. Grey trajectories denote replay logs and orange trajectories are the full predicted future. This example demonstrates that RoAD-20, with a longer planning horizon compared to AR, can anticipate and mitigate interactions with other agents effectively.
Table 1: Motion forecasting displacement metrics on the INTERACTION validation set (Location: All).

Model Type          minSceneADE   minSceneFDE   Miss Rate
DJINN (One shot)    0.388         1.004         0.049
DJINN-10 (MPC-1)    0.692         1.675         0.166
AR                  0.695         1.670         0.168
RoAD-15             0.673         1.596         0.160
RoAD-20             0.654         1.553         0.142

Table 2: Performance with an adversarial ego agent.

Model               Collision Rate   Prediction time (min)
DJINN               0.052            0.07
DJINN-10 (MPC-1)    0.014            0.69
AR                  0.016            0.68
RoAD-15             0.019            0.34
RoAD-20             0.024            0.20

Table 3: Ablation on conditioning augmentation (CA) across six locations from the INTERACTION dataset.

Metrics        RoAD w/o CA   RoAD w/ CA
minSceneADE    0.930         0.663
minSceneFDE    2.197         1.579
ego minADE     0.693         0.475
Ablation on conditioning augmentation We show the significance of conditioning augmentation by measuring the displacement metrics for two RoAD models trained with the same configuration, one without conditioning augmentation, in Table 3. The displacement errors increase significantly without conditioning augmentation.

Conclusion
In conclusion, we have proposed a rolling diffusion-based traffic scene planning framework that strikes a beneficial compromise between reactivity and computational efficiency. We believe this work addresses a gap in the community by enabling the autoregressive generation of traffic scenarios for all agents jointly, and it offers insights into the crucial role of conditioning augmentation techniques. For future work, we aim to explore test-time conditioning with this model and seek to enhance model performance through flexible conditioning on past observations (Harvey et al. 2022).
Acknowledgment
We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada CIFAR AI Chairs Program, Inverted AI, MITACS, and Google. This research was enabled in part by technical support and computational resources provided by the Digital Research Alliance of Canada Compute Canada (alliancecan.ca), the Advanced Research Computing at the University of British Columbia (arc.ubc.ca), and Amazon.
References
Austin, J.; Johnson, D. D.; Ho, J.; Tarlow, D.; and Van Den Berg, R. 2021. Structured denoising diffusion models in discrete state-spaces. Advances in Neural Information Processing Systems, 34: 17981–17993.
Chai, Y.; Sapp, B.; Bansal, M.; and Anguelov, D. 2019. MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction. In Kaelbling, L. P.; Kragic, D.; and Sugiura, K., eds., 3rd Annual Conference on Robot Learning, CoRL 2019, Osaka, Japan, October 30 - November 1, 2019, Proceedings, volume 100 of Proceedings of Machine Learning Research, 86–99. PMLR.
Chang, W.-J.; Pittaluga, F.; Tomizuka, M.; Zhan, W.; and Chandraker, M. 2023. Controllable Safety-Critical Closed-loop Traffic Simulation via Guided Diffusion. arXiv preprint arXiv:2401.00391.
Cui, H.; Radosavljevic, V.; Chou, F.; Lin, T.; Nguyen, T.; Huang, T.; Schneider, J.; and Djuric, N. 2019. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks. In International Conference on Robotics and Automation, ICRA 2019, Montreal, QC, Canada, May 20-24, 2019, 2090–2096. IEEE.
Gulino, C.; Fu, J.; Luo, W.; Tucker, G.; Bronstein, E.; Lu, Y.; Harb, J.; Pan, X.; Wang, Y.; Chen, X.; et al. 2024. Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research. Advances in Neural Information Processing Systems, 36.
Guo, Z.; Gao, X.; Zhou, J.; Cai, X.; and Shi, B. 2023. SceneDM: Scene-level multi-agent trajectory generation with consistent diffusion models. arXiv preprint arXiv:2311.15736.
Han, B.; Peng, H.; Dong, M.; Ren, Y.; Shen, Y.; and Xu, C. 2024. AMD: Autoregressive Motion Diffusion. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, 2022–2030.
Harvey, W.; Naderiparizi, S.; Masrani, V.; Weilbach, C.; and Wood, F. 2022. Flexible diffusion modeling of long videos. Advances in Neural Information Processing Systems, 35: 27953–27965.
Ho, J.; Chan, W.; Saharia, C.; Whang, J.; Gao, R.; Gritsenko, A.; Kingma, D. P.; Poole, B.; Norouzi, M.; Fleet, D. J.; et al. 2022a. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303.
Ho, J.; Jain, A.; and Abbeel, P. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33: 6840–6851.
Ho, J.; Saharia, C.; Chan, W.; Fleet, D. J.; Norouzi, M.; and Salimans, T. 2022b. Cascaded diffusion models for high fidelity image generation. Journal of Machine Learning Research, 23(47): 1–33.
Hoogeboom, E.; Gritsenko, A. A.; Bastings, J.; Poole, B.; Berg, R. v. d.; and Salimans, T. 2021. Autoregressive diffusion models. arXiv preprint arXiv:2110.02037.
Janner, M.; Du, Y.; Tenenbaum, J. B.; and Levine, S. 2022. Planning with Diffusion for Flexible Behavior Synthesis. In Chaudhuri, K.; Jegelka, S.; Song, L.; Szepesvári, C.; Niu, G.; and Sabato, S., eds., International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, 9902–9915. PMLR.
Jiang, C.; Cornman, A.; Park, C.; Sapp, B.; Zhou, Y.; Anguelov, D.; et al. 2023. MotionDiffuser: Controllable multi-agent motion prediction using diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9644–9653.
Karras, T.; Aittala, M.; Aila, T.; and Laine, S. 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35: 26565–26577.
Kingma, D.; Salimans, T.; Poole, B.; and Ho, J. 2021. Variational diffusion models. Advances in Neural Information Processing Systems, 34: 21696–21707.
Liu, Y.; Lioutas, V.; Lavington, J. W.; Niedoba, M.; Sefas, J.; Dabiri, S.; Green, D.; Liang, X.; Zwartsenberg, B.; Ścibior, A.; et al. 2023. Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), 3261–3267. IEEE.
Nayakanti, N.; Al-Rfou, R.; Zhou, A.; Goel, K.; Refaat, K. S.; and Sapp, B. 2023. Wayformer: Motion Forecasting via Simple & Efficient Attention Networks. In IEEE International Conference on Robotics and Automation, ICRA 2023, London, UK, May 29 - June 2, 2023, 2980–2987. IEEE.
Ngiam, J.; Caine, B.; Vasudevan, V.; Zhang, Z.; Chiang, H.-T. L.; Ling, J.; Roelofs, R.; Bewley, A.; Liu, C.; Venugopal, A.; et al. 2021. Scene transformer: A unified architecture for predicting multiple agent trajectories. arXiv preprint arXiv:2106.08417.
Niedoba, M.; Lavington, J.; Liu, Y.; Lioutas, V.; Sefas, J.; Liang, X.; Green, D.; Dabiri, S.; Zwartsenberg, B.; Scibior, A.; et al. 2024. A Diffusion-Model of Joint Interactive Navigation. Advances in Neural Information Processing Systems, 36.
Pronovost, E.; Ganesina, M. R.; Hendy, N.; Wang, Z.; Morales, A.; Wang, K.; and Roy, N. 2023. Scenario Diffusion: Controllable driving scenario generation with diffusion. Advances in Neural Information Processing Systems, 36: 68873–68894.
Rempe, D.; Philion, J.; Guibas, L. J.; Fidler, S.; and Litany, O. 2022. Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior. In Conference on Computer Vision and Pattern Recognition (CVPR).
Ruhe, D.; Heek, J.; Salimans, T.; and Hoogeboom, E. 2024. Rolling Diffusion Models. arXiv preprint arXiv:2402.09470.
Ścibior, A.; Lioutas, V.; Reda, D.; Bateni, P.; and Wood, F. 2021. Imagining the road ahead: Multi-agent trajectory prediction via differentiable simulation. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 720–725. IEEE.
Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; and Ganguli, S. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, 2256–2265. PMLR.
Song, Y.; Sohl-Dickstein, J.; Kingma, D. P.; Kumar, A.; Ermon, S.; and Poole, B. 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. 2020. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2446–2454.
Suo, S.; Regalado, S.; Casas, S.; and Urtasun, R. 2021. TrafficSim: Learning to simulate realistic multi-agent behaviors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10400–10409.
Tashiro, Y.; Song, J.; Song, Y.; and Ermon, S. 2021. CSDI: Conditional score-based diffusion models for probabilistic time series imputation. Advances in Neural Information Processing Systems, 34: 24804–24816.
Treiber, M.; Hennecke, A.; and Helbing, D. 2000. Congested traffic states in empirical observations and microscopic simulations. Physical Review E, 62(2): 1805.
Uria, B.; Murray, I.; and Larochelle, H. 2014. A deep and tractable density estimator. In International Conference on Machine Learning, 467–475. PMLR.
Wu, T.; Fan, Z.; Liu, X.; Zheng, H.-T.; Gong, Y.; Jiao, J.; Li, J.; Guo, J.; Duan, N.; Chen, W.; et al. 2024. AR-Diffusion: Auto-regressive diffusion model for text generation. Advances in Neural Information Processing Systems, 36.
Xu, D.; Chen, Y.; Ivanovic, B.; and Pavone, M. 2023. BITS: Bi-level imitation for traffic simulation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 2929–2936. IEEE.
Yin, W.; Tu, R.; Yin, H.; Kragic, D.; Kjellström, H.; and Björkman, M. 2023. Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models. In 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 1102–1108. IEEE.
Zhan, W.; Sun, L.; Wang, D.; Shi, H.; Clausse, A.; Naumann, M.; Kümmerle, J.; Königshof, H.; Stiller, C.; de La Fortelle, A.; and Tomizuka, M. 2019. INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps. arXiv:1910.03088 [cs, eess].
Zhang, Z.; Liu, R.; Aberman, K.; and Hanocka, R. 2023. TEDi: Temporally-entangled diffusion for long-term motion synthesis. arXiv preprint arXiv:2307.15042.
Zhao, T.; Xu, Y.; Monfort, M.; Choi, W.; Baker, C.; Zhao, Y.; Wang, Y.; and Wu, Y. N. 2019. Multi-agent tensor fusion for contextual trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12126–12134.
Zhong, Z.; Rempe, D.; Xu, D.; Chen, Y.; Veer, S.; Che, T.; Ray, B.; and Pavone, M. 2023. Guided conditional diffusion for controllable traffic simulation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 3560–3566. IEEE.
Supplementary Materials
Introduction
We provide an extended discussion on related work regarding autoregressive diffusion models. We also detail the computational resources and datasets used in our experiments. Furthermore, we present additional results on accumulation errors caused by replanning for DJINN-10, as well as visualizations showcasing the reactivity of the RoAD model with varying window sizes.
Additional Related Work
Autoregressive Diffusion Models (ARDM) (Hoogeboom et al. 2021) introduce an order-agnostic autoregressive diffusion model that combines an order-agnostic autoregressive model (Uria, Murray, and Larochelle 2014) with a discrete diffusion model (Austin et al. 2021). The order-agnostic nature of this model eliminates the need for generating subsequent predictions in a specific order, thereby enabling faster prediction times through parallel sampling. Additionally, relaxing the causal assumption leads to a more efficient per-time-step loss function during training. However, such a model is not suitable for our application due to the sequential nature of traffic simulation. AMD (Han et al. 2024) proposes an auto-regressive motion generation approach for human motion given a text prompt, but unlike the Rolling Diffusion Model, it denoises one clean motion sample at a time, which is slow at prediction time. The Rolling Diffusion Model (RDM) (Ruhe et al. 2024) proposes a sliding window approach targeted at long video generation but does not specifically study its application in a multi-agent system, particularly for closed-loop traffic simulation. We investigate the level of reactivity when applying rolling diffusion models as a traffic scene planner.
Compute resources
We run all our experiments on four NVIDIA V100 GPUs hosted by a cloud provider. We trained our RoAD models for 9 days, amounting to 36 GPU-days. AR and DJINN were also trained for 36 GPU-days. In total, including preliminary runs and ablations, we estimate that the project required roughly 300 GPU-days.
Dataset
We experiment with the INTERACTION dataset (Zhan et al. 2019), which is available for non-commercial use following the guidelines at https://interaction-dataset.com/.

Accumulation Errors
… SNR ratio within the window. Denoising for the next simulation step does not start from Gaussian noise, resulting in lower accumulation errors compared to DJINN-10 (MPC-1).

Table 4: Accumulation errors caused by replanning for DJINN-10.

Metrics        DJINN-10 (MPC-1)   DJINN-10 (MPC-5)
minSceneADE    0.692              0.583
minSceneFDE    1.675              1.351
Miss Rate      0.166              0.091

Additional Visualizations
In Figure 4, we demonstrate the flexibility of our RoAD model by adjusting the sliding window size.