Spatio-Temporal Transformer Networks for Trajectory Prediction in Autonomous Driving
Abstract
To safely and rationally participate in dense and heterogeneous traffic, autonomous vehicles
need to sufficiently analyze the motion patterns of surrounding traffic-agents and
accurately predict their future trajectories. This is challenging because the trajectories of
traffic-agents are influenced not only by the traffic-agents themselves but also by their
spatial interactions with each other. Previous methods usually rely on the sequential step-by-step
processing of Long Short-Term Memory networks (LSTMs) and merely extract the inter-
actions between spatial neighbors for a single type of traffic-agent. We propose the Spatio-
Temporal Transformer Networks (S2TNet), which models the spatio-temporal interactions
with a spatio-temporal Transformer and handles temporal sequences with a temporal Trans-
former. We feed additional category, shape and heading information into our networks
to handle the heterogeneity of traffic-agents. The proposed method outperforms state-of-
the-art methods on the ApolloScape Trajectory dataset by more than 7% on both the weighted
sum of Average and Final Displacement Error.
Keywords: Trajectory prediction, Transformer, Autonomous Driving.
1. Introduction
Autonomous driving is an innovative and advanced research field that can reduce the num-
ber of road fatalities, increase traffic efficiency, decrease environmental pollution and give
mobility to handicapped members of our society (Milakis et al. (2017)). In order to achieve
the desired goals and avoid collisions with other agents, autonomous vehicles need the
ability to perceive the environment and make intelligent decisions. As a part of perception,
trajectory prediction can well reflect the future behaviors of surrounding agents and build
a bridge between perception and decision-making. However, complex temporal prediction
is inevitably accompanied by spatial agent-agent interactions at the same time, especially
in dense and highly dynamic traffic composed of heterogeneous traffic-agents, including
pedestrians, cyclists and human drivers. Heterogeneity means that these traffic-agents have
diverse shapes, sizes, dynamics and behaviors. Moreover, a variety of potentially reason-
able spatial interactions between traffic-agents may occur, e.g. human drivers may overtake
another vehicle or slow down to follow other vehicles (Lefèvre et al. (2014)). Consequently,
trajectory prediction is a challenging task that plays an important role in autonomous
driving.
Classical methods treat traffic-agents as individual entities without any spatial inter-
actions and abstract their motion as kinematic and dynamic models (Brännström et al.
(2010)), Gaussian Processes (Rasmussen (2003)), etc., making it difficult to compre-
hend complex scenarios or accomplish long-term predictions. With the success of deep
neural networks, recent trajectory prediction methods mainly focus on using these net-
works to extract features on spatial and temporal dimensions (Alahi et al. (2016); Huang
et al. (2019); Ivanovic and Pavone (2019); Mohamed et al. (2020)). Long Short-Term
Memory networks (LSTMs) are widely used for modeling temporal features. The LSTMs
are based on consecutively processing sequences and storing the latent states to represent
knowledge about the motion of traffic-agents (Giuliari et al. (2021)). However, LSTM-based
methods remember history with a single vector of limited capacity and regularly have
difficulty handling complex temporal dependencies (Vaswani et al. (2017)). Subsequently,
pooling mechanisms (Deo and Trivedi (2018)), attention mechanisms (Ivanovic and Pavone
(2019)) and graph convolution mechanisms (Li et al. (2019); Yu et al. (2020)) have been used to
model spatial interactions. The limitation of these methods is that they only model the
interactions of spatially proximal traffic-agents and ignore the influence of traffic-agents
beyond the given spatial limits. This assumption may work well when traffic-agents move
slowly, but it loses efficacy as speeds increase. Besides, the majority of trajectory
prediction algorithms are developed for homogeneous traffic-agents in a single scene,
corresponding to pedestrians in crowds (Alahi et al. (2016)) or moving vehicles on
a highway (Deo and Trivedi (2018)). These methods may have great limitations in dealing
with dense urban environments where heterogeneous traffic-agents coexist and interact with
each other.
In this paper, we address all these limitations by employing Spatio-Temporal Trans-
former Networks (S2TNet) for heterogeneous traffic-agent trajectory prediction. S2TNet
is proposed based on the vanilla Transformer architecture, which discards the sequential
nature of data and models features with only the effective self-attention mechanism. For
the spatial dimension, we propose spatial self-attention mechanism to capture the interac-
tions between all traffic-agents in the road network, not limited to the interactions between
spatial neighbors. For the temporal dimension, a temporal convolution network (TCN) is
adopted to extract temporal dependencies between consecutive frames and is combined with spatial
self-attention to form the spatio-temporal Transformer where a set of new spatio-temporal
features are obtained. Based on the temporal self-attention mechanism, the temporal Transformer
refines the temporal features of each traffic-agent independently and produces the future
trajectories auto-regressively. In addition to history trajectories, we feed additional
shape, heading and category features into our networks to handle the heterogeneity of traffic-
agents. The main contributions of this paper are summarized as follows:
2. Background
2.1. Problem Formulation
Trajectory prediction aims to accurately predict the future long-term trajectories of traffic-
agents, given their history trajectories and other information such as shapes and categories.
The input of S2TNet is
$X = [x_1, \cdots, x_{t_{obs}}]$  (1)

where

$x_i = \{(x_0^i, y_0^i, l_0^i, w_0^i, \theta_0^i, \tau_0^i, \cdots, x_n^i, y_n^i, l_n^i, w_n^i, \theta_n^i, \tau_n^i) \mid i \in (1 : t_{obs})\}$  (2)
are the history feature vectors (including global coordinates x and y, lengths l, widths
w, headings θ and categories τ ) of n traffic-agents being predicted in a road network. The
subscript n in (2) refers to all agents in general and varies with different scenes. We currently
take into account five types of traffic-agents, $\tau \in \{1, 2, 3, 4, 5\}$, representing small vehicles,
big vehicles, pedestrians, cyclists and others, respectively. We hold that additional features,
when available for each traffic-agent, can help handle the heterogeneity of traffic-agents and
improve trajectory accuracy.
The output of S2TNet is
$Y = [y_{t_{obs}+1}, \cdots, y_{t_{fut}}]$  (3)

where

$y_i = \{(x_0^i, y_0^i, \cdots, x_n^i, y_n^i) \mid i \in (t_{obs}+1 : t_{fut})\}$  (4)
are the future feature vectors including global coordinates x and y. It is noted that S2TNet
outputs the future positions of all observed traffic-agents simultaneously rather than merely
predicting the location of one specific traffic-agent.
With the objective to hierarchically represent the trajectory sequences, we construct
a spatio-temporal graph G = (V, E) on a trajectory sequence with N traffic-agents and
T frames, featuring both intra-frame and inter-frame connections. In this graph, the node
set $V = \{x_i^t \mid t \in (1, T), i \in (1, N)\}$ includes all the feature vectors of traffic-agents, and
E represents the set of edges connecting the nodes. We use the terms node and traffic-agent
interchangeably in the following description. The edge set E consists of two subsets. The first
subset depicts the virtual spatial connections between traffic-agents in the same frame, denoted
as $E_S = \{(x_i^t, x_j^t) \mid i, j \in (1, N), t \in (1, T)\}$. The second subset contains the temporal
edges connecting the same traffic-agent in consecutive frames, $E_T = \{(x_i^t, x_i^{t+1}) \mid i \in
(1, N), t \in (1, T-1)\}$.
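To make this input layout concrete, the following minimal sketch assembles per-frame node features into a dense tensor; the function name, fixed agent count and zero-padding convention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

T_OBS, N_AGENTS, N_FEATURES = 6, 4, 6  # 6 history frames; (x, y, l, w, theta, tau)

def build_input_tensor(tracks):
    """tracks: dict mapping agent id -> list of T_OBS feature tuples.
    Returns X with shape (T_OBS, N_AGENTS, N_FEATURES); absent agents
    are zero-padded (an assumed convention)."""
    X = np.zeros((T_OBS, N_AGENTS, N_FEATURES), dtype=np.float32)
    for slot, (agent_id, states) in enumerate(sorted(tracks.items())):
        for t, state in enumerate(states):
            X[t, slot] = state
    return X

# Example: a small vehicle (tau = 1) and a pedestrian (tau = 3).
tracks = {
    7: [(1.0 * t, 0.0, 4.5, 1.8, 0.0, 1) for t in range(T_OBS)],
    9: [(0.0, 0.5 * t, 0.5, 0.5, np.pi / 2, 3) for t in range(T_OBS)],
}
X = build_input_tensor(tracks)
print(X.shape)  # (6, 4, 6)
```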
[Architecture figure: history features (coordinates x, y, length l, width w, heading θ, category τ) are concatenated, embedded by an FC layer and combined with positional encoding; N× spatio-temporal Transformer encoder layers (spatial self-attention, temporal convolution, Add & Norm) feed a temporal Transformer encoder and a temporal Transformer decoder (masked temporal self-attention, separable convolution), and a trajectory generator outputs the predicted coordinates.]
Self-attention computes scaled dot-product attention as

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$  (5)

where $d_k$ is the dimension of each query. The division by $\sqrt{d_k}$ is used to stabilize gradients.
Adding a multi-head attention mechanism further improves the performance of self-attention.
It provides multiple representation sub-spaces for self-attention and enables the
model to jointly attend to information from different sub-spaces at different positions.
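For concreteness, the following is a minimal PyTorch sketch of scaled dot-product attention (Eq. (5)) and its multi-head extension; the module layout and tensor shapes are our assumptions rather than S2TNet's exact implementation.

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # scale to stabilize gradients
    return F.softmax(scores, dim=-1) @ V

class MultiHeadAttention(torch.nn.Module):
    def __init__(self, d_model, h):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h
        self.wq = torch.nn.Linear(d_model, d_model)
        self.wk = torch.nn.Linear(d_model, d_model)
        self.wv = torch.nn.Linear(d_model, d_model)
        self.wo = torch.nn.Linear(d_model, d_model)  # output projection

    def forward(self, x):
        B, L, _ = x.shape
        # project, then split into h heads: (B, h, L, d_k)
        q, k, v = (w(x).view(B, L, self.h, self.d_k).transpose(1, 2)
                   for w in (self.wq, self.wk, self.wv))
        out = attention(q, k, v)                # each head attends independently
        out = out.transpose(1, 2).reshape(B, L, self.h * self.d_k)
        return self.wo(out)                     # concatenate heads and project

x = torch.randn(2, 4, 64)  # (batch, nodes, d_model)
print(MultiHeadAttention(64, 8)(x).shape)  # torch.Size([2, 4, 64])
```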
Figure 2: Spatial and Temporal Self-Attention. (a) The spatial interactions of node 4 in
frame t are modeled. $n_i^t$ (i = 1, 2, 3, 4) are the embeddings of node i. $m_{4j}^t$ (j =
1, 2, 3, 4) is the message passed from node j to node 4. (b) The temporal correlations
between frames are computed in the temporal Transformer, where the nodes are
independent of each other.
layer. In order to further capture the temporal dependencies across all history frames, we
post-process the input embeddings with a second temporal Transformer encoder. The temporal
Transformer decoder refines the output embeddings based on the spatio-temporal features
provided by the encoders and on the output embeddings produced from previously predicted
coordinates. Finally, the trajectory generator outputs all the traffic-agents' future trajectories
$Y_{(t_{obs}+1, t_{fut})}$ simultaneously by decoding the output embeddings.
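This encode-refine-generate loop can be sketched as below; `encoder`, `decoder` and `generator` are hypothetical stand-ins with assumed tensor interfaces, not the paper's actual modules.

```python
import torch

def predict(encoder, decoder, generator, X, t_fut):
    """X: history features, shape (B, t_obs, N, F); returns (B, t_fut, N, 2)."""
    memory = encoder(X)                       # spatio-temporal features of history
    coords = [X[:, -1, :, :2]]                # seed with the last observed (x, y)
    for _ in range(t_fut):
        prev = torch.stack(coords, dim=1)     # previously predicted coordinates
        emb = decoder(prev, memory)           # refine output embeddings
        coords.append(generator(emb)[:, -1])  # next positions for all agents
    return torch.stack(coords[1:], dim=1)
```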
The messages sent from all nodes j to node i are normalized over the weights of the spatial
edges and summed to obtain a single attention head of node i:

$\mathrm{head}_i^t = \sum_j \mathrm{softmax}\left(\frac{m_{ij}^t}{\sqrt{d_k}}\right) v_j$  (8)
By repeating this embedding extraction process h times, the resulting attention heads are
concatenated and projected to output embeddings with a fully connected layer:
where

$\mathrm{head}_i = \mathrm{softmax}\left(\frac{Q_i K_i^T}{\sqrt{d_k}}\right) V_i$  (12)

and $Q_i$, $K_i$ and $V_i$ are the query, key and value matrices learned from the embeddings of input
node i.
Instead of the fully connected network used in the vanilla Transformer, the second sub-layer is
a separable convolution (Chollet (2016)), which achieves higher accuracy.
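For reference, a minimal PyTorch sketch of a depthwise separable 1-D convolution in the sense of Chollet (2016) follows; the kernel size and padding choices are illustrative assumptions.

```python
import torch

class SeparableConv1d(torch.nn.Module):
    """Per-channel (depthwise) convolution followed by a 1x1 (pointwise) one."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = torch.nn.Conv1d(
            in_ch, in_ch, kernel_size, padding=kernel_size // 2, groups=in_ch)
        self.pointwise = torch.nn.Conv1d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):              # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))

x = torch.randn(2, 64, 6)              # 6 history frames, 64 channels
print(SeparableConv1d(64, 64)(x).shape)  # torch.Size([2, 64, 6])
```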
Decoder To inject the relative position information of previous output trajectories into the
decoder, we add positional encodings to the output embeddings:
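The encoding formula itself is elided in this extraction; assuming the sinusoidal positional encoding of the vanilla Transformer (Vaswani et al. (2017)), a minimal sketch is:

```python
import torch

def positional_encoding(length, d_model):
    """Standard sinusoidal encoding; d_model is assumed even."""
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    angles = pos / torch.pow(10000.0, i / d_model)   # (length, d_model/2)
    pe = torch.zeros(length, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

emb = torch.randn(6, 128)                 # output embeddings for 6 frames
emb = emb + positional_encoding(6, 128)   # inject relative position information
```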
use Adam (Kingma and Ba (2014)) as the optimizer and impose a warmup-based learning rate
variation strategy, with the warmup step set to 5000. Random rotation is applied for data
augmentation during training.
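The schedule formula is elided in this extraction; assuming the warmup schedule of Vaswani et al. (2017), which matches the warmup_steps = 5000 setting, a sketch is (the model dimension of 128 is also an assumption):

```python
def learning_rate(step, d_model=128, warmup_steps=5000):
    """Rate rises linearly over warmup, then decays as step^-0.5."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

print(learning_rate(1000) < learning_rate(5000) > learning_rate(20000))  # True
```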
4. Experiments
4.1. Dataset and Evaluation Metrics
Our model is evaluated on ApolloScape Trajectory dataset (Ma et al. (2019)) which is
collected by Apollo autonomous vehicles. The ApolloScape Trajectory dataset contains
images, point clouds, and manually annotated trajectories. It is gathered under various
lighting conditions and traffic densities in Beijing, China. More specifically, it comprises
vastly complex traffic flows mixed with vehicles, riders, and pedestrians. The dataset in-
cludes 53 minutes of training sequences and 50 minutes of testing sequences captured at 2 frames
per second. We need to predict six future frames based on six history frames. Because the
test set of the ApolloScape Trajectory dataset is not public, we obtain the results of our model
and the baselines by uploading predictions to the ApolloScape Trajectory Leaderboard¹.
Two metrics are used to evaluate model performance: the Average Displacement Error
(ADE) (Pellegrini et al. (2009)) and the Final Displacement Error (FDE). ADE is the
mean Euclidean distance between predicted positions and ground-truth positions over the
whole prediction horizon, and FDE is the Euclidean distance between the predicted and
ground-truth positions at the final frame. ADE thus reflects the average prediction
performance, while FDE reflects the prediction accuracy at the end point. Because the
trajectories of heterogeneous traffic-agents differ in scale, we use the following weighted
sum of ADE (WSADE) and weighted sum of FDE (WSFDE) as metrics:
$WSADE = D_v \cdot ADE_v + D_p \cdot ADE_p + D_b \cdot ADE_b$  (17)

$WSFDE = D_v \cdot FDE_v + D_p \cdot FDE_p + D_b \cdot FDE_b$  (18)
where $D_v = 0.20$, $D_p = 0.58$, and $D_b = 0.22$ are related to the reciprocals of the average
velocities of vehicles, pedestrians and cyclists in the dataset.
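To make the metric computation concrete, here is a minimal Python sketch of ADE, FDE and the weighted sums of Eqs. (17)-(18); the array shapes and example values are our own illustration.

```python
import numpy as np

def ade(pred, gt):
    """Mean Euclidean distance over all predicted time steps; (T, 2) arrays."""
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

def fde(pred, gt):
    """Euclidean distance at the final predicted time step."""
    return np.linalg.norm(pred[-1] - gt[-1])

D_V, D_P, D_B = 0.20, 0.58, 0.22  # weights for vehicles, pedestrians, cyclists

def weighted_sum(err_v, err_p, err_b):
    """WSADE or WSFDE from per-class errors, as in Eqs. (17)-(18)."""
    return D_V * err_v + D_P * err_p + D_B * err_b

pred = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.5]])
gt   = np.array([[1.0, 0.0], [2.0, 0.5], [3.0, 0.0]])
print(ade(pred, gt), fde(pred, gt))  # 0.333..., 0.5
```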
4.2. Baselines
To evaluate the performance of S2TNet, we compare S2TNet with a wide range of baselines,
including:
• Constant Velocity (CV): We use the average velocity of history trajectories as the
constant velocity during the future to predict trajectories.
• TrafficPredict: An LSTM-based method using a hierarchical architecture (Ma et al.
(2019)).
• StarNet: (Zhu et al. (2019)) builds a star topology to consider the collective influence
among all pedestrians.
• Social LSTM (S-LSTM): (Alahi et al. (2016)) uses an LSTM to extract individual pedestrian
features and devises a social pooling mechanism to capture neighbor information.
• Social GAN (S-GAN): (Gupta et al. (2018)) predicts socially plausible futures by a
conditional GAN.
• Transformer: (Giuliari et al. (2021)) uses the vanilla temporal Transformer to model
each pedestrian separately, without any human-human or scene interaction terms.
• STAR: (Yu et al. (2020)) interleaves spatial and temporal Transformers to capture the
social interaction between pedestrians.
• TPNet: (Fang et al. (2020)) first generates a candidate set of future trajectories, then
gets the final predictions by classifying and refining the candidates.
• GRIP++: (Li et al. (2019)) is the SOTA trajectory predictor, which uses an enhanced
graph to represent the interactions of close objects and applies ST-GCNs to extract
spatio-temporal features.
1. http://apolloscape.auto/leader_board.html
• S2TNet has the ability to forecast long-horizon trajectories for different categories of
traffic-agents. After observing 6 frames (3 s) of history trajectories, S2TNet
accurately predicts trajectories over a 3-second horizon. Moreover, S2TNet does
well in cases of sharp vehicle turns, e.g. Fig. 3(a) and (b). As the prediction length
increases, the prediction results of S2TNet remain realistic and
its cumulative error is lower than that of GRIP++, e.g. Fig. 3(c) and (d).
• S2TNet is able to model spatio-temporal interactions accurately. In the top-right por-
tion of Fig. 3(e) and (f), a vehicle travels in the opposite direction to an unknown traffic-
agent. While the predicted trajectories of GRIP++ deviate from the ground truth,
S2TNet precisely captures the interactive routes.
• S2TNet successfully identifies stationary traffic-agents. In the lower-left of Fig. 3(e)
and (f), two vehicles decelerate to a near standstill. Compared with GRIP++, S2TNet
correctly predicts the corresponding stationary trajectories.
Figure 3: Visualized prediction results in heterogeneous and dense traffic. S2TNet success-
fully captures spatio-temporal information and outperforms the SOTA model,
GRIP++. (a, b, c, d) Comparison of the future trajectories of different types of
traffic-agents between the two methods. (e, f) The prediction results of GRIP++
and S2TNet in a complete traffic scene.
with (8). This indicates that the temporal self-attention mechanism effectively
improves the ability to extract temporal information.
• More features, higher accuracy. Instead of feeding all features into S2TNet, we input
only history trajectories in (6). We find that richer information helps the network
understand the heterogeneity of traffic-agents.
• Spatial self-attention over the whole scene is better than attention within a given
spatial limit. We use a masked attention mechanism in (7) to ignore influence beyond
a given spatial limit (15 m), as (Li et al. (2019)) does; a sketch of such a mask follows this
list. We find that traffic-agents across the whole scene have a great influence on the
accuracy of trajectory prediction.
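Below is a minimal sketch of such a distance-based attention mask; the tensor shapes and the hard 15 m cut-off implementation are our assumptions.

```python
import torch

def distance_mask(positions, limit=15.0):
    """positions: (N, 2) agent coordinates in one frame.
    Returns an (N, N) boolean mask, True where attention is allowed."""
    dist = torch.cdist(positions, positions)  # pairwise Euclidean distances
    return dist <= limit

def masked_attention(Q, K, V, mask):
    scores = Q @ K.transpose(-2, -1) / Q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float('-inf'))  # suppress distant agents
    return torch.softmax(scores, dim=-1) @ V
```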
         Components                     Performance
     SS   TCN   TE   TD   HF   LM     (WSADE/WSFDE)
(1)   ×    ×    SC   SC   A    W      1.2300/2.2949
(2)   ×    ✓    SC   SC   A    W      1.2189/2.2570
(3)   ✓    ×    SC   SC   A    W      1.2500/2.3561
(4)   ✓    ✓    ×    SC   A    W      1.2674/2.4086
(5)   ✓    ✓    FC   FC   A    W      1.1945/2.2613
(6)   ✓    ✓    SC   SC   C    W      1.2170/2.3036
(7)   ✓    ✓    SC   SC   A    N      1.2686/2.3548
(8)   ✓    ✓    SC   SC   A    W      1.1679/2.1798
5. Conclusion
In this paper, we propose S2TNet, a Transformer-based framework to predict the trajec-
tories of heterogeneous traffic-agents around autonomous vehicles. The spatio-temporal
Transformer is designed to capture spatio-temporal interactions between all traffic-agents,
not limited to spatial neighbors. The temporal Transformer is utilized to enhance the
modeling of temporal dependencies for each traffic-agent.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (No. 61790563).
References
Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei,
and Silvio Savarese. Social lstm: Human trajectory prediction in crowded spaces. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages
961–971, 2016.
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
Mattias Brännström, Erik Coelingh, and Jonas Sjöberg. Model-based threat assessment
for avoiding arbitrary vehicle collisions. IEEE Transactions on Intelligent Transportation
Systems, 11(3):658–669, 2010.
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov,
and Sergey Zagoruyko. End-to-end object detection with transformers. In European
Conference on Computer Vision, pages 213–229. Springer, 2020.
Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, and Dinesh Manocha. Traphic: Tra-
jectory prediction in dense and heterogeneous traffic using weighted interactions. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pages 8483–8492, 2019.
François Chollet. Xception: Deep learning with depthwise separable convolutions. CoRR,
abs/1610.02357, 2016. URL http://arxiv.org/abs/1610.02357.
Nachiket Deo and Mohan M Trivedi. Convolutional social pooling for vehicle trajectory
prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops, pages 1468–1476, 2018.
Liangji Fang, Qinhong Jiang, Jianping Shi, and Bolei Zhou. Tpnet: Trajectory proposal
network for motion prediction. In Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, June 2020.
Francesco Giuliari, Irtiza Hasan, Marco Cristani, and Fabio Galasso. Transformer networks
for trajectory forecasting. In 2020 25th International Conference on Pattern Recognition,
pages 10335–10342. IEEE, 2021.
Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. So-
cial GAN: socially acceptable trajectories with generative adversarial networks. CoRR,
abs/1803.10892, 2018. URL http://arxiv.org/abs/1803.10892.
Yingfan Huang, Huikun Bi, Zhaoxin Li, Tianlu Mao, and Zhaoqi Wang. Stgat: Model-
ing spatial-temporal interactions for human trajectory prediction. In Proceedings of the
IEEE/CVF International Conference on Computer Vision, pages 6272–6281, 2019.
Boris Ivanovic and Marco Pavone. The trajectron: Probabilistic multi-agent trajectory
modeling with dynamic spatiotemporal graphs. In Proceedings of the IEEE/CVF Inter-
national Conference on Computer Vision, pages 2375–2384, 2019.
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980, 2014.
Vineet Kosaraju, Amir Sadeghian, Roberto Martín-Martín, Ian Reid, S Hamid Rezatofighi,
and Silvio Savarese. Social-bigat: Multimodal trajectory forecasting using bicycle-gan
and graph attention networks. arXiv preprint arXiv:1907.03395, 2019.
Stéphanie Lefèvre, Dizan Vasquez, and Christian Laugier. A survey on motion prediction
and risk assessment for intelligent vehicles. ROBOMECH journal, 1(1):1–14, 2014.
Xin Li, Xiaowen Ying, and Mooi Choo Chuah. GRIP: graph-based interaction-aware tra-
jectory prediction. CoRR, abs/1907.07792, 2019. URL http://arxiv.org/abs/1907.07792.
Yicheng Liu, Jinghuai Zhang, Liangji Fang, Qinhong Jiang, and Bolei Zhou. Multimodal
motion prediction with stacked transformers. arXiv preprint arXiv:2103.11624, 2021.
Yuexin Ma, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wenping Wang, and Dinesh Manocha.
Trafficpredict: Trajectory prediction for heterogeneous traffic-agents. In Proceedings of
the AAAI Conference on Artificial Intelligence, volume 33, pages 6120–6127, 2019.
Dimitris Milakis, Bart Van Arem, and Bert Van Wee. Policy and society related implications
of automated driving: A review of literature and directions for future research. Journal
of Intelligent Transportation Systems, 21(4):324–348, 2017.
Abduallah Mohamed, Kun Qian, Mohamed Elhoseiny, and Christian Claudel. Social-stgcnn:
A social spatio-temporal graph convolutional neural network for human trajectory pre-
diction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 14424–14432, 2020.
S. Pellegrini, A. Ess, K. Schindler, and L. van Gool. You’ll never walk alone: Modeling
social behavior for multi-target tracking. In 2009 IEEE 12th International Conference
on Computer Vision, pages 261–268, 2009. doi: 10.1109/ICCV.2009.5459260.
Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, Hamid Rezatofighi, and
Silvio Savarese. Sophie: An attentive gan for predicting paths compliant to social and
physical constraints. In Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition, pages 1349–1358, 2019.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N
Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. arXiv preprint
arXiv:1706.03762, 2017.
Cunjun Yu, Xiao Ma, Jiawei Ren, Haiyu Zhao, and Shuai Yi. Spatio-temporal graph
transformer networks for pedestrian trajectory prediction. CoRR, abs/2005.08514, 2020.
URL https://arxiv.org/abs/2005.08514.
Pu Zhang, Wanli Ouyang, Pengfei Zhang, Jianru Xue, and Nanning Zheng. Sr-lstm:
State refinement for lstm towards pedestrian trajectory prediction. In Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12085–
12094, 2019.
Yanliang Zhu, Deheng Qian, Dongchun Ren, and Huaxia Xia. Starnet: Pedestrian trajectory
prediction using deep neural network in star topology. CoRR, abs/1906.01797, 2019. URL
http://arxiv.org/abs/1906.01797.