2024 IEEE International Conference on Data Mining (ICDM)

Traffic Pattern Sharing for Federated Traffic Flow Prediction with Personalization

Hang Zhou†‡§, Wentao Yu†‡§, Sheng Wan†‡§∗, Yongxin Tong¶, Tianlong Gu∥ and Chen Gong†‡§∗
† School of Computer Science and Engineering, Nanjing University of Science and Technology, China
‡ Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, China
§ Jiangsu Key Laboratory of Image and Video Understanding for Social Security, China
¶ SKLSDE, Beihang University, China
∥ Engineering Research Center of Trustworthy AI (Ministry of Education), Jinan University, China
[email protected] [email protected]
∗ Corresponding authors.

Abstract—Accurate Traffic Flow Prediction (TFP) is crucial for enhancing the efficiency and safety of transportation systems, so it has attracted intensive research exploiting the spatial-temporal dependencies within road networks. However, existing works only consider the case of centralized data collection with all traffic data observed, which may raise privacy concerns, as each region of a city may have its own traffic administration department and the traffic data are not allowed to be distributed. Therefore, this paper proposes to use Federated Learning (FL) to address this issue by allowing all clients (i.e., traffic administration departments of all regions in our problem) to collaboratively train TFP models without exchanging raw data, thereby offering a solution for maintaining data privacy. Nevertheless, most existing FL methods aim to learn a global model that performs well universally, so they cannot well handle the non-Independent and Identically Distributed (non-IID) traffic data that naturally arise over different regions. To cope with this problem, this paper develops a new FL framework termed "personalized Federated learning with Traffic Pattern Sharing" (FedTPS) to solve the federated TFP problem. Our FedTPS critically exploits the underlying common traffic patterns (e.g., morning and evening rush hours) shared across different city regions while maintaining the region-specific data characteristics in a personalized FL manner. Specifically, to extract the common traffic patterns, we decompose the traffic data in each client via discrete wavelet transform, where the low-frequency components uncover the stable traffic dynamics of different regions and thus can be considered as the common traffic patterns. These common patterns are then shared among different clients through traffic pattern repositories on the server side to aid the global collaborative traffic flow modeling. Moreover, the model components capturing spatial-temporal dependencies in traffic data are retained for local training, thereby enabling personalized learning based on regional characteristics. Intensive experiments on four real-world traffic datasets firmly demonstrate the superiority of our proposed FedTPS over other typical FL methods in terms of various estimation errors.

Index Terms—spatial-temporal data, traffic flow prediction, personalized federated learning

Fig. 1. The observation on the PEMS04 dataset. (a) Sensor locations. (b) Traffic flow recorded by different sensors: flows recorded by sensors from different regions may share common traffic patterns (indicated by the red dashed line in (b)).

I. INTRODUCTION

Traffic Flow Prediction (TFP) targets forecasting the future traffic conditions at specific locations or roadway segments using historical traffic data and relevant features [1]. Accurate and real-time traffic prediction offers substantial benefits to urban management and travel planning [2].

Early-stage deep learning-based TFP methods [3]–[5] often employ a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to model the diverse dependencies among different traffic routes. Although these deep learning-based methods generally outperform traditional statistical approaches [6]–[8], their performance can still be limited due to the inherent non-Euclidean nature of the traffic network. To better capture the spatial dependencies within traffic networks, recent studies have applied Graph Neural Networks (GNNs) to TFP tasks [9], [10]. However, these methods are usually based on pre-defined graphs, and thus may fail to characterize the dynamic nature of traffic networks. In response, recent research has shifted towards adaptively learning graph structures [11]–[13], in order to accurately reveal real-world traffic dynamics.

Despite the good performance achieved by the aforementioned methods, they are typically performed in a centralized manner, where all training data are collected onto a central server to train a global model. However, traffic data are often collected by different traffic administration departments according to the zoning of the city, province, or state. Since traffic data may include sensitive information, such as travel trajectories of individuals and vehicle identification numbers [14], centralizing these data will probably raise privacy issues.

To address these challenges, Federated Learning (FL) has emerged as a solution. In FL, model training is conducted locally on clients, and only model parameters rather than raw data are uploaded to the server, which helps ensure data privacy [15], [16]. Up to now, various attempts have been made at accurate TFP using the FL framework [17]–[20].

However, one major challenge in FL is the non-Independent and Identically Distributed (non-IID) issue, where heterogeneity of data among different clients may lead to unstable training and performance deterioration [21]. This issue is particularly pronounced in traffic data recorded by sensors across various locations and time stamps. Recently, Personalized Federated Learning (PFL) methods that develop customized models for each client have proved effective in addressing the heterogeneity of traffic data [22]–[24]. Although these methods enhance model personalization to some extent, they ignore the underlying common knowledge across different regions, which is actually critical in collaborative model training. To be specific, although the traffic data from different regions may be heterogeneous, they share certain common traffic patterns with similar temporal characteristics [25]. These common traffic patterns may arise from similar functional characteristics of different regions (e.g., commercial and residential areas), or consistent travel behaviors during certain periods (e.g., morning and evening rush hours). Although common traffic patterns may fluctuate due to varying traffic conditions across regions, they generally exhibit stable traffic dynamics. For instance, as shown in Fig. 1, sensors A, B, and C are located in distinct regions, but the traffic flows they record exhibit similar stable traffic dynamics during certain periods. This inspires the sharing of common traffic patterns in the FL framework for performance enhancement.

Therefore, in this work, we propose personalized Federated learning with Traffic Pattern Sharing (FedTPS), a federated framework for TFP that effectively explores and utilizes common traffic patterns. Our objective is to improve local performance by sharing the common traffic patterns across different regions while maintaining the region-specific data characteristics in a personalized manner. To be specific, we employ Discrete Wavelet Transform (DWT) to decompose the traffic data in each client, where the low-frequency components reflecting stable traffic dynamics can be considered as the common traffic patterns. Afterwards, we design a traffic pattern repository for each client to further extract and store representations of common traffic patterns. In the aggregation phase of FL, we aggregate the traffic pattern repositories from different clients on the server side to effectively share the common patterns, which facilitates collaborative model training. Meanwhile, to preserve region-specific characteristics, the model components capturing spatial-temporal dependencies of traffic data are retained in each client for local training. We have conducted intensive experiments on four popular TFP datasets in the FL scenario, which demonstrate the superiority of FedTPS over multiple baseline methods.

II. RELATED WORK

In this section, the related works on TFP and FL for spatial-temporal forecasting are reviewed.

A. Traffic Flow Prediction

TFP aims to forecast traffic volumes at specific times and locations. In the early stages, statistical methods, such as Historical Average (HA) [6], Kalman Filter (KF) [7], and Auto-Regressive Integrated Moving Average (ARIMA) [8], were commonly employed for TFP. However, these methods often assume linearity in traffic data, which is inadequate for handling the complex dynamics of traffic flow. With the advance of deep learning technologies, many deep learning-based time series models have been applied to TFP, such as RNN [3], Temporal Convolutional Network (TCN) [26], and Transformer [27]. These models have shown great power in capturing nonlinear correlations in traffic data and handling dynamic temporal dependencies, thereby improving prediction accuracy.

With the emergence of GNNs, various methods have been developed to integrate GNNs with time series models, in order to capture the spatial-temporal dependencies of traffic data. For example, DCRNN [9] models the dynamics of traffic flow as diffusion processes and introduces diffusion convolutional operations to capture spatial dependencies. Besides, STGCN [10] combines Graph Convolutional Network (GCN) with TCN for TFP. To further explore the dynamism of traffic networks, Graph WaveNet [11] and AGCRN [12] adaptively learn adjacency matrices to capture the spatial dependencies. Building upon this, StemGNN [28] extracts temporal correlations of sequences through Gated Recurrent Unit (GRU) to learn adjacency matrices, while MegaCRN [13] computes the weights of adjacency matrices through a learned meta-node bank. Additionally, some methods employ attention mechanisms to capture time-varying spatial dependencies among traffic roads. For instance, GMAN [29] utilizes Graph Attention Network (GAT) and temporal attention to model the relationships between historical and future time stamps. Meanwhile, ASTGNN [30] develops a dynamic graph convolution module, which employs self-attention to capture the spatial correlations in a dynamic manner. STWave [31] applies a sampling-strategy-based GAT to decouple complex traffic data and achieves accurate forecasts with reduced computational costs.

However, most existing research efforts still rely on centralized training data. Traffic data from different regions often belong to different traffic administration departments. Due to privacy issues, sharing of traffic data among different regions may be prohibited, which makes the application of traditional techniques impractical for real-world TFP.

B. Federated Learning

FL is a machine learning paradigm that enables collaboratively training models across decentralized devices or clients with local data. This technique avoids the need to exchange data, thereby preserving privacy and security.

The traditional FL method FedAvg [15] aggregates model weights sent from local clients on the server and downloads the aggregated model back to the clients. However, the heterogeneity of data among different clients poses a critical challenge. To deal with this problem, FedProx [32] proposes a regularization term aimed at minimizing the discrepancy between local models and the global model, thereby preventing local models from deviating too far from their local training data. FedAtt [33] achieves flexible aggregation through adaptive weights. Besides, FedFed [34] shares performance-sensitive features to mitigate data heterogeneity. Recently, PFL [35]–[39] has become popular in dealing with highly heterogeneous data; it proposes to train a personalized local model suitable for each client in a collaborative manner. Therefore, PFL is usually more effective than traditional FL methods that only learn a single global model [32]–[34]. For example, FedPer [36] shares a common base layer while providing individualized local layers for each client to preserve local knowledge. Additionally, through model-agnostic meta-learning, PerFedAvg [37] learns a meta-model to generate initial local models for each client, in order to improve the local performance.

Recently, various FL methods have been developed for spatial-temporal forecasting. For example, FedGRU [17] introduces an ensemble clustering-based FL scheme to capture the spatial-temporal correlation of traffic data. Furthermore, CNFGNN [20] aggregates parameters based on spatial dependencies captured by GNNs on the server. Considering the heterogeneity of spatial-temporal data, some studies aim to enhance the performance of models by learning personalized models for each client. For instance, FedDA [22] proposes a dual attention scheme, which aggregates both intra- and inter-cluster models, rather than simply averaging the weights of local models. In FML-ST [23], meta-learning is integrated into the FL scenario to solve the heterogeneity problem in spatial-temporal prediction. Analogously, FUELS [24] incorporates auxiliary contrastive tasks to inject detailed spatial-temporal heterogeneity into the latent representation space. However, the aforementioned methods overlook the common knowledge (e.g., common traffic patterns) within spatial-temporal data, and thus their performance could be limited.

III. PROBLEM DESCRIPTION

In this section, we formally define the setting of our investigated federated TFP problem. The traffic road network of a city can be represented as an undirected graph G = (V, E). Here, V denotes the set of nodes, where each node corresponds to a sensor recording traffic data, and E denotes the set of edges corresponding to the roads connecting the sensors. Besides, A ∈ ℝ^{|V|×|V|} represents the weighted adjacency matrix depicting the proximity (e.g., geographical distance, causal dependencies, or traffic series similarity) between nodes. The notation |·| denotes the cardinality of a set.

In reality, traffic sensors in different regions of a city may belong to different traffic administration departments. Consequently, suppose there are M traffic administration departments governing M different regions. Then the m-th (m = 1, 2, ..., M) region possesses a subset of sensors V_m, therefore forming a subgraph of the global traffic network G_m ⊆ G along with its corresponding private dataset D_m = {x_1^{(i)}, ..., x_t^{(i)}, ..., x_T^{(i)}}_{i=1}^{|V_m|}, where x_t^{(i)} ∈ ℝ^d represents the observed d-dimensional features recorded by sensor i at time stamp t, and T represents the total number of time stamps. Therefore, our target is to precisely predict the traffic flow at the locations of the sensors.

However, most existing works rely on centralized data collection, which is impractical since the traffic data possessed by different traffic administration departments are not allowed to be distributed due to privacy issues. To overcome this challenge, we propose to use FL to collaboratively train TFP models without the need to exchange private traffic data. In federated TFP, each traffic administration department can be considered as a client that trains a TFP model to capture the spatial-temporal dependencies of traffic roads from the historical traffic data recorded by local sensors and make accurate predictions of future traffic flow. To be specific, the task for the m-th client is to train a model f_{W_m} parameterized by W_m such that it can predict the traffic flow for the future T_2 time stamps based on the historical T_1 time stamps, namely:

$$X_{t-T_1+1}, \ldots, X_t \xrightarrow{f_{W_m}} X_{t+1}, \ldots, X_{t+T_2}, \tag{1}$$

where X_t = [x_t^{(1)}; x_t^{(2)}; ...; x_t^{(|V_m|)}] ∈ ℝ^{|V_m|×d} represents the observation of the local traffic network G_m at time stamp t.

The objective of federated TFP is to collaboratively train TFP models across clients without compromising data privacy. The classic FL method, i.e., FedAvg [15], aggregates model parameters at the server after local training according to the following formula:

$$W \leftarrow \sum_{m=1}^{M} \frac{|V_m|}{|V|} W_m. \tag{2}$$

The aggregated model is then redistributed to clients for the next round of training. Due to the non-IID traffic data across different regions, this way of learning a global TFP model may exhibit suboptimal performance. To address this issue, PFL is implemented by training a customized model for each client to enhance its performance on local traffic data. The objective of PFL can then be formulated as

$$\min_{W_1, \ldots, W_M} \frac{1}{M} \sum_{m=1}^{M} \frac{|V_m|}{|V|} L_m(W_m, D_m), \tag{3}$$

where L_m is the loss function of the m-th client.
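To make the aggregation rule of Eq. (2) concrete, the following is a minimal sketch of node-count-weighted parameter averaging, assuming each client uploads its model as a PyTorch state dict; the function name and data layout are illustrative and not part of the authors' released code.

```python
import torch

def fedavg_aggregate(client_states, client_num_nodes):
    """Node-count-weighted FedAvg aggregation, cf. Eq. (2)."""
    total_nodes = sum(client_num_nodes)  # |V|
    global_state = {}
    for key in client_states[0]:
        # W <- sum_m (|V_m| / |V|) * W_m
        global_state[key] = sum(
            (n / total_nodes) * state[key].float()
            for state, n in zip(client_states, client_num_nodes)
        )
    return global_state
```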

Fig. 2. The framework of FedTPS, consisting of (a) the local training phase and (b) the global aggregation phase. During the local training phase (i.e., (a)), stable traffic dynamics are extracted through the decomposition of traffic flow and are further utilized to construct the traffic pattern repository consisting of common traffic patterns on each client. During the global aggregation phase (i.e., (b)), each client uploads the traffic pattern repository to the server and shares the repository with other clients via similarity-aware aggregation.

IV. METHODOLOGY

This section details the proposed FedTPS (see Fig. 2). During the local training phase (see Fig. 2(a)), stable traffic dynamics are firstly obtained through the decomposition of traffic flow. To construct the traffic pattern repository, the stable traffic dynamics representation obtained through the pattern encoder (see Fig. 2(e)) is firstly projected to a query space through a linear layer (see Fig. 2(f)) to compute the matching scores with patterns in the repository. Then the matched common pattern is computed as a weighted sum of the patterns in the repository via the matching scores. Afterwards, the representations of the original traffic data obtained through the original encoder (see Fig. 2(c)) and the matched common traffic pattern are fed into the decoder (see Fig. 2(d)) for TFP. During the global aggregation phase (see Fig. 2(b)), the traffic pattern repository is uploaded to the server for similarity-aware aggregation, while the remaining components are used to learn region-specific characteristics locally. Next, we detail the critical steps of FedTPS by presenting the graph convolutional recurrent unit (GCRU) (see Section IV-A), explaining the process of common traffic pattern extraction (see Section IV-B), and describing the sharing strategy of traffic patterns (see Section IV-C).

A. Adaptive Graph Convolutional Recurrent Unit

The inherent graph structure of traffic networks is well-suited for methods integrating GCN and GRU to concurrently explore spatial and temporal dependencies in traffic data [9], [28]. Based on this foundation, some methods [11]–[13] have introduced adaptive adjacency matrices to model the dynamic spatial correlations within traffic networks. Following these prior works, our local TFP model employs an encoder-decoder architecture (i.e., (c), (d), and (e) in Fig. 2) composed of GCRUs with an adaptive adjacency matrix, which can be represented as

$$u_t = \sigma(\mathrm{Gconv}_u(X_t, H_{t-1}, \tilde{A})), \tag{4}$$
$$r_t = \sigma(\mathrm{Gconv}_r(X_t, H_{t-1}, \tilde{A})), \tag{5}$$
$$C_t = \tanh(\mathrm{Gconv}_C(X_t, (r_t \odot H_{t-1}), \tilde{A})), \tag{6}$$
$$H_t = u_t \odot H_{t-1} + (1 - u_t) \odot C_t, \tag{7}$$

where Ã = softmax(ReLU(EE^⊤)) ∈ ℝ^{|V_m|×|V_m|} denotes the adaptive adjacency matrix, obtained based on the learnable parameter E ∈ ℝ^{|V_m|×e}. The notation Gconv(X_t, H_{t-1}, Ã) denotes the graph convolution operation over the current input X_t and the previous hidden states H_{t-1} ∈ ℝ^{|V_m|×h}, where h denotes the dimensionality of the hidden state of each node. The update gate, reset gate, and candidate state of the GRU at time t are indicated by u_t, r_t, and C_t, respectively. The notation σ(·) represents an activation function, such as the sigmoid function used in this paper, and ⊙ represents the element-wise product. Note that all the mathematical notations related to the m-th (m = 1, 2, ..., M) client above and hereinafter should be accompanied by the subscript m. However, to simplify the notation, we omit the subscript m if no notational confusion is incurred.
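For concreteness, below is a minimal PyTorch sketch of one GCRU cell implementing Eqs. (4)-(7). The paper fixes only the interface Gconv(X_t, H_{t-1}, Ã); the single-hop form used here (propagate with Ã, then apply a linear map) is an illustrative assumption, not the authors' exact operator.

```python
import torch
import torch.nn as nn

class AdaptiveGCRUCell(nn.Module):
    """Sketch of one adaptive GCRU cell, cf. Eqs. (4)-(7)."""

    def __init__(self, in_dim, hidden_dim, num_nodes, embed_dim=8):
        super().__init__()
        self.E = nn.Parameter(torch.randn(num_nodes, embed_dim))  # learnable node embeddings
        self.W_u = nn.Linear(in_dim + hidden_dim, hidden_dim)  # update gate
        self.W_r = nn.Linear(in_dim + hidden_dim, hidden_dim)  # reset gate
        self.W_c = nn.Linear(in_dim + hidden_dim, hidden_dim)  # candidate state

    def gconv(self, x, linear):
        # A_tilde = softmax(ReLU(E E^T)): row-normalized adaptive adjacency
        a = torch.softmax(torch.relu(self.E @ self.E.T), dim=-1)
        return linear(a @ x)  # propagate over the graph, then transform

    def forward(self, x_t, h_prev):
        # x_t: (|V_m|, d), h_prev: (|V_m|, h)
        xh = torch.cat([x_t, h_prev], dim=-1)
        u_t = torch.sigmoid(self.gconv(xh, self.W_u))        # Eq. (4)
        r_t = torch.sigmoid(self.gconv(xh, self.W_r))        # Eq. (5)
        xrh = torch.cat([x_t, r_t * h_prev], dim=-1)
        c_t = torch.tanh(self.gconv(xrh, self.W_c))          # Eq. (6)
        return u_t * h_prev + (1.0 - u_t) * c_t              # Eq. (7)
```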

B. Extraction of Common Traffic Patterns

To extract common traffic patterns, DWT is firstly employed to decompose the traffic flow on each client to obtain stable traffic dynamics. Then, we utilize them to construct a traffic pattern repository during the local training phase. These steps are detailed as follows.

1) Decomposition of Traffic Flow: In reality, traffic data from different regions may share common traffic patterns that manifest as stable dynamics due to similar functionalities or consistent travel behaviors [25]. However, in most existing FL frameworks for TFP [22]–[24], the underlying global knowledge represented by common traffic patterns among different regions is largely ignored. Therefore, we aim to extract common traffic patterns manifesting stable traffic dynamics from the diverse traffic data. This can not only facilitate the sharing of common patterns but also help mitigate the side effects of discrepancies arising from region-specific characteristics. To achieve this goal, we employ DWT [31] to decompose the traffic data into waveforms of different frequencies, with the expectation that the low-frequency component corresponding to stable traffic dynamics can be considered as the common traffic pattern. To be specific, given traffic data Z = [X_{t-T_1+1}; X_{t-T_1+2}; ...; X_t] ∈ ℝ^{T_1×|V_m|×d}, the J-level DWT is performed to obtain the low-frequency component Z̄_j^l and high-frequency component Z̄_j^h at the j-th level, namely:

$$\bar{Z}_j^l = (\downarrow 2)(f_g \star \bar{Z}_{j-1}^l), \tag{8}$$
$$\bar{Z}_j^h = (\downarrow 2)(f_h \star \bar{Z}_{j-1}^l), \tag{9}$$

where f_g and f_h represent the low-pass and high-pass filters of a 1D orthogonal wavelet, respectively. The symbol ⋆ denotes the convolution operation, and (↓2) represents naive down-sampling halving the length of each component. The process of J-level DWT is illustrated in Fig. 3. We only employ one-level DWT in our model to reduce the computational overhead. The low-frequency component, which represents stable traffic dynamics, is then transformed back to the time domain through Inverse DWT (IDWT), which reads:

$$Z^l = f_g^{-1} \star (\uparrow 2)\bar{Z}_1^l, \tag{10}$$

where f_g^{-1} is the inverse low-pass filter and (↑2) denotes the naive up-sampling operation doubling the length of each component.

Fig. 3. Illustration of J-level DWT.

After decomposition, the original traffic time series Z and its low-frequency component Z^l are separately fed into different encoders to obtain the learned representations H_t^o and H_t^l, respectively.
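As a hedged sketch of Eqs. (8)-(10), the low-frequency reconstruction can be obtained with a one-level DWT/IDWT along the time axis. The snippet below assumes the PyWavelets library as the DWT backend (the paper does not name its implementation), and the Haar basis is one choice among those studied in the ablation (cf. Fig. 4).

```python
import numpy as np
import pywt  # PyWavelets, assumed here as the DWT backend

def stable_traffic_dynamics(Z, wavelet="haar"):
    """One-level DWT/IDWT along time, following Eqs. (8)-(10).

    Z: array of shape (T1, |V_m|, d). Returns Z^l of the same shape:
    the low-frequency reconstruction serving as the stable dynamics.
    """
    # Eqs. (8)/(9): filter + downsample; keep only the low-frequency
    # (approximation) coefficients cA and discard the detail part.
    cA, _cD = pywt.dwt(Z, wavelet, axis=0)
    # Eq. (10): upsample + inverse low-pass filter; passing None for
    # the detail coefficients reconstructs from cA alone.
    Z_low = pywt.idwt(cA, None, wavelet, axis=0)
    return Z_low[: Z.shape[0]]  # trim possible boundary padding to T1

# Example: 12 historical steps, 307 sensors, 1 feature (PEMS04-like)
Z = np.random.rand(12, 307, 1)
Z_low = stable_traffic_dynamics(Z)
```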
2) Construction of Traffic Pattern Repository: After obtaining the common traffic patterns represented by the low-frequency component, we intend to exploit the knowledge contained in these patterns for subsequent sharing among the different clients. However, since these patterns are generated from traffic data on each client individually, directly sharing them may pose a risk of privacy leakage. In addition, given the observed variation in traffic patterns across different traffic road networks [25], we aim to learn a set of representative traffic patterns for each client to facilitate pattern sharing. Memory networks, which have achieved notable success in computer vision [40] and anomaly detection [41], [42] due to their powerful representation abilities, have been increasingly adopted for spatial-temporal data analysis [13], [25], [43]. Inspired by memory networks, we construct a learnable traffic pattern repository W^p ∈ ℝ^{N×c}, where N and c denote the number of representative traffic patterns and the dimension of each pattern, respectively. We first adopt a linear layer to project the stable traffic dynamics representation H_t^l to a query space, which can be formulated as

$$H_t^q = H_t^l * W^q + b^q, \tag{11}$$

where H_t^q ∈ ℝ^{|V_m|×c} denotes the query matrix and "∗" denotes the dot product operation. W^q ∈ ℝ^{h×c} and b^q ∈ ℝ^c are learnable parameters. Then we compute the matching scores Q with the patterns in the repository as follows:

$$Q = \mathrm{softmax}\left(H_t^q * W^{p\top}\right). \tag{12}$$

Subsequently, we calculate the matched traffic patterns P_t ∈ ℝ^{|V_m|×c} as a weighted sum of the patterns in W^p, and obtain

$$P_t = Q * W^p. \tag{13}$$

Equations (12) and (13) are used to retrieve the most relevant common traffic patterns for a given query matrix. Finally, we concatenate P_t with the representations of the original traffic data H_t^o and feed them into the decoder to obtain the predictions Z′ = [X_{t+1}; X_{t+2}; ...; X_{t+T_2}] ∈ ℝ^{T_2×|V_m|×d}, where the ℓ_1 loss function is adopted to optimize the training process. The learnable parameters at the m-th client are denoted by W_m^{e1}, W_m^{e2}, W_m^d, W_m^q, and W_m^p, where W_m^{e1} and W_m^{e2} refer to the parameters of the original encoder and the pattern encoder, respectively. Besides, W_m^d refers to the parameters of the decoder, W_m^q refers to the parameters of the linear layer, and W_m^p refers to the learnable traffic pattern repository.
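The repository query of Eqs. (11)-(13) amounts to a soft attention lookup over the N stored patterns. Below is a minimal PyTorch sketch; the class name and constructor arguments are illustrative.

```python
import torch
import torch.nn as nn

class PatternRepository(nn.Module):
    """Sketch of the repository matching in Eqs. (11)-(13)."""

    def __init__(self, num_patterns, pattern_dim, hidden_dim):
        super().__init__()
        # W^p: N representative patterns of dimension c
        self.W_p = nn.Parameter(torch.randn(num_patterns, pattern_dim))
        self.query = nn.Linear(hidden_dim, pattern_dim)  # W^q, b^q

    def forward(self, H_l):
        # H_l: (|V_m|, h), stable-dynamics representation from the pattern encoder
        H_q = self.query(H_l)                        # Eq. (11): project to query space
        Q = torch.softmax(H_q @ self.W_p.T, dim=-1)  # Eq. (12): matching scores
        P_t = Q @ self.W_p                           # Eq. (13): weighted sum of patterns
        return P_t                                   # matched common patterns, (|V_m|, c)
```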
C. Sharing Strategy of Traffic Pattern

Based on the constructed traffic pattern repository, FedTPS aims to share the global knowledge contained in the common traffic patterns across clients in a personalized manner. The model on the m-th client can be divided into two parts: the traffic pattern repository W_m^p, which represents the common traffic patterns, and the other model parameters (i.e., W_m^{e1}, W_m^{e2}, W_m^d, and W_m^q), which learn the spatial-temporal dependencies of local traffic data. Our core idea is to share the learnable traffic pattern repository within the FL framework while keeping the remaining model parameters for local training. Moreover, to improve the alignment of traffic patterns from different clients during the aggregation process, we devise a similarity-aware aggregation rather than the conventional averaging aggregation. Specifically, by denoting W_m^p[i] as the i-th representative pattern in the repository of the m-th client, we calculate the cosine similarity of W_m^p[i] with the patterns from the repositories of the other clients. Then we select and aggregate the top-k similar patterns from each client, which can be expressed as

$$W_m^p[i] \leftarrow \frac{1}{M} \sum_{n=1}^{M} \frac{1}{k} \sum_{j \in S_k} W_n^p[j], \tag{14}$$

where S_k indicates the set of k indices of the representative patterns in W_n^p that are most similar to W_m^p[i]. Afterwards, the server redistributes the aggregated traffic pattern repository to each client for the subsequent round of local training.
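A server-side sketch of Eq. (14) follows, stacking the M uploaded repositories into one tensor; the loop-based form is for clarity rather than efficiency, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def similarity_aware_aggregate(repos, k=2):
    """Similarity-aware aggregation, cf. Eq. (14).

    repos: tensor (M, N, c) stacking the M uploaded repositories.
    For every pattern W_m^p[i], average the top-k most similar
    patterns (by cosine similarity) taken from each client.
    """
    M, N, _c = repos.shape
    unit = F.normalize(repos, dim=-1)  # unit vectors for cosine similarity
    new_repos = torch.zeros_like(repos)
    for m in range(M):
        # sim[i, n, j]: similarity of W_m^p[i] to W_n^p[j]
        sim = torch.einsum("ic,njc->inj", unit[m], unit)
        topk = sim.topk(k, dim=-1).indices  # (N, M, k): the index sets S_k
        for i in range(N):
            picked = torch.stack([
                repos[n, topk[i, n]].mean(dim=0)  # (1/k) sum_{j in S_k} W_n^p[j]
                for n in range(M)
            ])
            new_repos[m, i] = picked.mean(dim=0)  # (1/M) sum over clients
    return new_repos
```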

Algorithm 1 FedTPS on the client side
Input: Historical traffic flow Z from the private dataset D_m; number of local rounds R_1; federated traffic pattern repository W̄_m^p.
Output: Prediction of future traffic flow Z′.
1: Download the federated traffic pattern repository W̄_m^p from the server;
2: Update the traffic pattern repository W_m^p ← W̄_m^p;
3: for each local round r = 1, 2, ..., R_1 do
4:   Compute the low-frequency component Z^l via (8) and (10);
5:   Compute the representations H_t^o and H_t^l via (4), (5), (6), and (7);
6:   Compute the matched pattern P_t via (11), (12), and (13);
7:   Concatenate H_t^o and P_t, and predict the future traffic flow Z′ through the decoder;
8:   Calculate gradients and update the learnable parameters W_m^{e1}, W_m^{e2}, W_m^d, W_m^q, and W_m^p;
9: Upload W_m^p to the server.

Algorithm 2 FedTPS on the server side
Input: Number of clients M; number of communication rounds R_2; number of selected patterns k; the traffic pattern repository W_m^p from client m.
Output: Federated traffic pattern repository W̄_m^p for client m.
1: Initialize W^{p(1)};
2: for each communication round r = 1, 2, ..., R_2 do
3:   for client m ∈ {1, 2, ..., M} in parallel do
4:     if r = 1 then
5:       Send W^{p(1)} to client m;
6:     else
7:       W̄_m^{p(r)} ← aggregate W_{1:M}^{p(r)} via (14);
8:       Send W̄_m^{p(r)} to client m;
9:     Perform Algorithm 1 on client m;
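To show how the two algorithms interlock, the following is a schematic driver for one communication round. `Client` is a hypothetical wrapper around the local model of Section IV (not an API from the paper), and `similarity_aware_aggregate` refers to the Eq. (14) sketch shown earlier.

```python
import torch

def communication_round(clients, current_repos, k=2):
    """One round of FedTPS; current_repos maps client id -> (N, c) tensor."""
    uploaded = []
    for client in clients:
        client.load_pattern_repository(current_repos[client.id])  # Alg. 1, lines 1-2
        client.train_one_local_round()                            # Alg. 1, lines 3-8
        uploaded.append(client.pattern_repository())              # Alg. 1, line 9
    repos = torch.stack(uploaded)                                 # (M, N, c)
    aggregated = similarity_aware_aggregate(repos, k=k)           # Alg. 2, line 7
    # Alg. 2, line 8: one personalized repository is sent back per client
    return {client.id: aggregated[m] for m, client in enumerate(clients)}
```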
Through the iterative training and aggregation of pattern repositories, common traffic patterns serve as additional global knowledge to further guide the TFP process. Meanwhile, the remaining model components, which learn the spatial-temporal dependencies specific to the region of each client, do not participate in the aggregation process, thereby forming the personalized FL style and mitigating the adverse effects of region-specific variations.

In summary, through our proposed method of common pattern extraction and sharing strategy, the federated framework can effectively explore common traffic patterns to enhance TFP capabilities with the personalized model. We provide the pseudocode of our FedTPS on the client side and the server side in Algorithm 1 and Algorithm 2, respectively.
TABLE I
DATASET STATISTICS

Datasets | # Samples | # Nodes | Sample Rate | Time Span
PEMS03   | 26208     | 358     | 5 mins      | 09/2018 - 11/2018
PEMS04   | 16992     | 307     | 5 mins      | 01/2018 - 02/2018
PEMS07   | 28224     | 883     | 5 mins      | 05/2017 - 08/2017
PEMS08   | 17856     | 170     | 5 mins      | 07/2016 - 08/2016

V. EXPERIMENTS

To evaluate the performance of our model, we carried out comparative experiments on four real-world highway traffic datasets in FL scenarios. First, we introduce the experimental settings, followed by a detailed presentation of the results, which includes the performance comparison, ablation study, and parametric sensitivity analysis.

A. Experimental Setup

1) Datasets Description and Preprocessing: We evaluate our proposed framework on four widely used datasets for TFP, including PEMS03, PEMS04, PEMS07, and PEMS08. These datasets consist of traffic flow data collected by the California Transportation Agencies (CalTrans) Performance Measurement System (PeMS) [44], with the number representing the district code. The statistical details of these datasets are listed in Table I.

Following the practice of previous methods [45], we split the datasets into training, validation, and test sets in chronological order with the ratio of 6 : 2 : 2. Across all four datasets, we use the past 12 time stamps to predict the traffic flow for the upcoming 12 time stamps. Before training, we apply a standard normalization procedure to the datasets to ensure a stable training process. To simulate the FL scenario, we employ the graph partitioning algorithm METIS [46] to evenly partition the global traffic road network, with each client possessing a subgraph of the global traffic road network.
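One way to simulate this client split is sketched below, assuming the pymetis binding of METIS and its documented part_graph interface; the paper does not specify which METIS wrapper was used, so the call is an assumption.

```python
import numpy as np
import pymetis  # assumed METIS binding; part_graph per its documentation

def partition_sensors(adj_matrix, num_clients):
    """Split the global sensor graph into client subgraphs with METIS.

    adj_matrix: dense (|V|, |V|) 0/1 adjacency matrix. Returns a list
    of node-index arrays, one per simulated client.
    """
    adjacency = [np.flatnonzero(row) for row in adj_matrix]
    _cuts, membership = pymetis.part_graph(num_clients, adjacency=adjacency)
    membership = np.asarray(membership)
    return [np.flatnonzero(membership == m) for m in range(num_clients)]
```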
2) Evaluation Metrics: The evaluation metrics in this paper include the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE), which are defined as follows:

$$MAE = \frac{1}{T}\sum_{t=1}^{T}\left|X_t - \hat{X}_t\right|, \tag{15}$$

$$RMSE = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(X_t - \hat{X}_t\right)^2}, \tag{16}$$

$$MAPE = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{X_t - \hat{X}_t}{X_t}\right|, \tag{17}$$

where X_t denotes the ground truth of all nodes at time stamp t and X̂_t denotes the predicted value. We evaluate the performance of the TFP task on the client side, and then average the performances across all clients.
average the performances across all clients.

TABLE II
OVERALL PERFORMANCE ON FOUR DATASETS. THE BEST RESULTS ARE HIGHLIGHTED IN BOLDFACE.

Method         | PEMS03              | PEMS04              | PEMS07              | PEMS08
               | MAE   RMSE  MAPE/%  | MAE   RMSE  MAPE/%  | MAE   RMSE  MAPE/%  | MAE   RMSE  MAPE/%
Local          | 15.86 26.31 16.65   | 20.22 31.79 13.89   | 22.14 35.72 10.66   | 16.11 25.41 10.86
FedAvg [15]    | 16.55 26.61 22.90   | 20.23 31.87 14.42   | 24.29 37.12 11.51   | 16.29 25.36 11.16
FedProx [32]   | 16.35 26.52 21.13   | 20.73 32.31 14.66   | 25.10 38.12 12.41   | 16.51 25.44 11.87
FedAtt [33]    | 16.34 26.27 22.84   | 20.62 32.23 14.64   | 23.29 36.04 10.90   | 16.40 25.39 11.53
FedPer [36]    | 15.56 26.29 15.43   | 19.72 31.42 12.99   | 24.56 37.48 11.68   | 16.08 25.40 10.24
PerFedAvg [37] | 15.76 26.82 15.55   | 19.67 31.46 12.87   | 24.21 37.36 10.42   | 16.17 25.37 10.33
pFedMe [38]    | 15.48 26.44 15.13   | 19.60 31.21 12.88   | 22.67 35.55  9.58   | 15.96 24.98 10.14
FedALA [39]    | 15.29 26.34 15.16   | 20.02 31.71 13.44   | 23.64 36.78 10.03   | 16.14 25.29 10.70
FedTPS         | 15.05 25.94 14.70   | 19.46 31.18 12.67   | 21.74 34.57  9.16   | 15.81 24.91 10.28

Fig. 3. The performance on four datasets, with varying client numbers (MAE, RMSE, and MAPE for Local, FedAvg, FedProx, FedAtt, FedPer, PerFedAvg, pFedMe, FedALA, and FedTPS (Ours)).

3) Baseline Methods: Different from previous node-level federated TFP methods [17], [20], where each sensor is considered as a client, our method is aimed at the subgraph-level federated TFP task, where each client possesses a subset of sensors. To ensure the fairness of the experiments, we compare our method with the following eight baseline methods:
• Local: A baseline method where all clients train their models locally without sharing model parameters.
• FedAvg [15]: The classic FL algorithm aggregating the locally updated models via an averaging strategy.
• FedProx [32]: An FL algorithm using a proximal term to prevent local models from deviating too far towards their corresponding local data.
• FedAtt [33]: An FL algorithm using an attention mechanism to weight the aggregation of local and global model parameters.
• FedPer [36]: A PFL algorithm sharing common base layers across clients while keeping the personalized layers local.
• PerFedAvg [37]: A PFL algorithm training an initial model that can be fine-tuned to adapt to the local data of each client.
• pFedMe [38]: A PFL algorithm utilizing the global model to optimize personalized models.
• FedALA [39]: A PFL algorithm adaptively aggregating the global and local models towards the local objective.

4) Implementation Details: In our encoder-decoder architecture, both the encoder and the decoder contain 64 GCRUs. To ensure fairness in our experiments, all baseline methods employ the same encoder-decoder architecture as the local model.
We set the size of the traffic pattern repository N to 40 for PEMS07 and 20 for PEMS03, PEMS04, and PEMS08. The dimension of the representative patterns is set to 64, and the number of selected patterns k in the aggregation process is set to 2. We use the Adam optimizer with a learning rate of 0.001, and the batch size is set to 128. The local training epochs and global communication rounds are fixed at 1 and 200, respectively, for all FL methods. The default number of clients is set to 4. We implement all the methods in Python 3.8.8 using PyTorch 1.9.1 and conduct all experiments on one GeForce RTX 3090 GPU.

B. Performance Comparison

We evaluate the performance of all methods under three metrics on four TFP datasets, with the results listed in Table II. We clearly observe that our proposed FedTPS outperforms the baseline methods in most scenarios. We infer that this good performance benefits from the effective sharing of common traffic patterns, which facilitates collaborative model training and minimizes the side effects of region-specific discrepancies. Additionally, conventional FL methods (i.e., FedAvg, FedProx, and FedAtt) suffer obvious performance degradation compared with Local, which could be due to the heterogeneity of traffic data from different clients. In contrast, personalized FL methods train customized models for each client and can thus achieve better performance.

To further evaluate the performance of our FedTPS across FL frameworks with different numbers of clients, we investigate the impact of varying numbers of clients on the performance of different methods. As shown in Fig. 3, there is a general trend of increasing prediction error as the number of clients increases for all methods, which can be attributed to the division of the traffic network into multiple subgraphs. To be specific, the correlations between traffic sensors are disrupted and the amount of data available to each client is reduced, thereby hindering the training of local models. Nevertheless, FedTPS generally demonstrates promising performance, indicating the effectiveness of our proposed method under FL frameworks with different numbers of clients.

C. Ablation Study

As described in the methodology, our proposed FedTPS extracts common traffic patterns through DWT decomposition and enables clients to learn personalized models while sharing global knowledge represented by common traffic patterns. To illustrate the contributions of these two modules, we carry out a series of ablation studies.

1) Effectiveness of DWT Decomposition: To understand the contribution of DWT decomposition, we conduct ablation studies comparing the performance of models with or without DWT. In the variant without DWT, the original traffic data are fed into the pattern encoder. Moreover, for the variants with DWT, we further explore the effects of different wavelet bases, including Biorthogonal, Coiflets, Daubechies, Haar, and Symlets. As illustrated in Fig. 4, decomposing the traffic data using DWT enhances the performance across all datasets. This is because the stable traffic dynamics obtained through DWT can effectively capture the common traffic patterns, which is beneficial to FL. Furthermore, we observe that different wavelet bases yield varied performance enhancements across the datasets. Specifically, Daubechies and Haar wavelets demonstrate the best performance for the PEMS03 and PEMS08 datasets, Coiflets wavelets show the optimal results for PEMS04, and Biorthogonal wavelets are most effective for PEMS07.

Fig. 4. Effect of DWT on four datasets. N: Not using DWT; B: Biorthogonal; C: Coiflets; D: Daubechies; H: Haar; S: Symlets.

2) Effectiveness of Traffic Pattern Sharing Strategy: To validate the effectiveness of the proposed traffic pattern sharing strategy with the similarity-aware aggregation, we compare FedTPS with its variants. These variants share different components of the local model, where the same aggregation method as FedAvg [15] is adopted. As shown in Table III, on most datasets, the aggregation strategy that shares the encoder-decoder parameters (i.e., ED in Table III) results in a substantial performance degradation compared with the strategy that does not share parameters (i.e., None in Table III). These outcomes indicate that directly sharing the model parameters of clients across different regions can introduce the interference of region-specific characteristics from other clients and thereby disrupt the learning process. Although the variant that shares all parameters (i.e., All in Table III) can somewhat mitigate the decline in performance with the help of common traffic patterns, it is still inevitably influenced by the discrepancies of different regions. In contrast, the strategy that shares the traffic pattern repositories (i.e., PR in Table III) demonstrates good performance, since it only shares the common traffic patterns that represent global knowledge while retaining region-specific knowledge in a personalized FL manner, allowing each client to learn a personalized model. Furthermore, unlike the model variants using averaging aggregation (i.e., None, All, ED, and PR in Table III), our proposed FedTPS utilizing the similarity-aware aggregation can help align the common traffic patterns from different regions, thereby showing improved performance.

TABLE III
COMPARATIVE ANALYSIS OF DIFFERENT SHARING STRATEGIES. NONE: NO PARAMETER SHARING; ALL: SHARING ALL PARAMETERS; ED: SHARING ENCODER-DECODER PARAMETERS; PR: SHARING THE TRAFFIC PATTERN REPOSITORY.

Shared Component | PEMS03              | PEMS04              | PEMS07              | PEMS08
                 | MAE   RMSE  MAPE/%  | MAE   RMSE  MAPE/%  | MAE   RMSE  MAPE/%  | MAE   RMSE  MAPE/%
None             | 15.29 26.30 15.04   | 19.65 31.69 12.78   | 23.54 37.07  9.94   | 15.90 25.06 10.56
All              | 15.24 26.18 15.22   | 20.19 31.98 13.29   | 23.33 36.28 10.47   | 15.98 24.92 10.72
ED               | 15.38 26.46 15.19   | 20.55 32.48 14.11   | 24.37 37.11 11.67   | 16.28 25.28 11.44
PR               | 15.13 26.33 14.87   | 19.59 31.60 12.68   | 22.61 35.67  9.63   | 15.87 24.97 10.33
FedTPS           | 15.05 25.94 14.70   | 19.46 31.18 12.67   | 21.74 34.57  9.16   | 15.81 24.91 10.28

Fig. 5. Sensitivity analysis of the pattern number N in different datasets (MAE and RMSE).

Fig. 6. Sensitivity analysis of the number of selected patterns k during aggregation in different datasets (MAE and RMSE).

D. Parametric Sensitivity

In the proposed FedTPS framework, two critical hyperparameters need to be manually pretuned, i.e., the size of the traffic pattern repository N and the number of selected patterns k in the aggregation process. In this subsection, we evaluate the sensitivity of the performance to different hyperparameter settings of the proposed FedTPS.

The impact of varying N is shown in Fig. 5. We observe that the optimal value of N is related to the number of traffic sensors in the dataset. Generally, datasets with a large number of sensors benefit from a large-size traffic pattern repository (e.g., PEMS07). This allows for comprehensive learning of common traffic patterns, which leads to enhanced performance.

The results for different values of k are presented in Fig. 6. We observe that our model achieves the best performance across all datasets when k = 2. If k is too small, FedTPS may overlook the traffic patterns that are beneficial to knowledge sharing. Conversely, if k is too large, it may incorporate patterns that do not align well, resulting in suboptimal performance.

VI. CONCLUSION

In this paper, we propose FedTPS, a new PFL framework to address the challenge of data heterogeneity in federated TFP via sharing common traffic patterns. Different from previous works that overlook the underlying global knowledge represented by common traffic patterns from different regions, the proposed FedTPS decomposes the traffic data to acquire common traffic patterns, which can be shared across different clients. Meanwhile, by devising the similarity-aware aggregation strategy, clients can learn from the common traffic patterns of different regions globally while maintaining personalized components learning from local spatial-temporal dependencies to preserve region-specific characteristics. Experimental evaluations conducted on four widely used TFP datasets confirm the effectiveness and superiority of our FedTPS over multiple baseline methods.

ACKNOWLEDGMENT

Sheng Wan was supported by the Postdoctoral Fellowship Program of CPSF (No: GZC20233503), the China Postdoctoral Science Foundation (Nos: 2023M741708, 2023TQ0159), and the NSF of Jiangsu Province (No: BK20241469). Tianlong Gu was supported by the NSF of China (No: U22A2099). Chen Gong was supported by the NSF of China (Nos: 62336003, 12371510), the NSF of Jiangsu Province (No: BZ2021013), and the NSF for Distinguished Young Scholar of Jiangsu Province (No: BK20220080).

REFERENCES

[1] C. Lin, G. Han, J. Du, T. Xu, L. Shu, and Z. Lv, "Spatiotemporal congestion-aware path planning toward intelligent transportation systems in software-defined smart city IoT," IEEE Internet Things J., vol. 7, no. 9, pp. 8012–8024, 2020.
[2] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: A deep learning approach," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 865–873, 2014.
[3] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, "Convolutional LSTM network: A machine learning approach for precipitation nowcasting," in NeurIPS, vol. 28, 2015.
[4] G. Lai, W.-C. Chang, Y. Yang, and H. Liu, "Modeling long- and short-term temporal patterns with deep neural networks," in SIGIR, 2018, pp. 95–104.
[5] H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, J. Ye, and Z. Li, "Deep multi-view spatial-temporal network for taxi demand prediction," in AAAI, vol. 32, no. 1, 2018.
[6] J. D. Hamilton, Time Series Analysis. Princeton University Press, 2020.
[7] W. Liu, Y. Zheng, S. Chawla, J. Yuan, and X. Xing, "Discovering spatio-temporal causal interactions in traffic data streams," in KDD, 2011, pp. 1010–1018.
[8] M. Lippi, M. Bertini, and P. Frasconi, "Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 2, pp. 871–882, 2013.
[9] Y. Li, R. Yu, C. Shahabi, and Y. Liu, "Diffusion convolutional recurrent neural network: Data-driven traffic forecasting," in ICLR, 2018.
[10] B. Yu, H. Yin, and Z. Zhu, "Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting," in IJCAI, 2018, pp. 3634–3640.
[11] Z. Wu, S. Pan, G. Long, J. Jiang, and C. Zhang, "Graph WaveNet for deep spatial-temporal graph modeling," in IJCAI, 2019, pp. 1907–1913.
[12] L. Bai, L. Yao, C. Li, X. Wang, and C. Wang, "Adaptive graph convolutional recurrent network for traffic forecasting," in NeurIPS, vol. 33, 2020, pp. 17804–17815.
[13] R. Jiang, Z. Wang, J. Yong, P. Jeph, Q. Chen, Y. Kobayashi, X. Song, S. Fukushima, and T. Suzumura, "Spatio-temporal meta-graph learning for traffic forecasting," in AAAI, vol. 37, no. 7, 2023, pp. 8078–8086.
[14] L. Li, J. Liu, L. Cheng, S. Qiu, W. Wang, X. Zhang, and Z. Zhang, "CreditCoin: A privacy-preserving blockchain-based incentive announcement network for communications of smart vehicles," IEEE Trans. Intell. Transp. Syst., vol. 19, no. 7, pp. 2204–2220, 2018.
[15] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in AISTATS, 2017, pp. 1273–1282.
[16] S. Yue, Y. Deng, G. Wang, J. Ren, and Y. Zhang, "Federated offline reinforcement learning with proximal policy evaluation," Chinese Journal of Electronics, vol. 33, no. 6, pp. 1–13, 2024.
[17] Y. Liu, J. James, J. Kang, D. Niyato, and S. Zhang, "Privacy-preserving traffic flow prediction: A federated learning approach," IEEE Internet Things J., vol. 7, no. 8, pp. 7751–7763, 2020.
[18] C. Zhang, S. Zhang, J. James, and S. Yu, "FASTGNN: A topological information protected federated learning approach for traffic speed forecasting," IEEE Trans. Ind. Inform., vol. 17, no. 12, pp. 8464–8474, 2021.
[19] H. Wang, R. Zhang, X. Cheng, and L. Yang, "Federated spatio-temporal traffic flow prediction based on graph convolutional network," in WCSP, 2022, pp. 221–225.
[20] C. Meng, S. Rambhatla, and Y. Liu, "Cross-node federated graph neural network for spatio-temporal data modeling," in KDD, 2021, pp. 1202–1211.
[21] X. Li, M. Jiang, X. Zhang, M. Kamp, and Q. Dou, "FedBN: Federated learning on non-IID features via local batch normalization," in ICLR, 2020.
[22] C. Zhang, S. Dang, B. Shihada, and M.-S. Alouini, "Dual attention-based federated learning for wireless traffic prediction," in INFOCOM, 2021.
[23] W. Li and S. Wang, "Federated meta-learning for spatial-temporal prediction," Neural Computing and Applications, vol. 34, no. 13, pp. 10355–10374, 2022.
[24] Q. Liu, S. Sun, Y. Liang, J. Xue, and M. Liu, "Personalized federated learning for spatio-temporal forecasting: A dual semantic alignment-based contrastive approach," arXiv preprint arXiv:2404.03702, 2024.
[25] H. Lee, S. Jin, H. Chu, H. Lim, and S. Ko, "Learning to remember patterns: Pattern matching memory networks for traffic forecasting," in ICLR, 2021.
[26] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in NeurIPS, vol. 27, 2014.
[27] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, "Informer: Beyond efficient transformer for long sequence time-series forecasting," in AAAI, vol. 35, no. 12, 2021, pp. 11106–11115.
[28] D. Cao, Y. Wang, J. Duan, C. Zhang, X. Zhu, C. Huang, Y. Tong, B. Xu, J. Bai, J. Tong et al., "Spectral temporal graph neural network for multivariate time-series forecasting," in NeurIPS, vol. 33, 2020, pp. 17766–17778.
[29] C. Zheng, X. Fan, C. Wang, and J. Qi, "GMAN: A graph multi-attention network for traffic prediction," in AAAI, vol. 34, no. 01, 2020, pp. 1234–1241.
[30] S. Guo, Y. Lin, H. Wan, X. Li, and G. Cong, "Learning dynamics and heterogeneity of spatial-temporal graph data for traffic forecasting," IEEE Trans. Knowledge Data Eng., vol. 34, no. 11, pp. 5415–5428, 2021.
[31] Y. Fang, Y. Qin, H. Luo, F. Zhao, B. Xu, L. Zeng, and C. Wang, "When spatio-temporal meet wavelets: Disentangled traffic forecasting via efficient spectral graph attention networks," in ICDE, 2023, pp. 517–529.
[32] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, "Federated optimization in heterogeneous networks," in Proceedings of Machine Learning and Systems, vol. 2, 2020, pp. 429–450.
[33] S. Ji, S. Pan, G. Long, X. Li, J. Jiang, and Z. Huang, "Learning private neural language modeling with attentive aggregation," in IJCNN, 2019, pp. 1–8.
[34] Z. Yang, Y. Zhang, Y. Zheng, X. Tian, H. Peng, T. Liu, and B. Han, "FedFed: Feature distillation against data heterogeneity in federated learning," in NeurIPS, vol. 36, 2023, pp. 60397–60428.
[35] L. Collins, H. Hassani, A. Mokhtari, and S. Shakkottai, "Exploiting shared representations for personalized federated learning," in ICML, 2021, pp. 2089–2099.
[36] M. G. Arivazhagan, V. Aggarwal, A. K. Singh, and S. Choudhary, "Federated learning with personalization layers," arXiv preprint arXiv:1912.00818, 2019.
[37] A. Fallah, A. Mokhtari, and A. Ozdaglar, "Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach," in NeurIPS, vol. 33, 2020, pp. 3557–3568.
[38] C. T. Dinh, N. Tran, and J. Nguyen, "Personalized federated learning with Moreau envelopes," in NeurIPS, vol. 33, 2020, pp. 21394–21405.
[39] J. Zhang, Y. Hua, H. Wang, T. Song, Z. Xue, R. Ma, and H. Guan, "FedALA: Adaptive local aggregation for personalized federated learning," in AAAI, vol. 37, no. 9, 2023, pp. 11237–11244.
[40] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra et al., "Matching networks for one shot learning," in NeurIPS, vol. 29, 2016.
[41] D. Gong, L. Liu, V. Le, B. Saha, M. R. Mansour, S. Venkatesh, and A. v. d. Hengel, "Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection," in ICCV, 2019, pp. 1705–1714.
[42] T. Jiang, W. Chen, H. Zhou, J. He, and P. Qi, "Towards semi-supervised classification of abnormal spectrum signals based on deep learning," Chinese Journal of Electronics, vol. 33, no. 3, pp. 721–731, 2024.
[43] Z. Liu, G. Zheng, and Y. Yu, "Cross-city few-shot traffic forecasting via traffic pattern bank," in CIKM, 2023, pp. 1451–1460.
[44] C. Chen, "Freeway performance measurement system (PeMS)," Ph.D. dissertation, University of California, Berkeley, 2002.
[45] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, "Attention based spatial-temporal graph convolutional networks for traffic flow forecasting," in AAAI, vol. 33, no. 01, 2019, pp. 922–929.
[46] G. Karypis, "METIS: Unstructured graph partitioning and sparse matrix ordering system," Technical report, 1997.
