Time-Sensitive Federated Learning With Heterogeneous Training Intensity: A Deep Reinforcement Learning Approach

Abstract—Federated learning (FL) has recently received considerable attention because of its capability of collaboratively training machine learning models without exposing data privacy. Most existing FL schemes assume fixed or predetermined local training intensities/iterations at clients for each communication round, which however neglects the effect of local training intensity determination on the performance of FL. Besides that, in traditional FL, the clients are assigned the same number of training iterations. In this context, a client with low computation or communication capability may slow down the global model aggregation, which causes high waiting latency at other clients. To address these issues, this paper proposes a novel Time-sensitive FL mechanism with Heterogeneous Training Intensity at clients, named TFL_HTI. Specifically, we first explore the bounded convergence rate of FL with heterogeneous training iterations. Then, we design a Deep Reinforcement Learning (DRL) approach to determine the overall training intensity of clients in each communication round. Based on this, we further design an optimal deterministic algorithm to assign the appropriate local iterations to clients based on their training capabilities. Finally, we conduct simulations to demonstrate the effectiveness of our proposed scheme.

Index Terms—Federated learning, time-sensitive, heterogeneous training, deep reinforcement learning.

Manuscript received 27 April 2023; revised 18 September 2023; accepted 21 October 2023. Date of publication 8 January 2024; date of current version 27 March 2024. This work was supported in part by the National Natural Science Foundation of China under Grants 62072187 and 61972448, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022B1515020015, in part by the Major Key Project of PCL under Grant PCL2023AS7-1, and in part by the Guangzhou Development Zone Science and Technology Project under Grant 2021GH10. (Corresponding author: Xiumin Wang.)

Weijian Pan, Xiumin Wang, and Weiwei Lin are with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China (e-mail: [email protected]; [email protected]; [email protected]).

Pan Zhou is with the Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China (e-mail: [email protected]).

Recommended for acceptance by Y. Yuan.

Digital Object Identifier 10.1109/TETCI.2023.3345366

I. INTRODUCTION

WITH the development of the Internet of Things (IoT), a large volume of data has been generated at network edges [1], [2]. Generally, these Big Data can be efficiently used to train machine learning models and to obtain useful information, including environment detection, future event prediction, etc. For instance, real-time traffic information can be trained to enable intelligent driving applications [3], [4], [5]. However, due to the limited network bandwidth and privacy concerns, it is impractical to upload all these Big Data from edge devices to a remote server for further training. In this context, traditional machine learning mechanisms that heavily rely on data collection in a centralized manner cannot support these Big Data applications well.

To address the aforementioned issues, Federated Learning (FL), also known as federated optimization, has been proposed by Google [6], [7], [8]. Roughly speaking, FL allows multiple edge devices/clients to cooperatively train machine learning models based on their local datasets, without exposing the data privacy of clients. Particularly, [6] proposes a well-known FL algorithm, named FedAvg, which repeatedly involves local model training at clients and global model aggregation at the centralized server. FL protects data owners' privacy because their data never leave their clients, which makes it a good fit for Big Data training in IoT. Since then, FL has been implemented in many areas. For example, Google keyboard uses FL to enhance next-word predictions [9], while [10] utilizes FL to predict hospitalizations in the medical field.

To optimize the performance of FL, a great deal of research has been carried out to boost the training accuracy [11], [12], [13], reduce the communication overhead [14], [15], [16], [17], or incentivize the clients in FL [18]. Particularly, most FL schemes assume that the server has to wait for all the clients to finish their pre-determined local training iterations before global model aggregation. However, such a synchronous training method may bring several issues. Firstly, it is nontrivial to accurately measure the proper training intensity of clients during each communication round. Generally, a higher training intensity in each communication round can reduce the communication overhead, which however may result in a biased model. Alternatively, simply lowering the training intensity in each communication round may enlarge the whole training time of FL. Secondly, due to the heterogeneity of hardware settings [19], the clients with the lowest computation or communication capability may drag down the whole FL training. An intuitive way to alleviate such a straggler problem is to allocate heterogeneous local training iterations to clients according to their training capabilities. However, it is still challenging to know the exact training capability of
Fig. 1. Impact of training intensity for each communication round.
Fig. 2. Motivation of Heterogeneous FedAvg.
the vector x_j^i is the input of the training model and y_j^i is the desired output of the training model. A typical federated learning process consists of multiple communication rounds. In each communication round, the centralized server selects a certain number of clients to participate in training, and distributes the latest global model parameter w to clients. Then, each selected client trains its local model using a gradient-descent algorithm based on its local dataset, and uploads its local training model to the server. After receiving all the local models, the server aggregates and updates the global model. The above procedure is repeated until a specified number of rounds is completed or a target accuracy is achieved.

To formulate the training process, we use f_i((x, y), w) to denote the loss function of client c_i on its parameter vector w and sample (x, y), and F_i(w) to denote the loss function of client c_i on dataset D_i. Similar to previous works [6], [7], [8], F_i(w) can be updated as follows,

F_i(w) = \frac{\sum_{(x_j^i, y_j^i) \in D_i} f_i((x_j^i, y_j^i), w)}{D_i}.  (1)

The global loss function over all the distributed datasets can be formulated into

F(w) = \frac{\sum_{i=1}^{N} D_i F_i(w)}{\sum_{i=1}^{N} D_i}.  (2)

The problem is to minimize (2) with respect to w.

B. Motivation on Heterogeneous Training Intensities

Most existing FL algorithms, such as FedAvg [6], always assume fixed local training iterations/intensities of clients for each communication round. However, few works study how to determine the exact value of training intensities.

To investigate the impact of training intensity on the performance of FL, we conduct a simple simulation based on FedAvg under the CIFAR-10 dataset. In the simulation, we uniformly distribute the training data to N = 100 clients, and then run FedAvg to train a CNN network within a 3000 s time constraint. As shown in Fig. 1, we study 5 settings of local training iterations of each client, ranging from 1 to 9 in each communication round. It is shown that FedAvg with 1 iteration performs the worst. With the increment of local iterations, the accuracy increases first but then decreases. It is observed that simply setting too many or too few local iterations may not achieve the best FL efficiency. Instead, FedAvg with 5 local training iterations for each client achieves the highest training accuracy. Therefore, allocating an adequate number of local training iterations for each communication round is one of the most critical issues in FL.

Besides that, due to the heterogeneity of clients, previous works that allocate the same number of training iterations to each client may bring a straggler problem, i.e., the global model aggregation is slowed down by the client with the lowest computation and communication capability. To address the above problem, an intuitive solution is to give clients different training intensities based on their training capabilities. To verify this, we revise the traditional FedAvg by assigning more training iterations to clients with higher computation or communication capabilities. Since Fig. 1 shows that FedAvg with 5 iterations performs the best, we compare the time efficiency of our revised FedAvg (referred to as Heterogeneous FedAvg) to that of FedAvg with 5 iterations. As shown in Fig. 2, the accuracy achieved by Heterogeneous FedAvg is much better than that of FedAvg with 5 iterations. Therefore, assigning appropriate training intensities to clients is also important to improve the performance of FL.

C. Problem Formulation

Inspired by the above observations, an efficient FL should appropriately address the following questions:
• How to determine the exact training intensity for each communication round, so as to improve the training accuracy of FL?
• Given the training intensity of each communication round, how to assign appropriate local iterations to clients, according to their heterogeneity in computation and communication capabilities?

Without loss of generality, we define τ^k as the total training intensity in round k, and τ_i^k as the local iterations given to client c_i in the k-th communication round.

Suppose that in each communication round, M clients will be chosen randomly to participate in training for fairness. Denote C^k as the group of chosen clients in communication round k.
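Before turning to the per-round timing model, the following is a minimal sketch of one such communication round with heterogeneous local iterations τ_i^k, assuming PyTorch-style models and dataloaders; the loss function, learning rate, and helper structure are illustrative assumptions rather than the paper's implementation.

import copy
import torch

def run_round(global_model, clients, tau, lr=0.01):
    # One FL communication round with heterogeneous local intensities (sketch).
    # clients: list of (dataloader, weight) with weight = |D_i| / sum_j |D_j|.
    # tau[i]: local iterations tau_i^k for client i. Aggregation follows the
    # dataset-size weighting of (2); the loss and optimizer here are assumptions.
    loss_fn = torch.nn.CrossEntropyLoss()
    agg = [torch.zeros_like(p) for p in global_model.parameters()]
    for i, (loader, weight) in enumerate(clients):
        local = copy.deepcopy(global_model)          # client starts from the latest global model w
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        batches = iter(loader)
        for _ in range(tau[i]):                      # tau_i^k gradient-descent iterations
            try:
                x, y = next(batches)
            except StopIteration:
                batches = iter(loader)
                x, y = next(batches)
            opt.zero_grad()
            loss_fn(local(x), y).backward()
            opt.step()
        with torch.no_grad():
            for a, p in zip(agg, local.parameters()):
                a.add_(weight * p)                   # |D_i|-weighted model averaging
    with torch.no_grad():
        for p, a in zip(global_model.parameters(), agg):
            p.copy_(a)                               # server updates the global model
    return global_model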
For each client c_i ∈ C^k, we use f_i^k, δ_i and B_i^k to denote the CPU frequency used for FL in the k-th round, the CPU cycles consumed for a single data sample, and the communication bandwidth in the k-th round, respectively. It is worth noting that f_i^k and B_i^k are two parameters indicating the training capability of client c_i. After client c_i finishes its local training, it can upload these two parameters as well as its local training model. Then, the server can use them to predict its training capability for the next round. According to [14], the computation time of client c_i per unit local iteration, denoted by t_{i,cmp}^k, can be formulated as

t_{i,cmp}^k = \frac{\delta_i D_i}{f_i^k}.  (3)

After local training, each client uploads its training model and some related parameters to the server for global model aggregation. Since the downlink bandwidth is much larger than that of the uplink, the model downloading time is negligible compared with the uploading time [29]. We consider an orthogonal frequency division multiple access (OFDMA [40]) technique for clients' uplink transmission. Then, the communication time of client c_i in communication round k can be calculated by

t_{i,com}^k = \frac{\xi}{B_i^k \ln\left(1 + \frac{\rho_i h_i}{N_0}\right)},  (4)

where ξ is the size of clients' model parameters, ρ_i is the transmission power, h_i is the channel gain, and N_0 is the background noise. We further use T_i^k to represent the i-th client's entire training time for communication round k, which includes both computation and communication times. Therefore, we have

T_i^k = \tau_i^k t_{i,cmp}^k + t_{i,com}^k = \frac{\tau_i^k \delta_i D_i}{f_i^k} + \frac{\xi}{B_i^k \ln\left(1 + \frac{\rho_i h_i}{N_0}\right)}.  (5)

In a synchronized FL system, the global model aggregation starts only when the server receives the updates from all participating clients. Hence, the training time of communication round k, denoted by T^k, can be calculated by

T^k = \max_{c_i \in C^k} \{T_i^k\}.  (6)

It is worth noting that the value of T^k heavily depends on the slowest client in C^k.

Our goal is to determine the proper training intensity for each client according to the information of the client's computation and communication capabilities, so as to minimize the loss function within a time budget. So our training intensity determination problem can be formulated as follows,

P1: \min_{\tau_i^k} F(w^K)

s.t. \sum_{k=1}^{K} T^k \le T_m,  (7)

\tau_i^k > 0, \quad \forall k, \ \forall c_i \in C^k,  (8)

where T_m denotes the FL system's training time budget, and K is the total number of communication rounds. In the above formulation, the constraint in (7) indicates that the time used during K rounds should not exceed the time budget. To guarantee the training quality, each selected client in the k-th communication round should execute at least one local iteration, as shown in (8). We refer to such a Time-sensitive Federated Learning with Heterogeneous Training Intensities allocation problem as the TFL_HTI problem.

D. Convergence Analysis

In this subsection, we analyze the theoretical convergence bound of FL with heterogeneous training intensity. Firstly, we make the following assumptions that have been applied frequently in previous works [13], [30], [31], [32], [33].

Assumption 1: (L-Lipschitz Continuous Gradient) There is a constant L > 0 such that ||∇F_i(x) − ∇F_i(y)|| ≤ L||x − y||, ∀x, y ∈ R^d.

Assumption 2: (Unbiased Local Gradient Estimator) Let ξ_i^k represent a random local data sample in the k-th step at the i-th client. The local gradient estimator is unbiased, i.e., E[∇F_i(w_i^k, ξ_i^k)] = ∇F_i(w_i^k), ∀c_i ∈ C, where the expectation is based on the samples from all local datasets.

Assumption 3: (Bounded Local Variance) There exists a constant σ, such that the variance of each local gradient estimator is bounded by E[||∇F_i(w_i^k, ξ_i^k) − ∇F_i(w_i^k)||^2] ≤ σ^2, ∀c_i ∈ C.

To simplify the convergence analysis, we first present an important theorem as follows, which is given by [33].

Theorem 1: The mean square gradient after K communication rounds for FL with heterogeneous training intensity satisfies

\frac{1}{K}\sum_{k=0}^{K-1} \|\nabla F(w^k)\|^2 \le \frac{2(F(w^0) - F(w^*))}{\eta\alpha\tau K} + \frac{L\eta\alpha(2-\gamma)\sigma^2}{N} + L^2\eta^2\tau\sigma^2,  (9)

where w^0 is the initial model, γ_i^k is the compression ratio of the top-k compression operator in round k, and α_i^k is the aggregation weight for client c_i in round k, τ = max{τ_i^k}, γ = max{γ_i^k}, α = max{α_i^k}, and the local learning rate η satisfies

\tau^2 L^2 \eta^2 + (2-\gamma)\eta\alpha L \tau \le 1.  (10)

Based on Theorem 1 and the above assumptions, we derive the convergence rate of our FL as follows.

Corollary 1: If we choose the local learning rate η = c/(L√(Kτ)), the convergence rate can be bounded as
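The remainder of Corollary 1 is not reproduced in this excerpt; as a sketch of where it leads, substituting the stated learning rate into (9) (taking the reading η = c/(L√(Kτ)) and treating the constants as assumptions) yields the familiar form:

\frac{1}{K}\sum_{k=0}^{K-1}\big\|\nabla F(w^k)\big\|^2
  \le \left(\frac{2L\,(F(w^0)-F(w^*))}{c\alpha}
      + \frac{c\alpha(2-\gamma)\sigma^2}{N}\right)\frac{1}{\sqrt{K\tau}}
      + \frac{c^2\sigma^2}{K}
  = \mathcal{O}\!\left(\frac{1}{\sqrt{K\tau}}\right) + \mathcal{O}\!\left(\frac{1}{K}\right),

that is, the bound vanishes as the number of rounds K grows, and a larger per-round intensity τ tightens the dominant 1/√(Kτ) term.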
distribution π(a|s) over the whole action space. In the k-th communication round, the goal of our DRL algorithm is to determine the overall training intensity τ^k, so as to balance the model quality and training time. Therefore, the action is denoted by

a^k = τ^k.  (18)

Algorithm 1: SAC-Based Algorithm for Overall Training Intensity.
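The body of Algorithm 1 is not reproduced in this excerpt; as a rough sketch of how a SAC-style policy [35], [36] could emit the integer intensity in (18), the snippet below squashes a Gaussian policy output into an assumed intensity range. The state encoding, network sizes, and range are illustrative assumptions, and the tanh log-probability correction used by SAC is omitted for brevity.

import torch
import torch.nn as nn

class IntensityPolicy(nn.Module):
    # Gaussian policy head in the SAC style of [35]; sizes and range are illustrative.
    def __init__(self, state_dim, hidden=128, tau_min=1, tau_max=80):
        super().__init__()
        self.tau_min, self.tau_max = tau_min, tau_max
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, 1)
        self.log_std = nn.Linear(hidden, 1)

    def forward(self, state):
        h = self.body(state)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-5, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()                 # reparameterized sample, as in SAC
        a = torch.tanh(u)                  # squash to (-1, 1)
        # Rescale the squashed action to an integer overall intensity tau^k.
        tau = self.tau_min + (a + 1) / 2 * (self.tau_max - self.tau_min)
        return int(tau.round().item()), dist.log_prob(u).sum(-1)

# The state s^k could, e.g., stack the clients' reported f_i^k and B_i^k (an assumption).
policy = IntensityPolicy(state_dim=20)
tau_k, logp = policy(torch.randn(20))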
Algorithm 3: Training of TFL_HTI Algorithm.

Next, we prove that Algorithm 2 minimizes the training time of the slowest client. For the sake of simplicity, we assume that T^* is the minimal training time specified by the optimal solution to problem P3. We have

T_{min} \le T^* \le T^k \le T_{max}.  (35)

Case 1 (T^* = T^k): In this context, the proposed Algorithm 2 is an optimal solution to problem P3.

Case 2 (T^* < T^k): In the process of Algorithm 2, there must exist an interval [l, r] satisfying

T_{min} \le T^* \le \frac{l+r}{2} < T^k \le T_{max},  (36)

\sum_{c_i \in C^k} \tau_i^{upper}\!\left(\frac{l+r}{2}\right) < \tau^k.  (37)

As T^* is the optimal solution to problem P3, there must be an allocation \{\tilde{\tau}_i^k \mid \forall c_i \in C^k\} that satisfies

\tilde{\tau}_i^k \le \tau_i^{upper}(T^*), \quad \forall c_i \in C^k,  (38)

\sum_{c_i \in C^k} \tilde{\tau}_i^k = \tau^k.  (39)

Therefore, we have

\sum_{c_i \in C^k} \tau_i^{upper}\!\left(\frac{l+r}{2}\right) \ge \sum_{c_i \in C^k} \tau_i^{upper}(T^*) \ge \sum_{c_i \in C^k} \tilde{\tau}_i^k = \tau^k,  (41)

which contradicts (37). Therefore, this case is impossible.

To sum up, our proposed approach provides the best result to reduce the slowest client's training time in each communication round k.

We further consider the computational complexity of Algorithm 2.

Lemma 2: The computational complexity of Algorithm 2 is O(M log_2(T_{max} − T_{min})).

Proof: As Algorithm 2 uses binary search to maintain an interval in which there exists the minimum of the training time in communication round k, the complexity is thus O(log_2(T_{max} − T_{min})). Next, we calculate the upper bound of local iterations of each client, whose complexity is O(M). Therefore, the complexity of finding the minimal T^k is O(M log_2(T_{max} − T_{min})). Given T^k, we then allocate τ^k to clients while ensuring that τ_i^k is within its upper bound. This procedure's complexity is O(M). To sum up, the overall computational complexity of Algorithm 2 is O(M log_2(T_{max} − T_{min})), which proves the lemma.

D. The Framework of TFL_HTI

To illustrate the interaction between FL and DRL, we use Fig. 3 to describe the framework of the proposed TFL_HTI mechanism.

In each communication round k, the DRL agent gets the state s^k from the FL environment and takes an action a^k based on Algorithm 1. Given τ^k = a^k, the server uses the proposed Algorithm 2 to allocate the appropriate local training iterations to clients. After receiving the training intensity τ_i^k and the aggregated model w^{k−1}, client c_i trains its model based on its own data and then uploads its updated model and some related parameters to the server. The server selects clients randomly for the next communication round, and predicts/updates its state s^k to s^{k+1}. It is worth noting that, in communication round k, the information (s^k, a^k, r^k, s^{k+1}, done) is stored in the experience pool D, from which the DRL agent randomly samples minibatch data to update the networks. The detailed process of TFL_HTI can also be seen in Algorithm 3.
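Since Algorithm 2 itself is not reproduced in this excerpt, the following is a minimal sketch consistent with the binary-search description in Lemma 2 and its proof: it searches for the smallest per-round completion time at which the chosen clients can jointly absorb τ^k iterations while respecting (8), and then allocates iterations within each client's upper bound. The helper names, the stopping precision eps, and the greedy hand-out order are assumptions.

import math

def upper_bound_iters(T, t_cmp, t_com):
    # Max local iterations a client can finish within time T, given its per-iteration
    # computation time t_cmp from (3) and its communication time t_com from (4).
    return max(0, math.floor((T - t_com) / t_cmp))

def allocate_intensity(tau_total, t_cmp, t_com, eps=1e-3):
    # Requires tau_total >= number of clients so each can get >= 1 iteration, per (8).
    M = len(t_cmp)
    def feasible(T):
        ub = [upper_bound_iters(T, t_cmp[i], t_com[i]) for i in range(M)]
        return min(ub) >= 1 and sum(ub) >= tau_total
    lo = 0.0
    hi = max(t_com[i] + tau_total * t_cmp[i] for i in range(M))   # trivially feasible upper end
    while hi - lo > eps:                                          # binary search over T
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid            # feasible: try a smaller completion time
        else:
            lo = mid            # infeasible: the slowest client needs more time
    ub = [upper_bound_iters(hi, t_cmp[i], t_com[i]) for i in range(M)]
    alloc = [1] * M             # every selected client trains at least once, per (8)
    remaining = tau_total - M
    for i in range(M):          # O(M) hand-out within the upper bounds
        extra = min(ub[i] - 1, remaining)
        alloc[i] += extra
        remaining -= extra
    return hi, alloc            # approximate minimal T^k and per-client iterations tau_i^k

# Example: total intensity 30 spread over three clients with different capabilities.
T_k, tau_i = allocate_intensity(30, t_cmp=[0.5, 1.0, 2.0], t_com=[1.0, 0.5, 2.0])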
IV. PERFORMANCE EVALUATION

In this section, we use simulations to evaluate the effectiveness of the proposed scheme.

A. Experiment Setting

We build an FL system based on two publicly available image datasets, Fashion-MNIST [37] and CIFAR-10 [38]. The Fashion-MNIST dataset (referred to as FMNIST) is an image dataset consisting of a training set of 60,000 samples and a test set of 10,000 samples. Each sample consists of a 28 × 28 grayscale image paired with a label from one of ten classes. The CIFAR-10 dataset contains 50,000 color images for training and 10,000 color images for testing over 10 different classes. Each image is a 32 × 32 RGB image.

In the simulation, we evenly distribute the training datasets to N = 100 clients, and thus each client has 600 data samples of FMNIST and 500 data samples of CIFAR-10. Similar to [24], for the FMNIST dataset, we train a convolutional neural network (CNN) model with two 5 × 5 convolution layers. The first layer has 32 output channels and the second has 64 output channels, with each layer followed by 2 × 2 max pooling, two fully connected layers, and a final output layer with 10 units. For the CIFAR-10 dataset, we construct a CNN model with two 5 × 5 convolution layers with 64 and 128 channels in sequence, each of them activated by the ReLU function and followed by 2 × 2 max pooling, three fully connected layers, and a final output layer with 10 units.

Similar to [14], the size of the data samples in each user's local dataset D_i is 5 MB, and the number of CPU cycles consumed by client i to train one data sample is set to be uniformly distributed between [10, 30] cycles/bit. The CPU cycle frequency f_i^k of client c_i ranges from 0.1 GHz to 1.0 GHz. We also set the local model parameters' data size ξ to 5 MB. For the sake of simplicity, we set the value of ln(1 + ρ_n h_n / N_0) to 1.

In order to simulate a dynamic network environment, we additionally employ a real trace dataset called 4G/LTE Bandwidth Logs [39]. This dataset contains the bandwidth measurement data of the 4G network along several routes in and around the city of Ghent, Belgium, during the period of 2015-12-16 to 2016-02-04. In our simulation, each client randomly chooses a route dataset to generate its network bandwidth information.

For fair comparison, we choose two baseline algorithms, named FedAvg and FedSAE-Fassa. In each communication round, FedAvg introduced in [6] assigns the identical training intensity to each client, which indicates that τ_i^k = τ_j^k for ∀c_i, c_j ∈ C^k. FedSAE-Fassa was proposed in [20], which adjusts clients' affordable training intensities according to their historical training information. In our simulation, we randomly choose M = 10 clients to take part in FL in each communication round. Specifically, we randomly choose the total training intensity from 20 to 50 in both FedAvg and FedSAE-Fassa in each communication round, while we set the overall training intensity based on the DRL policy in our TFL_HTI scheme. All the simulations are conducted on a Dell server with an Intel Xeon Silver 4210 (40 logical CPUs) and an NVIDIA RTX A5000, running the Debian 10 operating system.

B. The Performance of DRL-Based Algorithm

Firstly, we show the convergence of our DRL training. As shown in Fig. 4, we present the accumulative rewards under two learning rates, 5 × 10^−6 and 5 × 10^−5. It is observed that the accumulative reward first increases significantly with the training episodes, but when the training reaches a certain number of episodes (15 episodes with the 5 × 10^−5 learning rate and 20 episodes with the 5 × 10^−6 learning rate), the training gradually stabilizes and reaches convergence.

We next test the effectiveness of our trained DRL model in improving the performance of FL, by comparing the proposed TFL_HTI scheme with and without DRL. Specifically, for TFL_HTI without DRL, we randomly set the total local iterations at the server. We now present the accuracy impacted by training time, where T_m = 2000 seconds. As shown in Fig. 5, the test accuracy of each method increases over time. It can also be
observed that the TFL_HTI algorithm with DRL achieves better model accuracy than without DRL. This implies that the DRL agent can pick up a good strategy for choosing the proper overall training intensity considering clients' computation and communication capabilities within the time limit, i.e., improving the accuracy of FL with less time. Similar results can also be found in Fig. 6, and the TFL_HTI scheme with the DRL policy can converge faster than without DRL, which further confirms the effectiveness of our DRL-based algorithm.

Fig. 5. Performance on test accuracy with training time.
Fig. 6. Performance on loss with training time.

C. The Performance of Optimal Allocation of Intensities

We now test the effectiveness of our optimal local training intensity allocation algorithm.

Besides presenting the accuracy, we also investigate the training time in each communication round, so as to verify how efficient our algorithm is in reducing the waiting latency during training. For fair comparison, we use the same random policy to determine the overall training intensity in each communication round, which means τ^k is the same in these three algorithms. As shown in Fig. 7, we vary the number of communication rounds from 0 to 100. It is shown that the training time of FedSAE-Fassa is smaller than that of FedAvg in most communication rounds, because FedSAE-Fassa allocates different affordable training intensities to clients by using historical information. Owing to the allocation of the same training intensity in FedAvg, clients with weak computation or communication capabilities may need too much time to finish training. As a result, FedAvg achieves the worst performance among these three algorithms. The per-round time of our algorithm is always smaller than that of FedAvg and FedSAE-Fassa, thanks to our optimal training intensity allocation.

Secondly, we compare the total training time used during FL in Fig. 8. It is shown that the total training time of our proposed method does not considerably increase with the growth of communication rounds, which further verifies the training efficiency of FL. This is reasonable, as our optimal algorithm always allocates the local iterations to clients to reduce the training time of the slowest client in each communication round. In this way, the next communication round can start earlier.

In the second simulation, we evaluate the performance of our optimal allocation algorithm impacted by the number of clients participating in a communication round. Similar to the first simulation, we set T_m = 2000 s and use the same τ^k in the three algorithms. As shown in Figs. 9 and 10, we range the number of clients chosen in a communication round from 10 to 40. It turns out that the test accuracy and the training loss of the FL model on both FMNIST and CIFAR-10 datasets change little when the number of clients rises. This is reasonable, as clients' training datasets are independent and identically distributed. In addition, the overall training intensity is fixed and clients will be allocated fewer local iterations as the per-round number of clients grows. Nonetheless, the test accuracy achieved by the proposed algorithm is still better than the
baseline algorithms. It can also be seen that the average training time of the FedAvg algorithm increases significantly when the number of clients becomes larger. On the contrary, the average training time of the FedSAE-Fassa scheme does not increase very fast, since the FedSAE-Fassa algorithm can predict clients' affordable local training iterations. Additionally, the average training time of our local iteration allocation algorithm remains stable as the number of clients participating in a round increases, which shows the efficiency and robustness of our strategy.

In the third simulation, we illustrate the impact of the total training intensity on the performance of the proposed allocation algorithm on both datasets within T_m = 2000 s. We vary the overall training intensity τ^k in Figs. 11 and 12 from 20 to 80, and fix the number of per-round selected clients to M = 10. Figs. 11 and 12 show that our scheme achieves the best test accuracy and training loss on both FMNIST and CIFAR-10 datasets. With the growth of the overall training intensity, the model accuracy trained by these three algorithms increases slightly. This
Fig. 13. Performance on test accuracy with communication rounds.
Fig. 14. Performance on loss with communication rounds.
[29] T. Wang, Y. Liu, X. Zheng, H.-N. Dai, W. Jia, and M. Xie, "Edge-based communication optimization for distributed federated learning," IEEE Trans. Netw. Sci. Eng., vol. 9, no. 4, pp. 2015–2024, Jul./Aug. 2022.
[30] F. Haddadpour, M. M. Kamani, A. Mokhtari, and M. Mahdavi, "Federated learning with compression: Unified analysis and sharp guarantees," in Proc. Int. Conf. Artif. Intell. Statist., 2021, pp. 2350–2358.
[31] D. Basu, D. Data, C. Karakus, and S. N. Diggavi, "Qsparse-local-SGD: Distributed SGD with quantization, sparsification, and local computations," IEEE J. Sel. Areas Inf. Theory, vol. 1, no. 1, pp. 217–226, May 2020.
[32] H. Yang, M. Fang, and J. Liu, "Achieving linear speedup with partial worker participation in non-IID federated learning," in Proc. Int. Conf. Learn. Representations, 2021, pp. 1–11.
[33] Y. Xu, Y. Liao, H. Xu, Z. Ma, L. Wang, and J. Liu, "Adaptive control of local updating and model compression for efficient federated learning," IEEE Trans. Mobile Comput., vol. 22, no. 10, pp. 5675–5689, Oct. 2023.
[34] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
[35] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in Proc. Int. Conf. Mach. Learn., 2018, pp. 1861–1870.
[36] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in Proc. 35th Int. Conf. Mach. Learn., Stockholm, Sweden, 2018, pp. 1–10.
[37] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms," 2017. [Online]. Available: https://arxiv.org/abs/1708.07747
[38] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Univ. Toronto, Toronto, ON, Canada, Tech. Rep., 2009.
[39] J. van der Hooft et al., "4G/LTE bandwidth logs," 2016. [Online]. Available: http://users.ugent.be/jvdrhoof/dataset-4g/
[40] D. Lopez-Perez, A. Valcarce, G. de la Roche, and J. Zhang, "OFDMA femtocells: A roadmap on interference avoidance," IEEE Commun. Mag., vol. 47, no. 9, pp. 41–48, Sep. 2009, doi: 10.1109/MCOM.2009.5277454.

Pan Zhou (Senior Member, IEEE) received the B.S. degree in the Advanced Class from the Huazhong University of Science and Technology (HUST), Wuhan, China, and the M.S. degree from the Department of Electronics and Information Engineering, HUST, in 2006 and 2008, respectively. He received the Ph.D. degree from the School of Electrical and Computer Engineering, Georgia Institute of Technology (Georgia Tech), Atlanta, GA, USA, in 2011. He is currently a Full Professor and Ph.D. Advisor with the Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, HUST. He held an honorary degree in his bachelor's study and received a merit research award of HUST in his master's study. He was a Senior Technical Member with Oracle Inc., Austin, TX, USA, from 2011 to 2013, and worked on Hadoop and distributed storage systems for Big Data analytics with the Oracle Cloud Platform. He has published more than 170 refereed papers in international leading journals and key conferences in the areas of security and privacy, Big Data analytics, machine learning, mobile computing and networks, including IEEE/ACM TRANSACTIONS ON NETWORKING, IEEE TKDE, IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, TIFS, IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE TRANSACTIONS ON MOBILE COMPUTING, TPDS, TIT, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, TCOMP, IEEE TRANSACTIONS ON MULTIMEDIA, TII, TAI, IEEE TRANSACTIONS ON COMMUNICATIONS, IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, ICDE, INFOCOM, CVPR, ICCV, ICDCS, ICPP, ACM MM, TOS, TKDD, AAAI, IJCAI, NAACL, COLING, PoPETs/PETS, SECON, CIKM, and ECAI. He was the recipient of the Rising Star in Science and Technology of HUST in 2017. He is an Associate Editor of IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING.