

Time-Sensitive Federated Learning With Heterogeneous Training Intensity: A Deep Reinforcement Learning Approach

Weijian Pan, Xiumin Wang, Pan Zhou, Senior Member, IEEE, and Weiwei Lin, Member, IEEE

Manuscript received 27 April 2023; revised 18 September 2023; accepted 21 October 2023. Date of publication 8 January 2024; date of current version 27 March 2024. This work was supported in part by the National Natural Science Foundation of China under Grants 62072187 and 61972448, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022B1515020015, in part by the Major Key Project of PCL under Grant PCL2023AS7-1, and in part by the Guangzhou Development Zone Science and Technology Project under Grant 2021GH10. (Corresponding author: Xiumin Wang.)

Weijian Pan, Xiumin Wang, and Weiwei Lin are with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China (e-mail: [email protected]; [email protected]; [email protected]).

Pan Zhou is with the Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China (e-mail: [email protected]).

Recommended for acceptance by Y. Yuan.
Digital Object Identifier 10.1109/TETCI.2023.3345366

Abstract—Federated learning (FL) has recently received considerable attention because of its capability of collaboratively training machine learning models without exposing data privacy. Most existing FL schemes assume fixed or predetermined local training intensities/iterations at clients for each communication round, which neglects the effect of local training intensity determination on the performance of FL. Besides that, in traditional FL, the clients are assigned the same number of training iterations. In this context, a client with low computation or communication capability may slow down the global model aggregation, which causes high waiting latency at the other clients. To address these issues, this paper proposes a novel Time-sensitive FL mechanism with Heterogeneous Training Intensity at clients, named TFL_HTI. Specifically, we first explore the bounded convergence rate of FL with heterogeneous training iterations. Then, we design a Deep Reinforcement Learning (DRL) approach to determine the overall training intensity of clients in each communication round. Based on this, we further design an optimal deterministic algorithm to assign the appropriate local iterations to clients based on their training capabilities. Finally, we conduct simulations to demonstrate the effectiveness of our proposed scheme.

Index Terms—Federated learning, time-sensitive, heterogeneous training, deep reinforcement learning.

I. INTRODUCTION

WITH the development of the Internet of Things (IoT), a large volume of data has been generated at network edges [1], [2]. Generally, these Big Data can be efficiently used to train machine learning models and obtain useful information, including environment detection, future event prediction, etc. For instance, real-time traffic information can be trained to enable intelligent driving applications [3], [4], [5]. However, due to limited network bandwidth and privacy concerns, it is impractical to upload all these Big Data from edge devices to a remote server for further training. In this context, traditional machine learning mechanisms that heavily rely on data collection in a centralized manner cannot support these Big Data applications well.

To address the aforementioned issues, Federated Learning (FL), also known as federated optimization, has been proposed by Google [6], [7], [8]. Roughly speaking, FL allows multiple edge devices/clients to cooperatively train machine learning models based on their local datasets, without exposing the data privacy of clients. Particularly, [6] proposes a well-known FL algorithm, named FedAvg, which repeatedly involves local model training at clients and global model aggregation at the centralized server. FL protects data owners' privacy because their data never leave their clients, which makes it a good fit for Big Data training in IoT. Since then, FL has been implemented in many areas. For example, the Google keyboard uses FL to enhance next-word predictions [9], while [10] utilizes FL to predict hospitalizations in the medical field.

To optimize the performance of FL, a great deal of research has been carried out to boost the training accuracy [11], [12], [13], reduce the communication overhead [14], [15], [16], [17], or incentivize the clients in FL [18]. Particularly, most FL schemes assume that the server has to wait for all the clients to finish their pre-determined local training iterations before global model aggregation. However, such a synchronous training method may bring several issues. Firstly, it is nontrivial to accurately measure the proper training intensity of clients during each communication round. Generally, higher training intensity in each communication round can reduce the communication overhead, which however may result in a biased model. Alternatively, simply lowering the training intensity in each communication round may enlarge the whole training time of FL. Secondly, due to the heterogeneity of hardware settings [19], the clients with the lowest computation or communication capability may drag down the whole FL training. An intuitive way to alleviate such a straggler problem is to allocate heterogeneous local training iterations to clients according to their training capabilities. However, it is still challenging to know the exact training capability of



clients and determine the appropriate training intensity in each communication round, so as to optimize both training accuracy and training time.

The work in [20] introduces a new self-adaptive federated framework, named FedSAE, which modifies the affordable training intensity for each client by leveraging the historical training information of clients. In order to achieve a trade-off between minimizing the training time gap of clients and increasing the convergence gain, the work in [21] designs FedAda to determine the number of local iterations for clients. A new FL protocol called FedCS has been proposed in [22], which actively manages clients according to their resource conditions. A Tier-based FL system (TiFL) has also been proposed in [23], which divides clients into multiple tiers according to their training performance. The clients belonging to the same tier will be chosen in the same communication round. All these works indeed alleviate the heterogeneity issue of clients to a certain extent, while neglecting the impact of the exact value of training intensity on the performance of FL, e.g., training accuracy and time.

Motivated by the above observations, this paper seeks to improve the existing FL schemes by taking into account the following issues: 1) time-sensitivity, which means the whole training time of FL is limited by a time budget; 2) heterogeneity, which indicates that the clients are heterogeneous in their computation and communication capabilities; 3) dynamic context, which represents that the training capabilities of clients, e.g., the network condition, vary over time and are unknown in advance; 4) adaptive training intensity, which means that the local iterations assigned to clients are adaptively determined in each communication round. Although the heterogeneity issue of FL has already been studied in the literature, existing works mainly use the strategy of choosing the right clients [12], [24], or asynchronous training [25], [26]. Nevertheless, frequently choosing the same clients may result in a biased model, and training in an entirely asynchronous manner could make certain local models out of date. The work most related to ours assigns heterogeneous training intensity to clients based on their training capability [20], [27]. Nevertheless, none of these works studies how to determine the exact value of training intensity and balance the tradeoff between training intensity and time.

Considering the above issues, this work presents a novel time-sensitive FL scheme, which adaptively determines the overall training intensity of clients in each communication round, and heterogeneously allocates it to each client. We refer to such a Time-sensitive FL with Heterogeneous Training Intensity problem as the TFL_HTI problem. The main contributions of this paper can be summarized as follows.
• To optimize the training accuracy and meet the time constraint of FL, we formulate a time-sensitive FL optimization problem, which targets determining and allocating the heterogeneous training intensities to clients in each communication round.
• The theoretical convergence bound is explored for the proposed FL with heterogeneous training iterations of clients.
• To consider the dynamic context of FL, we construct a Deep Reinforcement Learning (DRL) model, in which the agent takes actions on determining the suitable overall training intensity of clients in each communication round.
• An optimal deterministic algorithm is then proposed to assign the proper local iterations to clients, so as to reduce the overall training time in the current communication round.
• We conduct simulations based on two commonly used datasets to verify the effectiveness of the proposed scheme, i.e., improving the training accuracy and accelerating model convergence.

The remainder of this paper is organized as follows. Section II introduces the system model and formulates the problem. The detailed algorithm is designed in Section III. Section IV describes simulation results. Finally, we conclude our work in Section V.

II. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, we first provide a quick overview of federated learning. Next, we introduce the motivation for designing heterogeneous federated learning. Furthermore, we formulate the problem. Finally, we analyze the theoretical convergence bound of FL with heterogeneous local training iterations. To ease understanding, the primary notations used in this research are summarized in Table I.

TABLE I: Descriptions of the primary notations. [The table body is not recoverable from the extraction.]

A. Overview of Federated Learning

We consider a general federated learning system consisting of a set C = {c_1, c_2, ..., c_N} of mobile devices. In this system, all mobile devices are connected to a centralized server. Each client c_i holds a local dataset D_i, whose size is denoted by D_i. The j-th data sample in D_i can be represented by (x_j^i, y_j^i), where


the vector x_j^i is the input of the training model and y_j^i is the desired output of the training model. A typical federated learning process consists of multiple communication rounds. In each communication round, the centralized server selects a certain number of clients to participate in training, and distributes the latest global model parameter w to the clients. Then, each selected client trains its local model using a gradient-descent algorithm based on its local dataset, and uploads its local training model to the server. After receiving all the local models, the server aggregates and updates the global model. The above procedure repeats until a specified number of rounds is completed or a target accuracy is achieved.

To formulate the training process, we use f_i((x, y), w) to denote the loss function of client c_i on its parameter vector w and sample (x, y), and F_i(w) to denote the loss function of client c_i on dataset D_i. Similar to previous works [6], [7], [8], F_i(w) can be updated as follows,

F_i(w) = \frac{\sum_{(x_j^i, y_j^i) \in D_i} f_i((x_j^i, y_j^i), w)}{D_i}.    (1)

The global loss function over all the distributed datasets can be formulated into

F(w) = \frac{\sum_{i=1}^{N} D_i F_i(w)}{\sum_{i=1}^{N} D_i}.    (2)

The problem is to minimize (2) with respect to w.
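For illustration, the dataset-size-weighted aggregation behind (2) can be sketched in a few lines of Python. This is a minimal sketch on toy values; the model vectors, dataset sizes, and the function name aggregate are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def aggregate(local_models, dataset_sizes):
    """Weighted average of local models per (2): each client's
    contribution is weighted by its dataset size D_i."""
    total = sum(dataset_sizes)
    return sum(D_i / total * w_i for w_i, D_i in zip(local_models, dataset_sizes))

# Toy example: three clients with different dataset sizes.
models = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [600, 500, 400]
global_model = aggregate(models, sizes)
print(global_model)  # element-wise weighted mean of the local models
```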
Fig. 1. Impact of training intensity for each communication round.
Fig. 2. Motivation of Heterogeneous FedAvg.

B. Motivation on Heterogeneous Training Intensities

Most existing FL algorithms, such as FedAvg [6], assume fixed local training iterations/intensities of clients for each communication round. However, few works study how to determine the exact value of the training intensities.

To investigate the impact of training intensity on the performance of FL, we conduct a simple simulation based on FedAvg under the CIFAR-10 dataset. In the simulation, we uniformly distribute the training data to N = 100 clients, and then run FedAvg to train a CNN network within a 3000 s time constraint. As shown in Fig. 1, we study 5 settings of local training iterations for each client, ranging from 1 to 9 per communication round. It is shown that FedAvg with 1 iteration performs the worst. With the increment of local iterations, the accuracy increases first but then decreases. It is observed that setting too many or too few local iterations may not achieve the best FL efficiency. Instead, FedAvg with 5 local training iterations for each client achieves the highest training accuracy. Therefore, allocating adequate local training iterations for each communication round is one of the most critical issues in FL.

Besides that, due to the heterogeneity of clients, previous works that allocate the same number of training iterations to each client may bring a straggler problem, i.e., the global model aggregation is slowed down by the client with the lowest computation and communication capability. To address the above problem, an intuitive solution is to give clients different training intensities based on their training capabilities. To verify this, we revise the traditional FedAvg by assigning more training iterations to clients with higher computation or communication capabilities. Since Fig. 1 shows FedAvg with 5 iterations performs the best, we compare the time efficiency of our revised FedAvg (referred to as Heterogeneous FedAvg) to that of FedAvg with 5 iterations. As shown in Fig. 2, the accuracy achieved by Heterogeneous FedAvg is much better than that of FedAvg with 5 iterations. Therefore, assigning appropriate training intensities to clients is also important for improving the performance of FL.

C. Problem Formulation

Inspired by the above observations, an efficient FL scheme should appropriately address the following questions:
• How to determine the exact training intensity for each communication round, so as to improve the training accuracy of FL?
• Given the training intensity of each communication round, how to assign appropriate local iterations to clients, according to their heterogeneity in computation and communication capabilities?

Without loss of generality, we define τ^k as the total training intensity in round k, and τ_i^k as the local iterations given to client c_i in the k-th communication round.

Suppose that in each communication round, M clients are chosen randomly to participate in training for fairness. Denote C^k as the group of chosen clients in communication round k.


For each client c_i ∈ C^k, we use f_i^k, δ_i and B_i^k to denote the CPU frequency used for FL in the k-th round, the CPU cycles consumed for a single data sample, and the communication bandwidth in the k-th round, respectively. It is worth noting that f_i^k and B_i^k are two parameters indicating the training capability of client c_i. After client c_i finishes its local training, it can upload these two parameters as well as its local training model. Then, the server can use them to predict the clients' training capabilities for the next round. According to [14], the computation time of client c_i per unit local iteration, denoted by t_{i,cmp}^k, can be formulated as

t_{i,cmp}^k = \frac{\delta_i D_i}{f_i^k}.    (3)

After local training, each client uploads its training model and some related parameters to the server for global model aggregation. Since the downlink bandwidth is much larger than that of the uplink, the model downloading time is negligible compared with the uploading time [29]. We consider an orthogonal frequency division multiple access (OFDMA [40]) technique for the clients' uplink transmission. Then, the communication time of client c_i in communication round k can be calculated by

t_{i,com}^k = \frac{\xi}{B_i^k \ln\left(1 + \frac{\rho_i h_i}{N_0}\right)},    (4)

where ξ is the size of the clients' model parameters, ρ_i is the transmission power, h_i is the channel gain, and N_0 is the background noise. We further use T_i^k to represent the i-th client's entire training time for communication round k, which includes both computation and communication times. Therefore, we have

T_i^k = \tau_i^k t_{i,cmp}^k + t_{i,com}^k = \frac{\tau_i^k \delta_i D_i}{f_i^k} + \frac{\xi}{B_i^k \ln\left(1 + \frac{\rho_i h_i}{N_0}\right)}.    (5)

In a synchronized FL system, the global model aggregation starts only when the server receives the updates from all participating clients. Hence, the training time of communication round k, denoted by T^k, can be calculated by

T^k = \max_{c_i \in C^k} \{T_i^k\}.    (6)

It is worth noting that the value of T^k heavily depends on the slowest client in C^k.
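A minimal sketch of the per-round timing model in (3)-(6): the numeric values below are arbitrary placeholders, and the dictionary field names (delta, f, B, etc.) are our own stand-ins for the symbols above.

```python
import math

def round_time(clients, tau):
    """Synchronous round time T^k per (6): the maximum over clients of
    tau_i * t_cmp + t_com, using eqs. (3)-(5)."""
    times = []
    for c in clients:
        t_cmp = c["delta"] * c["D"] / c["f"]  # eq. (3): seconds per local iteration
        t_com = c["xi"] / (c["B"] * math.log(1 + c["rho"] * c["h"] / c["N0"]))  # eq. (4)
        times.append(tau[c["id"]] * t_cmp + t_com)  # eq. (5)
    return max(times)  # eq. (6): the slowest client dominates

clients = [
    {"id": 0, "delta": 20, "D": 5e6, "f": 0.5e9, "xi": 5e6, "B": 10e6, "rho": 1, "h": 1, "N0": 1},
    {"id": 1, "delta": 15, "D": 5e6, "f": 1.0e9, "xi": 5e6, "B": 20e6, "rho": 1, "h": 1, "N0": 1},
]
print(round_time(clients, tau={0: 3, 1: 7}))
```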
Our goal is to determine the proper training intensity for each client according to the information of the client's computation and communication capabilities, so as to minimize the loss function within a time budget. Our training intensity determination problem can thus be formulated as follows,

P1: \min_{\tau_i^k} F(w^K)
subject to
\sum_{k=1}^{K} T^k \le T_m,    (7)
\tau_i^k > 0, \quad \forall k, \forall c_i \in C^k,    (8)

where T_m denotes the FL system's training time budget, and K is the total number of communication rounds. In the above formulation, the constraint in (7) indicates that the time used during the K rounds should not exceed the time budget. To guarantee the training quality, each selected client in the k-th communication round should execute at least one local iteration, as shown in (8). We refer to such a Time-sensitive Federated Learning with Heterogeneous Training Intensities allocation problem as the TFL_HTI problem.

D. Convergence Analysis

In this subsection, we analyze the theoretical convergence bound of FL with heterogeneous training intensity. Firstly, we make the following assumptions, which have been applied frequently in previous works [13], [30], [31], [32], [33].

Assumption 1: (L-Lipschitz Continuous Gradient) There is a constant L > 0 such that ||∇F_i(x) − ∇F_i(y)|| ≤ L||x − y||, ∀x, y ∈ R^d.

Assumption 2: (Unbiased Local Gradient Estimator) Let ξ_i^k represent a random local data sample in the k-th step at the i-th client. The local gradient estimator is unbiased, i.e., E[∇F_i(w_i^k, ξ_i^k)] = ∇F_i(w_i^k), ∀c_i ∈ C, where the expectation is based on the samples from all local datasets.

Assumption 3: (Bounded Local Variance) There exists a constant σ, such that the variance of each local gradient estimator is bounded by E[||∇F_i(w_i^k, ξ_i^k) − ∇F_i(w_i^k)||^2] ≤ σ^2, ∀c_i ∈ C.

To simplify the convergence analysis, we first present an important theorem, which is given by [33].

Theorem 1: The mean square gradient after K communication rounds of FL with heterogeneous training intensity satisfies

\frac{1}{K} \sum_{k=0}^{K-1} ||\nabla F(w^k)||^2 \le \frac{2(F(w^0) - F(w^*))}{\eta \alpha \tau K} + \frac{L \eta \alpha (2 - \gamma) \sigma^2}{N} + L^2 \eta^2 \tau \sigma^2,    (9)

where w^0 is the initial model, γ_i^k is the compression ratio of the top-k compression operator in round k, α_i^k is the aggregation weight for client c_i in round k, τ = max{τ_i^k}, γ = max{γ_i^k}, α = max{α_i^k}, and the local learning rate η satisfies

\tau^2 L^2 \eta^2 + (2 - \gamma) \eta \alpha L \tau \le 1.    (10)

Based on Theorem 1 and the above assumptions, we derive the convergence rate of our FL as follows.

Corollary 1: If we choose the local learning rate η = \frac{c}{\sqrt{K} \tau L}, the convergence rate can be bounded as

\frac{1}{K} \sum_{k=0}^{K-1} ||\nabla F(w^k)||^2 \le \frac{2L(F(w^0) - F(w^*))}{c \sqrt{K}} + \frac{c \sigma^2}{\sqrt{K} \tau N} + \frac{c^2 \sigma^2}{K \tau},    (11)


where c is a constant coefficient.

Proof: In our algorithm, clients upload models without compression, so the compression ratio γ_i^k = 1. Moreover, the model aggregation weight is the same for all clients, i.e., α_i^k = 1. According to (9), we get

\frac{1}{K} \sum_{k=0}^{K-1} ||\nabla F(w^k)||^2 \le \frac{2(F(w^0) - F(w^*))}{\eta \tau K} + \frac{L \eta \sigma^2}{N} + L^2 \eta^2 \tau \sigma^2.    (12)

If we set the local learning rate η = \frac{c}{\sqrt{K} \tau L}, we obtain (11), which thus proves the corollary. □

The bound on the convergence rate in (11) contains three terms. The first and the second terms decrease at the rate O(1/\sqrt{K}) as K increases. The third term shrinks at the rate O(1/K) as K increases. With Corollary 1, the TFL_HTI problem can achieve a convergence rate of O(1/\sqrt{K}), indicating that the solution method to the TFL_HTI problem can contribute to speeding up the training of FL without loss of convergence performance.
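The decay rates of the three terms in (11) are easy to tabulate numerically. The snippet below only illustrates the stated rates; the constants c, L, σ, and the optimality gap F_gap are arbitrary assumed values.

```python
import math

def bound_terms(K, tau, N, c=1.0, L=1.0, sigma=1.0, F_gap=1.0):
    """The three right-hand-side terms of (11)."""
    t1 = 2 * L * F_gap / (c * math.sqrt(K))       # O(1/sqrt(K))
    t2 = c * sigma**2 / (math.sqrt(K) * tau * N)  # O(1/sqrt(K))
    t3 = c**2 * sigma**2 / (K * tau)              # O(1/K)
    return t1, t2, t3

for K in (10, 100, 1000):
    # The first two terms shrink about 3.2x per decade of K, the third about 10x.
    print(K, bound_terms(K, tau=5, N=10))
```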
III. ALGORITHM DESIGN OF THE TFL_HTI PROBLEM

Due to the hardness of formulating the loss function as impacted by training intensity and time budget, it is difficult to calculate the proper training intensity of clients. To address this issue, we firstly simplify the original problem. Then, we design algorithms to solve the problem.

A. Problem Simplification

As discussed above, it is nontrivial to optimize our original problem P1 directly. To simplify the problem, we divide the original training intensity determination procedure into two steps. In the first step, the overall training intensity of clients for communication round k, denoted by τ^k, is determined by the server, referred to as subproblem P2. Based on this, in the second step, the server allocates these τ^k training intensities to clients, referred to as subproblem P3.

We first study how to determine the overall training intensity τ^k for communication round k, which is a subproblem of problem P1 and can be formulated as:

P2: \min_{\tau^k} F(w^K),
subject to
\sum_{k=1}^{K} T^k \le T_m,    (13)
\tau^k \ge M, \quad \forall k,    (14)

where M denotes the total number of clients selected in each communication round. Equation (14) is converted from the constraint in (8).

Given the solution of problem P2, we then allocate these τ^k intensities to clients to minimize the maximum of the clients' training time, which can be formulated as:

P3: \min_{\tau_i^k} T^k,
subject to
\sum_{c_i \in C^k} \tau_i^k = \tau^k, \quad \forall k,    (15)
\tau_i^k > 0, \quad \forall k, \forall c_i \in C^k.    (16)

The constraint in (16) indicates that each selected client must participate in FL.

In the following subsections, we respectively design algorithms to determine the overall training intensity, i.e., addressing problem P2, and to allocate the appropriate local training iterations to each client, i.e., addressing problem P3.

B. Overall Training Intensity Determination Based on DRL

To determine the overall training intensity for each communication round (i.e., problem P2), the server needs to know the training capabilities of clients. However, in a practical dynamic FL context, the capabilities of clients, especially their network conditions, are unknown in advance and vary over time. In this context, it is inefficient to design a deterministic algorithm to solve the problem. Fortunately, DRL algorithms have shown great potential in learning an optimal action policy with uncertain or unstable information.

1) DRL Model Design: In an unstable environment, the DRL algorithm has been widely utilized to learn and improve an action policy using historical data [34], [36]. In order to calculate the overall training intensity τ^k in the upcoming communication round k, we now build a DRL model for problem P2.

We build a DRL system in the centralized server. The DRL system has an agent interacting with the FL environment during the training rounds. In each communication round k, the policy network in the agent receives a state s_k and outputs the probability of some actions, from which an action a_k will be chosen. After executing action a_k, the agent receives the next state s_{k+1} and a reward r_k. The accumulated reward is R_k = \sum_{i=0}^{K-k} \gamma^i r_{k+i}, where γ is a discount factor. The objective of the DRL agent is to maximize the expected accumulated reward for each state s_k. The following is a detailed description of our DRL system's state, action, and reward.

State: In the DRL model, the system records the selected clients' computation and communication capabilities and then calculates P^k = \{t_{i,cmp}^k / T_m \mid c_i \in C^k\} and M^k = \{t_{i,com}^k / T_m \mid c_i \in C^k\} in the k-th communication round. Then, we define the state of communication round k as follows

s_k = (F_{k-1}, H_{k-1}, P^k, M^k).    (17)

Here, H_{k-1} = \sum_{k'=1}^{k-1} T^{k'} / T_m is the percentage of the time budget consumed before round k, and F_{k-1} denotes the global model loss before round k.

Action: Given the current state s_k, the DRL agent chooses an action based on a policy, which can be expressed by a probability

distribution π(a|s) over the whole action space. In the k-th communication round, the goal of our DRL algorithm is to determine the overall training intensity τ^k, so as to balance the model quality and training time. Therefore, the action is denoted by

a_k = \tau^k.    (18)

Reward: Given the action a_k, the server then uses an optimal algorithm to allocate local iterations to clients, which will be described in the next subsection. Each client trains its local model based on the received number of local iterations. After that, the DRL agent can get the latest information about the FL system and calculate the reward, with

r_k = \alpha_1 (F_{k-1} - F_k) - (H_k - H_{k-1}),    (19)

where α_1 is a positive constant used to ensure that r_k increases with the reduced global loss. In addition, the longer the training time communication round k takes, the less reward the DRL agent will receive. Parameter α_1 can also be interpreted as a normalization factor to strike the balance between loss reduction and time consumption.
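A sketch of how a server might assemble the state in (17) and the reward in (19); the function names and the value α_1 = 10 are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def build_state(F_prev, H_prev, t_cmp, t_com, T_m):
    """State s_k = (F_{k-1}, H_{k-1}, P^k, M^k) per (17), with the per-client
    compute/communication times normalized by the time budget T_m."""
    P_k = np.asarray(t_cmp) / T_m
    M_k = np.asarray(t_com) / T_m
    return np.concatenate(([F_prev, H_prev], P_k, M_k))

def reward(F_prev, F_cur, H_prev, H_cur, alpha1=10.0):
    """Reward r_k per (19): reward loss reduction, penalize elapsed time."""
    return alpha1 * (F_prev - F_cur) - (H_cur - H_prev)

s_k = build_state(F_prev=2.3, H_prev=0.10, t_cmp=[0.2, 0.1], t_com=[0.7, 0.4], T_m=2000)
r_k = reward(F_prev=2.3, F_cur=2.1, H_prev=0.10, H_cur=0.12)
print(s_k.shape, round(r_k, 3))  # (6,) 1.98
```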
2) SAC-Based Solution Method: We train our DRL model using the Soft Actor-Critic method (SAC [35]), as SAC has advantages in terms of exploration effectiveness, convergence properties and sample complexity [36]. SAC is an off-policy deep RL algorithm based on the maximum entropy RL framework, where the actor network tries to maximize not only the expected sum of rewards but also the entropy. The goal of the SAC algorithm is to find an optimal policy π* using

\pi^*(\cdot|s_k) = \arg\max_{\pi} \mathbb{E}_{\pi}\left[ R_k + \alpha_2 H(\pi(\cdot|s_k)) \right]    (20)

with

R_k = r_k + \gamma r_{k+1} + \cdots + \gamma^{K-k} r_K, \quad k \le K,    (21)

where γ ∈ [0, 1] represents the discount factor, and K denotes the total number of communication rounds in FL. H(π(·|s_k)) is the entropy of the policy at state s_k, and α_2 is a nonnegative constant that balances exploration and exploitation.

Value Network Update: By minimizing the squared residual error between the target value function and the soft value function, the value network is updated as follows,

J_V(\psi) = \mathbb{E}_{s_k \sim D}\left[ \frac{1}{2}\left(V_\psi(s_k) - V_{soft}(s_k)\right)^2 \right],    (22)

with

V_{soft}(s_k) = \mathbb{E}_{a_k \sim \pi}\left[ Q_{soft}(s_k, a_k) - \alpha_3 \log \pi_\phi(a_k|s_k) \right].    (23)

In (22), D represents the experience pool, and α_3 is a temperature parameter which can be automatically adjusted in (23).

Soft-Q Network Update: The soft Q function can be calculated by minimizing

J_Q(\theta) = \mathbb{E}_{(s_k, a_k) \sim D}\left[ \left(Q_\theta(s_k, a_k) - \hat{Q}(s_k, a_k)\right)^2 \right],    (24)

with

\hat{Q}(s_k, a_k) = R(s_k, a_k) + \gamma \mathbb{E}_{s_{k+1} \sim p}\left[ V_{\hat{\psi}}(s_{k+1}) \right],    (25)

where \hat{\psi} is the parameter of the target value network, and it is softly updated with the value network as follows,

\hat{\psi} = \beta \psi + (1 - \beta) \hat{\psi},    (26)

where 0 < β ≤ 1.

Policy Network Update: The expected Kullback-Leibler (KL) divergence can be minimized to train the policy network parameters as follows,

J_\pi(\phi) = \mathbb{E}_{s_k \sim D}\left[ D_{KL}\left( \pi_\phi(\cdot|s_k) \,\Big\|\, \frac{\exp(Q_\theta(s_k, \cdot))}{Z_\theta(s_k)} \right) \right],    (27)

where Z_θ(s_k) normalizes the distribution.

Algorithm 1: SAC-Based Algorithm for Overall Training Intensity. [The pseudocode of Algorithm 1 is not recoverable from the extraction.]

As presented in Algorithm 1, we can then get the total number of iterations in the k-th communication round, τ^k.

We now analyze the time complexity of the above DRL algorithm.

Lemma 1: Using a Multilayer Perceptron (MLP) as the actor network, the computational complexity of Algorithm 1 is O(d \sum_{j=1}^{l-1} n_j n_{j+1}), where d is the batch size, n_j is the number of neurons in the j-th layer, and l is the total number of layers.

Proof: The time complexity of the proposed DRL algorithm mainly depends on the computational complexity of the back-propagation and feed-forward pass procedures of the neural network, which can be derived from the number of real additions C_A and real multiplications C_M. It is worth noting that each activation function can be simply regarded as consuming one real addition. So the computational complexity is given as follows,

C_A = d \sum_{j=1}^{l-1} n_j n_{j+1},    (28)

C_M = d \left( \sum_{j=1}^{l-1} n_j n_{j+1} + \sum_{j=1}^{l-1} n_{j+1} \right).    (29)

To sum up, the overall time complexity of Algorithm 1 is O(C_A + C_M) = O(d \sum_{j=1}^{l-1} n_j n_{j+1}), which proves the lemma. □
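For concreteness, the updates in (22)-(27) above can be sketched in PyTorch for a discrete action space (the candidate overall intensities). This is a simplified single-step illustration on random data, not the authors' Algorithm 1; the network sizes, temperature α_3, discount γ, and soft-update rate β are assumed values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, GAMMA, ALPHA3, BETA = 22, 31, 0.99, 0.2, 0.005

value_net        = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
target_value_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
q_net            = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
policy_net       = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

# A random minibatch standing in for samples from the experience pool D.
s, a = torch.randn(8, STATE_DIM), torch.randint(N_ACTIONS, (8,))
r, s_next = torch.randn(8), torch.randn(8, STATE_DIM)

probs = F.softmax(policy_net(s), dim=-1)
log_probs = torch.log(probs + 1e-8)

# (22)-(23): value loss against the soft value E_a[Q - alpha3 * log pi].
v_soft = (probs * (q_net(s) - ALPHA3 * log_probs)).sum(dim=-1)
value_loss = 0.5 * (value_net(s).squeeze(-1) - v_soft.detach()).pow(2).mean()

# (24)-(25): soft-Q loss against the bootstrapped target r + gamma * V_target(s').
q_target = r + GAMMA * target_value_net(s_next).squeeze(-1)
q_loss = (q_net(s).gather(1, a.unsqueeze(1)).squeeze(1) - q_target.detach()).pow(2).mean()

# (27): policy loss; the KL between pi and exp(Q)/Z, with log Z dropping from the gradient.
policy_loss = (probs * (ALPHA3 * log_probs - q_net(s).detach())).sum(dim=-1).mean()

# (26): soft update of the target value network with beta << 1.
for p, tp in zip(value_net.parameters(), target_value_net.parameters()):
    tp.data.copy_(BETA * p.data + (1 - BETA) * tp.data)

print(value_loss.item(), q_loss.item(), policy_loss.item())
```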


C. Local Training Iteration Allocation

To solve problem P3, in this subsection we first propose an optimal algorithm that uses the information of the clients' computation and communication capabilities. Then, we evaluate the effectiveness of the given algorithm.

1) Algorithm Design: We use M to represent the number of clients taking part in the k-th communication round, i.e., M = |C^k|. We need to reduce the training time of the slowest client in order to minimize the maximum training time over all clients. One naive way is to enumerate a variable T ∈ [T_min, T_max]. Here, T_min and T_max are the lower and upper bounds of all possible training times of clients, respectively, i.e.,

T_{min} = \max_{c_i \in C^k} \{t_{i,cmp}^k + t_{i,com}^k\},    (30)

T_{max} = \max_{c_i \in C^k} \{t_{i,cmp}^k \tau^k + t_{i,com}^k\}.    (31)

We try to allocate τ^k to clients so that all training times of clients are no more than T, i.e., T_i^k ≤ T. We can then test whether the training intensity τ^k can be allocated to the clients in C^k or not. In this way, we can find the minimal T^k with the corresponding local training intensity allocation.

In fact, this naive way has very high time complexity, since T_max − T_min may be a very large real number. Therefore, we use binary search to speed up the process. The procedure of our algorithm, sketched in code after this list, is as follows.
• We maintain an interval [l, r] and initialize l = T_min, r = T_max.
• We set T^k = (l + r)/2 as a new upper bound on the training time of all clients. That means our allocation should satisfy

T_i^k \le T^k, \quad \forall c_i \in C^k.    (32)

According to (5), we can get the upper bound of τ_i^k as follows,

\tau_i^{upper}(T^k) = \left\lfloor \left( T^k - \frac{\xi}{B_i^k \ln\left(1 + \frac{\rho_i h_i}{N_0}\right)} \right) \frac{f_i^k}{\delta_i D_i} \right\rfloor, \quad \forall c_i \in C^k.    (33)

• If the sum of all upper bounds is no less than the overall training intensity, i.e.,

\sum_{c_i \in C^k} \tau_i^{upper}(T^k) \ge \tau^k,    (34)

it means that we can allocate the training intensities to clients so as to keep their training time within T^k. In this context, the constraint in (32) is satisfied, which means the training time in communication round k is within T^k. In this case, we set r = T^k.
• Alternatively, if the sum of the upper bounds is less than the overall training intensity, we cannot find a feasible allocation that keeps the training time within the upper bound T^k, because the remaining training intensities, i.e., τ^k − \sum_{c_i \in C^k} \tau_i^{upper}(T^k), must still be allocated to clients. Therefore, we set l = T^k.
• Repeat the above procedure until l and r are approximately the same, i.e., r − l ≤ ε, where ε is a small positive real number. Then we have found the minimal T^k.

We assign the training intensities to the clients in C^k after knowing the minimal T^k. For every client c_i ∈ C^k, we first assign one local iteration to it. We then allocate the remaining training intensity to clients until their local iterations approach their upper bounds or all the training intensity has been distributed. The details of this algorithm are shown in Algorithm 2.

Algorithm 2: Training Intensities Allocation Algorithm Based on Binary Search. [The pseudocode of Algorithm 2 is not recoverable from the extraction.]
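A runnable Python sketch of the binary-search allocation described above (cf. Algorithm 2 and eqs. (30)-(34)); the client timing values are illustrative assumptions, and t_cmp/t_com stand in for t_{i,cmp}^k and t_{i,com}^k.

```python
import math

def allocate_intensity(clients, tau_k, eps=1e-3):
    """Find the smallest round time T at which the per-client caps in (33)
    can absorb the overall intensity tau^k, then hand out iterations.
    Assumes tau_k >= len(clients), per constraint (14)."""
    def cap(c, T):  # eq. (33): max iterations finishing within T
        return max(0, math.floor((T - c["t_com"]) / c["t_cmp"]))

    lo = max(c["t_cmp"] + c["t_com"] for c in clients)          # eq. (30)
    hi = max(c["t_cmp"] * tau_k + c["t_com"] for c in clients)  # eq. (31)
    while hi - lo > eps:                                        # shrink [l, r]
        mid = (lo + hi) / 2
        if sum(cap(c, mid) for c in clients) >= tau_k:          # feasibility test (34)
            hi = mid
        else:
            lo = mid

    # Give each client one iteration, then fill up to the caps at T^k = hi.
    alloc = [1] * len(clients)
    remaining = tau_k - len(clients)
    for i, c in enumerate(clients):
        extra = min(remaining, cap(c, hi) - 1)
        alloc[i] += extra
        remaining -= extra
    return hi, alloc

clients = [{"t_cmp": 0.20, "t_com": 0.7}, {"t_cmp": 0.075, "t_com": 0.4}]
T_k, tau_i = allocate_intensity(clients, tau_k=10)
print(round(T_k, 3), tau_i)
```

On these toy inputs the search settles near the smallest round time at which the caps in (33) can absorb τ^k, and the slower client receives fewer iterations.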
2) Performance Analysis: We now evaluate the effectiveness of the proposed algorithm.

Theorem 2: The optimal solution to problem P3 is obtained by Algorithm 2.

Proof: We first prove that the allocation given by Algorithm 2 keeps the training time of each client no more than the T^k given by Algorithm 2. As shown in Lines 13 to 16 of Algorithm 2, there must be a feasible allocation satisfying (32). Algorithm 2 constructs a training intensity allocation ensuring that the training time of each client is within T^k.



Next, we prove that Algorithm 2 minimizes the training time of the slowest client. For the sake of simplicity, we assume that T* is the minimal training time specified by the optimal solution to problem P3. We have

T_{min} \le T^* \le T^k \le T_{max}.    (35)

Case 1 (T* = T^k): In this case, the proposed Algorithm 2 gives an optimal solution to problem P3.

Case 2 (T* < T^k): In the process of Algorithm 2, there must be an interval [l, r] satisfying

T_{min} \le T^* \le \frac{l+r}{2} < T^k \le T_{max},    (36)

\sum_{c_i \in C^k} \tau_i^{upper}\left(\frac{l+r}{2}\right) < \tau^k.    (37)

As T* is the optimal solution to problem P3, there must be an allocation \{\tilde{\tau}_i^k \mid \forall c_i \in C^k\} satisfying

\tilde{\tau}_i^k \le \tau_i^{upper}(T^*), \quad \forall c_i \in C^k,    (38)

\sum_{c_i \in C^k} \tilde{\tau}_i^k = \tau^k.    (39)

According to (33), we have

\tau_i^{upper}(T^*) \le \tau_i^{upper}\left(\frac{l+r}{2}\right).    (40)

Therefore, we have

\sum_{c_i \in C^k} \tau_i^{upper}\left(\frac{l+r}{2}\right) \ge \sum_{c_i \in C^k} \tau_i^{upper}(T^*) \ge \sum_{c_i \in C^k} \tilde{\tau}_i^k = \tau^k,    (41)

which contradicts (37). Therefore, this case is impossible.

To sum up, our proposed approach provides the best result in reducing the slowest client's training time in each communication round k. □

We further consider the computational complexity of Algorithm 2.

Lemma 2: The computational complexity of Algorithm 2 is O\left(M \log_2 \frac{T_{max} - T_{min}}{\epsilon}\right).

Proof: As Algorithm 2 uses binary search to maintain an interval containing the minimum of the training time in communication round k, the complexity of the search is O\left(\log_2 \frac{T_{max} - T_{min}}{\epsilon}\right). In each search step, we calculate the upper bound of the local iterations of each client, whose complexity is O(M). Therefore, the complexity of finding the minimal T^k is O\left(M \log_2 \frac{T_{max} - T_{min}}{\epsilon}\right). Given T^k, we then allocate τ^k to clients while ensuring that each τ_i^k is within its upper bound. This procedure's complexity is O(M).

To sum up, the overall computational complexity of Algorithm 2 is O\left(M \log_2 \frac{T_{max} - T_{min}}{\epsilon}\right), which proves the lemma. □

D. The Framework of TFL_HTI

To illustrate the interaction between FL and DRL, we use Fig. 3 to describe the framework of the proposed TFL_HTI mechanism.

In each communication round k, the DRL agent gets the state s_k from the FL environment and takes an action a_k based on Algorithm 1. Given τ^k = a_k, the server uses the proposed Algorithm 2 to allocate the appropriate local training iterations to clients. After receiving the training intensity τ_i^k and the aggregated model w^{k−1}, client c_i trains its model based on its own data and then uploads its updated model and some related parameters to the server. The server selects clients randomly for the next communication round, and predicts/updates its state s_k to s_{k+1}. It is worth noting that, in communication round k, the tuple (s_k, a_k, r_k, s_{k+1}, done) is stored in the experience pool D, from which the DRL agent randomly samples minibatch data to update the networks. The detailed process of TFL_HTI can also be seen in Algorithm 3.

Algorithm 3: Training of the TFL_HTI Algorithm. [The pseudocode of Algorithm 3 is not recoverable from the extraction.]


Fig. 3. Framework of TFL_HTI.

IV. PERFORMANCE EVALUATION

In this section, we use simulations to evaluate the effectiveness of the proposed scheme.

A. Experiment Setting

We construct an FL system based on two publicly available image datasets, Fashion-MNIST [37] and CIFAR-10 [38]. The Fashion-MNIST dataset (referred to as FMNIST) is an image dataset consisting of a training set of 60,000 samples and a test set of 10,000 samples. Each sample consists of a 28 × 28 grayscale image paired with a label from one of ten classes. The CIFAR-10 dataset contains 50,000 color images for training and 10,000 color images for testing over 10 different classes. Each image is a 32 × 32 RGB image.

In the simulation, we evenly distribute the training datasets to N = 100 clients, and thus each client has 600 data samples of FMNIST and 500 data samples of CIFAR-10. Similar to [24], for the FMNIST dataset, we train a convolutional neural network (CNN) model with two 5 × 5 convolution layers. The first layer has 32 output channels and the second has 64 output channels, with each layer followed by 2 × 2 max pooling, two fully connected layers, and a final output layer with 10 units. For the CIFAR-10 dataset, we construct a CNN model with two 5 × 5 convolution layers, with 64 and 128 channels in sequence, each of them activated by the ReLU function and followed by 2 × 2 max pooling, three fully connected layers, and a final output layer with 10 units.
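A PyTorch rendering of the FMNIST model as described above; the hidden fully connected widths (512 and 128) and the use of valid (no-padding) convolutions are our assumptions, since the paper does not state them.

```python
import torch
import torch.nn as nn

class FMNISTCNN(nn.Module):
    """CNN sketched from the paper's description: two 5x5 conv layers
    (32 then 64 channels), each followed by 2x2 max pooling, two fully
    connected layers, and a 10-unit output layer."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 24x24 -> 12x12
            nn.Conv2d(32, 64, 5), nn.ReLU(), nn.MaxPool2d(2),  # 12x12 -> 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 512), nn.ReLU(),  # assumed hidden width
            nn.Linear(512, 128), nn.ReLU(),         # assumed hidden width
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(FMNISTCNN()(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```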
Similar to [14], the size of data samples in users’ local dataset (15 episodes with 5 × 10−5 learning rate and 20 episodes with
Di is 5 MB, and the number of CPU cycles of client i consumed learning rate 5 × 10−6 ), the training gradually stabilizes and
to train one data sample is set to be uniformly distributed between reaches convergence.
[10,30] cycles/bit. The CPU cycle frequency fik of client ci We next test the effectiveness of our trained DRL model
ranges consistently from 0.1 GHz to 1.0 GHz. We also set the in improving the effectiveness of FL, by comparing the pro-
local model parameters’ data size  ξ to 5 MB.  For the sake of posed TFL_HTI scheme with and without DRL. Specifically,
ρn hn for TFL_HTI without DRL, we randomly set the total local
simplicity, we set the value of ln 1 + to 1.
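A small sketch of how the client heterogeneity described above might be sampled; the bandwidth draw is a uniform placeholder for the trace-driven lookup from the 4G/LTE logs, and the range is our assumption.

```python
import random

def sample_client_profile():
    """Draw one client's heterogeneity parameters as described above:
    delta ~ U[10, 30] cycles/bit and f ~ U[0.1, 1.0] GHz."""
    return {
        "delta": random.uniform(10, 30),    # CPU cycles per bit
        "f": random.uniform(0.1e9, 1.0e9),  # CPU frequency in Hz
        "B": random.uniform(1e6, 50e6),     # placeholder for a trace-driven bandwidth (bit/s)
    }

profiles = [sample_client_profile() for _ in range(100)]  # N = 100 clients
print(profiles[0])
```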
For fair comparison, we choose two baseline algorithms, named FedAvg and FedSAE-Fassa. In each communication round, FedAvg introduced in [6] assigns an identical training intensity to each client, which indicates that τ_i^k = τ_j^k for ∀c_i, c_j ∈ C^k. FedSAE-Fassa was proposed in [20], and adjusts the clients' affordable training intensities according to their historical training information. In our simulation, we randomly choose M = 10 clients to take part in FL in each communication round. Specially, we randomly choose the total training intensity from 20 to 50 in both FedAvg and FedSAE-Fassa in each communication round, while we set the overall training intensity based on the DRL policy in our TFL_HTI scheme. All the simulations are conducted on a Dell server with an Intel Xeon Silver 4210, 40 logical CPUs, and an NVIDIA RTX A5000, running the Debian 10 operating system.

B. The Performance of the DRL-Based Algorithm

Firstly, we show the convergence of our DRL training. As shown in Fig. 4, we present the accumulative rewards under two learning rates, 5 × 10^{-6} and 5 × 10^{-5}. It is observed that the accumulative reward firstly increases significantly with the training episodes, but when the training reaches a certain number of episodes (15 episodes with the 5 × 10^{-5} learning rate and 20 episodes with the 5 × 10^{-6} learning rate), the training gradually stabilizes and reaches convergence.

We next test the effectiveness of our trained DRL model in improving the effectiveness of FL, by comparing the proposed TFL_HTI scheme with and without DRL. Specifically, for TFL_HTI without DRL, we randomly set the total local iterations at the server. We now present the accuracy impacted by the training time, where T_m = 2000 seconds. As shown in Fig. 5, the test accuracy of each method increases over time.


Fig. 4. Accumulative reward of DRL agent.
Fig. 5. Performance on test accuracy with training time.
Fig. 6. Performance on loss with training time.
Fig. 7. Performance comparison on per-round time.
Fig. 8. Performance comparison on training time.

It can also be observed that the TFL_HTI algorithm with DRL achieves better model accuracy than without DRL. This implies that the DRL agent can learn a good strategy for choosing the proper overall training intensity considering the clients' computation and communication capabilities within the time limit, i.e., improving the accuracy of FL with less time. Similar results can also be found in Fig. 6, and the TFL_HTI scheme with the DRL policy converges faster than without DRL, which further confirms the effectiveness of our DRL-based algorithm.

C. The Performance of Optimal Allocation of Intensities

We now test the effectiveness of our optimal local training intensity allocation algorithm.

Besides presenting the accuracy, we also investigate the training time in each communication round, so as to verify how efficient our algorithm is in reducing the waiting latency during training. For fair comparison, we use the same random policy to determine the overall training intensity in each communication round, which means τ^k is the same in these three algorithms. As shown in Fig. 7, we vary the number of communication rounds from 0 to 100. It is shown that the training time of FedSAE-Fassa is smaller than that of FedAvg in most communication rounds, because FedSAE-Fassa allocates different affordable training intensities to clients by using historical information. Owing to the allocation of the same training intensity in FedAvg, clients with weak computation or communication capabilities may need too much time to finish training. As a result, FedAvg achieves the worst performance of these three algorithms. The per-round time of our algorithm is always smaller than that of FedAvg and FedSAE-Fassa, thanks to our optimal training intensity allocation.

Secondly, we compare the total training time used during FL in Fig. 8. It is shown that the total training time of our proposed method does not considerably increase with the growth of communication rounds, which further verifies the training efficiency of FL. This is reasonable, as our optimal algorithm always allocates the local iterations to clients so as to reduce the training time of the slowest client in each communication round. In this way, the next communication round can start earlier.

In the second simulation, we evaluate the performance of our optimal allocation algorithm as impacted by the number of clients participating in a communication round. Similar to the first simulation, we set T_m = 2000 s and the same τ^k in the three algorithms. As shown in Figs. 9 and 10, we range the number of clients chosen in a communication round from 10 to 40. It turns out that the test accuracy and the training loss of the FL model on both the FMNIST and CIFAR-10 datasets change little when the number of clients rises. This is possible, as the clients' training datasets are independently identically distributed. In addition, the overall training intensity is fixed, and clients will be allocated smaller local iterations with the growth of the per-round number of clients.


Fig. 9. Performance impacted by C^k on FMNIST dataset.
Fig. 10. Performance impacted by C^k on CIFAR-10 dataset.
Fig. 11. Performance impacted by τ^k on FMNIST dataset.

Nonetheless, the test accuracy achieved by the proposed algorithm is still better than that of the baseline algorithms. It can also be seen that the average training time of the FedAvg algorithm increases significantly when the number of clients becomes larger. On the contrary, the average training time of the FedSAE-Fassa scheme does not increase very fast, since the FedSAE-Fassa algorithm can predict the clients' affordable local training iterations. Additionally, the average training time of our local iteration allocation algorithm keeps stable with the increasing number of clients participating in a round, which shows the efficiency and robustness of our strategy.

In the third simulation, we illustrate the impact of the total training intensity on the performance of the proposed allocation algorithm on both datasets within T_m = 2000 s. We vary the overall training intensity τ^k in Figs. 11 and 12 from 20 to 80, and fix the per-round selected clients at M = 10. Figs. 11 and 12 show that our scheme achieves the best test accuracy and training loss on both the FMNIST and CIFAR-10 datasets. With the growth of the overall training intensity, the model accuracy trained by these three algorithms increases slightly.


Fig. 12. Performance impacted by τ^k on CIFAR-10 dataset.
Fig. 13. Performance on test accuracy with communication rounds.
Fig. 14. Performance on loss with communication rounds.
Fig. 15. Performance on test accuracy with training time.

This is possible, as increasing the overall intensity correspondingly increases the clients' training time in each communication round. Therefore, FL can perform less global aggregation within the time limit. This verifies the necessity of determining the overall training intensity according to the clients' computation and communication capabilities. As shown in Figs. 11(c) and 12(c), the average training time over multiple rounds of FedAvg and FedSAE-Fassa increases rapidly with the growth of the overall training intensity. Conversely, the average training time over multiple rounds of our algorithm is the smallest among these three algorithms. Furthermore, the average training time of the proposed scheme shows almost no increase when the overall training intensity increases, which confirms the time effectiveness of our allocation algorithm.

D. The Performance of TFL_HTI

We now evaluate the performance of our proposed algorithm with both DRL and the optimal local iteration allocation. Furthermore, we simulate a dynamic system where the clients' computation and communication capabilities change over time. Specifically, we set the training time budget T_m = 2000 seconds and the number of clients selected in each communication round M = 10.

Firstly, we present the test accuracy impacted by the number of communication rounds. It is observed from Fig. 13 that the test accuracy of FL achieved by the TFL_HTI algorithm is the best within the time budget under both the FMNIST and CIFAR-10 datasets. This is possible, as the heterogeneous local iteration allocation strategy can indeed reduce the training time of the slowest client. Furthermore, the DRL policy of our proposed algorithm can assign an appropriate overall training intensity by interacting with the FL system. In this way, more global aggregation can be conducted to get a model with higher accuracy. Similar observations can be found in Fig. 14: the global model loss trained by our proposed algorithm decreases the most significantly with the growth of the communication rounds among all the schemes.

Secondly, we study the impact of the training time on the accuracy achieved by the proposed TFL_HTI algorithm. As shown in Fig. 15, the proposed TFL_HTI algorithm achieves much better accuracy than the baseline algorithms. Furthermore, within the same time budget, our scheme can achieve a higher accuracy than the baseline algorithms. This is reasonable: when compared to FedAvg and FedSAE-Fassa, the proposed TFL_HTI algorithm adaptively chooses the appropriate overall training intensity and adequately allocates it to clients, based on the clients' capabilities and the remaining training time information. Therefore, within a specific time period, TFL_HTI can finish more global aggregation than the baseline algorithms. We can also see in Fig. 16 that the loss of the global model obtained by our strategy is the least among these three algorithms without exceeding the time limit under both the FMNIST and CIFAR-10 datasets.


Fig. 16. Performance on loss with training time.
Fig. 17. Performance on training time with communication rounds.

Finally, we study the training time consumed by our algorithm. Specifically, we change the communication rounds from 0 to 100, and the result can be seen in Fig. 17. We can see that TFL_HTI consumes the least accumulative training time over the same communication rounds. This is possible because TFL_HTI always allocates local training iterations to clients in an effort to minimize their largest training time. In this context, our algorithm can mitigate the straggler issue brought by the clients' heterogeneous capabilities.

V. CONCLUSION

This paper studies a time-sensitive FL optimization problem, which considers both the heterogeneity of clients in their computation/communication capabilities and the training time constraint of FL. We first analyze the convergence bound of FL with heterogeneous training intensity. Then, we design a DRL-based approach to determine the overall training iterations of clients in each communication round. Based on this, we propose an optimal deterministic algorithm to assign the proper heterogeneous local iterations to clients depending on their training capabilities, so as to reduce the completion time of each communication round. Finally, we conduct simulations on two real datasets to validate the efficiency of the proposed scheme.

REFERENCES

[1] K. Lueth, "State of the IoT 2018: Number of IoT devices now at 7B - market accelerating," Aug. 2019. Accessed: Aug. 19, 2019. [Online]. Available: https://iot-analytics.com/state-of-the-iot-update-q1-q2-2018-number-of-iot-devices-now-7b/
[2] M. Chiang and T. Zhang, "Fog and IoT: An overview of research opportunities," IEEE Internet Things J., vol. 3, no. 6, pp. 854–864, Dec. 2016.
[3] J. Wang, C. Jiang, Z. Han, Y. Ren, and L. Hanzo, "Internet of vehicles: Sensing-aided transportation information collection and diffusion," IEEE Trans. Veh. Technol., vol. 67, no. 5, pp. 3813–3825, May 2018.
[4] P. Dass, S. Misra, and C. Roy, "T-safe: Trustworthy service provisioning for IoT-based intelligent transport systems," IEEE Trans. Veh. Technol., vol. 69, no. 9, pp. 9509–9517, Sep. 2020.
[5] P. Papadimitratos, A. D. La Fortelle, K. Evenssen, R. Brignolo, and S. Cosenza, "Vehicular communication systems: Enabling technologies, applications, and future outlook on intelligent transportation," IEEE Commun. Mag., vol. 47, no. 11, pp. 84–95, Nov. 2009.
[6] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Proc. Artif. Intell. Statist., 2017, pp. 1273–1282.
[7] "Federated learning: Collaborative machine learning without centralized training data," Apr. 2017. [Online]. Available: http://ai.googleblog.com/2017/04/federated-learning-collaborative.html
[8] P. Kairouz et al., "Advances and open problems in federated learning," Found. Trends Mach. Learn., vol. 14, no. 1/2, pp. 1–210, Jun. 2021.
[9] A. Hard et al., "Federated learning for mobile keyboard prediction," Nov. 2018, arXiv:1811.03604.
[10] T. S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I. C. Paschalidis, and W. Shi, "Federated learning of predictive models from federated electronic health records," Int. J. Med. Inform., vol. 112, pp. 59–67, 2018.
[11] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, "Federated optimization in heterogeneous networks," in Proc. 1st Adaptive Multitask Learn. Workshop, Long Beach, CA, USA, Jun. 2019, pp. 1–28.
[12] S. Wang et al., "When edge meets learning: Adaptive control for resource-constrained distributed machine learning," in Proc. IEEE Conf. Comput. Commun., 2018, pp. 63–71, doi: 10.1109/INFOCOM.2018.8486403.
[13] S. Wang et al., "Adaptive federated learning in resource constrained edge computing systems," IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1205–1221, Jun. 2019, doi: 10.1109/JSAC.2019.2904348.
[14] N. H. Tran, W. Bao, A. Zomaya, M. N. H. Nguyen, and C. S. Hong, "Federated learning over wireless networks: Optimization model design and analysis," in Proc. IEEE Conf. Comput. Commun., 2019, pp. 1387–1395, doi: 10.1109/INFOCOM.2019.8737464.
[15] H. Wang, S. Guo, and R. Li, "OSP: Overlapping computation and communication in parameter server for fast machine learning," in Proc. 48th Int. Conf. Parallel Process., 2019, pp. 1–10.
[16] H. Wang, Z. Qu, S. Guo, N. Wang, R. Li, and W. Zhuang, "LOSP: Overlap synchronization parallel with local compensation for fast distributed training," IEEE J. Sel. Areas Commun., vol. 39, no. 8, pp. 2541–2557, Aug. 2021, doi: 10.1109/JSAC.2021.3087272.
[17] B. Luo, X. Li, S. Wang, J. Huang, and L. Tassiulas, "Cost-effective federated learning design," in Proc. IEEE Conf. Comput. Commun. (INFOCOM), 2021, pp. 1–10.
[18] R. Zeng, C. Zeng, X. Wang, B. Li, and X. Chu, "Incentive mechanisms in federated learning and game-theoretical approach," IEEE Netw., vol. 36, no. 6, pp. 229–235, Nov./Dec. 2022, doi: 10.1109/MNET.112.2100706.
[19] Q. Zhou et al., "Petrel: Heterogeneity-aware distributed deep learning via hybrid synchronization," IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 5, pp. 1030–1043, May 2021.
[20] L. Li et al., "FedSAE: A novel self-adaptive federated learning framework in heterogeneous systems," in Proc. IEEE Int. Joint Conf. Neural Netw., 2021, pp. 1–10.
[21] J. Zhang et al., "FedAda: Fast-convergent adaptive federated learning in heterogeneous mobile edge computing environment," World Wide Web, vol. 25, no. 5, pp. 1971–1998, 2022.
[22] T. Nishio and R. Yonetani, "Client selection for federated learning with heterogeneous resources in mobile edge," in Proc. IEEE Int. Conf. Commun., 2019, pp. 1–7.
[23] Z. Chai et al., "TiFL: A tier-based federated learning system," in Proc. 29th Int. Symp. High-Perform. Parallel Distrib. Comput., 2020, pp. 125–136.
[24] H. Wang, Z. Kaplan, D. Niu, and B. Li, "Optimizing federated learning on non-IID data with reinforcement learning," in Proc. IEEE Conf. Comput. Commun., 2020, pp. 1698–1707, doi: 10.1109/INFOCOM41043.2020.9155494.
[25] J. Liu et al., "Adaptive asynchronous federated learning in resource-constrained edge computing," IEEE Trans. Mobile Comput., vol. 22, no. 2, pp. 674–690, Feb. 2023.
[26] C. Xie, S. Koyejo, and I. Gupta, "Asynchronous federated optimization," in Proc. 12th Annu. Workshop Optim. Mach. Learn., 2020, pp. 1–11.
[27] M. Zeng, X. Wang, W. Pan, and P. Zhou, "Heterogeneous training intensity for federated learning: A deep reinforcement learning approach," IEEE Trans. Netw. Sci. Eng., vol. 10, no. 2, pp. 990–1002, Mar./Apr. 2023.
[28] H. T. Nguyen, V. Sehwag, S. Hosseinalipour, C. G. Brinton, M. Chiang, and H. Vincent Poor, "Fast-convergent federated learning," IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 201–218, Jan. 2021.

Pan Zhou (Senior Member, IEEE) received the B.S. degree in the Advanced Class from the Huazhong University of Science and Technology (HUST), Wuhan, China, and the M.S. degree from the Department of Electronics and Information Engineering, HUST, in 2006 and 2008, respectively. He received the Ph.D. degree from the School of Electrical and Computer Engineering, Georgia Institute of Technology (Georgia Tech), Atlanta, GA, USA, in 2011. He is currently a Full Professor and Ph.D. Advisor with the Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, HUST. He received an honorary degree for his bachelor's study and the merit research award of HUST during his master's study. He was a Senior Technical Member with Oracle Inc., Austin, TX, USA, from 2011 to 2013, where he worked on Hadoop and distributed storage systems for Big Data analytics with the Oracle Cloud Platform. He has published more than 170 refereed papers in international leading journals and key conferences in the areas of security and privacy, Big Data analytics, machine learning, and mobile computing and networks, including IEEE/ACM TRANSACTIONS ON NETWORKING, IEEE TKDE, IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, TIFS, IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE TRANSACTIONS ON MOBILE COMPUTING, TPDS, TIT, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, TCOMP, IEEE TRANSACTIONS ON MULTIMEDIA, TII, TAI, IEEE TRANSACTIONS ON COMMUNICATIONS, IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, ICDE, INFOCOM, CVPR, ICCV, ICDCS, ICPP, ACM MM, TOS, TKDD, AAAI, IJCAI, NAACL, COLING, PoPETs/PETS, SECON, CIKM, and ECAI. He was the recipient of the Rising Star in Science and Technology award of HUST in 2017. He is an Associate Editor of IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING.

Weijian Pan received the B.E. degree in computer science and technology from the South China University of Technology, Guangzhou, China, in 2021. He is currently working toward the master's degree with the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. His research interests include federated learning, mobile crowdsensing, and edge computing.

Xiumin Wang received the B.S. degree from the Department of Computer Science, Anhui Normal University, Wuhu, China, in 2006, and the Joint Ph.D. degree from the University of Science and Technology of China, Hefei, China, and the City University of Hong Kong, Hong Kong. She is currently an Associate Professor with the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. She was a Postdoctoral Research Fellow with the Singapore University of Technology and Design, Singapore, from 2011 to 2012. Her research interests include Internet of Things, edge computing, and machine learning.

Weiwei Lin (Member, IEEE) received the B.S. and M.S. degrees from Nanchang University, Nanchang, China, in 2001 and 2004, respectively, and the Ph.D. degree in computer application from the South China University of Technology, Guangzhou, China, in 2007. He is currently a Professor with the School of Computer Science and Engineering, South China University of Technology. His research interests include distributed systems, cloud computing, Big Data computing, and AI application technologies. He has published more than 100 papers in refereed journals and conference proceedings. He is a Senior Member of CCF.
