0% found this document useful (0 votes)
5 views13 pages

Efficient and Intelligent Multi-Job Federated Learning in Wireless Networks

The article discusses a novel framework for multi-job federated learning (FL) in wireless networks, named EffI-FL, which aims to optimize training efficiency by minimizing latency, energy consumption, and switching costs. It addresses challenges such as client heterogeneity, dynamic environments, and battery constraints, proposing innovative methods like block-wise client selection and multi-armed bandit techniques to enhance performance. Experimental results demonstrate that EffI-FL significantly outperforms existing FL frameworks, achieving reductions in latency and energy costs by up to 52.3%.

Uploaded by

junmei24.chen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views13 pages

Efficient and Intelligent Multi-Job Federated Learning in Wireless Networks

The article discusses a novel framework for multi-job federated learning (FL) in wireless networks, named EffI-FL, which aims to optimize training efficiency by minimizing latency, energy consumption, and switching costs. It addresses challenges such as client heterogeneity, dynamic environments, and battery constraints, proposing innovative methods like block-wise client selection and multi-armed bandit techniques to enhance performance. Experimental results demonstrate that EffI-FL significantly outperforms existing FL frameworks, achieving reductions in latency and energy costs by up to 52.3%.

Uploaded by

junmei24.chen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

This article has been accepted for publication in IEEE Internet of Things Journal.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 1

Efficient and Intelligent Multi-Job Federated


Learning in Wireless Networks
Jiajin Wang, Ne Wang, Ruiting Zhou, Member, IEEE and Bo Li, Fellow, IEEE

Abstract—Federated learning (FL) has emerged as an in- FL facilitates collaborative model training across distributed
novative paradigm designed to protect privacy by enabling clients (e.g., sensors, smartphones, tablets) without the need
collaborative machine learning (ML) model training across mul- to transfer raw data to a centralized server. Consequently, FL
tiple data owners (also known as clients) without the need to
access clients’ raw data. The majority of existing FL research is particularly suitable for applications where data privacy is
concentrates on scenarios where a single job necessitates training. paramount, such as healthcare, finance, and the Internet of
In practical applications, multiple FL jobs can simultaneously Things (IoT). Most existing research in FL has focused on the
undergo training using a common pool of clients, a scenario monopoly scenario, where a single global model (associated
known as multi-job FL. However, the problem of FL training with with an FL job) is trained over a shared pool of clients that
multiple jobs remains open and presents significant challenges of
the escalated heterogeneity of jobs and clients, complex trade-offs possess data from the same domain [2], [3], [4], [5]. However,
between training latency and energy consumption, uncertainty in practice, clients may collect diverse data, including images,
of client quality, and potential linear switching cost associated text, and speech, for training jobs such as image classification,
with client selection. This work aims to jointly optimize training text generation, and speech recognition [6]. For instance,
efficiency in terms of latency, energy consumption, and switching smartphones can interact with users using voice assistants
cost for multiple jobs in stochastic and dynamic environments.
Specifically, we propose a novel multi-job FL framework, named (speech recognition), categorize images based on various at-
EffI-FL, incorporating three innovative designs. (i) To reduce tributes such as location or individual (image classification),
switching cost, we extend the client selection interval from every and offer intelligent keyboard search suggestions (text gener-
round to multiple rounds, called a block, within which client ation). Moreover, LG’s service robots [7], widely deployed
subset switching is prohibited. (ii) We employ multi-armed bandit in hotels, airports, and shopping malls, utilize cameras to
(MAB) methods to measure clients’ latency and energy cost under
uncertainty. Additionally, we utilize the virtual queue technique recognize guests, respond to voice commands, and provide
to trace clients’ battery usage patterns. By integrating the above text information on their screens. This reality motivates the
client-side knowledge, we propose an adaptive client selection exploration of multiple concurrent jobs competing for clients,
policy aimed at balancing latency, energy consumption, and known as multi-job FL [8].
battery condition. (iii) Given that multiple jobs may compete In this work, we focus on the efficiency optimization
for the same client, we devise a greedy algorithm to assign each
client to a single job. We rigorously prove that the regret of our (encompassing training latency, energy consumption, and other
client selection policy and the cost of our block-wise client subset additional cost) of multi-job FL systems in wireless networks.
switching algorithm are both sublinear. Finally, we implement However, the problem of FL training with multiple jobs
EffI-FL using PyTorch and conduct experiments demonstrating remains open [6], [9] and presents significant challenges. First,
that EffI-FL reduces the weighted sum of latency, energy in a multi-job FL system, the diversity of heterogeneity is more
consumption, and switching cost by up to 52.3% compared to
four state-of-the-art FL frameworks. pronounced compared to a single-job scenario. This increased
diversity arises from the heterogeneity exhibited by both
Index Terms—Multi-job federated learning, multi-armed ban- jobs and clients. In multi-job FL systems, clients can serve
dit, client selection, training efficiency, switching cost.
multiple jobs, leading to a complex array of heterogeneous
combinations. In contrast, a single-job FL system, where
I. I NTRODUCTION only clients exhibit heterogeneity, is considerably simpler.
These combinations of diverse heterogeneity complicate client
I N recent years, federated learning (FL), a distributed
machine learning paradigm, has garnered considerable
attention due to its potential for preserving privacy [1].
selection, a commonly used technique to enhance training
efficiency by carefully selecting a subset of “good” clients
(i.e., those with low latency and energy consumption in our
J. Wang is with the School of Computer Science and Technology, Anhui work) to participate in training.
University, Hefei, China. E-mail: [email protected]. Second, the dynamic and stochastic execution environments
N. Wang and B. Li are with the Department of Computer Science and
Engineering, Hong Kong University of Science and Technology, Hong Kong. of clients and communication networks pose challenges in
E-mail: [email protected], [email protected]. estimating the goodness of clients [10]. Typically, clients run
R. Zhou is with the School of Computer Science Engineering, Southeast not only FL jobs but also many other applications competing
University, Nanjing, China. E-mail: [email protected].
Corresponding Author: Ne Wang for computational resources, causing uncertainties in com-
Manuscript received 3 July 2024; revised 30 September 2024; accepted 15 putation latency and energy consumption [11]. Additionally,
November 2024. the wireless network is inherently dynamic [12], resulting
Copyright (c) 2024 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be in unpredictable transmission latency and energy cost. These
obtained from the IEEE by sending a request to [email protected]. uncertainties escalate the difficulty of client selection on the

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 2

fly, as the actual goodness of clients cannot be observed prior the impact of switching cost on client selection, which has
to decision-making. not been addressed in any related work in single- or multi-job
Third, due to the limited resources of clients, each client FL systems.
can only participate in the training of one job at a time, which To this end, we introduce an Efficient and Intelligent
is a common assumption in existing literature [6], [8], [9], client selection framework for multi-Job Federated Learning
[13]. We compute the memory footprint required to store the (EffI-FL) in wireless networks, with the goal of minimizing
datasets MNIST, CIFAR-10, and KDDCup99 as approximately latency, energy consumption, and switching cost under long-
209 MB, 146 MB, and 784 MB, respectively. As the number term battery constraints. For each client selection, we analyze
of jobs or the size of datasets grows (e.g., Federated-EMNIST that latency and energy consumption depend on the selected
[14] and Federated-CelebA [15]), the memory capacity of clients themselves, while switching cost are solely related to
resource-limited mobile devices can be rapidly exhausted, the number of newly selected clients. Given the different scales
leading to potential device freezing or even crashes. Therefore, of latency and energy consumption versus switching cost,
the job’s dataset is retained in a device’s memory only when we propose to decouple the optimization of the former two
the device is selected to continue training the job in subsequent factors from the latter. Specifically, to avoid potential linear
rounds; otherwise, the dataset needs to be offloaded to disk switching cost concerning training time (i.e., communication
and reloaded into memory when the device is selected again. rounds), we group communication rounds into frames, each
The time overhead of loading datasets is called switching cost. containing multiple equal-sized blocks (with each block in-
Moreover, we conduct a case study by training three jobs cluding multiple rounds). Notably, the size of blocks increases
detailed in Sec. V on a laptop with an Intel(R) Core(TM) at a polynomial rate from frame to frame, resulting in a
i7-1165G7 CPU and 16.0 GB of RAM, observing that the sublinear cumulative switching cost. Subsequently, we extend
averaged elapsed times for loading datasets and performing a multi-armed bandit (MAB) technique, specifically the upper
local training are 2.523s and 4.929s for MNIST, 2.611s and confidence bounds (UCB) method, to address uncertainties
6.598s for CIFAR-10, 1.131s and 3.694s for KDDCup99, in estimating clients’ latency and energy consumption. In
respectively. The switching cost accounts for 30.61%−51.18% addition, we employ a virtual queue to monitor the battery
of the time spent on local training, demonstrating that it cannot usage of each client. Utilizing the aforementioned estimates
be negligible when optimizing training efficiency. of dynamic variables, we design a pessimistic-two-optimistic
Last but not least, a client’s battery capacity significantly client selection (P2OCS) policy, which integrates two short-
impacts its duration of engagement in FL training. When term gains in latency and energy consumption with a long-
the same client serves different jobs, its battery consumption term battery constraint. Finally, we devise a greedy algorithm
exhibits heterogeneity, which is crucial for optimizing training to resolve the issue of multiple jobs competing for the same
efficiency. For instance, a client might be optimal for one client. Our primary contributions are summarized as follows.
job in terms of latency and energy consumption but will • For the first time, we investigate the optimization problem of
deplete its battery rapidly, while it may be a suboptimal jointly minimizing latency, energy consumption, and switch-
participant for another job, consuming less battery. Assigning ing cost when selecting clients in multi-job FL scenarios.
the client to the first job yields short-term gains (i.e., low This enables a more holistic efficiency improvement for
latency and energy cost at present); however, this immediate multi-job FL systems.
benefit is unsustainable once the client’s battery is depleted. • We first design a block-wise client subset switching method
Thus, it is essential to consider battery capacity constraints to reduce cumulative switching cost during training. Next,
when selecting clients, thereby balancing short-term gains with we propose a double UCB algorithm to measure both the
long-term potential benefits in strategic decision-making. iteration latency and energy consumption of each client.
There is a substantial body of work in single-job FL Moreover, leveraging virtual queue techniques, we integrate
focusing on client selection, considering various factors such battery usage and these properties into a compound metric
as latency [16], [17], [12], [5], [11], energy consumption [3], to evaluate client quality, which is used to select client
[18], data quality [17], [19], [5], fairness [9], [5], robustness subset for each job. Finally, we introduce a tailor-made
[20], and more. However, multi-job FL systems cannot directly greedy algorithm to resolve conflicts among multiple jobs
apply these algorithms due to their inherent heterogeneity. competing for the same client. Together, these components
A limited number of studies have emerged exploring client constitute our framework, EffI-FL.
selection in multi-job FL systems. For instance, [8] aimed • We prove that EffI-FL ensures vanishing constraint vi-
to achieve fairness while ensuring fast convergence. [21] and olation over time, achieves sublinear cumulative switching
[22] matched jobs with clients based solely on latency. The cost, and guarantees sublinear regret in latency and energy
authors in [6] incorporated latency and fairness into a unified consumption.
optimization objective. [23] explored to optimize latency under • Comprehensive evaluations show that EffI-FL can reduce
long-term budget constraints. [9] encouraged edge servers to the switching cost by up to 87.7% and deliver up to 52.3%
adjust job payments to compete for clients while ensuring reduction in the weighted sum of total latency, energy
fair client selection. None of the above studies investigated consumption, and switching cost compared to state-of-the-
a problem akin to ours, namely jointly optimizing latency art methods, including HierFAVG [24], MetaGreedy [6],
and energy consumption under long-term battery capacity con- EACS-FL [23], and MCS [22].
straints in stochastic environments. Moreover, we examined The remainder of our paper is organized as follows. Sec. II

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 3

reviews the related work. We formulate our joint optimization amidst varying computation and communication latency using
problem in Sec. III. The efficient and intelligent multi-job FL MAB techniques. However, these studies primarily focus on a
framework is presented in Sec. IV. Extensive evaluations are single varying client property, whereas our approach integrates
conducted to validate the performance of EffI-FL in Sec. V. both stochastic latency and energy consumption considera-
Sec. VI concludes our work. tions. Furthermore, we incorporate the impact of switching
cost in client selection, leading to a more comprehensive
optimization of training efficiency.
II. R ELATED W ORK
Comparison with Related Work. Our work distinguishes
Client Selection in FL. Client selection in single-job FL itself from previous studies in several key aspects. First, we
has undergone extensive research in recent years. This body focus on multi-job FL, which exhibits escalated heterogeneity
of work encompasses various optimization objectives and compared to single-job FL systems. Second, we consider both
constraints, including latency [16], [17], [12], [5], [11], energy latency and energy consumption of clients, in addition to
consumption [3], [18], data quality [17], [19], [5], fairness [9], long-term battery constraints. Existing 0-lookahead methods
robustness [20], short-term budget constraints [19], and long- in single- and multi-job FL typically focus on one of these
term fairness constraints [5]. Moreover, several client selection factors, and may accompanied by long-term constraints. Note
policies have been proposed, categorized as either 1-lookahead that developing algorithms and conducting theoretical analyses
(where all inputs are known before decisions) or 0-lookahead. involving multiple factors is complex and differentiate our
For the former, Kim et al. [3] explored joint optimization work from existing bandit methods and FL research. Third,
of latency and energy consumption assuming fixed CPU none of the existing studies account for the switching cost
frequency and transmission rates a priori, whereas our focus is incurred by client selection. These distinctions drive us to
on 0-lookahead scenarios. We now delve into studies centered propose a novel multi-job framework that effectively addresses
on 0-lookahead. Lai et al. [17] integrated client latency and these challenges.
data quality into a compound metric influenced by stochastic
training environments. They proposed leveraging Multi-Armed III. S YSTEM M ODEL
Bandit (MAB) methods to learn real-time compound values for
client selection. Wang et al. [11] similarly addressed latency A. System Overview
and data quality using UCB methods. Both [5] and [12] Download global models Central Server
Global models
aimed to minimize training latency while meeting long-term
Conduct local training
fairness constraints. Mao et al. [25] focused on heterogeneous Upload local updates
Local models

client selection and resource allocation in wireless networks, Datasets


Aggregate updates
employing deep reinforcement learning to adaptively select
participants based on varying client conditions. However, these
Edge
studies are tailored for single-job FL systems and cannot be Servers
directly applicable to our targeting multi-job FL with high
diverse heterogeneity.
Multi-Job FL. Multi-task FL [26] and clustered FL [11]
have been proposed to train multiple models across clients
with non-Independent and Identically Distributed (non-IID)
data. Unlike these scenarios where models share common
components, multi-job FL systems independently train mul-
tiple models corresponding to different jobs. This has led Clients
to the development of several studies in this area. Zhou et
Fig. 1: Multi-Job Federated Learning over Wireless Networks.
al. [8] introduced the concept of multi-job FL and proposed
two scheduling methods based on reinforcement learning and Multi-Job Federated Learning. We consider a multi-job
Bayesian optimization, respectively, aiming to select high- FL system composed of a central server, J edge servers,
performing clients while ensuring data fairness. Liu et al. [6] and I clients, as depicted in Fig. 1. In this system, various
devised an intelligent approach to engage low-latency clients FL jobs, such as image classification, speech recognition,
in training to enhance convergence, integrating data fairness and text generation [6], are delegated to network operators
into client selection. Pang et al. [13] focused on designing (a.k.a., job publishers) located at edge servers for concurrent
incentive mechanisms to encourage clients with high utility to processing. We assume there are J jobs, with each edge server
participate in model training. Chen et al. [21] developed a low- responsible for training a single job. Hence, we will use the
latency client selection policy based on matching theory. Shi terms jobs and edge servers interchangeably hereafter. Each
et al. [9] jointly considered client demands and job payment job needs to select M clients within its coverage area to
bids to optimize client allocation while preventing prolonged train its model. The central server is responsible for making
waiting and ensuring fairness. However, these studies predomi- client selection decisions for all jobs, which are subsequently
nantly operate under 1-lookahead scenarios. There is relatively broadcast to all edge servers. Due to the little information
less research focusing on 0-lookahead scenarios in multi- exchange between the central server and edge servers, we
job FL. [22] and [23] explored optimizing training efficiency disregard the communication time involved. We introduce [X]

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 4

to represent the integer set 1, 2, ..., X , where X is any non- Latency of Local Training. Let fi (t) be the CPU frequency
negative integer. Additionally, each client i ∈ [I] possesses J of client i in round t, then the time for client i to conduct local
local datasets (e.g., images, speech, etc.) corresponding to the training for job j is
J jobs, without loss of generality. Consequently, a client can |Dj,i |Vj,i Kj,i
µLU
j,i (t) = , (2)
undertake all jobs. However, due to limited resources, each fi (t)
client can only participate in the training of one job at a
where Vj,i is the number of CPU cycles required for processing
time, a common assumption in existing literature [8], [6], [13],
one sample in dataset Dj,i , and Kj,i is the number of local
[9]. In other words, these jobs will compete with each other
iterations.
to access clients during their model training processes. The
Latency of Model Transmission. The first and third steps
goal of multi-job FL is to learn the respective optimal model
of the training process in a round encompass the downloading
parameters ωj for each job j , which is formulated as follows:
and uploading of models. The model size for job j remains
X X |Dj,i | constant and is represented by qj . Denote wj,i (t) as the bidi-
min Lj,i (ωj ), (1) rectional communication bandwidth between client i and edge
{ωj }j∈[J] |Dj |
j∈[J] i∈[I] server j during round t. Furthermore, we utilize Shannon’s
equation to characterize the state of wireless network channels:
oi (t)pi (t) 
where Dj,i is the dataset of job j on client i and Dj = log2 1 + N 0 wj,i (t)
, where pi (t), oi (t), and N0 denote the
∪i∈[I] Dj,i denotes the whole dataset of job j . Then Lj,i (ωj ) , transmission power, the wireless channel gain, and the noise
power density, respectively. Thus, the transmission latency can
P
ξj,i ∼Dj,i `(ωj ; ξj,i ) is the loss function associated with Dj,i ,
`(ωj ; ξj,i ) is the loss of training sample ξj,i . be expressed as
Multi-Job FL Training Process. To achieve the stated goal, 2qj
µTS
j,i (t) = oi (t)pi (t) 
. (3)
each job needs to iterate over T communication rounds until wj,i (t) log2 1 + N0 wj,i (t)
its corresponding model converges to a desired accuracy. Next, Iteration Latency. To sum up, we call the latency of client
we outline the procedure of a single round t ∈ [T ] within the i employed by job j in round t iteration latency, which can
multi-job FL training framework as follows. be computed as
i) Client Selection. In each round, each job j first ran- µj,i (t) = µLU TS
j,i (t) + µj,i (t). (4)
domly selects a subset Sj (t) of clients with fixed number
|Sj (t)| = M . In scenarios where a client i lies within the For each job j , the latency of one round t is determined
overlapping coverage area of multiple edge servers, it is by the the slowest participant among Sj (t) and thus can be
restricted to communicate with only one edge server. We denoted as follow.
use a binary variable xj,i (t) ∈ {0, 1} to denote whether
µj (t) = max {µj,i (t)}. (5)
client i is selected by job j in round t (xj,i (t) = 1) or not i∈Sj (t)
P
(xj,i (t) = 0). Consequently, we have j∈[J] xj,i (t) = 1.
ii) Global Model Download, Local Training, and Local Up- C. Energy Model
date Upload. Then, each client i pulls the corresponding
global model ωj (t) from edge server j (when xj,i (t) = 1) In addition to training latency, we also consider energy
and updates the model using its local dataset Dj,i via the consumption of multi-job FL. We next analyze the energy
Stochastic Gradient Descent (SGD) method. Next, client models of local training and model transmission.
i pushes the model updates ωj,i (t) − ωj (t) to edge server Computation Energy. In round t, client i serving for job
j , where ωj,i (t) is the updated local model. j requires consuming |Dj,i |Vj,i Kj,i CPU cycles, then the
iii) Model Aggregation. For each edge server j , after received corresponding energy consumption is κ|Dj,i |Vj,i Kj,i (fi (t))2
all the local updates from selected clients, it conducts based on Lemma 1 in [27], where κi is the effective switched
model aggregation according to Eq. ωj (t + 1) = ωj (t) + capacitance. That is, the total computation energy incurred by
P 1 1
P client i for serving job j is
i∈Sj (t) |Sj (t)| (ωj,i (t) − ωj (t)) = |Sj (t)| i∈Sj (t) ωj,i (t). 2
So far, one communication round ends. LU
j,i (t) = κi |Dj,i |Vj,i Kj,i (fi (t)) . (6)

Transmission Energy. The energy cost of client i to down-


load global models and upload local updates of size qj within
B. Computation and Transmission Model
a time duration µTS
j,i (t) is

In this paper, we focus on improving the training efficiency 2pi (t)qj


TS
j,i (t) = oi (t)pi (t) 
. (7)
of multi-job FL systems. According to Sec. III-A, the latency wj,i (t) log2 1 + N0 wj,i (t)
of one communication round are jointly determined by four Therefore, the energy cost of client i for serving job j is
steps: global model download, local training, local update TS
j,i (t) = LU
j,i (t) + j,i (t). (8)
upload, and model aggregation. The time required for the last
step is constant, since each server is dedicated to one job and
In summary, the total energy cost of job j in round t is
has constant workload from a fixed size of client subset [11]. X
Therefore, we focus on the dynamic time incurred by the first j (t) = j,i (t). (9)
i∈Sj (t)
three steps associated with clients.

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 5

D. Switching Cost Model X X


min {µj (t) + η1 j (t) + η2 ψj (t)} (11)
Switching Cost. Switching cost manifests in various forms
t∈[T ] j∈[J]
within the wireless networking literature, such as the time
required to start up a new container [28], the handover cost subject to:
X
for a base station to connect a user [29], and so on. In wire- xj,i (t) = M,∀t, ∀j ∈ [J] (11a)
less networks, the signal quality and susceptibility of clients i∈[I]

fluctuate when interference arises. Base stations may initiate


X
xj,i (t) = 1,∀t, ∀i ∈ [I] (11b)
channel switching to maintain communication quality when j∈[J]
connected clients switch [30], which involves reconfiguring X X
bj,i (t)xj,i (t) ≤ Bi , ∀i ∈ [I] (11c)
communication channels, managing handovers, and optimizing t∈[T ] j∈[J]
network resources. As elaborated in Sec. I, the time required xj,i (t) = 0, ∀j ∈ [J],t ∈ [T ], i ∈
/ Aj (t) (11d)
to load datasets can be interpreted as the switching cost in xj,i (t) ∈ {0, 1}, ∀j ∈ [J],∀i ∈ [I], t ∈ [T ]. (11e)
multi-job FL systems. The unit switching cost is associated
with both jobs and clients. In this work, we assume that the Constraint (11a) imposes a limit on the maximum number
switching cost is identical across all clients for the same job, of clients that can be selected by a job in a single round,
denoted by Cj . This assumption can be realized by having all denoted as M . Constraint (11b) ensures that each client can be
clients adopt the maximum switching cost value for a given selected by only one job in any given round. Constraint (11c)
job. The scenario involving heterogeneous client switching guarantees that the cumulative battery usage of each client
cost remains for future work. For job j , the switching cost does not exceed its capacity, where bj,i (t) > 0 represents the
is measured by the number of clients newly selected, denoted battery consumption of client i serving job j in round t, and
by ∆j (t) = |Sj (t) − Sj (t − 1)|, ∀t > 1. In particular, ∆j (1) = 0. bj,i (t) = 0 if i is not selected by job j in round t. For each
Thus, the switching cost for job j is edge server j , the set Aj (t) of clients within its coverage can
ψj (t) = Cj ∆j (t). (10) be observed at the beginning of round t. For any client i that
is not included in Aj (t), it cannot be selected by edge server
Table I summarizes all the important notations. j in round t (xj,i (t) = 0), guaranteed by constraint (11d).
TABLE I. Notations
Symbol Description F. Challenges
[X] integer set {1, 2, ..., X}
i, j, t indices for clients, jobs, and rounds Solving problem (11) is non-trivial given these challenges
I, J, T number of clients, jobs, and rounds below.
Cj unit switching cost of job j • Stochastic Environments. As detailed in Sec. I, fluctua-
Bi battery capacity of client i
bj,i (t) battery consumption of i serving j in t tions in computation and transmission rates render the ac-
µj,i (t), j,i (t) iteration latency & energy cost of i for job j in t tual iteration latency and energy cost unpredictable until
Ej,i (t), Uj,i (t) observed energy & latency reward of i in t the end of a round. Unlike deterministic 1-lookahead sce-
ej,i , uj,i means of random variable Ej,i (t), Uj,i (t)
j,i (t), µj,i (t) average energy & latency reward of i for j up to t
narios, our 0-lookahead problem involves irrecoverably
ˆj,i (t), µ̂j,i (t) UCBs of energy & latency reward of i for j in t selecting clients with unknown and stochastic iteration
hj,i (t) # of times client i has been selected by j up to t latency and energy cost, difficult to tackle. Furthermore,
(t)
ωj,i local model weights of client i for job j in t due to client and job heterogeneity, achieving a trade-off
M # of clients selected by a job in a round between these properties complicates decision-making.
Sj (t) client subset selected by job j in round t
• Switching Cost. Besides minimizing iteration latency and
Aj (t) set of clients available for job j in round t
xj,i (t) whether client i is selected by job j in round t energy cost, optimizing switching cost is crucial for our
optimization objectives. Unlike iteration latency and en-
ergy cost, switching cost hinge on decisions made by edge
E. Problem Formulation servers across two consecutive rounds, irrespective of
Our focus lies in identifying efficient client selection poli- client characteristics. This distinction introduces varying
cies for J jobs, with the objective of minimizing the cu- growth rates between switching cost and client-related
mulative training latency, energy consumption, and switching cost (i.e.., iteration latency and energy cost), complicating
cost incurred. Specifically, our optimization objective is to client selection strategies.
P P
minimize t∈[T ] j∈[J] µj (t) + η1 j (t) + η2 ψj (t), where η1 , η2 • Battery Capacity Constraints. Given the uncertain nature
represent user-defined weight factors balancing the importance of client properties, imprudent client selections by edge
of training latency, energy consumption, and switching cost. servers can lead to battery inefficiencies. Excessive waste
Given the different scales of µj (t), j (t), and ψj (t), we utilize risks early departure of high-quality clients (i.e., those
Min-Max Scaling II(·) to normalize these metrics within [0, 1], with low iteration latency and energy cost), thereby jeop-
respectively. To simplify notation, we assume µj (t), j (t), and ardizing training efficiency. Moreover, we consider non-
ψj (t) are normalized unless specified otherwise, omitting II(·). renewable battery resources. Consequently, one needs to
Next, considering the above constraints on client selection negotiate between short-term training efficiency bene-
and resource capacity, we define our optimization problem for fiting from high-quality clients and delayed efficiency
multi-job FL as effects incurred by long-term battery capacity constraints.

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 6

Moreover, heterogeneous battery capacities among clients exploitation (i.e., making decisions based on current
add complexity to optimizing total latency and energy empirical evidence) and exploration (i.e., exploring poten-
consumption while adhering to these constraints. tially optimal decisions that are less frequently sampled).
Inspired by the UCB policy, we first compute confidence
IV. A LGORITHM D ESIGN AND A NALYSIS intervals that encompass the actual values of iteration
latency and energy cost. Then, embracing an optimistic
A. Algorithm Idea outlook amidst uncertainty, we utilize the upper bounds
By carefully integrating MAB techniques [31] and the of these confidence intervals as estimates for iteration
Lyapunov optimization method [32], we introduce an effi- latency and energy cost.
cient and intelligent client section framework for multi-job Furthermore, each client is equipped with a virtual queue
FL, called EffI-FL, to solve the problem in (11) and to monitor its battery constraint violations. Clients with
meet the challenges detailed in Sec. III-F. First, we propose minor violation indicate sufficient battery reserves and
conducting client selection per block (comprising multiple are deemed suitable for continued participation in sub-
rounds) instead of per individual round. This strategic shift sequent training rounds. Conversely, significant violation
reduces the frequency of client switches, thereby minimizing suggests imminent battery depletion. By using cumulative
associated cost. Consequently, client selection can focus solely constraint violation as a conservative estimate of battery
on client characteristics, thereby decoupling the optimization health, we carefully select clients for jobs. Consequently,
of switching cost from that of client-related cost. Second, given we compute a composite measure of client quality that
the stochastic computation and communication environments, incorporates iteration latency, energy cost, and constraint
we introduce one of MAB techniques, UCB, to online learn violation, and then use a greedy approach to select clients
actual iteration latency and energy cost of clients. Existing based on their estimated quality. Moreover, in the context
UCB-based algorithms either learn a single property [5], [12] of multi-job FL, some clients may reside in areas where
or a compound one integrated by multiple properties [17] multiple edge servers overlap, allowing them to serve any
for each client. Recognizing that iteration latency and energy available edge server. For these clients, we implement a
consumption are independent and their intrinsic distributions deterministic rule to ensure they select only one edge
are naturally different, we extend the traditional UCB approach server, thus avoiding conflict.
by devising a double UCB policy to estimate these two client
properties independently. We then introduce a user-defined
B. The EffI-FL Algorithm
weight factor to integrate them into a single compound metric,
allowing for a flexible trade-off between them when evaluating Blocks and Frames. We group T communication rounds
client quality. Third, we employ the virtual queue method into multiple frames indexed by k = 0, 1, 2, .... Furthermore,
to monitor and regulate client battery consumption, namely each frame k is divided into multiple blocks indexed by
tracing constraint violation (i.e., actual battery usage minus a = 0, 1, 2, ..., ak , each block containing k rounds. In particular,
excepted volume) for each client. By integrating constraint frame k = 0 includes dI/M e blocks (later we will explain
violation with client properties, we design the pessimistic-two- how dI/M e is derived in the algorithm design subsection.),
optimistic client selection (P2OCS) policy to conduct efficient and each frame k > 0 contains d k4 − (k − 1)4 /ke blocks.
training for multi-job FL systems. Finally, we prove that the For ease of description, we introduce two auxiliary symbols
proposed P2OCS policy achieves sublinear regret, which is a t1st fin
k,a and tk,a to denote the first and final rounds in block
desired theoretical guarantee compared to other state-of-the- a of frame k. Next, each edge server can switch selected
art bandit-based studies [30], [31], [33]. The main components client subset in t1st 1st
k,a of block a and frame f , i,e, Sm (tk,a ) =
1st fin fin
of EffI-FL are as follows. S (t + 1) = · · · = Sm (tk,a ), ∀a, k. And tk,ak = dI/M e +
Pmk k,a 4
− (κ − 1)4 /κeκ.

i) Block-Wise Switching. To achieve a sublinear cumulative κ=1 d κ
switching cost over the course of training, we adopt Optimistic Estimates on Iteration Latency and Energy
a hierarchical approach. We group rounds into frames, Cost. In each block, every job is tasked with selecting a
each composed of multiple uniformly-sized blocks, with subset of clients to participate in its model training, aim-
the number of rounds per block increasing polynomially ing to optimize efficiency. However, achieving this optimal
across successive frames. Moreover, we limit client subset subset selection is challenging due to uncertainties stem-
switches exclusively to the initial round of each block. ming from stochastic computation and communication envi-
Consequently, the number of client subset switches per ronments, which affect the quality of clients. To tackle this
job evolves as a sublinear function relative to the total challenge, we leverage MAB techniques for online learning
number of rounds. of client quality under uncertainty. We reformulate the client
ii) Pessimistic-Two-Optimistic Client Selection. In each selection problem in multi-job FL systems into a multi-player
round, we devise a double UCB algorithm that integrates MAB (MPMAB) problem. In the classic MPMAB, there are
optimistic estimates for both iteration latency and energy multiple players and a set of arms. Each player can pull an
cost, complemented by a pessimistic estimate on con- arm or a subset of arms at a time, and then receives a feedback
straint violation, to strategically select a subset of clients (a.k.a., reward) reflecting the quality of this pull. The goal is
for each job. A fundamental challenge in decision-making to maximize cumulative rewards over a defined time period,
under uncertainty lies in striking a balance between where the means ej,i , uj,i of rewards associated with iteration

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 7

latency and energy cost remain initially unknown. We proceed hP


dI/M e
i+
Particularly, Qi (t1st 1st
0,0 ) = 0 and Qi (t1,0 ) = τ =1 $i (τ ) .
by outlining how we estimate ej,i , uj,i and derive our client
By expanding Eq. (14), we have
selection policy accordingly.
E[Qi (t1st fin
k,a )] = E[Qi (tk,a−1 ) + 1]
Our client selection scheme is an extension of the classi-  +
tfin
cal UCB policy, which strikes a balance between exploita- h i k,a−1
X
= E Qi (t1st k,a−1 ) + E [$i (τ )] + δk 
 
tion and exploration. Let hj,i (t) be the times of client i
has been selected by job j until round t, then we have τ =t1st
k,a−1
Pt
hj,i (t) , τ =1 1{i ∈ Sj (τ )}. As a result, each job j can tfin
k,a−1
X
collect hj,i (t) observed iteration latency and energy cost for ≥ E[Qi (t1st
k,a−1 )] + E [$i (τ )] + δk
client i. The lower the iteration latency or energy cost, τ =t1st
k,a−1
the more satisfied the job. Based on this fact, we define tfin
k,a−2
two types of rewards associated with iteration latency and X
≥ E[Qi (t1st
k,a−2 )] + E [$i (τ )] + δk
energy cost, termed latency and energy rewards. Therefore,
τ =t1st
k,a−2
we can define the observed latency and energy rewards by
tfin
Uj,i (t) = 1 − µj,i (t), Ej,i (t) = 1 − j,i (t) (where µj,i (t), j,i (t) k,a−1
X
are also normalized into [0, 1]P ). Moreover, we define the mean + E [$i (τ )] + δk
t Pt
Uj,i (t)
of Uj,i (t), Ej,i (t) by µj,i (t) = τ h=1j,i (t) , j,i (t) = τ =1
Ej,i (t)
, τ =t1st
k,a−1
(16)
hj,i (t)
respectively. Consequently, the UCB estimates of client i’s tfin
k,a−1
X
latency and energy reward means are defined as below. = E[Qi (t1st
k,a−2 )] + E [$i (τ )] + 2δk
τ =t1st
k,a−2
( s )
3 log t ≥ ...
ˆj,i (t) , min j,i (t) + ,1 ,
2hj,i (t) tfin
k,a−1 k−1 aκ
(12) X XX
≥ E [$i (τ )] + δκ + aδk
( s )
3 log t
µ̂j,i (t) , min µj,i (t) + ,1 . τ =1 κ=1 a=0
2hj,i (t) t1st
k,a −1 k−1 aκ
X XX
= $i (τ ) + δκ + aδk
Virtual Queue. Next, we introduce the virtual queue tech- τ =1 κ=1 a=0
nique to handle the long-term battery capacity constraints t1st
k,a −1
(a)
(11c) and demonstrate their influence on client selection. We ≥
X
E [$i (τ )] + (k − 1)3 log 3 k,
1
P P Bi
first equally rewrite (11c) as t∈[T ] j∈[J] E[bj,i (t)] − T ≤ τ =1
0. Moreover, we consider the battery constraint violation in where (a) holds because aδk ≥ 0 and
round t for client i to evaluate whether our client selection
k−1 aκ k−1
policy adheres to the long-term constraints over time. The XX X 1 1
δκ = k3 log 3 (k + 1) − (k − 1)3 log 3 k
constraint violation related to client i is defined by (17)
κ=1 a=0 κ=1
3 1
  + = (k − 1) log k. 3
t
X X Bi  Then for any round t in block a of frame k, we have
Bi (t) =   E[bj,i (κ)] − , (13)
κ=1
T
j∈[J]  +
t1st tfin
k,a −1k,a
+
X   X  
Bi (t) ≤  E $i (κ) + E[$i (κ)] 

where [·]+ , max{·, 0}. For job j , bj,i (t) > 0 if i ∈ Sj (t); κ=1 κ=t1st (18)
k,a
otherwise bj,i (t) = 0. Let $i (t) = j∈[J] bj,i (t) − BTi . Subse-
P
(a) h 1
i+
3
quently, considering jobs select client subset block by block, ≤ E[Qi (t1st
k,a )] − (k − 1) log k + k
3 .
we construct a block-based virtual queue Qi (t1st k,a ) for client i  +
to track its cumulative battery constraint violation by round t. where (a) is because (16) and (17), and E[$i (κ)] ≤ 1.
In round tfin 1st 1st From Eq. (18), we can know that a larger δk leads to smaller
k,a + 1 (equals to round tk,a+1 if a < ak or tk+1,0 if
a = ak ), the corresponding block-based virtual queue evolves constraint violation.
according to the following Eq. (14). Client Selection Policy. Based on the derived two UCB
estimates and the block-based virtual queues, we next depict
the developed client selection policy. At the beginning of
+
block a in frame k (i.e., round t1st
k,a ), we compute a compound

tfin
k,a
value for each client i selected by job j as Eq. (19), which
X
Qi (tfin 1st
k,a + 1) = Qi (tk,a ) + $i (τ ) + δk  , (14)
 
τ =t1st incorporates iteration latency, energy cost, and battery status
k,a
into evaluating client quality.
3 3
P2Oj,i (t1st
k,a ) , k ˆj,i (t1st 1st
k,a ) + k η1 µ̂j,i (tk,a )
where δk is used to control constraint violation and defined by (19)
− η3 k$i (t1st 1st
k,a )Qi (tk,a ).

1
k3 log 3 (k + 1) − (k − 1)3 log 3 k
1 The rationale behind is that the reward upper estimate minus
δk , . (15) the cumulative battery constraint violation, i.e., selecting the
d(k4 − (k − 1)4 ) /ke

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 8

clients with maximal possible reward and minimal battery Algorithm 1 Efficient and Intelligent Client Section Frame-
usage violation. Then, in round t1st
k,a , each job j picks a client
work for Multi-Job FL: EffI-FL
subset following the greedy rule as below. Input: global rounds T, client set [I], job set[J], M, η1 , η2 , η3
X Output: iteration latency µj (t), energy cost j,i (t)
Sj (t1st
k,a ) = arg max P2Oj,i (t1st
k,a ). (20) 1: Initialization: hj,i (t) ← 0, ∀i, j
S
i∈S,S⊂[I],|S|=M // Initialization Phase
2: Partition round set [T ] into frames and blocks, indexed by k =
That is, each job selects the top M clients with the largest
0, 1, 2, ... and a = 0, 1, 2, ..., ak , respectively
P2Oj,i (t1st
k,a ). For each client i located in the overlapping areas 3: Divide all clients into dI/M e disjoint groups
of multiple edge servers, we assign it to job j that can achieve 4: for t ∈ {1, 2, ..., dI/M e} do
maximal gain, i.e., 5: for j ∈ [J] do
6: Job j select a client group with index (t+j −1)%dI/M e
j = arg max P2Oj,i (t1st
k,a ). (21) 7: Compute Uj,i (t) = 1 − µj,i (t), Ej,i (t) = 1 − j,i (t)
j
8: Set hj,i (t) = hj,i (t) + 1 if i ∈ Sj (t) and update the
Regret. decisions in selecting clients, thereby resulting in reward estimates in Eq. (12)
diminished efficiency. This decrement in efficiency, termed 9: end for
10: end for
regret, quantifies the disparity between the performance of // Exploration-Exploitation Phase
our client selection policy (Sj (t), j ∈ [J], t ∈ [T ], as computed 11: for k = 1, 2... do
by Eq. (20) and (21)) and that achievable with the optimal 12: for a = 0, 1, 2, ..., ak do
decision (Sj∗ (t), j ∈ [J], t ∈ [T ]). Herein, we formally define 13: Determine Sj (t1st k,a ) for each job j ∈ [J] according to
regret as follows. Eq. (20) and (21)
  14: Clients in Sj (t1stk,a ) conduct local training and communi-
X X X cate with corresponding edge server following the three steps in
R(T ) , E  ej,i + η1 max∗ uj,i 
i∈Sj Sec. III-A
t=[T ] j∈[J] i∈Sj∗
15: Compute Uj,i (t) = 1 − µj,i (t), Ej,i (t) = 1 − j,i (t) for
i ∈ Sj (t), t ∈ [t1st fin
 
k,a , tk,a ]
(22)
X
− j,i (t) + η1 max µj,i (t) 16: Set hj,i (t) = hj,i (t) + 1 if i ∈ Sj (t), t ∈ [t1st fin
k,a , tk,a ] and
i∈Sj (t)
i∈Sj (t) update the reward estimates in Eq. (12)
T   17: Update Qi (tfin k,a + 1) based on Eq. (14)
18: end for
X
+ E η2 C∆j (t) ,
t=1
19: end for

where the first item is reward regret, denoted by Rreward (T ),


and the second item is switching regret, denoted by Rswitch (T ). Theorem 1. The EffI-FL algorithm achieves vanishing
Algorithm Details. To consolidate the aforementioned com- constraint violation, i.e., limt→∞ Bi (t) → 0, ∀i ∈ [I].
ponents, we have devised EffI-FL, an efficient and in-
telligent framework for multi-job FL detailed in Alg. 1. In Proof. Please see Appendix A. 
frame k = 0, each job selects M clients by sampling without
replacement. Consequently, we ensure that in frame k = 0, Theorem 2. (Upper Bound of Switching Cost). Under
each client is selected by every job at least once by setting the EffI-FL, Rswitch (T ) = O(T 3/4 ).
number of blocks to dI/M e. Considering that each client can Proof. Please see Appendix B. 
only serve one job at a time, we establish a sequential sampling
strategy for the J jobs. Assuming I clients are grouped into Theorem 3. (Upper Bound of Reward Regret). Under
1 3
dI/M e disjoint sets (typically, dI/M e > J in cross-device FL), EffI-FL, Rreward (T ) = O(log 3 (T )T 4 ).
each containing M clients. In round t, job j selects the client
group indexed by (t+j −1)%dI/M e. After initialization, at the Proof. Please see Appendix C. 
beginning of each block a in subsequent frames k = 1, 2, ...,
we determine the client subset for each job using Eq. (20) V. P ERFORMANCE E VALUATION
and (21). Following the job-client assignment, each selected A. Experiment Setup
client downloads the global model, performs local training,
Setup for FL. We have implemented EffI-FL and the
and subsequently uploads updates to the associated edge server
comparative algorithms based on an open-source hierarchical
for model aggregation. Next, we update correlated variables to
FL framework1 provided by [34]. Our simulation of the
prepare for the next block’s training phase.
hierarchical FL environment is conducted on a node equipped
with four Nvidia GeForce RTX2080Ti GPUs interconnected
C. Theoretical Analysis via PCIe 3.0x16. Specifically, we designate one GPU as
In this section, we undertake theoretical analysis for the central server, while the remaining three GPUs simulate
EffI-FL. Specifically, we demonstrate that EffI-FL en- edge servers, each covering an area with a radius of 500 m.
sures vanishing constraint violation. Furthermore, we establish Additionally, we deploy 100, 100, 200 lightweight threads to
that both the cost incurred by the proposed block-wise switch- represent clients within each edge server, uniformly distributed
ing strategy and the regret stemming from our client selection across its coverage area. Moreover, assuming a probability
policy are sublinear. 1 https://github.com/C3atUofU/Hierarchical-SGD.

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 9

range of [0.2, 0.3] for clients located in overlapping areas •HierFAVG [24]: In each round, HierFAVG employs a
between any two edge servers, we incorporate this spatial random selection strategy to choose M clients per job.
characteristic into our simulation. The job type for each edge When a client is selected by multiple jobs concurrently,
server is randomly selected from three types specified in it is assigned to any one of them, followed by random
Table II. sampling from the remaining clients until each job forms
Datasets, Models, and Jobs. We evaluated EffI-FL under a distinct subset of clients.
three realistic FL jobs, trained with different models and • MetaGreedy [6]: MetaGreedy is a multi-job FL frame-
datasets. Following the settings of [35], [11], we configure jobs work designed to minimize training latency across mul-
I and II. Job I involves training a CNN model with two 5 × 5 tiple jobs while ensuring data fairness. The approach
convolution layers over the MNIST dataset, comprising 70,000 begins by establishing a cost model for each client,
gray-scale images of handwritten digits. The output channels accounting for both latency and fairness considerations.
of the two layers are 20 and 50, respectively. Similarly, job Subsequently, a reward function is crafted based on
II trains a CNN model on the CIFAR-10 dataset, which this model, followed by the application of reinforcement
includes images of ten categories, each containing 60,000 learning (RL) techniques to optimize client selection.
color images. Here, the CNN architecture also features two • EACS-FL [23]: EACS-FL addresses the challenge of
5 × 5 convolutional layers, with output channels of 6 and 32, selecting clients for multiple jobs under long-term con-
respectively. According to [36], job III focuses on training a straints, akin to the battery constraints in our study. Its
support vector machines (SVM) model using the KDDCup99 primary focus is on minimizing overall training latency.
dataset, comprising 5,209,460 TCP dump records commonly To achieve this, EACS-FL integrates these constraints
utilized for network intrusion detection tasks. and iteration latency into an evaluation criterion to guide
client selection, favoring a greedy approach by selecting
TABLE II. Parameters of Three FL Jobs.
clients with the top-M evaluation scores. Similar to our
Items Job I Job II Job III
Dataset MNIST CIFAR-10 KDDCup99
framework, EACS-FL also accounts for uncertainties in
Model CNN CNN SVM measuring client iteration latency.
Data size ∼ 11MB ∼ 162MB ∼ 743MB • MCS [22]: MCS selects clients with smallest latency for
# of selected clients (M ) 10 10 25 each job in hierarchical FL network. Like EACS-FL,
Batch size 10 50 100
Learning rate 1e-3 1e-2 1e-2 MCS also operates within a stochastic environment.
# of local iteration 2 5 5 Metrics. We evaluate the performance of EffI-FL and all
benchmarks in terms of test accuracy of global models, total
Heterogeneous and Dynamic Environment. First, we as- latency, energy consumption, switching cost, cumulative regret
sume the computational and communication capabilities of and violation of client selection.
clients (CPU frequency fi (t), bandwidth wj,i (t), and transmis-
sion power pi (t)) follow exponential distributions, with means
randomly chosen from [1, 2] GHz, [0.125, 0.3] MHz [37], and B. Evaluation Results
[80, 120] mW [3], respectively. According to [38], the wireless TABLE III. The test accuracy of EffI-FL v.s. four baselines.
channel gain oi (t) = h0 ∗ (dj,i (t))−α dB, where h0 = 30 dB, Method Setting MNIST CIFAR-10 KDDCup99
α = −4, and dj,i (t) represents the distance between client i HierFAVG 98.75% 76.72% 99.92%
and edge server j . The noise power density is fixed at N0 = MetaGreedy 98.77% 77.43% 99.95%
IID
EACS-FL 98.74% 76.71% 99.9%
−173 dBm. For the three datasets considered, the cycles of MCS 98.74% 75.29% 99.63%
clients processing one sample Vj,i are sampled from [105 , 106 ], EffI-FL 98.76% 76.74% 99.94%
[106 , 107 ], and [104 , 105 ], respectively. The effective switched HierFAVG 98.52% 69.66% 99.87%
MetaGreedy 98.64% 70.04% 99.89%
capacitance is uniformly set to κi = 10−28 for all clients [27]. EACS-FL
non-IID I
98.45% 69.87% 99.78%
Consequently, we derive the time-varying iteration latency and MCS 97.48% 60.67% 99.68%
energy consumption using Eq. (5) and (9). Switching cost Cj EffI-FL 98.59% 69.96% 99.9%
HierFAVG 98.31% 63.56% 99.7%
is configured within the range [0.2, 0.6] based on [30]. Battery MetaGreedy 98.58% 65.18% 99.74%
capacities Bi are randomly assigned from [1300, 1600] J [23], non-IID II
EACS-FL 98.36% 63.67% 99.66%
with battery consumption per client per job varying between MCS 97.25% 58.67% 99.62%
EffI-FL 98.53% 64.84% 99.72%
70 J and 125 J. Default weight factors η1 , η2 are uniformly set
to 1. Furthermore, we explore the impact of non-IID datasets Test Accuracy. Table III presents the test accuracy of all
on performance evaluation, generating imbalanced and non- algorithms across three datasets under both IID and non-IID
IID data distributions among clients following the approach settings. The results indicate that EffI-FL achieves com-
outlined in [37]. Each client’s local data size is assumed to parable accuracy to HierFAVG, MetaGreedy, and EACS-FL.
follow a Gaussian distribution, with means computed as ratios Additionally, EffI-FL and other benchmarks demonstrate
of the number of clients to the data samples and variances a slight advantage in accuracy compared to MCS. This is
randomly selected from {0.2, 0.4} [36]. These distributions are because these algorithms consider multiple factors in client
denoted as non-IID I and II, respectively. selection, thereby improving scheduling fairness, which has
Baselines. To verify the effectiveness of EffI-FL, we been shown to enhance model performance, particularly in
compare it with four representative FL methods as follows. non-IID scenarios. From Table III, several observations can be

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 10

made. (i) In both IID and non-IID scenarios, the test accuracy is comparable to that of EACS-FL, and significant lower than
of EffI-FL closely aligns with that of HierFAVG, Meta- that of other baselines. This disparity arises due to several
Greedy, and EACS-FL. (ii) The performance improvements factors. First, EffI-FL and all benchmarks except Hier-
of EffI-FL over MCS are more pronounced in non-IID FAVG adapt to heterogeneous and dynamic environments. In
scenarios than in the IID setting. (iii) By comparing the results contrast, HierFAVG’s random client selection neglects client-
under non-IID I and II, we can find that the performance side information, compromising training efficiency. Second,
degradation of MCS becomes more evident with increasing MetaGreedy and MCS neglect battery capacity constraints,
data imbalance. unlike EffI-FL and EACS-FL. Consequently, this oversight
Objective Values. Fig. 2 illustrates the aver- results in inefficient client selection and imprudent battery
aged objective values per job per round (i.e., usage throughout training. Third, EffI-FL selects clients
1
by balancing latency and energy consumption, resulting in a
P P
T ∗J t∈[T ] j∈[J] µj (t) + η1 j (t) + η2 ψj (t)), with
η1 = η2 = 1, for both IID and non-IID settings marginal compromise in each metric. For instance, EffI-FL
across EffI-FL and all benchmarks. The results exhibits 7.7% higher latency compared to EACS-FL.
demonstrate that EffI-FL consistently outperforms all
500 EffI-FL
baselines irrespective of the degree of data imbalance. HierFAVG

Cumulative Energy
Consumption (J)
Specifically, EffI-FL achieves 51.8%, 30.9%, 22.1%, 28.3% 400 MetaGreedy
EACS-FL
(52.0%, 32.1%, 22.8%, 27.4%, 52.3%, 31.5%, 21.7%, 27.9%) cost 300 MCS
reductions compared to HierFAVG, MetaGreedy, EACS-
FL, and MCS, respectively, in the IID setting (non-IID 200
settings I, II). This is because EffI-FL effectively balances 100
latency, energy consumption, and switching cost during client
0
selection. As a comparison, EACS-FL considers both latency
0 50 100 150 200
and battery capacity constraints, MetaGreedy and MCS Number of Rounds
optimize latency only and HierFAVG considers none of these Fig. 4: Energy consumption of different benchmarks under IID
factors. Furthermore, the degree of data imbalance has little settings.
impact on the objective values, primarily influencing the label 3
×10
distribution among clients rather than the data volume itself.
Cumulative Switching Cost
3 EffI-FL
HierFAVG
EffI-FL MetaGreedy MCS
2.0 MetaGreedy
HierFAVG EACS-FL
EACS-FL
2
MCS
Objective Values

1.5

1
1.0

0.5 0
0 50 100 150 200
0.0
Number of Rounds
IID non-IID I non-IID II
Fig. 5: Switching cost of different benchmarks under IID
Fig. 2: Objective values under IID and two non-IID settings. settings.
×10
5 Furthermore, analysis of Fig. 4 and 5 reveals that EffI-FL
EffI-FL achieves significantly lower reductions in energy consumption
Cumulative Latency (s)

HierFAVG and switching cost compared to all benchmarks, as these algo-


MetaGreedy
1.0 EACS-FL
rithms do not optimize these factors. Specifically, EffI-FL
MCS achieves reductions in energy consumption (switching cost) of
up to 26.1%, 23.7%, 21.1%, 19.8% (87.7%, 82.3%, 71.5%, 78.8%),
0.5 respectively, compared to HierFAVG, MetaGreedy, EACS-FL,
and MCS. The suboptimal performance of these algorithms
stems from their failure to balance latency, energy consump-
0.0 tion, and switching cost during client selection. Additionally,
0 50 100 150 200
Number of Rounds unlike HierFAVG, the other benchmarks establish relatively
stable client selection policies, resulting in minor fluctuations
Fig. 3: Latency of different benchmarks under IID settings.
among selected clients. Consequently, HierFAVG incurs sub-
Latency, Energy Consumption, and Switching Cost. To stantially higher switching cost than the other algorithms.
delve deeper into the dynamics of latency, energy consump- Number of Switching. To examine whether EffI-FL
tion, and switching cost, we visualized their observed values achieves sub-linear switching cost, we present the cumulative
during training. Fig. 3-5 present these results under the IID number of switches over time in Fig. 6 when training the
setting, as the non-IID setting exerts minimal impact on our model on CIFAR-10 under the IID setting. To exclude the
primary metrics. Clearly, the latency reduction of EffI-FL influence of battery capacity constraints, we assume all clients

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 11

have unlimited power. Consequently, EffI-FL exhibits cu- considerations. (iii) EffI-FL demonstrates sublinear regret,
1 3
mulative switching bounded by the black curve with form aligning with the regret form O(log 3 (T )T 4 ) (equivalent to
1 1
10 ∗ T 3/4 , where 10 denotes the clients selected per round. O(log 3 (T )T − 4 ∗ T )) described in Theorem 3.
This means that the experimental results of EffI-FL agree Battery Usage. We investigate the battery consumption
with Theorem 2. patterns of all algorithms and illustrate the findings in Fig. 9
3
×10 and 10. Fig. 9 clearly demonstrates that EffI-FL and EACS-
EffI-FL FL exhibit substantially slower rates of battery power con-
0.8 10 * T 3/4
Number of Switches

sumption compared to HierFAVG, MetaGreedy, and MCS,


0.6 which do not account for battery capacity constraints.
4
×10

Cumulative Battery Usage (J)


0.4 2.0 EffI-FL
HierFAVG
0.2 MetaGreedy
1.5 EACS-FL
MCS
0.0
1.0
0 100 200 300 400
Number of Rounds
0.5
Fig. 6: EffI-FL v.s. O(T 3/4 ) in terms of the total number of
switches. 0.0
2
×10 0 50 100 150 200
4 Oracle Number of Rounds
EffI-FL Fig. 9: Cumulative battery power consumption of different
Cumulative Reward

HierFAVG
3 algorithms.
MetaGreedy
EACS-FL Cumulative Violation. Fig. 10 depicts the cumulative con-
2 MCS straint violations across clients with varying maximal battery
capacities, ranging incrementally from 1400 J to 2000 J. From
1 the results, we can find the following facts. (i) The viola-
tion generated by EffI-FL and EACS-FL is comparable,
0 stemming from their careful integration of battery capacity
0 50 100 150 200 constraints in client selection. As training progresses, the
Number of Rounds
violation diminishes in both algorithms. As a result, EffI-FL
Fig. 7: Cumulative reward of different benchmarks under IID and EACS-FL can satisfy the battery capacity constraints in
settings. the long run by carefully managing the battery usage of clients
over time. (ii) Conversely, other benchmarks exhibit relatively
1.5 ln(T) * T −1/4 MetaGreedy
high violation in the absence of such consideration. (iii) With
Time-Averaged Regret

EffI-FL EACS-FL
HierFAVG MCS higher maximum battery capacities, the overall cumulative vio-
1.0 lation across all algorithms decreases. This decline accelerates
notably when ample resources are available. The reason is that
violation mainly occurs in the early stages of training process,
0.5
when client-side information is unknown. More resources pro-
vide more opportunities for tolerating resource waste incurred
0.0 by random exploration.
0 50 100 150 200 ×10
3

Number of Rounds 4 EffI-FL EACS-FL


Cumulative Violation

Fig. 8: Time-averaged regret of different benchmarks under HierFAVG MCS


3 MetaGreedy
IID settings.
Cumulative Reward and Time-Averaged Regret of Client 2
Selection. We investigate the cumulative reward and corre-
sponding time-averaged regret of EffI-FL and all compari- 1
son algorithms, depicted in Fig. 7 and 8. Firstly, we define the
offline optimal client selection policy, referred to as Oracle, 0
1400 1600 1800 2000
which possesses prior knowledge of all stochastic variables Maximal Battery Capacity (J)
(such as CPU frequency fi (t), bandwidth wj,i (t), and trans- Fig. 10: Violation with different battery capacities.
mission power pi (t)), thereby solving problem (11) optimally. Impact of Client Dropout. Clients located in the over-
From these figures, several conclusions can be drawn. (i) The lapping areas may roam out of the current edge server’s
cumulative reward achieved by Oracle consistently outper- coverage, leading to dropout from the training process. To
forms all other algorithms. (ii) EffI-FL closely approaches simulate this situation, we assume the probability of client
the performance of Oracle and significantly outperforms other dropout is 0.5. Subsequently, we assess the test accuracy
benchmarks, which often lack comprehensive decision-making of EffI-FL under both IID and non-IID settings with

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 12

the results presented in Table IV. From the results in Ta- switching cost to a sublinear scale by employing a block-
bles. III and IV, it is evident that in the presence of client wise selection strategy for client subsets. Next, EffI-FL
dropout, our algorithm experiences performance degrada- introduces a double UCB estimator to quantify the uncertain
tion of −0.01%, −0.03%, −0.02% (−0.04%, −0.07%, −0.03%, iteration latency and energy consumption of clients. Virtual
−0.05%, −0.11%, −0.04%) under the three settings, respec- queue techniques are then employed to effectively manage the
tively. The reason is that with client selection, the number violation of battery capacity constraints. By integrating the
of dropping clients is relatively small, thereby limiting their double UCB estimator and virtual queues, EffI-FL develops
impact on overall model performance. a novel client selection approach called P2OCS. We rigorously
TABLE IV. The test accuracy of EffI-FL with client demonstrate that the proposed block-wise switching method
dropout. incurs sublinear cost, while P2OCS achieves sublinear regret.
Extensive experimental results validate the effectiveness of
Setting MNIST CIFAR-10 KDDCup99
IID 98.75% 76.71% 99.92% EffI-FL.
non-IID I 98.55% 69.89% 99.87%
non-IID II 98.48% 64.73% 99.68%
Ablation Study. EffI-FL enhances training efficiency R EFERENCES
through three key aspects: 1) reducing the total incurred la-
[1] J. Liu, J. Huang, Y. Zhou, X. Li, S. Ji, H. Xiong, and D. Dou,
tency and energy consumption via a double UCB estimator; 2) “From distributed machine learning to federated learning: A survey,”
minimizing the switching cost through a block-wise switching Knowledge and Information Systems, vol. 64, no. 4, pp. 885–917, 2022.
algorithm; 3) further optimizing the total latency and energy [2] Z. Yang, M. Chen, W. Saad, C. S. Hong, and M. Shikh-Bahaei, “Energy
consumption by considering battery capacity constraints and efficient federated learning over wireless communication networks,”
IEEE Transactions on Wireless Communications, vol. 20, no. 3, pp.
integrating constraint violation into the client selection criteria. 1935–1949, 2020.
To demonstrate the effectiveness of these aspects, we measure [3] M. Kim, W. Saad, M. Mozaffari, and M. Debbah, “Green, quantized
the total incurred latency, energy consumption, and switching federated learning over wireless networks: An energy-efficient design,”
IEEE Transactions on Wireless Communications, vol. 23, no. 2, pp.
cost by incrementally incorporating each of the three compo- 1386–1402, 2024.
nents, denoted as EffI-FL1 , EffI-FL2 , and EffI-FL, and [4] W. Xia, T. Q. Quek, K. Guo, W. Wen, H. H. Yang, and H. Zhu,
compare the corresponding speedup over MCS. The results “Multi-armed bandit-based client scheduling for federated learning,”
IEEE Transactions on Wireless Communications, vol. 19, no. 11, pp.
are summarized in Table IV. Comparing EffI-FL1 with 7108–7123, 2020.
MCS reveals that our double UCB algorithm can achieve a [5] T. Huang, W. Lin, W. Wu, L. He, K. Li, and A. Y. Zomaya, “An
lower cumulative sum of latency and energy cost, attributed efficiency-boosting client selection scheme for federated learning with
fairness guarantee,” IEEE Transactions on Parallel and Distributed
to its accurate measurement of the two properties of clients Systems, vol. 32, no. 7, pp. 1552–1564, 2021.
and its effective balancing during client selection. Next, we [6] J. Liu, J. Jia, B. Ma, C. Zhou, J. Zhou, Y. Zhou, H. Dai, and D. Dou,
can observe that the switching cost incurred by EffI-FL2 “Multi-job intelligent scheduling with cross-device federated learning,”
decreases significantly compared to EffI-FL1 , while the IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 2,
pp. 535–551, 2022.
other two metrics exhibit minimal change. This is attributable [7] LG Robot, 2024, https://www.lg.com/global/business/robot.
to the fact that our block-wise switching algorithm primarily [8] C. Zhou, J. Liu, J. Jia, J. Zhou, Y. Zhou, H. Dai, and D. Dou, “Efficient
reduces the frequency of client switches without affecting device scheduling with multi-job federated learning,” in Proc. of AAAI,
vol. 36, no. 9, 2022, pp. 9971–9979.
the degree of learning for each client, thereby maintaining [9] Y. Shi and H. Yu, “Fairness-aware job scheduling for multi-job federated
the accuracy of the UCB estimator. Finally, we find that learning,” arXiv preprint arXiv:2401.02740, 2024.
EffI-FL can further decrease the latency and energy cost [10] Y. Jin, L. Jiao, Z. Qian, S. Zhang, and S. Lu, “Budget-aware online
control of edge federated learning on streaming data with stochastic
by considering battery capacity constraints, thereby validating inputs,” IEEE Journal on Selected Areas in Communications, vol. 39,
the effectiveness of our algorithm design. no. 12, pp. 3704–3722, 2021.
[11] N. Wang, R. Zhou, L. Su, G. Fang, and Z. Li, “Adaptive clustered
TABLE V. Ablation study of EffI-FL, i.e., the speedup over federated learning for clients with time-varying interests,” in Proc. of
MCS. IEEE/ACM IWQoS, 2022, pp. 1–10.
Speedup [12] Z. Qu, R. Duan, L. Chen, J. Xu, Z. Lu, and Y. Liu, “Context-aware online
Algorithms
Latency Energy Cost Switching Cost client selection for hierarchical federated learning,” IEEE Transactions
EffI-FL1 1.082 0.856 1.013 on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4353–4367,
EffI-FL2 1.076 0.853 0.232 2022.
EffI-FL 1.041 0.815 0.228 [13] J. Pang, J. Yu, R. Zhou, and J. C. Lui, “An incentive auction for
heterogeneous client selection in federated learning,” IEEE Transactions
on Mobile Computing, vol. 22, no. 10, pp. 5733–5750, 2023.
VI. C ONCLUSION [14] The EMNIST Dataset, 2024, https://www.nist.gov/itl/
products-and-services/emnist-dataset.
This paper focuses on optimizing training efficiency in [15] Z. Ke, B. Liu, and X. Huang, “Continual learning of a mixed sequence
terms of latency, energy consumption, and switching cost of similar and dissimilar tasks,” in Proc. of NeurIPS, vol. 33, 2020, pp.
18 493–18 504.
for multi-job FL systems in unstable wireless networks. To
[16] W. Xia, T. Q. S. Quek, K. Guo, W. Wen, H. H. Yang, and H. Zhu,
achieve this goal, we formulate an optimization problem that “Multi-armed bandit-based client scheduling for federated learning,”
seeks to minimize the weighted sum of these factors while IEEE Transactions on Wireless Communications, vol. 19, no. 11, pp.
considering long-term battery capacity constraints. We propose 7108–7123, 2020.
[17] F. Lai, X. Zhu, H. V. Madhyastha, and M. Chowdhury, “Oort: Efficient
a novel multi-job FL framework, EffI-FL, to facilitate the federated learning via guided participant selection,” in Proc. of USENIX
concurrent training of multiple jobs. EffI-FL first reduces OSDI, 2021, pp. 19–35.

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3502403

IEEE INTERNET OF THINGS JOURNAL, VOL. XX, NO. X, AUGUST 2024 13

[18] C. Peng, Q. Hu, Z. Wang, R. W. Liu, and Z. Xiong, “Online-learning- interests are in the areas of machine learning, blockchain security, and
based fast-convergent and energy-efficient device selection in federated optimization algorithms.
edge learning,” IEEE Internet of Things Journal, vol. 10, no. 6, pp.
5571–5582, 2023.
[19] Y. Deng, F. Lyu, J. Ren, Y.-C. Chen, P. Yang, Y. Zhou, and Y. Zhang,
“Fair: Quality-aware federated learning with precise user incentive and
model aggregation,” Proc. of IEEE INFOCOM, pp. 1–10, 2021.
[20] P. Pene, W. Liao, and W. Yu, “Incentive design for heterogeneous
client selection: A robust federated learning approach,” IEEE Internet
of Things Journal, vol. 11, no. 4, pp. 5939–5950, 2024.
[21] D. Chen, C. S. Hong, L. Wang, Y. Zha, Y. Zhang, X. Liu, and Z. Han,
“Matching-theory-based low-latency scheme for multitask federated
learning in mec networks,” IEEE Internet of Things Journal, vol. 8, Ne Wang received the PhD degree from the School of Computer Science,
no. 14, pp. 11 415–11 426, 2021. Wuhan University, Wuhan, China, in 2023. She is currently a postdoctoral
[22] L. Su, R. Zhou, N. Wang, J. Chen, and Z. Li, “Low-latency hierarchical fellow with the Department of Computer Science and Engineering, Hong Kong
federated learning in wireless edge networks,” IEEE Internet of Things University of Science and Technology, Hong Kong. She has published research
Journal, vol. 11, no. 4, pp. 6943–6960, 2024. papers in top-tier computer science conferences and journals, including IEEE
[23] K. Zhu, F. Zhang, L. Jiao, B. Xue, and L. Zhang, “Client selection for IWQoS, ACM ICPP, IEEE JOURNAL ON SELECTED AREAS IN COM-
federated learning using combinatorial multi-armed bandit under long- MUNICATIONS, IEEE TRANSACTIONS ON MOBILE COMPUTING, and
term energy constraint,” Computer Networks, vol. 250, p. 110512, 2024. IEEE TRANSACTIONS ON COMPUTERS. Her research interests are in the
[24] L. Liu, J. Zhang, S. Song, and K. B. Letaief, “Hierarchical federated areas of cloud/edge computing, distributed machine learning, and optimization
learning with quantization: Convergence analysis and system design,” algorithms.
IEEE Transactions on Wireless Communications, vol. 22, no. 1, pp. 2–
18, 2023.
[25] W. Mao, X. Lu, Y. Jiang, and H. Zheng, “Joint client selection and band-
width allocation of wireless federated learning by deep reinforcement
learning,” IEEE Transactions on Services Computing, vol. 17, no. 1, pp.
336–348, 2024.
[26] V. Smith, C.-K. Chiang, M. Sanjabi, and A. Talwalkar, “Federated multi-
task learning,” in Proc. of NeurIPS, 2017, p. 4427–4437.
[27] Y. Mao, J. Zhang, and K. B. Letaief, “Dynamic computation offloading
for mobile-edge computing with energy harvesting devices,” IEEE Ruiting Zhou received the PhD degree from the Department of Computer
Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3590– Science, University of Calgary, Canada, in 2018. She has been an associate
3605, 2016. professor with the School of Cyber Science and Engineering, Wuhan Univer-
[28] Z. Rejiba, X. Masip-Bruin, and E. Marı́n-Tordera, “Towards user-centric, sity since June 2018. Her research interests include cloud computing, machine
switching cost-aware fog node selection strategies,” Future Generation learning and mobile network optimization. She has published research papers
Computer Systems, vol. 117, pp. 359–368, 2021. in top-tier computer science conferences and journals, including IEEE IN-
[29] Y. Zhou, C. Shen, and M. van der Schaar, “A non-stationary online FOCOM, ACM MobiHoc, ICDCS, IEEE/ACM Transactions on Networking,
learning approach to mobility management,” IEEE Transactions on IEEE Journal on Selected Areas in Communications, IEEE Transactions on
Wireless Communications, vol. 18, no. 2, pp. 1434–1446, 2019. Mobile Computing. She also serves as a reviewer for journals and international
[30] T. Le, C. Szepesvári, and R. Zheng, “Sequential learning for multi- conferences such as the IEEE Journal on Selected Areas in Communications,
channel wireless network monitoring with channel switching costs,” IEEE Transactions on Mobile Computing, IEEE Transactions on Cloud
IEEE Transactions on Signal Processing, vol. 62, no. 22, pp. 5919– Computing, IEEE Transactions on Wireless Communications, and IEEE/ACM
5929, 2014. IWQOS.
[31] J. Steiger, B. Li, B. Ji, and N. Lu, “Constrained bandit learning with
switching costs for wireless networks,” in Proc. of IEEE INFOCOM,
2023.
[32] M. Neely, Stochastic network optimization with application to commu-
nication and queueing systems. Springer Nature, 2022.
[33] S. Krishnasamy, P. Akhil, A. Arapostathis, R. Sundaresan, and
S. Shakkottai, “Augmenting max-weight with explicit learning for
wireless scheduling with switching costs,” IEEE/ACM Transactions on
Networking, vol. 26, no. 6, pp. 2501–2514, 2018.
[34] J. Wang, S. Wang, R.-R. Chen, and M. Ji, “Demystifying why local
aggregation helps: Convergence analysis of hierarchical sgd,” in Proc. Bo Li is a Chair Professor in the Department of Computer Science and Engi-
of AAAI, 2022, pp. 8548–8556. neering, Hong Kong University of Science and Technology. He held a Cheung
[35] H. Wang, Z. Kaplan, D. Niu, and B. Li, “Optimizing federated learning Kong Visiting Chair Professor in Shanghai Jiao Tong University between
on non-iid data with reinforcement learning,” in Proc. of IEEE INFO- 2010 and 2016, and was the Chief Technical Advisor for ChinaCache Corp.
COM, 2020. (NASDAQ:CCIH), a leading CDN provider. He was an adjunct researcher
[36] J. Yu, R. Zhou, C. Chen, B. Li, and F. Dong, “Asfl: Adaptive semi- at the Microsoft Research Asia (MSRA) (1999-2006) and at the Microsoft
asynchronous federated learning for balancing model accuracy and total Advanced Technology Center (2007-2008). He made pioneering contributions
latency in mobile edge networks,” in Proc. of ACM ICPP, 2023, p. in multimedia communications and the Internet video broadcast, in particular
443–451. Coolstreaming system, which was credited as first large-scale Peer-to-Peer
[37] W. Wu, L. He, W. Lin, R. Mao, C. Maple, and S. Jarvis, “Safa: A semi- live video streaming system in the world. It attracted significant attention
asynchronous protocol for fast federated learning with low overhead,” from both industry and academia and received the Test-of-Time Best Paper
IEEE Transactions on Computers, vol. 70, no. 5, pp. 655–668, 2021. Award from IEEE INFOCOM (2015). He has been an editor or a guest editor
[38] Z. Li, S. Zou, and S. Guo, “On energy optimization for hierarchical for over a two dozen of IEEE and ACM journals and magazines. He was
federated learning with delay constraint through node cooperation,” the Co-TPC Chair for IEEE INFOCOM 2004. He is a Fellow of IEEE. He
IEEE Internet of Things Journal, vol. 11, no. 9, pp. 15 299–15 309, received his PhD in the Electrical and Computer Engineering from University
2024. of Massachusetts at Amherst, and his B. Eng. (summa cum laude) in the
Computer Science from Tsinghua University, Beijing.

Jiajin Wang is currently pursuing a master’s degree in the School of


Computer Science and Technology at Anhui University, China. His research

Authorized licensed use limited to: Wuhan University. Downloaded on March 06,2025 at 03:44:56 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.

You might also like