
Performance and Communication Cost of Deep Neural Networks in Federated Learning Environments: An Empirical Study
Basmah K. Alotaibi 1,2, Fakhri Alam Khan 1,3,4,*, Yousef Qawqzeh 5, Gwanggil Jeon 6, David Camacho 7

1 Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261 (Saudi Arabia)
2 Department of Computer Science, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318 (Saudi Arabia)
3 Interdisciplinary Research Centre for Intelligent Secure Systems, King Fahd University of Petroleum and Minerals, Dhahran (Saudi Arabia)
4 SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals, Dhahran (Saudi Arabia)
5 College of Information Technology, Fujairah University (UAE)
6 Department of Embedded Systems Engineering, Incheon National University, 119 Academy-ro, Yeonsu-gu, Incheon, 22012 (Korea)
7 Department of Computer Systems Engineering, Universidad Politécnica de Madrid, Madrid (Spain)

* Corresponding author: [email protected]

Received 14 May 2024 | Accepted 6 September 2024 | Early Access 16 December 2024

Abstract

Federated learning, a distributive cooperative learning approach, allows clients to train the model locally using their data and share the trained model with a central server. When developing a federated learning environment, a deep/machine learning model needs to be chosen. The choice of the learning model can impact the model performance and the communication cost, since federated learning requires the model to be exchanged between clients and a central server over several rounds. In this work, we provide an empirical study to investigate the impact of using three different neural network models (CNN, VGG, and ResNet) in image classification tasks using two different datasets (Cifar-10 and Cifar-100) in a federated learning environment. We investigate the impact of using these models on the global model performance and communication cost under different data distributions, namely IID and non-IID distributions. The obtained results indicate that the CNN and ResNet models provide faster convergence than the VGG model and require lower communication costs. In contrast, the VGG model necessitates the sharing of numerous bits over several rounds to achieve higher accuracy under the IID data settings, and its accuracy level is lower under non-IID data distributions than the other models. Furthermore, using a light model like CNN provides results comparable to the deeper neural network models with less communication cost, even though it may require more communication rounds to achieve the target accuracy in both datasets. The CNN model requires fewer bits to be shared during communication than the other models.

Keywords: Communication Cost, Convolutional Neural Network (CNN), Deep Neural Networks, Distributive Learning, Federated Learning, Neural Network, Performance, Residual Neural Network (ResNet), Visual Geometry Group (VGG).

DOI: 10.9781/ijimai.2024.12.001

I. Introduction

The expansion of information and communication technology has increased the availability of data and computing resources, resulting in the Big Data era. The increasing amount of data generated in the network requires efficient knowledge extraction and processing mechanisms to benefit from it. The generated data can be utilized as training data to provide the edge devices in the network with intelligence. However, traditional machine/deep learning approaches necessitate collecting the data in a central location to train the model and extract knowledge from it. Collecting the data in a central location can cause significant transmission delay and raise privacy concerns, since private information is shared through the network. Therefore, traditional machine learning approaches that require data collection in a central location face various challenges, such as network communication and data privacy [1].


To tackle these issues, Federated Learning (FL) was introduced [2] to allow the model to be trained locally at the network edge and to share the learned model rather than the data. FL is a distributive collaborative learning process that allows devices (known as clients) to train the model locally using their local data and share the trained model with a central server. The central server aggregates the received local models to create the global model. The learning process in federated learning is performed in rounds where, in each round, the central server provides the global model to the clients, which train the model locally using their data [2].

Federated learning technology offers numerous advantages over traditional learning methods. It provides more effective usage of network bandwidth and protects data privacy, as raw data is not required to be transferred to the server. Moreover, federated learning can employ the computing resources and diverse datasets on clients' devices to enhance the quality of the global model [3], [4], [5]. With these benefits, federated learning can be applied in various areas such as healthcare, transportation, IoT, and mobile applications (such as next-word prediction) [6], [7].

However, federated learning faces various challenges because of its decentralized approach, such as the quality of the training data, the distributed architecture, the type of devices used to train the model, and the communication and aggregation mechanisms used, all of which affect the learning process in FL [8]. The clients' data in FL is non-Independent and Identically Distributed (non-IID), as each client's data depends on its device usage and location. This means that the IID data assumption used in machine learning algorithms cannot be applied in FL; therefore, FL encounters the additional challenge of data heterogeneity [4], [9]. Furthermore, during FL training, the clients and the central server exchange the local and global models in multiple rounds, and this communication process can become a bottleneck due to the network's limited resources [10], [11]. The devices in FL also vary in their computational power, storage, and network connectivity, and this heterogeneity could unbalance the training time and affect global model training [12], [13], [14].

Despite federated learning's potential, most research has focused on overcoming challenges like communication efficiency, data heterogeneity, and privacy preservation. An overlooked aspect of FL is the influence of the chosen deep learning model on the overall performance and efficiency of FL. Understanding the impact of different neural network architectures within FL is crucial for optimizing model performance and communication resource efficiency.

In contrast to the traditional centralized learning approach, FL performs the training in a distributed manner on the clients' devices and shares the local models with the central server through the network over several rounds. However, sharing these local models can be costly when using a deep neural network with many parameters. For that reason, this study investigates the impact of using different neural networks on model performance and communication cost, focusing on image classification tasks. Specifically, we evaluate and analyze the performance of a Convolutional Neural Network (CNN) model along with two more complex architectures, namely the Visual Geometry Group (VGG) and Residual Neural Network (ResNet) models, which are widely used by researchers when evaluating their proposed work on image classification tasks [15], [16], [17], [18], [19]. These three models are mainly used with the Cifar-10 and Cifar-100 datasets [20]. In this study, we aim to address the following research question: Do we need a deeper network in a federated learning environment? We do so by studying the performance of VGG-11 and ResNet-18 and comparing them with a lighter CNN model, the same model used in the study that proposed the FL approach [2]. We aim to gain a deeper understanding of the benefits and drawbacks of using these models in a FL environment.

The main contributions of this study are as follows:
1. We conduct an empirical study that investigates the impact of utilizing three different neural networks (CNN, ResNet-18, and VGG) for an image classification task in a FL environment using two widely recognized datasets (Cifar-10 and Cifar-100).
2. We perform a comparative experiment and analyze the performance of these three models under different data distribution settings (IID and non-IID), providing valuable insights into their behavior in different FL settings.
3. We study and analyze these models' performance and associated communication costs with different batch sizes and epoch values.
4. We provide insights into the trade-offs between model accuracy and communication efficiency.
5. Our findings shed light on the suitability of each neural network for FL, enabling researchers and practitioners to make informed decisions when selecting a learning model for their work.
6. To the best of our knowledge, this study is the first to explore how the choice of different neural network models impacts the performance of federated learning.

The remainder of this work is organized as follows: Section II presents the literature review and highlights the datasets and local models used in image classification tasks in federated learning. Section III shows the system model. In Section IV, we present the neural network designs and the experiment settings. Section V shows the experimental results, while Section VI presents the discussion. The conclusion and future work are provided in Section VII.

II. Literature Review

Numerous studies have attempted to tackle the FL challenges using various techniques. For instance, Zhong et al. [21] use a hierarchical clustering algorithm to overcome the non-IID data challenge by clustering the clients based on their model similarity and merging similar clusters, while Wu et al. [22] address the communication cost and non-IID data challenges by using a threshold value to determine whether a local model is important enough to be uploaded to the server or should be skipped. Other studies [23], [24], [25] focus on non-IID data challenges by aiming to reduce client drift using different techniques, such as decoupling and correcting the local drift, rescaling the gradients, or a primal-dual variable that can adapt to data heterogeneity. The studies in [15], [16], [17] address communication challenges by reducing the number of bits exchanged using different compression techniques, such as quantization and Count Sketch. The work in [18] improves the communication efficiency of FL by parallelizing communication with computation, overlapping the communication phase with the training phase. An asynchronous technique is used to overcome the communication bottleneck in FL [26], and another work uses a partially synchronous technique to accelerate the training process in FL over a two-tier network, using relay nodes to partially aggregate the model and reduce the required communication rounds [19]. The work of Li et al. tackles the diversity in computational capacity between devices and avoids waiting for slow devices by approximating the optimal gradients of a complete local training model, using a Hessian estimation method based on the heterogeneous local updates that have been received [27]. Another work uses a tier approach to overcome the latency caused by slow clients by grouping clients into the same tier based on their response time, thereby addressing system heterogeneity [28]. Zeng et al. [29] propose an energy-efficient bandwidth allocation and client selection scheme that aims to reduce energy consumption while maximizing the number of participating clients, by adapting to both channel states and device computation capability when selecting clients and allocating channels to them.


Jebreel et al. [30] propose a mechanism against the label-flipping attack in the federated learning environment, in which malicious clients flip their labels to poison the global model. Their work clusters clients based on their gradient parameters and analyzes the clusters to filter out any potential threat. A novel backdoor attack on FL is introduced by Zhang et al. [31]; their approach enables the attacker to optimize the backdoor trigger using adversarial training to enhance its persistence within the global training dynamics. In their work, they also study the performance of existing defense techniques against their attack and show their limitations.

These studies addressing FL challenges commonly use image classification tasks as their application when evaluating their proposed methods with different learning models. According to [20], [32], the most widely used datasets to test model performance in FL are image datasets, and image classification is the most commonly employed application in FL. Furthermore, the study in [4] indicates that image datasets are the most used in FL.

The CNN learning model is used in the study that proposed FL [2] to evaluate its performance on the MNIST [33] dataset. Many studies [21], [23], [26] use the same learning model when evaluating their proposed work, on different datasets such as Cifar-10, Cifar-100, and MNIST. Others have opted for a deeper neural network, such as VGG and ResNet, mainly when using Cifar-10 and Cifar-100 as their training datasets [15], [16], [17].

Table I shows various learning models and the datasets used to evaluate performance in the FL environment. The table highlights that deeper networks such as VGG and ResNet are used with datasets such as Cifar-10 and Cifar-100, which contain colored images, unlike the MNIST datasets, which consist of gray-scale images and are widely used with simpler learning models. However, some research also uses Cifar-10 and Cifar-100 with a simpler model, such as a CNN. The table indicates that CNN, ResNet, and VGG are commonly used with the Cifar-10 and Cifar-100 datasets. Therefore, this study investigates the performance of the three aforementioned learning models on the Cifar-10 and Cifar-100 datasets in a federated learning environment.

TABLE I. Learning Models and Datasets Used in Different Studies in the Federated Learning Environment

Model                 Dataset         Ref.
CNN                   Cifar-10        [21], [23], [25], [18], [26], [28]
                      Cifar-100       [23]
                      MNIST           [21], [23], [25], [28], [29], [30]
                      Fashion MNIST   [25], [26], [28]
ResNet                Cifar-10        [22], [15], [16], [17], [18], [19], [27], [30], [31]
                      Cifar-100       [15], [17], [18], [26], [27]
                      FEMNIST         [17], [31]
                      Tiny-ImageNet   [23], [31]
VGG                   Cifar-10        [22], [15], [18], [24]
                      Cifar-100       [22], [24]
MLP                   MNIST           [16], [18]
                      Fashion MNIST   [27]
Logistic regression   MNIST           [19]
                      Cifar-10        [19]

Despite the extensive research addressing various challenges in federated learning, the impact of different learning models on federated learning performance has not been thoroughly investigated. Table I shows that some studies utilized more than one learning model; however, these models were evaluated on different datasets. As shown in Table II, the studies did not compare the impact of different deep learning models on the performance of federated learning.

TABLE II. Summary of the Literature Studies

Ref.   Focus                            Methodology                                                              No. of models (same dataset)   Model comparison   Hyper-parameter tuning (epoch-batch)
[21]   Non-IID                          Hierarchical clustering algorithm                                         1                              No                 No
[22]   Communication cost and non-IID   Selective model update                                                    2                              No                 No
[23]   Non-IID                          Decoupling and correcting local drift                                     1                              No                 No
[24]   Non-IID                          Rescaling the gradient                                                    1                              No                 No
[25]   Non-IID                          Primal-dual variable to adapt to data heterogeneity                       1                              No                 No
[15]   Communication cost               Compression                                                               2                              No                 No
[16]   Communication cost               Compression                                                               1                              No                 No
[17]   Communication cost               Compression                                                               1                              No                 No
[18]   Communication efficiency         Parallelizing communication with computation                              -                              No                 No
[26]   Communication bottleneck         Asynchronous technique                                                    1                              No                 No
[19]   Accelerating training process    Partial synchronization using relay nodes for partial aggregation        -                              No                 No
[27]   Computational capacity           Approximating the optimal gradients of a complete local training model   1                              No                 No
[28]   Latency                          Tier approach                                                             1                              No                 No
[29]   Energy                           Client selection based on device computation capability and channel states   1                          No                 No
[30]   Security                         Clustering clients based on gradient parameters to filter potential threats   1                         No                 No
[31]   Security                         Optimizing the attack trigger through an adversarial adaptation loss     1                              No                 No
Ours   Impact of learning models in FL  Evaluation of various deep learning models                                3                              Yes                Yes


Even when multiple models are utilized, they are often used with different datasets, which makes direct comparisons difficult. Additionally, these studies did not thoroughly investigate the effects of hyper-parameter tuning, such as varying epochs and batch sizes, on model performance and communication efficiency. Selecting the learning model is essential in federated learning, as it impacts both model performance and communication cost. Therefore, an evaluation that compares multiple neural networks on the same datasets while considering hyper-parameter variations is important and is the focus of this study. Our study fills these gaps by evaluating the performance of various deep learning models within a federated learning framework, providing a unique contribution to the existing body of knowledge.

III. System Model

In this section, we provide a detailed explanation of the system model used in our study. This includes the principles and mechanisms of FL, the aggregation algorithm, and the network architecture that we used.

A. Federated Learning

FL is a distributed collaborative learning process that was proposed by Google researchers in 2016 [2]. It differs from distributed (on-site) learning in that, in the latter, the central server provides the clients with an initial or pre-trained model, which the clients use to train their personalized models on their own data; in that type of learning there is no further sharing of data or information [8], [34], [35].

In FL, there is a fixed set of clients C, where each client c has its own dataset dc, and at each round a fraction R of the clients C is selected to participate in training the model [2]. Fig. 1 illustrates the FL architecture, where the central server sends the initial model to the participating clients. These clients then use this model to train a local model on their datasets. Afterward, the clients send the trained models to the server, which aggregates all the received local models using an aggregation mechanism. The process is repeated for several rounds until a target is reached [2]. Typically, FL aims to minimize the objective function shown in (1):

min_ω f(ω) = Σ_{c=1}^{C} p_c F_c(ω)    (1)

where C is the total number of clients, p_c ≥ 0 and Σ_{c=1}^{C} p_c = 1; the term p_c defines the impact of each client on the global model, with two natural settings being p_c = 1/C or p_c = d_c/d, where d represents the total number of data samples over all clients, d_c represents the number of data samples of client c, and F_c is the local objective function of client c [2], [7].

Fig. 1. The federated learning architecture: each client trains a local model on its local dataset and uploads it to the central server; the server aggregates the received local models into a global model and sends the updated global model back to the clients for the next round.

In FL, the central server plays a vital role in providing an initial model, receiving the updated local models from the participating clients, aggregating these received local models, and subsequently disseminating a new global model to the participating clients. The most commonly used aggregation scheme in this type of learning is Federated Averaging (FedAvg), which averages the local stochastic gradient descent (SGD) updates. This method is usually implemented in a few general steps, as follows [2], [7], [8]:
1. The server sets up the initial global model.
2. The server selects the participating clients (R, C) and sends the global model to them.
3. The clients that receive the global model train it using their local datasets; the most used technique is SGD to compute the update.
4. The clients train the model for some number of epochs and upload the trained local model to the server.
5. The server then aggregates all the received local models using an averaging aggregation mechanism weighted by the clients' dataset sizes to create a new global model.
6. Steps 2 to 5 are repeated for several rounds until a predefined target is reached.

Algorithm 1 (FedAvg). The C clients are indexed by c; η: learning rate; B: batch size; E: local epochs.

Server executes:
  initialize the global model ω_0
  for each round t = 1, 2, 3, … do
    m ← max(R·C, 1)
    C_t ← (select a random set of m clients)
    for every client c ∈ C_t in parallel do
      ω_{t+1}^c ← ClientUpdate(c, ω_t)
    m_t ← Σ_{c∈C_t} d_c
    ω_{t+1} ← Σ_{c∈C_t} (d_c / m_t) ω_{t+1}^c

ClientUpdate(c, ω):  // run on client c
  β ← split the client dataset into batches of size B
  for each local epoch i from 1 to E do
    for each batch b ∈ β do
      ω ← ω − η ∇ℓ(ω; b)
  return ω to the server

Algorithm 1 illustrates the FedAvg procedure: the (Server executes) part shows the steps performed by the server, while the (ClientUpdate) part illustrates the process performed at each client. The clients train a local model using a deep/machine learning approach and send the locally trained model to the server.
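To make the round structure concrete, the following is a minimal Python sketch of one FedAvg round. It is illustrative only: the list-of-arrays weight representation and the clients[c].local_update(...) interface are assumptions made for the example, not part of the implementation used in this study.

```python
import numpy as np

def fedavg_round(global_w, clients, fraction=0.1, epochs=5, lr=0.01, rng=None):
    """One FedAvg round in the style of Algorithm 1 (sketch).

    global_w: list of NumPy arrays, one per layer.
    clients:  objects assumed to expose local_update(w, epochs, lr),
              returning (updated_weights, num_local_samples).
    """
    rng = rng or np.random.default_rng()
    m = max(int(fraction * len(clients)), 1)                # m <- max(R*C, 1)
    selected = rng.choice(len(clients), size=m, replace=False)
    updates, sizes = [], []
    for c in selected:                                       # ClientUpdate on each selected client
        w_c, d_c = clients[c].local_update(global_w, epochs, lr)
        updates.append(w_c)
        sizes.append(d_c)
    m_t = float(sum(sizes))                                  # total local samples this round
    return [sum((d / m_t) * u[i] for d, u in zip(sizes, updates))
            for i in range(len(global_w))]                   # size-weighted average per layer
```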
B. Network Architecture

The network design used in this study for FL is a centralized architecture. In this architecture, there is a central server S and a set of clients C, where each client c has its own local dataset dc. The central server is responsible for initializing the global model and selecting the participating clients (R·C) from the client set C. In this work, the central server selects the participating clients randomly.


In this network, each participating client ci receives the global model from the central server and trains it locally using its own dataset dc for a predefined number of local epochs. After the completion of training, the local model is shared with the central server for global aggregation. The central server applies the FedAvg aggregation scheme to aggregate the local models received from all participating clients. Then, the process of selecting clients and providing the global model is repeated, and it continues for several communication rounds. The FL architecture used in this work is shown in Fig. 1.

IV. Methods

This section presents the local learning models implemented by the federated learning clients and the experimental settings. It covers the local model architectures and the specific experimental parameters employed.

A. Local Model Design

In this subsection, we discuss the three deep learning models, namely CNN, ResNet, and VGG, that are utilized in the federated learning environment for image classification tasks, and we provide a general overview of their concepts and architectures.

1. Convolutional Neural Network

CNN is a deep learning approach that can be utilized for tasks such as speech recognition and computer vision [36], [37]. A CNN typically comprises three primary types of layers: convolutional, pooling, and fully connected layers. The convolutional layer is used for extracting features from the data. The pooling layer, on the other hand, reduces the size of the output from the convolutional layer and combines similar features to avoid redundancy. Finally, the fully connected layer establishes connections between the previous layer's output and the subsequent layer's input [38], [39], [40]. There are different well-known CNN architectures, such as LeNet, AlexNet, GoogleNet, ResNet, and VGG, which differ in terms of the layers used, the number of layers, the activation function used, and other factors [41], [38], [37]. The CNN model architecture used in this study is adopted from [2] and includes two convolutional layers, each followed by a max pooling layer, a fully connected layer with a ReLU activation function, and a final SoftMax output layer.

2. Visual Geometry Group

The VGG model is a deep convolutional model developed by the Visual Geometry Group [42]. To enhance the learning process, VGG uses small convolution filters (3x3), which allows increasing the depth of the network [42], [43]. A stack of three (3x3) convolutional layers has an effective receptive field equivalent to that of a single (7x7) convolutional layer; however, VGG uses three (3x3) convolutional layers to reduce the number of parameters, and the stack contains more ReLU layers (one after each convolution layer), which makes the decision function more discriminative [42], [44]. VGG has different variations depending on the number of layers used (VGG-11, VGG-13, VGG-16, and so on). The VGG-11 model consists of two stacks of a single convolution layer followed by a max pooling layer, followed by three stacks of two convolution layers and a max pooling layer, followed by three fully connected layers, resulting in eight convolution layers and three fully connected layers [42], [45].

3. Residual Neural Network

ResNet is a deep neural network that was proposed by He et al. in 2015 for image recognition [46]. In ResNet, the input of the layer is added to the output of the residual mapping, which can contain two or more layers. Fig. 2 shows that a shortcut connection is established between the input and the output of the residual mapping, together with an addition operation. This shortcut connection helps the network learn more effectively, thereby improving its performance. ResNet is commonly used for image classification and object detection [47]. ResNet has different variations depending on the number of layers used (ResNet-18, ResNet-34, and so on). ResNet-18 comprises 17 convolution layers and a fully connected layer. Each convolution layer can be followed by a batch normalization layer and an activation function. The first convolution layer is followed by a max pooling layer. The network also includes eight stacks of two convolutional layers, then an average pooling layer, and finally a fully connected layer with a SoftMax activation function. The residual mapping is applied between the output of the even-numbered stack of convolution layers and the output of the next stack. The residual function is shown in (2):

y = F(x) + x    (2)

where y represents the output vector of the layer, F(x) represents the residual mapping to be learned, and x represents the input vector.

Fig. 2. Residual learning: a building block. The input X passes through two weight layers with a ReLU in between, producing F(X); the identity shortcut adds X to F(X), and the sum F(X) + X is passed through a final ReLU.

B. Experimental Setup

In this study, we evaluate the performance of the three learning models (CNN [2], ResNet-18 [46], and VGG-11 [42]) for image classification tasks using two datasets, Cifar-10 and Cifar-100 [48]. For ResNet-18, we replaced the batch normalization layers with group normalization, as suggested and evaluated in [49]. The sizes of the three models transmitted by each client are presented in Table III.

The Cifar-10 dataset consists of 60,000 images categorized into 10 different classes, with 6,000 images per class; 50,000 images are used for training and 10,000 for testing. Similarly, the Cifar-100 dataset contains 60,000 images classified into 100 distinct classes, with 600 images per class; 50,000 are used for training and 10,000 for testing. In this study, we distributed the dataset among 100 clients in such a way that each client received 500 samples for training, similar to [2], [22], [25]. We tried different settings and report the best results obtained. We conducted the experiments over 300 rounds, with a learning rate of 0.01 and SGD as the optimizer. At each round, we randomly selected 10 clients to participate, as many studies use these settings [2], [25]. We performed the test on the clients' side every five rounds. In the following, we show the experimental results of the three models, highlighting the model training accuracy and the communication cost to reach a predefined target. We designed two cases to study the models' performance based on the data distribution, following a similar setting as in [2]: the first is the IID data distribution and the second is the non-IID data distribution. In the IID data case, each client has data from all classes, holding 50/5 samples per class from the Cifar-10/Cifar-100 datasets. In the non-IID data setting, each client has data from only a few classes (2/20 classes), holding 250/25 samples from each of the selected classes from the Cifar-10/Cifar-100 datasets.
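For concreteness, the following PyTorch sketch shows a local CNN model of the kind described in Section IV.A.1 (two convolutional layers, each followed by max pooling, a fully connected ReLU layer, and a softmax output). The channel and hidden-layer widths are illustrative assumptions; the paper specifies only the layer types.

```python
import torch.nn as nn

class LocalCNN(nn.Module):
    """Two conv + max-pool blocks, one hidden fully connected ReLU layer,
    and a final classification layer (softmax is applied inside the loss).
    Layer widths are assumptions for illustration only."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),   # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```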

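The batch-to-group normalization substitution for ResNet-18 mentioned above can be carried out, for example, by walking a standard torchvision ResNet-18 and swapping every BatchNorm2d layer for a GroupNorm layer. This is a sketch of one possible implementation; the group count of 32 is our assumption and is not taken from the paper.

```python
import torch.nn as nn
from torchvision.models import resnet18

def replace_bn_with_gn(module, num_groups=32):
    # Recursively replace BatchNorm2d layers with GroupNorm layers.
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            replace_bn_with_gn(child, num_groups)

model = resnet18(num_classes=10)   # num_classes=100 for Cifar-100
replace_bn_with_gn(model)
```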

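The IID and non-IID partitions described above can be reproduced with a shard-based split in the spirit of [2]. The sketch below assumes the training labels are available as a NumPy array and returns, for each of the 100 clients, the indices of its 500 training samples; it illustrates the setting rather than reproducing the exact code used in the experiments.

```python
import numpy as np

def iid_partition(labels, num_clients=100, seed=0):
    # Each client receives an equal random share of the training set
    # (500 images per client for CIFAR with 50,000 training samples).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    return np.array_split(idx, num_clients)

def noniid_partition(labels, num_clients=100, shards_per_client=2, seed=0):
    # Sort indices by label, cut them into num_clients * shards_per_client
    # contiguous shards, and give each client shards_per_client shards, so
    # every client sees only a few classes (2 for Cifar-10, 20 for Cifar-100,
    # matching the 250/25 samples-per-class setting described above).
    rng = np.random.default_rng(seed)
    order = np.argsort(labels, kind="stable")
    shards = np.array_split(order, num_clients * shards_per_client)
    assignment = rng.permutation(num_clients * shards_per_client)
    return [np.concatenate([shards[s] for s in
                            assignment[i * shards_per_client:(i + 1) * shards_per_client]])
            for i in range(num_clients)]
```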
TABLE III. Model Size of the Three Models Transmitted by Each Client

Dataset     CNN       ResNet-18   VGG-11
Cifar-10    8.22 MB   41.61 MB    104.87 MB
Cifar-100   8.40 MB   41.87 MB    107.25 MB

V. Results

In this section, we present the experimental results of the three models with different numbers of epochs and batch sizes, using two different datasets, as we increase the computation per client. We compare the testing accuracy of the three models, the communication bits exchanged, and the number of rounds needed to reach a predefined target accuracy.

A. Performance Comparison on Testing Accuracy

In this subsection, we show the performance of the three deep learning models in two cases, when the clients have: (1) IID data and (2) non-IID data. Each case is evaluated using two different datasets, Cifar-10 and Cifar-100, and we varied the batch sizes and epoch values for each dataset.

1. IID Data Setting

To evaluate the performance of the three models, we tested them by varying the number of epochs and batch sizes. Fig. 3 illustrates the performance of the three models on the Cifar-10 and Cifar-100 datasets under the IID settings as the batch size and local epoch value change. For Cifar-10, Fig. 3 (a) demonstrates that increasing the number of local epochs and decreasing the batch size improves the CNN model's performance. Similar results were obtained for the ResNet-18 and VGG-11 models under the same settings, as shown in Fig. 3 (b) and (c). The ResNet model converges faster as the local epoch value increases, as shown in Fig. 3 (b); both batch sizes perform comparably well when trained with the same epoch value, indicating that the epoch count plays a crucial role in enhancing model performance. The VGG model in Fig. 3 (c) shows a slow start, especially with the 1-epoch configurations; however, it obtains a higher accuracy as the number of global rounds increases with the 5-epoch configurations. For the CNN and VGG-11 models, the best performance is achieved with 5 epochs and a batch size of 16, while ResNet-18 performs best with a batch size of 16 for all epoch values. We compared the performance of the three models at epoch = 5 and batch sizes 16 and 32, as shown in Fig. 3 (d). The VGG-11 model started slowly, but its performance improved with increasing rounds compared to the ResNet-18 and CNN models, which provide comparable performance, as shown in Fig. 3 (d).

For the Cifar-100 dataset, the performance of the three models under the IID settings is also shown in Fig. 3. The models were evaluated with different numbers of epochs and batch sizes. Fig. 3 (e) shows the performance of the CNN model; the results indicate that the CNN performs better with a batch size of 16 in this setting. ResNet-18 and VGG-11 have similar performance, and they perform better with a batch size of 16 for the different epoch values, as demonstrated in Fig. 3 (f) and (g). The ResNet model converges faster as the local epoch value increases, as shown in Fig. 3 (f), with the smaller batch size of 16 outperforming the larger batch size of 32. The VGG model also converges faster as the local epoch value increases, as shown in Fig. 3 (g), obtaining higher accuracy with the 5-epoch and 10-epoch configurations than with the 1-epoch configurations for all batch sizes. We compared the performance of the three models at epoch = 5 and batch sizes 16 and 32, as illustrated in Fig. 3 (h). The results indicate that VGG-11 performs worse than the ResNet-18 and CNN models; among the three models, CNN performs the best under the specified settings, as illustrated in Fig. 3 (h).

2. Non-IID Data Setting

For the non-IID data settings, Fig. 4 illustrates the performance of the three models when tested on the Cifar-10 and Cifar-100 datasets. For Cifar-10, the performance of all models improves as the number of local epochs increases and the batch size decreases, as shown in Fig. 4 (a), (b), and (c). For all three models, we observe that configurations with more epochs per round lead to higher accuracy, regardless of batch size. The best performance was achieved when the number of local epochs was equal to 5 for all models. However, the VGG-11 model performed the worst in the non-IID settings compared to the CNN and ResNet-18 models, as illustrated in Fig. 4 (d). For Cifar-100, the performance of all models likewise improves as the number of local epochs increases and the batch size decreases, as shown in Fig. 4 (e), (f), and (g). For all three models, configurations with more epochs per round lead to higher accuracy regardless of batch size, with a slight edge for the smaller batch size of 16. The best performance was achieved when the number of local epochs was set to 5 for the CNN and VGG-11 models. However, in the non-IID settings, CNN and ResNet with a batch size of 16 perform better than the VGG-11 model, which needs more rounds to converge, as illustrated in Fig. 4 (h).

B. Performance Comparison on Communication Cost

Communication cost is an important metric for evaluating FL, as the training process in FL is a distributed process between clients and requires clients to share their local models with a central server over multiple rounds through the network. For that reason, the number of rounds needed to reach a target accuracy and the number of bits sent by the clients are both essential metrics in FL. In this subsection, we evaluate the three learning models in terms of the number of communication rounds needed to achieve a predefined target accuracy (RoA@XX) and the training bits exchanged by the clients through the network. Table IV and Table V show the results for the Cifar-10 and Cifar-100 datasets under the IID data settings, respectively. The "-" symbol means the target accuracy could not be obtained within the given number of communication rounds; however, the amount of data uploaded by the clients during training is still reported.

Fig. 5 illustrates the communication costs associated with the different learning models for the Cifar-10 and Cifar-100 datasets. The figure shows that the CNN model consistently has the lowest communication cost across the various configuration settings. In contrast, the VGG-11 model demonstrates the highest communication cost under all configurations. Notably, ResNet-18 falls between CNN and VGG-11 in terms of communication cost. Although ResNet-18 requires fewer communication rounds to achieve the target accuracy, as shown in Table IV and Table V, these rounds are more costly than those of the CNN model. This indicates that the CNN model can achieve the target accuracy with significantly lower communication overhead, making it a more efficient choice for federated learning scenarios.

To sum up, the three models perform better when the data is IID. On the Cifar-10 and Cifar-100 datasets, ResNet-18 shows faster convergence to the target accuracy than CNN and VGG-11 in most cases. However, since FL is a distributed learning process that shares a locally trained model instead of raw data to preserve privacy and provide efficient communication, it is essential to consider the difference in the model weight of these three models. Although ResNet-18 requires fewer rounds, the number of bits it transmits is greater than that of the CNN model, as shown in Fig. 5.
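The rounds-to-accuracy metric (RoA@XX) used above and in Tables IV and V can be read off a logged accuracy curve with a small helper such as the one below; the five-round evaluation interval matches the setup in Section IV.B, while the rest is an illustrative assumption.

```python
def rounds_to_accuracy(test_acc, target, eval_every=5):
    """Return the first communication round at which the logged test accuracy
    reaches `target`, or None if it never does within the run.

    test_acc holds the accuracies recorded every `eval_every` rounds
    (the experiments evaluate on the clients every five rounds)."""
    for i, acc in enumerate(test_acc):
        if acc >= target:
            return (i + 1) * eval_every
    return None   # reported as "-" in Tables IV and V
```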


Fig. 3. The test accuracy of the three models for the IID setting on the Cifar-10 and Cifar-100 datasets (accuracy vs. communication round, 0-300 rounds). Panels: (a) CNN, (b) ResNet-18, (c) VGG-11, and (d) the three models with E=5 on Cifar-10; (e) CNN, (f) ResNet-18, (g) VGG-11, and (h) the three models with E=5 on Cifar-100. Each panel shows the E = 1, 5, 10 and B = 16, 32 configurations.


Fig. 4. The test accuracy of the three models for the non-IID setting on the Cifar-10 and Cifar-100 datasets (accuracy vs. communication round, 0-300 rounds). Panels: (a) CNN, (b) ResNet-18, (c) VGG-11, and (d) the three models with E=5 on Cifar-10; (e) CNN, (f) ResNet-18, (g) VGG-11, and (h) the three models with E=5 on Cifar-100. Each panel shows the E = 1, 5, 10 and B = 16, 32 configurations.


TABLE IV. Communication Rounds and Training Data Exchanged to Reach the Target Accuracy (RoA@60) for the IID Data Settings in Cifar-10

Model       Epoch   Batch 16: RoA@60 / Cost (GB)   Batch 32: RoA@60 / Cost (GB)
CNN         1       190 / 15.63                    285 / 23.44
CNN         5       35 / 2.87                      65 / 5.34
CNN         10      40 / 3.29                      80 / 6.58
ResNet-18   1       105 / 43.70                    140 / 58.27
ResNet-18   5       45 / 18.73                     60 / 24.97
ResNet-18   10      40 / 16.65                     55 / 22.89
VGG-11      1       - / 314.62                     - / 314.62
VGG-11      5       75 / 78.65                     140 / 146.82
VGG-11      10      155 / 162.55                   105 / 110.11

TABLE V. Communication Rounds and Training Data Exchanged to Reach the Target Accuracy (RoA@30) for the IID Data Settings in Cifar-100

Model       Epoch   Batch 16: RoA@30 / Cost (GB)   Batch 32: RoA@30 / Cost (GB)
CNN         1       240 / 20.16                    - / 25.2
CNN         5       95 / 7.98                      120 / 10.08
CNN         10      105 / 8.82                     - / 25.2
ResNet-18   1       205 / 85.84                    210 / 87.93
ResNet-18   5       185 / 77.46                    - / 125.6
ResNet-18   10      220 / 92.12                    - / 125.6
VGG-11      1       - / 318.7                      - / 318.7
VGG-11      5       - / 318.7                      - / 318.7
VGG-11      10      - / 318.7                      - / 318.7
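The communication-cost entries in Tables IV and V appear consistent with a simple accounting of (per-client model size from Table III) x (10 selected clients per round) x (number of rounds). The following sketch reproduces two Cifar-10 entries under that assumption; the formula is our reading of the tables rather than something stated explicitly in the text.

```python
# Per-upload model sizes taken from Table III (Cifar-10 column), in MB.
MODEL_SIZE_MB = {"CNN": 8.22, "ResNet-18": 41.61, "VGG-11": 104.87}

def upload_cost_gb(model, rounds, clients_per_round=10):
    # Total data uploaded by clients over `rounds` communication rounds,
    # assuming every selected client sends one full model per round.
    return MODEL_SIZE_MB[model] * clients_per_round * rounds / 1000.0

print(round(upload_cost_gb("CNN", 35), 2))        # ~2.88 GB vs. 2.87 GB reported (E=5, B=16)
print(round(upload_cost_gb("ResNet-18", 45), 2))  # ~18.72 GB vs. 18.73 GB reported (E=5, B=16)
```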

Fig. 5. The communication cost (GB) for the different models using the Cifar-10/Cifar-100 datasets to reach the target accuracy (60% for Cifar-10, 30% for Cifar-100), for each epoch/batch configuration (E = 1, 5, 10; B = 16, 32): (a) Cifar-10, (b) Cifar-100.

VI. Discussion

In this study, we have assessed the performance of three different models, CNN, ResNet-18, and VGG-11, on image classification tasks under two different data settings. Our findings suggest that VGG-11 has a slower start and necessitates more communication rounds to obtain the same accuracy as CNN and ResNet-18. Although VGG-11 eventually reaches a performance level similar to CNN and ResNet-18, it requires more communication rounds, and these rounds are costly because the VGG-11 model size is larger than that of CNN and ResNet-18. Moreover, VGG-11 did not perform well in the non-IID data setting. On the other hand, ResNet-18 performs well and converges quickly, requiring fewer communication rounds than CNN in some cases. However, it is noteworthy that ResNet-18 communication rounds cost more than CNN communication rounds, as the ResNet-18 model requires sending more bits when uploading the model.

The CNN model used in this study is a lighter model compared with ResNet-18 and VGG-11; nevertheless, it provides performance comparable to the other two models in terms of the accuracy obtained within the predefined number of rounds. In some cases, more communication rounds may be required to obtain the same accuracy as ResNet-18; however, these rounds are less costly than ResNet-18 rounds, since CNN requires fewer bits to exchange the model compared to ResNet-18 and VGG-11. When training a model, obtaining high accuracy is necessary and is considered an essential evaluation criterion. However, as FL is a decentralized approach that requires sharing the clients' models with the central server, the communication cost is also a vital evaluation metric. Therefore, we must consider the number of rounds and the bits sent by the clients to reach these results. When we use communication cost as the evaluation metric for the three models, CNN provides better results than ResNet-18 and VGG-11: the CNN model exchanges fewer bits than ResNet-18 and VGG-11 while providing comparable accuracy.

Based on our analysis of the performance of the three models and their associated communication costs, we recommend using a lighter model (such as the CNN in [2]) in FL, since federated learning is a learning process that requires sharing locally trained models with a central server over several rounds through the network. Moreover, it is essential to choose the local epoch value carefully, since the devices in FL are limited in resources; setting the local epoch to a high value does not necessarily enhance the model performance in these settings. Therefore, we recommend using an adaptive local epoch that can start with a higher value and decrease after a certain point to avoid overfitting and enhance the global model performance.

Hence, to answer the research question, Do we need a deeper network in a federated learning environment? Our answer is that, in FL, it is necessary to consider different factors before choosing the local model, since the clients' devices are limited in resources and data, and the learning process is performed in rounds.
local model since the clients’ devices are limited in resources and data, by MCIN/AEI/10.13039/501100011033/ and European Union
and the learning process is performed in rounds. We may not need NextGenerationEU/PRTR for XAI-Disinfodemics (PLEC 2021-007681)
a deeper neural network model since we need to consider not only grant, by European Comission under IBERIFIER Plus - Iberian Digital
model accuracy but also communication cost, and it may cost more to Media Observatory (DIGITAL-2023-DEPLOY- 04-EDMO-HUBS
share a deeper model during training. 101158511); and by EMIF managed by the Calouste Gulbenkian
Foundation, in the project MuseAI.
VII. Conclusion and Future Work
References
Federated learning is a collaborative learning approach where
the clients and server exchange training models for several rounds [1] Z. Yang, M. Chen, K.-K. Wong, H. V. Poor and S. Cui, “Federated learning
through the network to reach a predefined target. The choice of the for 6G: Applications, challenges, and opportunities,” Engineering, vol. 8,
learning model affects the model performance and the communication pp. 33-41, 2022.
[2] B. McMahan, E. Moore, D. Ramage, S. Hampson and B. A. y Arcas,
costs. Researchers have been faced with the decision of which
“Communication-efficient learning of deep networks from decentralized
learning model to choose when using FL for image classification data,” in Proceedings of the 20th International Conference on Artificial
tasks; several studies chose a deeper neural network model, such Intelligence and Statistics, Fort Lauderdale, FL, USA, 2017.
as VGG and ResNet, to evaluate their proposed work, while others [3] D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li and H.
chose a light model, such as CNN. In this study, we aimed to answer V. Poor, “Federated learning for internet of things: A comprehensive
the question, “Do we need a deeper network in a federated learning survey,” IEEE Communications Surveys & Tutorials, vol. 23, pp. 1622-1658,
environment?” Since FL is a decentralized approach, the model 2021.
weight must be considered along with the model performance when [4] X. Ma, J. Zhu, Z. Lin, S. Chen and Y. Qin, “A state-of-the-art survey on
choosing a neural network model since the model will be shared solving non-IID data in Federated Learning,” Future Generation Computer
during training through the network. To answer this question, we Systems, vol. 135, pp. 244-258, 2022.
[5] C. Carrascosa, F. Enguix, M. Rebollo and J. Rincon, “Consensus-based
conducted an empirical study investigating the impact of using three
learning for MAS: definition, implementation and integration in IVEs,”
different neural networks in a FL environment (CNN, VGG-11, and International Journal of Interactive Multimedia and Artificial Intelligence,
ResNet-18). Our study evaluates the three models under different data vol. 8, pp. 21-32, 2023.
settings (IID and non-IID) using two datasets (Cifar-10 and Cifar-100). [6] M. Aledhari, R. Razzak, R. M. Parizi and F. Saeed, “Federated learning:
We showed the performance of these models with varying numbers A survey on enabling technologies, protocols, and applications,” IEEE
of local epochs and batch sizes. The results indicate that using CNN Access, vol. 8, pp. 140699-140725, 2020.
provides comparable results compared to the other models, with less [7] T. Li, A. K. Sahu, A. Talwalkar and V. Smith, “Federated learning:
communication cost; however, in some cases, it may require more Challenges, methods, and future directions,” IEEE signal processing
rounds to reach the predefined target, but the communication cost magazine, vol. 37, pp. 50-60, 2020.
(GB) is less than the other two models, making it a more practical [8] S. AbdulRahman, H. Tout, H. Ould-Slimane, A. Mourad, C. Talhi and M.
Guizani, “A survey on federated learning: The journey from centralized
choice for FL applications where communication efficiency is critical.
to distributed on-site learning and beyond,” IEEE Internet of Things
We observed significant performance degradation for all models Journal, vol. 8, pp. 5476-5497, 2020.
under non-IID settings compared to IID settings, highlighting the [9] C. Briggs, Z. Fan and P. Andras, “A review of privacy-preserving federated
importance of addressing data heterogeneity in FL. Furthermore, our learning for the Internet-of-Things,” Federated Learning Systems: Towards
analysis revealed that using a 5-epoch configuration with a batch size Next-Generation AI, pp. 21-50, 2021.
16 resulted in the best performance across all three models compared [10] A. Reisizadeh, A. Mokhtari, H. Hassani, A. Jadbabaie and R. Pedarsani,
to other configurations. Our study provided valuable insights into the “Fedpaq: A communication-efficient federated learning method with
trade-offs between model accuracy and communication efficiency, periodic averaging and quantization,” in Proceedings of the Twenty Third
suggesting that CNN offers a balanced approach by maintaining high International Conference on Artificial Intelligence and Statistics, Online,
2020.
performance while minimizing communication costs. Our findings
[11] A. Khan, M. ten Thij and A. Wilbik, “Communication-Efficient Vertical
indicate that training a model using a CNN model requires fewer
Federated Learning,” Algorithms, vol. 15, p. 273, 2022.
network resources to train the FL model to reach accuracy similar to [12] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li and Y. Gao, “A survey on federated
that obtained using deeper models. learning,” Knowledge-Based Systems, vol. 216, p. 106775, 2021.
For the future research direction, we aim there is a need to analyze [13] B. Yu, W. Mao, Y. Lv, C. Zhang and Y. Xie, “A survey on federated
the performance of FL on client device energy consumption and learning in data mining,” Wiley Interdisciplinary Reviews: Data Mining
computational resources using different models and investigate if and Knowledge Discovery, vol. 12, p. e1443, 2022.
they are applicable on the client devices that are limited in resources. [14] L. Li, Y. Fan, M. Tse and K.-Y. Lin, “A review of applications in federated
learning,” Computers & Industrial Engineering, vol. 149, p. 106854, 2020.
Also, investigate the effect of applying different compression method
[15] Z. Lian, J. Cao, Y. Zuo, W. Liu and Z. Zhu, “AGQFL: Communication-
with deep neural networks. Furthermore, investigating the effects of
efficient Federated Learning via Automatic Gradient Quantization in
introducing an adaptive local epoch size. By initially setting a higher Edge Heterogeneous Systems,” in 2021 IEEE 39th International Conference
epoch size that ultimately decreases after a certain point, to enhance on Computer Design (ICCD), Storrs, CT, USA, 2021.
the model’s accuracy while simultaneously reducing costs. [16] J. Xu, W. Du, Y. Jin, W. He and R. Cheng, “Ternary Compression for
Communication-Efficient Federated Learning,” IEEE Transactions on
Neural Networks and Learning Systems, vol. 33, p. 1162–1176, 2022.
VIII. Funding Declaration
[17] D. Rothchild, A. Panda, E. Ullah, N. Ivkin, I. Stoica, V. Braverman, J.
Gonzalez and R. Arora, “Fetchsgd: Communication-efficient federated
1: The authors would like to acknowledge the support received from
learning with sketching,” in Proceedings of the 37th International
the Saudi Data and AI Authority (SDAIA) and King Fahd University Conference on Machine Learning, Virtual, 2020.
of Petroleum and Minerals (KFUPM) under SDAIA-KFUPM Joint [18] Y. Zhou, Q. Ye and J. Lv, “Communication-efficient federated learning
Research Center for Artificial Intelligence Grant no. JRC-AI-RFP-12 with compensated overlap-fedavg,” IEEE Transactions on Parallel and
2: This work has been partially supported by the project PCI2022- Distributed Systems, vol. 33, pp. 192-205, 2021.
134990-2 (MARTINI) of the CHISTERA IV Cofund 2021 program; [19] Z. Qu, S. Guo, H. Wang, B. Ye, Y. Wang, A. Y. Zomaya and B. Tang,

- 10 -
Article in Press

“Partial Synchronization to Accelerate Federated Learning Over Relay- [41] N. K. Chauhan and K. Singh, “A review on conventional machine learning
Assisted Edge Networks,” IEEE Transactions on Mobile Computing, vol. vs deep learning,” in 2018 International conference on computing, power
21, pp. 4502-4516, 2021. and communication technologies (GUCON), Greater Noida, India, 2018.
[20] B. Alotaibi, F. A. Khan and S. Mahmood, “Communication Efficiency [42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
and Non-Independent and Identically Distributed Data Challenge in large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
Federated Learning: A Systematic Mapping Study,” Applied Sciences, vol. [43] A. S. Rao, T. Nguyen, M. Palaniswami and T. Ngo, “Vision-based
14, p. 2720, 2024. automated crack detection using convolutional neural networks for
[21] J. Zhong, Y. Wu, W. Ma, S. Deng and H. Zhou, “Optimizing Multi- condition assessment of infrastructure,” Structural Health Monitoring, vol.
Objective Federated Learning on Non-IID Data with Improved NSGA-III 20, pp. 2124-2142, 2021.
and Hierarchical Clustering,” Symmetry, vol. 14, p. 1070, 2022. [44] W. Wang, Y. Yang, X. Wang, W. Wang and J. Li, “Development of
[22] X. Wu, X. Yao and C.-L. Wang, “FedSCR: Structure-based communication convolutional neural network and its application in image classification:
reduction for federated learning,” IEEE Transactions on Parallel and a survey,” Optical Engineering, vol. 58, pp. 040901-040901, 2019.
Distributed Systems, vol. 32, pp. 1565-1577, 2020. [45] M. Pak and S. Kim, “A review of deep learning in image recognition,” in
[23] L. Gao, H. Fu, L. Li, Y. Chen, M. Xu and C.-Z. Xu, “Feddc: Federated 2017 4th international conference on computer applications and information
learning with non-iid data via local drift decoupling and correction,” in processing technology (CAIPT), Kuta Bali, Indonesia, 2017.
Proceedings of the IEEE/CVF conference on computer vision and pattern [46] K. He, X. Zhang, S. Ren and J. Sun, “Deep residual learning for image
recognition, New Orleans, LA, USA, 2022. recognition,” in 2016 IEEE Conference on Computer Vision and Pattern
[24] Z. Lian, W. Liu, J. Cao, Z. Zhu and X. Zhou, “FedNorm: An Efficient Recognition (CVPR), Las Vegas, NV, USA, 2016.
Federated Learning Framework with Dual Heterogeneity Coexistence on [47] M. Shafiq and Z. Gu, “Deep Residual Learning for Image Recognition: A
Edge Intelligence Systems,” in 2022 IEEE 40th International Conference on Survey,” Applied Sciences, vol. 12, p. 8972, 2022.
Computer Design (ICCD), Olympic Valley, CA, USA , 2022. [48] A. Krizhevsky, G. Hinton and others, “Learning multiple layers of features
[25] Y. Gong, Y. Li and N. M. Freris, “FedADMM: A robust federated deep from tiny images,” University of Tront, Toronto, ON, Canada, 2009.
learning framework with adaptivity to system heterogeneity,” in 2022 [49] K. Hsieh, A. Phanishayee, O. Mutlu and P. Gibbons, “The Non-IID Data
IEEE 38th International Conference on Data Engineering (ICDE), Kuala Quagmire of Decentralized Machine Learning,” in Proceedings of the 37th
Lumpur, Malaysia, 2022. International Conference on Machine Learning, Virtual, 2020.
Basmah Alotaibi

Basmah Alotaibi received the B.Sc. degree in Computer Science from Imam Muhammad ibn Saud Islamic University (IMSIU) and the M.Sc. degree from the Department of Computer Science, King Saud University, Riyadh, Saudi Arabia. She is currently a Ph.D. student in Information and Computer Science at King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia. Her research interests include federated learning, IoT, fog computing, cloud computing, and network security.

Fakhri Alam Khan

Fakhri Alam Khan is currently serving as an Associate Professor with the Department of Information and Computer Science at King Fahd University of Petroleum and Minerals. He is also a Research Fellow with the Saudi Data and AI Authority (SDAIA) under the SDAIA-KFUPM Joint Research Center for Artificial Intelligence. He received his Ph.D. in computer science from the University of Vienna, Austria, in 2010 and completed a post-doctorate at the Vienna University of Technology in 2017. He has published several research articles in reputed, peer-reviewed international journals and has supervised numerous M.S. and Ph.D. students. His research interests include the IoT, data analytics, data provenance, distributed systems, machine learning, multimedia technologies, and nature-inspired metaheuristic algorithms.

Yousef Qawqzeh

Yousef Qawqzeh received the Ph.D. degree in systems engineering from UKM University, Bangi, Kuala Lumpur, Malaysia, in 2011. He is currently working as an associate professor in the College of Information Technology, Fujairah University, where he is involved in several projects in the fields of machine learning, data science, and bioinformatics. He has several publications in international journals and conferences. His research interests include the early prediction of cardiovascular diseases using the photoplethysmography technique, the development of computer-aided diagnosis systems for the early diagnosis of breast cancer using artificial intelligence and machine learning techniques, and the detection and prediction of high-risk diabetics using machine learning and artificial intelligence techniques.


Gwanggil Jeon
Gwanggil Jeon received the B.S., M.S., and Ph.D. (summa
cum laude) degrees from the Department of Electronics and
Computer Engineering, Hanyang University, Seoul, Korea,
in 2003, 2005, and 2008, respectively. From September 2009 to August 2011, he was with the School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada, as a Post-Doctoral Fellow. From September 2011 to February 2012, he was with the Graduate School of Science and Technology, Niigata University, Niigata, Japan, as an Assistant Professor. From December 2014 to February 2015 and from June to July 2015, he was a Visiting Scholar at the Centre de Mathématiques
et Leurs Applications (CMLA), École Normale Supérieure Paris-Saclay (ENS-
Cachan), France. From 2019 to 2020, he was a Prestigious Visiting Professor
at Dipartimento di Informatica, Università degli Studi di Milano Statale, Italy.
From 2019 to 2020 and 2023 to 2024, he was a Visiting Professor at Faculdade
de Ciência da Computação, Universidade Federal de Uberlândia, Brazil. He is
currently a professor at Incheon National University, Incheon. He was a general
chair of IEEE SITIS 2023 and has served as a workshop chair at numerous
conferences. Dr. Jeon is an Associate Editor of IEEE Transactions on Circuits
and Systems for Video Technology (TCSVT), Elsevier Sustainable Cities and
Society, IEEE Access, Springer Real-Time Image Processing, Journal of System
Architecture, and Wiley Expert Systems. Dr. Jeon received the IEEE Chester Sall Award in 2007, the ETRI Journal Paper Award in 2008, and the Industry-Academic Merit Award from the Ministry of SMEs and Startups of Korea in 2020, and was named an ACM Distinguished Speaker in 2022.

David Camacho
David Camacho is full professor at Computer Systems
Engineering Department of Universidad Politécnica de
Madrid (UPM), and the head of the Applied Intelligence
and Data Analysis research group (AIDA: https://aida.etsisi.uam.es) at UPM. He received his Ph.D. in Computer Science from Universidad Carlos III de Madrid in 2001, with honors (best thesis award in Computer Science). He has published more than 300 journal articles, books, and conference papers. His
research interests include Machine Learning (Clustering/Deep Learning),
Computational Intelligence (Evolutionary Computation, Swarm Intelligence),
Social Network Analysis, Fake News and Disinformation Analysis. He has
participated in or led more than 50 research projects (Spanish and European: H2020,
DG Justice, ISFP, and Erasmus+), related to the design and application of
artificial intelligence methods for data mining and optimization for problems
emerging in industrial scenarios, aeronautics, aerospace engineering,
cybercrime/cyber intelligence, social networks applications, or video games
among others. He has served as Editor in Chief of Wiley’s Expert Systems since 2023 and sits on the Editorial Board of several journals, including Information
Fusion, IEEE Transactions on Emerging Topics in Computational Intelligence
(IEEE TETCI), Human-centric Computing and Information Sciences (HCIS),
and Cognitive Computation among others.
