
FEDERATED LEARNING

Nguyễn Ngọc Thảo


[email protected]
Federated learning: A definition
• Federated learning (FL) enables training privacy-preserving
models in heterogeneous, distributed networks.

An example of FL for the task of next-word prediction on cell phones

Image credit: CMU ML Blog


Federated learning: A definition
• It follows a decentralized approach, ensuring data privacy
and security while enabling collaborative model training.

Devices communicate with a central server periodically to learn a global model.
FL helps preserve user privacy and reduces strain on the network by keeping data localized.
Federated learning: A definition
A diagram of the FL cycle: train local models on local data samples, exchange parameters between these local nodes, and generate a global model shared by all nodes.
Federated learning workflow
• First, the initial model is distributed to the edge devices and
trained based on data generated by the user.

Federated learning workflow
• Only the updated model is sent to the server side.
• Raw data generated by user behavior does not need to leave the device.

Federated learning workflow
• The client updates are aggregated on the server side into a new global model, which is then distributed to all client devices (a minimal sketch of one round follows).
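
The sketch below assumes a linear model with squared-error loss and plain weighted averaging on the server; the helper names `local_train` and `aggregate` are hypothetical and not tied to any particular framework.

```python
import numpy as np

def local_train(global_params, local_data, lr=0.01, epochs=1):
    """Hypothetical client-side routine: refine a copy of the global model on
    the device's own data and return only the updated parameters."""
    params = global_params.copy()
    for _ in range(epochs):
        for x, y in local_data:
            grad = 2 * (params @ x - y) * x  # gradient of a squared-error loss
            params -= lr * grad
    return params

def aggregate(client_params, weights):
    """Server-side step: combine the client models into a new global model by
    weighted averaging (the weights sum to 1)."""
    return sum(w * p for w, p in zip(weights, client_params))

# One communication round: distribute -> train locally -> send updates -> aggregate.
rng = np.random.default_rng(0)
global_params = np.zeros(3)
clients = [[(rng.normal(size=3), rng.normal()) for _ in range(20)] for _ in range(4)]
weights = [len(c) / sum(len(c) for c in clients) for c in clients]
local_models = [local_train(global_params, data) for data in clients]
global_params = aggregate(local_models, weights)  # redistributed at the next round
print(global_params)
```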

Centralized vs. Decentralized techniques

A figure compares three settings: centralized, decentralized, and distributed. In the centralized setting, all the local datasets are collected to one server; in the distributed setting, local data samples are identically distributed.
Types of federated learning
• Centralized federated learning
• A central server coordinates all the participating nodes during
the learning process → possibly a bottleneck of the system.

• Decentralized federated learning


• The nodes coordinate themselves to obtain the global model.
• The specific network topology may affect the performance.

• Heterogeneous federated learning (HeteroFL)
• Local models are trained heterogeneously with dynamically varying computation complexities.

Diao, Enmao, Jie Ding, and Vahid Tarokh. "HeteroFL: Computation and communication efficient federated learning for heterogeneous clients." ICLR 2021. (Link)

Global model parameters $W_g$ are distributed to $m = 6$ local clients with $p = 3$ computation complexity levels.
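
As a rough illustration of the figure, a lower-capability client can receive only a slice of each global weight matrix, selected by a per-level width ratio. This is only a sketch of the sub-model selection step; the ratios, layer shapes, and helper names below are assumptions rather than the paper's exact procedure.

```python
import numpy as np

# Hypothetical width ratios for p = 3 computation complexity levels.
RATIOS = {"small": 0.25, "medium": 0.5, "large": 1.0}

def select_submodel(W_global, level):
    """Return a top-left slice of a global weight matrix for a client at the
    given complexity level (simplified HeteroFL-style width scaling)."""
    r = RATIOS[level]
    out_dim, in_dim = W_global.shape
    return W_global[: max(1, int(r * out_dim)), : max(1, int(r * in_dim))].copy()

W_g = np.random.randn(8, 8)                                        # one global layer
levels = ["small", "small", "medium", "medium", "large", "large"]  # m = 6 clients
local_weights = [select_submodel(W_g, lvl) for lvl in levels]
print([w.shape for w in local_weights])  # [(2, 2), (2, 2), (4, 4), (4, 4), (8, 8), (8, 8)]
```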
Federated vs. Distributed learning
• The key difference lies in the assumptions about local dataset
properties.
• Distributed learning aims at parallelizing computing power, whereas federated learning aims at training on heterogeneous datasets.
• Distributed learning: local datasets are identically distributed (i.i.d.)
and roughly have the same size
• Federated learning: the datasets are typically heterogeneous, and their
sizes may span several orders of magnitude
• Clients involved in FL may be subject to more failures or drop out
• Nodes in distributed learning are typically datacenters with powerful computational capabilities and fast networks.

Canonical problem formulation
• FL was originally introduced as a new setting for distributed
optimization with a few distinctive properties.

Massive number of distributed nodes

Slow and expensive communication

Unbalanced and non-IID data scattered across the nodes

• FL aims to approximate centralized training and converge to the same optimum as quickly as possible.
Canonical problem formulation
• Objective: Learn a single global statistical model from data
stored on tens to potentially millions of remote devices.
• In particular, minimize the following objective function:
$$w^* = \arg\min_{w \in \mathbb{R}^d} F(w) := \sum_{k=1}^{K} p_k F_k(w)$$

• $K$: the total number of devices.
• $F_k$: the local objective function of the $k$-th device, defined as the empirical risk over its local data.
• $p_k$: the relative impact of each device, with $p_k \ge 0$ and $\sum_{k=1}^{K} p_k = 1$.
• $p_k$ is user-defined, usually $p_k = \frac{1}{K}$ or $p_k = \frac{n_k}{n}$, where $n_k$ is the number of samples on device $k$ and $n$ is the total number of samples over all devices.
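
A small numerical sketch of this objective, assuming mean-squared-error local risks and sample-size weighting $p_k = n_k / n$; the data and loss are placeholders.

```python
import numpy as np

def local_risk(w, X, y):
    """Empirical risk F_k(w) on one device's data (mean squared error here)."""
    return np.mean((X @ w - y) ** 2)

def global_objective(w, device_data):
    """F(w) = sum_k p_k * F_k(w) with p_k = n_k / n (sample-size weighting)."""
    n = sum(len(y) for _, y in device_data)
    return sum((len(y) / n) * local_risk(w, X, y) for X, y in device_data)

rng = np.random.default_rng(0)
device_data = [(rng.normal(size=(n_k, 3)), rng.normal(size=n_k)) for n_k in (10, 50, 200)]
print(global_objective(np.zeros(3), device_data))
```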
The FedAvg (or Local SGD) method
• The clients optimize their local objective functions for multiple steps to obtain $\theta_i^t$.
• They then send the pseudo-gradients $\Delta_i^t = \theta^t - \theta_i^t$ to the server.
• $\theta^t$: the initial state of the round; $\theta_i^t$: the local update at client $i$ at timestep $t$.

• The server averages these values to update the model state with the learning rate $\alpha_t$:

$$\theta^{t+1} = \theta^t - \alpha_t \sum_{i=1}^{N} p_i \Delta_i^t$$
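
The server step can be written directly from the formula above. The sketch below assumes the clients have already produced their local models $\theta_i^t$; the arrays and weights are made-up examples.

```python
import numpy as np

def fedavg_server_update(theta_t, local_models, p, lr):
    """One FedAvg server step: form pseudo-gradients Delta_i = theta_t - theta_i
    and apply theta_{t+1} = theta_t - lr * sum_i p_i * Delta_i."""
    deltas = [theta_t - theta_i for theta_i in local_models]
    return theta_t - lr * sum(p_i * d for p_i, d in zip(p, deltas))

theta_t = np.zeros(4)                                               # initial state of the round
local_models = [np.full(4, 1.0), np.full(4, 2.0), np.full(4, 3.0)]  # theta_i^t from 3 clients
p = [1 / 3] * 3
print(fedavg_server_update(theta_t, local_models, p, lr=1.0))
# With lr = 1 this is exactly the weighted average of the local models: [2. 2. 2. 2.]
```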

FedAvg: The “client drift” problem
• Clients make additional SGD steps locally → FedAvg converges much faster, both in the number of rounds and in wall-clock time.

• However, FedAvg converges to an inferior optimum in the non-IID setting (i.e., when clients have different data distributions).
• The resulting pseudo-gradients are biased compared to centralized training.

• Solution: use local regularization, carefully set learning rate schedules, or use control variate methods.
• Most of these must intentionally limit the optimization progress clients can make at each round.

17
FedAvg: The “client drift” problem

A toy 2D setting with two clients and quadratic objectives that illustrates the convergence
issues of FedAvg. Left: convergence trajectories in the parameter space. Right: convergence
in terms of distance from the global optimum. Each curve corresponds to a run of federated optimization from a different starting point in the parameter space.
More local SGD steps per round speed up training, but the progress eventually stagnates at
an inferior point further away from the global optimum.

Image credit: CMU ML Blog
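
The toy setting in the figure can be reproduced in a few lines: two clients with quadratic objectives whose minima disagree, so the global optimum is the precision-weighted point between them. With many local SGD steps per round, FedAvg stalls at a fixed point closer to the plain average of the local minima; the specific matrices and step sizes below are made up for illustration.

```python
import numpy as np

# Two clients with quadratic objectives F_i(w) = 0.5 * (w - mu_i)^T A_i (w - mu_i).
A = [np.diag([20.0, 1.0]), np.diag([1.0, 20.0])]
mu = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]

# Global optimum of 0.5 * (F_1 + F_2): the precision-weighted combination of the mu_i.
w_star = np.linalg.solve(A[0] + A[1], A[0] @ mu[0] + A[1] @ mu[1])

def local_steps(w, A_i, mu_i, lr=0.01, steps=100):
    """Many local gradient steps on one client's quadratic (grad = A_i (w - mu_i))."""
    for _ in range(steps):
        w = w - lr * A_i @ (w - mu_i)
    return w

w = np.array([10.0, -5.0])
for _ in range(200):  # FedAvg rounds with equal client weights
    w = np.mean([local_steps(w, A[i], mu[i]) for i in range(2)], axis=0)

print("global optimum    :", w_star)  # approximately [0.19, 3.81]
print("FedAvg fixed point:", w)       # stagnates away from w_star, nearer the average of the mu_i
```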


Federated Posterior Averaging (2021)
• FedPA employs MCMC for local posterior approximation on the clients and sends statistics to the server to refine the global posterior mode estimate.

Al-Shedivat, Maruan, Jennifer Gillenwater, Eric Xing, and Afshin Rostamizadeh. "Federated
learning via posterior averaging: A new perspective and practical algorithms." ICLR, 2021.
Image credit: CMU ML Blog
Federated Posterior Averaging (2021)
• FedPA uses stochastic gradient Markov chain Monte Carlo (SG-MCMC) for approximate sampling from local posteriors on the clients.
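
A heavily simplified sketch of the idea: each client treats late iterates of a noisy SGD run as rough posterior samples, summarizes them by a mean and covariance, and the server combines these as if the local posteriors were Gaussian, so the global mode is the precision-weighted average of the local means. The sampler and all names below are illustrative assumptions, not the exact FedPA algorithm.

```python
import numpy as np

def client_posterior_stats(theta0, X, y, lr=0.01, steps=500, burn_in=250, noise=0.05):
    """Collect late iterates of noisy SGD on a least-squares loss as rough posterior
    samples, then return their sample mean and covariance."""
    rng = np.random.default_rng(0)
    theta, samples = theta0.copy(), []
    for t in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad + noise * rng.normal(size=theta.shape)
        if t >= burn_in:
            samples.append(theta.copy())
    S = np.array(samples)
    return S.mean(axis=0), np.cov(S.T) + 1e-6 * np.eye(len(theta0))

def server_posterior_mode(means, covs):
    """Combine local Gaussians: mode = (sum_i Sigma_i^-1)^-1 sum_i Sigma_i^-1 mu_i."""
    precisions = [np.linalg.inv(c) for c in covs]
    return np.linalg.inv(sum(precisions)) @ sum(P @ m for P, m in zip(precisions, means))

rng = np.random.default_rng(1)
clients = [(rng.normal(size=(40, 2)), rng.normal(size=40)) for _ in range(3)]
stats = [client_posterior_stats(np.zeros(2), X, y) for X, y in clients]
print(server_posterior_mode([m for m, _ in stats], [c for _, c in stats]))
```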

FedPA vs. FedAvg in the toy 2D setting with two clients and quadratic objectives.

Image credit: CMU ML Blog


Federated learning: Pros and Cons

• Pros: hyper-personalized models, low cloud infrastructure overheads, minimum latencies, privacy preserving.
• Cons: expensive communication, system heterogeneity, statistical heterogeneity, privacy concerns.
Federated learning platforms

• TensorFlow Federated
• IBM Federated Learning
Federated learning: Applications

Another application of federated learning for personal healthcare via learning over
heterogeneous electronic medical records distributed across multiple hospitals.
Image credit: CMU ML Blog
List of references

• Federated Learning: Challenges, Methods, and Future Directions (link)


• An Inferential Perspective on Federated Learning (link)
• Federated learning: a beginner guide (link)
