FFSML - Thesis Presentation

The document proposes a methodology for utilizing federated learning and meta-learning for few-shot learning on edge devices. It involves: 1. Training a global prototypical network model on a large base dataset at a centralized server. 2. Distributing the global model to edge devices for fine-tuning on local support sets and predicting on local query sets. 3. Aggregating the locally fine-tuned models back at the server to update the global model, improving few-shot learning performance across devices.

Uploaded by

kousalyavoleti16

Utilizing Federated Learning and Meta-Learning

for Few-Shot Learning on Edge Devices

Presented by,
Lahari Voleti
Contents
• Introduction
• Related Work
• Our Proposed Methodology
• Datasets
• Experimental Designs
• Empirical Results
• Conclusions & Future Work
• References
Introduction
• In recent years, mobile devices, being a convenient way of connecting users to the internet, have become a platform for deploying ML models.

• Emphasis is placed on making these devices smarter and more human-like by using ML techniques.

• One important on-device machine learning task is prediction using a small amount of data on resource-limited devices (few-shot learning).

• The solution to this challenging problem of predicting with a model trained on limited data (few-shot learning) can be enhanced by using a "more experienced" model (meta-learning) and by sharing knowledge (e.g., models) between devices and a centralized server via federated learning.

[Figures: Few-shot Learning; Federated Learning]
Wang, H. (2021). GitHub - wangshusen/DeepLearning. GitHub. https://github.com/wangshusen/DeepLearning
Federated Learning

• Federated learning is a machine learning technique that trains a model across a number of edge devices using their local data, allowing them to collaboratively train and share knowledge to improve prediction accuracy.

• It enhances model predictability, data security, and personalization on edge devices.

[Figure: Example of a federated learning scenario]
Chandorikar, K. (2021). Introduction to Federated Learning and Privacy Preservation. https://towardsdatascience.com/introduction-to-federated-learning-and-privacy-preservation-75644686b559
Federated Learning Types

• Centralized federated learning: In this setting, a central server orchestrates the different steps of the algorithm and coordinates all the participating nodes during the learning process.

• Decentralized federated learning: In this setting, the nodes coordinate among themselves to obtain the global model.

Li, Q. (2019, July 23). A Survey on Federated Learning Systems: Vision, Hype and Reality. https://arxiv.org/abs/1907.09693
Few-shot classification using Meta Learning

• Few-shot learning means building predictive models that can make predictions using only a limited amount of data for each object class.
• Meta-learning is learning-to-learn: a technique that makes a machine learning model more experienced on a distribution of tasks and uses this experience to improve future learning performance.
• The goal is to make machine learning models more human-like, reduce data collection, and improve predictability.

[Figure: Meta-training with N-way K-shot Q-query tasks (example: 3-way 2-shot 1-query)]
Yasd, J. (2018). GitHub - johnnyasd12/awesome-few-shot-meta-learning: awesome few shot / meta learning papers. GitHub.
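
As an illustration of the N-way K-shot Q-query task format, the following is a minimal sketch of how one episode could be sampled from a labeled dataset. The function name `sample_episode` and the toy label list are hypothetical and are not taken from the thesis code.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, q_query, rng):
    """Return (support, query) lists of (index, class) pairs for one episode."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with at least K + Q examples can fill both sets.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= k_shot + q_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + q_query)
        support += [(i, c) for i in picked[:k_shot]]   # K labeled examples per class
        query += [(i, c) for i in picked[k_shot:]]     # Q examples per class to predict
    return support, query

# Toy label list (hypothetical): 5 classes with 6 examples each.
labels = [c for c in range(5) for _ in range(6)]
support, query = sample_episode(labels, n_way=3, k_shot=2, q_query=1, rng=random.Random(0))
print(len(support), len(query))  # 6 support items and 3 query items
```

Support and query indices are drawn without overlap, so the query items are never seen when the model adapts to the episode.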
Problem Statement and Solution Overview

Problem: How can we perform few-shot learning using meta-learning in a centralized federated learning setting?
Related Work
Federated Learning Algorithms
Federated Averaging (FedAvg) [7]

• During aggregation, the edge devices' local model parameters are simply averaged to produce the aggregated global model parameters.

Advantages: Easy to implement.

Disadvantages: Cannot handle data heterogeneity well.

[Figure: The FedAvg algorithm]
“Breaking Privacy in Federated Learning,” KDnuggets. https://www.kdnuggets.com/2020/08/breaking-privacy-federated-learning.html
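
The averaging step can be sketched as follows. This is a minimal illustration, assuming each client's parameters are stored as a dictionary of plain float lists; the function name `fed_avg` and the optional `client_sizes` weighting (FedAvg as published weights clients by their local data size) are choices made for this sketch, not the thesis implementation.

```python
def fed_avg(client_params, client_sizes=None):
    """Average each parameter across clients; weights default to uniform."""
    n = len(client_params)
    if client_sizes is None:
        client_sizes = [1] * n          # plain (unweighted) averaging
    total = sum(client_sizes)
    averaged = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(size * params[name][i] for params, size in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged

# Two hypothetical clients with a tiny "model" each.
clients = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
global_params = fed_avg(clients)
print(global_params)  # {'w': [2.0, 3.0], 'b': [0.5]}
```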
Federated Learning Algorithms
Federated Personalization (FedPer) [9]

• The client neural network is viewed as a combination of base layers and personalization layers. The base layers are shared and aggregated at the server, while the personalization layers stay on each client.

• Advantage: Allows client specialization and easy handling of a variety of data.

Arivazhagan, M. G., Agarwal, V., Singh, A. K., & Choudhary, S. (2019). Federated Learning with Personalization Layers. arXiv:1912.00818 [cs.LG]. Retrieved from https://arxiv.org/abs/1912.00818
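
The split into base and personalization layers can be sketched as below: only the base layers are averaged across clients, and every client keeps its own personalization layers. The layer names (`base.conv`, `head.fc`) and the function `fed_per_aggregate` are hypothetical and serve only as an illustration.

```python
def fed_per_aggregate(client_params, base_names):
    """Average only the shared base layers; personalization layers stay local."""
    n = len(client_params)
    new_base = {}
    for name in base_names:
        length = len(client_params[0][name])
        new_base[name] = [sum(p[name][i] for p in client_params) / n for i in range(length)]
    # Each client receives the averaged base and keeps its own other layers.
    return [{**params, **new_base} for params in client_params]

# Two hypothetical clients: one base layer, one personalization layer each.
clients = [
    {"base.conv": [1.0, 3.0], "head.fc": [10.0]},
    {"base.conv": [3.0, 5.0], "head.fc": [20.0]},
]
updated = fed_per_aggregate(clients, base_names=["base.conv"])
print(updated[0])  # {'base.conv': [2.0, 4.0], 'head.fc': [10.0]}
print(updated[1])  # {'base.conv': [2.0, 4.0], 'head.fc': [20.0]}
```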
Federated Learning Algorithms
Federated Proximal (FedProx) [8]

• During the client local updates, in every round of training client k minimizes its local loss function F_k together with a proximal term, as follows:

$$\min_{w} \; h_k(w; w^t) = F_k(w) + \frac{\mu}{2} \lVert w - w^t \rVert^2$$

• Here $F_k(w)$ is the old loss and $\frac{\mu}{2} \lVert w - w^t \rVert^2$ is the proximal term, which varies according to the data. $w$ denotes the client's local model parameters and $w^t$ the global model parameters at time t.

• Advantage: specifically addresses the varying resource constraints of clients during federated learning and the heterogeneity of local data at the clients.
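
For illustration, the FedProx local objective of Li et al. adds (mu/2) * ||w - w_t||^2 to the client loss, which pulls the local parameters w toward the global parameters w_t. The sketch below evaluates that objective and its gradient on plain float lists; the function names and the numbers are hypothetical.

```python
def fedprox_loss(local_loss, w, w_global, mu):
    """Local loss plus the proximal term (mu / 2) * ||w - w_global||^2."""
    prox = 0.5 * mu * sum((a - b) ** 2 for a, b in zip(w, w_global))
    return local_loss + prox

def fedprox_grad(local_grad, w, w_global, mu):
    """Gradient of the objective above: local gradient plus mu * (w - w_global)."""
    return [g + mu * (a - b) for g, a, b in zip(local_grad, w, w_global)]

w_global = [0.0, 0.0]   # global parameters received from the server
w = [1.0, 2.0]          # current local parameters on the client
print(fedprox_loss(0.5, w, w_global, mu=1.0))         # 0.5 + 0.5 * (1 + 4) = 3.0
print(fedprox_grad([0.5, 0.5], w, w_global, mu=1.0))  # [1.5, 2.5]
```

With mu = 0 the objective reduces to the plain local loss, so the client update is then the same as in FedAvg.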
Few-shot learning using meta-learning

• Relation Networks:
• They use one module, the embedding module, to generate feature maps, and a second module, the relation module, to calculate a relation score between support and query images.
❖ Drawbacks: The architecture is more cumbersome, and adaptation to new few-shot tasks yields low accuracy.

Tan, F. (2022). Learning to Compare: Relation Network for Few-shot Learning. https://medium.com/mlearning-ai/learning-to-compare-relation-network-for-few-shot-learning-fa9c40c22701
Few-shot learning using meta-learning

• Model-Agnostic Meta-Learning (MAML):
• Its main goal is to learn a good initialization of the model parameters, from which the model can make accurate predictions on a new task after minimizing the loss function for only a few steps.

• Drawbacks: Unstable, and hard to train and implement efficiently. Its higher-order derivatives are expensive to compute and lead to longer runtimes.

• Harder to implement with complex architectures.

Meta-Learning. (2020). https://meta-learning.fastforwardlabs.com/


Federated learning using meta-learning
• Federated Meta-Learning (FedMeta):
• Uses the MAML meta-learning algorithm in a federated learning scenario to improve model adaptation to heterogeneous data.

• Drawbacks: This work does not address the few-shot classification task; it is also expensive to compute and has long runtimes.

F. Chen, M. Luo, Z. Dong, Z. Li, and X. He, “Federated Meta-Learning with Fast Convergence and Efficient Communication,” arXiv:1802.07876 [cs], 2019.
Prototypical Networks
• These networks work by calculating a prototype for each class from the images of that class in the support set.

• An appropriate feature extractor, such as a CNN or ResNet, is used to get feature vectors of the images.

• A prototype is simply the mean of the feature vectors of all support-set images that belong to one class.

• Once the prototypes are set up, we calculate the Euclidean distance between each prototype and the feature vector of the input image (from the query set).

• The class of the prototype with the minimum distance is predicted as the output.

• Advantages: computationally inexpensive, efficient, and easily implemented.

Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems, 30.
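
The prototype computation and nearest-prototype prediction can be sketched as follows. The two-dimensional vectors stand in for the feature vectors a CNN or ResNet would produce, and the class names are hypothetical. The sketch covers prediction only, not the episodic training of the feature extractor.

```python
import math

def prototypes(support):
    """support: list of (feature_vector, label). Returns {label: mean vector}."""
    sums, counts = {}, {}
    for vec, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(query_vec, protos):
    """Predict the class whose prototype has the smallest Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query_vec, protos[label]))

# Hypothetical 2-way 2-shot support set with 2-D "embeddings".
support = [
    ([0.0, 0.0], "shirt"), ([2.0, 0.0], "shirt"),
    ([10.0, 10.0], "trouser"), ([12.0, 10.0], "trouser"),
]
protos = prototypes(support)
print(protos)                        # {'shirt': [1.0, 0.0], 'trouser': [11.0, 10.0]}
print(classify([1.5, 0.5], protos))  # shirt
```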
Our Proposed Methodology
Solution Overview
We implement our proposed federated few-shot learning framework using Prototypical Networks, a meta-learning algorithm, on all devices in a centralized, data-distributed architecture, so that different few-shot predictive models can be executed on the edge devices and the server.
Our Methodology
Step 1: On the server side, the global model M is pre-trained episodically on a large base dataset using a Prototypical Network.

Step 2: The global model M is sent to the clients.

Step 3: Each client fine-tunes model M on its own distinct support set (S1, S2, S3) and performs prediction on its query set (Q1, Q2, Q3) using its respective fine-tuned model (M'1, M'2, M'3).

[Figure: Server Pre-Training / Initialization]


Our Methodology (contd.)
Step 4: The local copies of the global model, updated in Step 3, are sent back to the server.

Step 5: The server aggregates them; the resulting model is referred to as M’.

Step 6: Using M’, we test on the server with support set S, make predictions on query set Q, and obtain a server testing accuracy.

These steps are iterated for multiple rounds (20 in our experimental implementation).
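
The round structure of Steps 2 to 5 can be sketched with a toy example. Everything here is an assumption made for illustration: each "model" is a list of floats, client fine-tuning is a single gradient step on a squared distance to a client-specific target, and aggregation is plain averaging. The actual framework fine-tunes a Prototypical Network and may aggregate with FedAvg, FedPer, or FedProx.

```python
def run_rounds(global_model, client_targets, rounds, lr=0.25):
    """Toy federated loop: distribute, fine-tune locally, send back, aggregate."""
    history = []
    for _ in range(rounds):
        local_models = []
        for target in client_targets:
            # Steps 2-3: the client receives M and takes one gradient step
            # on the toy loss sum((w - t)^2), whose gradient is 2 * (w - t).
            local = [w - lr * 2 * (w - t) for w, t in zip(global_model, target)]
            local_models.append(local)
        # Steps 4-5: local models are sent back and averaged into M'.
        global_model = [sum(ws) / len(ws) for ws in zip(*local_models)]
        history.append(global_model)
    return global_model, history

# Two hypothetical clients whose local optima are 1.0 and 3.0.
final, history = run_rounds([0.0], [[1.0], [3.0]], rounds=20)
print(history[0])  # [1.0] after the first round
print(final)       # close to [2.0], the average of the client optima
```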
Empirical Study
The main goals of the empirical study of our framework are to:

(1) compare the performance of the following three federated learning approaches on few-shot tasks in our proposed framework:
• Federated Averaging (FedAvg)
• Federated Proximal (FedProx)
• Federated Personalization (FedPer)

(2) explore the effect of data heterogeneity (using different datasets on different edge devices) on the few-shot learning performance.
Datasets
Fashion-MNIST Dataset
• It is a dataset of Zalando’s article images, consisting of 70,000 images from 10 different classes. The images show different types of clothing, such as trousers and shirts. Every image is a 28 x 28 grayscale image. [4]

• Total: 10*7000=70000 data points
• Training set: 60,000; Testing set: 10,000
• From the training set:
  • Server base pre-training: 30,000 samples
  • Each client fine-tuning: 10,000 samples
• From the testing set:
  • Server testing: 10,000 samples

[Figure: Fashion-MNIST dataset]
Fashion MNIST. (2017, December 7). Kaggle. https://www.kaggle.com/datasets/zalando-research/fashionmnist


Omniglot Dataset
• It contains 1623 different handwritten characters from 50 different alphabets, where each character was handwritten by 20 different people. [5]

• 1623*20=32460 character images
• Training set: 24,460; Test set: 8,000
• From the training set:
  • Server base pre-training: 8,460 samples
  • Client 1 fine-tuning: 8,000 samples
  • Client 2 fine-tuning: 8,000 samples
• From the testing set:
  • Server testing: 8,000 samples

[Figure: Omniglot dataset]
Dataset from: https://github.com/brendenlake/omniglot


CIFAR-100 Dataset
• The CIFAR-100 (Canadian Institute For Advanced Research) dataset consists of 100 classes with 600 color images each, divided into 500 training and 100 testing images per class. [3]

• Total: 600*100=60000 data points
• Training set: 50,000; Testing set: 10,000
• From the training set:
  • Server base pre-training: 20,000 samples
  • Each client fine-tuning: 10,000 samples
• From the testing set:
  • Server testing: 10,000 samples

[Figure: CIFAR-100 dataset]
Image from: https://docs.activeloop.ai/datasets/cifar-100-dataset
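
As a quick check of the split sizes quoted for the three datasets, the sketch below computes how many clients each training set can supply after the server's base pre-training share is taken out. The dictionary layout and the helper `max_clients` are hypothetical; the numbers are the ones stated on the dataset slides.

```python
# Sample counts as stated for each dataset.
splits = {
    "Fashion-MNIST": {"total": 70000, "train": 60000, "test": 10000, "server": 30000, "per_client": 10000},
    "Omniglot": {"total": 32460, "train": 24460, "test": 8000, "server": 8460, "per_client": 8000},
    "CIFAR-100": {"total": 60000, "train": 50000, "test": 10000, "server": 20000, "per_client": 10000},
}

def max_clients(split):
    """Number of clients that fit into the training samples left after server pre-training."""
    return (split["train"] - split["server"]) // split["per_client"]

for name, split in splits.items():
    assert split["train"] + split["test"] == split["total"]
    print(name, max_clients(split))
# Fashion-MNIST 3
# Omniglot 2
# CIFAR-100 3
```

Under these numbers, Omniglot supports at most two clients, while Fashion-MNIST and CIFAR-100 support up to three.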


Experimental Scenarios
• To explore the feasibility and efficacy of our few-shot solution using meta-learning and federated learning on both homogeneous and heterogeneous data, we perform experiments in two scenarios:
➢ Single Dataset
➢ Multiple Datasets

• We also limit the number of shots in the few-shot learning tasks on the clients. We consider three few-shot learning task configurations, namely:
❑ 3-way:5-shot:10-query
❑ 5-way:5-shot:10-query
❑ 5-way:5-shot:5-query

• The feature extractor we use in our Prototypical Networks is ResNet18, and the loss is optimized using SGD.
• Software used: Google Colaboratory Pro+ with GPU (NVIDIA PT100), 52 GB RAM
• Performance measure: Accuracy
• Number of rounds: 20
• Number of clients: either 2 or 3
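
To make the three task configurations concrete, the following small sketch computes how many support and query images a single episode contains, reading N-way:K-shot:Q-query as K support and Q query images per class. The helper name `episode_size` is hypothetical.

```python
def episode_size(n_way, k_shot, q_query):
    """Return (support images, query images) in one N-way K-shot Q-query episode."""
    return n_way * k_shot, n_way * q_query

for cfg in [(3, 5, 10), (5, 5, 10), (5, 5, 5)]:
    support_count, query_count = episode_size(*cfg)
    print(cfg, support_count, query_count)
# (3, 5, 10) 15 30
# (5, 5, 10) 25 50
# (5, 5, 5) 25 25
```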
Experimental Designs
Experimental Design of FedPer Algorithm

Question: Which network configuration should be used for the FedPer approach?

Performance comparison of two layer configurations in FedPer using Fashion-MNIST

• Maintaining an appropriate number of base layers is an essential criterion here.
• 2-client scenario.
• For ResNet18, there are 120 layers in total. In this experiment, we compare two configurations:
(1) 78 base layers and 42 personalization layers
(2) 42 base layers and 78 personalization layers

[Figure: FedPer on Fashion-MNIST, 2 clients (20 rounds)]

Of these two configurations, we use the first (78 base and 42 personalization layers) for the rest of the experiments; it places less emphasis on local data.
Experimental Design of FedProx Algorithm
Question: What is the effect of the proximal term in FedProx?
Effect of the FedProx proximal term on prediction performance using Fashion-MNIST

• In our experiments, we use four different values of the proximal term for this method: 0.01, 0.1, 1, and 1.5.

• We observe that no particular value works best across the different few-shot learning task configurations, even though the variation between configurations is not significant.

• We choose mu=1 for the rest of the experiments.

[Figure: Fashion-MNIST (2 clients)]
Empirical Results
Part A: Single-Dataset Scenario (Homogeneous data)
Omniglot (2-client scenarios only):
• The figures show sample individual round performances of the clients (green) and the aggregated server model (red).

[Figures: Omniglot, 2 clients; panels 5-5-10 and 5-5-5]
[Figures: Fashion-MNIST (both 2-client and 3-client scenarios)]
[Figures: CIFAR-100 (both 2-client and 3-client scenarios)]
Empirical Results
Part B: Multiple-Dataset Scenario (Heterogeneous Data)
Important Investigations

1) How well do the edge devices and the server aggregated model perform on a completely new task that they have not seen before?

2) How do the edge devices and the server aggregated model perform under heterogeneous data?
Experimental Results-Multiple Dataset Scenario
All values are accuracies in %. Server pre-training: train accuracy of the server base model M (CIFAR, Omniglot). Client fine-tune accuracy: C1 on S1-Q1 (CIFAR), C2 on S2-Q2 (Omniglot), C3 on S3-Q3 (Fashion-MNIST). Server aggregated M' testing accuracy: M' on S-Q for CIFAR, Omniglot, and Fashion-MNIST.

N-way:K-shot:Q-query | Clients | Algorithm | Server base M train | C1 (CIFAR) | C2 (Omniglot) | C3 (Fashion-MNIST) | M' on CIFAR | M' on Omniglot | M' on Fashion-MNIST
3:5:10 | 2 | FedAvg  | 63.933  | 41.33  | 90.00  | -      | 41.16  | 79.33  | 66.50
3:5:10 | 3 | FedAvg  | 64.6583 | 40.33  | 90.66  | 60.33  | 44.83  | 83.00  | 74.00
3:5:10 | 2 | FedPer  | 64.191  | 46.333 | 92.00  | -      | 48.50  | 71.50  | 64.16
3:5:10 | 3 | FedPer  | 61.975  | 44.83  | 87.166 | 55.666 | 45.66  | 81.00  | 69.166
3:5:10 | 2 | FedProx | 64.2083 | 48.83  | 91.000 | -      | 45.666 | 87.833 | 79.00
3:5:10 | 3 | FedProx | 65.1416 | 44.833 | 88.50  | 64.33  | 47.83  | 86.53  | 73.66
5:5:5  | 2 | FedAvg  | 53.02   | 26.285 | 85.904 | -      | 33.40  | 76.20  | 60.60
5:5:5  | 3 | FedAvg  | 56.6    | 27.60  | 87.00  | 49.600 | 32.80  | 69.60  | 67.4
5:5:5  | 2 | FedPer  | 52.19   | 27.60  | 86.60  | -      | 27.80  | 84.200 | 52.00
5:5:5  | 3 | FedPer  | 53.05   | 33.20  | 88.00  | 49.40  | 33.80  | 71.60  | 62.800
5:5:5  | 2 | FedProx | 53.11   | 31.80  | 84.20  | -      | 30.000 | 80.60  | 47.80
5:5:5  | 3 | FedProx | 53.18   | 29.80  | 86.20  | 49.20  | 32.20  | 77.200 | 66.800

Effect of data heterogeneity

[Figures: Omniglot, 5-5-5 task; left panel: single-data scenario; right panel: multiple-data scenario]
Conclusions
• The main observations and conclusions from the empirical results on our proposed framework are as follows:

1) Varying the ratio of base to personalization layers shows that a considerable number of base layers is beneficial for the FedPer algorithm.

2) Varying the FedProx proximal term between 0.01 and 1.5 does not have a significant effect on the prediction performance of our proposed approach when FedProx is used for federated learning.

3) For few-shot classification tasks of reasonable difficulty (> 50% accuracy), the proposed approach is able to improve the edge devices’ individual prediction performance and to improve significantly on the global model (on the server) with any of the federated learning approaches when the few-shot tasks are from the same dataset.

4) Unsurprisingly, the aggregated (global) models from FedPer perform best most frequently, followed by the aggregated models from FedProx.

5) The data heterogeneity problem affects the prediction performance of our proposed solution no matter which federated learning approach is used.
Future Work

❖ In our thesis, we perform experiments under the assumption that the data used by the server and the clients to generate few-shot tasks come from all classes (even though the data for the server and the clients are non-overlapping). Future work includes testing our proposed approach in an experimental setting where the server and the clients have data from different, non-overlapping classes.

❖ Federated few-shot learning with meta-learning in decentralized architectures (i.e., no server).

❖ Improve our proposed framework to handle the heterogeneous data problem using transfer learning approaches.
References
[1] Wang, Y., Yao, Q., Kwok, J. T., & Ni, L. M. (2021). Generalizing from a Few Examples. ACM Computing Surveys, 53(3), 1–34. https://doi.org/10.1145/3386252

[2] Li, Q. (2019). A Survey on Federated Learning Systems: Vision, Hype and Reality. https://arxiv.org/abs/1907.09693

[3] Benchmarks CIFAR-100. (n.d.). [Database]. CIFAR-100; benchmarks.ai. https://benchmarks.ai/cifar-100

[4] Zalando Research. (2017). GitHub - zalandoresearch/fashion-mnist: A MNIST-like fashion product database. Benchmark [Dataset]. Retrieved from https://github.com/zalandoresearch/fashion-mnist

[5] Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction (Vol. 350, Issue 6266). https://doi.org/10.1126/science.aab3050

[6] Snell, J. (2017). Prototypical Networks for Few-shot Learning. Advances in Neural Information Processing Systems 30 (NIPS 2017). Retrieved from https://papers.nips.cc/paper/2017/hash/cb8da6767461f2812ae4290eac7cbc42-Abstract.html

[7] McMahan, B. H., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. (2016). Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv:1602.05629 [cs.LG]. Retrieved from https://arxiv.org/abs/1602.05629

[8] Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., & Smith, V. (2020). Federated Optimization in Heterogeneous Networks. arXiv:1812.06127 [cs.LG]. Retrieved from https://arxiv.org/pdf/1812.06127

[9] Arivazhagan, M. G., Agarwal, V., Singh, A. K., & Choudhary, S. (2019). Federated Learning with Personalization Layers. arXiv:1912.00818 [cs.LG]. Retrieved from https://arxiv.org/abs/1912.00818
Thank you
QUESTION?
Appendix: Additional Results
Federated learning algorithms on Few-shot data of Fashion-MNIST

[Figures: panels labeled 3-5-10 and 5-5-5; 3 clients and 2 clients]
• The figures show sample individual round performances of the clients (green) and the aggregated server model (red).

[Figures: 5-5-10 task, 3 clients; 5-5-5 task, 3 clients]
Observations on Fashion-MNIST:

1. Server performance improves from 60% to around 80%.
2. The server aggregated model with the FedPer algorithm performs slightly better than the other two algorithms.
3. No algorithm shows consistent improvement (or convergence) in prediction performance for the few-shot learning task as more rounds are iterated.
4. Time taken:
- With 2 clients, each experimental trial takes about 300 seconds (FedPer for 3-5-10) to 420 seconds (FedPer for 5-5-10).
- With 3 clients, an experimental trial takes 600 seconds (FedPer for 3-5-10) to 696 seconds (FedProx for 5-5-10).
[Figures: FedAvg/FedPer/FedProx on Omniglot, 2 clients; panels 3-5-10, 5-5-10, and 5-5-5]


Observations on Single-Dataset Scenario

Omniglot:

1. Server performance improves from 50% to around 90%.
2. None of the three federated learning methods outperforms the others, as their aggregated global models do not consistently help improve the clients’ predictive performance.
3. There is no consistent improvement (or convergence) in prediction performance for the few-shot learning task as more rounds are iterated.
4. Time taken:
- With 2 clients, the experimental trials take about 251 seconds (FedPer for 3-5-10) to 507 seconds (FedPer for 5-5-10).
Federated learning algorithms on Few-shot data of CIFAR-100

[Figures: panels labeled 3-5-10 and 5-5-5; 3 clients and 2 clients]
• The figures show sample individual round performances of the clients (green) and the aggregated server model (red).

[Figures: 5-5-10 task, 3 clients; 5-5-5 task, 3 clients]
Observations on Single-Dataset Scenario

CIFAR-100:

1. For this dataset there is only a very slight improvement (29% to 39%) in aggregated server testing performance for the 5-5-10 and 5-5-5 configurations.
2. None of the three federated learning methods outperforms the others, as their aggregated global models do not consistently help improve the clients’ predictive performance.
3. There is no consistent improvement (or convergence) in prediction performance for the few-shot learning task as more rounds are iterated.
4. Time taken:
- With 2 clients, each experimental trial takes about 275 seconds (FedPer for 3-5-10) to 452 seconds (FedPer for 5-5-10).
- With 3 clients, an experimental trial takes 380 seconds (FedPer for 3-5-10) to 609 seconds (FedProx for 5-5-5).
Experimental Results-Multiple Dataset Scenario
All values are accuracies in %. Server base M train accuracy is for CIFAR and Omniglot. Client fine-tune accuracy: C1 on S1-Q1 (CIFAR), C2 on S2-Q2 (Omniglot), C3 on S3-Q3 (Fashion-MNIST). Server testing: M' on S-Q for Fashion-MNIST, CIFAR, and Omniglot.

N-way:K-shot:Q-query | Clients | Algorithm | Server base M train | C1 (CIFAR) | C2 (Omniglot) | C3 (Fashion-MNIST) | M' on Fashion-MNIST | M' on CIFAR | M' on Omniglot
3:5:10 | 2 | FedAvg  | 63.933  | 41.33  | 90.00  | -      | 66.50  | 41.16  | 79.33
3:5:10 | 3 | FedAvg  | 64.6583 | 40.33  | 90.66  | 60.33  | 74.00  | 44.83  | 83.00
3:5:10 | 2 | FedPer  | 64.191  | 46.333 | 92.00  | -      | 64.16  | 48.50  | 71.50
3:5:10 | 3 | FedPer  | 61.975  | 44.83  | 87.166 | 55.666 | 69.166 | 45.66  | 81.00
3:5:10 | 2 | FedProx | 64.2083 | 48.83  | 91.000 | -      | 79.00  | 45.666 | 87.833
3:5:10 | 3 | FedProx | 65.1416 | 44.833 | 88.50  | 64.33  | 73.66  | 47.83  | 86.53
5:5:10 | 2 | FedAvg  | 60.62   | 32.30  | 91.400 | -      | 59.90  | 30.30  | 87.50
5:5:10 | 3 | FedAvg  | 58.545  | 31.00  | 87.50  | 56.300 | 62.60  | 29.60  | 74.900
5:5:10 | 2 | FedPer  | 57.365  | 32.40  | 88.80  | -      | 53.30  | 27.70  | 87.10
5:5:10 | 3 | FedPer  | 59.925  | 31.70  | 88.90  | 51.00  | 64.60  | 34.30  | 67.90
5:5:10 | 2 | FedProx | 58.16   | 27.900 | 89.60  | -      | 49.70  | 28.30  | 85.500
5:5:10 | 3 | FedProx | 56.56   | 30.00  | 86.500 | 50.10  | 60.60  | 33.10  | 74.80
5:5:5  | 2 | FedAvg  | 53.02   | 26.285 | 85.904 | -      | 60.60  | 33.40  | 76.20
5:5:5  | 3 | FedAvg  | 56.6    | 27.60  | 87.00  | 49.600 | 67.4   | 32.80  | 69.60
5:5:5  | 2 | FedPer  | 52.19   | 27.60  | 86.60  | -      | 52.00  | 27.80  | 84.200
5:5:5  | 3 | FedPer  | 53.05   | 33.20  | 88.00  | 49.40  | 62.800 | 33.80  | 71.60
5:5:5  | 2 | FedProx | 53.11   | 31.80  | 84.20  | -      | 47.80  | 30.000 | 80.60
5:5:5  | 3 | FedProx | 53.18   | 29.80  | 86.20  | 49.20  | 66.800 | 32.20  | 77.200

Comparison of single-data and multiple-data scenarios

[Figures: CIFAR-100]
Comparing federated algorithms in single-data and multiple-data scenarios (5-5-5)
We investigate how the global model reacts to few-shot learning tasks on an unseen dataset when the relevant information is provided only to the client model during fine-tuning.

[Figures: single-data scenario; multiple-data scenario; multiple-data scenario with and without Fashion-MNIST]

Observations:
- When the client has not been trained on Fashion-MNIST, server performance is not better.
- When the client has been trained on Fashion-MNIST, the server model trained using FedProx is the best performer among all three aggregation algorithms.
Comparing on Single-Dataset Scenario and Multiple-data scenarios

1. Fashion-MNIST test accuracies in single-dataset scenarios are in the range of 80% to 86%, whereas in multiple-dataset scenarios they are only 60% to 70% in all few-shot learning task configurations.

2. CIFAR-100 in multiple-dataset scenarios has accuracies in the range of 27% to 33% for the 5-5-5 and 5-5-10 few-shot learning task configurations and 40% to 48% for the 3-5-10 configuration, whereas in the single-dataset case they are just 30% to 35% in all three few-shot learning task configurations.

3. Omniglot is more accurate in single-dataset scenarios, with accuracies in the range of 88% to 90%, whereas in multiple-dataset scenarios its accuracy is only between 71% and 87%.
