
Architecture-Based Approaches in Continual Learning

Pengxiang Wang

Peking University, School of Mathematical Sciences

University of Bristol, School of Engineering Mathematics and Technology

1 / 23
Introduction

2 / 23
Problem Definition
Continual Learning (CL): learning a sequence of tasks 𝑡 = 1, ⋯ , 𝑁 in order, with
datasets 𝐷𝑡 = {𝑥𝑡 , 𝑦𝑡 }
Task-Incremental Learning (TIL): a continual learning scenario, aiming to train a model 𝑓
that performs well on all learned tasks

\max_{f} \; \sum_{t=1}^{N} \mathrm{metric}\big(f(x_t), y_t\big), \qquad \{x_t, y_t\} \in D_t

Key assumptions when training and testing task 𝑡:


▶ No access to the whole data from previous tasks 1, ⋯ , 𝑡 − 1
▶ Testing on all seen tasks 1, ⋯ , 𝑡
▶ For TIL testing, task ID 𝑡 of each test sample is known by the model. Otherwise,
it is task-agnostic testing

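A minimal sketch of this TIL protocol in Python; the helpers `make_model`, `train_one_task`, and `evaluate` are hypothetical placeholders, not from the slides:

```python
# Sketch of the TIL protocol under the assumptions above. Only the current
# task's data is used for training; evaluation covers all tasks seen so far,
# and the task ID is provided at test time.

def til_protocol(datasets, make_model, train_one_task, evaluate):
    model = make_model()
    average_accuracies = []
    for t, (train_split, _) in enumerate(datasets):
        train_one_task(model, train_split, task_id=t)          # no access to tasks < t
        accs = [evaluate(model, datasets[s][1], task_id=s)      # task ID known at test
                for s in range(t + 1)]
        average_accuracies.append(sum(accs) / len(accs))
    return average_accuracies
```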
3 / 23
Existing Approaches for TIL
Replay-based Approaches
▶ Prevent forgetting by storing parts of the data from previous tasks
▶ Replay algorithms use them to consolidate previous knowledge
▶ E.g. iCaRL, GEM, DER, DGR ...
Regularization-based Approaches
▶ Add regularization terms constructed using information about previous tasks to
the loss function when training new tasks
▶ E.g. LwF, EWC, SI, IMM, VCL, ...
Architecture-based Approaches (what we are talking about)
▶ Dedicate network parameters in different parts of the network to different tasks
▶ Keep the parameters for previous tasks from being significantly changed
▶ E.g. Progressive Networks, PackNet, DEN, Piggyback, HAT, CPG, UCL, ...

4 / 23
Existing Approaches for TIL

Optimization-based Approaches
▶ Explicitly design and manipulate the optimization step
▶ For example, project the gradient so that it does not interfere with previous tasks
▶ E.g. GEM, A-GEM, OWM, OGD, GPM, RGO, TAG, ...
Representation-based Approaches
▶ Use special architecture or training procedure to create powerful representations
▶ Inspired by self-supervised learning and large-scale pre-training such as LLMs
▶ E.g. Co2L, DualNet, prompt-based approaches (L2P, CODAPrompt, ...), CPT
(continual pre-training)...

5 / 23
Architecture-based Approaches

6 / 23
Architecture-based Approaches

▶ Leverage the separability of the neural network architecture


▶ Treat the network as decomposable resources for tasks, rather than as a whole
▶ Dedicate different parts of a neural network to different tasks to minimize
inter-task interference
▶ Focus on reducing representational overlap between tasks
The “part” of a network can be regarded in various ways:
▶ Modular Networks: work with network modules such as layers and blocks
▶ Parameter Allocation: allocate groups of parameters or neurons to each task as a
subnet
▶ Model Decomposition: decompose the network, from various aspects, into shared
and task-specific components

7 / 23
Modular Networks: Progressive Networks

Progressive Networks, 2016


▶ Expand the network with a new column module for each new task
▶ Model memory grows linearly with the number of tasks
▶ Similar to independent training: train an independent network for each task

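A minimal sketch of the idea, assuming a two-layer MLP column per task with simple linear lateral adapters; sizes and details are assumptions, not the original implementation:

```python
import torch
import torch.nn as nn

class ProgressiveNet(nn.Module):
    """Two-layer progressive network: one column per task, with linear lateral
    adapters from the hidden layers of all earlier (frozen) columns."""

    def __init__(self, in_dim=784, hidden=256, out_dim=10):
        super().__init__()
        self.in_dim, self.hidden, self.out_dim = in_dim, hidden, out_dim
        self.columns = nn.ModuleList()   # one column per task
        self.laterals = nn.ModuleList()  # per task: adapters from earlier columns

    def add_column(self):
        # Freeze everything trained so far before growing a new column.
        for p in self.parameters():
            p.requires_grad_(False)
        column = nn.ModuleDict({
            "l1": nn.Linear(self.in_dim, self.hidden),
            "l2": nn.Linear(self.hidden, self.out_dim),
        })
        adapters = nn.ModuleList(nn.Linear(self.hidden, self.out_dim, bias=False)
                                 for _ in self.columns)
        self.columns.append(column)
        self.laterals.append(adapters)

    def forward(self, x, task_id):
        # Hidden activations of the current column and of all earlier ones.
        hiddens = [torch.relu(self.columns[c]["l1"](x)) for c in range(task_id + 1)]
        out = self.columns[task_id]["l2"](hiddens[task_id])
        for c, adapter in enumerate(self.laterals[task_id]):
            out = out + adapter(hiddens[c])  # lateral input from earlier columns
        return out
```

Calling `add_column()` before each new task freezes all previously trained columns, so only the new column and its adapters receive gradients.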
8 / 23
Modular Networks: Progressive Networks

Expert Gate, 2017


▶ A new independent expert (network) for each new task
▶ Similar to independent training, but works in task-agnostic testing
▶ A gate serves as the task ID selector at test time
▶ The gate is a network learned throughout the task sequence

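A minimal sketch of such a gate, here realized with per-task autoencoders that select the expert by reconstruction error; `autoencoders` and `experts` are assumed to be lists of trained per-task networks:

```python
def gate_select(x, autoencoders):
    """Pick the task whose autoencoder reconstructs the batch x best; the
    corresponding expert is then used to make the prediction."""
    errors = [((ae(x) - x) ** 2).mean().item() for ae in autoencoders]
    return min(range(len(errors)), key=errors.__getitem__)  # predicted task ID

# Usage sketch: logits = experts[gate_select(x, autoencoders)](x)
```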
9 / 23
Modular Networks: PathNet
PathNet, 2017
▶ Prepare a large pool of modules for the algorithm to select from
▶ Several module options at each position; the selected modules are concatenated to
form a subnet for a task
▶ The path is chosen by a tournament genetic algorithm over candidate paths during
the training of a task

10 / 23
Parameter Allocation: Overview

Parameter Allocation
▶ Refines the granularity from modules down to parameters or neurons
▶ Selects a collection of parameters or neurons to allocate to each task
▶ Also forms a subnet for the task

11 / 23
Parameter Allocation: Overview

Parameter Allocation methods differ in several ways:


▶ Methods to allocate
▶ Manually set through hyperparameters
▶ Learned together with the learning process
▶ Application of masks during training
▶ Forward pass
▶ Backward pass
▶ Parameter update step
▶ Application of masks during testing
▶ Most methods fix the selected subnet after it is trained on its task and use it as
the only model to predict for that task during testing
▶ Weight masks are far larger in scale than feature masks
▶ A decent number of neurons should be kept in each layer

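A minimal sketch of the three places a mask can act, in PyTorch-style code; `layer`, `feature_mask`, and `weight_mask` are assumed to be given:

```python
import torch

def forward_with_feature_mask(layer, x, feature_mask):
    # Forward pass: gate the layer's output features for the current task.
    return layer(x) * feature_mask

def mask_gradients(layer, weight_mask):
    # Backward pass: after loss.backward(), zero the gradients of weights
    # that do not belong to the current task's subnet.
    if layer.weight.grad is not None:
        layer.weight.grad.mul_(weight_mask)

def masked_sgd_update(param, weight_mask, lr):
    # Update step: a plain SGD step applied only to the allowed weights.
    with torch.no_grad():
        param -= lr * param.grad * weight_mask
```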
12 / 23
Parameter Allocation: PackNet
PackNet, 2018
▶ Select non-overlapping weight masks and allocate them to tasks
▶ Fix masked parameters once trained; at test time use the task's subnet
▶ Post-hoc selection by pruning (by absolute weight values) after training
▶ Retraining after pruning, since the network structure changes
▶ Manual allocation through percentage hyperparameters (sketched below)

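A minimal sketch of the pruning step, assuming a `free_mask` marking weights not yet claimed by earlier tasks and an assumed `keep_ratio` hyperparameter; retraining of the kept weights would follow, as noted above:

```python
import torch

def packnet_prune(weight, free_mask, keep_ratio=0.5):
    """Keep the largest-magnitude fraction of the weights still free for the
    current task; release the rest (zero them) for future tasks."""
    free_values = weight[free_mask.bool()].abs()
    k = int(keep_ratio * free_values.numel())
    threshold = free_values.topk(k).values.min() if k > 0 else float("inf")
    task_mask = (weight.abs() >= threshold) & free_mask.bool()
    keep = task_mask | ~free_mask.bool()        # previous tasks' weights stay intact
    weight.data.mul_(keep.to(weight.dtype))     # zero only the released free weights
    return task_mask  # fixed afterwards and used as this task's subnet at test time
```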
13 / 23
Parameter Allocation: DEN
DEN (Dynamically Expandable Networks), 2018
▶ Find the important neurons, used as feature masks for testing, and duplicate them
▶ Importance is found by training with uniform L2 regularisation: neurons whose
connected parameters change a lot are important
▶ Dynamic network expansion when performance cannot be improved, with pruning afterwards
▶ Each task selects its own important neurons by L1-regularised training, then trains
only them with L2 regularisation
▶ Manual allocation through threshold hyperparameters, slightly better than percentages

14 / 23
Parameter Allocation: Piggyback
Piggyback, 2018
▶ Learnable allocation: binary masks are gated from real-valued scores, which are
differentiable and can be learned like ordinary parameters (sketched below)
▶ Masks are still binary at test time
▶ Sacrifice: the network parameters stay fixed, which reduces representation ability
SupSup, 2020
▶ Extends to task-agnostic
testing

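A minimal sketch of a Piggyback-style masked linear layer; the score initialization and threshold are assumed values, not the paper's exact settings:

```python
import torch
import torch.nn as nn

class PiggybackLinear(nn.Module):
    """Masked linear layer: the pretrained weight is frozen; only a real-valued
    score per weight is learned. The binary mask is obtained by thresholding,
    and gradients reach the scores via a straight-through estimator."""

    def __init__(self, pretrained: nn.Linear, threshold: float = 5e-3):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach().clone(),
                                   requires_grad=False)                 # frozen backbone
        self.scores = nn.Parameter(torch.full_like(self.weight, 1e-2))  # learnable scores
        self.threshold = threshold

    def forward(self, x):
        hard = (self.scores > self.threshold).float()
        # Straight-through: use the binary mask in the forward pass, but let
        # gradients flow as if the mask were the real-valued scores.
        mask = hard + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask)
```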
15 / 23
Parameter Allocation: HAT
HAT (Hard Attention to the Task), 2018
▶ Masks and parameters are both learnable
▶ Fix masked parameters once trained; at test time use the task's subnet
▶ Sparsity regularization for masks
AdaHAT, 2024 (my work)
▶ Allows minor, adaptive adjustments to masked parameters

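A minimal sketch of HAT-style gating for one layer; the layer type, sizes, and scale value are assumptions (HAT anneals the scale during training rather than fixing it):

```python
import torch
import torch.nn as nn

class HATLayer(nn.Module):
    """Linear layer gated by a hard-attention mask: a learnable per-task
    embedding is passed through a scaled sigmoid to produce a (nearly binary)
    mask over the layer's units."""

    def __init__(self, in_dim, out_dim, num_tasks):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.task_embedding = nn.Embedding(num_tasks, out_dim)  # one mask per task

    def forward(self, x, task_id, scale=400.0):
        # A large scale pushes the sigmoid towards a hard 0/1 mask; HAT anneals
        # this scale from soft to hard within each training epoch.
        e = self.task_embedding.weight[task_id]
        mask = torch.sigmoid(scale * e)
        return torch.relu(self.linear(x)) * mask
```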
16 / 23
Parameter Allocation: CPG
CPG (Compacting, Picking and Growing), 2019
▶ Post-hoc pruning and retraining + network expansion + learnable masks (on the
weights of previous tasks)

17 / 23
Model Decomposition: ACL
ACL (Adversarial Continual Learning), 2020
▶ Shared and task-specific modules and features
▶ The shared module is adversarially trained against a discriminator to generate
task-invariant features; the discriminator predicts task labels

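A minimal sketch of the adversarial part, using a generic GAN-style confusion loss rather than ACL's exact objective; the sizes, task count, and optimizers are assumptions:

```python
import torch
import torch.nn as nn

num_tasks = 3                                               # assumed task count
shared = nn.Sequential(nn.Linear(784, 128), nn.ReLU())      # shared module (sizes assumed)
discriminator = nn.Linear(128, num_tasks)                   # predicts the task label
opt_shared = torch.optim.Adam(shared.parameters(), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def adversarial_step(x, task_labels):
    # 1) Train the discriminator to recognise which task each feature came from.
    opt_disc.zero_grad()
    ce(discriminator(shared(x).detach()), task_labels).backward()
    opt_disc.step()
    # 2) Train the shared module to confuse the discriminator (maximise its
    #    loss), pushing the shared features towards task-invariance.
    opt_shared.zero_grad()
    (-ce(discriminator(shared(x)), task_labels)).backward()
    opt_shared.step()
```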
18 / 23
Model Decomposition: APD

APD (Additive Parameter Decomposition), 2020


▶ Decomposes the parameter matrix of a layer mathematically:

\theta_t = \sigma \odot \mathcal{M}_t + \tau_t, \qquad \mathcal{M}_t = \mathrm{Sigmoid}(v_t)

▶ Apply different regularisation strategies to shared 𝜎 and task-specific 𝜏𝑡 , v𝑡

\min_{\sigma, \tau_t, v_t} \; \mathcal{L}\big(\{\sigma \odot \mathcal{M}_t + \tau_t\}; \mathcal{D}_t\big) + \lambda_1 \|\tau_t\|_1 + \lambda_2 \big\|\sigma - \sigma^{(t-1)}\big\|_2^2

1. The shared parameters 𝜎 should not deviate far from their previous values

2. The capacity of the task-specific 𝜏𝑡 should be as small as possible, enforced by making it sparse

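A minimal sketch of the decomposition and the two regularizers for a single linear layer; the shape of ℳ𝑡 (one mask entry per output unit) and the λ values are assumptions:

```python
import torch
import torch.nn as nn

class APDLinear(nn.Module):
    """Decomposed linear layer: the effective weight for task t is
    shared * Sigmoid(v_t) + tau_t, with the mask applied per output unit."""

    def __init__(self, in_dim, out_dim, num_tasks):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)            # sigma
        self.mask_logits = nn.Parameter(torch.zeros(num_tasks, out_dim))            # v_t
        self.task_specific = nn.Parameter(torch.zeros(num_tasks, out_dim, in_dim))  # tau_t

    def forward(self, x, t):
        mask = torch.sigmoid(self.mask_logits[t]).unsqueeze(1)   # M_t
        weight = self.shared * mask + self.task_specific[t]      # theta_t
        return nn.functional.linear(x, weight)

def apd_penalty(layer, t, shared_prev, lam1=1e-4, lam2=1e-2):
    # Regularizers from the objective above: keep tau_t sparse and keep the
    # shared sigma close to its value after the previous task.
    sparsity = layer.task_specific[t].abs().sum()
    drift = ((layer.shared - shared_prev) ** 2).sum()
    return lam1 * sparsity + lam2 * drift
```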
19 / 23
Model Decomposition: PGMA
PGMA (Parameter Generation and Model Adaptation), 2019
▶ Task-specific parameters 𝑝𝑡 are generated by a DPG (dynamic parameter generator)
▶ Shared parameters 𝜃0 (in the solver 𝑆) adapt to task 𝑡 with the generated
task-specific 𝑝𝑡

20 / 23
Challenges

21 / 23
Challenge: Network Capacity and Plasticity

Network Capacity Problem


▶ Any fixed model will eventually fill up, leading to a performance drop, given a
potentially infinite task sequence
▶ This becomes explicit in architecture-based approaches
▶ It can be sidestepped by expanding the network, but such a shortcut is not a fair comparison
Stability-Plasticity Trade-Off
▶ Continual learning seeks a balance between stability and plasticity
▶ Approaches that fix parts of the model for previous tasks lack plasticity, because
they stress stability too much
▶ Approaches with task-shared components still face the classic catastrophic
forgetting problem, which results from a lack of stability
▶ Both lead to poor average performance

22 / 23
Thank You
Thank you for your attention!

Please feel free to ask any questions.

My blog post provides detailed information about this:


Architecture-based Continual Learning Algorithms

23 / 23
