Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding
Andreis Bruno¹, Jeffrey Ryan Willette¹, Juho Lee¹,², Sung Ju Hwang¹,²
¹KAIST, South Korea
²AITRICS, South Korea
The Set Encoding Problem
Many problems in machine learning involve converting a set of arbitrary size into a
single vector or a set of vectors: the set encoding (or set representation).
[Figure: Encoder, Set Encoding]
This places a few symmetry (sometimes probabilistic) restrictions on the encoder.
Permutation Invariance & Equivariance
Property 1 A function $f: 2^X \to Y$ acting on sets must be permutation invariant to the
order of objects in the set, i.e. for any permutation $\pi$:
$f(\{x_1, \ldots, x_M\}) = f(\{x_{\pi(1)}, \ldots, x_{\pi(M)}\})$.
Exchangeability A distribution for a set of random variables $X = \{x_i\}_{i=1}^{M}$ is
exchangeable if for any permutation $\pi$:
$p(X) = p(\pi(X))$.
Property 2 A function $f: X^M \to Y^M$ acting on sets is a permutation equivariant
function if permutation of the input instances permutes the output labels, i.e. for any
permutation $\pi$:
$f(x_{\pi(1)}, \ldots, x_{\pi(M)}) = [f_{\pi(1)}(x), \ldots, f_{\pi(M)}(x)]$.
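The two properties can be checked numerically on toy functions. The sketch below is only an illustration: `sum_pool` and `elementwise` are hypothetical stand-ins chosen for this example, not functions from the paper.

```python
import numpy as np

def sum_pool(X):
    """Permutation invariant: sums element features over the set axis."""
    return X.sum(axis=0)

def elementwise(X):
    """Permutation equivariant: applies the same map to every element."""
    return np.tanh(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # M = 5 elements with 3 features each
perm = rng.permutation(5)        # a random permutation pi of the elements

# Invariance: reordering the input leaves the output unchanged.
invariant_ok = np.allclose(sum_pool(X[perm]), sum_pool(X))
# Equivariance: reordering the input reorders the output the same way.
equivariant_ok = np.allclose(elementwise(X[perm]), elementwise(X)[perm])
print(invariant_ok, equivariant_ok)
```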
Bloem-Reddy, Benjamin, and Yee Whye Teh. "Probabilistic Symmetries and Invariant Neural Networks." J. Mach. Learn. Res. 21 (2020): 90-1.
Mini-Batch Consistent (MBC) Set Encoding
Given large sets, we want to be able to process the elements of the set in mini-
batches based on the available computational and memory resources.
Set encoders such as DeepSets and Set Transformers can be modified to do this but
not all can perform mini-batch encoding consistently. We formalize the
requirements for MBC set encoding below:
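Property 5 Let $X \in \mathbb{R}^{M \times d}$ be partitioned such that $X = X_1 \cup X_2 \cup \cdots \cup X_p$, and let
$f: \mathbb{R}^{M_i \times d} \to \mathbb{R}^{d'}$ be a set encoding function such that $f(X) = Z$. Given an aggregation
function $g: \{Z_j \in \mathbb{R}^{d'}\}_{j=1}^{p} \to \mathbb{R}^{d'}$, $g$ and $f$ are Mini-Batch Consistent if and only if
$g(f(X_1), \ldots, f(X_p)) = f(X)$.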
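As a sanity check of the property, the sketch below tests $g(f(X_1), \ldots, f(X_p)) = f(X)$ for a simple sum-pooling encoder. The weight matrix `W`, the ReLU feature map, and the chunk sizes are assumptions made for the example. Both `f` and `g` are hypothetical stand-ins for an encoder and an aggregator.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))           # hypothetical feature-map weights

def f(X):
    """Sum-pooling encoder: apply a per-element ReLU map, then sum over the set."""
    return np.maximum(X @ W, 0.0).sum(axis=0)

def g(encodings):
    """Aggregate the per-chunk encodings by summation."""
    return np.sum(encodings, axis=0)

X = rng.normal(size=(100, 4))
chunks = np.array_split(X, [30, 45])  # X_1, X_2, X_3 with 30, 15 and 55 elements

# MBC: aggregating the chunk encodings reproduces the full-set encoding.
mbc_holds = np.allclose(g([f(c) for c in chunks]), f(X))
print(mbc_holds)
```

The chunks are deliberately unequal in size, since the property must hold for any partition.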
Andreis, B., Willette, J., Lee, J., & Hwang, S. J. (2021). Mini-Batch Consistent Slot Set Encoder for Scalable Set Encoding. arXiv preprint arXiv:2103.01615.
Violation of MBC: Set Transformer
We train Set Transformer on an image reconstruction task. At test time, we increase
the number of pixels and encode them in a mini-batch fashion.
The performance of the model degrades in the mini-batch setting. Additionally, it is
not immediately clear how to aggregate the encodings of the mini-batches.
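A simplified illustration of why attention over set elements breaks MBC: the softmax normalizer depends on every element, so chunk-wise encodings cannot simply be averaged or summed. This is a toy single-query attention pooling written for this example, not the Set Transformer architecture. The query `q` is a hypothetical parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=4)                # hypothetical learned query vector

def softmax_pool(X):
    """Attention pooling with weights normalized over the set elements."""
    scores = X @ q
    w = np.exp(scores - scores.max())
    w /= w.sum()                      # this normalizer couples all elements of X
    return w @ X

X = rng.normal(size=(100, 4))
chunks = np.array_split(X, [30, 45])

full = softmax_pool(X)
mean_of_chunks = np.mean([softmax_pool(c) for c in chunks], axis=0)
sum_of_chunks = np.sum([softmax_pool(c) for c in chunks], axis=0)

# Neither naive aggregation recovers the full-set encoding.
print(np.allclose(full, mean_of_chunks), np.allclose(full, sum_of_chunks))
```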
MBC Set Encoding
DeepSets can trivially satisfy MBC by removing the message-passing layers. Set
Transformer, which is attention-based, violates MBC.
Our goal is to design an attention-based set encoder, like Set Transformer, that
satisfies MBC. We achieve this by using slots.
Slot Set Encoder (SSE)
We realize an MBC set encoder, SSE, by computing attention over slots instead of
between the elements of the set. This makes SSE amenable to mini-batch processing.
The SSE in Algorithm 1 is functionally composable over any partition of the input X
for a given slot initialization.
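$S \sim \mathcal{N}(\mu, \mathrm{diag}(\sigma)) \in \mathbb{R}^{K \times h}$
$\mathrm{attn}_{i,j} := \sigma(M_{i,j})$ where $M := \frac{1}{\sqrt{\hat{d}}}\, k(X) \cdot q(S)^{\top} \in \mathbb{R}^{n \times K}$
$\hat{S} := W^{\top} \cdot v(X) \in \mathbb{R}^{K \times \hat{d}}$ where $W_{i,j} := \frac{\mathrm{attn}_{i,j}}{\sum_{l=1}^{K} \mathrm{attn}_{i,l}}$
$f(X) = g(f(X_1), f(X_2), \ldots, f(X_p))$
$g \in \{\mathrm{mean}, \mathrm{sum}, \mathrm{max}, \mathrm{min}\}$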
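The slot update can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: k, q and v are taken to be plain linear maps (`Wk`, `Wq`, `Wv`), the slots are drawn once from a standard normal, all sizes are arbitrary, and only the g = sum case is checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K, h, d_hat = 60, 4, 3, 5, 8    # illustrative sizes only

# Hypothetical linear maps standing in for k, q and v.
Wk = rng.normal(size=(d, d_hat))
Wq = rng.normal(size=(h, d_hat))
Wv = rng.normal(size=(d, d_hat))

def sse(X, S):
    """One slot update: returns the new slots with shape (K, d_hat)."""
    M = (X @ Wk) @ (S @ Wq).T / np.sqrt(d_hat)   # (n, K) element-slot scores
    attn = 1.0 / (1.0 + np.exp(-M))              # sigmoid, applied entrywise
    W = attn / attn.sum(axis=1, keepdims=True)   # normalize over slots, per element
    return W.T @ (X @ Wv)                        # sum of contributions over elements

X = rng.normal(size=(n, d))
S = rng.normal(size=(K, h))           # one fixed slot sample, shared by all chunks
chunks = np.array_split(X, [10, 35])  # chunks of 10, 25 and 25 elements

full = sse(X, S)
streamed = np.sum([sse(c, S) for c in chunks], axis=0)   # g = sum
mbc_holds = np.allclose(full, streamed)
print(mbc_holds)
```

Because the sigmoid acts entrywise and the normalization runs over slots rather than over set elements, each element's contribution is independent of the rest of the set. That independence is what lets the chunk outputs be summed.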
Slot Set Encoder (SSE)
SSE is permutation invariant with respect to partitions of the input set and permutation
equivariant with respect to the order of slots.
Proposition 3 For a given input set $X \in \mathbb{R}^{n \times d}$ and slot initializations $S \in \mathbb{R}^{K \times d}$, the
functions $f$ and $g$ as defined in Algorithm 1 are MBC for any partition of $X$ and hence
satisfy the MBC property.
Proposition 4 Let $X \in \mathbb{R}^{n \times d}$ and $S \in \mathbb{R}^{K \times d}$ be an input set and slot initialization
respectively. Additionally, let $\mathrm{SSE}(X, S)$ be the output of Algorithm 1, and
$\pi_X \in \mathbb{R}^{n \times n}$ and $\pi_S \in \mathbb{R}^{K \times K}$ be arbitrary permutation matrices. Then
$\mathrm{SSE}(\pi_X \cdot X, \pi_S \cdot S) = \pi_S \cdot \mathrm{SSE}(X, S)$.
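A brief sketch of why Proposition 3 holds for the sum aggregator, assuming the slot update has the form $\hat{S} = W^{\top} v(X)$ with row-normalized sigmoid weights. Writing $W_{i,:}$ for the row of $W$ that belongs to element $x_i$, and $v(x_i)$ for the corresponding row of $v(X)$:

```latex
\hat{S} \;=\; W^{\top} v(X)
        \;=\; \sum_{i=1}^{n} W_{i,:}^{\top}\, v(x_i)
        \;=\; \sum_{j=1}^{p} \; \sum_{x_i \in X_j} W_{i,:}^{\top}\, v(x_i)
        \;=\; \sum_{j=1}^{p} f(X_j),
\qquad
W_{i,:} \;=\; \frac{\sigma(M_{i,:})}{\sum_{l=1}^{K} \sigma(M_{i,l})} .
```

Each row $W_{i,:}$ depends only on $x_i$ and the slots $S$, never on the other elements, so the sum over elements regroups freely over any partition. The same regrouping shows invariance to the order of elements, and permuting the slots only permutes the columns of $M$ and hence the rows of $\hat{S}$.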
Hierarchical Slot Set Encoder
We can stack multiple Slot Set Encoders on top of each other to obtain a hierarchy of slot
set encoders. This allows us to model higher-order interactions across slots.
$f(X) = \mathrm{SSE}_T(\ldots \mathrm{SSE}_2(\mathrm{SSE}_1(X)))$
The resulting set encoding function 𝑓(𝑋) satisfies the MBC property as well as
Propositions 3 & 4.
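A two-layer stack can be sketched as follows. The example assumes that only the first layer sees raw set elements and that its slots are aggregated across mini-batches before the second layer is applied. That reading, the helper `make_sse`, the random linear maps, and all sizes are assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sse(d_in, d_slot, d_out):
    """Build one slot-encoder layer with hypothetical random linear maps for k, q, v."""
    Wk = rng.normal(size=(d_in, d_out))
    Wq = rng.normal(size=(d_slot, d_out))
    Wv = rng.normal(size=(d_in, d_out))

    def layer(X, S):
        M = (X @ Wk) @ (S @ Wq).T / np.sqrt(d_out)
        attn = 1.0 / (1.0 + np.exp(-M))              # sigmoid
        W = attn / attn.sum(axis=1, keepdims=True)   # normalize over slots
        return W.T @ (X @ Wv)

    return layer

sse1, sse2 = make_sse(4, 5, 8), make_sse(8, 5, 6)
S1 = rng.normal(size=(3, 5))          # 3 slots for layer 1
S2 = rng.normal(size=(2, 5))          # 2 slots for layer 2

def encode(chunks):
    """Sum layer-1 slots over all chunks, then feed them to layer 2 as a small set."""
    Z = np.sum([sse1(c, S1) for c in chunks], axis=0)
    return sse2(Z, S2)

X = rng.normal(size=(50, 4))
full = encode([X])                          # whole set in one pass
streamed = encode(np.array_split(X, 5))     # five mini-batches of 10 elements
print(np.allclose(full, streamed))
```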
Approximate Mini-Batch Training of MBC Encoders
How can we train Slot Set Encoders in the large-scale or streaming setting?
Both DeepSets and Set Transformers require gradients to be taken with respect to
the full set at train time.
In the mini-batch consistent setting, this is not feasible for large sets or when set
elements arrive in a stream.
We train MBC models on partitions of sets sampled at each iteration of the
optimization process and find that it works well empirically.
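The sampling step could look roughly like the sketch below. It only shows how a random subset of a large set might be drawn at each iteration. The helper `sample_subset`, the subset size of 128, and the placeholder training step are assumptions, not details taken from the paper.

```python
import numpy as np

def sample_subset(X, subset_size, rng):
    """Draw a random subset of set elements without replacement."""
    idx = rng.choice(X.shape[0], size=subset_size, replace=False)
    return X[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))        # one large training set

for step in range(3):
    X_sub = sample_subset(X, 128, rng)
    # A real training step would encode X_sub, compute the task loss,
    # and backpropagate through this subset only.
    print(step, X_sub.shape)
```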
Experiments: Point Cloud Classification (ModelNet40)
We first show that SSE is a valid set encoding function on the point cloud
classification task. Here, no mini-batch encoding is used.
[Figure: Encoder, Set Encoding, Classifier]
Experiments: Image Reconstruction (CelebA)
We perform image reconstruction using Conditional Neural Processes. We replace
the aggregation function with DeepSets, Set Transformer or Slot Set Encoder. We test
this model in the mini-batch setting where data arrives in a stream.
Droidal: AI Agents Revolutionizing HealthcareDroidal: AI Agents Revolutionizing Healthcare
Droidal: AI Agents Revolutionizing Healthcare
Droidal LLC
 
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptxECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
Jasper Oosterveld
 
SDG 9000 Series: Unleashing multigigabit everywhere
SDG 9000 Series: Unleashing multigigabit everywhereSDG 9000 Series: Unleashing multigigabit everywhere
SDG 9000 Series: Unleashing multigigabit everywhere
Adtran
 
Dev Dives: System-to-system integration with UiPath API Workflows
Dev Dives: System-to-system integration with UiPath API WorkflowsDev Dives: System-to-system integration with UiPath API Workflows
Dev Dives: System-to-system integration with UiPath API Workflows
UiPathCommunity
 
New Ways to Reduce Database Costs with ScyllaDB
New Ways to Reduce Database Costs with ScyllaDBNew Ways to Reduce Database Costs with ScyllaDB
New Ways to Reduce Database Costs with ScyllaDB
ScyllaDB
 
Microsoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentationMicrosoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentation
Digitalmara
 
Agentic AI - The New Era of Intelligence
Agentic AI - The New Era of IntelligenceAgentic AI - The New Era of Intelligence
Agentic AI - The New Era of Intelligence
Muzammil Shah
 
Maxx nft market place new generation nft marketing place
Maxx nft market place new generation nft marketing placeMaxx nft market place new generation nft marketing place
Maxx nft market place new generation nft marketing place
usersalmanrazdelhi
 
Grannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI ExperiencesGrannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI Experiences
Lauren Parr
 
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Lorenzo Miniero
 
Introducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and ARIntroducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and AR
Safe Software
 
UiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build PipelinesUiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build Pipelines
UiPathCommunity
 
AI Trends - Mary Meeker
AI Trends - Mary MeekerAI Trends - Mary Meeker
AI Trends - Mary Meeker
Razin Mustafiz
 
Ad

Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding

  • 1. Mini-Batch Consistent Slot Set Encoder For Scalable Set Encoding. Andreis Bruno (1), Jeffrey Ryan Willette (1), Juho Lee (1,2), Sung Ju Hwang (1,2). Affiliations: (1) KAIST, South Korea; (2) AITRICS, South Korea.
  • 2. The Set Encoding Problem. Many problems in machine learning involve converting a set of arbitrary size into a single vector or a set of vectors: the set encoding (or representation). [Figure: Encoder → Set Encoding] This places a few symmetry (sometimes probabilistic) restrictions on the encoder.
  • 3. Permutation Invariance & Equivariance.
    Property 1: A function $f: 2^{X} \to Y$ acting on sets must be permutation invariant to the order of objects in the set, i.e. for any permutation $\pi$: $f(\{x_1, \dots, x_M\}) = f(\{x_{\pi(1)}, \dots, x_{\pi(M)}\})$.
    Exchangeability: A distribution for a set of random variables $X = \{x_i\}_{i=1}^{M}$ is exchangeable if for any permutation $\pi$: $p(X) = p(\pi(X))$.
    Property 2: A function $f: X^{M} \to Y^{M}$ acting on sets is a permutation equivariant function if permutation of the input instances permutes the output labels, i.e. for any permutation $\pi$: $f([x_{\pi(1)}, \dots, x_{\pi(M)}]) = [f_{\pi(1)}(x), \dots, f_{\pi(M)}(x)]$.
    Bloem-Reddy, Benjamin, and Yee Whye Teh. "Probabilistic Symmetries and Invariant Neural Networks." J. Mach. Learn. Res. 21 (2020): 90-1.
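Properties 1 and 2 can be checked numerically on a toy example. The sketch below is only an illustration: the feature map `phi` (a tanh) and the choice of sum pooling are assumptions made here, not functions taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))       # a set of M = 6 elements in R^3
perm = rng.permutation(len(X))    # a random permutation pi of the rows


def phi(x):
    # Hypothetical elementwise feature map, applied to each row independently.
    return np.tanh(x)


def invariant_encoder(X):
    # Sum pooling discards the order of the rows (Property 1).
    return phi(X).sum(axis=0)


def equivariant_map(X):
    # Output row i depends only on input row i (Property 2).
    return phi(X)


# Invariance: permuting the input leaves the encoding unchanged.
inv_ok = np.allclose(invariant_encoder(X), invariant_encoder(X[perm]))
# Equivariance: permuting the input permutes the output the same way.
equi_ok = np.allclose(equivariant_map(X[perm]), equivariant_map(X)[perm])
print(inv_ok, equi_ok)
```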
  • 4. Mini-Batch Consistent (MBC) Set Encoding. Given large sets, we want to be able to process the elements of the set in mini-batches based on the available computational and memory resources. Set encoders such as DeepSets and Set Transformers can be modified to do this, but not all can perform mini-batch encoding consistently. We formalize the requirements for MBC set encoding below:
    Property 5: Let $X \in \mathbb{R}^{M \times d}$ be partitioned such that $X = X_1 \cup X_2 \cup \dots \cup X_p$, and let $f: \mathbb{R}^{M_i \times d} \to \mathbb{R}^{d'}$ be a set encoding function such that $f(X) = Z$. Given an aggregation function $g: \{Z_j \in \mathbb{R}^{d'}\}_{j=1}^{p} \to \mathbb{R}^{d'}$, $g$ and $f$ are Mini-Batch Consistent if and only if $g(f(X_1), \dots, f(X_p)) = f(X)$.
    Andreis, B., Willette, J., Lee, J., & Hwang, S. J. (2021). Mini-Batch Consistent Slot Set Encoder for Scalable Set Encoding. arXiv preprint arXiv:2103.01615.
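As a minimal sketch of Property 5, the snippet below checks $g(f(X_1), \dots, f(X_p)) = f(X)$ for a DeepSets-style sum-pooling encoder. The ReLU feature map, the weight matrix `W`, and the chunk sizes are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))   # full set: M = 10 elements, d = 4
W = rng.normal(size=(4, 8))    # hypothetical feature-map weights, d' = 8


def f(X_part):
    # Sum-pooled encoder: each element is embedded independently, then summed.
    return np.maximum(X_part @ W, 0.0).sum(axis=0)


def g(encodings):
    # Aggregation over the mini-batch encodings Z_1, ..., Z_p.
    return np.sum(encodings, axis=0)


# Partition X into p = 3 mini-batches of unequal size (3, 4 and 3 elements).
parts = np.array_split(X, [3, 7])
mbc_holds = np.allclose(g([f(P) for P in parts]), f(X))
print(mbc_holds)
```

The check passes because every element contributes to the encoding independently of the others, so the order and grouping of the sum do not matter.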
  • 5. Violation of MBC: Set Transformer. We train Set Transformer on an image reconstruction task. At test time, we increase the number of pixels and encode them in a mini-batch fashion. The performance of the model degrades in the mini-batch setting. Additionally, it is not immediately clear how to aggregate the encodings of the mini-batches.
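A small numeric example suggests why attention normalized across set elements is hard to encode in mini-batches. The sketch uses a single softmax attention-pooling step with one query, which is a simplified stand-in chosen here and not the Set Transformer architecture itself. The data, the query `q`, and the choice of averaging the chunk encodings are all assumptions.

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # toy set of 4 scalar elements
q = np.array([1.0])                          # hypothetical pooling query


def softmax_pool(X_part):
    # Softmax over the elements of the (sub)set, then a weighted average.
    scores = X_part @ q
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ X_part


full = softmax_pool(X)
# Encode two mini-batches separately and average their encodings.
parts = np.array_split(X, 2)
chunked = np.mean([softmax_pool(P) for P in parts], axis=0)

gap = float(np.abs(full - chunked).max())
print(gap)
```

The softmax normalizer depends on every element in the set, so each mini-batch is normalized against a different total and the averaged result drifts away from the full-set encoding.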
  • 6. MBC Set Encoding. DeepSets can trivially satisfy MBC by removing the message-passing layers. Set Transformer, which is attention based, violates MBC. Our goal is to design an attention-based set encoder, such as Set Transformer, that satisfies MBC. We achieve this by using slots.
  • 7. Slot Set Encoder (SSE). We realize an MBC set encoder, SSE, by computing attention over slots instead of between the elements of the set. This makes SSE amenable to mini-batch processing. For a given slot initialization, the SSE in Algorithm 1 is functionally composable over any partition of the input X.
    $S \sim \mathcal{N}(\mu, \mathrm{diag}(\sigma)) \in \mathbb{R}^{K \times h}$
    $\mathrm{attn}_{i,j} := \sigma(M_{i,j})$ where $M := \frac{1}{\sqrt{\hat{d}}}\, k(X) \cdot q(S)^{T} \in \mathbb{R}^{n \times K}$
    $\hat{S} := W^{T} \cdot v(X) \in \mathbb{R}^{K \times \hat{d}}$ where $W_{i,j} := \mathrm{attn}_{i,j} / \sum_{l=1}^{K} \mathrm{attn}_{i,l}$
    $f(X) = g(f(X_1), f(X_2), \dots, f(X_p))$, $g \in \{\mathrm{mean}, \mathrm{sum}, \mathrm{max}, \mathrm{min}\}$
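The attention-over-slots computation on this slide can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: `k`, `q` and `v` are taken to be plain linear maps with random weights (`Wk`, `Wq`, `Wv`), the activation in the attention step is assumed to be an elementwise sigmoid, the slots are drawn from a standard normal, and only sum aggregation is checked.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def slot_set_encoder(X, S, Wk, Wq, Wv):
    keys = X @ Wk        # k(X): (n, d_hat)
    queries = S @ Wq     # q(S): (K, d_hat)
    values = X @ Wv      # v(X): (n, d_hat)
    M = keys @ queries.T / np.sqrt(keys.shape[1])   # (n, K)
    attn = sigmoid(M)                               # assumed elementwise sigmoid
    # Normalize over the K slots, independently for each set element.
    W = attn / attn.sum(axis=1, keepdims=True)
    return W.T @ values                             # S_hat: (K, d_hat)


rng = np.random.default_rng(0)
n, d, K, h, d_hat = 12, 5, 3, 4, 6
X = rng.normal(size=(n, d))
S = rng.normal(size=(K, h))                 # sampled slot initialization
Wk = rng.normal(size=(d, d_hat))
Wq = rng.normal(size=(h, d_hat))
Wv = rng.normal(size=(d, d_hat))

full = slot_set_encoder(X, S, Wk, Wq, Wv)

# Encode three mini-batches (5, 4 and 3 elements) and aggregate with g = sum.
chunks = np.array_split(X, [5, 9])
chunked = sum(slot_set_encoder(Xj, S, Wk, Wq, Wv) for Xj in chunks)

print(np.allclose(full, chunked))
```

Each row of `W` depends only on one set element and the shared slots, so the product `W.T @ values` is a sum of per-element terms and can be accumulated chunk by chunk.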
  • 8. Slot Set Encoder (SSE). SSE is permutation invariant with respect to partitions of the input set and permutation equivariant with respect to the order of slots.
    Proposition 3: For a given input set $X \in \mathbb{R}^{n \times d}$ and slot initialization $S \in \mathbb{R}^{K \times d}$, the functions $f$ and $g$ as defined in Algorithm 1 are MBC for any partition of $X$ and hence satisfy the MBC property.
    Proposition 4: Let $X \in \mathbb{R}^{n \times d}$ and $S \in \mathbb{R}^{K \times d}$ be an input set and slot initialization respectively. Additionally, let $\mathrm{SSE}(X, S)$ be the output of Algorithm 1, and let $\pi_X \in \mathbb{R}^{n \times n}$ and $\pi_S \in \mathbb{R}^{K \times K}$ be arbitrary permutation matrices. Then $\mathrm{SSE}(\pi_X \cdot X, \pi_S \cdot S) = \pi_S \cdot \mathrm{SSE}(X, S)$.
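Proposition 4 can be sanity-checked numerically. The sketch below assumes linear maps for `k`, `q`, `v` and a sigmoid activation (choices made here for illustration), and applies permutations by row indexing instead of explicit permutation matrices.

```python
import numpy as np


def sse(X, S, Wk, Wq, Wv):
    # Slot attention: sigmoid scores, normalized over slots, applied to values.
    keys, queries, values = X @ Wk, S @ Wq, X @ Wv
    attn = 1.0 / (1.0 + np.exp(-(keys @ queries.T) / np.sqrt(keys.shape[1])))
    W = attn / attn.sum(axis=1, keepdims=True)
    return W.T @ values


rng = np.random.default_rng(1)
n, d, K, d_hat = 9, 4, 3, 5
X = rng.normal(size=(n, d))
S = rng.normal(size=(K, d))
Wk, Wq, Wv = (rng.normal(size=(d, d_hat)) for _ in range(3))

perm_x = rng.permutation(n)   # row permutation of the set (pi_X)
perm_s = rng.permutation(K)   # row permutation of the slots (pi_S)

lhs = sse(X[perm_x], S[perm_s], Wk, Wq, Wv)   # SSE(pi_X X, pi_S S)
rhs = sse(X, S, Wk, Wq, Wv)[perm_s]           # pi_S SSE(X, S)
print(np.allclose(lhs, rhs))
```

Reordering the set elements reorders the rows of both the attention weights and the values, which cancels in the product, while reordering the slots simply reorders the output rows.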
  • 9. Hierarchical Slot Set Encoder. We can stack multiple Slot Set Encoders on top of each other to obtain a hierarchy of slot set encoders. This allows us to model higher-order interactions across slots.
    $f(X) = \mathrm{SSE}(\dots \mathrm{SSE}_2(\mathrm{SSE}_1(X)))$
    The resulting set encoding function $f(X)$ satisfies the MBC property as well as Propositions 3 & 4.
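One way to read the stacked encoder is sketched below with two layers. The wiring is an assumption made for illustration: the first layer encodes each mini-batch and its outputs are sum-aggregated, and the second layer treats the aggregated slots as its input set with its own slots `S2`. Linear `k`, `q`, `v` maps and a sigmoid activation are likewise assumed.

```python
import numpy as np


def sse(X, S, Wk, Wq, Wv):
    # Single slot set encoder layer (assumed linear k, q, v and sigmoid).
    keys, queries, values = X @ Wk, S @ Wq, X @ Wv
    attn = 1.0 / (1.0 + np.exp(-(keys @ queries.T) / np.sqrt(keys.shape[1])))
    W = attn / attn.sum(axis=1, keepdims=True)
    return W.T @ values


rng = np.random.default_rng(2)
n, d, d1, d2, K1, K2 = 16, 4, 6, 5, 4, 2
X = rng.normal(size=(n, d))

# Layer 1 parameters: slots S1 attend over the raw set elements.
S1 = rng.normal(size=(K1, d))
p1 = [rng.normal(size=(d, d1)) for _ in range(3)]
# Layer 2 parameters: slots S2 attend over the K1 output slots of layer 1.
S2 = rng.normal(size=(K2, d1))
p2 = [rng.normal(size=(d1, d2)) for _ in range(3)]


def hierarchical(chunks):
    # Sum-aggregate layer-1 encodings of the chunks, then apply layer 2.
    layer1 = sum(sse(Xj, S1, *p1) for Xj in chunks)
    return sse(layer1, S2, *p2)


full = hierarchical([X])
chunked = hierarchical(np.array_split(X, 4))
print(np.allclose(full, chunked))
```

Under this wiring, consistency only needs to hold at the first layer, because every later layer sees the same aggregated slots regardless of how the input was partitioned.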
  • 10. Approximate Mini-Batch Training of MBC Encoders. How can we train Slot Set Encoders in the large-scale or streaming setting? Both DeepSets and Set Transformers require gradients to be taken with respect to the full set at train time. In the Mini-Batch Consistent setting, this is not feasible for large sets or when set elements arrive in a stream. We train MBC models on partitions of sets sampled at each iteration of the optimization process and find that it works well empirically.
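The sampling step of this training scheme might look like the sketch below. It only illustrates drawing a fresh random part of a large set at each iteration; the helper `sample_subset`, the subset size, and the placeholder comment for the model update are hypothetical and not taken from the slides.

```python
import numpy as np


def sample_subset(X, subset_size, rng):
    # Draw a random part of the set without replacement (hypothetical helper).
    idx = rng.choice(len(X), size=subset_size, replace=False)
    return X[idx]


rng = np.random.default_rng(0)
full_set = rng.normal(size=(1000, 3))   # a set too large to encode at once

batch_shapes = []
for step in range(5):
    batch = sample_subset(full_set, subset_size=100, rng=rng)
    # A real training loop would encode `batch`, compute the loss and
    # take a gradient step here; gradients never touch the full set.
    batch_shapes.append(batch.shape)

print(batch_shapes)
```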
  • 11. Experiments: Point Cloud Classification (ModelNet40). We first show that SSE is a valid set encoding function on the point cloud classification task. Here, no mini-batch encoding is used. [Figure: Encoder → Set Encoding → Classifier]
  • 12. Experiments: Image Reconstruction (CelebA). We perform image reconstruction using Conditional Neural Processes. We replace the aggregation function with DeepSets, Set Transformer or Slot Set Encoder. We test this model in the mini-batch setting where data arrives in a stream.