Emily Denton - Unsupervised Learning of Disentangled Representations from Video - Luba Elliott
This talk by Emily Denton from New York University on "Unsupervised Learning of Disentangled Representations from Video" was presented at the Learning Image Representations event on 30th August at Twitter as part of the Creative AI meetup.
Paper Summary of Disentangling by Factorising (Factor-VAE) - 준식 최
The paper proposes Factor-VAE, which aims to learn disentangled representations in an unsupervised manner. Factor-VAE enhances disentanglement over the β-VAE by encouraging the latent distribution to be factorial (independent across dimensions) using a total correlation penalty. This penalty is optimized using a discriminator network. Experiments on various datasets show that Factor-VAE achieves better disentanglement than β-VAE, as measured by a proposed disentanglement metric, while maintaining good reconstruction quality. Latent traversals qualitatively demonstrate disentangled factors of variation.
Disentangled Representation Learning of Deep Generative Models - Ryohei Suzuki
This document discusses disentangled representation learning in deep generative models. It explains that generative models can generate realistic images but it is difficult to control specific attributes of the generated images. Recent research aims to learn disentangled representations where each latent variable corresponds to an independent perceptual factor, such as object pose or color. Methods described include InfoGAN, β-VAE, spatial conditional batch normalization, hierarchical latent variables, and StyleGAN's hierarchical modulation approach. Measuring entanglement through perceptual path length and linear separability is also discussed. The document suggests disentangled representation learning could help applications in biology and medicine by providing better explanatory variables for complex phenomena.
The document provides an introduction to variational autoencoders (VAE). It discusses how VAEs can be used to learn the underlying distribution of data by introducing a latent variable z that follows a prior distribution like a standard normal. The document outlines two approaches - explicitly modeling the data distribution p(x), or using the latent variable z. It suggests using z and assuming the conditional distribution p(x|z) is a Gaussian with mean determined by a neural network gθ(z). The goal is to maximize the likelihood of the dataset by optimizing the evidence lower bound objective.
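To make the objective in that summary concrete, the evidence lower bound (ELBO) that a VAE maximizes can be written as follows (this is the standard formulation, added here for reference; the summary itself does not spell it out):

```latex
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

with $p(z) = \mathcal{N}(0, I)$ and $p_\theta(x \mid z) = \mathcal{N}(g_\theta(z), \sigma^2 I)$ as in the summary; the first term rewards reconstruction while the second keeps the approximate posterior close to the prior.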
This document summarizes key concepts in diffusion models and their applications in generative AI systems. It discusses early diffusion models from Sohl-Dickstein and later improvements from DDPM. It also covers recent large diffusion models like GLIDE and DALL-E 2 that can generate images from text prompts. The document provides technical details on diffusion processes, loss functions, and model architectures.
Generative Adversarial Networks (GANs) are a type of deep learning model used for unsupervised machine learning tasks like image generation. GANs work by having two neural networks, a generator and discriminator, compete against each other. The generator creates synthetic images and the discriminator tries to distinguish real images from fake ones. This allows the generator to improve over time at creating more realistic images that can fool the discriminator. The document discusses the intuition behind GANs, provides a PyTorch implementation example, and describes variants like DCGAN, LSGAN, and semi-supervised GANs.
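The summary mentions a PyTorch implementation example; as a rough stand-in, here is our own minimal sketch of a single GAN training step (the tiny linear networks, noise dimension, and hyperparameters are placeholders, not the code from the document):

```python
import torch
import torch.nn as nn

# Placeholder networks: any generator mapping noise -> image and any
# binary discriminator would do for this sketch.
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())      # generator
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())     # discriminator
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def gan_step(real):                      # real: (batch, 784) tensor
    batch = real.size(0)
    fake = G(torch.randn(batch, 100))

    # 1) Discriminator: push D(real) -> 1 and D(fake) -> 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # 2) Generator: push D(fake) -> 1, i.e. fool the discriminator.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```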
GANs are the hottest new topic in the ML arena; however, they present a challenge for researchers and engineers alike. Their design and, most importantly, their code implementation have been causing headaches for ML practitioners, especially when moving to production.
Starting from the very basics of what a GAN is, passing through the TensorFlow implementation using the most cutting-edge APIs available in the framework, and finally arriving at production-ready serving at scale using Google Cloud ML Engine.
Slides for the talk: https://www.pycon.it/conference/talks/deep-diving-into-gans-form-theory-to-production
Github repo: https://github.com/zurutech/gans-from-theory-to-production
PR-231: A Simple Framework for Contrastive Learning of Visual Representations - Jinwon Lee
The document presents SimCLR, a framework for contrastive learning of visual representations using simple data augmentation. Key aspects of SimCLR include using random cropping and color distortions to generate positive sample pairs for the contrastive loss, a nonlinear projection head to learn representations, and large batch sizes. Evaluation shows SimCLR learns representations that outperform supervised pretraining on downstream tasks and achieves state-of-the-art results with only view augmentation and contrastive loss.
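The contrastive (NT-Xent) loss at the heart of SimCLR can be sketched roughly as below; this is a simplified illustration of the idea (two augmented views per image, cosine similarities, temperature τ), not the authors' reference implementation:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, d) projections of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                     # exclude self-similarities
    n = z1.size(0)
    # For row i, its positive pair sits at index i+N (and vice versa).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```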
Attention Is All You Need.
With these simple words, the Deep Learning industry was forever changed. Transformers were initially introduced in the field of Natural Language Processing to enhance language translation, but they demonstrated astonishing results even outside language processing. In particular, they recently spread in the Computer Vision community, advancing the state-of-the-art on many vision tasks. But what are Transformers? What is the mechanism of self-attention, and do we really need it? How did they revolutionize Computer Vision? Will they ever replace convolutional neural networks?
These and many other questions will be answered during the talk.
In this tech talk, we will discuss:
- A piece of history: Why did we need a new architecture?
- What is self-attention, and where does this concept come from?
- The Transformer architecture and its mechanisms
- Vision Transformers: An Image is worth 16x16 words
- Video Understanding using Transformers: the space + time approach
- The scale and data problem: Is Attention what we really need?
- The future of Computer Vision through Transformers
Speaker: Davide Coccomini, Nicola Messina
Website: https://www.aicamp.ai/event/eventdetails/W2021101110
[DL Reading Group] Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation - Deep Learning JP
The document proposes modifications to self-attention in Transformers to improve faithful signal propagation without shortcuts like skip connections or layer normalization. Specifically, it introduces a normalization-free network that uses dynamic isometry to ensure unitary transformations, a ReZero technique to implement skip connections without adding shortcuts, and modifications to attention and normalization techniques to address issues like rank collapse in Transformers. The methods are evaluated on tasks like CIFAR-10 classification and language modeling, demonstrating improved performance over standard Transformer architectures.
PR-409: Denoising Diffusion Probabilistic Models - Hyeongmin Lee
This paper is Denoising Diffusion Probabilistic Models (DDPM), the work that first popularized the currently hot diffusion models. It elegantly resolves several practical issues of diffusion, which was originally proposed at ICML 2015, and marked the start of the current trend. We look at the various branches of generative models, diffusion itself, and what changed in DDPM.
Paper link: https://arxiv.org/abs/2006.11239
Video link: https://youtu.be/1j0W_lu55nc
(DL Hacks reading group) How to Train Deep Variational Autoencoders and Probabilistic Ladder Networks - Masahiro Suzuki
This document discusses techniques for training deep variational autoencoders and probabilistic ladder networks. It proposes three advances: 1) Using an inference model similar to ladder networks with multiple stochastic layers, 2) Adding a warm-up period to keep units active early in training, and 3) Using batch normalization. These advances allow training models with up to five stochastic layers and achieve state-of-the-art log-likelihood results on benchmark datasets. The document explains variational autoencoders, probabilistic ladder networks, and how the proposed techniques parameterize the generative and inference models.
The document discusses FactorVAE, a method for disentangling latent representations in variational autoencoders (VAEs). It introduces Total Correlation (TC) as a penalty term that encourages independence between latent variables. TC is added to the standard VAE objective function to guide the model to learn disentangled representations. The document provides details on how TC is defined and computed based on the density-ratio trick from generative adversarial networks. It also discusses how FactorVAE uses TC to learn disentangled representations and can be evaluated using a disentanglement metric.
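The density-ratio trick mentioned above can be written compactly (standard formulation, notation ours): with a discriminator D(z) trained to distinguish samples from q(z) and from the product of its marginals (obtained by permuting each latent dimension across the batch),

```latex
\mathrm{TC}(\mathbf{z}) = D_{\mathrm{KL}}\Big(q(\mathbf{z}) \,\Big\|\, \prod_j q(z_j)\Big)
  \;\approx\; \mathbb{E}_{q(\mathbf{z})}\!\left[\log \frac{D(\mathbf{z})}{1 - D(\mathbf{z})}\right].
```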
007 20151214 Deep Unsupervised Learning using Nonequilibrium Thermodynamics - Ha Phuong
The document discusses a new approach to unsupervised deep learning using concepts from nonequilibrium thermodynamics. Specifically, it proposes destroying structure in data through an iterative forward diffusion process, then learning the reverse diffusion process to restore structure and act as a generative model. This approach is shown to outperform other generative models on image datasets like CIFAR-10 and is able to perform tasks like inpainting. The diffusion process is modeled using Gaussian distributions and the reverse process is learned using a deep network as an approximator.
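Concretely, the Gaussian forward and learned reverse processes described above are usually written as follows (standard DDPM-style notation, added here for reference):

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big),
\qquad
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big),
```

where the forward process gradually destroys structure over many steps and the learned reverse process restores it, acting as the generative model.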
This document summarizes a paper on Style GAN, which proposes a style-based GAN that can control image generation at multiple levels of style. It introduces new evaluation methods and collects a larger, more varied dataset (FFHQ). The paper aims to disentangle style embeddings to allow unsupervised separation of high-level attributes and introduce stochastic variation in generated images through control of the network architecture.
Introduction to Generative Adversarial Networks (GANs) by Michał Maj
Full story: https://appsilon.com/satellite-imagery-generation-with-gans/
DoWhy: An end-to-end library for causal inference - Amit Sharma
In addition to efficient statistical estimators of a treatment's effect, successful application of causal inference requires specifying assumptions about the mechanisms underlying the observed data and testing whether they are valid, and to what extent. However, most libraries for causal inference focus only on providing powerful statistical estimators. We describe DoWhy, an open-source Python library that treats causal assumptions as first-class citizens, based on the formal framework of causal graphs to specify and test causal assumptions. DoWhy presents an API for the four steps common to any causal analysis: 1) modeling the data using a causal graph and structural assumptions, 2) identifying whether the desired effect is estimable under the causal model, 3) estimating the effect using statistical estimators, and finally 4) refuting the obtained estimate through robustness checks and sensitivity analyses. In particular, DoWhy implements a number of robustness checks, including placebo tests, bootstrap tests, and tests for unobserved confounding. DoWhy is an extensible library that supports interoperability with other implementations, such as EconML and CausalML, for the estimation step.
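The four steps map onto DoWhy's API roughly as in the sketch below; this is a schematic illustration (the column names, the graph string, and the particular estimator/refuter method names are illustrative choices, so check the DoWhy documentation for exact options):

```python
import pandas as pd
from dowhy import CausalModel

df = pd.read_csv("observational_data.csv")   # hypothetical dataset

# 1) Model: encode assumptions as a causal graph.
model = CausalModel(
    data=df,
    treatment="treatment",
    outcome="outcome",
    graph="digraph { confounder -> treatment; confounder -> outcome; treatment -> outcome; }",
)

# 2) Identify: is the effect estimable under the stated assumptions?
estimand = model.identify_effect()

# 3) Estimate: apply a statistical estimator to the identified estimand.
estimate = model.estimate_effect(
    estimand, method_name="backdoor.propensity_score_matching")

# 4) Refute: stress-test the estimate with robustness checks.
refutation = model.refute_estimate(
    estimand, estimate, method_name="placebo_treatment_refuter")
print(estimate.value, refutation)
```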
Slides for "Do Deep Generative Models Know What They Don't know?"Julius Hietala
My slides that discuss different deep generative models, mainly normalizing flows for density estimation at a deep learning seminar at Aalto University fall 2019.
Machine learning workshop, session 3.
- Data sets
- Machine Learning Algorithms
- Algorithms by Learning Style
- Algorithms by Similarity
- People to follow
This document summarizes the NGBoost method for probabilistic regression. NGBoost uses gradient boosting to fit the parameters of an assumed probabilistic distribution for the target variable. It improves on existing probabilistic regression methods by using the natural gradient, which performs gradient descent in the space of distributions rather than the parameter space. This addresses issues with prior approaches and allows NGBoost to achieve state-of-the-art performance while remaining fast, flexible, and scalable. Future work may apply NGBoost to other problems like survival analysis or joint outcome regression.
Joint contrastive learning with infinite possibilities - taeseon ryu
Contrastive learning is a machine learning technique that learns features without any labels by judging whether two images are similar or dissimilar. It differs somewhat from conventional supervised learning: supervised learning incurs labeling cost and, being task-specific, its generalizability can suffer.
Contrastive learning, on the other hand, proceeds without labels, so there is no labeling cost and generalizability can be better. This paper proposes Joint Contrastive Learning for more useful contrastive learning. https://youtu.be/0NLq-ikBP1I
The document presents an unsupervised and self-supervised method for sentence summarization called BottleSum that is based on the Information Bottleneck principle. BottleSum includes an extractive model called BottleSum Ex and an abstractive model called BottleSum Self. BottleSum Ex achieves state-of-the-art results on automatic metrics for unsupervised models and BottleSum Self performs better than BottleSum Ex in human evaluations, demonstrating the effectiveness of the Information Bottleneck approach for summarization.
There are a few potential issues with modeling the data this way:
1. Students are nested within classrooms. A student's outcomes may be more similar to others in their classroom compared to students in other classrooms, due to shared classroom factors. This violates the independence assumption of ordinary least squares regression.
2. Classroom-level factors like teacher quality are not included in the model but likely influence student outcomes. Failing to account for these could lead to omitted variable bias.
3. The error terms for students within the same classroom may not be independent as assumed, since classroom factors induce correlation.
To properly account for the nested data structure, we need to model the classroom as a second level in a multilevel model.
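A minimal way to fit such a two-level (random-intercept) model in Python is sketched below, assuming a hypothetical data frame with `score`, `ses`, and `classroom` columns; the file and variable names are illustrative only:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")   # hypothetical: one row per student

# Random intercept per classroom: students are nested within classrooms,
# so their errors share a classroom-level component.
model = smf.mixedlm("score ~ ses", data=df, groups=df["classroom"])
result = model.fit()
print(result.summary())
```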
Top 50+ Data Science Interview Questions and Answers for 2025 (1).pdf - khushnuma khan
Preparing for a Data Science interview requires a solid grasp of fundamental concepts, algorithms, and techniques. The questions and answers outlined here cover a broad spectrum of topics, from machine learning algorithms to statistical methods, model evaluation, and real-world applications like recommendation systems and time series analysis.
1) Machine learning is a field of artificial intelligence that allows computers to learn without being explicitly programmed by finding patterns in data.
2) There are three main types of machine learning problems: supervised learning which uses labeled training data, unsupervised learning which finds hidden patterns in unlabeled data, and reinforcement learning where a system learns from feedback of rewards and punishments.
3) Key machine learning concepts include linear regression, which finds a linear relationship between variables, and gradient descent, an algorithm for minimizing cost functions to optimize model parameters like slope and intercept of a linear regression line.
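As a small illustration of the last point, here is a bare-bones gradient descent loop that fits the slope and intercept of a linear regression by minimizing mean squared error (a generic toy example, not taken from the document):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)   # true slope 3, intercept 2

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    err = w * x + b - y
    # Gradients of the mean-squared-error cost w.r.t. slope and intercept.
    grad_w = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))   # should approach roughly 3 and 2
```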
Building useful models for imbalanced datasets (without resampling) - Greg Landrum
1) Building machine learning models on imbalanced datasets, where there are many more inactive compounds than active ones, can lead to models with high accuracy but low ability to predict actives.
2) Shifting the decision threshold from 0.5 to a lower value, such as 0.2, for classifiers like random forests can significantly improve the models' ability to predict actives, as measured by Cohen's kappa, without retraining the models.
3) Across a variety of bioactivity prediction datasets, this threshold-shifting approach generally performed better than alternative methods like balanced random forests at improving predictions of active compounds.
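The threshold-shifting idea in point 2) can be sketched with scikit-learn as follows; this is a generic illustration on made-up data with a 0.2 cutoff, not the study's bioactivity datasets, though it uses Cohen's kappa as the study did:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: roughly 5% "active" compounds.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]          # predicted probability of "active"

for threshold in (0.5, 0.2):                   # default vs. shifted decision threshold
    pred = (proba >= threshold).astype(int)    # no retraining needed
    print(threshold, cohen_kappa_score(y_te, pred))
```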
The document discusses test equating, which is the process of establishing comparable scores on different forms of a test. It covers topics such as why scaled scores are reported instead of raw scores, considerations in choosing a score scale, limitations of equating, different equating methods like linear and equipercentile equating, and different equating designs like single-group and anchor designs. It provides explanations of key concepts in test equating and guidelines for effective equating.
May 2015 talk to SW Data Meetup by Professor Hendrik Blockeel from KU Leuven & Leiden University.
With increasing amounts of ever more complex forms of digital data becoming available, the methods for analyzing these data have also become more diverse and sophisticated. With this comes an increased risk of incorrect use of these methods, and a greater burden on the user to be knowledgeable about their assumptions. In addition, the user needs to know about a wide variety of methods to be able to apply the most suitable one to a particular problem. This combination of broad and deep knowledge is not sustainable.
The idea behind declarative data analysis is that the burden of choosing the right statistical methodology for answering a research question should no longer lie with the user, but with the system. The user should be able to simply describe the problem, formulate a question, and let the system take it from there. To achieve this, we need to find answers to questions such as: what languages are suitable for formulating these questions, and what execution mechanisms can we develop for them? In this talk, I will discuss recent and ongoing research in this direction. The talk will touch upon query languages for data mining and for statistical inference, declarative modeling for data mining, meta-learning, and constraint-based data mining. What connects these research threads is that they all strive to put intelligence about data analysis into the system, instead of assuming it resides in the user.
Hendrik Blockeel is a professor of computer science at KU Leuven, Belgium, and part-time associate professor at Leiden University, The Netherlands. His research interests lie mostly in machine learning and data mining. He has made a variety of research contributions in these fields, including work on decision tree learning, inductive logic programming, predictive clustering, probabilistic-logical models, inductive databases, constraint-based data mining, and declarative data analysis. He is an action editor for Machine Learning and serves on the editorial board of several other journals. He has chaired or organized multiple conferences, workshops, and summer schools, including ILP, ECMLPKDD, IDA and ACAI, and he has been vice-chair, area chair, or senior PC member for ECAI, IJCAI, ICML, KDD, ICDM. He was a member of the board of the European Coordinating Committee for Artificial Intelligence from 2004 to 2010, and currently serves as publications chair for the ECMLPKDD steering committee.
This document provides a summary of a 4-part training program on using PASW Statistics 17 (SPSS 17) software to perform descriptive statistics, tests of significance, regression analysis, and chi-square/ANOVA. The agenda covers topics like frequency analysis, correlations, t-tests, ANOVA, importing/exporting data, and more. The goal is to help users answer research questions and test hypotheses using techniques in PASW Statistics.
This document provides information about a business course, including that there will be class on Labor Day and lab materials are available on Canvas. It discusses fixed effects in models and how to include them in R scripts. Predicted values from models are covered, along with using residuals to detect outliers and interpreting interactions between variables in models. The document provides examples of adding interaction terms to model formulas and interpreting the results.
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ... - Vahid Taslimitehrani
Presented at 15th International Conference on BioInformatics and BioEngineering (BIBE2014)
Prognostic modeling is central to medicine, as it is often used to predict patients’ outcome and response to treatments and to identify important medical risk factors. Logistic regression is one of the most used approaches for clinical prediction modeling. Traumatic brain injury (TBI) is an important public health issue and a leading cause of death and disability worldwide. In this study, we adapt CPXR (Contrast Pattern Aided Regression, a recently introduced regression method) to develop a new logistic regression method called CPXR(Log), for general binary outcome prediction (including prognostic modeling), and we use the method to carry out prognostic modeling for TBI using admission time data. The models produced by CPXR(Log) achieved AUC as high as 0.93 and specificity as high as 0.97, much better than those reported by previous studies. Our method produced interpretable prediction models for diverse patient groups for TBI, which show that different kinds of patients should be evaluated differently for TBI outcome prediction and that the odds ratios of some predictor variables differ significantly from those given by previous studies; such results can be valuable to physicians.
Brief History of Visual Representation Learning - Sangwoo Mo
The document summarizes the history of visual representation learning in 3 eras: (1) 2012-2015 saw the evolution of deep learning architectures like AlexNet and ResNet; (2) 2016-2019 brought diverse learning paradigms for tasks like few-shot learning and self-supervised learning; (3) 2020-present focuses on scaling laws and foundation models through larger models, data and compute as well as self-supervised methods like MAE and multimodal models like CLIP. The field is now exploring how to scale up vision transformers to match natural language models and better combine self-supervision and generative models.
Learning Visual Representations from Uncurated Data - Sangwoo Mo
Slides from the defense of my Ph.D. dissertation: "Learning Visual Representations from Uncurated Data"
It includes four papers about
- Learning from multi-object images for contrastive learning [1] and Vision Transformer (ViT) [2]
- Learning with limited labels (semi-sup) for image classification [3] and vision-language [4] models
[1] Mo*, Kang* et al. Object-aware Contrastive Learning for Debiased Scene Representation. NeurIPS’21.
[2] Kang*, Mo* et al. OAMixer: Object-aware Mixing Layer for Vision Transformers. CVPRW’22.
[3] Mo et al. RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data. ICLR’23.
[4] Mo et al. S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions. Under Review.
This document proposes using hyperbolic space to embed hierarchical tree structures, like those that can represent sequences of events in reinforcement learning problems. Specifically, it suggests a method called S-RYM that applies spectral normalization to regularize gradients when training deep reinforcement learning agents with hyperbolic embeddings. This stabilization technique allows naive hyperbolic embeddings to outperform standard Euclidean embeddings. It works by reducing gradient norm explosions during training, allowing the entropy loss to converge properly. The document provides technical details on spectral normalization, hyperbolic space representations, and how S-RYM trains deep reinforcement learning agents with stabilized hyperbolic embeddings.
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model... - Sangwoo Mo
Lab seminar introducing three recent works by Ting Chen:
- Pix2seq: A Language Modeling Framework for Object Detection (ICLR’22)
- A Unified Sequence Interface for Vision Tasks (NeurIPS’22)
- A Generalist Framework for Panoptic Segmentation of Images and Videos (submitted to ICLR’23)
This document is a slide presentation on recent advances in deep learning. It discusses self-supervised learning, which involves using unlabeled data to learn representations by predicting structural information within the data. The presentation covers pretext tasks, invariance-based approaches, and generation-based approaches for self-supervised learning in computer vision and natural language processing. It provides examples of specific self-supervised methods like predicting image rotations, clustering representations to generate pseudo-labels, and masked language modeling.
Deep Learning Theory Seminar (Chap 3, part 2) - Sangwoo Mo
This document summarizes key points from a lecture on deep learning theory:
1) It discusses the Maurey sampling technique, which shows that a finite sample approximation X^ of a random variable X converges to X as the number of samples k goes to infinity.
2) It proposes extending this technique to sample finite-width neural networks by converting the weight distribution of an infinite network to a probability measure through normalization.
3) The approximation error between outputs of the infinite and finite networks is bounded using Maurey sampling, with the bound converging to zero as the number of samples increases.
Deep Learning Theory Seminar (Chap 1-2, part 1) - Sangwoo Mo
1. The document discusses the approximation capabilities of deep neural networks. It outlines topics that will be covered, including approximation, optimization, and generalization.
2. For approximation, it shows that a neural network can approximate any smooth function over a compact domain to any desired accuracy by bounding the function norm. Specifically, it presents constructive proofs that a univariate function can be approximated by a 2-layer network and a multivariate function by a 3-layer network.
3. The chapter will prove approximation capabilities of finite-width neural networks, including constructive proofs for specific activations and universal approximation for general activations. It will discuss approximating indicators with ReLU activations.
The document provides an introduction to diffusion models. It discusses that diffusion models have achieved state-of-the-art performance in image generation, density estimation, and image editing. Specifically, it covers the Denoising Diffusion Probabilistic Model (DDPM) which reparametrizes the reverse distributions of diffusion models to be more efficient. It also discusses the Denoising Diffusion Implicit Model (DDIM) which generates rough sketches of images and then refines them, significantly reducing the number of sampling steps needed compared to DDPM. In summary, diffusion models have emerged as a highly effective approach for generative modeling tasks.
1) The document discusses object-region video transformers (ORViT) for video recognition. ORViT applies attention at both the patch and object levels.
2) ORViT considers three aspects of objects: the objects themselves, interactions between objects, and object dynamics over time.
3) Experimental results show ORViT outperforms baseline models on action recognition, compositional action recognition, and spatio-temporal action detection tasks. ORViT better captures object-level information and dynamics compared to patch-level attention alone.
Deep Implicit Layers: Learning Structured Problems with Neural Networks - Sangwoo Mo
Deep implicit layers allow neural networks to solve structured problems by following algorithmic rules. They include layers for convex optimization, discrete optimization, differential equations, and more. The forward pass runs an algorithm, while the backward pass computes gradients using algorithmic properties like KKT conditions. This enables problems like structured prediction, meta-learning, and time series modeling to be solved reliably with neural networks by respecting their underlying structure.
Learning Theory 101 ...and Towards Learning the Flat Minima - Sangwoo Mo
The document discusses recent theories on why deep neural networks generalize well despite being highly overparameterized. Classic learning theory, which assumes restricting the hypothesis space is necessary for generalization, fails to explain modern neural networks. Recent studies suggest neural networks generalize because 1) their complexity is underestimated and 2) SGD regularization finds flat minima. Sharpness-aware minimization (SAM) directly optimizes for flat minima and consistently improves generalization, especially for vision transformers which have sharper loss landscapes than ResNets. SAM produces more interpretable attention maps and significantly boosts performance of vision transformers and MLP-Mixers on in-domain and out-of-domain tasks.
Lab seminar on
- Sharpness-Aware Minimization for Efficiently Improving Generalization (ICLR 2021)
- When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations (under review)
This document summarizes recent advances in deep generative models with explicit density estimation. It discusses variational autoencoders (VAEs), including techniques to improve VAEs such as importance weighting, semi-amortized inference, and mitigating posterior collapse. It also covers energy-based models, autoregressive models, flow-based models, vector-quantized VAEs, hierarchical VAEs, and diffusion probabilistic models. The document provides an overview of these generative models with a focus on density estimation and generation quality.
This document summarizes research on reducing the computational complexity of self-attention in Transformer models from O(L2) to O(L log L) or O(L). It describes the Reformer model which uses locality-sensitive hashing to achieve O(L log L) complexity, the Linformer model which uses low-rank approximations and random projections to achieve O(L) complexity, and the Synthesizer model which replaces self-attention with dense or random attention. It also briefly discusses the expressive power of sparse Transformer models.
This document summarizes two meta-learning papers:
1) "Meta-Learning with Implicit Gradients" which introduces Implicit Model-Agnostic Meta-Learning (iMAML), an efficient alternative to MAML that computes meta-gradients without differentiating through the inner loop.
2) "Modular Meta-Learning with Shrinkage" which proposes learning a separate set of parameters for each module with different levels of shrinkage, optimized in an alternating manner to avoid collapse.
Introduction (application) of generative models for general audiences. Many figures are borrowed from https://lilianweng.github.io.
Deep Learning for Natural Language Processing - Sangwoo Mo
This document summarizes a lecture on recent advances in deep learning for natural language processing. It discusses improvements to network architectures like attention mechanisms and self-attention, which help models learn long-term dependencies and attend to relevant parts of the input. It also discusses improved training methods to reduce exposure bias and the loss-evaluation mismatch. Newer models presented include the Transformer, which uses only self-attention, and BERT, which introduces a pretrained bidirectional transformer encoder that achieves state-of-the-art results on many NLP tasks.
This document discusses domain transfer and domain adaptation in deep learning. It begins with introductions to domain transfer, which learns a mapping between domains, and domain adaptation, which learns a mapping between domains with labels. It then covers several approaches for domain transfer, including neural style transfer, instance normalization, and GAN-based methods. It also discusses general approaches for domain adaptation such as source/target feature matching and target data augmentation.
Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
1. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations (ICML 2019 Best Paper)
2019.07.17.
Sangwoo Mo
2. Outline
• Quick Review
• What is disentangled representation (DR)?
• Prior work on the unsupervised learning of DR
• Theoretical Results
• Unsupervised learning of DR is impossible without inductive biases
• Empirical Results
• Q1. Which method should be used?
• Q2. How to choose the hyperparameters?
• Q3. How to select the best model from a set of trained models?
3. Quick Review
• Disentangled representation: Learn a representation 𝑧 from the data 𝑥 s.t.
• Contain all the information of 𝑥 in a compact and interpretable structure
• Currently no single formal definition (many definitions for the factor of variation)
* Image from BetaVAE (ICLR 2017)
4. Quick Review: Prior Methods
• BetaVAE (ICLR 2017)
• Use 𝛽 > 1 in the VAE objective (forces the posterior toward the factorized Gaussian prior)
5. Quick Review: Prior Methods
• BetaVAE (ICLR 2017)
• Use 𝛽 > 1 in the VAE objective (forces the posterior toward the factorized Gaussian prior)
• FactorVAE (ICML 2018) & 𝜷-TCVAE (NeurIPS 2018)
• Penalize the total correlation of the representation, which is estimated¹ by adversarial learning (FactorVAE) or a (biased) mini-batch approximation (𝛽-TCVAE)
¹ This requires the aggregated posterior 𝑞(𝒛)
6. Quick Review: Prior Methods
• BetaVAE (ICLR 2017)
• Use 𝛽 > 1 in the VAE objective (forces the posterior toward the factorized Gaussian prior)
• FactorVAE (ICML 2018) & 𝜷-TCVAE (NeurIPS 2018)
• Penalize the total correlation of the representation, which is estimated¹ by adversarial learning (FactorVAE) or a (biased) mini-batch approximation (𝛽-TCVAE)
• DIP-VAE (ICLR 2018)
• Match 𝑞(𝒛) to the disentangled prior 𝑝(𝒛), where 𝐷 is a (tractable) moment-matching divergence
¹ This requires the aggregated posterior 𝑞(𝒛)
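For reference, the three objectives just reviewed can be summarized in one place (standard formulations written in our notation; all modify the VAE evidence lower bound):

```latex
\mathcal{L}_{\beta\text{-VAE}} = \mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x})}[\log p_\theta(\mathbf{x}\mid\mathbf{z})]
  - \beta\, D_{\mathrm{KL}}\big(q_\phi(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z})\big), \quad \beta > 1
\\[4pt]
\mathcal{L}_{\text{FactorVAE}} = \mathcal{L}_{\text{VAE}} - \gamma\, \mathrm{TC}(\mathbf{z}),
  \qquad \mathrm{TC}(\mathbf{z}) = D_{\mathrm{KL}}\Big(q(\mathbf{z}) \,\Big\|\, \textstyle\prod_j q(z_j)\Big)
\\[4pt]
\mathcal{L}_{\text{DIP-VAE}} = \mathcal{L}_{\text{VAE}} - \lambda\, D\big(q(\mathbf{z}),\, p(\mathbf{z})\big)
```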
7. Quick Review: Evaluation Metrics
• Many heuristics are proposed to quantitatively evaluate the disentanglement
• Basic idea: Factors and representation should have 1-1 correspondence
8. Quick Review: Evaluation Metrics
• Many heuristics are proposed to quantitatively evaluate the disentanglement
• Basic idea: Factors and representation should have 1-1 correspondence
• BetaVAE (ICLR 2017) & FactorVAE (ICML 2018) metric
• Given a factor $c_k$, generate two (simulated) data points $x, x'$ with the same $c_k$ but different $c_{-k}$, then train a classifier to predict $c_k$ from the difference of the representations $|z - z'|$
• Indeed, the classifier will map the zero-valued index of $|z - z'|$ to the factor $c_k$
9. Quick Review: Evaluation Metrics
• Many heuristics are proposed to quantitatively evaluate the disentanglement
• Basic idea: Factors and representation should have 1-1 correspondence
• BetaVAE (ICLR 2017) & FactorVAE (ICML 2018) metric
• Given a factor $c_k$, generate two (simulated) data points $x, x'$ with the same $c_k$ but different $c_{-k}$, then train a classifier to predict $c_k$ from the difference of the representations $|z - z'|$
• Indeed, the classifier will map the zero-valued index of $|z - z'|$ to the factor $c_k$
• Mutual Information Gap (NeurIPS 2018)
• Compute the mutual information between each factor $c_k$ and each dimension $z_j$
• For the dimensions $i_1$ and $i_2$ with the highest and second-highest mutual information, measure the gap between them: $I(c_k, z_{i_1}) - I(c_k, z_{i_2})$
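A small sketch of how this gap can be turned into a score, assuming a precomputed matrix `mi[k, j]` of mutual information between factor $c_k$ and latent dimension $z_j$ (how the MI itself is estimated is left out; normalizing by the factor entropy follows the original paper):

```python
import numpy as np

def mutual_information_gap(mi, factor_entropy):
    """mi: (num_factors, num_latents) MI matrix; factor_entropy: (num_factors,) H(c_k)."""
    top2 = np.sort(mi, axis=1)[:, ::-1][:, :2]            # highest and second-highest MI per factor
    gaps = (top2[:, 0] - top2[:, 1]) / factor_entropy      # normalized gap per factor
    return gaps.mean()

# Toy example: 3 factors, 5 latent dimensions.
mi = np.array([[0.9, 0.1, 0.0, 0.0, 0.1],
               [0.0, 0.8, 0.2, 0.1, 0.0],
               [0.1, 0.1, 0.1, 0.7, 0.6]])
print(mutual_information_gap(mi, factor_entropy=np.ones(3)))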
10. Theoretical Results
• “Unsupervised learning of disentangled representations is fundamentally impossible
without inductive biases on both the models and the data”
11. Theoretical Results
• “Unsupervised learning of disentangled representations is fundamentally impossible
without inductive biases on both the models and the data”
• Theorem. For $p(\mathbf{z}) = \prod_{i=1}^{d} p(z_i)$, there exists an infinite family of bijective functions $f$ s.t.
• $\mathbf{z}$ and $f(\mathbf{z})$ are completely entangled (i.e., $\partial f_i(\mathbf{u}) / \partial u_j \neq 0$ a.e. for all $i, j$)
• $\mathbf{z}$ and $f(\mathbf{z})$ have the same marginal distribution (i.e., $P(\mathbf{z} \leq \mathbf{u}) = P(f(\mathbf{z}) \leq \mathbf{u})$ for all $\mathbf{u}$)
12. Theoretical Results
• “Unsupervised learning of disentangled representations is fundamentally impossible
without inductive biases on both the models and the data”
• Theorem. For $p(\mathbf{z}) = \prod_{i=1}^{d} p(z_i)$, there exists an infinite family of bijective functions $f$ s.t.
• $\mathbf{z}$ and $f(\mathbf{z})$ are completely entangled (i.e., $\partial f_i(\mathbf{u}) / \partial u_j \neq 0$ a.e. for all $i, j$)
• $\mathbf{z}$ and $f(\mathbf{z})$ have the same marginal distribution (i.e., $P(\mathbf{z} \leq \mathbf{u}) = P(f(\mathbf{z}) \leq \mathbf{u})$ for all $\mathbf{u}$)
• Proof sketch. By construction.
• Let $g: \mathrm{supp}(\mathbf{z}) \to [0,1]^d$ s.t. $g_i(\mathbf{v}) = P(z_i \leq v_i)$
• Let $h: (0,1)^d \to \mathbb{R}^d$ s.t. $h_i(\mathbf{v}) = \psi^{-1}(v_i)$, where $\psi$ is the c.d.f. of a normal distribution
• Then for any orthogonal matrix $\mathbf{A}$, the following $f$ satisfies the conditions:
$f(\mathbf{u}) = (h \circ g)^{-1}(\mathbf{A} \, (h \circ g)(\mathbf{u}))$
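To make the construction tangible, here is a small numerical sketch of our own (not from the slides or the paper), specializing to a standard normal prior in two dimensions, where $h \circ g$ and its inverse reduce to identity maps and $f$ becomes a rotation: the marginals of $f(\mathbf{z})$ match those of $\mathbf{z}$ while every output coordinate depends on every input coordinate.

```python
import numpy as np
from scipy.stats import norm, ks_2samp

rng = np.random.default_rng(0)
n, d = 100_000, 2
z = rng.standard_normal((n, d))                 # p(z) = prod_i N(0, 1)

theta = np.pi / 4                               # orthogonal A with no zero entries (a rotation)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

g, h = norm.cdf, norm.ppf                       # g_i(v) = P(z_i <= v_i),  h_i(v) = psi^{-1}(v_i)
hg_z = h(g(z))                                  # (h o g)(z); equals z for this prior
f_z = hg_z @ A.T                                # f(z) = (h o g)^{-1}(A (h o g)(z));
                                                # the outer (h o g)^{-1} is the identity here

# Same marginals: each coordinate of f(z) is statistically indistinguishable from z's.
print([round(ks_2samp(z[:, i], f_z[:, i]).pvalue, 3) for i in range(d)])

# Completely entangled: every coordinate of f(z) correlates with every coordinate of z.
print(np.round(np.corrcoef(z.T, f_z.T)[:d, d:], 2))
```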
13. Theoretical Results
• “Unsupervised learning of disentangled representations is fundamentally impossible
without inductive biases on both the models and the data”
• Theorem. For $p(\mathbf{z}) = \prod_{i=1}^{d} p(z_i)$, there exists an infinite family of bijective functions $f$ s.t.
• $\mathbf{z}$ and $f(\mathbf{z})$ are completely entangled (i.e., $\partial f_i(\mathbf{u}) / \partial u_j \neq 0$ a.e. for all $i, j$)
• $\mathbf{z}$ and $f(\mathbf{z})$ have the same marginal distribution (i.e., $P(\mathbf{z} \leq \mathbf{u}) = P(f(\mathbf{z}) \leq \mathbf{u})$ for all $\mathbf{u}$)
• Corollary. One cannot identify the disentangled representation $r(\mathbf{x})$ (w.r.t. the generative model $G(\mathbf{x}|\mathbf{z})$), as there are two equivalent generative models $G$ and $G'$ which have the same marginal distribution $p(\mathbf{x})$ but whose $\mathbf{z}' = f(\mathbf{z})$ is completely entangled w.r.t. $\mathbf{z}$ (and so is $r(\mathbf{x})$)
• Namely, inferring the representation $\mathbf{z}$ from the observation $\mathbf{x}$ is not a well-defined problem
14. Theoretical Results
• 𝛽-VAE learns some decorrelated features, but they are not semantically decomposed
• E.g., the width is entangled with the leg style in 𝛽-VAE
* Image from BetaVAE (ICLR 2017)
15. Empirical Results
• Q1. Which method should be used?
• A. Hyperparameters and random seeds matter more than the choice of the model
16. Empirical Results
• Q2. How to choose the hyperparameters?
• A. Selecting the best hyperparameter is extremely hard due to the randomness
17. Empirical Results
• Q2. How to choose the hyperparameters?
• A. Also, there is no obvious trend over the variation of hyperparameters
18. Empirical Results
• Q2. How to choose the hyperparameters?
• A. Good hyperparameters often can be transferred (e.g., dSprites → color-dSprites)
Rank correlation matrix
19. Empirical Results
• Q3. How to select the best model from a set of trained models?
• A. Unsupervised (training) scores do not correlate with the disentanglement metrics
Unsupervised scores vs disentanglement metrics
20. Summary
• TL;DR: Current unsupervised learning of disentangled representation has a limitation!
• Summary of findings:
• Q1. Which method should be used?
• A. Current methods should be rigorously validated (no significant difference)
21. Summary
• TL;DR: Current unsupervised learning of disentangled representation has a limitation!
• Summary of findings:
• Q1. Which method should be used?
• A. Current methods should be rigorously validated (no significant difference)
• Q2. How to choose the hyperparameters?
• A. No rule of thumb, but transfer across datasets seems to help!
22. Summary
• TL;DR: Current unsupervised learning of disentangled representation has a limitation!
• Summary of findings:
• Q1. Which method should be used?
• A. Current methods should be rigorously validated (no significant difference)
• Q2. How to choose the hyperparameters?
• A. No rule of thumb, but transfer across datasets seems to help!
• Q3. How to select the best model from a set of trained models?
• A. (Unsupervised) model selection remains a key challenge!
23. Following Work & Future Direction
• “Disentangling Factors of Variation Using Few Labels”
(ICLR Workshop 2019, NeurIPS 2019 submission)
• Summary of findings: Using a few labels greatly improves disentanglement!
24. Following Work & Future Direction
• “Disentangling Factors of Variation Using Few Labels”
(ICLR Workshop 2019, NeurIPS 2019 submission)
• Summary of findings: Using a few labels greatly improves disentanglement!
1. Existing disentanglement metrics + a few labels perform well for model selection, even though the models are trained in a completely unsupervised manner
25. Following Work & Future Direction
• “Disentangling Factors of Variation Using Few Labels”
(ICLR Workshop 2019, NeurIPS 2019 submission)
• Summary of findings: Using a few labels greatly improves disentanglement!
1. Existing disentanglement metrics + a few labels perform well for model selection, even though the models are trained in a completely unsupervised manner
2. One can obtain even better results by incorporating a few labels into the learning process (using a simple supervised regularizer)
26. Following Work & Future Direction
• “Disentangling Factors of Variation Using Few Labels”
(ICLR Workshop 2019, NeurIPS 2019 submission)
• Summary of findings: Using a few labels greatly improves disentanglement!
1. Existing disentanglement metrics + a few labels perform well for model selection, even though the models are trained in a completely unsupervised manner
2. One can obtain even better results by incorporating a few labels into the learning process (using a simple supervised regularizer)
• Take-home message: Future research should focus on “how to better utilize inductive biases”, e.g., using a few labels, rather than on the previous total-correlation-style approaches