Deep Neural Networks and Tabular Data
Presentation by Jimmy Liang
Introduction
Deep Neural Networks (DNNs) have produced impressive results in recent years in computer vision, audio, video, and natural language processing.
However, on tabular data, which most business processes rely on, DNNs have not matched either the predictive performance or the explainability of 'classical' machine learning models such as gradient-boosted trees. This presentation reviews the current state of using DNNs with tabular data and outlines future directions.
Contents
Introduction
Challenges
Current State
Classification
Regression
Data Generation
Explainability
Future Directions
Challenges
What are some issues with using DNNs with tabular data?
Data quality
Spatial dependencies
Preprocessing
Model sensitivity
Data Quality
Real-world tabular data often contains missing values, outliers, and inconsistent or erroneous entries.
Tabular data is often high-dimensional with relatively small sample sizes.
Tabular data is often expensive and hard to obtain, and datasets are frequently class-imbalanced.
Spatial Correlation
Unlike images, audio, and video, where neighboring pixels or samples provide spatial context, tabular data has no such relationships between its columns.
Researchers have hypothesized that even where spatial correlations between variables do exist in tabular data, they are complex, irregular, and difficult to determine.
Model Sensitivity
Unlike classical machine learning models such as decision trees, DNNs are very sensitive to small changes in the input data.
Tabular data is often highly variable from one sample to the next.
Preprocessing
Tabular data is hard to preprocess. One of the main challenges is converting categorical features into numerical representations without creating very sparse matrices. Another pitfall is inadvertently encoding an order or ranking that comes only from the numbering scheme used.
Some implementations address this issue by encoding categorical features in an embedding space.
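The preprocessing trade-offs above can be seen on a single column. This is a minimal sketch assuming a hypothetical five-value `color` column: ordinal encoding implies an order that does not exist, one-hot encoding avoids that at the cost of sparsity, and an embedding maps each category to a short dense vector. The embedding values here are random placeholders for weights a network would learn during training.

```python
import random

# Hypothetical categorical column with five distinct values.
COLORS = ["red", "green", "blue", "yellow", "purple"]

# Ordinal encoding: compact, but implies an order that does not exist
# (e.g. "purple" (4) looks "greater than" "red" (0)).
ORDINAL = {color: i for i, color in enumerate(COLORS)}


def one_hot(value: str) -> list[int]:
    """One column per category; exactly one entry is 1, the rest are 0."""
    return [1 if color == value else 0 for color in COLORS]


def sparsity(vector: list[float]) -> float:
    """Fraction of entries that are zero."""
    return sum(v == 0 for v in vector) / len(vector)


# Embedding: each category maps to a short dense vector. Random values stand
# in for weights that a network would learn during training.
EMBED_DIM = 2
_rng = random.Random(0)
EMBEDDING = {c: [_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for c in COLORS}
```

Here `one_hot("blue")` gives `[0, 0, 1, 0, 0]` with a sparsity of 0.8; with thousands of categories the one-hot vector is almost entirely zeros, while the embedding stays at `EMBED_DIM` columns regardless of cardinality.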
Current State
Now that we have discussed the challenges, what approaches can overcome them? We go over:
Data Transformation
Hybrid Models
Transformers
Regularization
Methods
Data Transformation
Transform the tabular data with various techniques to overcome issues with categorical variables.
Hybrid Models
Combine deep neural networks with classical machine learning techniques such as decision trees.
Transformers
Building on the success of transformers in natural language processing, apply the attention mechanism to tabular data.
Regularization
Based on the hypothesis that strong regularization helps counter the model sensitivity caused by the extreme flexibility of deep learning models.
Data Transformation
Single-dimensional
Deterministic and can be applied before training. Can be as simple as ordinal encoding, binary encoding, leave-one-out encoding, or hash-based encoding.
Multi-dimensional
Uses self-supervised or semi-supervised techniques to encode categorical values into a dense embedding space.
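The single-dimensional encodings listed above can each be written in a few lines. This is a minimal sketch, not any particular library's implementation: the function names are hypothetical, leave-one-out encoding needs the training targets, and the fallback to the global mean for a category seen only once is a choice made for this sketch.

```python
import hashlib


def binary_encode(values: list[str]) -> list[list[int]]:
    """Ordinal index written in base 2: ceil(log2(k)) columns for k categories."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    width = max(1, (len(categories) - 1).bit_length())
    return [[(index[v] >> bit) & 1 for bit in reversed(range(width))] for v in values]


def leave_one_out_encode(values: list[str], targets: list[float]) -> list[float]:
    """Mean target of the *other* rows in the same category.

    Excluding the row's own target limits target leakage. A category seen
    only once falls back to the global mean (an assumption of this sketch).
    """
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for v, y in zip(values, targets):
        sums[v] = sums.get(v, 0.0) + y
        counts[v] = counts.get(v, 0) + 1
    global_mean = sum(targets) / len(targets)
    encoded = []
    for v, y in zip(values, targets):
        others = counts[v] - 1
        encoded.append((sums[v] - y) / others if others > 0 else global_mean)
    return encoded


def hash_encode(value: str, n_buckets: int = 8) -> list[int]:
    """Map a category to one of n_buckets columns via a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    digest is used instead. Distinct categories may collide in one bucket.
    """
    bucket = int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % n_buckets
    return [1 if i == bucket else 0 for i in range(n_buckets)]
```

For example, `binary_encode(["a", "b", "c", "d"])` yields `[[0, 0], [0, 1], [1, 0], [1, 1]]`: four categories in two columns instead of four one-hot columns. Hashing fixes the width in advance, which helps with unseen or very high-cardinality categories at the cost of collisions.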
Hybrid Models
Fully Differentiable
Permits end-to-end optimization using gradient descent. Highly efficient on GPUs.
Partly Differentiable
Combines non-differentiable models such as decision trees with deep neural networks, for example by using different models to handle numerical and categorical features.
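One way to make a tree-like model fully differentiable is to replace the hard split with a soft one. The sketch below is a swapped-in illustration of that general idea (soft decision trees), not a specific published architecture: a single split whose "go right" probability is a sigmoid, so the threshold and both leaf values are learned by plain gradient descent. The class name, temperature, and toy data are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class SoftStump:
    """One decision split with a sigmoid in place of the hard threshold.

    A hard split routes x with a step function, whose gradient is zero almost
    everywhere. Here p(right) = sigmoid((x - threshold) / temperature), so the
    threshold and both leaf values receive gradients.
    """

    def __init__(self, threshold: float, temperature: float = 0.5):
        self.threshold = threshold
        self.temperature = temperature
        self.left = 0.0   # leaf value for small x
        self.right = 0.0  # leaf value for large x

    def _p_right(self, x: float) -> float:
        return sigmoid((x - self.threshold) / self.temperature)

    def predict(self, x: float) -> float:
        p = self._p_right(x)
        return (1.0 - p) * self.left + p * self.right

    def step(self, xs: list[float], ys: list[float], lr: float) -> None:
        """One full-batch gradient step on (mean squared error) / 2."""
        g_left = g_right = g_threshold = 0.0
        for x, y in zip(xs, ys):
            p = self._p_right(x)
            err = (1.0 - p) * self.left + p * self.right - y
            g_left += err * (1.0 - p)
            g_right += err * p
            # d(pred)/d(threshold) = (right - left) * (-p * (1 - p) / temperature)
            g_threshold += err * (self.right - self.left) * (-p * (1.0 - p) / self.temperature)
        n = len(xs)
        self.left -= lr * g_left / n
        self.right -= lr * g_right / n
        self.threshold -= lr * g_threshold / n


# Hypothetical toy data: the target switches from 0 to 1 between x=4 and x=5.
xs = [float(x) for x in range(10)]
ys = [0.0 if x < 5 else 1.0 for x in xs]

stump = SoftStump(threshold=2.0)
for _ in range(5000):
    stump.step(xs, ys, lr=0.5)
```

Starting from a poor threshold of 2.0, gradient descent moves the split to roughly 4.5, between the last 0 and the first 1. Because every parameter is differentiable, many such splits could be stacked and trained jointly with a neural network.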
Transformers
Lots of Research
There is a lot of recent and active research in this area: TabNet, TabTransformer, ARM-Net, SAINT, and others. Depending on the model, they use multiple subnetworks and self-attention to handle categorical features, and incorporate techniques such as decision trees, k-nearest neighbors, and feature crosses.
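The core operation these models share is self-attention across the columns of a row. This minimal sketch is in the spirit of TabTransformer, which contextualizes categorical column embeddings with attention, but it is not that model: it uses a single head, hypothetical columns, random embedding tables, and no learned query/key/value projections.

```python
import math
import random


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Single-head scaled dot-product self-attention over column tokens.

    To keep the sketch short, queries, keys, and values are the tokens
    themselves; real models apply learned projection matrices first.
    Returns (contextual embeddings, attention weights).
    """
    d = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(d)]
        for w in weights
    ]
    return outputs, weights


# Hypothetical categorical columns, each with its own embedding table.
random.seed(0)
DIM = 4
COLUMNS = {
    "color": ["red", "green", "blue"],
    "size": ["S", "M", "L"],
    "region": ["north", "south"],
}
TABLES = {
    col: {value: [random.gauss(0.0, 1.0) for _ in range(DIM)] for value in values}
    for col, values in COLUMNS.items()
}

row = {"color": "green", "size": "L", "region": "south"}
tokens = [TABLES[col][row[col]] for col in COLUMNS]  # one token per column
contextual, attention = self_attention(tokens)
```

Each output vector is a weighted mix of all column embeddings in the row, so the representation of `color` can depend on the row's `size` and `region`. This is how attention can capture interactions between columns without assuming any spatial order among them.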
Regularization
Regularization Learning Networks
Apply trainable regularization coefficients (one per weight) to lower the overall model sensitivity.
Regularization Cocktail
Apply multiple regularization techniques together. A 2021 paper searched for the best combination of 13 regularization techniques for each dataset, and the resulting regularized networks outperformed tree-based models on its benchmark.
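A regularization cocktail is essentially a configuration choosing which regularizers are switched on and how strongly. The 2021 work searched over 13 techniques for multilayer perceptrons; this toy sketch only uses four familiar ones (L2 weight decay, L1, input dropout, gradient clipping) on a linear model trained with SGD. The function name, config keys, and data are hypothetical.

```python
import random


def train_linear(xs, ys, cocktail, epochs=200, lr=0.05, seed=0):
    """Fit y ~ w.x + b with per-sample SGD and a configurable regularizer mix.

    cocktail keys (hypothetical names for this sketch; missing keys disable):
      "l2":      weight-decay strength
      "l1":      L1 (lasso) strength
      "dropout": probability of zeroing each input feature during training
      "clip":    maximum absolute value of each gradient component
    """
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    b = 0.0
    l2 = cocktail.get("l2", 0.0)
    l1 = cocktail.get("l1", 0.0)
    p_drop = cocktail.get("dropout", 0.0)
    clip = cocktail.get("clip")
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if p_drop > 0:
                # Inverted dropout: zero features with prob p_drop, rescale the rest.
                x = [xi / (1 - p_drop) if rng.random() >= p_drop else 0.0 for xi in x]
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            grads = [
                err * xi + l2 * wi + l1 * ((wi > 0) - (wi < 0))
                for wi, xi in zip(w, x)
            ]
            grad_b = err  # the bias is left unregularized
            if clip is not None:
                grads = [max(-clip, min(clip, g)) for g in grads]
                grad_b = max(-clip, min(clip, grad_b))
            w = [wi - lr * g for wi, g in zip(w, grads)]
            b -= lr * grad_b
    return w, b


# Hypothetical toy data: y = 2 * x0; x1 carries no signal.
XS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
YS = [2.0 * x[0] for x in XS]

plain_w, plain_b = train_linear(XS, YS, {})
cocktail_w, cocktail_b = train_linear(
    XS, YS, {"l2": 0.5, "l1": 0.01, "dropout": 0.2, "clip": 1.0}
)
```

Without regularization the fit recovers `w ≈ [2, 0]`; the cocktail shrinks the weights. Treating the mix as a config makes it easy to search over combinations per dataset, which is the idea the slide describes.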
Data Generation
Why
Tabular data is difficult and expensive to obtain, and training data is usually limited. Generating data helps with:
Data augmentation and imputation (filling in missing values)
Rebalancing imbalanced classes
Ensuring privacy
How
Generative Adversarial Networks (GANs), including domain-specific models such as MedGAN.
Variational Autoencoders (VAEs): various VAEs can outperform GANs, but both are considered state of the art.
Another approach uses statistical methods to generate data based on the original data's distribution.
Quality
How is the quality of generated data assessed? Typically with a proxy classification task: a model is trained on the generated tabular data, and its predictions on real data indicate the quality of the generated data.
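The proxy-task quality check described above is often called "train on synthetic, test on real". In this minimal sketch, a crude statistical generator (independent per-class, per-feature Gaussians) stands in for a GAN or VAE, and a nearest-centroid classifier serves as the proxy model. The data, generator, and proxy are all hypothetical choices made for illustration.

```python
import random
import statistics


def fit_gaussian_generator(rows, labels):
    """Per-class, per-feature (mean, stdev): a crude statistical generator."""
    params = {}
    for c in sorted(set(labels)):
        cls_rows = [r for r, y in zip(rows, labels) if y == c]
        params[c] = [(statistics.mean(col), statistics.stdev(col)) for col in zip(*cls_rows)]
    return params


def sample(params, n_per_class, rng):
    """Draw synthetic rows feature by feature, ignoring correlations."""
    rows, labels = [], []
    for c, feats in params.items():
        for _ in range(n_per_class):
            rows.append([rng.gauss(mu, sd) for mu, sd in feats])
            labels.append(c)
    return rows, labels


def fit_nearest_centroid(rows, labels):
    """Proxy classifier: one mean vector per class."""
    return {
        c: [statistics.mean(col) for col in zip(*[r for r, y in zip(rows, labels) if y == c])]
        for c in sorted(set(labels))
    }


def accuracy(centroids, rows, labels):
    def predict(row):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


rng = random.Random(42)
# Hypothetical "real" data: two classes centered at (0, 0) and (4, 4).
real_rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
real_rows += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(100)]
real_labels = [0] * 100 + [1] * 100
train_rows, train_labels = real_rows[::2], real_labels[::2]
test_rows, test_labels = real_rows[1::2], real_labels[1::2]

# Generate from the training split, train the proxy on synthetic rows only,
# then score it on held-out real rows.
generator = fit_gaussian_generator(train_rows, train_labels)
synth_rows, synth_labels = sample(generator, 200, rng)
proxy = fit_nearest_centroid(synth_rows, synth_labels)
tstr_accuracy = accuracy(proxy, test_rows, test_labels)
```

A high `tstr_accuracy` suggests the synthetic rows preserve the class structure of the real data. In practice it is usually compared with the accuracy of the same proxy trained on real data.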
Explainability
The ability to understand what a prediction is based on is hugely important in the real world.
Feature Highlighting
Build models that are explainable by design. Where the model parameters are not available, use surrogate models or a benchmarking library.
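When only a model's predictions are available, a surrogate model can approximate it with something readable. This minimal sketch fits a global surrogate in its simplest form: a single decision stump chosen to best reproduce the black box's outputs, where "fidelity" is the fraction of rows on which they agree. The black-box model, feature names, and data are hypothetical, and the slide does not specify which surrogate it has in mind.

```python
import random

FEATURES = ["income", "age", "tenure"]  # hypothetical column names


def black_box(row):
    """Stand-in for an opaque model we can query but not inspect."""
    income, _age, tenure = row
    return 1 if 0.8 * income + 0.1 * tenure > 50 else 0


def fit_stump_surrogate(rows, targets):
    """Return the (feature index, threshold, fidelity) split that best
    reproduces the black box's outputs with the rule `row[f] > t -> 1`."""
    best = (0, 0.0, -1.0)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            agree = sum((r[f] > t) == bool(y) for r, y in zip(rows, targets))
            fidelity = agree / len(rows)
            if fidelity > best[2]:
                best = (f, t, fidelity)
    return best


rng = random.Random(7)
rows = [[rng.uniform(0, 100), rng.uniform(18, 80), rng.uniform(0, 30)] for _ in range(300)]
targets = [black_box(r) for r in rows]  # label the data with the black box itself
feature, threshold, fidelity = fit_stump_surrogate(rows, targets)
print(f"{FEATURES[feature]} > {threshold:.1f} reproduces the model on {fidelity:.0%} of rows")
```

The surrogate highlights `income` as the feature driving the black box's decisions. A surrogate only explains the model as well as its fidelity allows, so the fidelity should be reported alongside the explanation.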
Future Directions
Deep learning with tabular data is an actively researched topic. Tabular data is the most widely used type of data in business and, as such, has the potential to produce the most impact. Some of the trends and future directions:
Data Preprocessing
Continue transforming data into homogeneous representations, such as embeddings.
Architecture
Transformers have taken the lead, offering multiple advantages such as attention over both categorical and numerical features.
Regularization
Combining regularization techniques has been shown to help even a vanilla feed-forward network.
Data Generation
Generation is difficult for tabular data because the space of possible rows is effectively infinite. More research is needed in this area.
Explainability
Explainable AI is the foundation for ensuring equity. DNNs need to do more in this area to match classical techniques such as decision trees.
Thank You!
Ad

More Related Content

What's hot (20)

Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
Ding Li
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
Appsilon Data Science
 
Deep learning for medical imaging
Deep learning for medical imagingDeep learning for medical imaging
Deep learning for medical imaging
geetachauhan
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
Dr. Stylianos Kampakis
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniques
Venkata Reddy Konasani
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
bhavesh_physics
 
Tabnet presentation
Tabnet presentationTabnet presentation
Tabnet presentation
Sebastien Fischman
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Semantic Web Company
 
Intro to Deep Learning for Computer Vision
Intro to Deep Learning for Computer VisionIntro to Deep Learning for Computer Vision
Intro to Deep Learning for Computer Vision
Christoph Körner
 
Zero shot learning
Zero shot learning Zero shot learning
Zero shot learning
Kishor Datta Gupta
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
Sri Ambati
 
Uncertainty in Deep Learning
Uncertainty in Deep LearningUncertainty in Deep Learning
Uncertainty in Deep Learning
Roberto Pereira Silveira
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
Jeff Z. Pan
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
Zilliz
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
Suman Debnath
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
HJ van Veen
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
shaurya uppal
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
SylvainGugger
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
Jim Steele
 
Introduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningIntroduction to Interpretable Machine Learning
Introduction to Interpretable Machine Learning
Nguyen Giang
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
Ding Li
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
Appsilon Data Science
 
Deep learning for medical imaging
Deep learning for medical imagingDeep learning for medical imaging
Deep learning for medical imaging
geetachauhan
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniques
Venkata Reddy Konasani
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
bhavesh_physics
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Semantic Web Company
 
Intro to Deep Learning for Computer Vision
Intro to Deep Learning for Computer VisionIntro to Deep Learning for Computer Vision
Intro to Deep Learning for Computer Vision
Christoph Körner
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
Sri Ambati
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
Jeff Z. Pan
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
Zilliz
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
Suman Debnath
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
HJ van Veen
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
shaurya uppal
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
SylvainGugger
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
Jim Steele
 
Introduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningIntroduction to Interpretable Machine Learning
Introduction to Interpretable Machine Learning
Nguyen Giang
 

Similar to Deep neural networks and tabular data (20)

[Cryptica 22] Deep Learning on Tabular Data, Predicting Profitability - Peiyu...
[Cryptica 22] Deep Learning on Tabular Data, Predicting Profitability - Peiyu...[Cryptica 22] Deep Learning on Tabular Data, Predicting Profitability - Peiyu...
[Cryptica 22] Deep Learning on Tabular Data, Predicting Profitability - Peiyu...
DataScienceConferenc1
 
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a ...
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a ...TabPFN: A Transformer That Solves Small Tabular Classification Problems in a ...
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a ...
pns00911
 
deeplearning
deeplearningdeeplearning
deeplearning
huda2018
 
introduction to deeplearning
introduction to deeplearningintroduction to deeplearning
introduction to deeplearning
Eyad Alshami
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systems
Xavier Amatriain
 
Productionizing dl from the ground up
Productionizing dl from the ground upProductionizing dl from the ground up
Productionizing dl from the ground up
Adam Gibson
 
Multimedia data mining using deep learning
Multimedia data mining using deep learningMultimedia data mining using deep learning
Multimedia data mining using deep learning
Peter Wlodarczak
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
Philip Zheng
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
Jeremy Nixon
 
AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...
AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...
AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...
GeeksLab Odessa
 
Intro to Deep Learning
Intro to Deep LearningIntro to Deep Learning
Intro to Deep Learning
Kushal Arora
 
MLIP - Chapter 3 - Introduction to deep learning
MLIP - Chapter 3 - Introduction to deep learningMLIP - Chapter 3 - Introduction to deep learning
MLIP - Chapter 3 - Introduction to deep learning
Charles Deledalle
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspective
Anirban Santara
 
Robotics: Current Topics
Robotics: Current TopicsRobotics: Current Topics
Robotics: Current Topics
Sabbir Ahmmed
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye view
Roelof Pieters
 
Transformers in 2021
Transformers in 2021Transformers in 2021
Transformers in 2021
Grigory Sapunov
 
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Michael Batavia
 
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Lviv Startup Club
 
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
GeeksLab Odessa
 
Yann le cun
Yann le cunYann le cun
Yann le cun
Yandex
 
[Cryptica 22] Deep Learning on Tabular Data, Predicting Profitability - Peiyu...
[Cryptica 22] Deep Learning on Tabular Data, Predicting Profitability - Peiyu...[Cryptica 22] Deep Learning on Tabular Data, Predicting Profitability - Peiyu...
[Cryptica 22] Deep Learning on Tabular Data, Predicting Profitability - Peiyu...
DataScienceConferenc1
 
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a ...
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a ...TabPFN: A Transformer That Solves Small Tabular Classification Problems in a ...
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a ...
pns00911
 
deeplearning
deeplearningdeeplearning
deeplearning
huda2018
 
introduction to deeplearning
introduction to deeplearningintroduction to deeplearning
introduction to deeplearning
Eyad Alshami
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systems
Xavier Amatriain
 
Productionizing dl from the ground up
Productionizing dl from the ground upProductionizing dl from the ground up
Productionizing dl from the ground up
Adam Gibson
 
Multimedia data mining using deep learning
Multimedia data mining using deep learningMultimedia data mining using deep learning
Multimedia data mining using deep learning
Peter Wlodarczak
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
Philip Zheng
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
Jeremy Nixon
 
AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...
AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...
AI&BigData Lab 2016. Артем Чернодуб: Обучение глубоких, очень глубоких и реку...
GeeksLab Odessa
 
Intro to Deep Learning
Intro to Deep LearningIntro to Deep Learning
Intro to Deep Learning
Kushal Arora
 
MLIP - Chapter 3 - Introduction to deep learning
MLIP - Chapter 3 - Introduction to deep learningMLIP - Chapter 3 - Introduction to deep learning
MLIP - Chapter 3 - Introduction to deep learning
Charles Deledalle
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspective
Anirban Santara
 
Robotics: Current Topics
Robotics: Current TopicsRobotics: Current Topics
Robotics: Current Topics
Sabbir Ahmmed
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye view
Roelof Pieters
 
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Michael Batavia
 
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Lviv Startup Club
 
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
GeeksLab Odessa
 
Yann le cun
Yann le cunYann le cun
Yann le cun
Yandex
 
Ad

Recently uploaded (20)

RFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdfRFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdf
EnCStore Private Limited
 
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Building Connected Agents:  An Overview of Google's ADK and A2A ProtocolBuilding Connected Agents:  An Overview of Google's ADK and A2A Protocol
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Suresh Peiris
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 
Right to liberty and security of a person.pdf
Right to liberty and security of a person.pdfRight to liberty and security of a person.pdf
Right to liberty and security of a person.pdf
danielbraico197
 
Dark Dynamism: drones, dark factories and deurbanization
Dark Dynamism: drones, dark factories and deurbanizationDark Dynamism: drones, dark factories and deurbanization
Dark Dynamism: drones, dark factories and deurbanization
Jakub Šimek
 
Understanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdfUnderstanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdf
Fulcrum Concepts, LLC
 
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
UXPA Boston
 
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
UXPA Boston
 
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptxIn-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
aptyai
 
Best 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat PlatformsBest 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat Platforms
Soulmaite
 
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
SOFTTECHHUB
 
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
UXPA Boston
 
May Patch Tuesday
May Patch TuesdayMay Patch Tuesday
May Patch Tuesday
Ivanti
 
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UXPA Boston
 
Middle East and Africa Cybersecurity Market Trends and Growth Analysis
Middle East and Africa Cybersecurity Market Trends and Growth Analysis Middle East and Africa Cybersecurity Market Trends and Growth Analysis
Middle East and Africa Cybersecurity Market Trends and Growth Analysis
Preeti Jha
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Alan Dix
 
Breaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP DevelopersBreaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP Developers
pmeth1
 
SQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptxSQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptx
Scott Keck-Warren
 
Building a research repository that works by Clare Cady
Building a research repository that works by Clare CadyBuilding a research repository that works by Clare Cady
Building a research repository that works by Clare Cady
UXPA Boston
 
RFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdfRFID in Supply chain management and logistics.pdf
RFID in Supply chain management and logistics.pdf
EnCStore Private Limited
 
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Building Connected Agents:  An Overview of Google's ADK and A2A ProtocolBuilding Connected Agents:  An Overview of Google's ADK and A2A Protocol
Building Connected Agents: An Overview of Google's ADK and A2A Protocol
Suresh Peiris
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 
Right to liberty and security of a person.pdf
Right to liberty and security of a person.pdfRight to liberty and security of a person.pdf
Right to liberty and security of a person.pdf
danielbraico197
 
Dark Dynamism: drones, dark factories and deurbanization
Dark Dynamism: drones, dark factories and deurbanizationDark Dynamism: drones, dark factories and deurbanization
Dark Dynamism: drones, dark factories and deurbanization
Jakub Šimek
 
Understanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdfUnderstanding SEO in the Age of AI.pdf
Understanding SEO in the Age of AI.pdf
Fulcrum Concepts, LLC
 
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
Longitudinal Benchmark: A Real-World UX Case Study in Onboarding by Linda Bor...
UXPA Boston
 
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
Developing Product-Behavior Fit: UX Research in Product Development by Krysta...
UXPA Boston
 
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptxIn-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
In-App Guidance_ Save Enterprises Millions in Training & IT Costs.pptx
aptyai
 
Best 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat PlatformsBest 10 Free AI Character Chat Platforms
Best 10 Free AI Character Chat Platforms
Soulmaite
 
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
OpenAI Just Announced Codex: A cloud engineering agent that excels in handlin...
SOFTTECHHUB
 
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
Bridging AI and Human Expertise: Designing for Trust and Adoption in Expert S...
UXPA Boston
 
May Patch Tuesday
May Patch TuesdayMay Patch Tuesday
May Patch Tuesday
Ivanti
 
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UX for Data Engineers and Analysts-Designing User-Friendly Dashboards for Non...
UXPA Boston
 
Middle East and Africa Cybersecurity Market Trends and Growth Analysis
Middle East and Africa Cybersecurity Market Trends and Growth Analysis Middle East and Africa Cybersecurity Market Trends and Growth Analysis
Middle East and Africa Cybersecurity Market Trends and Growth Analysis
Preeti Jha
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Who's choice? Making decisions with and about Artificial Intelligence, Keele ...
Alan Dix
 
Breaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP DevelopersBreaking it Down: Microservices Architecture for PHP Developers
Breaking it Down: Microservices Architecture for PHP Developers
pmeth1
 
SQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptxSQL Database Design For Developers at PhpTek 2025.pptx
SQL Database Design For Developers at PhpTek 2025.pptx
Scott Keck-Warren
 
Building a research repository that works by Clare Cady
Building a research repository that works by Clare CadyBuilding a research repository that works by Clare Cady
Building a research repository that works by Clare Cady
UXPA Boston
 
Ad

Deep neural networks and tabular data

  • 1. Presentation By Jimmy Liang Deep Neural Networks and Tabular Data
  • 2. Introduction Deep Neural Networks (DNNs) has produced incredible results in the past few years in the fields of computer vision, audio, video, as well as natural language processing. But, its usage with tabular data, which most business processes rely on, has failed to meet the predictive capability as well as explainability of 'classical' machine learning models such as gradient boosted trees. This presentation goes over the current state of using DNNs with tabular data and future directions 02
  • 4. Challenges Data quality Spatial dependencies Preprocessing Model sensitivity What are some issues with using DNNs with tabular data: 04
  • 5. Data Quality Spatial Correlation 05 Unlike images, audio, and videos where neighboring pixels and bits provide spatial context, there are no such relationships available in tabular data. Research has hypothesized that even if there are spatial correlations between variables in tabular data, it is rather complex and irregular and difficult to determine. Real world tabular data often contains missing values, outliers, inconsistent, or erroneous data. Tabular data are often high- dimensional with relatively small sample sizes. Tabular data is often expensive to obtain and hard to come by, and the dataset are often class imbalanced.
  • 6. Preprocessing and Model Sensitivity. Model sensitivity: Unlike classical machine learning models such as decision trees, DNNs are very sensitive to small changes in the input data, and tabular data is often highly variable from one sample to the next. Preprocessing: Tabular data is hard to preprocess. One of the main challenges is converting categorical features into numerical representations without creating very sparse matrices. Another pitfall is inadvertently encoding an order based on the numbering system used (for example, coding colors as 0, 1, 2 implies that 2 is "greater" than 0). Some implementations address this by encoding categorical features in an embedding space.
  • 7. Current State. Now that we have discussed the challenges, what are some approaches to overcome them? We go over data transformation, hybrid models, transformers, and regularization.
  • 8. Methods. Data transformation: Transforming the tabular data with various techniques to overcome issues with categorical variables. Hybrid models: Combining deep neural networks with classical machine learning techniques such as decision trees. Transformers: Building on the success of transformers in natural language processing by applying the attention mechanism to tabular data. Regularization: Based on the hypothesis that strong regularization can counter the model sensitivity that stems from the extreme flexibility of deep learning models.
  • 9. Data Transformation. Single-dimensional: Deterministic encodings that can be computed before training, as simple as ordinal encoding, binary encoding, leave-one-out encoding, or hash-based encoding. Multi-dimensional: Using self-supervised or semi-supervised techniques to encode the categorical values into a dense embedding space.
  • 10. Hybrid Models. Fully differentiable: Permits end-to-end optimization using gradient descent and is highly efficient on GPUs. Partly differentiable: Combines non-differentiable models such as decision trees with deep neural networks, or uses different models to handle numerical and categorical features.
  • 11. Transformers. There is a lot of recent and active research in this area: TabNet, TabTransformer, ARM-net, SAINT, and others. These models use multiple subnetworks and self-attention to handle categorical features, and incorporate various techniques such as decision trees, k-nearest neighbors, and feature crosses.
  • 12. Regularization. Regularization Learning Networks: Apply a trainable regularization coefficient to lower the overall model sensitivity. Regularization cocktails: Apply multiple regularization techniques together. A 2021 paper that combined 13 regularization techniques reported outperforming tree-based models.
  • 14. Data Generation. Why: Tabular data is difficult and expensive to obtain, so training data is usually limited. Generated data can be used for data augmentation and imputation (filling in missing values), for rebalancing imbalanced classes, and for ensuring privacy. How: Generative Adversarial Networks (GANs), including MedGAN for domain-specific generation, and Variational Autoencoders (VAEs); various VAEs can outperform GANs, but both are considered state of the art. Another approach is to use statistical methods that generate data based on the original data's distribution. Quality: How do we assess the quality of generated data? Typically with a proxy classification task: a model is trained on the generated tabular data and then makes predictions on real data, and its performance indicates the quality of the generated data.
  • 15. Explainability. The ability to understand what a prediction is based on is hugely important in the real world. Approaches include feature highlighting and constructing models that are explainable by design. Where the model parameters are not available, surrogate models or a benchmarking library can be used.
  • 16. Future Directions. Deep learning with tabular data is an actively researched topic. Tabular data is the most widely used type of data in businesses and, as such, has the potential to produce the most impact. Some of the trends and future directions follow.
  • 17. Trends. Data preprocessing: Continue to transform data into homogeneous representations such as embeddings. Architecture: Transformers have taken the lead, offering advantages such as attention over both categorical and numerical features. Regularization: Combining regularization techniques has been shown to help even a vanilla feed-forward network. Data generation: Generation is difficult for tabular data because the space of possible records is infinite; more research is needed in this area. Explainability: Explainable AI is the foundation for ensuring equity, and DNNs need to do more here to match classical techniques such as decision trees.
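Slide 6 warns that converting categorical features can create very sparse matrices or imply an ordering that does not exist. The following is a minimal plain-Python sketch contrasting ordinal encoding, one-hot encoding, and an embedding lookup on a made-up `colors` column. The embedding vectors are random placeholders; a real model would learn them jointly with the network.

```python
import random

# A made-up categorical column used only for illustration.
colors = ["red", "green", "blue", "green", "yellow", "red"]
categories = sorted(set(colors))  # ['blue', 'green', 'red', 'yellow']

# Ordinal encoding: one integer per category. The integers imply an order and
# distances ('yellow' = 3 looks further from 'blue' = 0 than 'green' = 1 does)
# that the colors themselves do not have.
ordinal = {c: i for i, c in enumerate(categories)}
encoded_ordinal = [ordinal[c] for c in colors]

# One-hot encoding: one binary column per category. No false ordering, but each
# row has a single non-zero entry, so sparsity grows with the number of categories.
encoded_one_hot = [[1 if c == value else 0 for c in categories] for value in colors]
cells = len(encoded_one_hot) * len(categories)
sparsity = 1 - sum(map(sum, encoded_one_hot)) / cells

# Embedding: each category maps to a short dense vector. These values are random
# stand-ins for parameters that would normally be trained with the network.
rng = random.Random(0)
embedding_dim = 3
embedding_table = {c: [rng.uniform(-0.1, 0.1) for _ in range(embedding_dim)] for c in categories}
encoded_dense = [embedding_table[c] for c in colors]

print(encoded_ordinal)  # [2, 1, 0, 1, 3, 2]
print(sparsity)         # 0.75
```

With four categories, three quarters of the one-hot matrix is already zeros; a column with thousands of categories would be almost entirely zeros, while the embedding width stays fixed.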
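Slide 9 lists binary, leave-one-out, and hash-based encoding among the single-dimensional transformations. A minimal sketch of all three in plain Python follows; the function names, the bucket count, and the example labels are assumptions for illustration. Note that leave-one-out encoding uses the target, so in practice it should be computed from training rows only.

```python
import hashlib
from collections import defaultdict


def binary_encode(values):
    """Ordinal index per category, written out as binary digits (one column per bit)."""
    categories = sorted(set(values))
    width = max(1, (len(categories) - 1).bit_length())
    index = {c: i for i, c in enumerate(categories)}
    return [[(index[v] >> bit) & 1 for bit in reversed(range(width))] for v in values]


def leave_one_out_encode(values, targets):
    """Replace each value with the mean target of the *other* rows in its category.

    Excluding the row's own target reduces leakage of its label into the feature.
    Categories seen only once fall back to the global target mean.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for v, y in zip(values, targets):
        sums[v] += y
        counts[v] += 1
    prior = sum(targets) / len(targets)
    encoded = []
    for v, y in zip(values, targets):
        others = counts[v] - 1
        encoded.append((sums[v] - y) / others if others > 0 else prior)
    return encoded


def hash_encode(values, n_buckets=8):
    """Map each value to one of n_buckets columns with a stable hash.

    md5 is used instead of Python's salted hash() so buckets are reproducible
    across runs; different categories may still collide in the same bucket.
    """
    rows = []
    for v in values:
        bucket = int(hashlib.md5(str(v).encode("utf-8")).hexdigest(), 16) % n_buckets
        rows.append([1 if i == bucket else 0 for i in range(n_buckets)])
    return rows


colors = ["red", "green", "blue", "green", "yellow", "red"]
targets = [1, 0, 1, 1, 0, 0]  # made-up binary labels
binary = binary_encode(colors)
loo = leave_one_out_encode(colors, targets)
hashed = hash_encode(colors)
print(binary)  # [[1, 0], [0, 1], [0, 0], [0, 1], [1, 1], [1, 0]]
print(loo)     # [0.0, 1.0, 0.5, 0.0, 0.5, 1.0]
```

Binary encoding needs only about log2(k) columns for k categories, and hash encoding fixes the width in advance regardless of how many categories appear, at the cost of possible collisions.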
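Slide 10 separates fully differentiable hybrids, which can be trained end to end with gradient descent, from partly differentiable ones. One common way to make a tree differentiable is to replace its hard split with a soft, sigmoid-based one. The sketch below shows a single such split (a "stump") in plain Python; it is an illustration of the idea, not the construction used by any particular published model, and the `temperature` parameter, which controls how sharp the split is, is an assumption.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hard_stump(x, feature, threshold, left_value, right_value):
    """Classic decision-tree split: piecewise constant, so its gradient is zero almost everywhere."""
    return right_value if x[feature] > threshold else left_value


def soft_stump(x, feature, threshold, left_value, right_value, temperature=1.0):
    """Differentiable split: route right with probability sigmoid((x_f - threshold) / temperature)."""
    p_right = sigmoid((x[feature] - threshold) / temperature)
    return p_right * right_value + (1.0 - p_right) * left_value


def soft_stump_grad_threshold(x, feature, threshold, left_value, right_value, temperature=1.0):
    """d(output)/d(threshold) = (right - left) * dp/dthreshold, where dp/dthreshold = -p(1 - p) / temperature."""
    p = sigmoid((x[feature] - threshold) / temperature)
    return (right_value - left_value) * (-p * (1.0 - p) / temperature)


x = [0.3, 2.0]
# With a small temperature the soft split behaves almost exactly like the hard one...
print(hard_stump(x, 1, 1.5, -1.0, 1.0))                              # 1.0
print(round(soft_stump(x, 1, 1.5, -1.0, 1.0, temperature=0.01), 6))  # 1.0
# ...while at a larger temperature it provides a non-zero gradient for the threshold,
# which is what lets gradient descent move the split point.
print(soft_stump_grad_threshold(x, 1, 1.5, -1.0, 1.0, temperature=1.0))
```

Because every operation is smooth, a stack of such splits can sit inside a neural network and be optimized with the same backpropagation pass as the other layers.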
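Slide 11 mentions self-attention over features. As a rough sketch of the idea behind models such as TabTransformer, and not their actual implementation (there is no multi-head attention, residual connection, layer normalization, or feed-forward layer here), the plain-Python code below treats each categorical column's embedding as a token and applies single-head scaled dot-product self-attention. The column names, values, and random weights are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over feature tokens.

    Each row of `tokens` is the embedding of one column of a table row, so every
    feature can attend to every other feature in the same row.
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(w_k[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


rng = random.Random(0)
dim = 4


def random_matrix(rows, cols):
    # Random stand-ins for parameters that would normally be learned.
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


# A hypothetical table row with three categorical columns. Each (column, value)
# pair gets its own embedding vector, as a learned lookup table would provide.
row = {"color": "red", "size": "large", "region": "north"}
embedding_table = {(col, val): random_matrix(1, dim)[0] for col, val in row.items()}
tokens = [embedding_table[(col, val)] for col, val in row.items()]
w_q, w_k, w_v = random_matrix(dim, dim), random_matrix(dim, dim), random_matrix(dim, dim)

contextual, attention = self_attention(tokens, w_q, w_k, w_v)
# `contextual` holds one context-aware vector per column; attention[i][j] says how
# strongly column i draws on column j.
```

The output vectors mix information across columns, which is how attention can capture interactions between categorical features without hand-built feature crosses.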
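Slide 12 describes regularization cocktails, which apply several regularization techniques together. The sketch below is only a toy illustration of stacking regularizers in one update rule, not the 2021 paper's procedure, which worked with multilayer networks and a larger set of techniques. It trains logistic regression (the simplest one-layer network) with mixup, dropout on the inputs, and L2 weight decay; the function names, hyperparameter values, and data are assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_with_cocktail(X, y, epochs=100, lr=0.05, weight_decay=1e-3,
                        dropout=0.1, mixup_alpha=0.2, seed=0):
    """Logistic regression trained by SGD with three regularizers stacked:
    mixup, inverted dropout on the inputs, and L2 weight decay."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    keep = 1.0 - dropout
    for _ in range(epochs):
        for i in range(len(X)):
            # Mixup: blend the sample and its label with a random partner.
            j = rng.randrange(len(X))
            lam = rng.betavariate(mixup_alpha, mixup_alpha)
            x = [lam * a + (1 - lam) * c for a, c in zip(X[i], X[j])]
            target = lam * y[i] + (1 - lam) * y[j]
            # Inverted dropout: zero each input with probability `dropout` and
            # rescale the survivors so the expected input is unchanged.
            x = [v / keep if rng.random() < keep else 0.0 for v in x]
            # Gradient of binary cross-entropy with respect to the logit.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            # Weight decay adds weight_decay * w to each weight's gradient (bias not decayed).
            w = [wi - lr * (err * xi + weight_decay * wi) for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """No dropout or mixup at prediction time; inputs are used as-is."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Made-up, well-separated two-class data for illustration.
rng = random.Random(1)
X = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(50)] + \
    [[rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)] for _ in range(50)]
y = [1] * 50 + [0] * 50
w, b = train_with_cocktail(X, y)
train_accuracy = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)
```

Each regularizer acts at a different point of the update (the training sample, the inputs, the weight step), which is why they can be combined without interfering mechanically; whether a given combination helps is an empirical question.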
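Slide 14 describes generating data from the original data's distribution and assessing it with a proxy classification task. A minimal sketch of both, assuming a deliberately simple statistical generator (independent Gaussians per feature and class, which ignores feature correlations that GANs and VAEs try to capture) and a nearest-centroid proxy classifier chosen only for brevity. The proxy is trained on synthetic data and tested on real data, a setup sometimes called "train on synthetic, test on real"; the "real" data here is itself simulated for illustration.

```python
import random
import statistics


def fit_class_gaussians(X, y):
    """Per class, fit an independent Gaussian (mean, stdev) to each feature."""
    params = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        params[label] = [(statistics.mean(c), statistics.stdev(c)) for c in zip(*rows)]
    return params


def sample_synthetic(params, n_per_class, rng):
    X, y = [], []
    for label, feats in params.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, sigma) for mu, sigma in feats])
            y.append(label)
    return X, y


def nearest_centroid_fit(X, y):
    return {label: [statistics.mean(c) for c in zip(*[x for x, t in zip(X, y) if t == label])]
            for label in set(y)}


def accuracy(centroids, X, y):
    def predict(x):
        return min(centroids, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, centroids[lbl])))
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def make_real(n, rng):
    """Stand-in for real data: two classes with different feature means (made up)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        center = (1.5, -1.0) if label == 1 else (-1.5, 1.0)
        X.append([rng.gauss(c, 1.0) for c in center])
        y.append(label)
    return X, y


rng = random.Random(42)
X_train, y_train = make_real(200, rng)
X_test, y_test = make_real(200, rng)

generator = fit_class_gaussians(X_train, y_train)
X_syn, y_syn = sample_synthetic(generator, 100, rng)

tstr = accuracy(nearest_centroid_fit(X_syn, y_syn), X_test, y_test)      # train synthetic, test real
trtr = accuracy(nearest_centroid_fit(X_train, y_train), X_test, y_test)  # train real, test real
print(f"synthetic-trained: {tstr:.2f}, real-trained: {trtr:.2f}")
```

The real-trained score serves as a reference point: the closer the synthetic-trained score comes to it, the more of the task-relevant structure the generated data preserves.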
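Slide 15 suggests surrogate models when the model parameters are not available. A minimal sketch of a global surrogate, assuming only query access to the model: label sample inputs with the black box, fit an interpretable model (here a single-split decision stump) to those labels, and report fidelity, the share of inputs on which the two agree. The `black_box` function is a hypothetical stand-in, and in practice fidelity should also be checked on held-out inputs.

```python
import math
import random


def black_box(x):
    """Hypothetical opaque model: we may call it but not inspect its internals."""
    return 1 if math.tanh(3 * x[0]) + 0.2 * math.sin(5 * x[1]) > 0.1 else 0


def fit_stump_surrogate(X, labels):
    """Search every (feature, threshold, polarity) split for the stump that best
    reproduces the black box's labels. Returns (fidelity, feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for polarity in (1, 0):
                preds = [polarity if x[f] > t else 1 - polarity for x in X]
                fidelity = sum(p == lab for p, lab in zip(preds, labels)) / len(labels)
                if best is None or fidelity > best[0]:
                    best = (fidelity, f, t, polarity)
    return best


rng = random.Random(7)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(300)]
labels = [black_box(x) for x in X]  # only the black box's outputs are used
fidelity, feature, threshold, polarity = fit_stump_surrogate(X, labels)
print(f"predict {polarity} when x[{feature}] > {threshold:.3f} (fidelity {fidelity:.2f})")
```

Here the stump should select `x[0]`, which matches the black box, whose `x[1]` term only nudges the decision boundary; a high fidelity means the simple rule is a trustworthy summary of the opaque model on this input range.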