SlideShare a Scribd company logo
WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
Layla Yang, Databricks
Unified approach to interpret
machine learning model: SHAP
+ LIME
#UnifiedDataAnalytics #SparkAISummit
Overview
• What is Machine Learning Interpreter?
• Why is it important?
• Why is interpreting ML model difficult?
• Different methodologies of model explainer
• SHAP: SHapley Additive exPlanations
• LIME: Local Interpretable Model-Agnostic Explanations
• Application: real world use case
3#UnifiedDataAnalytics #SparkAISummit
ML interpreter: What is it?
Black box
4#UnifiedDataAnalytics #SparkAISummit
Accuracy and Interpretability
Trade-off
Input Data Output
ML interpreter: What is it?
“ Interpretability is the degree to which a human can
understand the cause of a decision.”
-- Miller, Tim. “Explanation in artificial intelligence: Insights from the social sciences.”
5#UnifiedDataAnalytics #SparkAISummit
Why is it important?
Commercial Drive:
● Many enterprises rely on machine learning models to make important decision:
○ Loan, credit card application
○ To buy / sell commodities, stocks or advertisements
○ Diagnose cancer or benign cells
6#UnifiedDataAnalytics #SparkAISummit
● Trust
○ It is easier for humans to trust a business that explains its decisions
● Legal Regulation
○ GDPR: the customer has right to obtain explanation
Why is it important?
Technical Drive: Can you trust your model based on accuracy?
7#UnifiedDataAnalytics #SparkAISummit
● Understand edge cases and
circumstances of failure
● Knowing the ‘why’ can help you learn
more about the problem and the data
● Improve the model
Why is it important?
Social and ethical aspect:
● Interpretability - a useful debugging tool for Detecting Bias
○ Sean Owen: What do Developer Salaries Tell us about the Gender Pay Gap?
○ Minority, Gender
8#UnifiedDataAnalytics #SparkAISummit
● Machine learning models pick up biases and may really harm vulnerable group of people
○ Be cautious about marketing strategy
○ Fairness: e.g. automatic approval or rejection of loan applications
Why is it difficult?
ML creates functions to
explicitly or implicitly combine
variables (features) in
sophisticated way
9#UnifiedDataAnalytics #SparkAISummit
Disaggregate the final
prediction to single feature
contribution and untangle
interaction between features
are very difficult!
Why it’s difficult?
Explain one model out of many good
models: if they give different pictures of
nature’s mechanism and lead to different
conclusions - how to justify one against
another and recommend to the business?
10#UnifiedDataAnalytics #SparkAISummit
“ Data will often point with almost equal emphasis on several possible models. The
question of which one most accurately reflects the data is difficult to resolve.”
-- Leo Breiman, “Statistical Modeling: The Two Cultures”
Different methodologies
● Surrogate model
○ simple model out of fancy model (to establish relationship between input and prediction)
11#UnifiedDataAnalytics #SparkAISummit
● Tree interpreter
○ feature importance
○ it measures how often and how much
a feature was used in the model
○ report relative score for feature
importance
○ global picture of the effect of features
LIME (Local Interpretable Model-Agnostic Explanations)
● Model Agnostic! Approximate a black-box model by a simple linear surrogate model locally
● Learned on perturbations of the original instance in some cases faster than SHAP
● It doesn’t work out-of-the-box on all models.
12#UnifiedDataAnalytics #SparkAISummit
SHAP (SHapley Additive exPlanation)
● To unify various model explanation methods: Model-Agnostic or Model-Specific Approximations
● Based on the game theory, Shapley Values, by Scott Lundberg
● Shapley value is the average contribution of features which are predicting in different situation.
13#UnifiedDataAnalytics #SparkAISummit
● SHAP provides multiple explainers for different kind of models:
○ TreeExplainer: Support XGBoost, LightGBM, CatBoost and scikit-learn models by Tree SHAP.
○ DeepExplainer (DEEP SHAP): Support TensorFlow and Keras models by using DeepLIFT and Shapley
values.
○ KernelExplainer (Kernel SHAP): Applying to any models by using LIME and Shapley values.
○ GradientExplainer: Support TensorFlow and Keras models.
SHAP: Important Benefits
14#UnifiedDataAnalytics #SparkAISummit
Produce explanations at the level of individual inputs
● Traditional feature importance algorithms
will tell us which features are most
important across the entire population
● With individual-level SHAP values, we
can pinpoint which factors are most
impactful for each customer
SHAP: Important Benefits
15#UnifiedDataAnalytics #SparkAISummit
Can directly relate feature values to the output, which greatly improves interpretation of the results
● SHAP can quantify the impact of a feature on the unit of the model target
● Avoid decomp matrix which involves complex transformation and calculation
Read the impact of a feature in dollars!
Application
SHAP explainer: CNN
16#UnifiedDataAnalytics #SparkAISummit
LIME: NLP Fasttext
Real world implementation
17#UnifiedDataAnalytics #SparkAISummit
Run on Single Node Distributed Implementation
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT
Ad

More Related Content

What's hot (20)

Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Sri Ambati
 
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...
Sri Ambati
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
Anton Kulesh
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
Krishnaram Kenthapadi
 
Interpretable machine learning
Interpretable machine learningInterpretable machine learning
Interpretable machine learning
Sri Ambati
 
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Sri Ambati
 
Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)
Krishnaram Kenthapadi
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
inovex GmbH
 
eScience SHAP talk
eScience SHAP talkeScience SHAP talk
eScience SHAP talk
Scott Lundberg
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
Arithmer Inc.
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
Wagston Staehler
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
Data Science Dojo
 
Machine Learning Interpretability / Explainability
Machine Learning Interpretability / ExplainabilityMachine Learning Interpretability / Explainability
Machine Learning Interpretability / Explainability
Raouf KESKES
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)
Hayim Makabee
 
A Unified Approach to Interpreting Model Predictions (SHAP)
A Unified Approach to Interpreting Model Predictions (SHAP)A Unified Approach to Interpreting Model Predictions (SHAP)
A Unified Approach to Interpreting Model Predictions (SHAP)
Rama Irsheidat
 
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learning
Daiki Tanaka
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
Databricks
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
Krishnaram Kenthapadi
 
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Sri Ambati
 
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Interpretable Machine Learning Using LIME Framework - Kasia Kulma (PhD), Data...
Sri Ambati
 
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...
Sri Ambati
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
Anton Kulesh
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
Krishnaram Kenthapadi
 
Interpretable machine learning
Interpretable machine learningInterpretable machine learning
Interpretable machine learning
Sri Ambati
 
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Sri Ambati
 
Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)Explainable AI in Industry (WWW 2020 Tutorial)
Explainable AI in Industry (WWW 2020 Tutorial)
Krishnaram Kenthapadi
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
inovex GmbH
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
Data Science Dojo
 
Machine Learning Interpretability / Explainability
Machine Learning Interpretability / ExplainabilityMachine Learning Interpretability / Explainability
Machine Learning Interpretability / Explainability
Raouf KESKES
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)
Hayim Makabee
 
A Unified Approach to Interpreting Model Predictions (SHAP)
A Unified Approach to Interpreting Model Predictions (SHAP)A Unified Approach to Interpreting Model Predictions (SHAP)
A Unified Approach to Interpreting Model Predictions (SHAP)
Rama Irsheidat
 
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learning
Daiki Tanaka
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
Databricks
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
Krishnaram Kenthapadi
 
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Sri Ambati
 

Similar to Unified Approach to Interpret Machine Learning Model: SHAP + LIME (20)

AI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdf
AI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdfAI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdf
AI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdf
Lakshay14663
 
C3 w5
C3 w5C3 w5
C3 w5
Ajay Taneja
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AI
Pramit Choudhary
 
Model evaluation in the land of deep learning
Model evaluation in the land of deep learningModel evaluation in the land of deep learning
Model evaluation in the land of deep learning
Pramit Choudhary
 
ODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AIODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AI
Aditya Bhattacharya
 
Northbay_December_2023_LLM_Reporting.pdf
Northbay_December_2023_LLM_Reporting.pdfNorthbay_December_2023_LLM_Reporting.pdf
Northbay_December_2023_LLM_Reporting.pdf
ssusera5352a2
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 Sessions
BigML, Inc
 
Working with 1 Million Time Series a Day: How to Scale Up a Predictive Analyt...
Working with 1 Million Time Series a Day: How to Scale Up a Predictive Analyt...Working with 1 Million Time Series a Day: How to Scale Up a Predictive Analyt...
Working with 1 Million Time Series a Day: How to Scale Up a Predictive Analyt...
Databricks
 
Introduction to Machine Learning and Deep Learning
Introduction  to Machine Learning and Deep LearningIntroduction  to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep Learning
ssuser120749
 
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Pramit Choudhary
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
WeCloudData
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
WeCloudData
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Ed Fernandez
 
MLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the EnterpriseMLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the Enterprise
BigML, Inc
 
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for EveryoneGDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
James Anderson
 
Ai and using ml in mobile apps
Ai and using ml in mobile appsAi and using ml in mobile apps
Ai and using ml in mobile apps
Paramvir Singh
 
The Incredible Disappearing Data Scientist
The Incredible Disappearing Data ScientistThe Incredible Disappearing Data Scientist
The Incredible Disappearing Data Scientist
Rebecca Bilbro
 
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.aiPractical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Sri Ambati
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
BigML, Inc
 
"Custom ML Models for Each User", Siamion Karasik
"Custom ML Models for Each User", Siamion Karasik"Custom ML Models for Each User", Siamion Karasik
"Custom ML Models for Each User", Siamion Karasik
Fwdays
 
AI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdf
AI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdfAI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdf
AI/ML Fundamentals to advanced Slides by GDG Amrita Mysuru.pdf
Lakshay14663
 
Human in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AIHuman in the loop: Bayesian Rules Enabling Explainable AI
Human in the loop: Bayesian Rules Enabling Explainable AI
Pramit Choudhary
 
Model evaluation in the land of deep learning
Model evaluation in the land of deep learningModel evaluation in the land of deep learning
Model evaluation in the land of deep learning
Pramit Choudhary
 
Northbay_December_2023_LLM_Reporting.pdf
Northbay_December_2023_LLM_Reporting.pdfNorthbay_December_2023_LLM_Reporting.pdf
Northbay_December_2023_LLM_Reporting.pdf
ssusera5352a2
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 Sessions
BigML, Inc
 
Working with 1 Million Time Series a Day: How to Scale Up a Predictive Analyt...
Working with 1 Million Time Series a Day: How to Scale Up a Predictive Analyt...Working with 1 Million Time Series a Day: How to Scale Up a Predictive Analyt...
Working with 1 Million Time Series a Day: How to Scale Up a Predictive Analyt...
Databricks
 
Introduction to Machine Learning and Deep Learning
Introduction  to Machine Learning and Deep LearningIntroduction  to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep Learning
ssuser120749
 
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Pramit Choudhary
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
WeCloudData
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
WeCloudData
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Ed Fernandez
 
MLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the EnterpriseMLSEV Virtual. ML Platformization and AutoML in the Enterprise
MLSEV Virtual. ML Platformization and AutoML in the Enterprise
BigML, Inc
 
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for EveryoneGDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
James Anderson
 
Ai and using ml in mobile apps
Ai and using ml in mobile appsAi and using ml in mobile apps
Ai and using ml in mobile apps
Paramvir Singh
 
The Incredible Disappearing Data Scientist
The Incredible Disappearing Data ScientistThe Incredible Disappearing Data Scientist
The Incredible Disappearing Data Scientist
Rebecca Bilbro
 
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.aiPractical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Sri Ambati
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
BigML, Inc
 
"Custom ML Models for Each User", Siamion Karasik
"Custom ML Models for Each User", Siamion Karasik"Custom ML Models for Each User", Siamion Karasik
"Custom ML Models for Each User", Siamion Karasik
Fwdays
 
Ad

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Ad

Recently uploaded (20)

chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 

Unified Approach to Interpret Machine Learning Model: SHAP + LIME

  • 1. WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
  • 2. Layla Yang, Databricks Unified approach to interpret machine learning model: SHAP + LIME #UnifiedDataAnalytics #SparkAISummit
  • 3. Overview • What is Machine Learning Interpreter? • Why is it important? • Why is interpreting ML model difficult? • Different methodologies of model explainer • SHAP: SHapley Additive exPlanations • LIME: Local Interpretable Model-Agnostic Explanations • Application: real world use case 3#UnifiedDataAnalytics #SparkAISummit
  • 4. ML interpreter: What is it? Black box 4#UnifiedDataAnalytics #SparkAISummit Accuracy and Interpretability Trade-off Input Data Output
  • 5. ML interpreter: What is it? “ Interpretability is the degree to which a human can understand the cause of a decision.” -- Miller, Tim. “Explanation in artificial intelligence: Insights from the social sciences.” 5#UnifiedDataAnalytics #SparkAISummit
  • 6. Why is it important? Commercial Drive: ● Many enterprises rely on machine learning models to make important decision: ○ Loan, credit card application ○ To buy / sell commodities, stocks or advertisements ○ Diagnose cancer or benign cells 6#UnifiedDataAnalytics #SparkAISummit ● Trust ○ It is easier for humans to trust a business that explains its decisions ● Legal Regulation ○ GDPR: the customer has right to obtain explanation
  • 7. Why is it important? Technical Drive: Can you trust your model based on accuracy? 7#UnifiedDataAnalytics #SparkAISummit ● Understand edge cases and circumstances of failure ● Knowing the ‘why’ can help you learn more about the problem and the data ● Improve the model
  • 8. Why is it important? Social and ethical aspect: ● Interpretability - a useful debugging tool for Detecting Bias ○ Sean Owen: What do Developer Salaries Tell us about the Gender Pay Gap? ○ Minority, Gender 8#UnifiedDataAnalytics #SparkAISummit ● Machine learning models pick up biases and may really harm vulnerable group of people ○ Be cautious about marketing strategy ○ Fairness: e.g. automatic approval or rejection of loan applications
  • 9. Why is it difficult? ML creates functions to explicitly or implicitly combine variables (features) in sophisticated way 9#UnifiedDataAnalytics #SparkAISummit Disaggregate the final prediction to single feature contribution and untangle interaction between features are very difficult!
  • 10. Why it’s difficult? Explain one model out of many good models: if they give different pictures of nature’s mechanism and lead to different conclusions - how to justify one against another and recommend to the business? 10#UnifiedDataAnalytics #SparkAISummit “ Data will often point with almost equal emphasis on several possible models. The question of which one most accurately reflects the data is difficult to resolve.” -- Leo Breiman, “Statistical Modeling: The Two Cultures”
  • 11. Different methodologies ● Surrogate model ○ simple model out of fancy model (to establish relationship between input and prediction) 11#UnifiedDataAnalytics #SparkAISummit ● Tree interpreter ○ feature importance ○ it measures how often and how much a feature was used in the model ○ report relative score for feature importance ○ global picture of the effect of features
  • 12. LIME (Local Interpretable Model-Agnostic Explanations) ● Model Agnostic! Approximate a black-box model by a simple linear surrogate model locally ● Learned on perturbations of the original instance in some cases faster than SHAP ● It doesn’t work out-of-the-box on all models. 12#UnifiedDataAnalytics #SparkAISummit
  • 13. SHAP (SHapley Additive exPlanation) ● To unify various model explanation methods: Model-Agnostic or Model-Specific Approximations ● Based on the game theory, Shapley Values, by Scott Lundberg ● Shapley value is the average contribution of features which are predicting in different situation. 13#UnifiedDataAnalytics #SparkAISummit ● SHAP provides multiple explainers for different kind of models: ○ TreeExplainer: Support XGBoost, LightGBM, CatBoost and scikit-learn models by Tree SHAP. ○ DeepExplainer (DEEP SHAP): Support TensorFlow and Keras models by using DeepLIFT and Shapley values. ○ KernelExplainer (Kernel SHAP): Applying to any models by using LIME and Shapley values. ○ GradientExplainer: Support TensorFlow and Keras models.
  • 14. SHAP: Important Benefits 14#UnifiedDataAnalytics #SparkAISummit Produce explanations at the level of individual inputs ● Traditional feature importance algorithms will tell us which features are most important across the entire population ● With individual-level SHAP values, we can pinpoint which factors are most impactful for each customer
  • 15. SHAP: Important Benefits 15#UnifiedDataAnalytics #SparkAISummit Can directly relate feature values to the output, which greatly improves interpretation of the results ● SHAP can quantify the impact of a feature on the unit of the model target ● Avoid decomp matrix which involves complex transformation and calculation Read the impact of a feature in dollars!
  • 16. Application SHAP explainer: CNN 16#UnifiedDataAnalytics #SparkAISummit LIME: NLP Fasttext
  • 17. Real world implementation 17#UnifiedDataAnalytics #SparkAISummit Run on Single Node Distributed Implementation
  • 18. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT