An Introduction to SHAP Values and Machine Learning Interpretability
Machine learning models are powerful but hard to interpret. However, SHAP
values can help you understand how model features impact predictions.
Machine learning models are becoming increasingly complex, powerful, and able to make accurate predictions. However, as these models become "black boxes," it becomes even harder to understand how they arrive at those predictions. This has led to a growing focus on machine learning interpretability and explainability.
For example, imagine you applied for a loan at a bank but were rejected. You want to know the reason for the rejection, but the customer service agent tells you that an algorithm dismissed the application and that they cannot determine why. This is frustrating, right? You deserve an explanation for a decision that affects you. That's why companies are trying to make their machine learning models more transparent and understandable.
One of the most promising tools for this process is SHAP values, which measure how much
each feature (such as income, age, credit score, etc.) contributes to the model's prediction.
SHAP values can help you see which features are most important for the model and how they
affect the outcome.
In this tutorial, we will learn about SHAP values and their role in machine learning model interpretation. We will also use the `shap` Python package to create and analyze different plots for interpreting models.
What Are SHAP Values?
SHAP (SHapley Additive exPlanations) values show how each feature affects the final prediction, the significance of each feature compared to others, and the model's reliance on interactions between features.
SHAP values are model-agnostic, meaning they can be used to interpret any machine learning
model, including:
Linear regression
Decision trees
Random forests
Neural networks
SHAP values also have several key properties:
Additivity
SHAP values are additive, which means that the contribution of each feature to the final
prediction can be computed independently and then summed up. This property allows for
efficient computation of SHAP values, even for high-dimensional datasets.
Local accuracy
SHAP values add up to the difference between the expected model output and the actual output
for a given input. This means that SHAP values provide an accurate and local interpretation of
the model's prediction for a given input.
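In equation form, local accuracy (together with additivity) means that for a single input x, the base value plus the per-feature SHAP values reconstructs the model output exactly:

f(x) = E[f(X)] + φ_1 + φ_2 + … + φ_M

where φ_i is the SHAP value of feature i, M is the number of features, and E[f(X)] is the average model output over the data (the expected or base value).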
Missingness
SHAP values are zero for missing or irrelevant features for a prediction. This makes SHAP
values robust to missing data and ensures that irrelevant features do not distort the
interpretation.
Consistency
SHAP values do not change when the model changes unless the contribution of a feature
changes. This means that SHAP values provide a consistent interpretation of the model's
behavior, even when the model architecture or parameters change.
Overall, SHAP values provide a consistent and objective way to gain insights into how a
machine learning model makes predictions and which features have the greatest influence.
Setting Up
Install SHAP either using PyPI or conda-forge:
pip install shap
or
conda install -c conda-forge shap
Load the Telecom Customer Churn dataset. The dataset looks clean, and the target column is "Churn."
import shap
import pandas as pd
import numpy as np
shap.initjs()
customer = pd.read_csv("data/customer_churn.csv")
customer.head()
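Before generating the classification report below, we need a trained model. Here is a minimal sketch of the setup the rest of the tutorial assumes: a train/test split and a random forest classifier named clf. The split ratio, random_state, and other hyperparameters are illustrative choices, not taken from the original.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Separate the features from the "Churn" target column
X = customer.drop("Churn", axis=1)
y = customer["Churn"]

# Hold out a test set for evaluation and SHAP analysis
# (25% test size and random_state=42 are illustrative choices)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Train a random forest classifier and predict on the test set
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)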
# Classification Report
print(classification_report(y_test, y_pred))
The model has shown better performance for the "0" label than for the "1" label due to the unbalanced dataset. Overall, it is an acceptable result with 94% accuracy.
Check out our Classification in Machine Learning guide to learn about classification in machine
learning with Python examples.
We will first create an explainer object by providing the random forest classification model, then calculate the SHAP values using the testing set.
explainer = shap.Explainer(clf)
shap_values = explainer.shap_values(X_test)
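As a quick sanity check of the additivity and local accuracy properties discussed earlier, each row's SHAP values plus the expected value should reconstruct the model's output for that row. A minimal sketch, assuming the tree explainer returns a per-class list of SHAP arrays here (newer SHAP releases may instead return a single array with a class dimension):

# Base value + sum of per-feature SHAP values should match the predicted probability of class "0"
reconstructed = explainer.expected_value[0] + shap_values[0].sum(axis=1)
predicted = clf.predict_proba(X_test)[:, 0]
print(np.allclose(reconstructed, predicted))  # should print True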
Summary Plot
Display the summary_plot using the SHAP values and the testing set.
shap.summary_plot(shap_values, X_test)
The summary plot shows the feature importance of each feature in the model. The results show
that “Status,” “Complaints,” and “Frequency of use” play major roles in determining the results.
Display the summary_plot of the label “0”.
shap.summary_plot(shap_values[0], X_test)
The Y-axis indicates the feature names, in order of importance from top to bottom.
The X-axis represents the SHAP value, which indicates the degree of change in log odds.
The color of each point on the graph represents the value of the corresponding feature, with red indicating high values and blue indicating low values.
If you look at the "Complaints" feature, you will see that it is mostly high with a negative SHAP value. This means that higher complaint counts tend to affect the output negatively.
Dependence Plot
A dependence plot is a type of scatter plot that displays how a model's predictions are affected by a specific feature, in this case "Subscription Length." On average, subscription length has a mostly positive effect on the model's output.
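The dependence plot can be generated with shap.dependence_plot; a minimal sketch, assuming the feature column in this dataset is named "Subscription Length" (adjust the name if your column differs):

# Dependence plot of "Subscription Length" against its label-"0" SHAP values
shap.dependence_plot("Subscription Length", shap_values[0], X_test)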
Force Plot
We will examine the first sample in the testing set to determine which features contributed to the
"0" result. To do this, we will utilize a force plot and provide the expected value, SHAP value,
and testing sample.
shap.plots.force(explainer.expected_value[0], shap_values[0][0, :], X_test.iloc[0, :])
We can clearly see that zero complaints and zero call failures contributed negatively to customer loss; in other words, they pushed the prediction toward the "0" (no-churn) label.
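We can also explain a sample the model predicted as churned ("1"). A minimal sketch, where the sample index is simply the first test row predicted as "1" (an illustrative choice, not taken from the original):

# Explain the first test sample the model predicts as churned ("1")
churn_idx = int(np.argmax(y_pred == 1))
shap.plots.force(explainer.expected_value[1], shap_values[1][churn_idx, :], X_test.iloc[churn_idx, :])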
You can see all of the features, with their values and magnitudes, that have contributed to the loss of a customer. It seems that even one unresolved complaint can cost a telecommunications company a customer.
Decision Plot
We will now display the decision_plot. It visually depicts the model's decisions by mapping the cumulative SHAP values for each prediction.
Each plotted line on the decision plot shows how strongly the individual features contributed to a single model prediction, thus explaining which feature values pushed the prediction.
Note: the decision plot for the target label "1" is tilted towards "1".
Next, display the decision plot for the target label "0".
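A minimal sketch for generating both decision plots, assuming the same per-class shap_values list used above:

# Decision plot for the target label "1" (note how it is tilted towards "1")
shap.decision_plot(explainer.expected_value[1], shap_values[1], X_test)

# Decision plot for the target label "0"
shap.decision_plot(explainer.expected_value[0], shap_values[0], X_test)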
Applications of SHAP Values
SHAP values can be applied in many ways, including:
1. Model debugging. By examining the SHAP values, we can identify any biases or outliers in the data that may be causing the model to make mistakes.
2. Feature importance. Identifying and removing low-impact features can create a more optimized model.
3. Model summaries. SHAP can provide a global summary of a model in the form of a SHAP value summary plot, giving an overview of the most important features across the entire dataset.
4. Detecting biases. SHAP value analysis helps identify whether certain features disproportionately affect particular groups, enabling the detection and reduction of discrimination in the model.
5. Fairness auditing. SHAP values can be used to assess a model's fairness and ethical implications.
6. Regulatory approval. SHAP values can help gain regulatory approval by explaining the model's decisions.
Conclusion
We have explored SHAP values and how we can use them to provide interpretability for
machine learning models. While having an accurate model is essential, companies need to go
beyond accuracy and focus on interpretability and transparency to gain the trust of users and
regulators.
Being able to explain why a model made a particular prediction helps debug potential biases,
identify data issues, and justify the model's decisions.