Probabilistic ML Crash Course - Leblanc, Mason
Mason Leblanc
Copyright © 2024 by Mason Leblanc
All rights reserved. No part of this publication may be reproduced,
distributed, or transmitted in any form or by any means, including
photocopying, recording, or other electronic or mechanical methods,
without the prior written permission of the publisher, except in the
case of brief quotations embodied in critical reviews and certain
other noncommercial uses permitted by copyright law.
Table of Contents
Introduction
Part 1: Foundations
Conclusion
Introduction
Have you ever wondered if machines can truly understand the world
around them? Can they predict the future, make decisions, and
navigate complex situations with the same nuanced understanding
of uncertainty that we humans possess?
While traditional machine learning has achieved remarkable feats, it
often operates as a black box, making predictions without truly
explaining the "why" behind them. This lack of interpretability
can lead to biases, inaccuracies, and a sense of unease about the
growing role of AI in our lives.
But what if there was a different approach? What if we could build
AI systems that not only make accurate predictions but also
understand the inherent uncertainty of the world? This is where
Probabilistic Machine Learning (PML) comes in.
PML embraces uncertainty as a fundamental truth, allowing AI to
reason, learn, and make decisions just like we do. By incorporating
probability and statistics, PML builds robust and interpretable models
that can explain their reasoning and adapt to new situations.
This book, Probabilistic Machine Learning Crash Course: A
Quick Guide to Building Robust and Interpretable AI Models
(Even if You're Afraid of Math), is your invitation to enter this
exciting world. We'll set sail on a quest together, starting from the
basic concepts of probability and statistics, demystifying PML without
relying on complex equations.
Even if you haven't touched machine learning before, fear not! This
book is designed for beginners. We'll build foundational knowledge
brick by brick, using real-world examples and intuitive explanations
to guide you through the core concepts.
By the end of this journey, you won't just be building PML models; you'll
also understand their logic. You'll be able to:
● Predict movie ratings with a fascinating technique called Probabilistic Matrix Factorization (our hands-on project!).
● Choose the right PML model for your problem and interpret its results with confidence.
● Navigate the future of AI with a critical eye, understanding the power and limitations of probabilistic approaches.
Whether you're a data enthusiast, a programmer, or simply curious
about the future of AI, this book is your key to unlocking the
fascinating world of PML. Join me on this adventure, and let's build
AI systems together that are not only powerful but also transparent,
trustworthy, and capable of truly understanding the world they inhabit.
Are you ready to embrace the power of uncertainty? Turn the
page and let's begin!
Part 1
Foundations
Chapter 1: Introduction to Probabilistic Thinking
Code Samples
Python
# Import libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
# Load data (replace with your data source)
data = pd.read_csv("house_prices.csv")  # Example: house size and price data
# Define independent and dependent variables
X = data[["size"]] # Independent variable: house size
y = data["price"] # Dependent variable: house price
# Split data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)
# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on test data
y_pred = model.predict(X_test)
# Evaluate model performance
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
print("Mean squared error:", mse)
# Predict price for a new house (example)
new_size = 2000 # Example: house size of 2000 square feet
predicted_price = model.predict([[new_size]])[0]
print("Predicted price for a house of size", new_size, ":",
predicted_price)
1. Import libraries: We import the necessary libraries for data manipulation (pandas) and linear regression modeling (sklearn).
2. Load data: Replace the placeholder with your actual data source containing house size and price information.
3. Define variables: Specify the independent variable (house size) and the dependent variable (house price).
4. Split data: Divide the data into training and testing sets using train_test_split. This helps evaluate the model's performance on unseen data.
5. Create and train the model: Instantiate a LinearRegression object and train it on the training data using fit.
6. Make predictions: Use the trained model to predict house prices for the test data using predict.
7. Evaluate performance: Calculate the mean squared error (mean_squared_error) to assess how well the predictions match the actual prices.
8. Predict for a new house: Provide a new house size and use the model to predict its price.
Remember to replace the example data and adjust the code based
on your specific dataset and prediction needs. This provides a basic
framework for building your first linear regression model in Python.
You've built your first linear regression model, and now you're staring
at a set of numbers representing its parameters. But these aren't just
random digits; they hold the key to unlocking the secrets your model
has learned. Now let's interpret linear regression parameters,
transforming them from cryptic messages into valuable insights.
The Key Players
Remember the equation of a line: y = b0 + b1 * x. In linear
regression, b0 is the intercept and b1 is the slope. Let's break down
their individual stories:
Intercept (b0)
This represents the predicted value of y when x is zero. Think of it as
the y-axis value where the line crosses when there's no influence
from x.
Imagine a model predicting website traffic based on advertising
spend. An intercept of 10 doesn't mean zero ads lead to 10 visits. It
suggests a baseline traffic of 10, likely from organic sources, even
without paid advertising.
Slope (b1)
This tells you how much y changes for every unit increase in x. It
reflects the strength and direction of the relationship between your
variables.
A model predicting stock price changes based on company news
sentiment might have a steeper slope for negative sentiment,
implying a larger predicted price decrease compared to positive
sentiment.
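If you want to read b0 and b1 off the house-price model fitted earlier, scikit-learn exposes them as attributes of the fitted object. A minimal sketch, assuming the model from the code sample above:
Python
# Inspect the fitted parameters of the earlier house-price model
b0 = model.intercept_   # predicted price when size is 0 (the baseline)
b1 = model.coef_[0]     # predicted price change per additional square foot
print("Intercept (b0):", b0)
print("Slope (b1):", b1)
A positive b1 here would mean larger houses are predicted to be more expensive, exactly the kind of reading discussed next.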
Interpreting the Impact
The sign and magnitude of the slope paint a vivid picture:
● Positive slope: As x increases, so does y. Think of larger houses being predicted as more expensive, or higher education levels leading to higher predicted salaries.
● Negative slope: The opposite is true. For instance, more years of experience might lead to a lower predicted risk of heart disease.
● Larger slope magnitude: A steeper slope indicates a stronger relationship. Each unit change in x has a bigger impact on y.
Don't just take the numbers at face value. Here are more concepts
you should know about:
Confidence Intervals
These represent the range within which the true parameter value
likely lies. A wider interval suggests more uncertainty in the estimate.
Statistical Significance
This tells you if the observed relationship is unlikely due to chance. A
statistically significant result strengthens your confidence in the
model's findings.
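scikit-learn's LinearRegression doesn't report confidence intervals or p-values out of the box; one common option is the statsmodels library. Here is a minimal sketch, assuming the X_train and y_train from the earlier house-price example:
Python
import statsmodels.api as sm

# Refit the same regression with statsmodels to get uncertainty estimates
X_sm = sm.add_constant(X_train)           # add an explicit intercept column
ols_results = sm.OLS(y_train, X_sm).fit()

print(ols_results.conf_int(alpha=0.05))   # 95% confidence intervals for b0 and b1
print(ols_results.pvalues)                # small p-values suggest the relationship isn't due to chance
print(ols_results.summary())              # full report in one table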
Visualization
Tools like scatterplots with regression lines and residual plots can
visually depict the relationship and identify potential outliers or non-
linear patterns.
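Here is a brief matplotlib sketch of both plots, again assuming the X_test, y_test, and y_pred arrays from the earlier example:
Python
import matplotlib.pyplot as plt

# Scatterplot of actual prices with the fitted regression line
plt.scatter(X_test["size"], y_test, label="Actual prices")
order = X_test["size"].values.argsort()   # sort so the line draws cleanly
plt.plot(X_test["size"].values[order], y_pred[order], color="red", label="Regression line")
plt.xlabel("House size")
plt.ylabel("Price")
plt.legend()
plt.show()

# Residual plot: patterns here hint at outliers or non-linear relationships
residuals = y_test - y_pred
plt.scatter(y_pred, residuals)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted price")
plt.ylabel("Residual (actual - predicted)")
plt.show()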
Real-World Examples: Decoding the Language of Linear
Regression Parameters
While the previous examples explored interpreting parameters in
general terms, here are two concrete real-world scenarios to solidify
your understanding:
1. Predicting Online Ad Click-Through Rates (CTRs)
Imagine you're an e-commerce company building a model to predict
click-through rates (CTRs) on your ads based on various factors like
product price, image quality, and ad copy length. After training, you
obtain the following parameters:
● Intercept: 0.02 (a 2% base CTR even before considering the other factors)
● Product Price: -0.001 per dollar (lower prices lead to slightly higher CTRs)
● Image Quality Score: 0.01 per point (higher-quality images increase CTR)
● Ad Copy Length: -0.002 per word (shorter copy is slightly more likely to be clicked)
Interpretation:
● Even for an ad with an average price, average image quality, and wordy copy, there's a 2% baseline CTR, suggesting organic interest in your product category.
● Lowering the price by $1 can increase the predicted CTR by 0.1 percentage points, implying price sensitivity among potential customers.
● Investing in high-quality product images can significantly boost CTR (1 percentage point for each additional quality point), highlighting the power of visual appeal.
● Keeping ad copy concise can be beneficial, with each additional word potentially decreasing CTR by 0.2 percentage points.
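To make the arithmetic concrete, here is a tiny sketch that plugs the parameters above into the linear formula; the price, image score, and copy length are made-up example values:
Python
# Hypothetical CTR model: intercept plus one term per feature
intercept = 0.02     # 2% baseline CTR
b_price = -0.001     # per dollar of product price
b_image = 0.01       # per image-quality point
b_words = -0.002     # per word of ad copy

price, image_quality, copy_length = 25, 7, 12   # made-up example ad
ctr = intercept + b_price * price + b_image * image_quality + b_words * copy_length
print(f"Predicted CTR: {ctr:.3f} ({ctr * 100:.1f}%)")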
2. Forecasting Energy Consumption in Buildings
Suppose you're a building management company aiming to predict
energy consumption based on factors like weather data, occupancy
levels, and building efficiency ratings. Your model yields these
parameters:
● Intercept: 50 kWh (base energy consumption even before considering the other factors)
● Average Temperature: 1.5 kWh per degree Celsius increase
● Occupancy Level: 2 kWh per additional person
● Building Efficiency Rating: -3 kWh per unit increase (a higher rating implies lower consumption)
Interpretation:
● Even on a day with no occupants and average temperature, the building has a base consumption of 50 kWh, likely due to standby systems and minimal lighting.
● Warmer weather leads to increased energy use, with each degree Celsius rise predicted to consume an additional 1.5 kWh, highlighting the importance of temperature control.
● More occupants translate into higher energy needs, with each person adding 2 kWh to the predicted consumption, prompting optimization of shared spaces and equipment.
● Buildings with higher efficiency ratings consume less energy, with each unit increase in rating potentially saving 3 kWh, underlining the value of energy-efficient upgrades.
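The same back-of-the-envelope check works here; the temperature, occupancy, and rating values below are made up for illustration:
Python
# Hypothetical energy model: intercept plus one term per feature
base_kwh = 50        # baseline consumption
per_degree = 1.5     # kWh per degree Celsius
per_person = 2       # kWh per occupant
per_rating = -3      # kWh per efficiency-rating unit

temperature, occupants, rating = 22, 30, 4      # made-up example day
energy_kwh = base_kwh + per_degree * temperature + per_person * occupants + per_rating * rating
print(f"Predicted consumption: {energy_kwh:.1f} kWh")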
Remember, these are simplified examples, and real-world
applications might involve more complex models and interpretations.
It's crucial to consider domain knowledge, potential biases, and
context when drawing conclusions. By effectively interpreting
parameters, you can translate data-driven insights into actionable
strategies for improved advertising campaigns, optimized energy
management, or other domains you explore.
6.3 Beyond Linear Regression
While linear regression is a powerful tool, it's not the only treasure
map at your disposal. Let's explore two other popular probabilistic
models that unlock new prediction possibilities: logistic regression
and Naive Bayes.
1. Logistic Regression: Unveiling the Secrets of Classifications
Imagine you're sorting emails into spam or inbox folders. Linear
regression wouldn't suffice, as it predicts continuous values (like
price). Enter logistic regression, a model specializing in
classifications. It estimates the probability of a data point belonging
to a specific category, just like deciding whether an email is spam
(category 1) or not (category 2).
Key Concepts:
● Sigmoid function: This S-shaped curve transforms the linear model's output into a probability between 0 and 1, representing the likelihood of belonging to each category (see the short sketch after this list).
● Decision boundary: This line separates the data points classified into different categories based on the probability threshold (e.g., 0.5 to weight both categories equally).
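To make the sigmoid and the probability threshold concrete, here is a tiny, dataset-free sketch; the linear score is just an example number:
Python
import numpy as np

def sigmoid(z):
    """Squash a linear score z into a probability between 0 and 1."""
    return 1 / (1 + np.exp(-z))

linear_score = 1.3                       # example output of b0 + b1*x1 + b2*x2 + ...
probability = sigmoid(linear_score)      # about 0.79
label = 1 if probability >= 0.5 else 0   # 0.5 decision threshold
print(f"P(category 1) = {probability:.2f} -> predicted category: {label}")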
Example:
Predicting customer churn (yes/no) based on purchase history and
demographics. The model estimates the probability of each
customer churning, helping you identify those at high risk and
implement targeted retention strategies.
2. Naive Bayes: Simple Yet Effective Predictions
For quick and efficient predictions, consider Naive Bayes. This
model assumes features (like product features or email words) are
independent in influencing the outcome (like product purchase or
spam classification). While this assumption might not always hold
true, Naive Bayes often delivers surprisingly accurate results,
especially for large datasets.
Key Concepts:
● Conditional probabilities: The model estimates the probability of each category given the combination of feature values (e.g., the probability of spam given the presence of certain words).
● Bayes' theorem: This formula combines these probabilities with prior class probabilities to calculate the overall probability of a data point belonging to a particular category.
Classifying emails as spam based on the presence of specific
keywords is one example. The model calculates the probability of
spam given each keyword and combines them to determine the
overall spam probability for each email, aiding effective email
filtering.
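Here is a minimal sketch of such a keyword-based spam filter using scikit-learn's MultinomialNB; the four tiny example emails are made up purely for illustration:
Python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up emails and labels (1 = spam, 0 = not spam)
emails = ["win a free prize now", "meeting agenda for monday",
          "free money claim your prize", "lunch tomorrow at noon"]
labels = [1, 0, 1, 0]

# Represent each email by its keyword counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Bayes' theorem under the hood: P(spam | words) is proportional to P(words | spam) * P(spam)
nb_model = MultinomialNB()
nb_model.fit(X, labels)

new_email = vectorizer.transform(["claim your free prize"])
print("P(not spam), P(spam):", nb_model.predict_proba(new_email)[0])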
Keep in mind that each model has its strengths and limitations.
Choose the one that best aligns with your prediction task
(classification vs. continuous values) and data characteristics
(e.g., feature independence).
Code Samples:
Logistic Regression: Unveiling Classifications
Example: Predicting Email Spam
Imagine classifying emails as spam (category 1) or not spam
(category 2) based on features like sender address, keywords, and
presence of attachments. Here's a breakdown:
1. Data Preparation: Load your email data, representing each email
as a vector of features (e.g., word counts, presence of URLs). Label
each email as spam (1) or not spam (0).
2. Model Training: Use a logistic regression library like scikit-learn
in Python. Train the model on your labeled data, learning the
relationship between features and spam probability.
3. Prediction: For a new email, extract its feature vector. Feed the
vector to the trained model to get a probability score between 0 and
1. Set a threshold (e.g., 0.5) to classify: probability above the
threshold is spam, below is not spam.
Code Snippet (Conceptual)
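The original snippet isn't reproduced at this point, so below is a minimal sketch of the three steps with scikit-learn; the small feature matrix and labels are placeholders you would replace with your real email data:
Python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder feature vectors (e.g., [count of "free", count of "prize", has attachment])
# and labels (1 = spam, 0 = not spam) -- replace with your own prepared email data
X = np.array([[3, 1, 1], [0, 0, 0], [2, 2, 1], [0, 1, 0]])
y = np.array([1, 0, 1, 0])

# Step 2: train logistic regression on the labeled emails
clf = LogisticRegression()
clf.fit(X, y)

# Step 3: score a new email and apply a 0.5 probability threshold
new_email = np.array([[1, 1, 1]])
p_spam = clf.predict_proba(new_email)[0, 1]
print("P(spam):", round(p_spam, 2), "->", "spam" if p_spam >= 0.5 else "not spam")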
1. Loading Data from a CSV File
Python
# Import libraries
import pandas as pd
# Load data from CSV file
data = pd.read_csv("customer_data.csv")
# Explore the data
print(data.head())
print(data.info())
● This code imports the Pandas library for data manipulation.
● It reads the customer data from a CSV file.
● The head() method displays the first few rows to get a glimpse of the data.
● The info() method provides information about data types, missing values, etc.
2. Public Datasets (using Kaggle API)
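The download code itself isn't shown here; a minimal sketch, assuming the kaggle package is installed and an API token is configured in ~/.kaggle/kaggle.json (the dataset slug and file name below are placeholders):
Python
import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate with the Kaggle API and download a public dataset
api = KaggleApi()
api.authenticate()
api.dataset_download_files("owner/dataset-name", path="data/", unzip=True)  # placeholder slug

# Explore the downloaded file just like the CSV example above
data = pd.read_csv("data/your_file.csv")  # placeholder file name
print(data.head())
The next snippet then moves on to engineering new features from the loaded data.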
Python
# Create a new feature (e.g., purchase frequency per month)
data["purchase_freq_month"] = data["purchases"] / data["months_since_first_purchase"]

# Combine features (e.g., customer location and product category)
data["location_category"] = data["location"].str.cat(data["category"], sep="_")
This code demonstrates creating a new feature based on existing
ones and combining features into a new categorical variable. Adapt
these examples to create features relevant to your specific problem
and domain knowledge.
3. Choosing and Tuning Hyperparameters (Example with Scikit-learn)
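The tuning code itself isn't shown here; a minimal sketch using GridSearchCV on a random forest, where X_train, y_train, and the parameter grid are illustrative assumptions:
Python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate hyperparameter values to try (adjust for your problem)
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [5, 10, None],
}

# Cross-validated search over the grid
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
rf_model = search.best_estimator_   # tuned model used by the SHAP snippet below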
Python
import shap
# Use SHAP to explain model predictions and visualize feature importances
explainer = shap.TreeExplainer(rf_model)
shap_values = explainer(X_val)
# Visualize feature importances with a SHAP summary plot
shap.summary_plot(shap_values, X_val)
This code showcases using the SHAP library to explain model
predictions and visualize feature importances. It creates a
TreeExplainer object based on the trained model, calculates SHAP
values for the validation set, and finally generates a summary plot
visualizing the impact of each feature.
Phase 3: Refinement Dojo
1. Hyperparameter Tuning Continued (Scikit-learn example)
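The original snippet for this step isn't shown here; one way to continue tuning is a randomized search over a wider space, sketched below with RandomizedSearchCV (the ranges and the X_train, y_train names are assumptions carried over from the previous phase):
Python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Wider hyperparameter ranges sampled at random instead of exhaustively
param_distributions = {
    "n_estimators": [100, 200, 300, 500],
    "max_depth": [3, 5, 10, 20, None],
    "min_samples_leaf": [1, 2, 5, 10],
}

random_search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                                   param_distributions, n_iter=20, cv=5, random_state=42)
random_search.fit(X_train, y_train)
print("Best hyperparameters found:", random_search.best_params_)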
Data Preparation:
1. Import Libraries
Python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split
2. Load Data
Python
# Load your movie rating data (user ID, movie ID, rating)
data = np.loadtxt("movie_ratings.txt")
# Extract user IDs, movie IDs, and ratings
user_ids = data[:, 0].astype(int)
movie_ids = data[:, 1].astype(int)
ratings = data[:, 2].astype(float)
# Create a sparse matrix for efficient handling of large datasets
ratings_matrix = csr_matrix((ratings, (user_ids, movie_ids)))
Model Training:
1. Define PMF Function
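The book's PMF code isn't reproduced at this point, so here is a minimal sketch of what such a function might look like: a MAP-style PMF fit with stochastic gradient descent over the observed entries of ratings_matrix. The number of latent factors, learning rate, and regularization strength are illustrative values you would tune:
Python
import numpy as np

def train_pmf(ratings_matrix, n_factors=10, n_epochs=20, lr=0.005, reg=0.1, seed=42):
    """Fit PMF by stochastic gradient descent on the observed ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_movies = ratings_matrix.shape

    # Latent factor matrices, initialized with small random values
    U = 0.1 * rng.standard_normal((n_users, n_factors))    # user factors
    V = 0.1 * rng.standard_normal((n_movies, n_factors))   # movie factors

    coo = ratings_matrix.tocoo()   # iterate over observed (non-zero) ratings only
    for epoch in range(n_epochs):
        squared_error = 0.0
        for u, m, r in zip(coo.row, coo.col, coo.data):
            error = r - U[u] @ V[m]
            squared_error += error ** 2
            # Gradient steps with L2 regularization (Gaussian priors on the factors)
            old_U_u = U[u].copy()
            U[u] += lr * (error * V[m] - reg * U[u])
            V[m] += lr * (error * old_U_u - reg * V[m])
        print(f"Epoch {epoch + 1}: training RMSE = {np.sqrt(squared_error / coo.nnz):.4f}")
    return U, V

# The predicted rating for user u and movie m is then U[u] @ V[m]
U, V = train_pmf(ratings_matrix)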