Bayesian Analysis with Python
Third Edition
A practical guide to probabilistic modeling
Osvaldo Martin
BIRMINGHAM—MUMBAI
Bayesian Analysis with Python
Third Edition
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Lead Senior Publishing Product Manager: Tushar Gupta
Acquisition Editor – Peer Reviews: Bethany O’Connell
Project Editor: Namrata Katare
Development Editor: Tanya D’cruz
Copy Editor: Safis Editing
Technical Editor: Aniket Shetty
Indexer: Rekha Nair
Proofreader: Safis Editing
Presentation Designer: Pranit Padwal
Developer Relations Marketing Executive: Monika Sangwan
First published: November 2016
Second edition: December 2018
Third edition: January 2024
Production reference: 2300724
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham B3 1RB, UK.
ISBN 978-1-80512-716-1
www.packt.com
In gratitude to my family: Romina, Abril, and Bruno.
Foreword
As we present this new edition of Bayesian Analysis with Python, it’s essential to recognize the profound impact this book has had on advancing the growth and education of the probabilistic programming user community. The journey from its first publication to this current edition mirrors the evolution of Bayesian modeling itself – a path marked by significant advancements, growing community involvement, and an increasing presence in both academia and industry.
The field of probabilistic programming is in a different place today than it was when the first edition was devised in the middle of the last decade. As long-term practitioners, we have seen firsthand how Bayesian methods grew from a fringe methodology to the primary way of solving some of the most advanced problems in science and various industries. This trend is supported by the continued development of advanced, performant, high-level tools such as PyMC. With this comes a growing number of new applied users, many of whom have limited experience with Bayesian methods, PyMC, or the underlying libraries that probabilistic programming packages increasingly rely on to accelerate computation. In this context, this new edition comes at the perfect time to introduce the next generation of data scientists to this increasingly powerful methodology.
Osvaldo Martin, a teacher, applied statistician, and long-time core PyMC developer, is the perfect guide to help readers navigate this complex landscape. He provides a clear, concise, and comprehensive introduction to Bayesian methods and the PyMC library, and he walks readers through a variety of real-world examples. As the population of data scientists using probabilistic programming grows, it is important to instill in them good habits and a sound workflow; Dr. Martin here provides sound, engaging guidance for doing so.
What makes this book a go-to reference is its coverage of most of the key questions posed by applied users: How do I express my problem as a probabilistic program? How do I know if my model is working? How do I know which model is best? Herein you will find a primer on Bayesian best practices, updated to current standards based on methodological improvements since the release of the last edition. This includes innovations related to the PyMC library itself, which has come a long way since PyMC3, much to the benefit of you, the end-user.
Complementing these improvements is the expansion of the PyMC ecosystem, a reflection of the broadening scope and capabilities of Bayesian modeling. This edition includes discussions on four notable new libraries: Bambi, Kulprit, PreliZ, and PyMC-BART. These additions, along with the continuous refinement of text and code, ensure that readers are equipped with the latest tools and methodologies in Bayesian analysis. This edition is not just an update but a significant step forward in the journey of probabilistic programming, mirroring the dynamic evolution of PyMC and its community.
The previous two editions of this book have been cornerstones for many in understanding and applying Bayesian methods. Each edition, including this latest one, has evolved to incorporate new developments, making it an indispensable resource for both newcomers and experienced practitioners. As PyMC continues to evolve (perhaps even to newer versions by the time this book is read), the content here remains relevant, providing foundational knowledge and insights into the latest advancements. In this edition, readers will find not only a comprehensive introduction to Bayesian analysis but also a window into the cutting-edge techniques that are currently shaping the field. We hope this book serves as both a guide and an inspiration, showcasing the power and flexibility of Bayesian modeling in addressing complex data-driven challenges.
As co-authors of this foreword, we are excited about the journey that lies ahead for readers of this book. You are joining a vibrant, ever-expanding community of enthusiasts and professionals who are pushing the boundaries of what’s possible in data analysis. We trust that this book will be a valuable companion in your exploration of Bayesian modeling and a catalyst for your own contributions to this dynamic field.
Christopher Fonnesbeck, PyMC’s original author and Principal Quantitative Analyst for the Philadelphia Phillies
Thomas Wiecki, CEO & Founder of PyMC Labs
Contributors
About the author
Osvaldo Martin is a researcher at The National Scientific and Technical Research Council (CONICET) in Argentina. He has worked on structural bioinformatics of biomolecules and has used Markov Chain Monte Carlo methods to simulate molecular systems. He is currently working on computational methods for Bayesian statistics and probabilistic programming. He has taught courses about structural bioinformatics, data science, and Bayesian data analysis. He was also the head of the organizing committee of PyData San Luis (Argentina) 2017, the first PyData in Latin America. He has contributed to many open-source projects, including ArviZ, Bambi, Kulprit, PreliZ, and PyMC.
I would like to thank Romina for her continuous support. I also want to thank Tomás Capretto, Alejandro Icazatti, Juan Orduz, and Bill Engels for providing invaluable feedback and suggestions on my drafts. A special thanks go to the core developers and all contributors of the Python packages used in this book. Their dedication, love, and hard work have made this book possible.
About the reviewer
Joon (Joonsuk) Park is a former quantitative psychologist and currently a machine learning engineer. He graduated from The Ohio State University with a PhD in Quantitative Psychology in 2019. His research during graduate study focused on the applications of Bayesian statistics to cognitive modeling and behavioral research methodology. He transitioned into industry data science and has worked as a data scientist since 2020. He has also published several books on psychology, statistics, and data science in Korean.
Join our community Discord space
Join our Discord community to meet like-minded people and learn alongside more than 5000 members at: https://packt.link/bayesian
Table of Contents
Preface
Thinking Probabilistically
1.1 Statistics, models, and this book’s approach
1.2 Working with data
1.3 Bayesian modeling
1.4 A probability primer for Bayesian practitioners
1.4.1 Sample space and events
1.4.2 Random variables
1.4.3 Discrete random variables and their distributions
1.4.4 Continuous random variables and their distributions
1.4.5 Cumulative distribution function
1.4.6 Conditional probability
1.4.7 Expected values
1.4.8 Bayes’ theorem
1.5 Interpreting probabilities
1.6 Probabilities, uncertainty, and logic
1.7 Single-parameter inference
1.7.1 The coin-flipping problem
1.7.2 Choosing the likelihood
1.7.3 Choosing the prior
1.7.4 Getting the posterior
1.7.5 The influence of the prior
1.8 How to choose priors
1.9 Communicating a Bayesian analysis
1.9.1 Model notation and visualization
1.9.2 Summarizing the posterior
1.10 Summary
1.11 Exercises
Programming Probabilistically
2.1 Probabilistic programming
2.1.1 Flipping coins the PyMC way
2.2 Summarizing the posterior
2.3 Posterior-based decisions
2.3.1 Savage-Dickey density ratio
2.3.2 Region Of Practical Equivalence
2.3.3 Loss functions
2.4 Gaussians all the way down
2.4.1 Gaussian inferences
2.5 Posterior predictive checks
2.6 Robust inferences
2.6.1 Degrees of normality
2.6.2 A robust version of the Normal model
2.7 InferenceData
2.8 Groups comparison
2.8.1 The tips dataset
2.8.2 Cohen’s d
2.8.3 Probability of superiority
2.8.4 Posterior analysis of mean differences
2.9 Summary
2.10 Exercises
Hierarchical Models
3.1 Sharing information, sharing priors
3.2 Hierarchical shifts
3.3 Water quality
3.4 Shrinkage
3.5 Hierarchies all the way up
3.6 Summary
3.7 Exercises
Modeling with Lines
4.1 Simple linear regression
4.2 Linear bikes
4.2.1 Interpreting the posterior mean
4.2.2 Interpreting the posterior predictions
4.3 Generalizing the linear model
4.4 Counting bikes
4.5 Robust regression
4.6 Logistic regression
4.6.1 The logistic model
4.6.2 Classification with logistic regression
4.6.3 Interpreting the coefficients of logistic regression
4.7 Variable variance
4.8 Hierarchical linear regression
4.8.1 Centered vs. noncentered hierarchical models
4.9 Multiple linear regression
4.10 Summary
4.11 Exercises
Comparing Models
5.1 Posterior predictive checks
5.2 The balance between simplicity and accuracy
5.2.1 Many parameters (may) lead to overfitting
5.2.2 Too few parameters lead to underfitting
5.3 Measures of predictive accuracy
5.3.1 Information criteria
5.3.2 Cross-validation
5.4 Calculating predictive accuracy with ArviZ
5.5 Model averaging
5.6 Bayes factors
5.6.1 Some observations
5.6.2 Calculation of Bayes factors
5.7 Bayes factors and inference
5.8 Regularizing priors
5.9 Summary
5.10 Exercises
Modeling with Bambi
6.1 One syntax to rule them all
6.2 The bikes model, Bambi’s version
6.3 Polynomial regression
6.4 Splines
6.5 Distributional models
6.6 Categorical predictors
6.6.1 Categorical penguins
6.6.2 Relation to hierarchical models
6.7 Interactions
6.8 Interpreting models with Bambi
6.9 Variable selection
6.9.1 Projection predictive inference
6.9.2 Projection predictive with Kulprit
6.10 Summary
6.11 Exercises
Mixture Models
7.1 Understanding mixture models
7.2 Finite mixture models
7.2.1 The Categorical distribution
7.2.2 The Dirichlet distribution
7.2.3 Chemical mixture
7.3 The non-identifiability of mixture models
7.4 How to choose K
7.5 Zero-Inflated and hurdle models
7.5.1 Zero-Inflated Poisson regression
7.5.2 Hurdle models
7.6 Mixture models and clustering
7.7 Non-finite mixture model
7.7.1 Dirichlet process
7.8 Continuous mixtures
7.8.1 Some common distributions are mixtures
7.9 Summary
7.10 Exercises
Gaussian Processes
8.1 Linear models and non-linear data
8.2 Modeling functions
8.3 Multivariate Gaussians and functions
8.3.1 Covariance functions and kernels
8.4 Gaussian processes
8.5 Gaussian process regression
8.6 Gaussian process regression with PyMC
8.6.1 Setting priors for the length scale
8.7 Gaussian process classification
8.7.1 GPs for space flu
8.8 Cox processes
8.8.1 Coal mining disasters
8.8.2 Red wood
8.9 Regression with spatial autocorrelation
8.10 Hilbert space GPs
8.10.1 HSGP with Bambi
8.11 Summary
8.12 Exercises
Bayesian Additive Regression Trees
9.1 Decision trees
9.2 BART models
9.2.1 Bartian penguins
9.2.2 Partial dependence plots
9.2.3 Individual conditional plots
9.2.4 Variable selection with BART
9.3 Distributional BART models
9.4 Constant and linear response
9.5 Choosing the number of trees
9.6 Summary
9.7 Exercises
Inference Engines
10.1 Inference engines
10.2 The grid method
10.3 Quadratic method
10.4 Markovian methods
10.4.1 Monte Carlo
10.4.2 Markov chain
10.4.3 Metropolis-Hastings
10.4.4 Hamiltonian Monte Carlo
10.5 Sequential Monte Carlo
10.6 Diagnosing the samples
10.7 Convergence
10.7.1 Trace plot
10.7.2 Rank plot
10.7.3 R̂ (R hat)
10.8 Effective Sample Size (ESS)
10.9 Monte Carlo standard error
10.10 Divergences
10.11 Keep calm and keep trying
10.12 Summary
10.13 Exercises
Where to Go Next
Bibliography
Other Books You May Enjoy
Index
Preface
Bayesian statistics has been developing for more than 250 years. During this time, it has enjoyed as much recognition and appreciation as it has faced disdain and contempt. Throughout the last few decades, it has gained more and more attention from people in statistics and almost all the other sciences, engineering, and even outside the boundaries of the academic world. This revival has been possible due to theoretical and computational advancements developed mostly throughout the second half of the 20th century. Indeed, modern Bayesian statistics is mostly computational statistics. The necessity for flexible and transparent models and a more intuitive interpretation of statistical models and analysis has only contributed to the trend.
In this book, our focus will be on a practical approach to Bayesian statistics and we will not delve into discussions about the frequentist approach or its connection to Bayesian statistics. This decision is made to maintain a clear and concise focus on the subject matter. If you are interested in that perspective, Doing Bayesian Data Analysis may be the book for you [Kruschke, 2014]. We also avoid philosophical discussions, not because they are not interesting or relevant, but because this book aims to be a practical guide to Bayesian data analysis. One good reading for such discussion is Clayton [2021].
We follow a modeling approach to statistics. We will learn how to think in terms of probabilistic models and apply Bayes’ theorem to derive the logical consequences of our models and data. The approach will also be computational; models will be coded using PyMC [Abril-Pla et al., 2023] and Bambi [Capretto et al., 2022]. These are libraries for Bayesian statistics that hide most of the mathematical details and computations from the user. We will then use ArviZ [Kumar et al., 2019], a Python package for exploratory analysis of Bayesian models, to better understand our results. We will also be assisted by other libraries in the Python ecosystem, including PreliZ [Icazatti et al., 2023] for prior elicitation, Kulprit for variable selection, and PyMC-BART [Quiroga et al., 2022] for flexible regression. And of course, we will use common tools from the standard Python data stack, like NumPy [Harris et al., 2020], matplotlib [Hunter, 2007], Pandas [Wes McKinney, 2010], and so on.
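To give a first taste of this computational approach, here is a minimal sketch: a PyMC version of the coin-flipping model developed in Chapters 1 and 2, summarized with ArviZ. The data here are synthetic, invented purely for illustration.
import arviz as az
import numpy as np
import pymc as pm
# Synthetic data: 10 coin flips, where 1 means heads (made up for illustration)
flips = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0])
with pm.Model() as coin_model:
    # Prior over the probability of heads
    θ = pm.Beta("θ", alpha=1, beta=1)
    # Likelihood of the observed flips
    y = pm.Bernoulli("y", p=θ, observed=flips)
    # Approximate the posterior with MCMC
    idata = pm.sample()
# Posterior summary and sampling diagnostics with ArviZ
az.summary(idata)
That is the whole workflow in miniature: specify a model, let the library compute the posterior, and then explore the results.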
Bayesian methods are theoretically grounded in probability theory, so it’s no wonder that many books about Bayesian statistics are full of mathematical formulas requiring a certain level of mathematical sophistication. Learning the mathematical foundations of statistics will certainly help you build better models and gain intuition about problems, models, and results. Nevertheless, libraries such as PyMC allow us to learn and do Bayesian statistics with only a modest amount of mathematical knowledge, as you will be able to verify for yourself throughout this book.
Who this book is for
If you are a student, data scientist, researcher in the natural or social sciences, or developer looking to get started with Bayesian data analysis and probabilistic programming, this book is for you. The book is introductory, so no previous statistical knowledge is required. However, the book assumes you have experience with Python and familiarity with libraries like NumPy and matplotlib.
What this book covers
Chapter 1, Thinking Probabilistically, covers the basic concepts of Bayesian statistics and its implications for data analysis. This chapter contains most of the foundational ideas used in the rest of the book.
Chapter 2, Programming Probabilistically, revisits the concepts from the previous chapter from a more computational perspective. It introduces PyMC, a probabilistic programming library, and ArviZ, a Python library for exploratory analysis of Bayesian models.
Chapter 3, Hierarchical Models, illustrates the core ideas of hierarchical models through examples.
Chapter 4, Modeling with Lines, covers the basic elements of linear regression, a very widely used model and the building block of more complex models, and then moves into generalizing linear models to solve many data analysis problems.
Chapter 5, Comparing Models, discusses how to compare and select models using posterior predictive checks, LOO, and Bayes factors. The general caveats of these methods are discussed and model averaging is also illustrated.
Chapter 6, Modeling with Bambi, introduces Bambi, a Bayesian library built on top of PyMC that simplifies working with generalized linear models. In this chapter, we will also discuss variable selection and new models like splines.
Chapter 7, Mixture Models, discusses how to add flexibility to models by mixing simpler distributions to build more complex ones. The first non-parametric model in the book is also introduced: the Dirichlet process.
Chapter 8, Gaussian Processes, covers the basic idea behind Gaussian processes and how to use them to build non-parametric models over functions for a wide array of problems.
Chapter 9, Bayesian Additive Regression Trees, introduces readers to a flexible regression model that combines decision trees and Bayesian modeling techniques. The chapter will cover the key features of BART, including its flexibility in capturing non-linear relationships between predictors and outcomes and how it can be used for variable selection.
Chapter 10, Inference Engines, provides an introduction to methods for numerically approximating the posterior distribution, as well as a very important topic from the practitioner’s perspective: how to diagnose the reliability of the approximated posterior.
Chapter 11, Where to Go Next?, provides a list of resources to keep learning from beyond this book, and a concise farewell speech.
What’s new in this edition?
We have incorporated feedback from readers of the second edition to refine the text and the code in this third edition, to improve clarity and readability. We have also added new examples and new sections and removed some sections that in retrospect were not that useful.
In the second edition, we extensively used PyMC and ArviZ. In this new edition, we use the latest versions of PyMC and ArviZ available at the time of writing and showcase some of their new features. This new edition also reflects how the PyMC ecosystem has bloomed in the last few years. We discuss four new libraries, two of which are sketched briefly after this list:
Bambi, a library for Bayesian regression models with a very simple interface. We dedicate a chapter to it.
Kulprit, a very new library for variable selection built on top of Bambi. We show one example of how to use it and provide the intuition for the theory behind this package.
PreliZ, a library for prior elicitation. We use it from Chapter 1 onward.
PyMC-BART, a library that extends PyMC to support Bayesian Additive Regression Trees. We dedicate a chapter to it.
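As a rough taste of two of these libraries, here is a minimal sketch using synthetic data; the distribution bounds, formula, and DataFrame below are invented for illustration, and the dedicated chapters cover real usage in detail.
import bambi as bmb
import numpy as np
import pandas as pd
import preliz as pz
# PreliZ: elicit a prior by finding the maximum-entropy Beta distribution
# with 90% of its mass between 0.3 and 0.7 (bounds chosen for illustration).
# maxent updates the passed distribution in place with the optimized parameters.
prior = pz.Beta()
pz.maxent(prior, 0.3, 0.7, 0.9)
# Bambi: a linear regression specified with a one-line formula,
# fitted to a synthetic DataFrame
rng = np.random.default_rng(42)
df = pd.DataFrame({"x": rng.normal(size=100)})
df["y"] = 2 + 3 * df["x"] + rng.normal(size=100)
model = bmb.Model("y ~ x", df)
idata = model.fit()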
The following list delineates the changes introduced in the third edition as compared to the second edition.
Chapter 1, Thinking Probabilistically: We have added a new introduction to probability theory. This is something many readers asked for. The introduction is not meant to replace a proper course in probability theory, but it should be enough to get you started.
Chapter 2, Programming Probabilistically: We discuss the Savage-Dickey density ratio (also discussed in Chapter 5). We explain the InferenceData object from ArviZ and how to use coords and dims with PyMC and ArviZ. We moved the section on hierarchical models to its own chapter, Chapter 3.
Chapter 3, Hierarchical Models: We have promoted the discussion of hierarchical models to its own dedicated chapter. We refine the discussion of hierarchical models and add a new example, for which we use a dataset from European football leagues.
Chapter 4, Modeling with Lines: This chapter has been extensively rewritten. We use the Bikes dataset to introduce both simple linear regression and negative binomial regression. Generalized linear models (GLMs) are introduced early in this chapter (in the previous edition they were introduced in another chapter). This helps you see the connection between linear regression and GLMs and allows us to introduce more advanced concepts in Chapter 6. We discuss the centered vs. non-centered parametrization of linear models.
Chapter 5, Comparing Models: We have cleaned up the text to make it clearer and removed some bits that were not that useful after all. We now recommend the use of LOO over WAIC. We have added a discussion of the Savage-Dickey density ratio to compute Bayes factors.
Chapter 6, Modeling with Bambi: We show you how to use Bambi, a high-level Bayesian model-building interface written in Python. We take advantage of the simple syntax offered by Bambi to expand what we learned in Chapter 4, including splines, distributional models, categorical models, and interactions. We also show how Bambi can help us interpret complex linear models that otherwise can become confusing, error-prone, or just time-consuming. We close the chapter by discussing variable selection with Kulprit, a Python package that tightly integrates with Bambi.
Chapter 7, Mixture Models: We have clarified some of the discussions based on feedback from readers. We also discuss Zero-Inflated and hurdle models and show how to use rootograms to evaluate the fit of discrete models.
Chapter 8, Gaussian Processes: We have cleaned up the text to make the explanations clearer and removed some of the boilerplate code and text for a more fluid reading. We also discuss how to define a kernel with a custom distance instead of the default Euclidean distance. We discuss the practical application of Hilbert space Gaussian processes, a fast approximation to Gaussian processes.
Chapter 9, Bayesian Additive Regression Trees: This is an entirely new chapter discussing BART models, a flexible and easy-to-use non-parametric Bayesian method.
Chapter 10, Inference Engines: We have removed the discussion of variational inference as it is not used in the book. We have updated and expanded the discussion of trace plots, R̂, ESS, and MCSE. We also include a discussion of rank plots and a better example of divergences and centered vs. non-centered parameterizations.
Installation instructions
The code in the book was written using Python version 3.11.6. To install Python and Python libraries, I recommend using Anaconda, a scientific computing distribution. You can read more about Anaconda and download it at https://www.anaconda.com/products/distribution. This will install many useful Python packages on your system.
Additionally, you will need to install some packages. To do that, please use:
conda install -c conda-forge pymc==5.8.0 arviz==0.16.1 bambi==0.13.0 \
    pymc-bart==0.5.2 kulprit==0.0.1 preliz==0.3.6 nutpie==0.9.1
You can also use pip if you prefer:
pip install pymc==5.8.0 arviz==0.16.1 bambi==0.13.0 pymc-bart==0.5.2 \
    kulprit==0.0.1 preliz==0.3.6 nutpie==0.9.1
An alternative way to install the necessary packages once Anaconda is installed in your system is to go to https://github.com/aloctavodia/BAP3 and download the environment file named bap3.yml. With it, you can install all the necessary packages using the following command:
conda env create -f bap3.yml
The Python packages used to write this book are listed here:
ArviZ 0.16.1
Bambi 0.13.0
Kulprit 0.0.1
PreliZ 0.3.6
PyMC 5.8.0
PyMC-BART 0.5.2
Python 3.11.6
Notebook 7.0.6
Matplotlib 3.8.0
NumPy 1.24.4
Numba 0.58.1
Nutpie 0.9.1
SciPy 1.11.3
Pandas 2.1.2
Xarray 2023.10.1
How to run the code while reading
The code presented in each chapter assumes that you have imported at least some of these packages. Instead of copying and pasting the code from the book, I recommend downloading the code from https://github.com/aloctavodia/BAP3 and running it using Jupyter Notebook (or Jupyter Lab). Additionally, most figures in this book are generated using code that is present in the notebooks but not always shown in the book.
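For reference, a typical preamble at the top of a chapter notebook might look like the following sketch; the exact set of imports varies from chapter to chapter, and the aliases follow the usual community conventions.
# Common imports assumed by the code snippets; not every chapter uses all of them
import arviz as az
import bambi as bmb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import preliz as pz
import pymc as pm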
If you find a technical problem while running the code in this book, a typo in the text, or any other error, please file an issue at https://github.com/aloctavodia/BAP3 and I will try to resolve it as soon as possible.
Conventions used
There are several text conventions used throughout this book.
code_in_text: Indicates code words in the text, filenames, or names of functions. Here is an example: Most of the preceding code is for plotting; the probabilistic part is performed by the y = stats.norm(mu, sd).pdf(x) line.
A block of code is set as follows:
Code 1
μ = 0.
σ = 1.
X = pz.Normal(μ, σ)
x = X.rvs(3)
Bold: Indicates a new term, or an important word.
Italics: Indicates a less rigorous or colloquial use of a term.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you open an issue ticket at https://github.com/aloctavodia/BAP3
Becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
For more information about Packt, please visit https://www.packtpub.com/.
Share your thoughts
Once you’ve read Bayesian Analysis with Python, Third Edition, we’d love to hear your thoughts! Scan the QR code below to go straight to the Amazon review page for this book and share your feedback.
https://packt.link/r/1805127160
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Download a free PDF copy of this book
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere? Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.