Introduction To Conformal Prediction With Python
A Short Guide for Quantifying Uncertainty of Machine Learning Models

Christoph Molnar
© 2023 Christoph Molnar, Germany, Munich
christophmolnar.com
For more information about permission to reproduce selections from this book,
write to [email protected].
2023, First Edition
Christoph Molnar
c/o MUCBOOK, Heidi Seibold
Elsenheimerstraße 48
80687 München, Germany

commit id: 7319978


Content

1 Summary 7

2 Preface 9

3 Who This Book Is For 10

4 Introduction to Conformal Prediction 11


4.1 We need uncertainty quantification . . . . . . . . . . . . . . . . . 11
4.2 Uncertainty has many sources . . . . . . . . . . . . . . . . . . . . 12
4.3 Distinguish good from bad predictions . . . . . . . . . . . . . . . 13
4.4 Other approaches don’t have guaranteed coverage . . . . . . . . . 15
4.5 Conformal prediction fills the gap . . . . . . . . . . . . . . . . . . 16

5 Getting Started with Conformal Prediction in Python 19


5.1 Installing the software . . . . . . . . . . . . . . . . . . . . . . . . 19
5.2 Let’s classify some beans . . . . . . . . . . . . . . . . . . . . . . . 20
5.3 First try: a naive approach . . . . . . . . . . . . . . . . . . . . . . 23
5.4 Second try: conformal classification . . . . . . . . . . . . . . . . . 24
5.5 Getting started with MAPIE . . . . . . . . . . . . . . . . . . . . 28

6 Intuition Behind Conformal Prediction 33


6.1 Conformal prediction is a recipe . . . . . . . . . . . . . . . . . . . 37
6.2 Understand parallels to out-of-sample evaluation . . . . . . . . . . 38
6.3 How to interpret prediction regions and coverage . . . . . . . . . 41
6.4 Conformal prediction and supervised learning . . . . . . . . . . . 41

7 Classification 43
7.1 Back to the beans . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.2 The naive method doesn’t work . . . . . . . . . . . . . . . . . . . 45
7.3 The Score method is simple but not adaptive . . . . . . . . . . . 46

7.4 Use Adaptive Prediction Sets (APS) for conditional coverage . . . 51
7.5 Top-k method for fixed size sets . . . . . . . . . . . . . . . . . . . 58
7.6 Regularized APS (RAPS) for small sets . . . . . . . . . . . . . . . 59
7.7 Group-balanced conformal prediction . . . . . . . . . . . . . . . . 61
7.8 Class-Conditional APS (CCAPS) for coverage by class . . . . . . 63
7.9 Guide for choosing a conformal classification method . . . . . . . 64

8 Regression and Quantile Regression 65


8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
8.2 Rent Index Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.3 Conformalized Mean Regression . . . . . . . . . . . . . . . . . . . 66
8.4 Conformalized Quantile Regression (CQR) . . . . . . . . . . . . . 75

9 A Glimpse Beyond Classification and Regression 83


9.1 Quickly categorize conformal prediction by task and score . . . . 83
9.2 Time Series Forecasting . . . . . . . . . . . . . . . . . . . . . . . 85
9.3 Multi-Label Classification . . . . . . . . . . . . . . . . . . . . . . 85
9.4 Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
9.5 Probability Calibration . . . . . . . . . . . . . . . . . . . . . . . . 88
9.6 And many more tasks . . . . . . . . . . . . . . . . . . . . . . . . 88
9.7 How to stay up to date . . . . . . . . . . . . . . . . . . . . . . . . 89

10 Design Your Own Conformal Predictor 90


10.1 Steps to build your own conformal predictor . . . . . . . . . . . . 90
10.2 Finding the right non-conformity score . . . . . . . . . . . . . . . 91
10.3 Start with a heuristic notion of uncertainty . . . . . . . . . . . . . 92
10.4 A general recipe for 1D uncertainty heuristics . . . . . . . . . . 92
10.5 Metrics for evaluating conformal predictors . . . . . . . . . . . . . 93

11 Q & A 95
11.1 How do I choose the calibration size? . . . . . . . . . . . . . . . . 95
11.2 How do I make conformal prediction reproducible? . . . . . . . . 95
11.3 How does alpha affect the size of the prediction regions? . . . . . 95
11.4 What happens if I choose a large 𝛼 for conformal classification? . 96
11.5 How to interpret empty prediction sets? . . . . . . . . . . . . . . 96
11.6 Can I use the same data for calibration and model evaluation? . . 96
11.7 What if I find errors in the book or want to provide feedback? . . 97

12 Acknowledgements 98

References 99

1 Summary
A prerequisite for trust in machine learning is uncertainty quantification. Without
it, an accurate prediction and a wild guess look the same.
Yet many machine learning models come without uncertainty quantification. And
while there are many approaches to uncertainty – from Bayesian posteriors to
bootstrapping – we have no guarantees that these approaches will perform well
on new data.
At first glance conformal prediction seems like yet another contender. But con-
formal prediction can work in combination with any other uncertainty approach
and has many advantages that make it stand out:

• Guaranteed coverage: Prediction regions generated by conformal prediction come with coverage guarantees of the true outcome
• Easy to use: Conformal prediction approaches can be implemented from scratch with just a few lines of code
• Model-agnostic: Conformal prediction works with any machine learning model
• Distribution-free: Conformal prediction makes no distributional assumptions
• No retraining required: Conformal prediction can be used without retraining the model
• Broad application: Conformal prediction works for classification, regression, time series forecasting, and many other tasks

Sound good? Then this is the right book for you to learn about this versatile,
easy-to-use yet powerful tool for taming the uncertainty of your models.
This book:

• Teaches the intuition behind conformal prediction
• Demonstrates how conformal prediction works for classification and regression
• Shows how to apply conformal prediction using Python
• Enables you to quickly learn new conformal algorithms

With the knowledge in this book, you’ll be ready to quantify the uncertainty of
any model.

2 Preface
My first encounter with conformal prediction was years ago, when I read a paper
on feature importance. I wasn’t looking for uncertainty quantification. Never-
theless, I tried to understand conformal prediction but was quickly discouraged
because I didn’t immediately understand the concept. I moved on.
About 4 years later, conformal prediction kept popping up on my Twitter and
elsewhere. I tried to ignore it, mostly successfully, but at some point I became
interested in understanding what conformal prediction was. So I dug deeper and
found a method that I actually find intuitive.
My favorite way to learn is to teach, so I decided to do a deep dive in the form of an
email course. For 5 weeks, my newsletter Mindful Modeler1 became a classroom
for conformal prediction. I didn’t know how this experiment would turn out. But
it quickly became clear that many people were eager to learn about conformal
prediction. The course was a success. So I decided to build on that and turn
everything I learned about conformal prediction into a book. You hold the results
in your hand (or in your RAM).
I love turning academic knowledge into practical advice. Conformal prediction is
in a sweet spot: There’s an explosion of academic interest and conformal predic-
tion holds great promise for practical data science. The math behind conformal
prediction isn’t easy. That’s one reason why I gave it a pass for a few years. But
it was a pleasant surprise to find that from an application perspective, conformal
prediction is simple. Solid theory, easy to use, broad applicability – conformal
prediction is ready. But it still lives mostly in the academic sphere.
With this book, I hope to strengthen the knowledge transfer from academia to
practice and bring conformal prediction to the streets.

1 https://mindfulmodeler.substack.com/

3 Who This Book Is For
This book is for data scientists, statisticians, machine learners and all other mod-
elers who want to learn how to quantify uncertainty with conformal prediction.
Even if you already use uncertainty quantification in one way or another, confor-
mal prediction is a valuable addition to your toolbox.
Prerequisites:

• You should know the basics of machine learning


• Practical experience with modeling is helpful
• If you want to follow the code examples, you should know the basics of
Python or at least another programming language
• This includes knowing how to install Python and Python libraries
The book is not an academic introduction to the topic, but a very practical one.
So instead of lots of theory and math, there will be intuitive explanations and
hands-on examples.

4 Introduction to Conformal
Prediction
In this chapter, you’ll learn

• Why and when we need uncertainty quantification


• What conformal prediction is

4.1 We need uncertainty quantification


Machine learning models make predictions and to fully trust them, we need to
know how certain those predictions really are.
Uncertainty quantification is essential in many situations:

• When we use model predictions to make decisions


• When we want to design robust systems that can handle unexpected situa-
tions
• When we have automated a task with machine learning and need an indi-
cator of when to intervene
• When we want to communicate the uncertainty associated with our predic-
tions to stakeholders

The importance of quantifying uncertainty depends on the application for which machine learning is being used. Here are some use cases:

• Uncertainty quantification can improve fraud detection in insurance claims by providing context to case workers evaluating potentially fraudulent claims. This is especially important when a machine learning model used to detect fraud is uncertain in its predictions. In such cases, the case workers can use the uncertainty estimates to prioritize their review of the claim and intervene if necessary.
• Uncertainty quantification can be used to improve the user experience in a
banking app. While the classification of financial transactions into “rent,”
“groceries,” and so on can be largely automated through machine learning,
there will always be transactions that are difficult to classify. Uncertainty
quantification can identify tricky transactions and prompt the user to clas-
sify them.
• Demand forecasting using machine learning can be improved by using un-
certainty quantification, which can provide additional context on the con-
fidence in the prediction. This is especially important in situations where
the demand must meet a certain threshold in order to justify production.
By understanding the uncertainty of the forecast, an organization can make
more informed decisions about whether to proceed with production.

Note
As a rule of thumb, you need uncertainty quantification whenever a point
prediction isn’t informative enough.

But where does this uncertainty come from?

4.2 Uncertainty has many sources


A prediction is the result of measuring and collecting data, cleaning the data, and
training a model. Uncertainty can creep into the pipeline at every step of this
long journey:

• The model is trained on a random sample of data, making the model itself
a random variable. If you were to train the model on a different sample
from the same distribution, you would get a slightly different model.
• Some models are even trained in a non-deterministic way. Think of random
weight initialization in neural networks or sampling mechanisms in random
forests. If you train a model with non-deterministic training twice on the
same data, you will get slightly different models.
• This uncertainty in model training is worse when the training dataset is
small.

• Hyperparameter tuning, model selection, and feature selection have the
same problem – all of these modeling steps involve estimation based on
random samples of data, which adds uncertainty to the modeling process.
• The data may not be perfectly measured. The features or the target may
contain measurement errors, such as people filling out surveys incorrectly,
copying errors, and faulty measurements.
• Data sets may have missing values.
Some examples:

• Let’s say we’re predicting house values. The floor type feature isn’t always
accurate, so our model has to work with data that contains measurement
errors. For this and other reasons, the model will not always predict the
house value correctly.
• Decision trees are known to be unstable – small changes in the data can lead
to large differences in what the tree looks like. While this type of uncertainty
is “invisible” when only one tree is trained, it becomes apparent when the
model is retrained, since a new tree will likely have different splits.
• Image classification: Human labelers may disagree on how to classify an
image. A dataset labeled by different human labelers will therefore contain
uncertainty, as the model will never be able to perfectly predict the
“correct” class, because the true class is up for debate.

4.3 Distinguish good from bad predictions


A trained machine learning model can be thought of as a function that takes the
features as input and outputs a prediction. But not all predictions are equally
hard. Some predictions will be spot on but others will be like wild guesses by
the model, and if the model doesn’t output some kind of confidence or certainty
score, we have a problem: We can’t distinguish good predictions from wild guesses.
Both are just spit out by the model.
Imagine an image classifier that decides whether a picture shows a cat, a dog
or some other animal. Digging a bit into the data, we find that there are some
images where the pets are dressed in costumes, see Figure 4.1b.
For classification, we at least have an idea of how uncertain the classification was.
Look at these two distributions of model probability scores in Figure 4.2:

Figure 4.1: Not all images are equally difficult to classify. (a) Clearly a dog. (b) Don't let these dogs bamboozle you. They want you to believe that they are ghosts. They are not!

One classification is quite clear, because the probability is so high. In the other
case, it was a close call for the “cat” category, so we would assume that this
classification was less certain.

Figure 4.2: Classification scores by class. (a) Easy Dogo. (b) Difficult.

At first glance, aren’t we done when the model outputs probabilities and we use
them to get an idea of uncertainty? Unfortunately, no. Let’s explore why.

4.4 Other approaches don't have guaranteed coverage
For classification we get the class probabilities, Bayesian models produce pre-
dictive posterior distributions, and random forests can show the variance across
trees. In theory, we could just rely on such approaches to uncertainty. If we
do that, why would we need conformal prediction?

The main problem is that these approaches don’t come with any reasonable1
guarantee that they cover the true outcome (Niculescu-Mizil and Caruana 2005;
Lambrou et al. 2012; Johansson and Gabrielsson 2019; Dewolf et al. 2022).

• Class probabilities: We should not interpret these scores as actual probabilities – they just look like probabilities, but are usually not calibrated.
Probability scores are calibrated if, for example, among all classifications
with a score of 90%, we find the true class 9 times out of 10.
• Bayesian posterior predictive intervals: While these intervals express our
belief about where the correct outcome is likely to be, the interval is based
on distributional assumptions for the prior and the distribution family cho-
sen for the data. But unfortunately, reality is often more complex than the
simplified distribution assumptions that we make.
• Bootstrapping: Refitting the model with sampled data can give us an idea
of the uncertainty of a prediction. However, bootstrapping is known to
underestimate the true variance, meaning that 90% prediction intervals are
likely to cover the true value less than 90% of the time (Hesterberg 2015).
Bootstrapped intervals are usually too narrow, especially for small samples.

Naive Approach
The naive approach is to take at face value the uncertainty scores that the
model spits out - confidence intervals, variance, Bayesian posteriors, multi-
class probabilities. The problem: you can’t expect these outcomes to be well
calibrated.
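
For the class-probability case in particular, a quick way to see whether scores are calibrated is to bin the model's top probabilities and compare each bin's average score with the observed accuracy in that bin. A minimal, self-contained sketch (the helper function and the binning scheme are illustrative choices, not from the book; the next chapter runs a similar check for a single 95% threshold):

import numpy as np

def calibration_table(top_scores, correct, n_bins=10):
    """Compare mean predicted score with observed accuracy per score bin.

    top_scores: highest predicted class probability per example, shape (n,)
    correct: 1 if the predicted class was the true class, else 0, shape (n,)
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (top_scores >= lo) & (top_scores < hi)
        if mask.sum() == 0:
            continue
        print(f"score in [{lo:.1f}, {hi:.1f}): "
              f"mean score {top_scores[mask].mean():.2f}, "
              f"observed accuracy {correct[mask].mean():.2f}")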

4.5 Conformal prediction fills the gap


Conformal prediction is a set of methods that takes an uncertainty score and
turns it into a rigorous score. “Rigorous” means that the output has probabilistic
guarantees that it covers the true outcome.

1 Some methods, such as Bayesian posteriors, actually do have guarantees that they cover the
true values. However, this depends on modeling assumptions, such as the priors and data
distributions. Such distributional assumptions are an oversimplification for practically all
real applications and are likely to be violated. Therefore, you can’t count on coverage
guarantees that are based on strong assumptions.

Conformal prediction changes what a prediction looks like: it turns point pre-
dictions into prediction regions.2 For multi-class classification it turns the class
output into a set of classes.

Conformal prediction has many advantages that make it a valuable tool to wield:
• Distribution-free: No assumptions about the distribution of the data,
unlike for Bayesian approaches where you have to specify the priors and
data distribution
• Model-agnostic: Conformal prediction can be applied to any predictive
model
• Coverage guarantee: The resulting prediction sets come with guarantees
of covering the true outcome with a certain probability
2
There’s a difference between confidence intervals (or Bayesian posteriors for that matter) and
prediction intervals. The latter quantify the uncertainty of a prediction and therefore can
be applied to any predictive model. The former only makes sense for parametric models like
logistic regression and describes the uncertainty of the model parameters.

Warning

Conformal prediction has one important assumption: exchangeability. If the
data used for calibration is very different from the data for which you want
to quantify the predictive uncertainty, the coverage guarantee goes down the
drain. For conformal time series forecasting, for example, exchangeability is
relaxed, but other assumptions are needed.

Before we delve into theory and intuition, let’s see conformal prediction in ac-
tion.

5 Getting Started with Conformal
Prediction in Python
In this chapter, you’ll learn:

• That naively trusting class probabilities is bad


• How to use conformal prediction in Python with the MAPIE library
• How to implement a simple conformal prediction algorithm yourself

5.1 Installing the software


To run the examples in this book on your machine, you need Python and some
libraries installed. These are the libraries that I used, along with their version:

• Python (3.10.7)
• scikit-learn (1.2.0)
• MAPIE1 (0.6.1)
• pandas (1.5.2)
• matplotlib (3.6.2)

Before we dive into any kind of theory with conformal prediction let’s just get a
feel for it with a code example.

1 https://mapie.readthedocs.io/en/latest/index.html

5.2 Let’s classify some beans
A (fictional) bean company uses machine learning to automatically classify dry
beans2 into 1 of 7 different varieties: Barbunya, Bombay, Cali, Dermason, Horoz,
Seker, and Sira.
The bean dataset contains 13,611 beans (Koklu and Ozkan 2020). Each row is a
dry bean with 8 measurements such as length, roundness, and solidity, in addition
to the variety which is the prediction target.
The different varieties have different characteristics, so it makes sense to classify
them and sell the beans by variety. Planting the right variety is important for
reasons of yield and disease protection. Automating this classification task with
machine learning frees up a lot of time that would otherwise be spent doing it
manually.
Here is how to download the data:

import os
import wget
import zipfile
from os.path import exists

# Download if not available
bean_data_file = "./DryBeanDataset/Dry_Bean_Dataset.xlsx"
base = "https://archive.ics.uci.edu/ml/machine-learning-databases/"
dataset_number = "00602"
if not exists(bean_data_file):
    filename = "DryBeanDataset.zip"
    url = base + dataset_number + "/" + filename
    wget.download(url)
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall('./')
    os.remove(filename)

2
Dry beans are not to be confused with dried beans. Well, you buy dry beans dried, but not
all dried beans are dry beans. Get it? Dry beans are a type of bean (small and white) eaten
in Turkey, for example.

The model was trained in ancient times by some legendary dude who left the
company a long time ago. It’s a Naive Bayes model. And it sucks. This is his
code:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

# Read in the data from the Excel file
bean_data_file = "./DryBeanDataset/Dry_Bean_Dataset.xlsx"
beans = pd.read_excel(bean_data_file)
# Labels are characters but should be integers for sklearn
le = LabelEncoder()
beans["Class"] = le.fit_transform(beans["Class"])
# Split data into classification target and features
y = beans["Class"]
X = beans.drop("Class", axis=1)

# Split off training data
X_train, X_rest1, y_train, y_rest1 = train_test_split(
    X, y, train_size=10000, random_state=2
)

# From the remaining data, split off test data
X_test, X_rest2, y_test, y_rest2 = train_test_split(
    X_rest1, y_rest1, train_size=1000, random_state=42
)

# Split the remainder into calibration and "new" data
X_calib, X_new, y_calib, y_new = train_test_split(
    X_rest2, y_rest2, train_size=1000, random_state=42
)

# Fit the model
model = GaussianNB().fit(X_train, y_train)

Instead of splitting the data only into training and testing, we split the 13,611
beans into:

• 10,000 data samples (X_train, y_train) for training the model


• 1,000 data samples (X_test, y_test) for evaluating model performance
• 1,000 data samples (X_calib, y_calib) for calibration (more on that later)
• The remaining 1,611 data samples (X_new, y_new) for the conformal pre-
diction step and for evaluating the conformal predictor (more on that later)
The dude didn’t even bother to tune hyperparameters or do model selection.
Yikes. Well, let’s have a look at the predictive performance:

from sklearn.metrics import confusion_matrix


# Check accuracy
y_pred = model.predict(X_test)
print("Accuracy:", (y_pred == y_test).mean())
# Create the confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(pd.DataFrame(cm, index=le.classes_, columns=le.classes_))

Accuracy: 0.758
BARBUNYA BOMBAY CALI DERMASON HOROZ SEKER SIRA
BARBUNYA 46 0 47 0 6 0 4
BOMBAY 0 33 0 0 0 0 0
CALI 20 0 81 0 3 0 0
DERMASON 0 0 0 223 0 32 9
HOROZ 0 0 4 3 104 0 22
SEKER 2 0 0 26 1 127 22
SIRA 0 0 0 10 10 21 144

75.80% of the beans in the test data are classified correctly. How to read this
confusion matrix: rows indicate the true classes and columns the predicted classes.
For example, 47 BARBUNYA beans were falsely classified as CALI.
The classes seem to have different classification difficulties, for example Bombay
is always classified correctly in the test data, but Barbunya only half of the time.

Overall the model is not the best model.3
Unfortunately, the model can’t be easily replaced because it’s hopelessly inter-
twined with the rest of the bean company’s backend. And nobody wants to be
the one to pull the wrong piece out of this Jenga tower of a backend.
The dry bean company is in trouble. Several customers have complained that
they bought bags of one variety of beans but there were too many beans of other
varieties mixed in.
The bean company holds an emergency meeting and it’s decided that they will
offer premium products with a guaranteed percentage of the advertised bean
variety. For example, a bag labeled “Seker” should contain at least 95% Seker
beans.

5.3 First try: a naive approach


Great, now all the pressure is on the data scientist to provide such guarantees all
based on this bad model. Her first approach is the “naive approach” to uncertainty
which means taking the probability outputs and believing in them. So instead of
just using the class, she takes the predicted probability score, and if that score is
above 95%, the bean makes it into the 95% bag.
It’s not yet clear what to do with beans that don’t make the cut for any of the
classes, but stew seems to be the most popular option among the employees. The
data scientist doesn’t fully trust the model scores, so she checks the coverage of
the naive approach. Fortunately, she has access to new, labeled data that she
can use to estimate how well her approach is working.
She obtains the probability predictions for the new data, keeps only beans with
>=0.95 predicted probability, and checks how often the ground truth is actually
in that 95% bag.

3
Other models, like random forest, are more likely to be calibrated for this dataset. But I
found that out later, when I was already pretty invested in the dataset. And I liked the data,
so we’ll stick with this example. And it’s not that uncommon to get stuck with suboptimal
solutions in complex systems, like legacy code, etc.

# Get the "probabilities" from the model
predictions = model.predict_proba(X_calib)
# Get for each instance the highest probability
high_prob_predictions = np.amax(predictions, axis=1)
# Select the predictions where the probability is at least 95%
high_p_beans = np.where(high_prob_predictions >= 0.95)
# Let's count how often we hit the right label
its_a_match = (model.predict(X_calib) == y_calib)
coverage = np.mean(its_a_match.values[high_p_beans])
print(round(coverage, 3))

0.896

Ideally, 95% or more of the beans should have the predicted class, but she finds
that the 95%-bag only contains 89.6% of the correct variety.
Now what?
She could use methods such as Platt scaling or isotonic regression to calibrate
these probabilities, but again, with no guarantee of correct coverage for new
data.
But she has an idea.

5.4 Second try: conformal classification


The data scientist decides to think about the problem in a different way: she
doesn’t start with the probability scores, but with how she can get a 95% coverage
guarantee.
Can she produce a set of predictions for each bean that covers the true class with
95% probability? It seems to be a matter of finding the right threshold.
So she does the following:
She ignores that the output could be a probability. Instead, she uses the model
“probabilities” to construct a measure of uncertainty:

s_i = 1 - f(x_i)[y_i]

A slightly sloppy notation for saying that we take 1 minus the model score for
the true class. For example, if the ground truth for bean number 8 is "Seker"
and the probability score for Seker is 0.9, then s_8 = 0.1. In conformal prediction
language, this s_i-score is called the non-conformity score.

Non-conformity score

The non-conformity score s_i for a new data point measures how unusual a
suggested outcome y seems given the model output for x_i. To decide
which of the possible y's are "conformal" (and together form the prediction
region), conformal prediction calculates a threshold. This threshold is based
on the non-conformity scores of the calibration data in combination with
their true labels.

Then she does the following to find the threshold:

1. Start with data not used for model training
2. Calculate the scores s_i
3. Sort the scores from low (certain) to high (uncertain)
4. Compute the threshold q̂ where 95% of the s_i's are smaller (= the 95% quantile)

The threshold is therefore chosen to cover 95% of the true bean classes.
In Python, this procedure can be done in just a few lines of code:

# Size of calibration data
n = len(X_calib)
# Get the probability predictions
predictions = model.predict_proba(X_calib)
# We only need the probability for the true class
prob_true_class = predictions[np.arange(n),y_calib]
# Turn into uncertainty score (larger means more uncertain)
scores = 1 - prob_true_class

Next, she has to find the cut-off.

# Setting the alpha so that we get 95% prediction sets
alpha = 0.05
# define quantile
q_level = np.ceil((n+1)*(1-alpha))/n
qhat = np.quantile(scores, q_level, method='higher')

The quantile level (based on α) requires a finite sample correction to calculate
the corresponding quantile q̂. In this case, the 0.95 was multiplied with (n+1)/n,
which means that q_level = 0.951 for n = 1000.
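
A quick check of that arithmetic, just reproducing the numbers used above:

import numpy as np

n, alpha = 1000, 0.05
q_level = np.ceil((n + 1) * (1 - alpha)) / n  # ceil(950.95) / 1000 = 951 / 1000
print(q_level)  # 0.951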
If we visualize the scores, we can see that it’s a matter of cutting off at the right
position:

import matplotlib.pyplot as plt

# Get the "probabilities" from the model
predictions = model.predict_proba(X_calib)
# Get for each instance the actual probability of ground truth
prob_for_true_class = predictions[np.arange(len(y_calib)),y_calib]
# Create a histogram
plt.hist(1 - prob_for_true_class, bins=30, range=(0, 1))
# Add a title and labels
plt.xlabel("1 - s(y,x)")
plt.ylabel("Frequency")
plt.show()

(Figure: histogram of the scores on the calibration data; x-axis: 1 - s(y,x), y-axis: Frequency.)

How does the threshold come into play? For the figure above, we would cut off
everything above q̂ = 0.99906. Because for bean scores s_i below 0.99906 (equivalent
to class "probabilities" > 0.001), we can be confident that we have the right class
included 95% of the time.
But there’s a catch: For some data points, there will be more than one class that
makes the cut. But prediction sets are not a bug, they are a feature of conformal
prediction.

Prediction Set
A prediction set – for multi-class tasks – is a set of one or more classes.
Conformal classification gives you a set for each instance.

To generate the prediction sets for a new data point, the data scientist has to
combine all classes whose score is below the threshold q̂ into a set.

prediction_sets = (1 - model.predict_proba(X_new) <= qhat)

Let’s look at the prediction sets for 3 “new” beans (X_new):

for i in range(3):
    print(le.classes_[prediction_sets[i]])

['DERMASON']
['DERMASON']
['DERMASON' 'SEKER']

On average, the prediction sets cover the true class with a probability of 95%.
That’s the guarantee we get from the conformal procedure.
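
One way to sanity-check this claim is to compute the empirical coverage and the average set size on the held-out data. A rough sketch, reusing prediction_sets and y_new from the code above (the exact numbers fluctuate around 95% from split to split):

import numpy as np

# Does the prediction set contain the true class? One boolean per bean.
covered = prediction_sets[np.arange(len(y_new)), y_new.to_numpy()]
print("Empirical coverage:", covered.mean())

# How many classes end up in a set on average?
print("Average set size:", prediction_sets.sum(axis=1).mean())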
How could the bean company work with such prediction sets? The first set has
only 1 bean variety “DERMASON”, so it would go into a DERMASON bag.
Bean #3 has a prediction set with two varieties. Maybe a chance to offer bean
products with guaranteed coverage, but containing two varieties? Anything with
more categories could be sorted manually, or the CEO could finally make bean
stew for everyone.
The CEO is now more relaxed and confident in the product.
Spoiler alert: the coverage guarantees don’t work the way the bean CEO thinks
they do, as we will soon learn (what they actually need is a class-wise coverage
guarantee that we will learn about in the classification chapter).
And that’s it. You have just seen conformal prediction in action. To be exact,
this was the score method that you will encounter again in the classification
chapter.

5.5 Getting started with MAPIE


The data scientist could also have used MAPIE4 , a Python library for conformal
prediction.

4 https://mapie.readthedocs.io/en/latest/index.html

from mapie.classification import MapieClassifier

cp = MapieClassifier(estimator=model, cv="prefit", method="score")


cp.fit(X_calib, y_calib)

y_pred, y_set = cp.predict(X_new, alpha=0.05)


y_set = np.squeeze(y_set)

We’re no longer working with the Naive Bayes model object, but our model is
now a MapieClassifier object. If you are familiar with the sklearn library, it will
feel natural to work with objects in MAPIE. These MAPIE objects have a .fit-
function and a .predict()-function, just like sklearn models do. MapieClassifier
can be thought of as a wrapper around our original model.

Figure 5.1: Conformal prediction wraps the model

And when we use the “predict” method of this conformal classifier, we get both the
usual prediction (“y_pred”) and the sets from the conformal prediction (“y_set”).
It's possible to specify more than one value for α. But in the code above only one
value was specified, so the resulting y_set is an array of shape (1611, 7, 1), which
means 1611 data points, 7 classes, and 1 α. The np.squeeze function removes the
last dimension.
Let's have a look at some of the resulting prediction sets. Since y_set
only contains "True" and "False" at the corresponding class indices, we have to
use the class labels to get readable results. Here are the first 5 prediction sets for
the beans:

for i in range(5):
    print(le.classes_[y_set[i]])

['DERMASON']
['DERMASON']
['DERMASON' 'SEKER']
['DERMASON']
['DERMASON' 'SEKER']

These prediction sets are of size 1 or 2. Let’s have a look at all the other beans
in X_new:

# first count number of classes per bean
set_sizes = y_set.sum(axis=1)
# use pandas to compute how often each size occurs
print(pd.Series(set_sizes).value_counts())

2 871
1 506
3 233
4 1
dtype: int64

Most sets have size 1 or 2, many fewer have 3 varieties, only one set has 4 varieties
of beans.
This looks different if we make 𝛼 small, saying that we want a high probability
that the true class is in there.

y_pred, y_set = cp.predict(X_new, alpha=0.01)

# remove the 1-dim dimension
y_set = np.squeeze(y_set)
for i in range(4):
    print(le.classes_[y_set[i]])

['DERMASON']
['DERMASON' 'SEKER']
['DERMASON' 'SEKER']
['DERMASON']

And again we look at the distribution of set sizes:

set_sizes = y_set.sum(axis=1)
print(pd.Series(set_sizes).value_counts())

3 780
2 372
4 236
1 222
5 1
dtype: int64

As expected, we get larger sets with a lower value for 𝛼. This is because the
lower the α, the more often the sets have to cover the true class. So we can
already see that there is a trade-off between set size and coverage. We pay for
higher coverage with larger set sizes. That’s why 100% coverage (𝛼 = 0) would
produce a stupid solution: it would just include all bean varieties in every set for
every bean.
If we want to see the results under different 𝛼’s, we can pass an array to MAPIE.
MAPIE will then automatically calculate the sets for all the different 𝛼 confidence
levels. We just have to make sure that we use the third dimension to pick the
right value:

y_pred, y_set = cp.predict(X_new, alpha=[0.1, 0.05])


# get prediction sets for 10th observation and second alpha (0.05)
print(le.classes_[y_set[10,:,1]])

['HOROZ' 'SIRA']
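
Building on that, one can also pass several α's at once and compare the average set size for each alpha, a small sketch of the coverage/set-size trade-off mentioned above (the numbers depend on the random splits):

alphas = [0.1, 0.05, 0.01]
y_pred, y_set = cp.predict(X_new, alpha=alphas)

# y_set has shape (n, 7, 3): one boolean slice per alpha
for j, a in enumerate(alphas):
    sizes = y_set[:, :, j].sum(axis=1)
    print(f"alpha={a}: average set size {sizes.mean():.2f}")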

We can also create a pandas DataFrame to hold our results, which will print
nicely:

y_pred, y_set = cp.predict(X_new, alpha=0.05)

y_set = np.squeeze(y_set)
df = pd.DataFrame()

for i in range(len(y_pred)):
    predset = le.classes_[y_set[i]]
    # Create a new dataframe with the calculated values
    temp_df = pd.DataFrame({
        "set": [predset],
        "setsize": [len(predset)]
    }, index=[i])
    # Concatenate the new dataframe with the existing one
    df = pd.concat([df, temp_df])

print(df.head())

set setsize
0 [DERMASON] 1
1 [DERMASON] 1
2 [DERMASON, SEKER] 2
3 [DERMASON] 1
4 [DERMASON, SEKER] 2

Working with conformal prediction and MAPIE is a great experience. But are
the results really what the bean company was looking for? We’ll learn in the
Classification chapter why the bean CEO may have been celebrating too soon. A
hint: the coverage guarantee of the conformal predictor only holds on average –
not necessarily per class.

Coverage

The percentage of prediction sets that contain the true label
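
As a small preview of that caveat, the coverage defined above can also be computed separately for each true class from the MAPIE output. A rough sketch, reusing y_set (for α = 0.05), y_new, and le from the code above; the per-class numbers typically scatter around 95% rather than all reaching it:

import numpy as np
import pandas as pd

covered = y_set[np.arange(len(y_new)), y_new.to_numpy()]
per_class = pd.DataFrame({
    "variety": le.classes_[y_new.to_numpy()],
    "covered": covered,
})
print(per_class.groupby("variety")["covered"].mean())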

The next chapter is about the intuition behind conformal prediction.

6 Intuition Behind Conformal
Prediction
In this chapter, you will learn

• How conformal prediction works on an intuitive level


• The general “recipe” for conformal prediction
• Parallels to model evaluation

Let’s say you have an image classifier that outputs probabilities, but you want
prediction sets with guaranteed coverage of the true class.
First, we sort the predictions of the calibration dataset from certain to uncertain.
The calibration dataset must be separate from the training dataset. For the
image classifier, we could use s_i = 1 − f(x_i)[y_i] as the so-called non-conformity
score, where f(x_i)[y_i] is the model's probability output for the true class. This
procedure places all images somewhere on a scale of how certain the classification
is, as shown in the following figure.

Don't use training data for calibration

Models have a tendency to overfit the training examples, which in turn biases
their non-conformity scores. If we were to calibrate using the training data,
it’s likely that the threshold would be too small and therefore the coverage
would be too low (less than 1 − 𝛼). The guaranteed coverage only works by
calibrating with data that wasn’t used to train the model.

The dog on the left has a model output of 0.95 and therefore gets s = 0.05, but
the dogs on the right in their spooky costumes bamboozle the neural network.
This spooky image gets a score of only 0.15 for the class dog, which translates
into a score of s = 0.85.

Figure 6.1: Images from calibration data sorted from certain to uncertain

We rely on this ordering of the images to divide the images into certain (or
conformal) and uncertain. The size of each fraction depends on the confidence
level 𝛼 that the user chooses.
If α = 0.1, then we want to have 90% of the calibration images in the "certain" section.
Finding the threshold is easy because it means calculating the quantile q̂: the
score value where 90% (= 1 − α) of the images are below and 10% (= α) are
above:
In this example, the scary dogs fall into the uncertain region.
Another assumption that conformal prediction requires is exchangeability.

Exchangeability
For the coverage guarantee to hold, the calibration data must be “exchange-
able” with the new data we expect. For example, if they are randomly drawn
from the same distribution, they are exchangeable. If they come from differ-
ent distributions, they may not be exchangeable.

Time series data, for example, are not exchangeable, since the temporal order
matters. We will see how conformal prediction can still be adapted for such cases.

Figure 6.2: The threshold divides images along the uncertainty scale into certain
and uncertain.

Exchangeability is a bit less strict than independent and identically distributed (i.i.d.) data, a typical assumption for many statistical procedures.
A point that I found confusing at first: We picked the threshold without looking
at wrong classifications. Because there will be scores for wrong classes that also
fall into the “certain” region, but we seemingly ignore them when picking the
threshold q̂.
Within the prediction sets, conformal classification foremost controls the coverage
of positive labels: there’s a guarantee that, on average, 1 − 𝛼 of the sets contain
the true class.
So is it really true that negative examples don’t matter? Because this would
mean that we don’t care how many wrong classes are in the prediction sets. If
we didn’t care about false positives at all, we could always include all the classes
in the prediction sets and guarantee coverage of 100%! A meaningless solution,
of course.
So one part of conformal prediction is about controlling the coverage of positive
labels and the other part is minimizing the number of negative labels, meaning not
having too many "wrong" labels in the prediction sets. CP researchers therefore
always look at the average size of prediction sets. Given that two CP algorithms
provide the same guaranteed coverage, the preferred algorithm is the one that
produces smaller prediction sets. In addition, some CP algorithms guarantee
upper bounds on the coverage probability, which also keeps the sets small.
Let’s move on to the conformal prediction step.
For a new image, we check all possible classes: compute the non-conformity score
for each class and keep the classes where the score falls below the threshold q̂.
All scores below the threshold are conformal with scores that we observed in the
calibration set and are seen as certain enough (based on 𝛼).

Figure 6.3: Prediction step in conformal prediction for classification.

In this example, the image has the prediction set {cat, lion} because both classes
are “conformal” and made the cut. All other class labels are too uncertain and
therefore excluded.

Now perhaps it is clearer what happens to the “wrong classes”: If the model
is worth its money, the probabilities for the wrong classes will be rather low.
Therefore the non-conformity score will probably be above the threshold and the
corresponding classes will not be included in the prediction set.
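
A minimal sketch of this prediction step, assuming an sklearn-style classifier with predict_proba, a threshold qhat from the calibration step, and a list of class names (the helper name is made up for illustration):

import numpy as np

def prediction_set(model, x_new, qhat, class_names):
    """Keep every class whose non-conformity score 1 - p stays below the threshold."""
    probs = model.predict_proba(x_new.reshape(1, -1))[0]  # x_new: a single feature vector
    scores = 1 - probs  # one non-conformity score per candidate class
    return [c for c, s in zip(class_names, scores) if s <= qhat]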

6.1 Conformal prediction is a recipe


Conformal prediction for classification is different from CP for regression. For
example, we use different non-conformity scores and conformal classification pro-
duces prediction sets while conformal regression produces prediction intervals.
Even among classification, there are many different CP algorithms. However, all
conformal prediction algorithms follow roughly the same recipe. That’s great as
it makes it easier to learn new CP algorithms.
Conformal prediction has 3 steps: training, calibration, and prediction.
Training is what you would expect:

1. Split data into training and calibration
2. Train model on training data

Calibration is where the magic happens:

1. Compute uncertainty scores (aka non-conformity scores) for the calibration data
2. Sort the scores from certain to uncertain
3. Decide on a confidence level α (α = 0.1 means 90% coverage)
4. Find the quantile q̂ where 1 − α (multiplied with a finite sample correction) of non-conformity scores are smaller

Prediction is how you use the calibrated scores:

1. Compute the non-conformity scores for the new data
2. Pick all y's that produce a score below q̂
3. These y's form your prediction set or interval

In the case of classification, the y’s are classes and for regression, the y’s are all
possible values that could be predicted.
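
Put into code, the split recipe for classification with the score method fits into two small functions. This is a sketch under the same assumptions as chapter 5 (an sklearn-style classifier with predict_proba and integer-encoded labels); the function names are made up for illustration:

import numpy as np

def calibrate(model, X_calib, y_calib, alpha=0.1):
    """Calibration step: compute the threshold qhat for the score method."""
    n = len(y_calib)
    probs = model.predict_proba(X_calib)
    scores = 1 - probs[np.arange(n), y_calib]      # non-conformity scores
    q_level = np.ceil((n + 1) * (1 - alpha)) / n   # finite sample correction
    return np.quantile(scores, q_level, method="higher")

def predict_sets(model, X_new, qhat):
    """Prediction step: keep all classes whose score falls below qhat."""
    return 1 - model.predict_proba(X_new) <= qhat  # boolean array (n, n_classes)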
A big differentiator between conformal prediction algorithms is the choice of the
non-conformity score. In addition, they can differ in the details of the recipe and
slightly deviate from it as well. In a way, the recipe isn’t fully accurate, or rather
it’s about a specific version of conformal prediction that is called split conformal
prediction. Splitting the data only once into training and calibration is not the
best use of data. If you are familiar with evaluation in machine learning, you
won’t be surprised about the following extensions.

6.2 Understand parallels to out-of-sample evaluation
So far we have learned about conformal prediction using a single split into training
and calibration. But you can also do the split repeatedly using:

• k-fold cross-splitting (like in cross-validation)


• bootstrapping
• leave-one-out (also called jackknife)

Do these sound familiar to you? If you are familiar with evaluating and tuning ma-
chine learning algorithms, then you already know these resampling strategies.

Inductive Conformal Prediction


The version of conformal prediction that relies on splitting data into train-
ing and calibration is called inductive or split conformal prediction. The
alternative is transductive or full conformal prediction (see next box).

For evaluating or tuning machine learning models, you also have to work with
data that was not used for model training. So it makes sense that we encounter
the same options for conformal prediction where we also have to find a balance
between training the model with as much data as possible, but also having access
to “fresh” data for calibration.

Figure 6.4: Different strategies for splitting data into training and calibration
sets.

For cross-conformal prediction, you split the data, for example, into 10 pieces.
You take the first 9 pieces together to train the model and compute the non-
conformity scores for the remaining 1/10th. You repeat this step 9 times so that
each piece is once in the calibration set. You end up with non-conformity scores
for the entire dataset and can continue with computing the quantile for conformal
prediction as in the single split scenario.
If you take cross-conformal prediction to the extreme you end up with the leave-
one-out (LOO) method, also called jackknife, where you train a total of n models,
each with n-1 data points (n is the number of data points in training and
calibration combined).
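
To make the cross-conformal mechanics concrete, here is a rough sketch of collecting the non-conformity scores with 10 folds. MAPIE does this internally when you pass cv=10; the helper below is only illustrative and assumes numpy arrays X and y with integer class labels and an sklearn-style estimator:

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold

def cross_conformal_scores(estimator, X, y, n_splits=10):
    """Score every data point with a model that did not see it during training."""
    scores = np.empty(len(y))
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, calib_idx in kfold.split(X):
        fold_model = clone(estimator).fit(X[train_idx], y[train_idx])
        probs = fold_model.predict_proba(X[calib_idx])
        scores[calib_idx] = 1 - probs[np.arange(len(calib_idx)), y[calib_idx]]
    return scores  # continue with the quantile computation as in the single split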
All three options are inductive approaches to conformal prediction. Another
approach is transductive or full conformal prediction.

Transductive Conformal Prediction


Transductive CP (also called full CP) uses the entire dataset, including the
new data point, for creating prediction regions. Transductive CP doesn’t
split the data and instead refits the model multiple times to produce a prediction
region: To get the prediction set for a new data point, the model
has to be retrained for every possible value of 𝑦𝑛𝑒𝑤 . Transductive CP isn’t
covered in this book.

Which approach should you pick?

• Single split: Computation-wise the cheapest. Results in a higher variance of the prediction sets and is a non-optimal use of data. Ignores variance
from model refits. Preferable if refitting the model is expensive.
• Leave-one-out (LOO): Most expensive, since you have to train n models.
The LOO approach potentially produces smaller prediction sets/intervals
as models are usually more stable when trained with more data points.
Preferable if model refit is fast and/or the dataset is small.
• CV and other resampling methods: trade-off between single split and LOO.

In the MAPIE Python library, switching between resampling techniques is as simple as changing a parameter. The following code creates a conformal regression object with the split strategy.

cp = MapieRegressor(model, cv="prefit")

You can change conformal regression to cross-splitting by changing the cv option:

cp = MapieRegressor(model, cv=10)

Warning

If you don’t specify the cv option at all, MAPIE will use 5-fold cross-splitting
– even if you have already trained your model.

Entering the calibration step is the same for all “cv” options – with cross-splitting
or LOO it just takes longer because the model is trained multiple times.

cp.fit(x_calib, y_calib)

40
Another random document with
no related content on Scribd:
The Project Gutenberg eBook of A note on the
position and extent of the great temple
enclosure of Tenochtitlan,
This ebook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this ebook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.

Title: A note on the position and extent of the great temple


enclosure of Tenochtitlan,

Author: Alfred Percival Maudslay

Release date: July 11, 2022 [eBook #68502]

Language: English

Original publication: United Kingdom: Taylor & Francis, 1912

Credits: Richard Tonsing and the Online Distributed


Proofreading Team at https://www.pgdp.net (This file
was produced from images generously made available
by The Internet Archive)

*** START OF THE PROJECT GUTENBERG EBOOK A NOTE ON


THE POSITION AND EXTENT OF THE GREAT TEMPLE
ENCLOSURE OF TENOCHTITLAN, ***
Transcriber’s Note:
The cover image was created by the transcriber
and is placed in the public domain.
A NOTE
ON THE POSITION AND EXTENT

OF THE

GREAT TEMPLE ENCLOSURE OF


TENOCHTITLAN,
AND THE POSITION, STRUCTURE AND ORIENTATION

OF THE

TEOCOLLI OF
HUITZILOPOCHTLI.

BY

ALFRED P. MAUDSLAY.

LONDON:
PRINTED BY TAYLOR & FRANCIS, RED LION COURT, FLEET
STREET, E.C.
1912.
A NOTE
ON THE POSITION AND EXTENT
OF THE
GREAT TEMPLE ENCLOSURE OF
TENOCHTITLAN
AND THE POSITION, STRUCTURE, AND
ORIENTATION
OF THE
TEOCALLI OF HUITZILOPOCHTLI.
BY
ALFRED P. MAUDSLAY.

Extracts from the works of the earliest authorities referring to the


Great Temple Enclosure of Tenochtitlan and its surroundings are
printed at the end of this note, and the following particulars
concerning the authors will enable the reader to form some judgment
of the comparative value of their evidence.
The Anonymous Conqueror.—The identity of this writer is
unknown. That he was a companion of Cortés during the Conquest is
undoubted. His account is confined to the dress, arms, customs,
buildings, &c. of the Mexicans. The original document has never
been found, and what we now possess was recovered from an Italian
translation.
Motolinia.—Fray Toribio de Benavento, a Franciscan monk,
known best by his assumed name of Motolinia, left Spain in January
1524 and arrived in the City of Mexico in the month of June of the
same year. From that date until his death in August 1569 he lived an
active missionary life among the Indians in many parts of Mexico
and Guatemala.
He was in fullest sympathy with the Indians, and used his utmost
efforts to defend them from the oppression of their conquerors.
Motolinia appears in the books of the Cabildo in June 1525 as
“Fray Toribio, guardian del Monesterio de Sor. San Francisco”; so he
probably resided in the City at that date, and must have been familiar
with what remained of the ancient City.
Sahagun, Fr. Bernadino de, was born at Sahagun in Northern
Spain about the last year of the 15th Century. He was educated at the
University of Salamanca, and became a monk of the Order of Saint
Francis, and went to Mexico in 1529. He remained in that country,
until his death in 1590, as a missionary and teacher.
No one devoted so much time and study to the language and
culture of the Mexicans as did Padre Sahagun throughout his long
life. His writings, both in Spanish, Nahua, and Latin, were numerous
and of the greatest value. Some of them have been published and are
well known, but it is with the keenest interest and with the
anticipation of enlightenment on many obscure questions that all
engaged in the study of ancient America look forward to the
publication of a complete edition of his great work, ‘Historia de las
Cosas de Nueva España,’ with facsimiles of all the original coloured
illustrations under the able editorship of Don Francisco del Paso y
Troncoso. Señor Troncoso’s qualifications for the task are too well
known to all Americanists to need any comment, but all those
interested in the subject will join in hearty congratulations to the
most distinguished of Nahua scholars and rejoice to hear that his
long and laborious task is almost completed and that a great part of
the work has already gone to press.
Torquemada, Fr. Juan de.—Little is known about the life of
Torquemada beyond the bare facts that he came to Mexico as a child,
became a Franciscan monk in 1583 when he was eighteen or twenty
years old, and that he died in the year 1624. He probably finished the
‘Monarquia Indiana’ in 1612, and it was published in Seville in 1615.
Torquemada knew Padre Sahagun personally and had access to his
manuscripts.
Duran, Fr. Diego.—Very little is known about Padre Duran. He
was probably a half-caste, born in Mexico about 1538. He became a
monk of the Order of St. Dominic about 1578 and died in 1588.
His work entitled ‘Historia de las Indias de Nueva Espana y Islas
de Tierra Firme’ exists in MS. in the National Library in Madrid. The
MS. is illustrated by a number of illuminated drawings which Don
José Ramíres, who published the text in Mexico in 1867, reproduced
as a separate atlas without colour. Señor Ramíres expresses the
opinion that the work “is a history essentially Mexican, with a
Spanish physiognomy. Padre Duran took as the foundation and plan
of his work an ancient historical summary which had evidently been
originally written by a Mexican Indian.”
Tezozomoc, Don Hernando Alvaro.—Hardly anything is known
about Tezozomoc. He is believed to have been of Royal Mexican
descent, and he wrote the ‘Cronica Mexicana’ at the end of the 16th
Century, probably about 1598.
Ixtlilxochitl.—A fragment of a Codex, known as the ‘Codice
Goupil,’ is published in the ‘Catalogo Boban,’ ii. 35, containing a
picture of the great Teocalli with a description written in Spanish.
The handwriting is said by Leon y Gama to be that of Ixtlilxochitl.
Don Fernando de Alva Ixtlilxochitl was born in 1568 and was
descended from the royal families of Texcoco and Tenochtitlan. He
was educated in the College of Sta. Cruz and was the author of the
history of the Chichamecs. He died in 1648 or 1649.
The ‘Codice Goupil’ was probably a translation into Spanish of an
earlier Aztec text.
The picture of the great Teocalli is given on Plate D.

The positions of the Palace of Montezuma, the Palace of
Tlillancalqui, the Cuicacalli or Dance House, and the old Palace of
Montezuma have been defined by various writers and are now
generally accepted.
The principal difficulty arises in defining the area of the Temple
Enclosure and the position and orientation of the Teocalli of
Huitzilopochtli.
THE TEMPLE ENCLOSURE.
The Temple Enclosure was surrounded by a high masonry wall
(Anon., Torq., Moto.) known as the Coatenamitl or Serpent Wall,
which some say was embattled (Torq. quoting Sahagun, Moto.).
There were four principal openings (Anon., Torq., Moto., Duran)
facing the principal streets or causeways (Torq., Moto., Duran).
(Tezozomoc alone says there were only three openings—east, west
and south—and three only are shown on Sahagun’s plan.) “It was
about 200 brazas square” (Sahagun), i. e. about 1013 English feet
square. However, Sahagun’s plan (Plate C) shows an oblong.
As the four openings faced the principal streets or causeways, the
prolongation of the line of the causeways of Tacuba and Iztapalapa
must have intersected within the Temple Enclosure. This
intersection coincides with the junction of the modern streets of
Escalerillas, Relox, Sta. Teresa, and Seminario (see Plate A).
We have now to consider the boundaries of the Temple Enclosure,
and this can best be done by establishing the positions of the Temple
of Tezcatlipoca and the Palace of Axayacatl.

The Temple of Tezcatlipoca. (Tracing A2.)

(Duran, ch. lxxxiii.)


“This Temple was built on the site (afterwards) occupied by the
Archbishop’s Palace, and if anyone who enters it will take careful
notice he will see that it is all built on a terrace without any lower
windows, but the ground floor (primer suelo) all solid.”
This building is also mentioned in the 2nd Dialogue of Cervantes
Salazar[1], where, in reply to a question, Zuazo says:—“It is the
Archbishop’s Palace, and you must admire that first story (primer
piso) adorned with iron railings which, standing at such a height
above the ground, rests upon a firm and solid foundation until
reaching the windows.” To this Alfaro replies:—“It could not be
demolished by Mines.”
The Arzobispado, which still occupies the same site in the street of
that name, must therefore have been originally built on the solid
foundation formed by the base of the Teocalli of Tezcatlipoca.

The Palace of Axayacatl. (Tracing A2.)

(‘Descripción de las dos Piedras, etc.,’ 1790, by Don Antonio de
Leon y Gama. Bustamante, Edition ii. p. 35.)
“In these houses of the family property of the family called Mota[2],
in the street of the Indio Triste.... These houses were built in the 16th
century on a part of the site occupied by the great Palace of the King
Axayacatl, where the Spaniards were lodged when first they entered
Mexico, which was contiguous (estaba inmediato) with the wall that
enclosed the great Temple.”
Don Carlos M. de Bustamante adds in a footnote to this passage:
—“Fronting these same buildings, behind the convent of Santa
Teresa la Antigua, an image of Our Lady of Guadalupe was
worshipped, which was placed in that position to perpetuate the
memory that here mass was first celebrated in Mexico, in the block
(cuadra) where stood the gate of the quarters of the Spaniards.... This
fact was often related to me by my deceased friend, Don Francisco
Sedano, one of the best antiquarians Mexico has known.”
(García Icazbalceta, note to 2nd Dialogue of Cervantes Salazar,
p. 185.)
“The Palace of Axayacatl, which served as a lodging or quarters for
the Spaniards, stood in the Calle de Sta. Teresa and the 2a Calle del
Indio Triste.”
So far as I can ascertain, no eye-witness or early historian
describes the position of the Palace of Axayacatl, but tradition and a
consensus of later writers place it outside the Temple Enclosure to
the north of the Calle de Sta. Teresa and to the west of the 2a Calle
del Indio Triste. No northern boundary is given.
Taking the point A in the line of the Calle de Tacuba as the
hypothetical site of the middle of the entrance in the Eastern wall of
the Temple Enclosure and drawing a line A-B to the Eastern end of
the C. de Arzobispado, we get a distance of about 450 feet; extend
this line in a northerly direction for 450 feet to the point C, and the
line B-C may be taken as the Eastern limit of the Temple Enclosure.
The Northern and Southern entrances to the Enclosure must have
been at D and E, that is, in the line of the Calle de Iztapalapa.
Extending the line B-E twice its own length in a westerly direction
brings us to the South end of the Empedradillo at the point F.
Completing the Enclosure we find the Western entrance at G in the
line of the Calle de Tacuba and the north-west corner at H.
This delimitation of the Temple Enclosure gives a parallelogram
measuring roughly 900′ × 1050′, not at all too large to hold the
buildings it is said to have contained, and not far from Sahagun’s
doscientos brazas en cuadro (1012′ × 1012′).
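The arithmetic of this delimitation may be restated as a short calculation. The following Python sketch is offered only as an illustration: the 450-foot length A-B is taken from the construction above, while the 350-foot distance B-E is not a recorded measurement but an inference from the stated 1050-foot width.

    # Illustrative check of the delimitation described above.
    # The 450-ft length A-B is given in the text; the 350-ft distance B-E is an
    # assumption inferred from the stated 1050-ft width.
    AB = 450                     # feet, point A (middle of the east entrance) to B
    BC = 2 * AB                  # eastern wall B-C: A-B extended an equal distance north
    BE = 350                     # feet, assumed distance from the east wall to the Iztapalapa line
    BF = 3 * BE                  # B-E extended twice its own length westward reaches F

    print(BC, BF)                # 900 1050 -- the 900 ft x 1050 ft parallelogram
    print(BE / BF)               # about 0.33 -- one-third of the width lies east of the Iztapalapa line
    print(BC * BF, 1012 * 1012)  # enclosure area compared with Sahagun's roughly 1012-ft square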
The line of the Calle de Tacuba thus divides the Enclosure
longitudinally into two equal halves, which is on the side of probability.
It leaves two-thirds of the Enclosure to the West and one-third to
the East of the line of the Calle de Iztapalapa[3].
It includes the site of the Temple of Tezcatlipoca.
It agrees with the generally accepted position of the Palace of
Axayacatl and of the Aviary.
It includes the site of the Teocalli, the base of which was
discovered at No. 8, 1ra Calle de Relox y Cordobanes.
It will now be seen how closely this agrees with the description
given by Don Lucas Alaman, one of the best modern authorities on
the topography of the City.
(Disertaciones, by Don Lucas Alaman, 1844. Octava Disertacion,
vol. ii. p. 246.)
“We must now fix the site occupied by the famous Temple of
Huichilopochtli[4]. As I have stated above, on the Southern side it
formed the continuation of the line from the side walk (acera) of the
Arzobispado towards the Alcaiceria touching the front of the present
Cathedral. On the West it ran fronting the old Palace of Montezuma,
with the street now called the Calle del Empedradillo (and formerly
called the Plazuela del Marques del Valle) between them, but on the
East and North it extended far beyond the square formed by the
Cathedral and Seminario, and in the first of these directions reached
the Calle Cerrada de Sta. Teresa, and followed the direction of this
last until it met that of the Ensenanza now the Calle Cordobanes and
the Montealegre.”
THE GREAT TEOCALLI OF
HUITZILOPOCHTLI.
The general description of the ancient City by eye-witnesses does
not enable us to locate the position of the great Teocalli with
exactness, but further information can be gained by examining the
allotment of Solares or City lots to the Conquerors who took up their
residence in Mexico and to religious establishments; these
allotments can in some instances be traced through the recorded
Acts of the Municipality.
(7th Disertacion, p. 140. Don Lucas Alaman.) (Tracing A1.)
“From the indisputable testimony of the Acts of the Municipality
and much other corroborative evidence one can see that the site of
the original foundation (the Monastery) of San Francisco was in the
Calle de Sta. Teresa on the side walk which faces South.
“At the meeting of the Municipality of 2nd May, 1525, there was
granted to Alonzo de Ávila a portion of the Solar between his house
and the Monastery of San Francisco in this City. This house of
Alonzo de Ávila stood in the Calle de Relox at the corner of the Calle
de Sta. Teresa (where now stands the druggist’s shop of Cervantes
and Co.), and this is certain as it is the same house which was
ordered to be demolished and [the site] sown with salt, as a mark of
infamy, when the sons of Alonzo de Ávila were condemned to death
for complicity in the conspiracy attributed to D. Martin Cortés. By
the decree of the 1st June, 1574, addressed to the Viceroy, Don
Martin Enríquez, he was permitted to found schools on this same
site, with a command that the pillar and inscription relating to the
Ávilas which was within the same plot, should be placed outside ‘in a
place where it could be more open and exposed.’ As the schools were
not built on this site, the University sold it on a quit rent (which it
still enjoys) to the Convent of Sta. Isabel, to which the two houses
Nos. 1 and 2 of the 1st Calle de Relox belong, which are the said
druggist’s shop and the house adjoining it, which occupy the site
where the house of Alonzo de Ávila stood.
“In addition to this, by the titles of a house in the Calle de
Montealegre belonging to the convent of San Jeronimo which the
Padre Pichardo examined, it is certain that Bernardino de Albornoz,
doubtless the son of the Accountant Rodrigo de Albornoz, was the
owner of the houses which followed the house of Alonzo de Ávila in
the Calle de Sta. Teresa; and by the act of the Cabildo of the 31st Jan.,
1529, it results that this house of Albornoz was built on the land
where stood the old San Francisco, which the Municipality
considered itself authorised to dispose of as waste land.”
(Duran, vol. ii. ch. lxxx.)
“The Idol Huitzilopochtli which we are describing ... had its site in
the houses of Alonzo de Ávila, which is now a rubbish heap.”
(Alaman, Octava Disertacion, p. 246.)
“One can cite what is recorded in the books of the Acts of the
Municipality in the Session of 22nd February, 1527, on which day, on
the petition of Gil González de Benavides, the said Señores (the
Licenciate Marcos de Aguilar, who at that time ruled it, and the
members who were present at the meeting) granted him one solar
[city lot] situated in this city bordering on the solar and houses of his
brother Alonzo de Ávila, which is (en la tercia parte donde estaba el
Huichilobos) in the third portion where Huichilobos[5] stood. It was
shown in the 7th Dissertation that these houses of Alonzo de Ávila
were the two first in the 1ra Calle de Relox, turning the corner of the
Calle de Sta. Teresa, and consequently that the solar that was given
to Gil González de Benavides was the next one in the Calle de Relox,
for the next house in the Calle de Sta. Teresa was that of the
Accountant Albornoz. This opinion agrees with that of Padre
Pichardo, who made such a lengthy study of the subject, and who
was able to examine the ancient titles of many properties.”
In a note to the 2nd Dialogue of Cervantes Salazar, Don J. Garcia
Icazbalceta discusses the position of the original Cathedral
and quotes a decree of the Cabildo, dated 8th Feb., 1527,
allotting certain sites as follows:—
“The said Señores [here follow the names of those present] declare
that inasmuch as in time past when the Factor and Veedor were
called Governors of New Spain they allotted certain Solares within
this City, which Solares are facing Huichilobos (son frontero del
Huichilobos), which Solares (because the Lord Governor on his
arrival together with the Municipality reclaimed them, and allotted
them to no one for distribution) are vacant and are [suitable] for
building and enclosure; and inasmuch as the aforesaid is prejudicial
to the ennoblement of this city, and because their occupation would
add to its dignity, they make a grant of the said space of Solares,
allotting in the first place ten Solares for the church and churchyard,
and for outbuildings in the following manner:—Firstly they say that
they constitute as a plaza (in addition to the plaza in front of the new
houses of the Lord Governor), the site and space which is unoccupied
in front of the corridors of the other houses of the Governor where
they are used to tilt with reeds, to remain the same size that it is at
present.
“At the petition of Cristóbal Flores, Alcalde, the said Señores grant
to him in this situation the Solar which is at the corner, fronting the
houses of Hernando Alonzo Herrero and the high roads, which
(Solar) they state it is their pleasure to grant to him.
“To Alonzo de Villanueva another Solar contiguous to that of the
said Cristóbal Flores, in front of the Solar of the Padre Luis Méndez,
the high road between them, etc.”
(Here follow the other grants.)

“Then the said Señores ... assign as a street for the exit and service
of the said Solares ... a space of 14 feet, which street must pass
between the Solar of Alonzo de Villanueva and that of Luis de la
Torre and pass through to the site of the Church, on one side being
the Solar of Juan de la Torre, and on the other the Solar of Gonzalo
de Alvarado.”
In the same note Icazbalceta discusses the measurements of the
Solares, which appear to have varied between 141 × 141 Spanish feet
(= 130¾′ × 130¾′ English) and 150 × 150 Spanish feet (= 139′ ×
139′ English), which latter measurement was established by an Act of
the Cabildo in Feb. 1537. He also printed with the note a plan of what
he considered to be the position of the Solares dealt with in this Act
of Cabildo. This plan is incorporated in Tracing A1.
Plate C is a copy of a plan of the Temple Enclosure found with a
Sahagun MS., preserved in the Library of the Royal Palace at Madrid
and published by Dr. E. Seler in his pamphlet entitled ‘Die
Ausgrabungen am Orte des Haupttempels in Mexico’ (1904).
We know from Cortés’s own account, confirmed by Gomara, that
the Great Teocalli was so close to the quarters of the Spaniards that
the Mexicans were able to discharge missiles from the Teocalli into
the Spanish quarters, and according to Sahagun’s account the
Mexicans hauled two stout beams to the top of the Teocalli in order
to hurl them against the Palace of Axayacatl so as to force an
entrance. It was on this account that Cortés made such a determined
attack on the Teocalli and cleared it of the enemy.
We also know from the Acts of the Cabildo that the group of
Solares beginning with that of Cristóbal Flores (Nos. 1–9) are
described as “frontero del Huichilobos,” i. e. opposite (the Teocalli
of) Huichilobos, and we also learn that the Solar of Alonzo de Avila
was “en la tercia parte donde estaba el Huichilobos,” i. e. in the third
part or portion where (the Teocalli of) Huichilobos stood. Alaman
confesses that he cannot understand this last expression, but I
venture to suggest that as the Temple Enclosure was divided
unevenly by the line of the Calle de Iztapalapa, two-thirds lying to the
West of that line and one-third to the East of it, the expression
implies that the Teocalli was situated in the Eastern third of the
Enclosure. This would bring it sufficiently near to the Palace of
Axayacatl for the Mexicans to have been able to discharge missiles
into the quarters of the Spaniards. It would also occupy the site of
the Solar de Alonzo de Avila, and might be considered to face the
Solar of Cristóbal Flores and his neighbours, and we should naturally
expect to find it in line with the Calle de Tacuba. Sahagun’s plan is
not marked with the points of the compass, but if we should give it
the same orientation as Tracing A2, the Great Teocalli falls fairly into
its place.
Measurements of the Great Teocalli.
There were two values of the Braza or Fathom in old Spanish
measures: one was the equivalent of 65·749 English inches, and the
other, more ancient, was the equivalent of 66·768 English inches.
In computing the following measurements I have used the latter
scale:—
Spanish.                          English.
1 foot                 = 11·128 inches.
3 feet  = 1 vara       = 33·384 inches   = 2·782 feet.
2 varas = 1 Braza      = 66·768 inches   = 5·564 feet.

The Pace is reckoned as equal to 2·5 English feet, and the Ell
mentioned by Tezozomoc is taken to be the Flemish Ell = 27·97
English inches or 2·33 English feet.
There is a general agreement that the Teocalli was a solid
quadrangular edifice in the form of a truncated step pyramid.
The dimensions of the Ground plan are given as follows:—

                 Spanish Measure.                           English feet.
Anonimo          150 × 120 paces                          = 375 × 300
Torquemada       360 × 360 feet                           = 333·84 × 333·84
Gomara           50 × 50 Brazas                           = 278·2 × 278·2
Tezozomoc        125 Ells (one side)                      = 291·248
Bernal Díaz      six large Solares measuring 150 × 150
                 feet each, which would give a square
                 of about                                 = 341 × 341
Ixtlilxochitl    80 Brazas                                = 445[6]
Motolinia        40 × 40 Brazas (the Teocalli at
                 Tenayoca)                                = 222·56 × 222·56
The measurements are rather vague. The Anonymous Conqueror’s
measurements may refer to the Teocalli at Tlatelolco, and the length
may have included the Apetlac or forecourt. Torquemada may be
suspected of exaggeration. Tezozomoc was not an eye-witness and
Bernal Díaz’s estimate of six large Solares is only an approximation.
In Tracing A2 I have taken 300 × 300 English feet as the
measurement of the base of the Teocalli.
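If desired, the English equivalents in the foregoing table can be re-derived from the unit values stated earlier. The following Python sketch is merely such a check, using the older Braza of 66·768 inches and the Flemish Ell rounded to 2·33 English feet.

    # Re-deriving the English equivalents in the table from the unit values stated above.
    SPANISH_FOOT = 11.128 / 12      # English feet
    BRAZA = 66.768 / 12             # English feet (the older, larger value)
    PACE = 2.5                      # English feet
    ELL = 2.33                      # English feet (the Flemish Ell, rounded as in the text)

    print(150 * PACE, 120 * PACE)         # Anonimo: 375.0 x 300.0
    print(round(360 * SPANISH_FOOT, 2))   # Torquemada: 333.84
    print(round(50 * BRAZA, 1))           # Gomara: 278.2
    print(round(125 * ELL, 2))            # Tezozomoc: about 291
    print(round(80 * BRAZA, 1))           # Ixtlilxochitl: about 445
    print(round(40 * BRAZA, 2))           # Tenayoca: 222.56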

Orientation of the Great Teocalli.


Sahagun         Facing the West.
Torquemada      Its back to the East, “which is the practice the large
                Temples ought to follow.”
Motolinia       The ascent and steps are on the West side.
Tezozomoc       The principal face looked South.
Ixtlilxochitl   Facing the West.

I think the evidence of Sahagun, Torquemada, Motolinia, and
Ixtlilxochitl must be accepted as outweighing that of Tezozomoc, who
also says that the pyramidal foundation was ascended by steps on
three sides, a statement that is not supported by any other authority
and which received no confirmation from the description of the
attack on the Teocalli as given by Cortés and Bernal Díaz.

The Stairway.

Sahagun says “it was ascended by steps very narrow and straight.”
Anonimo (Tlaltelolco ?)—120–130 steps on one side only.
Ixtlilxochitl—160 steps.
Bernal Díaz (Tlaltelolco ?)—114 steps.
Cortés—over 100 steps.
Torquemada—113 steps on the West side only.
Motolinia—over 100 steps on the West side.
Duran—120 steps on the West side.

Torquemada says that the steps were each one foot high, and
Duran describes the difficulty of raising the image and litter of the
God from the ground to the platform on the top of the Teocalli owing
to the steepness of the steps and the narrowness of the tread.

The sides and back of the Teocalli were in the form of great
steps.

Cortés says that there were 3 or 4 ledges or passages one pace wide.
Bernal Díaz—5 recesses (concavidades).

Both the pictures show four ledges.


The Anonymous Conqueror gives the width of the ledges as two
paces.
The height of the wall between each ledge is given as follows:—

Cortés—the height of three men = say 16′.
Anonimo—the height of two men = say 10′ 8″.
Motolinia—1½ to 2 Brazas = say 11′.
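
These estimates are roughly consistent with one another, as the short Python sketch below shows; the figure of about 5 feet 4 inches to a man is not recorded anywhere, but is implied by the Anonymous Conqueror's estimate.

    # Rough consistency check of the ledge-wall heights quoted above.
    BRAZA = 66.768 / 12              # English feet
    per_man = (10 + 8 / 12) / 2      # "two men = say 10 ft 8 in" implies about 5 ft 4 in per man
    print(3 * per_man)               # Cortes's three men: 16.0 ft, matching his "say 16 ft"
    print(1.5 * BRAZA, 2 * BRAZA)    # Motolinia's 1.5 to 2 Brazas: about 8.3 to 11.1 ft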

The size of the platform on the top of the Teocalli cannot be
decided from the written records. Torquemada says that there was
ample room for the Priests of the Idols to carry out their functions
unimpeded and thoroughly, yet in an earlier paragraph he appears to
limit the width to a little more than seventy feet. Possibly this
measurement of seventy feet is meant to apply to a forecourt of the
two sanctuaries.
Motolinia gives the measurement of the base of the Teocalli at
Tenayoca as 222½′ × 222½′ (English), and the summit platform as
about 192′ × 192′ (English). Applying the same proportion to a
Teocalli measuring 300′ × 300′ at the base, the summit platform
would measure about 259′ × 259′.
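The proportion may be stated explicitly; the following Python sketch merely repeats the arithmetic, the 300-foot base being the figure assumed in Tracing A2.

    # The Tenayoca base-to-summit proportion applied to the assumed 300-ft base.
    tenayoca_base = 222.56           # English feet (40 Brazas)
    tenayoca_top = 192.0             # English feet
    base = 300.0                     # assumed base of the great Teocalli (Tracing A2)
    print(base * tenayoca_top / tenayoca_base)   # about 258.8, i.e. roughly 259 ft square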
Duran says “in front of the two chambers where these Gods
(Huitzilopochtli and Tlaloc) stood there was a Patio forty feet square
cemented over and very smooth, in the middle of which and fronting
the two chambers was a somewhat sharp pointed green stone about
waist high, of such a height that when a man was thrown on his back
on the top of it his body would bend back over it. On this stone they
sacrificed men in the way we shall see in another place.”
Ixtlilxochitl gives a similar description, but says the sacrificial
stone was on one side towards (hacia) the doorway of the larger
chamber of Huitzilopochtli.

The Oratories of Huitzilopochtli and Tlaloc.
