
Multivariate Conformal Prediction using Optimal Transport

Michal Klein 1 Louis Bethune 1 Eugene Ndiaye 1 Marco Cuturi 1

Abstract

Conformal prediction (CP) quantifies the uncertainty of machine learning models by constructing sets of plausible outputs. These sets are constructed by leveraging a so-called conformity score, a quantity computed using the input point of interest, a prediction model, and past observations. CP sets are then obtained by evaluating the conformity score of all possible outputs, and selecting them according to the rank of their scores. Due to this ranking step, most CP approaches rely on score functions that are univariate. The challenge in extending these scores to multivariate spaces lies in the fact that no canonical order for vectors exists. To address this, we leverage a natural extension of multivariate score ranking based on optimal transport (OT). Our method, OT-CP, offers a principled framework for constructing conformal prediction sets in multidimensional settings, preserving distribution-free coverage guarantees with finite data samples. We demonstrate tangible gains on a benchmark of multivariate regression problems and address computational & statistical trade-offs that arise when estimating conformity scores through OT maps.

arXiv:2502.03609v1 [stat.ML] 5 Feb 2025

1. Introduction

Conformal prediction (CP) (Gammerman et al., 1998; Vovk et al., 2005; Shafer & Vovk, 2008) has emerged as a simple framework to quantify the prediction uncertainty of machine learning algorithms without relying on distributional assumptions on the data. For a sequence of observed data and a new input point,

Dn = {(x1, y1), . . . , (xn, yn)} and xn+1,

the objective is to construct a set that contains the unobserved response yn+1 with a specified confidence level 100(1 − α)%. This involves evaluating scores S(x, y, ŷ) ∈ R, such as the prediction error of a model ŷ, for each observation (x, y) in Dn, and ranking these score values. The conformal prediction set for the new input xn+1 is the collection of all possible responses y whose score S(xn+1, y, ŷ) ranks small enough to meet the prescribed confidence threshold, compared to the scores S(xi, yi, ŷ) in the observed data.

CP has undergone tremendous developments in recent years (Barber et al., 2023; Park et al., 2024; Tibshirani et al., 2019; Guha et al., 2024), which mirror its increased applicability to challenging settings (Straitouri et al., 2023; Lu et al., 2022). To name a few, it has been applied to designing uncertainty sets in active learning (Ho & Wechsler, 2008), anomaly detection (Laxhammar & Falkman, 2015; Bates et al., 2021), few-shot learning (Fisch et al., 2021), time series (Chernozhukov et al., 2018; Xu & Xie, 2021; Chernozhukov et al., 2021; Lin et al., 2022; Zaffran et al., 2022), and to inferring performance guarantees for statistical learning algorithms (Holland, 2020; Cella & Ryan, 2020); and recently to Large Language Models (Kumar et al., 2023; Quach et al., 2023). We refer to the extensive reviews in (Balasubramanian et al., 2014) for other applications in machine learning.

By design, CP requires a notion of order, as the inclusion of a candidate response depends on its ranking relative to the scores observed previously. Hence, the classical strategies developed so far largely target score functions with univariate outputs. This limits their applicability to multivariate responses, as ranking multivariate scores S(x, y, ŷ) ∈ Rd, d ≥ 2, is not as straightforward as ranking univariate scores in R.

Ordering Vector Distributions using Optimal Transport. In parallel to these developments, and starting with the seminal reference of (Chernozhukov et al., 2017) and more generally the pioneering work of (Hallin et al., 2021; 2022; 2023), multiple references have explored the possibilities offered by optimal transport theory to define a meaningful ranking or ordering in a multidimensional space. Simply put, the analog of a rank function computed on the data can be found in the optimal Brenier map that transports the data measure to a uniform, symmetric, centered measure of reference in Rd. As a result, a simple notion of a univariate rank for a vector z ∈ Rd can be found by evaluating the distance of the image of z (according to that optimal map) to the origin. This approach ensures that the ordering respects both the geometry, i.e., the spatial arrangement of the data, and its distribution: points closer to the center get lower ranks.

*Equal contribution. 1 Apple. Correspondence to: Eugene Ndiaye <e [email protected]>.
Contributions. We propose to leverage recent advances in computational optimal transport (Peyré & Cuturi, 2019), using notably differentiable transport map estimators (Pooladian & Niles-Weed, 2021; Cuturi et al., 2019), and apply such map estimators in the definition of multivariate score functions. More precisely:

• OT-CP: We extend conformal prediction techniques to multivariate score functions by leveraging optimal transport ordering, which offers a principled way to define and compute a higher-dimensional quantile and cumulative distribution function. As a result, we obtain distribution-free uncertainty sets that capture the joint behavior of multivariate predictions and enhance the flexibility and scope of conformal predictions.

• We propose a computational approach to this theoretical ansatz using the entropic map (Pooladian & Niles-Weed, 2021) computed from solutions to the Sinkhorn problem (Cuturi, 2013). We prove that our approach preserves the coverage guarantee while being tractable.

• We show the application of OT-CP using a recently released benchmark of regression tasks (Dheur et al., 2025).

We acknowledge the concurrent proposal of Thurin et al. (2025), who adopt a similar approach to ours, with, however, a few important practical differences, discussed in more detail in Section 6.

2. Background

2.1. Univariate Conformal Prediction

We recall the basics of conformal prediction based on a real-valued score function and refer to the recent tutorials (Shafer & Vovk, 2008; Angelopoulos & Bates, 2021). In the following, we denote [n] := {1, . . . , n}.

For a real-valued random variable Z, it is common to construct an interval [a, b], within which it is expected to fall, as

Rα = {z ∈ R : F(z) ∈ [a, b]}. (1)

This is based on the probability integral transform, which states that the cumulative distribution function F maps variables to the uniform distribution, i.e., P(F(Z) ∈ [a, b]) = U([a, b]). To guarantee a (1 − α) uncertainty region, it suffices to choose a and b such that U([a, b]) ≥ 1 − α, which implies

P(Z ∈ Rα) ≥ 1 − α. (2)

Applying it to a real-valued score Z = S(X, Y) of the prediction model ŷ, an uncertainty set for the response of a given input X can be expressed as

Rα(X) = {y ∈ Y : F ◦ S(X, y) ∈ [a, b]}. (3)

However, this result is typically not directly usable, as the ground-truth distribution F is unknown and must be approximated empirically with Fn using finite samples of data. When the sample size goes to infinity, one expects to recover Equation (2). The following result provides the tool to obtain the finite sample version (Shafer & Vovk, 2008).

Lemma 2.1. Let Z1, . . . , Zn, Z be a sequence of real-valued exchangeable random variables. Then it holds

Fn(Z) ∼ U{0, 1/n, 2/n, . . . , 1}

and

P(Fn(Z) ∈ [a, b]) = Un+1([a, b]) = (⌊nb⌋ − ⌈na⌉ + 1) / (n + 1).

By choosing any a, b such that Un+1([a, b]) ≥ 1 − α, Lemma 2.1 guarantees a coverage that is at least equal to the prescribed level of uncertainty:

P(Z ∈ Rα,n) ≥ 1 − α,

where the uncertainty set Rα,n = Rα(Dn) is defined based on observations Dn = {Z1, . . . , Zn} as:

Rα,n = {z ∈ R : Fn(z) ∈ [a, b]}. (4)

In short, Equation (4) is an empirical version of Equation (1) based on finite data samples that still preserves the coverage probability (1 − α) and does not depend on the ground-truth distribution of the data.

Given data Dn, a prediction model ŷ, and a new input Xn+1, one can build an uncertainty set for the unobserved output Yn+1 by applying it to observed score functions.

Proposition 2.2 (Conformal Prediction Coverage). Consider Zi = S(Xi, Yi) for i in [n] and Z = S(Xn+1, Yn+1) in Lemma 2.1. The conformal prediction set is defined as

Rα,n(Xn+1) = {y ∈ Y : Fn ◦ S(Xn+1, y) ∈ [a, b]}

and satisfies a finite sample coverage guarantee

P(Yn+1 ∈ Rα,n(Xn+1)) ≥ 1 − α.

The conformal prediction coverage guarantee in Proposition 2.2 holds for the unknown ground-truth distribution of the data P, does not require quantifying the estimation error |Fn − F|, and is applicable to any prediction model ŷ as long as it treats the data exchangeably, e.g., a pre-trained model independent of Dn.

Leveraging the quantile function Fn−1 = Qn, and by setting a = 0 and b = 1 − α, we have the usual description

Rα,n(Xn+1) = {y ∈ Y : S(Xn+1, y) ≤ Qn(1 − α)},

namely the set of all possible responses whose score rank is smaller or equal to ⌈(1 − α)(n + 1)⌉ compared to the rankings of previously observed scores. For the absolute value difference score function, the CP set corresponds to

Rα,n(Xn+1) = [ŷ(Xn+1) ± Qn(1 − α)].
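The univariate recipe above (Lemma 2.1 and Proposition 2.2 with a = 0 and b = 1 − α) can be sketched in a few lines of numpy. The absolute-residual score and the function name are illustrative choices, not the paper's code:

```python
import numpy as np

def conformal_threshold(scores, alpha):
    """Split-CP threshold: the ceil((1 - alpha)(n + 1))-th smallest
    calibration score, i.e. the empirical quantile Qn(1 - alpha)."""
    n = len(scores)
    k = min(int(np.ceil((1 - alpha) * (n + 1))), n)
    return np.sort(scores)[k - 1]

rng = np.random.default_rng(0)
y, y_hat = rng.normal(size=500), rng.normal(size=500)
scores = np.abs(y - y_hat)              # absolute-value difference score
q = conformal_threshold(scores, alpha=0.1)
# Prediction set for a new input x: the interval [y_hat(x) - q, y_hat(x) + q]
```

Exchangeability of the calibration scores with the test score is the only assumption needed for the ≥ 1 − α coverage of the resulting interval.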

Center-Outward View. Another classical choice is a = α/2 and b = 1 − α/2. In that case, we have the usual confidence set that corresponds to a range of values capturing the central proportion, with α/2 of the data lying below Q(α/2) and α/2 lying above Q(1 − α/2).

Introducing the center-outward distribution of Z as the function T = 2F − 1, the probability integral transform T(Z) is uniform on the unit ball [−1, 1]. This ensures a symmetric description of Rα = T−1(B(0, 1 − α)) around a central point such as the median Q(1/2) = T−1(0), with the radius of the ball corresponding to the desired confidence level of uncertainty. Similarly, we have the empirical center-outward distribution Tn = 2Fn − 1, and the center-outward view of the conformal prediction set follows as

Rα,n(Xn+1) = {y ∈ Y : |Tn ◦ S(Xn+1, y)| ≤ 1 − α}.

If Z follows a probability distribution P, then the transformation z ↦ T(z) maps the source distribution P to the uniform distribution U over a unit ball. In fact, it can be characterized as essentially the unique monotone increasing function such that T(Z) is uniformly distributed.

2.2. Multivariate Conformal Prediction

While many conformal methods exist for univariate prediction, we focus here on those applicable to multivariate outputs. As recalled in (Dheur et al., 2025), several alternative conformal prediction approaches have been proposed to tackle multivariate prediction problems. Some of these methods can directly operate using a simple predictor (e.g., a conditional mean) of the response y, while some may require stronger assumptions, such as an estimator of the joint probability density function between x and y, or access to a generative model that mimics the conditional distribution of y given x (Izbicki et al., 2022; Wang et al., 2022). We restrict our attention to approaches that make no such assumption, reflecting our modeling choices for OT-CP.

M-CP. We will consider the template approach of (Zhou et al., 2024) to use classical CP by aggregating a score function computed on each of the d outputs of the multivariate response. Given a conformity score si (to be defined next) for the i-th dimension, Zhou et al. (2024) define the following aggregation rule:

sM-CP(x, y) = max_{i ∈ [d]} si(x, yi). (5)

As in (Dheur et al., 2025), we will use conformalized quantile regression (Romano et al., 2019) to define the score functions above, for each output i ∈ [d], where the conformity score is given by:

si(x, yi) = max{l̂i(x) − yi, yi − ûi(x)},

with l̂i(x) and ûi(x) representing the lower and upper conditional quantiles of Yi | X = x at levels αl and αu, respectively. In our experiments, we consider equal-tailed prediction intervals, where αl = α/2, αu = 1 − α/2, and α denotes the miscoverage level.

Merge-CP. An alternative approach is simply to use a squared Euclidean aggregation,

s(x, y) := ∥ŷ(x) − y∥²,

where the choice of the norm (e.g., ℓ1, ℓ2, or ℓ∞) depends on the desired sensitivity to errors across tasks. This approach reduces the multidimensional residual to a scalar conformity score, leveraging the natural ordering of real numbers. This simplification not only makes it straightforward to apply univariate conformal prediction methods, but also avoids the complexities of directly managing vector-valued scores in conformal prediction. A variant consists of applying a Mahalanobis norm (Johnstone & Cox, 2021) in lieu of the squared Euclidean norm, using a covariance matrix Σ estimated from the training data (Johnstone & Cox, 2021; Katsios & Papadopoulos, 2024; Henderson et al., 2024):

s(x, y) := ∥Σ−1/2(ŷ(x) − y)∥².

2.3. Kantorovich Ranks

A naive way to define ranks in multiple dimensions might be to measure how far each point is from the origin and then rank them by that distance. This breaks down if the distribution of the data is stretched or skewed in certain directions. To correct for this, Hallin et al. (2021) developed a formal framework of center-outward distributions and quantiles, also called Kantorovich ranks (Chernozhukov et al., 2017), extending the familiar univariate concepts of ranks and quantiles into higher dimensions by building on elements of optimal transport theory.

Optimal Transport Map. Let µ and ν be source and target probability measures on Ω ⊂ Rd. One can look for a map T : Ω → Ω that pushes forward µ to ν and minimizes the average transportation cost

T⋆ ∈ arg min_{T#µ=ν} ∫Ω ∥x − T(x)∥² dµ(x). (6)

Brenier's theorem states that if the source measure µ has a density, there exists a solution to (6) that is the gradient of a convex function ϕ : Ω → R, such that T⋆ = ∇ϕ.
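To make the contrast with naive distance-to-origin ranking concrete, here is a small numpy sketch of Kantorovich ranks obtained by optimally assigning an anisotropic point cloud to a fixed centered grid. The brute-force solver and toy grid are purely illustrative; practical implementations use e.g. the Hungarian algorithm or entropic solvers:

```python
import itertools
import numpy as np

def kantorovich_ranks(Z, U):
    """Solve the optimal assignment of points Z to grid U by brute force
    (tiny n only) and return the rank ||U_sigma(i)|| assigned to each Z_i."""
    best, best_cost = None, np.inf
    for perm in itertools.permutations(range(len(Z))):
        cost = sum(np.sum((Z[i] - U[p]) ** 2) for i, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return np.linalg.norm(U[list(best)], axis=1)

# Anisotropic cloud: raw distance to the origin over-ranks the stretched
# axis, while assignment to a centered grid corrects for the geometry.
rng = np.random.default_rng(1)
Z = rng.normal(size=(6, 2)) * np.array([5.0, 0.2])
radii = np.array([1 / 3, 2 / 3, 1.0])
dirs = np.array([[1.0, 0.0], [-1.0, 0.0]])      # 2 directions, 3 radii
U = np.concatenate([r * dirs for r in radii])   # 6 grid points
ranks = kantorovich_ranks(Z, U)
```

Each rank takes a value in {1/3, 2/3, 1}, the radii of the grid shells, which is exactly the discrete center-outward construction formalized below.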

In the one-dimensional case, the cumulative distribution function of a distribution P is the unique increasing function transporting it to the uniform distribution. This monotonicity property generalizes to higher dimensions through the gradient of a convex function ∇ϕ. Thus, one may view the optimal transport map in higher dimensions as a natural analog of the univariate cumulative distribution function: both represent a unique, monotone way to send one probability distribution onto another.

Definition 2.3. The center-outward distribution of a random variable Z ∼ P is defined as the optimal transport map T = ∇ϕ that pushes P forward to the uniform distribution U on the unit ball B(0, 1). The rank of Z is defined as Rank(Z) = ∥T(Z)∥, the distance from the origin.

A quantile region is an extension of quantiles to multiple dimensions, representing a region in the sample space that contains a given proportion of probability mass. The quantile region at probability level (1 − α) ∈ (0, 1) can be defined as

Rα = {z ∈ Rd : ∥T(z)∥ ≤ 1 − α}.

By definition of the spherical uniform distribution, ∥T(Z)∥ is uniform on (0, 1), which implies

P(Z ∈ Rα) = 1 − α. (7)

2.4. Entropic Map

A convenient estimator to approximate the Brenier map T⋆ from samples (z1, . . . , zn) and (u1, . . . , um) is the entropic map (Pooladian & Niles-Weed, 2021). Let ε > 0 and write K = [exp(−∥zi − uj∥²/ε)]ij, the kernel matrix. Define

f⋆, g⋆ = argmax_{f ∈ Rn, g ∈ Rm} ⟨f, 1n/n⟩ + ⟨g, 1m/m⟩ − ε ⟨e^{f/ε}, K e^{g/ε}⟩. (8)

Equation (8) is an unconstrained concave optimization problem known as the regularized OT problem in dual form (Peyré & Cuturi, 2019, Prop. 4.4) and can be solved numerically with the Sinkhorn algorithm (Cuturi, 2013). Equipped with these optimal vectors, one can define the maps, valid out of sample:

fε(z) = minε([∥z − uj∥² − gj⋆]j), (9)
gε(u) = minε([∥zi − u∥² − fi⋆]i), (10)

where for a vector u of arbitrary size s we define the log-sum-exp operator as minε(u) := −ε log((1/s) 1s⊤ e^{−u/ε}). Using the Brenier (1991) theorem, linking potential values to optimal map estimation, one obtains an estimator for T⋆:

Tε(z) := z − ∇fε(z) = Σ_{j=1}^{m} pj(z) uj, (11)

where the weights depend on z as:

pj(z) := exp(−(∥z − uj∥² − gj⋆)/ε) / Σ_{k=1}^{m} exp(−(∥z − uk∥² − gk⋆)/ε). (12)

Analogously to (12), one can obtain an estimator for the inverse map (T⋆)−1 as Tεinv(u) := Σ_{i=1}^{n} qi(u) zi, with weights qi(u) arising for a vector u from the Gibbs distribution of the values [∥zi − u∥² − fi⋆]i.

3. Kantorovich Conformal Prediction

3.1. Multi-Output Conformal Prediction

We suppose that P is only available through finite samples and consider the discrete transport map

Tn+1 : (Zi)i∈[n+1] → (Ui)i∈[n+1],

which can be obtained by solving the optimal assignment problem, which seeks to minimize the total transport cost between the empirical distributions Pn+1 and Un+1:

Tn+1 ∈ arg min_{T ∈ T} Σ_{i=1}^{n+1} ∥Zi − T(Zi)∥², (13)

where T is the set of bijections mapping the observed sample (Zi)i∈[n+1] to the target grid (Ui)i∈[n+1].

Definition 3.1. Let (Z1, . . . , Zn, Zn+1) be a sequence of exchangeable variables in Rd that follow a common distribution P. The discrete center-outward distribution Tn+1 is the transport map pushing forward Pn+1 to Un+1.

Following (Hallin et al., 2021), we begin by constructing the target distribution Un+1 as a discretized version of a spherical uniform distribution. It is defined such that the total number of points is n + 1 = nR nS + no, where no points are at the origin:

• nS unit vectors u1, . . . , unS are uniform on the sphere.

• nR radii are regularly spaced as {1/nR, 2/nR, . . . , 1}.

The grid discretizes the sphere into layers of concentric shells, with each shell containing nS equally spaced points along the directions determined by the unit vectors. The discrete spherical uniform distribution places equal mass over each point of the grid, with no/(n + 1) mass on the origin and 1/(n + 1) on the remaining points. This ensures isotropic sampling at fixed radius onto [0, 1].

By definition of the target distribution Un+1, it holds

∥Tn+1(Zn+1)∥ ∼ Un+1{0, 1/nR, 2/nR, . . . , 1}. (14)
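A minimal numpy sketch of the entropic map of Section 2.4, assuming uniform weights on both samples: log-domain Sinkhorn iterations for the dual problem of Equation (8), followed by the Gibbs weights of Equation (12). This is an illustrative stand-in for the paper's OTT-JAX implementation:

```python
import numpy as np

def entropic_map(z, u, eps=0.5, iters=300):
    """Log-domain Sinkhorn for the dual potentials (f*, g*) of regularized
    OT with uniform marginals, then the entropic map
    T_eps(x) = sum_j p_j(x) u_j with weights p_j as in Eq. (12)."""
    n, m = len(z), len(u)
    C = ((z[:, None, :] - u[None, :, :]) ** 2).sum(-1)  # squared distances
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(iters):  # alternating dual (Sinkhorn) updates
        f = -eps * (np.logaddexp.reduce((g[None, :] - C) / eps, axis=1) - np.log(m))
        g = -eps * (np.logaddexp.reduce((f[:, None] - C) / eps, axis=0) - np.log(n))

    def T(x):
        c = ((x[None, :] - u) ** 2).sum(-1)
        a = (g - c) / eps
        w = np.exp(a - a.max())         # stabilized Gibbs weights p_j(x)
        return (w / w.sum()) @ u        # barycentric projection on the grid
    return T

rng = np.random.default_rng(0)
z = rng.normal(size=(40, 2))                           # vector-valued scores
theta = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
u = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # unit-circle grid
T = entropic_map(z, u)
rank = np.linalg.norm(T(z[0]))   # in [0, 1]: a multivariate "rank" for z[0]
```

Since T(x) is a convex combination of unit vectors, its norm lies in [0, 1], which is the univariate quantity conformalized in the next section.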

In order to define an empirical quantile region as in Equation (7), we need an extrapolation T̄n+1 of Tn+1 out of the samples (Zi)i∈[n+1]. By definition of such maps,

∥T̄n+1(Zn+1)∥ = ∥Tn+1(Zn+1)∥

is still uniformly distributed. With an appropriate choice of radius rα,n+1, the empirical quantile region can be defined as

Rα,n+1 = {z ∈ Rd : ∥T̄n+1(z)∥ ≤ rα,n+1}.

When working with such finite samples Z1, . . . , Zn, Zn+1, and considering the asymptotic regime (Chewi et al., 2024; Hallin et al., 2021), the empirical source distribution Pn+1 converges to the true distribution P and the empirical transport map T̄n+1 converges to the true transport map T⋆. As such, with the choice rα,n+1 = 1 − α, one can expect that P(Z ∈ Rα,n+1) ≈ 1 − α when n is large.

However, the core point of conformal prediction methodology is to go beyond asymptotic results or regularity assumptions about the data distribution. The following result shows how to select a radius preserving the coverage with respect to the ground-truth distribution, as in Equation (18).

Proposition 3.2. Given n discrete sample points distributed over a sphere with radii {0, 1/nR, 2/nR, . . . , 1} and directions uniformly sampled on the sphere, the smallest radius to obtain a coverage (1 − α) is determined by

rα,n+1 = jα/nR, where jα = ⌈((n + 1)(1 − α) − no)/nS⌉,

where nS is the number of directions, nR is the number of radii, and no is the number of copies of the origin.

The corresponding conformal prediction set is obtained as:

{y ∈ Y : ∥T̄n+1 ◦ S(Xn+1, y)∥ ≤ rα,n+1}. (15)

Remark 3.3 (Computational Issues). While appealing, the previous result has notable computational limitations. For every new candidate y ∈ Y, the empirical transport map must be recomputed, which might be intractable. Moreover, the coverage guarantee does not hold if the transport map is computed solely on a hold-out independent dataset, as is usually done in split conformal prediction. In addition, for computational efficiency, the empirical entropic map cannot be directly leveraged, since the target values would no longer follow a uniform distribution, as described in Equation (14). To address these challenges, we propose two simple approaches in the following sections.

3.2. Optimal Transport Merging

We introduce optimal transport merging, a procedure that reduces any vector-valued score S(x, y) ∈ Rd to a suitable 1D score using OT. We redefine the non-conformity score function of an observation as

SOT-CP(x, y) = ∥T⋆ ◦ S(x, y)∥, (16)

where T⋆ is the optimal Brenier (1991) map that pushes the distribution of vector-valued scores onto a uniform ball distribution U of the same dimension. This approach ultimately relies on the natural ordering of the real line, making it possible to directly apply one-dimensional conformal prediction methods to the sequence of transformed scores

Zi = SOT-CP(Xi, Yi) for i ∈ [n + 1].

In practice, T⋆ can be replaced by any approximation T̂ that preserves the permutation invariance of the score function. The resulting conformal prediction set, OT-CP, is

ROT-CP(Xn+1, α) = Rα(T̂, Xn+1)

with respect to a given transport map T̂, and where

Rα(T̂, x) = {y ∈ Y : Fn(SOT-CP(x, y)) ≤ 1 − α}

has coverage (1 − α), where Fn is the empirical (univariate) cumulative distribution function of the observed scores

SOT-CP(X1, Y1), . . . , SOT-CP(Xn, Yn).

Proposition 2.2 directly implies

P(Yn+1 ∈ ROT-CP(Xn+1)) ≥ 1 − α.

Remark 3.4. Our proposed conformal prediction framework OT-CP, with an optimal transport merging score function, generalizes the Merge-CP approaches. More specifically, under the additional assumption that we are transporting a source Gaussian (resp. uniform) distribution to a target Gaussian (resp. uniform) distribution, the transport map is affine (Gelbrich, 1990; Muzellec & Cuturi, 2018) with a positive definite linear map term. This results in Equation (16) being equivalent to the Mahalanobis distance.

3.3. Coverage Guarantees under Approximations

When dealing with high-dimensional data or complex distributions, it is essential to find computationally feasible methods to approximate the optimal transport map T⋆ with a map T̂. In practical applications, we will rely on empirical approximations of the Brenier (1991) map using finite samples. Note that this approach may encounter a few statistical roadblocks, as such estimators are significantly hindered by the curse of dimensionality (Chewi et al., 2024). However, conformal prediction allows us to maintain a coverage level irrespective of sample size limitations. We defer the presentation of this practical approach to Section 3.4 and focus first on coverage guarantees.
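Once a map estimate T̂ is fixed, the optimal-transport merging step of Equation (16) reduces to ordinary univariate split CP on the transformed scores. A hedged numpy sketch, with a fixed linear map standing in for an estimated transport map (the names and toy data are illustrative):

```python
import numpy as np

def calibrate_radius(scores, T_hat, alpha):
    """Conformalize the norms ||T_hat(Z_i)||: the empirical radius is their
    ceil((1 - alpha)(n + 1))-th smallest value, so that membership reduces
    to a univariate split-CP test."""
    norms = np.sort([np.linalg.norm(T_hat(s)) for s in scores])
    k = min(int(np.ceil((1 - alpha) * (len(norms) + 1))), len(norms))
    return norms[k - 1]

# Toy stand-in for an estimated transport map: a fixed linear whitening map.
A = np.array([[0.5, 0.0], [0.0, 2.0]])
T_hat = lambda s: A @ s
rng = np.random.default_rng(2)
cal_scores = rng.normal(size=(200, 2))     # calibration residuals S(x_i, y_i)
r = calibrate_radius(cal_scores, T_hat, alpha=0.1)
# A candidate y is kept iff ||T_hat(S(x, y))|| <= r
covered = np.linalg.norm(cal_scores @ A.T, axis=1) <= r
```

With this affine T_hat the rule is exactly a Mahalanobis-type Merge-CP region, matching Remark 3.4; any other permutation-invariant map estimate slots into the same calibration.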

Coverage of the Approximated Quantile Region. Let us assume an arbitrary approximation T̂ of the Brenier (1991) map and define the corresponding quantile region as

R(T̂, r) = {z ∈ Rd : ∥T̂(z)∥ ≤ r}.

The coverage in Equation (18) is not automatically maintained, since Û := T̂#P may not coincide with U. As a result, the validity of the approximated quantile region may be compromised unless we can control the magnitude of the error ∥Û − U∥, which requires additional regularity assumptions. In its standard formulation, conformal prediction relies on an empirical setting and does not directly apply to the continuous case, and hence does not provide a solution for calibrating entropic quantile regions. However, a careful inspection of the 1D case reveals that understanding the distribution of the probability integral transform is key:

• U{0, 1/n, 2/n, . . . , 1} ∼ Fn(Z) ≠ F(Z) ∼ U(0, 1).

Instead of relying on an analysis of the approximation error to quantify the deviation |Fn − F| under certain regularity conditions, conformal prediction fully characterizes the distribution of the probability integral transform and calibrates the radius of the quantile region accordingly. We follow this idea and note that, by definition, we have

P(R(T̂, r)) = P(∥T̂(Z)∥ ≤ r) = Û(B(0, r)).

Instead of relying on Û ≈ U, we define

rα(T̂, P) = inf{r : Û(B(0, r)) ≥ 1 − α}, (17)

which naturally leads to the desired coverage with the approximated transport map. For r̂α = rα(T̂, P), it holds

P(Z ∈ R(T̂, r̂α)) ≥ 1 − α.

By extension, a quantile region of the vector-valued score Z = S(X, Y) ∈ Rd of a prediction model ŷ provides an uncertainty set for the response of a given input X, with the prescribed coverage (1 − α), expressed as

R̂α(X) = {y ∈ Y : ∥T̂ ◦ S(X, y)∥ ≤ r̂α},

P(Y ∈ R̂α(X)) ≥ 1 − α. (18)

We give the finite sample analog of Equation (18), which provides a coverage guarantee even when the transport map is an approximation obtained using both entropic regularization and finite sample data, e.g., in Equation (11). Given such an approximated map T̂n+1 and applying the empirical radius r̂α,n+1 = rα(T̂n+1, Pn+1), it holds

Pn+1(Zn+1 ∈ R(T̂n+1, r̂α,n+1)) ≥ 1 − α.

However, this is only an empirical coverage statement:

(1/(n + 1)) Σ_{i=1}^{n+1} 1{Zi ∈ R(T̂n+1, r̂α,n+1)} ≥ 1 − α,

which does not imply coverage with respect to P unless n → ∞. The following result shows how to obtain finite sample validity.

Lemma 3.5 (Coverage of Empirical Quantile Region). Let Z1, . . . , Zn, Zn+1 be a sequence of exchangeable variables in Rd. Then P(Zn+1 ∈ R̂α,n+1) ≥ 1 − α, where, for simplicity, we denoted the approximated empirical quantile region as R̂α,n+1 = R(T̂n+1, r̂α,n+1).

This can be directly applied to obtain a conformal prediction set for vector-valued non-conformity score functions, with Zi = S(Xi, Yi) ∈ Rd for i in [n + 1] in Lemma 3.5.

Proposition 3.6. The conformal prediction set is defined as

R̂α,n+1(Xn+1) = {y ∈ Y : ∥T̂ ◦ S(Xn+1, y)∥ ≤ r̂α,n+1}

with r̂α,n+1 = inf{r ≥ 0 : Ûn+1(B(0, r)) ≥ 1 − α}. It satisfies a distribution-free finite sample coverage guarantee

P(Yn+1 ∈ R̂α,n+1(Xn+1)) ≥ 1 − α. (19)

Approaches relying on a vector-valued probability integral transform, e.g., by leveraging copulas, have been explored recently (Messoudi et al., 2021; Park et al., 2024); they concluded that loss of coverage can occur when the estimated copula of the scores deviates from the true copula, and thus do not formally guarantee finite-sample validity. To our knowledge, Proposition 3.6 provides the first calibration guarantee for such confidence regions without assumptions on the distribution, for any approximation map T̂.

3.4. Implementation with the Entropic Map

We assume access to two families of samples: residuals (z1, . . . , zn), and a discretization of the uniform grid on the sphere, (u1, . . . , um), with sizes n, m that will usually differ, n ≠ m. Learning the entropic map estimator as in Section 2.4 requires running the Sinkhorn (1964) algorithm for a given regularization ε on an n × m cost matrix. At test time, for each evaluation, computing the weights in Equation (12) requires computing the distances of a new score z to the uniform grid. The complexity is therefore O(nm) when training the map and conformalizing its norms, and O(m) to transport a conformity score for a given y.

Sampling on the sphere. As mentioned by Hallin et al. (2021), it is preferable to sample the uniform measure Ud with diverse samples. This can be achieved using stratified sampling on radii lengths and low-discrepancy samples picked on the sphere. We borrow inspiration from the review provided in (Nguyen et al., 2024) and pick their Gaussian-based mapping approach (Basu, 2016). This consists of mapping a low-discrepancy sequence w1, . . . , wL on [0, 1]d to a potentially low-discrepancy sequence θ1, . . . , θL on Sd−1 through the mapping θ = Φ−1(w)/∥Φ−1(w)∥2, where Φ−1 is the inverse CDF of N(0, 1) applied entry-wise.
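The grid construction above can be sketched in numpy, with a pseudo-random Gaussian draw standing in for the low-discrepancy sequence w1, . . . , wL (an assumption made purely for brevity; the function name is illustrative):

```python
import numpy as np

def sphere_grid(n_r, n_s, d, rng):
    """Discretized spherical uniform measure: n_s unit directions times
    n_r regularly spaced radii {1/n_r, ..., 1}. Directions are obtained by
    normalizing Gaussian draws, a pseudo-random stand-in for pushing a
    low-discrepancy sequence through theta = Phi^{-1}(w)/||Phi^{-1}(w)||."""
    g = rng.normal(size=(n_s, d))                       # Phi^{-1}(w) draws
    dirs = g / np.linalg.norm(g, axis=1, keepdims=True) # points on S^{d-1}
    radii = np.arange(1, n_r + 1) / n_r                 # stratified radii
    return (radii[:, None, None] * dirs[None, :, :]).reshape(-1, d)

grid = sphere_grid(n_r=8, n_s=64, d=3, rng=np.random.default_rng(0))
```

The resulting n_r × n_s points (plus optional copies of the origin) form the target measure Un+1 fed to the Sinkhorn solver.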

[Figure 1: region size per dataset, by method (M-CP, Merge-CP, Merge-CP (Mah), OT-CP).]

Figure 1. We report the mean and standard error of the region size across 10 different seeds. For M-CP, we use 300 samples to compute the conditional mean, and for OT-CP, we use ε = 0.1 and 2^15 = 32768 points in the uniform target measure. Overall, OT-CP displays a smaller region size than other baselines (13 out of 17 datasets). The output dimension d of each dataset is provided next to its name.

4. Experiments

4.1. Setup and Metrics

We borrow the experimental setting provided by Dheur et al. (2025) and benchmark multivariate conformal methods on a total of 24 tabular datasets. Total data size n in these datasets ranges from 103 to 50,000, with input dimension p ranging from 1 to 348, and output dimension d ranging from 2 to 16. We adopt their approach, which is to rely on a multivariate quantile function forecaster (MQF2, Kan et al., 2022), a normalizing flow that is able to quantify output uncertainty conditioned on an input x. However, in accordance with our stance mentioned in the background section, we will only assume access to the conditional mean (point-wise) estimator for OT-CP.

As is common in the field, we evaluate the methods using several metrics, including marginal coverage (MC) and mean region size (Size). The latter uses importance sampling, leveraging (when computing test-time metrics only) the generative flexibility provided by the MQF2 as an invertible flow. See (Dheur et al., 2025) and their code for more details on the experimental setup.

4.2. Hyperparameter Choices

We apply default parameters for all three competing methods, M-CP and Merge-CP, using (or not) the Mahalanobis correction. For M-CP using conformalized quantile regression boxes, we follow (Dheur et al., 2025) and leverage the empirical quantiles returned by MQF2 to compute boxes (Zhou et al., 2024).

OT-CP: our implementation requires tuning two important hyperparameters: the entropic regularization ε and the total number of points m used to discretize the sphere, not necessarily equal to the input data sample size n. These two parameters describe a fundamental statistical and computational trade-off. On the one hand, it is known that increasing m will mechanically improve the ability of Tε to recover T⋆ in the limit (or at least solve the semi-discrete (Peyré & Cuturi, 2019) problem of mapping n data points to the sphere). However, a large m incurs a heavier computational price when running the Sinkhorn algorithm. On the other hand, increasing ε improves on both computational and statistical aspects, but deviates the estimated map further from the ground truth T⋆ to target instead a blurred map. We have experimented with these aspects and derive from our experiments that both m and ε should be increased to track increases in dimension. As a side note, we do observe that debiasing the outputs of the Sinkhorn algorithm does not result in improved results, which agrees with the findings in (Pooladian et al., 2022). We use the OTT-JAX toolbox (Cuturi et al., 2022) to compute these maps.

4.3. Results

We present results by differentiating datasets with small dimension d ≤ 6 from datasets with higher dimensionality 14 ≤ d ≤ 16, which we expect to be more challenging to handle with OT approaches, owing to the curse of dimensionality that might degrade the quality of multivariate quantiles. Results in Figure 4 indicate an improvement (smaller region for similar coverage) on 15 out of 18 datasets in lower dimensions, this edge vanishing in the higher-dimensional regime. Ablations provided in Figure 2 highlight the role of

Multivariate Conformal Prediction using Optimal Transport
[Figure 2: mean region size (log scale, 10^0 to 10^12) per dataset, two panels (#target points = 8192 and 32768), colored by ε ∈ {0.001, 0.01, 0.1, 1.0}.]

Figure 2. This plot details the impact of the two important hyperparameters one needs to set in OT-CP: the number of target points m sampled from the uniform ball and the ε regularization level. As can be seen, larger sample size m improves region size (smaller the better) for roughly all datasets and regularization strengths. On the other hand, one must tune ε to operate at a suitable regime: not too low, which results in the well-documented poor statistical performance of unregularized / linear program OT, nor too high, which would lead to a collapse of the entropic map to the sphere. Using OTT-JAX and its automatic normalizations, we see that ε = 0.1 works best overall.
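As a concrete illustration of how such target points on the sphere can be generated, the Gaussian-based mapping described above (pushing a low-discrepancy sequence on [0, 1]^d through the inverse Gaussian CDF entry-wise, then normalizing) can be sketched as follows. The use of a scrambled Sobol sequence from SciPy as the low-discrepancy generator is our assumption; the paper follows the review in (Nguyen et al., 2024).

```python
import numpy as np
from scipy.stats import norm, qmc


def sphere_targets(m: int, d: int, seed: int = 0) -> np.ndarray:
    """Map a low-discrepancy sequence on [0, 1]^d to (near-)uniform points
    on the sphere S^{d-1} via theta = Phi^{-1}(w) / ||Phi^{-1}(w)||_2."""
    sampler = qmc.Sobol(d=d, scramble=True, seed=seed)
    w = sampler.random(m)                        # low-discrepancy points in [0, 1]^d
    g = norm.ppf(np.clip(w, 1e-12, 1 - 1e-12))   # inverse Gaussian CDF, entry-wise
    return g / np.linalg.norm(g, axis=1, keepdims=True)


theta = sphere_targets(m=1024, d=3)   # m is best taken as a power of 2 for Sobol
print(theta.shape)                    # (1024, 3), each row on the unit sphere
```

The clipping guards `norm.ppf` against the endpoints 0 and 1, where the inverse CDF is infinite.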

[Figure 3: computation time in seconds (0 to 80) per dataset. Figure 4: region size (log scale) per dataset. Methods compared: M-CP, Merge-CP, Merge-CP (Mah), OT-CP.]

Figure 3. Computational time on small dimensional datasets. OT-CP incurs more compute time due to the OT map estimation. See Fig. 7 for a similar picture for higher dimensional datasets.

Figure 4. As in Figure 1, we report mean and standard errors for region size (log scale) across 10 different seeds for larger datasets. We keep the same parameters, importantly ε = 0.1 and 2^15 = 32768 points in the uniform target measure. We expect the performance of OT-CP to decrease with dimensionality, but it does provide a convincing alternative to the other approaches.
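The extra compute reported for OT-CP in Figure 3 comes from estimating an entropic OT coupling between the n calibration points and the m sphere points. The minimal log-domain Sinkhorn sketch below is our own plain-NumPy stand-in for the OTT-JAX implementation actually used (the cost normalization mimics OTT's automatic rescaling; the demo data are hypothetical). It shows how both ε and m enter the computation.

```python
import numpy as np
from scipy.special import logsumexp


def sinkhorn_plan(x, y, eps=0.1, n_iter=1000):
    """Entropic OT coupling between uniform measures on point clouds
    x (n, d) and y (m, d). The (n, m) cost matrix explains why large m
    is expensive, while small eps slows convergence."""
    n, m = len(x), len(y)
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    C = C / C.max()                                    # cost normalization
    log_a, log_b = np.full(n, -np.log(n)), np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):                            # log-domain Sinkhorn updates
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    return np.exp((f[:, None] + g[None, :] - C) / eps
                  + log_a[:, None] + log_b[None, :])


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))                           # "data" points
y = rng.normal(size=(256, 2))
y /= np.linalg.norm(y, axis=1, keepdims=True)          # target points on the sphere
P = sinkhorn_plan(x, y, eps=0.1)
T_hat = len(x) * (P @ y)   # barycentric (entropic) map estimate onto the sphere
```

A larger m enlarges the (n, m) kernel handled at every iteration, while a smaller ε requires more iterations to converge, which is the trade-off discussed in Section 4.2.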

ε and m, the entropic regularization strength and the sphere size respectively. These results show that high m tends to yield better but more costly results, while the regularization strength ε needs to be tuned according to dimension (Vacher & Vialard, 2022). Finally, Figure 5 provides an illustration of the non-elliptic CP regions output by OT-CP, obtained by pulling back the rescaled uniform sphere using the inverse entropic mapping described in Section 3.4.

5. Conclusion

We have proposed OT-CP, a new approach that leverages a recently proposed formulation for multivariate quantiles based on optimal transport theory and optimal transport map estimators. We show the theoretical soundness of this approach, but, most importantly, demonstrate its applicability throughout a broad range of tasks compiled by (Dheur et al., 2025). Compared to similar baselines that either use a conditional mean regression estimator (Merge-CP) or more involved quantile regression estimators (M-CP), OT-CP shows overall superior performance, while incurring, predictably, a higher train / calibration time cost. The challenges brought forward by the estimation of OT maps in high dimensions (Chewi et al., 2024) require being particularly careful when tuning entropic regularization and grid size. However, we show that there exists a reasonable setting for both of these parameters that delivers good performance across most tasks.


[Figure 5: conformal sets at levels α = 0.05, 0.1, 0.2, showing source and target points.]

Figure 5. Conformal sets recovered by mapping back the reduced sphere on the Manhattan map, in agreement with Equation 18, on a prediction for the taxi dataset. We use the inverse entropic map mentioned in Section 3.4, mapping back the gridded sphere of size m = 2^15 for each level, and plotting its outer contour.

6. Concurrent Work

Concurrently with our work, Thurin et al. (2025) recently proposed to leverage OT in CP with a similar approach, deriving a similar CP set as in Equation (15) and analyzing a variant with asymptotic conditional coverage under additional regularity assumptions. However, our methods differ in several key aspects. On the computational side, our implementation leverages general entropic maps (Section 3.4) without compromising finite-sample coverage guarantees, an aspect we analyze in detail in Section 3.3. In contrast, their approach requires solving a linear assignment problem, using for instance the Hungarian algorithm, which has cubic complexity O(n^3) in the number of target points, and which also requires a target set on the sphere of the same size as the number of input points. With our notations in Section 3.4, they require n = m, whereas we set m anywhere between 2^12 and 2^15, independently of n. While they mention efficient approximations that reduce complexity to quadratic in (Thurin et al., 2025, Remark 2.3), their theoretical results do not yet cover these cases, since their analysis relies on the fact that ranks are random permutations of {1/n, 2/n, . . . , 1}, which cannot be extended to using Sinkhorn with soft assignments. In contrast, our work establishes formal theoretical coverage guarantees even when approximated (pre-trained) transport maps are used.

References

Angelopoulos, A. N. and Bates, S. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511, 2021.

Balasubramanian, V., Ho, S.-S., and Vovk, V. Conformal Prediction for Reliable Machine Learning: Theory, Adaptations and Applications. Newnes, 2014.

Barber, R. F., Candes, E. J., Ramdas, A., and Tibshirani, R. J. Conformal prediction beyond exchangeability. The Annals of Statistics, 51(2):816–845, 2023.

Basu, K. Quasi-Monte Carlo Methods in Non-Cubical Spaces. Stanford University, 2016.

Bates, S., Candès, E., Lei, L., Romano, Y., and Sesia, M. Testing for outliers with conformal p-values. arXiv preprint arXiv:2104.08279, 2021.

Brenier, Y. Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, 44(4), 1991. doi: 10.1002/cpa.3160440402.

Cella, L. and Ryan, R. Valid distribution-free inferential models for prediction. arXiv preprint arXiv:2001.09225, 2020.

Chernozhukov, V., Galichon, A., Hallin, M., and Henry, M. Monge–Kantorovich depth, quantiles, ranks and signs. The Annals of Statistics, 45(1):223–256, 2017. doi: 10.1214/16-AOS1450. URL https://doi.org/10.1214/16-AOS1450.

Chernozhukov, V., Wüthrich, K., and Zhu, Y. Exact and robust conformal inference methods for predictive machine learning with dependent data. Conference On Learning Theory, 2018.

Chernozhukov, V., Wüthrich, K., and Zhu, Y. An exact and robust conformal inference method for counterfactual and synthetic controls. Journal of the American Statistical Association, 116(536):1849–1864, 2021.

Chewi, S., Niles-Weed, J., and Rigollet, P. Statistical optimal transport. arXiv preprint arXiv:2407.18163, 2024.

Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pp. 2292–2300, 2013.

Cuturi, M., Teboul, O., and Vert, J.-P. Differentiable ranking and sorting using optimal transport. Advances in Neural Information Processing Systems, 32, 2019.


Cuturi, M., Meng-Papaxanthos, L., Tian, Y., Bunne, C., Davis, G., and Teboul, O. Optimal transport tools (OTT): A JAX toolbox for all things Wasserstein, 2022. URL https://arxiv.org/abs/2201.12324.

Dheur, V., Fontana, M., Estievenart, Y., Desobry, N., and Taieb, S. B. Multi-output conformal regression: A unified comparative study with new conformity scores, 2025. URL https://arxiv.org/abs/2501.10533.

Fisch, A., Schuster, T., Jaakkola, T., and Barzilay, R. Few-shot conformal prediction with auxiliary tasks. ICML, 2021.

Gammerman, A., Vovk, V., and Vapnik, V. Learning by transduction, 1998.

Gelbrich, M. On a formula for the L2 Wasserstein metric between measures on Euclidean and Hilbert spaces. Mathematische Nachrichten, 147(1), 1990.

Guha, E., Natarajan, S., Möllenhoff, T., Khan, M. E., and Ndiaye, E. Conformal prediction via regression-as-classification. arXiv preprint arXiv:2404.08168, 2024.

Hallin, M., del Barrio, E., Cuesta-Albertos, J., and Matrán, C. Distribution and quantile functions, ranks and signs in dimension d: A measure transportation approach. The Annals of Statistics, 49(2):1139–1165, 2021. doi: 10.1214/20-AOS1996. URL https://doi.org/10.1214/20-AOS1996.

Hallin, M., La Vecchia, D., and Liu, H. Center-outward R-estimation for semiparametric VARMA models. Journal of the American Statistical Association, 117(538):925–938, 2022.

Hallin, M., Hlubinka, D., and Hudecová, Š. Efficient fully distribution-free center-outward rank tests for multiple-output regression and MANOVA. Journal of the American Statistical Association, 118(543):1923–1939, 2023.

Henderson, I., Mazoyer, A., and Gamboa, F. Adaptive inference with random ellipsoids through conformal conditional linear expectation. arXiv preprint arXiv:2409.18508, 2024.

Ho, S.-S. and Wechsler, H. Query by transduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

Holland, M. J. Making learning more transparent using conformalized performance prediction. arXiv preprint arXiv:2007.04486, 2020.

Izbicki, R., Shimizu, G., and Stern, R. B. CD-split and HPD-split: Efficient conformal regions in high dimensions. Journal of Machine Learning Research, 23(87):1–32, 2022.

Johnstone, C. and Cox, B. Conformal uncertainty sets for robust optimization. In Carlsson, L., Luo, Z., Cherubin, G., and An Nguyen, K. (eds.), Proceedings of the Tenth Symposium on Conformal and Probabilistic Prediction and Applications, volume 152 of Proceedings of Machine Learning Research, pp. 72–90. PMLR, 08–10 Sep 2021. URL https://proceedings.mlr.press/v152/johnstone21a.html.

Kan, K., Aubet, F.-X., Januschowski, T., Park, Y., Benidis, K., Ruthotto, L., and Gasthaus, J. Multivariate quantile function forecaster. In International Conference on Artificial Intelligence and Statistics, pp. 10603–10621. PMLR, 2022.

Katsios, K. and Papadopulos, H. Multi-label conformal prediction with a Mahalanobis distance nonconformity measure. In Vantini, S., Fontana, M., Solari, A., Boström, H., and Carlsson, L. (eds.), Proceedings of the Thirteenth Symposium on Conformal and Probabilistic Prediction with Applications, volume 230 of Proceedings of Machine Learning Research, pp. 522–535. PMLR, 09–11 Sep 2024. URL https://proceedings.mlr.press/v230/katsios24a.html.

Kumar, B., Lu, C., Gupta, G., Palepu, A., Bellamy, D., Raskar, R., and Beam, A. Conformal prediction with large language models for multi-choice question answering. arXiv preprint arXiv:2305.18404, 2023.

Laxhammar, R. and Falkman, G. Inductive conformal anomaly detection for sequential detection of anomalous sub-trajectories. Annals of Mathematics and Artificial Intelligence, 2015.

Lin, Z., Trivedi, S., and Sun, J. Conformal prediction intervals with temporal dependence. Transactions of Machine Learning Research, 2022.

Lu, C., Lemay, A., Chang, K., Höbel, K., and Kalpathy-Cramer, J. Fair conformal predictors for applications in medical imaging. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 12008–12016, 2022.

Messoudi, S., Destercke, S., and Rousseau, S. Copula-based conformal prediction for multi-target regression. Pattern Recognition, 120:108101, 2021.

Muzellec, B. and Cuturi, M. Generalizing point embeddings using the Wasserstein space of elliptical distributions. Advances in Neural Information Processing Systems, 31, 2018.

Nguyen, K., Bariletto, N., and Ho, N. Quasi-Monte Carlo for 3D sliced Wasserstein. In The Twelfth International Conference on Learning Representations, 2024.


Park, J. W., Tibshirani, R., and Cho, K. Semiparametric conformal prediction. arXiv preprint arXiv:2411.02114, 2024.

Peyré, G. and Cuturi, M. Computational optimal transport. Foundations and Trends in Machine Learning, 11, 2019.

Pooladian, A.-A. and Niles-Weed, J. Entropic estimation of optimal transport maps. arXiv preprint arXiv:2109.12004, 2021.

Pooladian, A.-A., Cuturi, M., and Niles-Weed, J. Debiaser beware: Pitfalls of centering regularized transport maps. In International Conference on Machine Learning, pp. 17830–17847. PMLR, 2022.

Quach, V., Fisch, A., Schuster, T., Yala, A., Sohn, J. H., Jaakkola, T. S., and Barzilay, R. Conformal language modeling. arXiv preprint arXiv:2306.10193, 2023.

Romano, Y., Patterson, E., and Candes, E. Conformalized quantile regression. Advances in Neural Information Processing Systems, 32, 2019.

Shafer, G. and Vovk, V. A tutorial on conformal prediction. Journal of Machine Learning Research, 2008.

Sinkhorn, R. A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Statist., 35:876–879, 1964.

Straitouri, E., Wang, L., Okati, N., and Rodriguez, M. G. Improving expert predictions with conformal prediction. In International Conference on Machine Learning, pp. 32633–32653. PMLR, 2023.

Thurin, G., Nadjahi, K., and Boyer, C. Optimal transport-based conformal prediction, 2025. URL https://arxiv.org/abs/2501.18991.

Tibshirani, R. J., Foygel Barber, R., Candes, E., and Ramdas, A. Conformal prediction under covariate shift. Advances in Neural Information Processing Systems, 32, 2019.

Vacher, A. and Vialard, F.-X. Parameter tuning and model selection in optimal transport with semi-dual Brenier formulation. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022.

Vovk, V., Gammerman, A., and Shafer, G. Algorithmic Learning in a Random World. Springer, 2005.

Wang, Z., Gao, R., Yin, M., Zhou, M., and Blei, D. M. Probabilistic conformal prediction using conditional random samples. arXiv preprint arXiv:2206.06584, 2022.

Xu, C. and Xie, Y. Conformal prediction interval for dynamic time-series. ICML, 2021.

Zaffran, M., Féron, O., Goude, Y., Josse, J., and Dieuleveut, A. Adaptive conformal predictions for time series. In International Conference on Machine Learning, pp. 25834–25866. PMLR, 2022.

Zhou, Y., Lindemann, L., and Sesia, M. Conformalized adaptive forecasting of heterogeneous trajectories. arXiv preprint arXiv:2402.09623, 2024.

A. Appendix

We provide a few additional results related to the experiments proposed in Section 4.

[Figures 6–10: per-dataset coverage, runtime, and hyperparameter-ablation panels (methods M-CP, Merge-CP, Merge-CP (Mah), OT-CP; #target points ∈ {4096, 8192, 16384, 32768}; ε ∈ {0.001, 0.01, 0.1, 1.0}).]

Figure 6. Coverage for higher dimensional datasets, corresponding to the setting displayed in Figure 4.

Figure 7. Runtimes for higher dimensional datasets, corresponding to the setting displayed in Figure 6.

Figure 8. Ablation: coverage quality as a function of hyperparameters, with the setting corresponding to Figure 2.

Figure 9. Coverage of all baselines on small dimensional datasets, corresponding to the region sizes given in Figure 1.

Figure 10. Ablation: running time as a function of hyperparameters, with the setting corresponding to Figure 2.

B. Proofs

Proposition B.1. Given n discrete sample points distributed over a sphere with radii {0, 1/n_R, 2/n_R, . . . , 1} and directions uniformly sampled on the sphere, the smallest radius r_α = j_α/n_R satisfying (1 − α)-coverage is determined by

    j_α = ⌈((n + 1)(1 − α) − n_o) / n_S⌉,

where n_S is the number of directions, n_R is the number of radii, and n_o is the number of copies of the origin (∥U∥ = 0).

Proof. The discrete spherical uniform distribution places the same probability mass on all n + 1 sample points, including the n_o copies of the origin. As such, given a radius r_j = j/n_R, we have

    P(∥U∥ = r_j) = n_S · 1/(n + 1).

The cumulative probability up to radius r_j is given by:

    P(∥U∥ ≤ r_j) = P(∥U∥ = 0) + Σ_{k=1}^{j} P(∥U∥ = r_k) = n_o/(n + 1) + j · n_S/(n + 1).

To find the smallest r_α = j_α/n_R such that P(∥U∥ ≤ r_{j_α}) ≥ 1 − α, it suffices to solve:

    n_o/(n + 1) + j_α · n_S/(n + 1) ≥ 1 − α. ∎

Lemma B.2 (Coverage of Empirical Quantile Region). Let Z_1, . . . , Z_n, Z_{n+1} be a sequence of exchangeable variables in R^d. Then P(Z_{n+1} ∈ R̂_{α,n+1}) ≥ 1 − α, where, for simplicity, we denote the approximated empirical quantile region as R̂_{α,n+1} = R(T̂_{n+1}, r̂_{α,n+1}).

Proof. By exchangeability of Z_1, . . . , Z_{n+1} and symmetry of the set R̂_{α,n+1}, it holds that

    P(Z_{n+1} ∈ R̂_{α,n+1}) = P(Z_i ∈ R̂_{α,n+1})  for all i ∈ [n + 1].

By taking the average on both sides, we have:

    P(Z_{n+1} ∈ R̂_{α,n+1}) = (1/(n + 1)) Σ_{i=1}^{n+1} P(Z_i ∈ R̂_{α,n+1})
                           = E[(1/(n + 1)) Σ_{i=1}^{n+1} 1{Z_i ∈ R̂_{α,n+1}}]
                           = E[P̂_{n+1}(Z_{n+1} ∈ R̂_{α,n+1})]
                           ≥ 1 − α. ∎
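The exchangeability argument above can be checked empirically. The sketch below (our own illustration, under the simplifying assumption that the region is a ball whose radius is the k-th smallest calibration norm, with k = ⌈(n + 1)(1 − α)⌉) measures marginal coverage of a fresh exchangeable point:

```python
import numpy as np


def empirical_coverage(n=99, alpha=0.1, trials=10000, d=2, seed=0):
    """Monte Carlo estimate of P(Z_{n+1} in R_hat), where R_hat is the ball
    of radius equal to the k-th smallest norm among n calibration points."""
    rng = np.random.default_rng(seed)
    k = int(np.ceil((n + 1) * (1 - alpha)))          # conformal rank threshold
    z = rng.normal(size=(trials, n + 1, d))          # i.i.d. hence exchangeable
    norms = np.linalg.norm(z, axis=2)
    radius = np.sort(norms[:, :n], axis=1)[:, k - 1]  # k-th smallest of first n
    return float((norms[:, n] <= radius).mean())


cov = empirical_coverage()
print(cov)  # close to the nominal 1 - alpha = 0.9, up to Monte Carlo error
```

With continuous scores there are no ties, so the coverage is exactly k/(n + 1) ≥ 1 − α, matching the lemma.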


ansur2 (2) bio (2) births1 (2) calcofi (2) edm (2) enb (2) house (2) taxi (2) jura (3) scpf (3) sf1 (3) sf2 (3)
epsilon #target
0.001 4096 3.3±0.064 0.46±0.057 78±70 2.6±0.089 1.9±0.3 0.81±0.21 2±0.051 7±0.12 13±2.6 0.78±0.4 14±2.6 0.82±0.32
8192 3.4±0.059 0.45±0.057 78±70 2.6±0.089 1.9±0.29 0.81±0.2 2±0.05 7±0.13 11±2.6 0.73±0.23 16±3.9 0.4±0.16
16384 3.4±0.059 0.46±0.058 78±70 2.6±0.093 1.8±0.28 0.83±0.21 2±0.048 7±0.13 12±2.3 0.87±0.34 21±4.8 0.44±0.2
32768 3.4±0.063 0.46±0.058 78±70 2.6±0.092 1.9±0.3 0.81±0.2 2±0.05 7±0.13 12±2.6 1.2±0.47 16±2.9 0.57±0.18
0.01 4096 3.3±0.055 0.55±0.12 78±70 2.5±0.084 1.9±0.3 0.81±0.21 2±0.05 7.5±0.63 11±2.8 0.43±0.15 12±2.1 0.2±0.086
8192 3.3±0.054 0.56±0.13 78±70 2.5±0.082 1.8±0.3 0.8±0.21 2±0.049 7.5±0.69 10±2.6 0.37±0.15 12±2.8 0.17±0.063
16384 3.3±0.045 0.56±0.12 78±70 2.5±0.082 1.7±0.24 0.8±0.21 2±0.05 7.5±0.71 13±4.3 0.4±0.18 11±2.9 0.19±0.076
32768 3.3±0.064 0.56±0.12 78±70 2.5±0.085 1.7±0.26 0.82±0.22 2±0.049 7.5±0.69 10±2.7 0.41±0.17 12±2.6 0.18±0.071
0.1 4096 3.3±0.058 0.49±0.011 78±70 2.5±0.084 1.6±0.25 0.81±0.21 2.3±0.065 8.3±1.4 9.2±2.8 0.37±0.15 6.6±0.96 0.48±0.1
8192 3.3±0.059 0.49±0.011 78±70 2.5±0.084 1.6±0.26 0.8±0.21 2.3±0.065 8.2±1.5 9.4±2.9 0.4±0.15 6.1±0.89 0.53±0.11
16384 3.3±0.054 0.49±0.012 78±70 2.5±0.081 1.6±0.26 0.8±0.21 2.3±0.058 8.2±1.4 9.4±2.9 0.37±0.12 6.4±0.83 0.45±0.092
32768 3.3±0.051 0.49±0.011 77±70 2.5±0.083 1.5±0.25 0.79±0.2 2.3±0.057 8.2±1.4 8.9±2.9 0.36±0.12 6.5±1.2 0.5±0.1
1 4096 3.6±0.055 0.65±0.019 78±70 2.5±0.1 1.7±0.27 0.92±0.24 3±0.13 6.4±0.14 13±4 0.45±0.16 9.5±1.9 0.84±0.13
8192 3.6±0.067 0.59±0.013 78±70 2.5±0.099 1.7±0.26 0.91±0.24 3±0.14 6.3±0.14 13±4 0.42±0.14 10±1.8 0.93±0.16
16384 3.5±0.072 0.57±0.016 78±70 2.5±0.099 1.7±0.27 0.91±0.24 3±0.13 6.4±0.14 14±4 0.48±0.17 9.8±1.7 0.91±0.17
32768 3.5±0.061 0.6±0.028 78±71 2.5±0.1 1.7±0.27 0.91±0.24 2.9±0.13 6.4±0.15 13±4 0.47±0.17 10±1.7 0.9±0.17

slump (3) households (4) air (6) atp1d (6) atp7d (6)
epsilon #target
0.001 4096 15±7.6 37±1.4 2.6E+03±1.9E+03 81±19 8.5E+02±4.5E+02
8192 7.9±2 36±1.9 7.1E+02±56 99±41 5.9E+02±1.8E+02
16384 11±3.7 34±1.3 6.9E+02±52 65±19 9.4E+02±3E+02
32768 12±4.3 36±2.6 6.8E+02±36 87±28 5.1E+02±2E+02
0.01 4096 20±6.8 37±1.6 8.5E+02±1E+02 85±24 7.9E+02±4.1E+02
8192 12±4.9 34±1.7 1.3E+03±7E+02 82±24 4E+02±1.5E+02
16384 7.1±2.2 33±0.81 5.5E+02±47 1.1E+02±26 3.7E+02±68
32768 10±4 31±0.97 4.8E+02±51 42±9.1 2.8E+02±98
0.1 4096 5.8±1.3 27±1.3 3.2E+02±32 8.1±1.7 33±9.2
8192 5.9±1.3 26±1.3 3.1E+02±33 5.7±1 27±6.9
16384 5.9±1.4 25±1 3.1E+02±34 4±1.4 26±7.7
32768 5.1±1.1 25±1 3.1E+02±34 3.8±0.88 16±5.1
1 4096 14±5.3 29±1.3 4.3E+02±31 6.2±1.7 69±25
8192 15±5.3 30±2.1 3.4E+02±38 5.6±2.2 69±25
16384 16±5.6 28±1.1 4.1E+02±36 6.1±2 76±27
32768 15±5.5 29±1.9 4.3E+02±38 5.6±1.5 73±24

rf1 (8) rf2 (8) wq (14) oes10 (16) oes97 (16) scm1d (16) scm20d (16)
epsilon #target
0.001 4096 2E+13±2E+13 2E+13±2E+13 7.1E+09±3E+09 2.9E+08±8.3E+07 8.7E+08±4E+08 4E+07±3.6E+07 1.7E+07±1.1E+07
8192 2E+13±2E+13 2E+13±2E+13 3.7E+09±1.9E+09 3.7E+08±1.3E+08 1.4E+09±1.2E+09 9.3E+05±5E+05 2.5E+08±1.9E+08
16384 2E+13±2E+13 2E+13±2E+13 6.6E+09±3.2E+09 5.6E+08±4.3E+08 2.5E+08±1.3E+08 3.5E+05±1.3E+05 8.9E+07±5.7E+07
32768 2E+13±2E+13 2E+13±2E+13 3.1E+09±1.2E+09 5.5E+08±3E+08 3.1E+08±9.5E+07 9.7E+05±4.5E+05 1.3E+09±1.3E+09
0.01 4096 2E+13±2E+13 2E+13±2E+13 1.1E+10±7.3E+09 4.3E+09±3.8E+09 3.5E+09±2.5E+09 4.1E+08±3.8E+08 1.3E+11±1.1E+11
8192 2E+13±2E+13 2E+13±2E+13 6.4E+10±6E+10 3E+10±2.8E+10 1E+10±6.1E+09 8.1E+08±5.5E+08 1.1E+11±1.1E+11
16384 2E+13±2E+13 2E+13±2E+13 3.3E+09±7.9E+08 1.1E+09±4.3E+08 1E+10±5.7E+09 4.8E+07±3.7E+07 1.3E+09±8.3E+08
32768 2E+13±2E+13 2E+13±2E+13 5.1E+11±4.9E+11 6.5E+09±5E+09 4E+09±3.2E+09 1.6E+07±9.5E+06 2.7E+08±1.3E+08
0.1 4096 2E+13±2E+13 2E+13±2E+13 8.7E+09±3.7E+09 4.8E+04±3.2E+04 6E+09±6E+09 1.5E+03±6.7E+02 1.3E+06±6.4E+05
8192 2E+13±2E+13 2E+13±2E+13 4.8E+09±1.5E+09 1.7E+05±1.3E+05 6E+09±6E+09 6.2E+02±2.8E+02 1.2E+06±8.7E+05
16384 2E+13±2E+13 2E+13±2E+13 1.3E+10±6.8E+09 5.2E+04±4.7E+04 5.6E+09±5.6E+09 2.2E+02±46 2.9E+05±1E+05
32768 2E+13±2E+13 2E+13±2E+13 7.4E+09±2.9E+09 7.6E+03±5.1E+03 9.2E+07±8.1E+07 1.1E+02±17 1.1E+05±3.1E+04
1 4096 2E+13±2E+13 2E+13±2E+13 8E+08±2E+08 6.6E+02±3.4E+02 8.3E+05±8.1E+05 4.1E+02±76 5.2E+05±6.5E+04
8192 2E+13±2E+13 2E+13±2E+13 6.9E+08±1.7E+08 3.5E+02±1.8E+02 7.7E+05±7.6E+05 8.5E+02±3.1E+02 1.1E+06±3.9E+05
16384 2E+13±2E+13 2E+13±2E+13 5.3E+08±1.2E+08 2.2E+02±1.5E+02 4E+05±4E+05 1.3E+02±14 4.7E+05±1.8E+05
32768 2E+13±2E+13 2E+13±2E+13 5.5E+08±1.5E+08 1.9E+02±1.6E+02 3.1E+05±3.1E+05 1E+02±11 3.4E+05±6.4E+04

Table 1. Mean region size for varying ε and the number of target points in the ball.
