1 Introduction
Recent work in object detection shows a great variety of models and approaches. Among the most notable are R-CNN [13], Fast R-CNN [12], RetinaNet [22], FPN [21], YOLO and its several versions [28–30], SSD [24] and DETR [7].
Despite their impressive success on various benchmarks, many challenges remain. For critical systems, additional guarantees must be provided to avoid catastrophic consequences: in an autonomous vehicle, a pedestrian mislocated by the system could be hurt or killed; in a cancer detection system, cancer cells missed by the object detector could go untreated. To ensure the safety of the user, the uncertainty on the location of the object to detect should be quantified, which makes it possible to build safeguards around the object.
The main challenge consists in providing reliable uncertainty quantification of the prediction errors of these models. While many object detection models compute so-called confidence scores, which can be interpreted as basic estimators of uncertainty, these scores are often unreliable (i.e., over- or under-estimating the true uncertainty).
Another difficulty stems from the complex interplay between the classification-
type errors and the localization-type errors of the object detectors. In addition,
the risks associated with each type of error are application-dependent.
For safety-related applications, one may seek to obtain various guarantees. One such guarantee, related to object localization and addressed in this paper, may read: ensure that at least a significant portion (i.e., a user-specified fraction) of the objects recognized in visual images satisfy the following property: their true bounding boxes are fully covered⁶ by the boxes predicted by a given object detection model. This type of guarantee may be helpful, for example, to build reliable models for tumor discovery, obstacle detection or trajectory estimation.
⁶ All the coordinates of the true box will be found inside the rectangle defined by the predicted bounding box of the object.
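For concreteness, this containment criterion can be expressed as a simple predicate. The sketch below is illustrative only: the function name and the (xmin, ymin, xmax, ymax) tuple representation are assumptions for the example, not notation taken from the paper.

    def box_covered(true_box, pred_box):
        """Return True if the true box lies entirely inside the predicted box.

        Boxes are assumed to be (xmin, ymin, xmax, ymax) tuples in pixel
        coordinates, with xmin <= xmax and ymin <= ymax.
        """
        txmin, tymin, txmax, tymax = true_box
        pxmin, pymin, pxmax, pymax = pred_box
        return (pxmin <= txmin and pymin <= tymin
                and txmax <= pxmax and tymax <= pymax)

    # Example: a predicted box that fully encloses the ground-truth box.
    assert box_covered((10, 20, 50, 80), (8, 15, 55, 90))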
1. Data collection: two different datasets are collected: a training set and a calibration set, which will be used respectively to learn and to evaluate an ML model. (See below for independence and distribution requirements on the data.)
2. Training step: a machine learning model f̂ is learned on the training set. The underlying model can be of virtually any kind (a deep neural network, a random forest, etc.).
3. Conformalization step: the learned model f̂ is evaluated on the calibration set. This step consists in measuring the errors of f̂ on the calibration set, and in reporting a quantile qα of these errors for some pre-specified risk level α ∈ (0, 1). More precisely, given a non-conformity score s(ŷ, y) to assess the “distance” between a prediction ŷ = f̂(x) and a ground truth y, we compute the errors of f̂ on all data points (xi, yi) of the calibration set⁸ (see the code sketch below):

Ri = s(ŷi, yi),  i = 1, . . . , nc.  (1)
⁷ More complex variants exist. The typical process outlined here is more precisely known as split conformal prediction.
⁸ The errors are sometimes called “residuals” (hence the Ri notation).
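As a minimal illustration of this conformalization step (the code sketch announced above), the snippet below computes the residuals of Eq. (1) and the quantile qα on a small calibration set. The absolute-error score, the variable names, and the finite-sample correction ⌈(nc + 1)(1 − α)⌉/nc commonly used in split conformal prediction are assumptions for this example, not necessarily the exact choices made in the paper.

    import numpy as np

    def conformal_quantile(residuals, alpha):
        """Quantile q_alpha of the calibration residuals.

        Uses the finite-sample-corrected level ceil((n + 1) * (1 - alpha)) / n,
        a standard choice in split conformal prediction.
        """
        n = len(residuals)
        level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        return np.quantile(residuals, level, method="higher")

    # Residuals R_i = s(y_hat_i, y_i); an absolute-error score is assumed here.
    y_hat = np.array([102.0, 95.5, 110.2, 99.8])    # predictions on calibration data
    y_true = np.array([100.0, 97.0, 108.0, 101.0])  # ground-truth values
    residuals = np.abs(y_hat - y_true)
    q_alpha = conformal_quantile(residuals, alpha=0.1)
    print(q_alpha)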
P(Y ∈ Cα(X)) ≥ 1 − α.
We say in this case that the method has a coverage of 1 − α. The above guarantee means that, for a fraction 1 − α of all possible calibration sets in Step 3 and possible data points (x, y) in Step 4, the prediction set Cα(x) contains the true label y. In other words, if we repeated the overall conformal prediction process 1–4 many times independently, it would err at most a fraction α of the time. Details about the dangers of interpretation are given in Section 6.
⁹ Mathematically speaking, it is in fact sufficient that the calibration data and the data at inference time are exchangeable, conditionally on the training data.
Fig. 2. Left: histogram of the errors Ri (in pixels) for the coordinate ymax, and the corresponding quantile qα for α = 0.1. Right: evolution of the quantile value qα with the risk level α.
In Table 1, the first four lines (coordinate-wise) give the observed coverage evaluated on the BDD100k test set, i.e., for each coordinate, the proportion of (true positive) boxes for which the true coordinate lies within the corresponding prediction set. We can see that the guarantee of Thm. 1 is verified whatever the specified coverage level.
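To make this coordinate-wise evaluation concrete, the following sketch computes, for one coordinate, the fraction of matched (true positive) boxes whose true coordinate falls inside the prediction interval. A symmetric interval ŷ ± qα around the prediction is assumed here for simplicity, and the array names are placeholders rather than quantities defined in the paper.

    import numpy as np

    def coordinate_coverage(coord_true, coord_pred, q_alpha):
        """Fraction of boxes whose true coordinate lies in [pred - q, pred + q].

        coord_true, coord_pred: 1-D arrays, one entry per matched box.
        """
        return np.mean(np.abs(coord_true - coord_pred) <= q_alpha)

    # e.g. observed coverage of the ymax coordinate on a held-out test set:
    # cov_ymax = coordinate_coverage(ymax_true, ymax_pred, q_alpha)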
In this section we seek the following guarantee: at inference time, among all true bounding (pedestrian) boxes that are detected, a fraction 1 − α of them are correctly covered by conformalized boxes.¹⁰ We explained in Eq. (4) how to compute error margins to locate unknown coordinates xmin, xmax, ymin, ymax of a box, given predictions x̂min, x̂max, ŷmin, ŷmax. It might be tempting to define the
¹⁰ The 1 − α guarantee only holds on average over all calibration sets; see Section 6.
conformalized box as the largest (worst-case) box whose coordinates are within the intervals Cα, i.e., the box with coordinates
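One natural way to build such an enlarged box is to push each predicted coordinate outward by its own conformal margin, as sketched below. The margin names and the α/4 remark are illustrative assumptions; this is not claimed to be the exact construction adopted in the paper.

    def conformalize_box(pred_box, margins):
        """Expand a predicted box outward by per-coordinate conformal margins.

        pred_box: (xmin, ymin, xmax, ymax) as predicted by the detector.
        margins:  (q_xmin, q_ymin, q_xmax, q_ymax), e.g. quantiles computed at
                  level alpha / 4 so that the four coordinates hold jointly
                  (Bonferroni correction).
        """
        xmin, ymin, xmax, ymax = pred_box
        q_xmin, q_ymin, q_xmax, q_ymax = margins
        return (xmin - q_xmin, ymin - q_ymin, xmax + q_xmax, ymax + q_ymax)

    # Example: enlarge a predicted pedestrian box by its four margins.
    print(conformalize_box((120.0, 60.0, 180.0, 210.0), (4.0, 3.5, 5.0, 6.0)))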
5 Image-Wise Conformalization
6 Statistical Pitfalls
solution cannot work in settings where a very large number of subdomains need
to be distinguished (since calibration sets need to be large enough) or where
these subdomains are not known a priori. In any case, due to the statistical
nature of conformal prediction methods, one must keep in mind that there are
some boxes or images on which these methods will fail at inference.
A guarantee “on average” over calibration sets. Similarly, while the 1 − α coverage is correct on average over all possible calibration sets, its value might be different for the single calibration set used in practice. The way the coverage varies from one calibration set to another was described in detail in [1]. Next we illustrate this variability on BDD100k with box-wise conformalization (as in Section 4.3) and α = 0.1, by re-sampling various calibration sets and reporting the associated test coverage values. The histogram in Figure 4 shows a large variability of coverage values, which means that different calibration sets lead to different coverage values at inference. Fortunately, most values here are above the specified coverage of 0.9 (since the margins are a little conservative due to the Bonferroni correction), but the tail probability to the left of 0.9 implies that the user may still end up with a calibration set that results in a lower-than-expected coverage at inference. Recent works have proposed variants of conformal prediction (with more conservative margins) to deal with this variability (e.g., [4, 10]).
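This variability experiment can be reproduced in spirit with a short resampling loop: repeatedly draw a calibration set, recompute the margin, and record the coverage observed on a fixed test set. The sketch below uses synthetic residuals and a single scalar margin, so it only illustrates the mechanism; it is not the box-wise pipeline of Section 4.3.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic stand-ins: a pool of calibration residuals and fixed test errors.
    residual_pool = np.abs(rng.normal(0.0, 5.0, size=5000))  # |error| in pixels
    test_err = np.abs(rng.normal(0.0, 5.0, size=2000))
    alpha, n_cal = 0.1, 500

    coverages = []
    for _ in range(200):                                      # resampled calibration sets
        cal = rng.choice(residual_pool, size=n_cal, replace=False)
        level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
        q = np.quantile(cal, level, method="higher")          # margin for this draw
        coverages.append(np.mean(test_err <= q))              # coverage on the test set

    # The spread of `coverages` around 1 - alpha illustrates that the guarantee
    # holds on average over calibration sets, not for every single one.
    print(np.mean(coverages), np.min(coverages), np.max(coverages))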
Fig. 4. Distribution of the empirical coverage (fraction of valid boxes) measured on the same test set when sampling different calibration sets and applying box-wise conformalization as in Section 4.3.
Fig. 5. Box-wise empirical coverage observed on (left) the BDD100k validation set vs. (middle) the Cityscapes validation set (the expected coverage is the black dotted line). Right: quantile curves (in pixels) for ymax when either BDD100k or Cityscapes is used for calibration.
References
1. Angelopoulos, A.N., Bates, S.: A gentle introduction to conformal prediction and
distribution-free uncertainty quantification (2021), arXiv:2107.07511
2. Azevedo, T., de Jong, R., Maji, P.: Stochastic-YOLO: Efficient probabilistic object detection under dataset shifts (2020), arXiv:2009.02967
3. Barber, R.F., Candes, E.J., Ramdas, A., Tibshirani, R.J.: Conformal prediction
beyond exchangeability (2022), arXiv:2202.13415
4. Bates, S., Angelopoulos, A., Lei, L., Malik, J., Jordan, M.I.: Distribution-free,
risk-controlling prediction sets. Journal of the ACM 68(6) (2021)
5. Bickel, P.J., Doksum, K.A.: Mathematical Statistics: Basic Ideas and Selected Top-
ics, vol. 1. Chapman and Hall/CRC (2015)
6. Bonnin, H., Jenn, E., Alecu, L., Fel, T., Gardes, L., Gerchinovitz, S., Ponsolle,
L., Mamalet, F., Mussot, V., Cappi, C., Delmas, K., Lefevre, B.: Can we reconcile
safety objectives with machine learning performances? In: Proc. of the 11th Edition
of European Congress of Embedded Real Time Systems (ERTS) (2022)
7. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-
to-end object detection with transformers. In: Computer Vision – ECCV 2020.
Springer International Publishing (2020)
8. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R.,
Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene
understanding. In: 2016 IEEE Conference on Computer Vision and Pattern Recog-
nition (CVPR) (2016)
9. Deepshikha, K., Yelleni, S.H., Srijith, P.K., Mohan, C.K.: Monte carlo dropblock
for modelling uncertainty in object detection (2021), arXiv:2108.03614
10. Ducoffe, M., Gerchinovitz, S., Sen Gupta, J.: A high-probability safety guarantee
for shifted neural network surrogates. In: Proceedings of the Workshop on Artificial
Intelligence Safety (SafeAI 2020). pp. 74–82 (2020)
11. Feng, D., Harakeh, A., Waslander, S.L., Dietmayer, K.: A review and comparative
study on probabilistic object detection in autonomous driving. IEEE Transactions
on Intelligent Transportation Systems (2021)
12. Girshick, R.B.: Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision (ICCV) (2015)
13. Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for
accurate object detection and semantic segmentation. In: 2014 IEEE Conference
on Computer Vision and Pattern Recognition (2014)
14. Harakeh, A., Smart, M., Waslander, S.L.: BayesOD: A Bayesian approach for un-
certainty estimation in deep object detectors (2019), arXiv:1903.03838
15. Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for
computer vision? In: Proceedings of the 31st International Conference on Neural
Information Processing Systems (2017)
16. Kraus, F., Dietmayer, K.: Uncertainty estimation in one-stage object detection. In:
2019 IEEE Intelligent Transportation Systems Conference (ITSC) (2019)
17. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2) (1955)
18. Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive
uncertainty estimation using deep ensembles. In: Proceedings of the 31st Interna-
tional Conference on Neural Information Processing Systems (2017)
19. Le, M.T., Diehl, F., Brunner, T., Knol, A.: Uncertainty estimation for deep neural
object detectors in safety-critical applications. In: 2018 21st International Confer-
ence on Intelligent Transportation Systems (ITSC) (2018)
20. Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R.J., Wasserman, L.: Distribution-free
predictive inference for regression. Journal of the American Statistical Association
113(523), 1094–1111 (2018)
21. Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyra-
mid networks for object detection. In: 2017 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR) (2017)
22. Lin, T., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 42(02) (2020)
23. Lin, T., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona,
P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in
context. In: Computer Vision – ECCV 2014. Springer International Publishing
(2014)
24. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C., Berg, A.C.: SSD: Single shot multibox detector. In: Computer Vision – ECCV 2016. Springer International Publishing (2016)
25. Lyu, Z., Gutierrez, N., Rajguru, A., Beksi, W.J.: Probabilistic object detection via
deep ensembles. In: European Conference on Computer Vision. Springer (2020)
26. Miller, D., Dayoub, F., Milford, M., Sunderhauf, N.: Evaluating merging strategies
for sampling-based uncertainty techniques in object detection. In: 2019 Interna-
tional Conference on Robotics and Automation (ICRA) (2019)
27. Miller, D., Nicholson, L., Dayoub, F., Sünderhauf, N.: Dropout sampling for ro-
bust object detection in open-set conditions. In: 2018 International Conference on
Robotics and Automation (ICRA) (2018)
28. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified,
real-time object detection. In: 2016 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR) (2016)
29. Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
30. Redmon, J., Farhadi, A.: YOLOv3: An incremental improvement (2018), arXiv:1804.02767
31. Rucklidge, W.J.: Efficiently locating objects using the Hausdorff distance. International Journal of Computer Vision 24, 251–270 (1997)
32. Tibshirani, R.J., Barber, R.F., Candes, E.J., Ramdas, A.: Conformal prediction un-
der covariate shift. In: Proceedings of the 33rd International Conference on Neural
Information Processing Systems (2019)
33. Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World.
Springer-Verlag, Berlin, Heidelberg (2005)
34. Yu, F., Xian, W., Chen, Y., Liu, F., Liao, M., Madhavan, V., Darrell, T.:
BDD100K: a diverse driving dataset for heterogeneous multitask learning (2018),
arXiv:1805.04687