BRISK: Binary Robust Invariant Scalable Keypoints
[...] points lying on appropriately scaled concentric circles is applied at the neighborhood of each keypoint to retrieve gray values: processing local intensity gradients, the feature characteristic direction is determined. Finally, the oriented BRISK sampling pattern is used to obtain pairwise brightness comparison results which are assembled into the binary BRISK descriptor.

Once generated, the BRISK keypoints can be matched very efficiently thanks to the binary nature of the descriptor. With a strong focus on efficiency of computation, BRISK also exploits the speed savings offered in the SSE instruction set widely supported on today's architectures.

2. Related Work

Identifying local interest points to be used for image matching can be traced a long way back in the literature, with Harris and Stephens [7] proposing one of the earliest and probably most well-known corner detectors. The seminal work of Mikolajczyk et al. [13] presented a comprehensive evaluation of the most competent detection methods at the time, which revealed no single all-purpose detector but rather the complementary properties of the different approaches depending on the context of the application. The more recent FAST criterion [14] for keypoint detection has become increasingly popular in state-of-the-art methods with hard real-time constraints, with AGAST [10] extending this work for improved performance.

Amongst the best-quality features currently in the literature is SIFT [9]. Its high descriptive power and robustness to illumination and viewpoint changes have placed the SIFT descriptor at the top of the rankings list in the survey in [11]. However, the high dimensionality of this descriptor makes SIFT prohibitively slow. PCA-SIFT [8] reduced the descriptor from 128 to 36 dimensions, compromising however its distinctiveness and increasing the time for descriptor formation, which almost annihilates the increased speed of matching. The GLOH descriptor [12] is also worth noting here, as it belongs to the family of SIFT-like methods and has been shown to be more distinctive but also more expensive to compute than SIFT.

The growing demand for high-quality, high-speed features has led to more research towards algorithms able to process richer data at higher rates. Notable is the work of Agrawal et al. [1], who apply a center-symmetric local binary pattern as an alternative to SIFT's orientation histograms approach. The most recent BRIEF [4] is designed for super-fast description and matching and consists of a binary string containing the results of simple image intensity comparisons at random pre-determined pixel locations. Despite the simplicity and efficiency of this approach, the method is very sensitive to image rotation and scale changes, restricting its application to general tasks.

Probably the most appealing features at the moment are SURF [2], which have been demonstrated to be significantly faster than SIFT. SURF detection uses the determinant of the Hessian matrix (blob detector), while the description is done by summing Haar wavelet responses at the region of interest. While demonstrating impressive timings with respect to the state-of-the-art, SURF is, in terms of speed, still orders of magnitude away from the fastest, yet limited-quality, features currently available.

In this paper, we present a novel methodology dubbed 'BRISK' for high-quality, fast keypoint detection, description and matching. As suggested by the name, the method is rotation as well as scale invariant to a significant extent, achieving performance comparable to the state-of-the-art while dramatically reducing computational cost. Following a description of the approach, we present experimental results performed on the benchmark datasets and using the standardized evaluation method of [12, 13]. Namely, we present an evaluation of BRISK with respect to SURF and SIFT, which are widely accepted as a standard of comparison under common image transformations.

3. BRISK: The Method

In this section, we describe the key stages in BRISK, namely feature detection, descriptor composition and keypoint matching, to the level of detail that the motivated reader can understand and reproduce. It is important to note that the modularity of the method allows the use of the BRISK detector in combination with any other keypoint descriptor and vice versa, optimizing for the desired performance and the task at hand.

3.1. Scale-Space Keypoint Detection

With the focus on efficiency of computation, our detection methodology is inspired by the work of Mair et al. [10] for detecting regions of interest in the image. Their AGAST is essentially an extension for accelerated performance of the now popular FAST, proven to be a very efficient basis for feature extraction. With the aim of achieving invariance to scale, which is crucial for high-quality keypoints, we go a step further by searching for maxima not only in the image plane, but also in scale-space, using the FAST score $s$ as a measure for saliency. Despite discretizing the scale axis at coarser intervals than in alternative high-performance detectors (e.g. the Fast-Hessian [2]), the BRISK detector estimates the true scale of each keypoint in the continuous scale-space.

In the BRISK framework, the scale-space pyramid layers consist of $n$ octaves $c_i$ and $n$ intra-octaves $d_i$, for $i = \{0, 1, \ldots, n-1\}$ and typically $n = 4$. The octaves are formed by progressively half-sampling the original image (corresponding to $c_0$). Each intra-octave $d_i$ is located in-between layers $c_i$ and $c_{i+1}$ (as illustrated in Figure 1). The first intra-octave $d_0$ is obtained by downsampling the original image $c_0$ by a factor of 1.5, while the rest of the intra-octave layers are derived by successive half-sampling. Therefore, if $t$ denotes scale then $t(c_i) = 2^i$ and $t(d_i) = 2^i \cdot 1.5$.
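As a small illustration of the layer scales just described, the following sketch enumerates the octaves and intra-octaves and sorts them by scale. The function name `brisk_layer_scales` and the string labels are chosen here for illustration only; they are not taken from the authors' implementation.

```python
def brisk_layer_scales(n=4):
    """Return (label, scale) pairs for n octaves c_i and n intra-octaves d_i.

    Uses t(c_i) = 2^i and t(d_i) = 1.5 * 2^i; labels are illustrative only.
    """
    layers = []
    for i in range(n):
        layers.append((f"c{i}", 2.0 ** i))        # octave: repeated half-sampling of c0
        layers.append((f"d{i}", 1.5 * 2.0 ** i))  # intra-octave: d0 half-sampled i times
    # Sorting by scale shows that each d_i lies between c_i and c_{i+1}.
    return sorted(layers, key=lambda layer: layer[1])


if __name__ == "__main__":
    for label, t in brisk_layer_scales():
        print(f"{label}: t = {t}")
```

With the typical $n = 4$, the sorted list alternates between octaves and intra-octaves, which is the interleaving that the search across neighboring layers relies on.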
[...] neighboring FAST scores $s$ in the same layer. The score $s$ is defined as the maximum threshold still considering an image point a corner. Secondly, the scores in the layer above and below will need to be lower as well. We check inside equally sized square patches: the side-length is chosen to be 2 pixels in the layer with the suspected maximum. Since the neighboring layers (and therefore their FAST scores) are represented with a different discretization, some interpolation is applied at the boundaries of the patch. Figure 1 depicts an example of this sampling and the maxima search.

Figure 1. Scale-space interest point detection: a keypoint (i.e. saliency maximum) is identified at octave $c_i$ by analyzing the 8 neighboring saliency scores in $c_i$ as well as in the corresponding score-patches in the immediately-neighboring layers above and below. In all three layers of interest, the local saliency maximum is sub-pixel refined before a 1D parabola is fitted along the scale-axis to determine the true scale of the keypoint. The location of the keypoint is then also re-interpolated between the patch maxima closest to the determined scale.
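The 1D parabola fit along the scale axis mentioned in the caption of Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: we assume the three refined saliency scores (layer below, layer of the maximum, layer above) are already available, and we assume the fit is carried out over $\log_2(t)$. The names `parabola_peak` and `refine_scale` and the example scores are hypothetical.

```python
import math


def parabola_peak(x0, y0, x1, y1, x2, y2):
    """Vertex (x, y) of the parabola through three points with distinct x."""
    a = (x1 - x0) * (y1 - y2)
    b = (x1 - x2) * (y1 - y0)
    den = a - b
    if den == 0.0:
        # Collinear samples have no unique vertex; keep the middle sample.
        return x1, y1
    x = x1 - 0.5 * ((x1 - x0) * a - (x1 - x2) * b) / den
    # Evaluate the interpolating parabola at x (Lagrange form).
    y = (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
         + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
         + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
    return x, y


def refine_scale(t_below, s_below, t_mid, s_mid, t_above, s_above):
    """Estimate the continuous scale and score of a scale-space maximum.

    Assumption of this sketch: the parabola is fitted over log2(scale).
    """
    x, s = parabola_peak(math.log2(t_below), s_below,
                         math.log2(t_mid), s_mid,
                         math.log2(t_above), s_above)
    return 2.0 ** x, s


if __name__ == "__main__":
    # Hypothetical scores around a maximum found in octave c1 (t = 2),
    # with neighbors d0 (t = 1.5) and d1 (t = 3).
    t, s = refine_scale(1.5, 10.0, 2.0, 14.0, 3.0, 12.0)
    print(f"refined scale: {t:.3f}, refined score: {s:.3f}")
```

Because the neighboring layers are not equally spaced in scale, the sketch uses the general three-point vertex formula rather than the simpler equal-spacing variant.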
The long-distance pairs are used for this computation, based on the assumption that local gradients annihilate each other and are thus not necessary in the global gradient determination; this was also confirmed by experimenting with variation of the distance threshold $\delta_{min}$.
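To make the role of the long-distance pairs concrete, the sketch below averages pairwise gradient estimates over all sample-point pairs farther apart than a distance threshold and derives a direction from the result. It rests on assumptions that are not stated in the text above: each pair contributes the intensity difference times the displacement divided by the squared distance, the direction is taken with `atan2`, and the intensities are assumed to be already smoothed. The function name `pattern_direction` and the eight-point test pattern are hypothetical.

```python
import math


def pattern_direction(points, intensities, delta_min):
    """Average gradient over long-distance pairs and its direction.

    points      -- list of (x, y) sample positions around the keypoint
    intensities -- smoothed intensity at each sample position
    delta_min   -- pairs must be farther apart than this to be used
    Returns (angle_in_radians, (gx, gy)).
    """
    gx = gy = 0.0
    count = 0
    for i in range(len(points)):
        for j in range(i):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            d2 = dx * dx + dy * dy
            if d2 <= delta_min ** 2:
                continue  # short pair: excluded from the global gradient
            w = (intensities[j] - intensities[i]) / d2
            gx += dx * w
            gy += dy * w
            count += 1
    if count == 0:
        return 0.0, (0.0, 0.0)
    gx /= count
    gy /= count
    return math.atan2(gy, gx), (gx, gy)


if __name__ == "__main__":
    # Eight points on a unit circle with intensity increasing along +y.
    pts = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]
    angle, g = pattern_direction(pts, [p[1] for p in pts], delta_min=1.5)
    print(f"direction: {math.degrees(angle):.1f} deg")
```

Raising `delta_min` in this sketch simply removes more of the short pairs from the average, which is the kind of variation of the distance threshold referred to above.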
[Figure 5: bar plots for (a) Graffiti and (b) Wall over viewpoint change [deg], (c) Boat over scale change [−], and (d) Leuven over second image number [−].]

Figure 5. Repeatability scores for 50% overlap error of the BRISK and the SURF detector. The resulting similarity correspondences (approximately matched between the detectors) are given as numbers above the bars.

[...] equivalent repeatability as the SURF detector as long as the image transformations applied are not too large. Given the clear advantage in computational cost of the BRISK over the SURF detector, however, the proposed method constitutes a strong competitor, even if the performance at larger transformations appears to be slightly inferior.

4.2. Evaluation and Comparison of the Overall BRISK Algorithm

Since our work aims at providing an overall fast as well as robust detection, description and matching, we evaluate the joint performance of all these stages in BRISK and compare it to SIFT and SURF. Figure 6 shows the precision-recall curves using threshold-based similarity matching for a selection of image pairs of different datasets. Again, for this assessment we adapt the detection thresholds such that they output an approximately equal number of correspondences in the spirit of fairness. Note that the evaluation results here are different from the ones in [3], where all descriptors are extracted on the same regions (obtained with the Fast-Hessian detector).

[Figure 6: precision-recall plots, Recall [−] over 1−Precision [−]. Panel titles and correspondence counts: (c) Image Rotation of 60° on Wall 1; (d) Boat 1-4; (e) Bikes 1-4, SIFT(660), SURF(465), BRISK(476); (f) Trees 1-4, SIFT(2670), SURF(2714), BRISK(2712); (g) Leuven 1-4, SIFT(458), SURF(467), BRISK(467); (h) Ubc 1-4, SIFT(1555), SURF(1562), BRISK(1645).]

Figure 6. Evaluation results showing precision-recall curves (of all detection, extraction and matching stages jointly) for BRISK, SURF and SIFT. Results are shown for viewpoint changes (a and b), pure in-plane rotation (c), zoom and rotation (d), blur (e and f), brightness changes (g) and JPEG compression (h). The number of similarity correspondences is indicated in the figures per algorithm. The red dotted line in (f) shows the performance of BRISK descriptors extracted from SURF regions, yielding 2274 correspondences. Overall, BRISK exhibits competitive performance in all cases and even outperforms SIFT and SURF in some cases.

As illustrated in Figure 6, BRISK performs competitively with SIFT and SURF in all datasets and even outperforms the other two in some cases. The reduced performance of BRISK in the Trees dataset is attributed to the detector performance: while SURF detects 2606 and 2624 regions in the images, respectively, BRISK only detects 2004 regions in image 4 compared to 5949 found in image 1 to achieve approximately the same number of correspondences. The same holds for the other blur dataset, Bikes: saliency as assessed with FAST is inherently more sensitive to blur than blob-like detectors. We therefore also show the evaluation of the BRISK descriptors extracted from the SURF regions for the Trees dataset, demonstrating again that the descriptor performance is comparable to SURF.

Evidently, SIFT performs significantly worse in the Trees, Boat, and Ubc datasets, which can be explained with the limited detector repeatability in these cases. On the [...]
[Figure 7: precision-recall plots, Recall [−] over 1−Precision [−], for BRIEF64, SU-BRISK, S-BRISK and BRISK. Panels: (a) Wall 1-2; (b) Boat 1-2.]

Figure 7. Comparison of different BRISK versions to 64 byte BRIEF. BRIEF, as well as both SU-BRISK (single-scale, unrotated) and S-BRISK (single-scale) are extracted from AGAST keypoints detected in the original image. Notice that the BRISK pattern was scaled such that it matches the BRIEF patch size. The standard version of BRISK had to be extracted from our scale-invariant corner detection with adapted threshold to match the number of correspondences: they are 850 in the Wall pair and 1530 in the Boat pair.

                          SIFT      SURF      BRISK
Detection threshold       4.4       45700     67
Number of points          1851      1557      1051
Detection time [ms]       1611      107.9     17.20
Description time [ms]     9784      559.1     22.08
Total time [ms]           11395     667.0     39.28
Time per point [ms]       6.156     0.4284    0.03737

Table 1. Detection and extraction timings for the first image in the Graffiti sequence (size: 800 × 640 pixels).

                          SIFT      SURF      BRISK
Points in first image     1851      1557      1051
Points in second image    2347      1888      1385
Total time [ms]           291.6     194.6     29.92
Time per comparison [ns]  67.12     66.20     20.55

Table 2. Matching timings for the Graffiti image 1 and 3 setup.
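The per-comparison figures in Table 2 are consistent with dividing the total matching time by the product of the two point counts, i.e. with exhaustive matching. The sketch below reproduces that arithmetic and shows a plain brute-force Hamming matcher for binary descriptors. It is an illustration only: the function names are hypothetical, descriptors are assumed to be `bytes` objects of equal length, and this pure-Python loop is in no way representative of the SSE-based implementation whose timings are reported.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_brute_force(queries, train, max_dist):
    """For each query, keep its nearest train descriptor if within max_dist.

    Returns a list of (query_index, train_index, distance) tuples.
    """
    matches = []
    for qi, q in enumerate(queries):
        best_j, best_d = min(
            ((j, hamming(q, t)) for j, t in enumerate(train)),
            key=lambda pair: pair[1],
        )
        if best_d <= max_dist:
            matches.append((qi, best_j, best_d))
    return matches


def ns_per_comparison(total_ms, n_first, n_second):
    """Total matching time divided by the number of exhaustive comparisons."""
    return total_ms * 1e6 / (n_first * n_second)


if __name__ == "__main__":
    # Point counts and total times taken from Table 2.
    for name, ms, n1, n2 in [("SIFT", 291.6, 1851, 2347),
                             ("SURF", 194.6, 1557, 1888),
                             ("BRISK", 29.92, 1051, 1385)]:
        print(f"{name}: {ns_per_comparison(ms, n1, n2):.2f} ns per comparison")
```

The Hamming distance reduces to an XOR followed by a bit count, which is why binary descriptors lend themselves to the kind of instruction-level speed-ups mentioned in the paper.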
Amongst avenues for further research into BRISK, we aim to explore alternatives to the scale-space maxima search of saliency scores to yield higher repeatability whilst maintaining speed. Furthermore, we aim at analyzing both theoretically and experimentally the BRISK pattern and the configuration of comparisons, such that the information content and/or robustness of the descriptor is maximized.

6. Acknowledgements

This research was supported by the Autonomous Systems Lab, ETH Zurich and the EC's 7th Framework Programme (FP7/2001-2013) under grant agreement no. 231855 (sFly). We are grateful to Simon Lynen and Davide Scaramuzza for their valuable inputs, as well as to many other colleagues at ETH Zurich for very helpful discussions.

References

[1] M. Agrawal, K. Konolige, and M. R. Blas. CenSurE: Center surround extremas for realtime feature detection and matching. In Proceedings of the European Conference on Computer Vision (ECCV), 2008.
[2] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool. SURF: Speeded up robust features. Computer Vision and Image Understanding (CVIU), 110(3):346–359, 2008.
[3] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision (ECCV), 2006.
[4] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary Robust Independent Elementary Features. In Proceedings of the European Conference on Computer Vision (ECCV), 2010.
[5] M. Chli and A. J. Davison. Active Matching. In Proceedings of the European Conference on Computer Vision (ECCV), 2008.
[6] A. J. Davison, N. D. Molton, I. Reid, and O. Stasse. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 29(6):1052–1067, 2007.
[7] C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of 4th Alvey Vision Conference, pages 147–151, 1988.
[8] Y. Ke and R. Sukthankar. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. 2004.
[9] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), 60(2):91–110, 2004.
[10] E. Mair, G. D. Hager, D. Burschka, M. Suppa, and G. Hirzinger. Adaptive and generic corner detection based on the accelerated segment test. In Proceedings of the European Conference on Computer Vision (ECCV), 2010.
[11] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
[12] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2:1115–1125, 2005.
[13] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Gool. A comparison of affine region detectors. International Journal of Computer Vision (IJCV), 65(1):43–72, 2005.
[14] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In Proceedings of the European Conference on Computer Vision (ECCV), 2006.
[15] E. Tola, V. Lepetit, and P. Fua. Daisy: an Efficient Dense Descriptor Applied to Wide Baseline Stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 32(5):815–830, 2010.