LogoDet-3K: A Large-Scale Image Dataset For Logo Detection
shown in Fig. 1, our proposed LogoDet-3K dataset far exceeds the existing logo datasets both in the number of categories and the number of images. Fig. 2 gives some image samples from various categories of LogoDet-3K. In addition, imbalanced samples and very small logo objects make this dataset more challenging.

We further propose a strong baseline method Logo-Yolo based on the network architecture YOLOv3 for logo detection. Logo-Yolo takes characteristics of LogoDet-3K, such as various logo object sizes, sample imbalance and different background scenarios, into consideration, and incorporates Focal Loss [22] into the state-of-the-art detection framework YOLOv3 for logo detection. CIoU loss [23] is further adopted to obtain more accurate regression results. Finally, we conduct comprehensive experiments on LogoDet-3K using several state-of-the-art object detection models and our proposed method, as well as an ablation study and qualitative analysis.

This paper has three main contributions. (1) We introduce a new large-scale logo dataset, LogoDet-3K¹, with 3,000 classes, 194,261 objects and 158,652 images, which is the largest fully annotated logo detection dataset in terms of classes. (2) We propose a strong baseline method, Logo-Yolo, which adopts the YOLOv3 detection framework and combines Focal Loss and CIoU loss to achieve better detection performance on LogoDet-3K. (3) We perform extensive experiments on LogoDet-3K using several baseline models and our method, and further verify the effectiveness of our method and the better generalization ability of LogoDet-3K on logo detection and retrieval tasks.

¹ We will release the dataset upon publication.

The rest of this paper is organized as follows. Section II reviews related work. Section III describes the dataset construction process and statistics. Section IV elaborates the proposed large-scale logo detection method. Experimental results and analysis are reported in Section V. Finally, we conclude the paper and give future work in Section VI.

II. RELATED WORK

Our work is closely related to two research fields: (1) logo detection datasets and (2) logo detection research.

A. Logo Detection Datasets

Large-scale datasets are an important factor in supporting advanced object detection algorithms, especially in the deep learning era, and logo detection is no exception. The first benchmark for logo detection is the BelgaLogos dataset [16], which contains only 37 logo categories totaling 1,000 images. Over the years, some larger logo datasets such as FlickrLogos-32 [2] and Logos in the Wild [24] have been proposed. However, these datasets lack diversity and coverage in logo categories and images. For example, FlickrLogos-32 consists of only 32 logo categories with 70 images per category. This is far fewer than the millions of images required
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. X, NO. XX, MONTH YEAR 3
in deep learning. Some researchers constructed larger datasets, such as WebLogo-2M [17], LOGO-Net [25] and PL2K [18]. However, WebLogo-2M is collected from online search engines and is only automatically labeled at the image level with much noise, while PL2K and LOGO-Net are not publicly available.

In order to solve this problem, we propose LogoDet-3K, a large-scale, high-coverage and high-quality dataset with 3,000 logo categories, 158,652 images and 194,261 objects. Table I summarizes the statistics of existing logo datasets and LogoDet-3K. We can see that LogoDet-3K has more logo categories and logo objects, which makes it more helpful for exploring data-driven deep learning techniques for logo detection.

B. Logo Detection

In previous years, DPM [31] and HOG [25] were widely used as traditional object detection methods. Later, with the development of convolutional neural networks, more and more works started to utilize deep learning techniques, such as Faster RCNN [13], YOLO [15] and self-attention [32], for logo detection. In general, deep learning based object detectors can be divided into two types: two-stage detectors and single-stage detectors. The popular two-stage detectors are the R-CNN series, such as Faster RCNN [13], which introduced the region proposal network and individual blocks to improve detection performance. In contrast, the single-stage paradigm aims to be a faster and more efficient solution by classifying anchors directly and then refining them without a proposal generation network; examples include SSD [14], RetinaNet [22] and the YOLO series [15]. Recently, the anchor-free method CornerNet [33] has been highly acclaimed, while SNIPER [34] and Cascade R-CNN [35] were introduced to further improve performance.

In general, logo detection, as a kind of generic object detection, has advanced little. An important reason is that the development of logo detection technology is limited by the size of logo datasets. Early logo detection methods were built on hand-crafted visual features (e.g., SIFT and HOG [25]) and conventional classification models (e.g., SVM [3]). Recently, deep learning techniques have been applied to logo detection [36], [37], [4], [38]. For example, Oliveira et al. [39] adopted pre-trained CNN models and used them as part of a Fast Region-Based Convolutional Network recognition pipeline. Fehérvári et al. [18] combined metric learning with basic object detection networks to achieve few-shot logo detection. Compared with existing logo detectors, our proposed Logo-Yolo is more effective for large-scale logo categories and sample imbalance.

III. LOGODET-3K

A. Dataset Construction

The construction of LogoDet-3K comprises three steps, namely logo image collection, logo image filtering and logo object annotation. Each image is manually examined and reviewed after filtering and annotation to guarantee the quality of LogoDet-3K. The dataset building process is detailed in the following subsections. Additionally, each logo name is assigned to one of nine super-classes based on daily-life needs and the main positioning of common enterprises, namely Clothing, Food, Transportation, Electronics, Necessities, Leisure, Medicine, Sport and Others. Table II gives the statistics of the super-classes of the LogoDet-3K dataset.

Logo Image Collection. A large-scale logo detection dataset should include comprehensive categories. Before crawling logo images, we built a comprehensive logo list based on the ‘Forbes Global 2,000’² and other famous logo lists. Finally, we collected 3,000 logo names for our logo vocabulary, which covers nine super-classes.

Subsequently, we used each logo name from the logo vocabulary as the query to crawl logo images from the Google search engine. The top-500 retrieved results were kept for the logo

² https://www.forbes.com/global2000/list/tab:overall
Fig. 3: Multiple logo categories for some brands, where these logo categories are distinguished by adding the suffixes ‘-1’ and ‘-2’.
Fig. 5: Detailed statistics of LogoDet-3K: image and object distribution per category, the number of objects per image, and object size per image.
details of our dataset, we provide the statistics at the super-class and category level. Fig. 4 shows the distribution of images for each logo in LogoDet-3K. The thicker the columnar area in the histogram, the larger the proportion. From Fig. 4, we can see that an imbalanced distribution across different logo categories is one characteristic of LogoDet-3K, posing a challenge for effective logo detection with few samples.

In addition, Fig. 5 summarizes the distribution of images and categories in LogoDet-3K. Fig. 5 (A) shows the distribution of the number of images for each category. Fig. 5 (B) shows the distribution of the number of objects for each class. As we can see, there exists an imbalanced distribution across different logo objects and images for different logo categories. Fig. 5 (C) gives the number of objects in each image. We can see that most images contain one or two logo objects. As shown in Fig. 5 (D), LogoDet-3K is composed of 4.81% small instances (area < 32²), 29.79% medium instances (32² ≤ area ≤ 96²) and 65.40% large instances (area > 96²). The large percentage of small and medium logo objects (∼35%) creates another challenge for logo detection on this dataset, since small logos are harder to detect.

We also provide the statistics of logo categories, images and logo objects for the 9 super-classes in Fig. 6, which directly shows the differences in their numbers. The Food, Clothes and Necessities classes have more objects and images than the other classes.

IV. APPROACH

Taking the characteristics of LogoDet-3K into consideration, we propose a strong baseline Logo-Yolo for logo detection, which adopts the state-of-the-art deep detector YOLOv3 as the backbone to cope with small-scale and multi-scale logos. Since logo images contain fewer objects, more negative samples and hard samples are produced, so we utilize Focal Loss [22] to solve the problem of logo sample imbalance. In addition, we adopted K-means clustering to re-compute the pre-anchor sizes for LogoDet-3K to select the best anchor sizes, and introduced the recently proposed CIoU loss [23] to obtain more accurate regression results.

Improved Losses for Logo Detection. Fewer logo objects in an image produce more negative samples, leading to an imbalance between positive and negative samples. Focal Loss [22] was proposed to solve this sample-imbalance problem. Therefore, we incorporate the Focal Loss into the whole loss of Logo-Yolo; the classification loss is formulated as follows:

$$\text{Focal Loss} = \begin{cases} -\alpha\,(1-y')^{\beta}\log y', & y = 1 \\ -(1-\alpha)\,y'^{\beta}\log(1-y'), & y = 0 \end{cases} \qquad (1)$$

where $y \in \{0, 1\}$ is the ground-truth class and $y' \in [0, 1]$ is the model's estimated probability given by the activation function. Focal Loss introduces two factors $\alpha$ and $\beta$, where $\alpha$ is used to balance positive and negative samples, while $\beta$ focuses more on difficult samples.

In addition, the $L_n$-norm loss is widely adopted for bounding box regression, but it is not tailored to the evaluation metric (Intersection over Union, IoU) in existing methods. We further incorporate the CIoU loss [23] into the whole loss of YOLOv3 to solve the inconsistency between the metric and the border regression in logo detection; the IoU-based loss can be defined as

$$L_{CIoU} = 1 - IoU + R_{CIoU}(B_{pd}, B_{gt}) \qquad (2)$$

where $R_{CIoU}$ is the penalty term for the predicted box $B_{pd}$ and the target box $B_{gt}$.

CIoU loss considers three geometric factors in bounding box regression, namely overlap area, central point distance and aspect ratio, to solve the inconsistency between the metric and the border regression during logo detection. Therefore, the method minimizes the normalized distance between the central points of the two bounding boxes, and
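The classification loss of Eq. (1) can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation; α and β are as in the text (β plays the role of the focusing parameter usually written γ in [22]):

```python
import numpy as np

def focal_loss(y_true, y_prob, alpha=0.25, beta=2.0, eps=1e-7):
    """Binary focal loss of Eq. (1).

    y_true: ground-truth labels in {0, 1}
    y_prob: model probabilities y' in [0, 1]
    alpha balances positive/negative samples; beta down-weights easy samples.
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    pos = -alpha * (1.0 - y_prob) ** beta * np.log(y_prob)        # y = 1 branch
    neg = -(1.0 - alpha) * y_prob ** beta * np.log(1.0 - y_prob)  # y = 0 branch
    return np.where(y_true == 1, pos, neg)

# A confident, correct negative (y' = 0.01) contributes almost nothing,
# while a misclassified positive (y' = 0.1) keeps a large loss.
losses = focal_loss(np.array([1, 0]), np.array([0.1, 0.01]))
```

In practice this per-anchor loss would be summed over all anchors and combined with the regression loss below; the default α = 0.25, β = 2 follows the values commonly used in [22].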
TABLE III: Statistics of three benchmarks.

#Datasets        #Classes  #Images   #Objects  #Trainval  #Test
LogoDet-3K-1000  1,000     85,344    101,345   75,785     11,236
LogoDet-3K-2000  2,000     116,393   136,815   103,356    13,037
LogoDet-3K       3,000     158,652   194,261   142,142    16,510

TABLE IV: Statistics of three super-classes.

#Datasets    #Classes  #Images  #Objects  #Trainval  #Test
Food         932       53,350   64,276    47,321     6,029
Clothes      604       31,266   37,601    27,732     3,534
Necessities  432       24,822   30,643    22,017     2,805
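The pre-anchor design mentioned in Section IV (K-means over the ground-truth box sizes of LogoDet-3K) can be sketched as follows. This is an illustrative reconstruction with synthetic box data, not the authors' code; YOLOv3 conventionally clusters into 9 anchors with a 1 − IoU distance, while plain Euclidean K-means on (width, height) is shown here for brevity:

```python
import numpy as np

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (width, height) pairs to obtain k anchor sizes.

    boxes: (N, 2) array of ground-truth box widths and heights.
    Returns the k cluster centres sorted by area.
    """
    rng = np.random.default_rng(seed)
    centres = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to the nearest centre (Euclidean in (w, h) space).
        d = np.linalg.norm(boxes[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([boxes[labels == i].mean(axis=0) if np.any(labels == i)
                        else centres[i] for i in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres[np.argsort(centres.prod(axis=1))]

# Synthetic (w, h) pairs mimicking small, medium and large logo objects.
rng = np.random.default_rng(1)
boxes = np.vstack([rng.normal(loc, 3, size=(200, 2)) for loc in (20, 60, 120)])
anchors = kmeans_anchors(boxes, k=3)
```

Sorting by area matches how YOLO-style detectors assign the smallest anchors to the finest detection scale.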
Fig. 8: Qualitative result comparison on LogoDet-3K between YOLOv3 and Logo-Yolo. Green boxes: ground-truth boxes. Red boxes: correct detection boxes. Yellow boxes: incorrect detection boxes.
small logo objects and fewer objects for many images in real-world scenarios, and the one-stage method is more suitable for this case. Therefore, we use the one-stage YOLOv3 detector as the basis of our method.

We then compare the performance of Logo-Yolo with all baselines, and observe that Logo-Yolo achieves the best performance among these models. It is worth noting that the mAP of Logo-Yolo is 58.86%, 56.42% and 52.28% on the three benchmarks, and Logo-Yolo achieves performance gains of 3.65%, 4.10% and 3.67% over YOLOv3 in Table V. Logo-Yolo achieves the best detection results on the 1000-, 2000- and 3000-class datasets, which demonstrates the stability of the method.

Some detection results of Logo-Yolo are given in Fig. 7, including the regressed bounding box and the classification accuracy. The red box represents the prediction box and the green box is the ground-truth box. Clearly, Logo-Yolo can detect objects with occlusion, ambiguity and small size, and it obtains more accurate bounding box regression. As shown in Fig. 8, the detector YOLOv3 makes some detection mistakes, such as treating a person or a hamburger as a logo, and thus the bounding boxes of detected logos are inaccurate or missing. In contrast, our method obtains better performance both in bounding box regression and in the confidence of detected logos. In particular, our method has an advantage in small logo detection, such as the detected logos in the last two images in Fig. 8.

In addition, Table VI gives the comparison of the three super-classes across different methods. Compared with existing baselines, the Logo-Yolo detector also obtains better results, with 56.73%, 61.32% and 61.43% on the super-classes of Food, Clothes and Necessities, respectively, which are 3.24%, 4.31% and 3.75% higher than YOLOv3. This experiment also illustrates the effectiveness of our method. As we can see from Table VI, the Necessities super-class has 172 fewer categories than Clothes, but relatively similar detection results are obtained (61.32% vs 61.43%), indicating that the Necessities categories are more difficult to detect. Analyzing food logos with a large number of categories and images, the detection performance of the 932 food category
Fig. 9: The Precision-Recall curves of Logo-Yolo and YOLOv3. The larger the area under the curve, the better the detection performance.
Fig. 10: Left: Performance evaluation for different IoU thresholds. Right: The comparison of Logo-Yolo and YOLOv3 with increasing
iterations.
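The regression loss of Eq. (2), whose CIoU variant is ablated below, can be sketched for axis-aligned boxes in (x1, y1, x2, y2) form. The penalty term here follows the published CIoU formulation of [23] (normalized centre distance plus an aspect-ratio consistency term); treat this as an illustrative reconstruction rather than the exact training code:

```python
import math

def ciou_loss(bp, bg):
    """L_CIoU = 1 - IoU + R_CIoU(B_pd, B_gt) for (x1, y1, x2, y2) boxes."""
    # IoU term.
    iw = max(0.0, min(bp[2], bg[2]) - max(bp[0], bg[0]))
    ih = max(0.0, min(bp[3], bg[3]) - max(bp[1], bg[1]))
    inter = iw * ih
    wp, hp = bp[2] - bp[0], bp[3] - bp[1]
    wg, hg = bg[2] - bg[0], bg[3] - bg[1]
    iou = inter / (wp * hp + wg * hg - inter)

    # R_CIoU penalty: squared centre distance over the squared diagonal of
    # the smallest enclosing box ...
    cw = max(bp[2], bg[2]) - min(bp[0], bg[0])
    ch = max(bp[3], bg[3]) - min(bp[1], bg[1])
    rho2 = ((bp[0] + bp[2] - bg[0] - bg[2]) ** 2 +
            (bp[1] + bp[3] - bg[1] - bg[3]) ** 2) / 4.0
    diag2 = cw ** 2 + ch ** 2

    # ... plus an aspect-ratio consistency term alpha * v, as in [23].
    v = (4.0 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / (1.0 - iou + v + 1e-9)
    return 1.0 - iou + rho2 / diag2 + alpha * v
```

Identical boxes give zero loss, and the loss grows as the predicted box drifts away from the target, which is what makes the term usable as a direct regression objective.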
TABLE VII: Evaluation of individual modules and pairs of modules of Logo-Yolo (%).

Model                               mAP
YOLOv3                              48.61
YOLOv3 + Pre-anchors Design         50.12
YOLOv3 + Focal Loss                 49.21
YOLOv3 + CIoU loss                  49.86
Logo-Yolo (w/o Pre-anchors Design)  49.92
Logo-Yolo (w/o Focal Loss)          51.50
Logo-Yolo (w/o CIoU loss)           50.64
Logo-Yolo                           52.28

TABLE IX: The performance of Logo-Yolo on FlickrLogos-32 (%).

Method                   mAP
Bag of Words (BoW) [5]   54.50
Deep Logo [37]           74.40
BD-FRCN-M [39]           73.50
Faster RCNN [13]         70.20
YOLO [43]                68.70
YOLOv3 [15]              71.70
Logo-Yolo                74.62
Logo-Yolo (Pre-trained)  76.11
TABLE VIII: The performance of Logo-Yolo on Top-Logo-10 (%).

Method                   mAP
Faster RCNN [13]         41.80
SSD [14]                 38.70
YOLO [43]                44.58
YOLOv3 [15]              50.10
Logo-Yolo                52.17
Logo-Yolo (Pre-trained)  53.62

of models. Fig. 10 (Right) shows higher performance with increasing iterations. It can be seen that our method converges at about 400,000 iterations and keeps a higher accuracy than YOLOv3 throughout the training process.

D. Ablation Study

We conduct a comprehensive analysis of the effects of the three sub-variables and pairs of modules of Logo-Yolo. Table VII shows an ablation study of different combinations of K-means, Focal Loss and CIoU loss. Firstly, the three modules are added to YOLOv3 individually, and the results improve by 1.51%, 0.60% and 1.25%, which proves the effectiveness of the Pre-anchors Design, Focal Loss and CIoU loss, respectively. Then, we conduct the two-module experiments for Logo-Yolo. The result for Logo-Yolo is higher than for Logo-Yolo without Pre-anchors Design, which shows the effectiveness of the two losses. Similarly, compared to Logo-Yolo without Focal Loss or without CIoU loss, our proposed method achieves an improvement, which demonstrates the effectiveness of the other two modules of Logo-Yolo.

E. Generalization Ability on Logo Detection

To evaluate the robustness and generalization ability of the Logo-Yolo architecture and its pre-trained models, we explore two other datasets, Top-Logo-10 [27] and FlickrLogos-32 [2]. The former contains 10 unique logo classes with 70 images per logo class, and the latter is a popular logo dataset with full annotations, comprising 8,240 images from 32 categories. Logo-Yolo (Pre-trained) first loads the model trained on LogoDet-3K and is then trained on the target dataset, while Logo-Yolo is directly trained on the target dataset with random parameter initialization.

Table VIII summarizes the experimental results for Top-Logo-10. We observe that our method Logo-Yolo achieves better performance compared with other models. There is a further improvement of about 1.5 percent after pre-training on LogoDet-3K, showing the better generalization ability of LogoDet-3K. We can also see similar trends on FlickrLogos-32 in Table IX. Overall, the evaluation on these two datasets verifies the effectiveness of Logo-Yolo, and also shows the better generalization ability of LogoDet-3K on other logo detection datasets.

In addition, we further select the QMUL-OpenLogo dataset to evaluate general object detection. This dataset is the largest publicly available logo detection dataset, containing 352 categories and 27,038 images. To further exploit the fine-tuning capability of LogoDet-3K, we analyze the difference between LogoDet-3K pre-trained weights and QMUL-OpenLogo pre-trained weights.

According to Table X, our LogoDet-3K dataset shows strong generalization ability. Compared with the YOLOv3 and Logo-Yolo baselines, our models fine-tuned from LogoDet-3K for QMUL-OpenLogo detection can significantly boost performance, with gains of 1.73 points (53.69% vs 51.96%) for YOLOv3 and 2.16 points (55.37% vs 53.21%) for Logo-Yolo; Logo-Yolo gains a further 1.68 points (55.37% vs 53.69%). These results show the effectiveness of the pre-trained models and the Logo-Yolo method. By pre-training on the LogoDet-3K dataset with the 352 QMUL-OpenLogo categories removed (LogoDet-3K w/o QMUL-OpenLogo), we can still achieve competitive results, with 52.36% on the QMUL-OpenLogo benchmark, 0.4 points higher than YOLOv3 and 1.25 points higher for Logo-Yolo. This shows that the LogoDet-3K dataset has good generalization ability. Compared with QMUL-OpenLogo, our LogoDet-3K benchmark yields a much higher performance gain. By adding QMUL-OpenLogo pre-training before LogoDet-3K, we can slightly improve YOLOv3 by 0.34 points. For Logo-Yolo, QMUL-OpenLogo pre-training before LogoDet-3K can further bring a 0.73-point gain. The results show that LogoDet-3K contains richer logo features than the QMUL-OpenLogo dataset and can be widely used for logo detection.

F. Generalization Ability on Logo Retrieval

For the retrieval experiments, each of the ten FlickrLogos-32 training samples for each brand serves as a query sample. This allows us to assess the statistical significance of the results, similar to a 10-fold cross-validation strategy. As shown in Table XI, ResNet101+LitW [24] is the better logo retrieval method. Detected logos are described by the feature extraction network outputs, where three different state-of-the-art classification architectures, namely VGG16, ResNet101 and DenseNet161,
Fig. 11: Qualitative results of some failure cases of Logo-Yolo. Green boxes denote the ground truth. Red boxes represent correct logo detections, while yellow boxes are mistakes.
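The mAP figures reported throughout Section V rest on IoU-thresholded matching of detections to ground truth (commonly at IoU ≥ 0.5, as in PASCAL VOC [40]). A simplified single-class sketch of that matching and of average precision, not the paper's evaluation code, is:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def average_precision(dets, gts, thr=0.5):
    """dets: list of (score, box) for one class; gts: non-empty list of boxes.

    Greedy highest-score-first matching, then AP as a rectangle-rule
    approximation of the area under the precision-recall curve.
    """
    dets = sorted(dets, key=lambda d: -d[0])
    matched, flags = set(), []
    for _, box in dets:
        # Match to the best unmatched ground truth above the IoU threshold.
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            ov = iou(box, g)
            if i not in matched and ov >= best_iou:
                best, best_iou = i, ov
        flags.append(best is not None)
        if best is not None:
            matched.add(best)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    for is_tp in flags:
        tp, fp = tp + is_tp, fp + (not is_tp)
        recall = tp / len(gts)
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap
```

mAP is then the mean of this per-class AP over all 3,000 classes; full VOC/COCO evaluation additionally interpolates the precision envelope, which is omitted here for brevity.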
G. Discussion

Compared with existing methods, our proposed method obtains better detection performance, especially in handling small objects and complex backgrounds in logo images compared

REFERENCES

[1] Y. Gao, F. Wang, H. Luan, and T.-S. Chua, “Brand data gathering from live social media streams,” in International Conference on Multimedia Retrieval, 2014, pp. 169–176.
[2] S. Romberg, L. G. Pueyo, R. Lienhart, and R. van Zwol, “Scalable logo recognition in real-world images,” in ACM Conference on International Conference on Multimedia Retrieval, 2011, pp. 1–8.
[3] J. Revaud, M. Douze, and C. Schmid, “Correlation-based burstiness for logo retrieval,” in ACM International Conference on Multimedia, 2012, pp. 965–968.
[4] Y. Kalantidis, L. G. Pueyo, M. Trevisiol, R. van Zwol, and Y. Avrithis, “Scalable triangulation-based logo recognition,” in ACM International Conference on Multimedia Retrieval, 2011, pp. 1–7.
[5] S. Romberg and R. Lienhart, “Bundle min-hashing for logo recognition,” in ACM Conference on International Conference on Multimedia Retrieval, 2013, pp. 113–120.
[6] W.-Q. Yan, J. Wang, and M. Kankanhalli, “Automatic video logo detection and removal,” Multimedia Systems, pp. 379–391, 2005.
[7] Y. Bao, H. Li, X. Fan, R. Liu, and Q. Jia, “Region-based CNN for logo detection,” in Internet Multimedia Computing and Service, 2016, pp. 319–322.
[8] C. Eggert, D. Zecha, S. Brehm, and R. Lienhart, “Improving small object proposals for company logo detection,” in ACM International Conference on Multimedia Retrieval, 2017, pp. 167–174.
[9] L. Yang, P. Luo, C. C. Loy, and X. Tang, “A large-scale car dataset for fine-grained categorization and verification,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3973–3981.
[10] Y. Gao, Y. Zhen, H. Li, and T. Chua, “Filtering of brand-related microblogs using social-smooth multiview embedding,” IEEE Transactions on Multimedia, pp. 2115–2126, 2016.
[11] L. Liu, D. Dzyabura, and N. Mizik, “Visual listening in: Extracting brand image portrayed on social media,” in AAAI Conference on Artificial Intelligence, 2018, pp. 71–77.
[12] Z. Cheng, X. Wu, Y. Liu, and X. Hua, “Video ecommerce++: Toward large scale online video advertising,” IEEE Transactions on Multimedia, pp. 1170–1183, 2017.
[13] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” in Conference on Neural Information Processing Systems, 2015, pp. 91–99.
[14] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg, “SSD: single shot multibox detector,” in European Conference on Computer Vision, 2016, pp. 21–37.
[15] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[16] J. Neumann, H. Samet, and A. Soffer, “Integration of local and global shape analysis for logo classification,” Pattern Recognition Letters, pp. 1449–1457, 2002.
[17] H. Su, S. Gong, and X. Zhu, “WebLogo-2M: scalable logo detection by deep learning from the web,” in IEEE International Conference on Computer Vision Workshops, 2017, pp. 270–279.
[18] I. Fehérvári and S. Appalaraju, “Scalable logo recognition using proxies,” in IEEE Winter Conference on Applications of Computer Vision, 2019, pp. 715–725.
[19] J. Wang, W. Min, S. Hou, S. Ma, Y. Zheng, H. Wang, and S. Jiang, “Logo-2K+: a large-scale logo dataset for scalable logo classification,” in AAAI Conference on Artificial Intelligence, 2020, pp. 6194–6201.
[20] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li, “ImageNet: a large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[21] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: common objects in context,” in European Conference on Computer Vision, 2014, pp. 740–755.
[22] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in IEEE International Conference on Computer Vision, 2017, pp. 2999–3007.
[23] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, “Distance-IoU loss: Faster and better learning for bounding box regression,” in AAAI Conference on Artificial Intelligence, 2020, pp. 12993–13000.
[24] A. Tüzkö, C. Herrmann, D. Manger, and J. Beyerer, “Open set logo detection and retrieval,” in Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2018, pp. 284–292.
[25] S. C. Hoi, X. Wu, H. Liu, Y. Wu, H. Wang, H. Xue, and Q. Wu, “LOGO-Net: large-scale deep logo detection and brand recognition with deep region-based convolutional networks,” arXiv preprint arXiv:1511.02462, 2015.
[26] S. Bianco, M. Buzzelli, D. Mazzini, and R. Schettini, “Deep learning for logo recognition,” Neurocomputing, pp. 23–30, 2017.
[27] H. Su, X. Zhu, and S. Gong, “Deep learning logo detection with data expansion by synthesising context,” in IEEE Winter Conference on Applications of Computer Vision, 2017, pp. 530–539.
[28] Y. Liao, X. Lu, C. Zhang, Y. Wang, and Z. Tang, “Mutual enhancement for detection of multiple logos in sports videos,” in IEEE International Conference on Computer Vision, 2017, pp. 4856–4865.
[29] L. Xie, Q. Tian, W. Zhou, and B. Zhang, “Fast and accurate near-duplicate image search with affinity propagation on the ImageWeb,” Computer Vision and Image Understanding, pp. 31–41, 2014.
[30] H. Su, X. Zhu, and S. Gong, “Open logo detection challenge,” in British Machine Vision Conference, 2018, pp. 111–119.
[31] P. F. Felzenszwalb, R. B. Girshick, D. A. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1627–1645, 2010.
[32] P. Gao, K. Lu, J. Xue, L. Shao, and J. Lyu, “A coarse-to-fine facial landmark detection method based on self-attention mechanism,” IEEE Transactions on Multimedia, pp. 1–10, 2020.
[33] H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,” in European Conference on Computer Vision, 2018, pp. 765–781.
[34] B. Singh, M. Najibi, and L. S. Davis, “SNIPER: efficient multi-scale training,” in Conference on Neural Information Processing Systems, 2018, pp. 9333–9343.
[35] Z. Cai and N. Vasconcelos, “Cascade R-CNN: delving into high quality object detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6154–6162.
[36] S. Bianco, M. Buzzelli, D. Mazzini, and R. Schettini, “Logo recognition using CNN features,” in International Conference on Image Analysis and Processing, 2015, pp. 438–448.
[37] F. N. Iandola, A. Shen, P. Gao, and K. Keutzer, “DeepLogo: hitting logo recognition with the deep neural network hammer,” arXiv preprint arXiv:1510.02131, 2015.
[38] H. Su, S. Gong, and X. Zhu, “Scalable logo detection by self co-learning,” Pattern Recognition, p. 107003, 2020.
[39] G. Oliveira, X. Frazão, A. Pimentel, and B. Ribeiro, “Automatic graphic logo detection via fast region-based convolutional networks,” in International Joint Conference on Neural Networks, 2016, pp. 985–991.
[40] M. Everingham, L. V. Gool, C. K. I. Williams, J. M. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, pp. 303–338, 2010.
[41] T. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, “Feature pyramid networks for object detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 936–944.
[42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations, 2015, pp. 1–14.
[43] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[44] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6517–6525.