Zeni_Distilling_Knowledge_From_Refinement_in_Multiple_Instance_Detection_Networks_CVPRW_2020_paper

This document discusses advancements in Weakly Supervised Object Detection (WSOD) using Multiple Instance Detection Networks (MIDN) to improve object detection accuracy with minimal annotation. The authors propose a method called Boosted-OICR, which incorporates knowledge distillation and an adaptive supervision aggregation function to enhance the performance of existing models. Experimental results demonstrate significant improvements in detection metrics on the Pascal VOC 2007 dataset, making the proposed approach competitive with state-of-the-art methods.

Distilling Knowledge from Refinement in Multiple Instance Detection Networks

Luis Felipe Zeni Claudio R. Jung

[email protected] [email protected]

Institute of Informatics, Federal University of Rio Grande do Sul, Brazil

Abstract

Weakly supervised object detection (WSOD) aims to tackle the object detection problem using only labeled image categories as supervision. A common approach used in WSOD to deal with the lack of localization information is Multiple Instance Learning, and in recent years methods started adopting Multiple Instance Detection Networks (MIDN), which allow training in an end-to-end fashion. In general, these methods work by selecting the best instance from a pool of candidates and then aggregating other instances based on similarity. In this work, we claim that carefully selecting the aggregation criteria can considerably improve the accuracy of the learned detector. We start by proposing an additional refinement step to an existing approach (OICR), which we call refinement knowledge distillation. Then, we present an adaptive supervision aggregation function that dynamically changes the aggregation criteria for selecting boxes related to one of the ground-truth classes, background, or even ignored during the generation of each refinement module supervision. Experiments in Pascal VOC 2007 demonstrate that our knowledge distillation and smooth aggregation function significantly improve the performance of OICR in the weakly supervised object detection and weakly supervised object localization tasks. These improvements make the Boosted-OICR competitive again versus other state-of-the-art approaches.

1. Introduction

Supervised object detection has been achieving increasingly better results in terms of accuracy and speed over the past years [13, 11]. The main drawback of these methods is the need for annotated bounding boxes, which is a tedious, error-prone, time-consuming, and expensive task. The annotation cost directly impacts the viability of deployment of these detectors in real-world applications, particularly when starting from scratch for a specific application. One approach that researchers are exploring to alleviate the annotation cost is Weakly Supervised Object Detection (WSOD), where the object detector is trained using only image category annotations (presence or absence of interest classes in the image), which are much easier and faster to generate.

Most WSOD methods [2, 1, 19, 5, 18, 23] follow the Multiple Instance Learning (MIL) pipeline [6] to train detectors using only image category level annotations. In the adaptation of MIL to the WSOD task, each image is considered a bag of positive and negative object proposals generated by object proposal methods such as Selective Search [22] or Edge Boxes [27]. The training process in the MIL framework encompasses two steps: (i) to train an instance selector to compute the object score of each object proposal; (ii) to select the proposal with the highest score and use it to mine positive instances and train detector estimators. The majority of recent methods explore features extracted by Convolutional Neural Networks (CNN) as an off-the-shelf feature extractor [2, 10] or train an end-to-end Multiple Instance Detection Network (MIDN) [1].

The lack of localization supervision during the training process, as expected, makes the detection accuracy of WSOD methods worse than that of their supervised counterparts. However, the promise of a lower annotation cost attracted the efforts of many researchers to WSOD, and significant improvements were achieved in recent years exploring a variety of strategies [2, 19, 20, 18, 23].

In this paper, we focused on the instance mining step of MIL-based methods, and used a modification of an existing baseline approach as a proof-of-concept. More precisely, we propose improvements to boost the performance of OICR, which we call Boosted-OICR (BOICR). We first observed that it is possible to extract extra information from the refinement modules to boost the detection mAP of OICR, which we call refinement knowledge distillation. We also propose an adaptive supervision aggregation function that dynamically changes the IoU threshold to select boxes that will be aggregated as belonging to one of the ground-truth classes, background, or ignored during the generation of each refinement module supervision. The selection process follows the principle that at the beginning of the training it is better to aggregate boxes with small IoU (since the best instance is typically small and covers a small
portion of the object, such as the face for a person or cat). To avoid an overgrowth of the object-related proposals, the IoU threshold is tightened as the training phase advances. We also embedded an adapted version of the "trick" proposed in [20], which ignores boxes with small intersection in the refinement losses. We evaluate our method in Pascal VOC 2007, and our approach presents competitive state-of-the-art results both in detection mAP and CorLoc.

Our main contributions in this paper are the introduction of: i) a module to distill extra knowledge from refinement agents; and ii) an adaptive supervision aggregation function to mine candidate instances. Next, we present the state-of-the-art on WSOD, and then describe the proposed methodology with the experimental results and conclusions.

2. Related Work

There is a considerable number of WSOD works that precede the CNN era [14, 16, 17]. However, we focus on CNN-based methods, as all state-of-the-art methods rely on CNN architectures. The adoption of CNN features was not immediate, and initial works started combining the CNN features with features extracted by other kinds of feature descriptors. Cinbis et al. [2] proposed a multi-fold multiple instance learning training procedure, which splits the positive instances in K training folds. The method combines the Fisher Vector with CNN features as descriptors, and an objectness refinement is proposed to improve localization accuracy. Since a pre-trained CNN is only used as a feature extractor, its weights are not fine-tuned, which can lead to lower accuracy. Li et al. [10] introduced a two-stage adaptation algorithm. The first stage fine-tunes the network to collect class-specific object proposals with higher precision; the second uses confident object candidates to optimize the CNN representations to gradually turn image classifiers into object detectors. A drawback of the method is the need for individually forwarding each region proposal into the CNN to extract features, making the whole process very slow. This problem is solved in more recent methods using Spatial Pyramid Pooling (SPP) [9].

Bilen et al. [1] proposed a two-stream method, where one stream performs classification and the other detection. The output of both streams is combined into a global scoring matrix by taking the Hadamard product of the two streams. The classification scores are calculated by summing the values in the proposals dimension of this matrix. Tang et al. [19] improved the smoothed version of MIL proposed by [1] using an online instance classification refinement that utilizes cascaded refinement modules to increase the detection performance, where each refinement step gradually makes the detector able to detect larger object parts during training. In [18], the refinement process of [19] is further improved, adding proposal clusters to select one or more supervision boxes during the training. Selecting more than one supervision box is interesting because, usually, objects can have multiple parts and also have multiple instances present in the image. However, a limitation of the clustering process is that it increases the computational cost, making the whole training process slower. Our Boosted-OICR has a better mAP result than [18] without using the clustering process.

Diba et al. [5] proposed a three-stage cascaded method that mines boxes from Class Activation Maps (CAM). The first stage is inspired by [26], which uses a fully convolutional CNN with global average pooling (GAP) to create the CAMs in conjunction with the classification scores. The second stage uses the CAM from the first stage as supervision to generate a segmentation map that is used to select a set of candidate bounding boxes using the connective algorithm from [26]. Finally, the features of the candidate boxes are extracted by an SPP layer [9], and a MIL algorithm is applied to select the best candidate boxes for each class. In the same direction, Wei et al. [25] introduced a method that uses CAMs to mine tight object boxes by exploiting segmentation confidence maps. The segmentation confidence maps are employed to evaluate the objectness scores of proposals according to two properties (purity and completeness), and the detection process is based on [19]. Although the idea of using CAMs to guide the selection of the supervision boxes is interesting, the training process of [5, 25] is overly complex.

Wan et al. [24] proposed a min-entropy latent model to measure the randomness of object localization. The learning process operates with two network branches. The first branch is designated for discovering objects using a global min-entropy layer that defines the distribution of object probability. This discovery process aims at finding candidate object cliques, which are sets of proposals with high object confidence. The second branch is designated to localize objects using a local min-entropy layer and a softmax layer. The local min-entropy layer classifies the object candidates in a clique into pseudo objects and hard negatives by optimizing the local entropy.

Non-convexity is also a common problem in multiple instance learning, which might lead to sub-optimal results. Wan et al. [23] introduced a continuation optimization method that uses a series of smoothed loss functions to approximate the target (desired) loss, claiming that this smoothed process alleviates the non-convexity problem in MIL. The authors also propose a parametric strategy for instance subset partition, which is combined with a deep neural network to activate a full object extent. In contrast, Tang et al. [20] proposed a two-stage region proposal network that explores the responses in mid-layers of a network to create object proposals. The process creates coarse proposals using an objectness score metric and sliding window boxes. Later, the coarse proposals are refined us-
ing a region-based CNN classifier, and are then used to train the network proposed in [19].

In summary, existing WSOD approaches vary regarding the selection of candidate proposals, the strategy for mining instances, and the underlying classification network that guides the supervision, which leads to different levels of complexity for both implementation and training times. This paper focuses mostly on the instance selection part, and we used the continuation function proposed in [23] as inspiration to adaptively select positive and negative instances. We also present an additional step to the refinement supervision of [19]. The proposed method is presented next.

3. The Proposed Approach

Since we propose improvements to boost OICR's pipeline [19], we follow the notation of the original paper where possible. Fig. 1 shows a high-level diagram of all stages of the proposed architecture. The first stage aims to extract feature vectors from a given image, and candidate proposals are extracted using selective search [22]. The image and the extracted proposals feed a CNN backbone with SPP to produce a fixed-size feature map for each proposal. The proposal feature maps are converted to proposal feature vectors using two fully connected (fc) layers, which are branched into three different stages. The first two stages are similar to those of [19]: the first one trains a basic instance classifier, and the second trains a set of K refinement agents. The k-th refinement agent uses as supervision the output from the previous agent (k − 1), and the supervision for the first refinement agent (k = 1) comes from the instance classifier branch. The third stage, proposed by us, uses the knowledge of all K refinement agents to train a new agent. We call this process knowledge distillation, as it aims to extract extra knowledge during the refinement process.

Figure 1: The proposed architecture and its four modules. The proposals feature extraction module uses an SPP layer to extract features from proposals generated by selective search. The multiple instance detection network module learns to select the best proposal instance and generates an image classification score. The instance refinement modules have K instances, and each one learns to refine instances from its predecessor's result. Finally, the knowledge distillation module aggregates the knowledge learned by all K refinement agents.

In this section, we explain all the employed stages in detail. In Section 3.4, we also explain the adaptive supervision aggregation function that is employed by all refinement agents during the learning process.

3.1. Instance selection

Following [19], we use the method proposed by [1] because of its effectiveness and implementation convenience. The instance selection works by branching the proposal feature vectors into two streams, and each stream starts with an fc layer to produce two matrices $x^c, x^d \in \mathbb{R}^{C \times |R|}$, where C is the number of classes and |R| is the number of proposals. A softmax function is applied to both matrices along different dimensions, yielding

$$[\sigma(x^c)]_{ij} = \frac{e^{x^c_{ij}}}{\sum_{k=1}^{C} e^{x^c_{kj}}}, \qquad [\sigma(x^d)]_{ij} = \frac{e^{x^d_{ij}}}{\sum_{k=1}^{|R|} e^{x^d_{ik}}}. \tag{1}$$

The two streams are then combined to generate proposal scores using the Hadamard (element-wise) matrix product, yielding $x^R = \sigma(x^c) \odot \sigma(x^d)$. Finally, the classifica-
tion score $\phi_c \in (0, 1)$ for class c is obtained by summing over the proposal dimension, i.e., $\phi_c = \sum_{r=1}^{|R|} x^R_{cr}$. We train the instance classifier using the multi-class cross-entropy loss, defined as

$$L_{class} = -\sum_{c=1}^{C} \left[ y_c \log \phi_c + (1 - y_c) \log(1 - \phi_c) \right], \tag{2}$$

where $y_c \in \{0, 1\}$ indicates whether the image contains any instance of class c. More details can be found in [1, 19].

3.2. Classifier refinement agents

To refine the outputs of the instance classifier, we use the online labeling and refinement strategy proposed by [19]. Here we refer to the k-th refinement pass as the k-th refinement agent. In contrast with the instance classifier, each refinement agent outputs an additional dimension for background in its score vector $x^{Rk}_{j} \in \mathbb{R}^{(C+1) \times 1}$, $k \in \{1, 2, \ldots, K\}$, where k is the index of the agent, K is the total number of agents, and the (C + 1)-th dimension relates to the background. The score vector from the instance classifier is represented here as $x^{R0}_{j} \in \mathbb{R}^{C \times 1}$, and is used to initialize the refinements. To obtain $x^{Rk}_{j}$ for k > 0, the feature vector related to the proposals is passed through a single fc layer, and a softmax layer is applied over the class dimension.

Each agent needs some kind of supervision to learn how to separate the proposals related to the background from those related to ground-truth classes. Thus, the supervision for agent k is obtained from the previous agent $x^{R(k-1)}$, and a supervision label vector is created for each proposal j in the format $Y^{k}_{j} = [y^{k}_{1j}, y^{k}_{2j}, \cdots, y^{k}_{(C+1)j}]^{T} \in \mathbb{R}^{(C+1) \times 1}$. To build $Y^{k}_{j}$, first the proposal with the highest score is selected from the supervision of agent k − 1, as given in Eq. (3):

$$j_c^{k-1} = \arg\max_{r} x^{R(k-1)}_{cr}. \tag{3}$$

The highest score proposal is labeled as belonging to class c, i.e., $y^{k}_{c j_c^{k-1}} = 1$ and $y^{k}_{c' j_c^{k-1}} = 0$ for $c' \neq c$. Next, proposals with high overlap with $j_c^{k-1}$ are labeled as belonging to the same class as $j_c^{k-1}$; otherwise, the adjacent proposals are labeled as background. More precisely, this assignment is given by

$$c^{*k}_{j} = \begin{cases} c, & \text{if } \mathrm{IoU}(j_c^{k-1}, j^{k}_{cj}) \geq \lambda, \\ C + 1, & \text{otherwise}, \end{cases} \tag{4}$$

where λ is the IoU threshold. We claim in this work that selecting a fixed value for λ might not be the best choice, and present our dynamic threshold in Section 3.4. Each $y^{k}_{cj}$ is updated using $c^{*k}_{j}$, that is, $y^{k}_{c^{*k}_{j} j} = 1$. Meanwhile, if there is no object c in the image, all values are set to zero, i.e., $y^{k}_{cj} = 0$.

Figure 2: Effect of changing the IoU threshold λ for instance selection (panels: λ = 0.5, λ = 0.25, λ = 0.1, λ = 0.01). Green boxes denote the supervision, blue boxes pass the threshold (selected), and red boxes fail (not selected).

Now that $y^{k}_{cj}$ is ready, it can be used as supervision to train the k-th refinement agent using the loss function in Eq. (5):

$$L^{k}_{agent} = -\frac{1}{|R|} \sum_{r=1}^{|R|} \sum_{c=1}^{C+1} w^{k}_{r} \, y^{k}_{cr} \log x^{Rk}_{cr}, \tag{5}$$

where $w^{k}_{r}$ is a weight term introduced to reduce noise during the supervision and is obtained as $w^{k}_{r} = x^{R(k-1)}_{c j_c^{k-1}}$. More details can be found in [19].

3.3. Knowledge distillation module

The motivation behind cascading K refinement agents in [19] is that it allows the detector to gradually learn larger parts of objects, starting from the best instance only. However, we can observe that the supervision generated by the k-th agent will not be directly used by the (k + 2)-th agent. This happens because agent k + 1 will learn with the supervision k and will pass its own supervision to the next agent k + 2. In other words, during the agent supervision process, some knowledge could be lost between the connections of the agents. We try to recover this information loss using our knowledge distillation module. The distillation agent is a special kind of agent that learns using all the K outputs as supervision. In practice, this agent only differs in the supervision part when compared with a standard refinement agent.

The distillation agent also outputs a score vector in the format $x^{Dk}_{j} \in \mathbb{R}^{(C+1) \times 1}$. To obtain $x^{Dk}_{j}$, the proposals-related feature vector is passed through a single fc layer, and a softmax layer is applied over the class dimension.

The supervision process of the distillation agent, instead of getting the supervision from a previous agent, uses all
refinement agent outputs as supervision. More precisely, it is computed by averaging the outputs of the K refinement agents:

$$x^{D}_{cj} = \frac{1}{K} \sum_{k=1}^{K} x^{Rk}_{cj}. \tag{6}$$

Using $x^{D}_{cj}$ as the input to the supervision, the remaining process is similar to the one described in Section 3.2, and the loss function $L_{distill}$ is the same as the weighted softmax loss in Eq. (5).

3.4. Adaptive supervision aggregation function

In [19], the authors experimentally chose λ = 0.5 as the proposal selection scheme in Eq. (4) to create the supervision matrices $w^{k}_{r}$ and $y^{k}_{cr}$. The interpretation of this value is that only boxes with IoU > 0.5 w.r.t. the best overall proposal are selected as belonging to the ground-truth class c. The problem with using a fixed value is that at the beginning of training, the instance selection module tends to select only small boxes as top score proposals, typically related to discriminant features of the objects (e.g., the face of a person or animal, as shown in Fig. 2). As a consequence, only other small boxes will have IoU > 0.5 w.r.t. this box, and hence only small boxes will be considered as belonging to the class c. Figure 2 shows the effect of changing λ, where green denotes the best proposal, and blue the similar proposals according to the selected threshold.

Although the goal of refinement agents is to gradually improve the detectors to find larger parts of objects, starting with a larger value for λ causes each agent to highlight only small boxes in the beginning of the process, and in some cases, the optimization will be stuck in small boxes during all training (especially for deformable objects). Relaxing λ alleviates this issue, but it also tends to include proposals that are not related to the correct class.

Instead of using a fixed value for λ, we use an adaptive supervision function that changes λ during the training process. The function should be monotonically increasing, such that more candidates are aggregated in the beginning and fewer at the end. During our experiments, we evaluated a set of different adaptive supervision aggregation functions, and the best results were achieved using the following function, also explored by C-MIL in a different context [23]:

$$\lambda = \frac{1}{2} \cdot \frac{\log(s + l_b) - \log l_b}{\log(S + l_b) - \log l_b}, \tag{7}$$

where s is the current training step, S is the total number of training steps, and $l_b$ defines how fast the curve grows.

Figure 4: A visual interpretation of the proposed adaptive supervision aggregation function. X-axis shows the iteration step number, and Y-axis shows the IoU with the box of the highest score.

Another deficiency of the supervision selection approach given by Eq. (4) is that when more than one instance of a class is present in the image, it will necessarily include all other instances as background during the supervision (since their IoU with the best instance is small and, in general, null). This is a bad decision, as we do not want to lower the scoring of these instances. In Fig. 3, we present a visual example of this problem, considering the "chicken" class. In the figure, the rectangles are the candidate proposals, with the best one shown in green. Boxes shown in blue indicate proposals considered similar to the best one, according to Eq. (4), which leaves several proposals related to the chicken class (in yellow) marked as background, which is not desirable.

Figure 3: A visual example of instance mining for the "chicken" class, where the green box is the best instance. Boxes in blue present large IoU, boxes in red present small (but not zero) IoU, and for boxes in yellow the IoU is zero.

One solution to avoid penalizing other instances in the loss is to include the "trick" proposed by [20], where a threshold value $\lambda_{ign}$ is used to ignore boxes with a low IoU w.r.t. $j_c^{k-1}$ in the loss. With the trick, all the instances of Fig. 3 in yellow would be ignored, and the ones in red would be marked as background.

In contrast to [20], where $\lambda_{ign}$ has a fixed value, we propose to use an adaptive value similar to the scheme used for mining positive instances. Although the choice for $\lambda_{ign}$ could be independent from λ, we propose a "complementary" threshold selection scheme given by

$$\lambda_{ign} = \lambda_{max} - \lambda. \tag{8}$$
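The two thresholds of Eqs. (7) and (8) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the defaults $l_b = 100$ and $\lambda_{max} = 0.51$ are the values reported in Section 4.1, and the total of 50,000 steps in the demo is our reading of the VOC 2007 schedule (30K + 20K iterations).

```python
import math


def adaptive_lambda(step, total_steps, l_b=100.0):
    """IoU threshold of Eq. (7): 0 at step 0, 0.5 at the last step."""
    num = math.log(step + l_b) - math.log(l_b)
    den = math.log(total_steps + l_b) - math.log(l_b)
    return 0.5 * num / den


def adaptive_lambda_ign(step, total_steps, l_b=100.0, lambda_max=0.51):
    """Complementary ignore threshold of Eq. (8)."""
    return lambda_max - adaptive_lambda(step, total_steps, l_b)


if __name__ == "__main__":
    total = 50_000  # assumed total number of training steps (VOC 2007)
    for s in (0, 1_000, 10_000, 50_000):
        lam = adaptive_lambda(s, total)
        lam_ign = adaptive_lambda_ign(s, total)
        print(f"step={s:6d}  lambda={lam:.3f}  lambda_ign={lam_ign:.3f}")
```

Under these assumptions λ grows quickly during the first steps and then flattens toward 0.5, while $\lambda_{ign}$ shrinks from 0.51 toward 0.01, so fewer boxes are ignored as training advances.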
In Eq. (8), $\lambda_{max}$ defines the starting point of the adaptive trick. Fig. 4 presents the visual interpretation of λ and $\lambda_{ign}$ during the supervision process.

ID  K  λ         λign      Distillation  mAP
1   3  0.5       0         No            42.3
2   3  adaptive  0         No            41.6
3   3  adaptive  adaptive  No            46.6
4   3  adaptive  adaptive  Yes           49.7
5   4  adaptive  adaptive  No            48.1

Table 1: Ablation study performance (%) on the VOC 2007.

Thus, we can adapt Eq. (4) to include the trick, as defined in Eq. (9):

$$c^{*k}_{j} = \begin{cases} c, & \text{if } \mathrm{IoU}(j_c^{k-1}, j^{k}_{cj}) \geq \lambda, \\ C + 1, & \text{if } \mathrm{IoU}(j_c^{k-1}, j^{k}_{cj}) \geq \lambda_{ign}, \\ -1, & \text{otherwise}. \end{cases} \tag{9}$$
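The three-way assignment of Eq. (9) can be illustrated with a small, self-contained Python sketch. The helper names, the `(x1, y1, x2, y2)` box format with continuous coordinates, and the class indexing (classes 1..C, background C + 1) are assumptions made for illustration; they are not taken from the authors' code.

```python
IGNORE = -1  # label for proposals left out of the agent loss


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_labels(best_box, proposals, cls, num_classes, lam, lam_ign):
    """Label each proposal as class `cls`, background, or ignored (Eq. 9)."""
    background = num_classes + 1
    labels = []
    for box in proposals:
        overlap = iou(best_box, box)
        if overlap >= lam:
            labels.append(cls)          # aggregated with the best instance
        elif overlap >= lam_ign:
            labels.append(background)   # small but non-negligible overlap
        else:
            labels.append(IGNORE)       # possibly another instance: skip
    return labels


if __name__ == "__main__":
    best = (0, 0, 10, 10)
    props = [(0, 0, 10, 10), (0, 0, 10, 4), (20, 20, 30, 30)]
    print(assign_labels(best, props, cls=3, num_classes=20, lam=0.5, lam_ign=0.01))
```

With the end-of-training thresholds (λ = 0.5, $\lambda_{ign}$ = 0.01), the second proposal (IoU 0.4 with the best box) becomes background and the disjoint third proposal is ignored, which matches the behavior described for the red and yellow boxes of Fig. 3.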


Here, −1 marks the indices to be ignored in the agent loss functions.

3.5. Final loss function

The classification, refinement, and distillation modules present individual loss functions. However, we train our model using a single loss that combines the individual loss functions, given by

$$L = L_{class} + L_{distill} + \sum_{k=1}^{K} L^{k}_{agent}. \tag{10}$$

4. Experiments

Boosted-OICR was evaluated on the challenging PASCAL VOC 2007 and 2012 datasets [7]. Although the ground truth bounding box annotations are present in these datasets, we only use the (weak) classification annotations (presence or absence of a class in the given image). The performed evaluation is based on the two standard metrics in WSOD, that is, mean average precision (mAP) [7] and correct localization (CorLoc) [4]. The former provides a measure of how well the detector adapts to all instances, while the latter indicates if the best detection is a good match. Both metrics use the PASCAL criterion of IoU > 0.5 between ground truths and predicted boxes.

4.1. Implementation Details

All experiments were performed using PyTorch 1.2 [12] (see footnote 1). Our method uses VGG16 [15] pre-trained on ImageNet [3] as backbone. We replaced the last max-pooling layer by the SPP layer, and the last FC layer and softmax loss layer by the layers described in Section 3. The new layers are initialized using Gaussian distributions with zero mean and standard deviation 0.01. Biases are initialized to 0. The object proposals are extracted using Selective Search [22]. For data augmentation, the input images were resized into five scales {480, 576, 688, 864, 1200} concerning the smallest image dimension. During training time, the scale of the image was randomly selected, and the image was randomly horizontally flipped, which is a standard approach among WSOD methods [23, 19, 24, 18] and creates a total of ten augmented images. The learning process was done using the SGD algorithm with momentum 0.9, weight decay 5e-4, and batch size 2. We set $l_b$ = 100 and $\lambda_{max}$ = 0.51. The learning rate is set to 0.001 for the first 30K (VOC 2007) and 60K (VOC 2012) iterations, and then decreases to 0.0001 for the following 20K and 30K iterations, respectively. During test time, all ten images are passed through the network, and the outputs are averaged. As an additional result, we also trained a supervised object detector by choosing top-scoring proposals as ground truth labels, as done in [19, 18, 23]. To make a fair comparison, we also trained a Fast RCNN (FRCNN) [8] detection network using the five image scales. The supervision boxes are chosen by their score (larger than 0.3) and using non-maxima suppression (with 30% IoU threshold).

1 Source code available at: http://github.com/luiszeni/Boosted-OICR

4.2. Ablation experiments

We conduct some ablation experiments to illustrate the effectiveness of the proposed improvements over the baseline method OICR [19].

We first study the impact of using the adaptive supervision aggregation function instead of fixed IoU thresholds for proposal mining. We display the different scenarios in Table 1. The experiment with ID=1 presents the results using the standard OICR pipeline. In the experiment ID=2 we replace the fixed λ value by the proposed adaptive aggregation function defined in Eq. (7); in this experiment all boxes with IoU < λ are considered as background. As the experiment suggests, using the adaptive supervision aggregation function alone, without the adaptive trick, makes the results worse than the OICR baseline. However, adding the adaptive trick (experiment ID=3) leads to an improvement of 4.3% in the final mAP over the baseline, suggesting that using our adaptive supervision aggregation function can boost the OICR detection mAP significantly.

We also evaluated the effect of including the distillation refinement module. In fact, one could argue that using such a module could produce the same result as cascading one more refinement agent. To show the difference, we tested our method using K = 4 (and no distillation) vs. K = 3 with distillation, and results with distillation were considerably better (see experiments ID=4 vs. ID=5 in Table 1). As we can see, adding the knowledge distillation improves
Method aero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv mAP
Network: VGG16
WSDDN [1] 46.4 58.3 35.5 25.9 14.0 66.7 53.0 39.2 8.9 41.8 26.6 38.6 44.7 59.0 10.8 17.3 40.7 49.6 56.9 50.8 39.2
OICR [19] 58.0 62.4 31.1 19.4 13.0 65.1 62.2 28.4 24.8 44.7 30.6 25.3 37.8 65.5 15.7 24.1 41.7 46.9 64.3 62.6 42.0
WCCN [5] 49.5 60.6 38.6 29.2 16.2 70.8 56.9 42.5 10.9 44.1 29.9 42.2 47.9 64.1 13.8 23.5 45.9 54.1 60.8 54.5 42.8
TS2C [25] 59.3 57.5 43.7 27.3 13.5 63.9 61.7 59.9 24.1 46.9 36.7 45.6 39.9 62.6 10.3 23.6 41.7 52.4 58.7 56.6 44.3
WeakRPN [21] 57.9 70.5 37.8 5.7 21.0 66.1 69.2 59.4 3.4 57.1 57.3 35.2 64.2 68.6 32.8 28.6 50.8 49.5 41.1 30.0 45.3
PCL [18] 54.4 69.0 39.3 19.2 15.7 62.9 64.4 30.0 25.1 52.5 44.4 19.6 39.3 67.7 17.8 22.9 46.6 57.5 58.6 63 43.5
MELM [24] 55.6 66.9 34.2 29.1 16.4 68.8 68.1 43.0 25.0 65.6 45.3 53.2 49.6 68.6 2.0 25.4 52.5 56.8 62.1 57.1 47.3
C-MIL [23] 62.5 58.4 49.5 32.1 19.8 70.5 66.1 63.4 20.0 60.5 52.9 53.5 57.4 68.9 8.4 24.6 51.8 58.7 66.7 63.5 50.5
Ours 68.6 62.4 55.5 27.2 21.4 71.1 71.6 56.7 24.7 60.3 47.4 56.1 46.4 69.2 2.7 22.9 41.5 47.7 71.1 69.8 49.7
Network: FRCNN re-train
OICR [19] 65.5 67.2 47.2 21.6 22.1 68.0 68.5 35.9 5.7 63.1 49.5 30.3 64.7 66.1 13.0 25.6 50.0 57.1 60.2 59.0 47.0
TS2C [25] - - - - - - - - - - - - - - - - - - - - 48.0
PCL [18] 63.2 69.9 47.9 22.6 27.3 71.0 69.1 49.6 12.0 60.1 51.5 37.3 63.3 63.9 15.8 23.6 48.8 55.3 61.2 62.1 48.8
WeakRPN [21] 63.0 69.7 40.8 11.6 27.7 70.5 74.1 58.5 10.0 66.7 60.6 34.7 75.7 70.3 25.7 26.5 55.4 56.4 55.5 54.9 50.4
C-MIL [23] 61.8 60.9 56.2 28.9 18.9 68.2 69.6 71.4 18.5 64.3 57.2 66.9 65.9 65.7 13.8 22.9 54.1 61.9 68.2 66.1 53.1
Ours 65.8 58.6 55.0 32.4 19.5 74.2 71.4 70.9 19.2 54.8 46.2 67.5 57.0 65.6 1.4 16.7 40.4 53.0 69.5 61.1 50.0

Table 2: Detection performance (%) on the VOC 2007 test set. Comparison to the state-of-the-arts.

Method aero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv mAP
Network: VGG16
WSDDN [1] 65.1 58.8 58.5 33.1 39.8 68.3 60.2 59.6 34.8 64.5 30.5 43.0 56.8 82.4 25.5 41.6 61.5 55.9 65.9 63.7 53.5
OICR [19] 81.7 80.4 48.7 49.5 32.8 81.7 85.4 40.1 40.6 79.5 35.7 33.7 60.5 88.8 21.8 57.9 76.3 59.9 75.3 81.4 60.6
WCCN [5] 83.9 72.8 64.5 44.1 40.1 65.7 82.5 58.9 33.7 72.5 25.6 53.7 67.4 77.4 26.8 49.1 68.1 27.9 64.5 55.7 56.7
TS2C [25] 84.2 74.1 61.3 52.1 32.1 76.7 82.9 66.6 42.3 70.6 39.5 57.0 61.2 88.4 9.3 54.6 72.2 60.0 65.0 70.3 61.0
WeakRPN [21] 77.5 81.2 55.3 19.7 44.3 80.2 86.6 69.5 10.1 87.7 68.4 52.1 84.4 91.6 57.4 63.4 77.3 58.1 57.0 53.8 63.8
PCL [18] 79.6 85.5 62.2 47.9 37.0 83.8 83.4 43.0 38.3 80.1 50.6 30.9 57.8 90.8 27.0 58.2 75.3 68.5 75.7 78.9 62.7
MELM [24] - - - - - - - - - - - - - - - - - - - - 61.4
C-MIL [23] - - - - - - - - - - - - - - - - - - - - 65.0
Ours 86.7 73.3 72.4 55.3 46.9 83.2 87.5 64.5 44.6 76.7 46.4 70.9 67.0 88.0 9.6 56.4 69.1 52.4 79.8 82.8 65.7

Table 3: Localization performance (%) on the VOC 2007 trainval set. Comparison to the state-of-the-arts.
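
The localization values in Table 3 rely on the CorLoc criterion stated in Section 4: the best detection must overlap a ground-truth box with IoU > 0.5. A minimal per-class sketch is given below, assuming the common convention that CorLoc for a class is the fraction of images containing that class whose top-scoring box is a hit; the function names and the `(x1, y1, x2, y2)` box format are hypothetical and not taken from the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(top_boxes, gt_boxes_per_image, thresh=0.5):
    """Fraction of images whose top-scoring box matches any ground-truth box.

    top_boxes[i] is the highest-scoring detection of the class in image i;
    gt_boxes_per_image[i] lists the ground-truth boxes of that class in image i.
    Only images that contain the class are expected in the inputs.
    """
    hits = 0
    for top, gts in zip(top_boxes, gt_boxes_per_image):
        if any(iou(top, gt) > thresh for gt in gts):
            hits += 1
    return hits / len(top_boxes)


if __name__ == "__main__":
    tops = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [[(0, 0, 10, 8)], [(20, 20, 30, 30), (5, 5, 15, 15)]]
    print(corloc(tops, gts))
```

The per-class values would then be averaged over the 20 classes to obtain the final column of the table.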

the results by 1.6% mAP more than adding an extra refinement agent. We select the model used in experiment ID=4 as the default for the next experiments.

4.3. Comparison with state-of-the-art

We compare our results with other state-of-the-art (SOTA) methods in the Pascal VOC 2007 and 2012 datasets. Table 2 shows a comparison of detection performance of our method and SOTA in the Pascal VOC 2007 test set. It can be seen that Boosted-OICR improves the original OICR paper [19] by 7.7% mAP and outperforms other approaches such as WCCN [5] (6.9%), TS2C [25] (5.4%), WeakRPN [21] (4.4%), PCL [18] (6.2%), and MELM [24] (2.4%). Boosted-OICR was only inferior to C-MIL [23] by a small value (0.8% mAP). However, Boosted-OICR presented the highest AP results in 9 of the total 20 classes (aeroplane, bird, bottle, bus, car, dog, motorbike, train and tv). Figure 5 presents some results generated by our WSOD method. We also re-trained a Fast-RCNN detector using the learned pseudo objects as ground truth, and achieved 50.0% mAP, as shown in Table 2, which improved our method by 0.3%.

Table 3 presents a comparison in localization performance of our method and SOTA in the Pascal VOC 2007 trainval set. Boosted-OICR outperformed OICR [19] (5.1%), WCCN [5] (9.0%), TS2C [25] (4.7%), WeakRPN [21] (1.9%), PCL [18] (3.0%), MELM [24] (4.3%), and C-MIL [23] (0.7%). The better CorLoc result of our method in comparison with C-MIL suggests that C-MIL is just a little better at dealing with images with more than one instance (which impacts the final detection mAP). We also compare the localization performance of our method in Pascal VOC 2012 (see footnote 2) in Table 4. Boosted-OICR presents a competitive CorLoc in VOC 2012, outperforming OICR [19] (4.2%), TS2C [25] (1.9%), WeakRPN [21] (1.4%) and PCL [18] (3.1%), and being inferior to C-MIL [23] by 1.1% CorLoc.

Method         mAP    CorLoc
WCCN [5]       37.9   -
OICR [19]      37.9   62.1
TS2C [25]      40     64.4
WeakRPN [21]   40.8   64.9
PCL [18]       40.6   63.2
MELM [24]      42.4   -
C-MIL [23]     46.6   67.4
Ours           *      66.3

Table 4: Detection (test set) and localization (trainval set) performance (%) on the VOC 2012 dataset using VGG16.

2 We submitted our results for VOC 2012 to the evaluation server, but still did not get the feedback. The anonymous submission link is http://host.robots.ox.ac.uk:8080/anonymous/E7JSMD.html

5. Conclusions

In this paper, we propose two improvements to boost the online instance classifier refinement. First, we propose a knowledge distillation methodology that extracts extra knowledge from the refinement agents. Second, we propose an adaptive supervision aggregation function that improves the way that each refinement agent learns to separate
Figure 5: Detection examples for Pascal VOC 2007 dataset. Blue rectangles are ground-truth boxes that have at least one
detection with IoU > 0, and yellow ones are ground-truth with no detection intersection. Green boxes are correct detections
(IoU > 0.5 with ground truth), and red boxes are wrong detections. The label in each detection box is the class label and
confidence score of the detection.

class-related instances, background instances, and which instances to ignore. Both contributions were built using OICR as a baseline approach, and together they provide a 7.4 mAP boost over the OICR baseline method. Boosted-OICR presents competitive SOTA results on the Pascal VOC 2007 dataset, being inferior only to [23] by a small margin (0.8% mAP). Also, Boosted-OICR presents the highest AP in 9 of the 20 classes, such as airplane, bird, bottle, and train. Although Boosted-OICR has the best performance in these classes, it fails on deformable objects such as the person class. In fact, the person class is very challenging, since the GT annotations might contain only the face or upper body (when there are occlusions), or the whole body.

In the future, we intend to explore improvements that keep WSOD methods from focusing on the most discriminative part of deformable objects, such as the human face. We further plan to explore mid-layers of the network and class activation maps to create object proposals as an alternative to the selective search module.

Acknowledgments

The authors would like to thank the Brazilian funding agencies CNPq and CAPES (Finance Code 001), as well as NVIDIA Corporation for the donation of a Titan Xp Pascal GPU used for this research.
References

[1] Hakan Bilen and Andrea Vedaldi. Weakly Supervised Deep Detection Networks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-Decem:2846–2854, 2016.
[2] Ramazan Gokberk Cinbis, Jakob Verbeek, and Cordelia Schmid. Weakly supervised object localization with multi-fold multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1):189–203, 2016.
[3] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[4] Thomas Deselaers, Bogdan Alexe, and Vittorio Ferrari. Weakly supervised localization and learning with generic knowledge. International Journal of Computer Vision, 100(3):275–293, 2012.
[5] Ali Diba, Vivek Sharma, Ali Pazandeh, Hamed Pirsiavash, and Luc Van Gool. Weakly supervised cascaded convolutional networks. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017-Janua:5131–5139, 2017.
[6] Thomas G. Dietterich, Richard H. Lathrop, and Tomas Lozano-Perez. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71, 1997.
[7] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.
[8] Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.
[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916, 2015.
[10] Dong Li, Jia-Bin Huang, Yali Li, Shengjin Wang, and Ming-Hsuan Yang. Weakly Supervised Object Localization with Progressive Domain Adaptation. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3512–3520, 2016.
[11] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer, 2016.
[12] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
[13] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7263–7271, 2017.
[14] Olga Russakovsky, Yuanqing Lin, Kai Yu, and Li Fei-Fei. Object-centric spatial pooling for image classification. Computer Vision–ECCV 2012, pages 1–15, 2012.
[15] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[16] Parthipan Siva, Chris Russell, Tao Xiang, and Lourdes Agapito. Looking beyond the image: Unsupervised learning for object saliency and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3238–3245, 2013.
[17] Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, and Trevor Darrell. Weakly-supervised discovery of visual pattern configurations. In Advances in Neural Information Processing Systems, pages 1637–1645, 2014.
[18] Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, and Alan Loddon Yuille. PCL: Proposal cluster learning for weakly supervised object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[19] Peng Tang, Xinggang Wang, Xiang Bai, and Wenyu Liu. Multiple instance detection network with online instance classifier refinement. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017-January:3059–3067, 2017.
[20] Peng Tang, Xinggang Wang, Angtian Wang, Yongluan Yan, Wenyu Liu, Junzhou Huang, and Alan Yuille. Weakly supervised region proposal network and object detection. In Proceedings of the European Conference on Computer Vision (ECCV), pages 352–368, 2018.
[21] Peng Tang, Xinggang Wang, Angtian Wang, Yongluan Yan, Wenyu Liu, Junzhou Huang, and Alan Yuille. Weakly Supervised Region Proposal Network and Object Detection. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11215 LNCS:370–386, 2018.
[22] Jasper RR Uijlings, Koen EA Van De Sande, Theo Gevers, and Arnold WM Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171, 2013.
[23] Fang Wan, Chang Liu, Wei Ke, Xiangyang Ji, Jianbin Jiao, and Qixiang Ye. C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1:2199–2208, 2019.
[24] Fang Wan, Pengxu Wei, Zhenjun Han, Jianbin Jiao, and Qixiang Ye. Min-Entropy Latent Model for Weakly Supervised Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–1, 2019.
[25] Yunchao Wei, Zhiqiang Shen, Bowen Cheng, Honghui Shi, Jinjun Xiong, Jiashi Feng, and Thomas Huang. TS2C: Tight box mining with surrounding segmentation context for weakly supervised object detection. In Proceedings of the European Conference on Computer Vision (ECCV), pages 434–450, 2018.
[26] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, pages 2921–2929. IEEE, 2016.
[27] C Lawrence Zitnick and Piotr Dollár. Edge boxes: Locating object proposals from edges. In European Conference on Computer Vision, pages 391–405. Springer, 2014.
