Deep Learning Methods For Animal Counting in Camera Trap Images

The document discusses two new deep learning methods called FilterDetector and DLEDetector that were proposed to improve animal counting from camera trap images. The methods aim to monitor biodiversity and population density of animal species.


2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)

Deep Learning Methods for Animal Counting in Camera Trap Images

Yizhen Wang, Yang Zhang, Yuan Feng, and Yi Shang
Dept. of Electrical Engineering and Computer Science
University of Missouri
Columbia, Missouri, United States
{ywf3m, yangzhang, yfzc8, shangy}@missouri.edu

2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI) | 979-8-3503-9744-4/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICTAI56018.2022.00143

Abstract: Camera traps are widely used to monitor the biodiversity and population density of animal species. Camera trap images are usually taken in bursts, and the animal counting problem for a sequence of camera trap images is also an important part of evaluating animal population density. In this paper, two new animal counting methods based on Microsoft MegaDetector V4 have been proposed. FilterDetector uses different filters with bounding box ensemble algorithms to achieve more accurate bounding box detection. DLEDetector is an ensemble method that uses two base deep learning models to correct and enhance the detection result of MegaDetector. Our experimental results on the iWildCam 2022 competition test dataset show that both methods outperformed the best method in iWildCam 2021 and the baseline method based on MegaDetector V4 in the iWildCam 2022 competition by 9.09% and 6.44%, respectively, and ranked first and third in the competition.

Keywords: camera traps, animal counting, bounding box ensemble, machine learning, deep learning

I. INTRODUCTION

Camera traps are widely used by biologists and ethologists to monitor biodiversity and population density of animal species [1] [2]. The cameras can automatically collect large quantities of images. Camera traps are placed in an area of interest with a motion trigger and, when motion is detected, the camera takes a sequence (burst) of photos. After a large number of images are collected, identifying animal species and counts manually is very time-consuming and labor-intensive. Therefore, researchers are working on developing machine learning methods to automate the process of animal detection, species classification, and localization of animals in camera trap images [3] [4].

iWildCam is an annual competition dedicated to this field, launched in 2018 as part of the Fine-Grained Visual Classification (FGVC) workshop at the Conference on Computer Vision and Pattern Recognition (CVPR) [5]. The target questions for iWildCam vary each year. iWildCam 2021 focused on animal counting and classification. It turned out that accurately counting and classifying animals in image bursts simultaneously is difficult. There are many animal image issues that make the problem challenging, such as poor illumination, motion blur, occlusion, and camouflage. Therefore, iWildCam 2022 re-focused on the counting problem: counting individual animals across sequences.

In this paper, two new methods, FilterDetector and DLEDetector, are proposed to improve animal counting based on the detection results of Microsoft MegaDetector V4. The camera trap images were divided into two groups based on the density of animals in the images. FilterDetector applies an efficient filtering method to the detection results of MegaDetector and combines it with Non-Maximum Suppression (NMS) [6] to achieve high precision bounding box detection. The NMS algorithm used in FilterDetector utilizes both the Intersection over Union (IoU) and the confidence score of the bounding box to better remove false positives. DLEDetector is an ensemble method that uses two deep learning models to correct and enhance the detection results of MegaDetector. The Weighted Box Fusion (WBF) [7] algorithm is used to fuse the detection results from two detectors. Furthermore, a binary classification model is trained to distinguish between animals and backgrounds to eliminate false positives. Our experimental results on the iWildCam 2022 competition show that both FilterDetector and DLEDetector outperformed the best method in iWildCam 2021 and the baseline. They ranked 1st and 3rd in the iWildCam 2022 competition. Overall, the contributions of this work include:

1. An animal density analysis method that can effectively improve camera trap image animal counting performance.
2. FilterDetector, a new NMS based method that generates improved bounding box detections for camera trap images based on the detection results of MegaDetector V4.
3. DLEDetector, a new deep learning based ensemble method that uses two base deep learning models to correct and enhance the detection results of MegaDetector V4.

The rest of the paper is organized as follows. Section II provides a review of related work. Section III details the problem formulation. Section IV describes the methods we propose. Section V presents our experimental results. Finally, Section VI draws conclusions.

II. RELATED WORK

Deep learning techniques have been used to process camera trap imagery in recent years. Although the first mentioned deep learning method used for ecological identification was in 2000
[8], most deep learning methods for this problem were published since 2015. At an earlier stage, most deep learning approaches focused on entire camera trap images. However, the backgrounds of camera trap images are complex and can affect classification results on whole images. As a result, researchers began to use object detection models to first localize an animal within an image, and then classify the animals.

MegaDetector is a detection model trained and released by Microsoft. It is designed to help conservation biologists improve the efficiency of reviewing camera trap images [9]. The most commonly used stable version, MegaDetector V4, uses Faster R-CNN [10] as the backbone architecture.

ResNet [11] was one of the most successful classification models. The core idea of ResNet, the "identity shortcut connection", successfully addressed the vanishing gradient problem and made it possible to train very deep neural networks. Faster R-CNN is a two-stage detector with a one-scale Region Proposal Network (RPN) or a multi-scale proposal generation network such as a Feature Pyramid Network (FPN). Mask R-CNN [12] is built on Faster R-CNN. Compared to Faster R-CNN, which has 2 outputs for each candidate object, Mask R-CNN adds a third branch that outputs the object mask.

Non-Maximum Suppression (NMS) and Weighted Box Fusion (WBF) are the two most common methods for removing redundant bounding boxes. The purpose of NMS is to exclude some bboxes based on IoU, while the goal of WBF is to fuse the information of all predicted bboxes based on the confidence score of each bounding box.

After a review of iWildCam 2021 submissions, we found that most of the top teams used ensemble methods. Different classifiers were trained by using both full images and cropped bboxes. In addition, some teams also tried methods such as balanced group softmax to address the class imbalance problem. Based on the iWildCam 2021 top solutions, we chose to focus on effectively removing false positives and ensembling multiple deep learning models to synergistically improve the overall detection performance.

III. PROBLEM FORMULATION

A. Animal Counting and Classification for a Sequence of Images (iWildCam 2021)

The goal of iWildCam 2021 is to categorize animals into species and count the number of individuals of each species across image bursts, a sequence of images captured in quick succession. The evaluation metric is Mean Columnwise Root Mean Squared Error (MCRMSE), as shown below,

\[
\mathrm{MCRMSE} = \frac{1}{m} \sum_{j=1}^{m} \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{ij} - y_{ij})^2}
\]

where $m$ is the number of animal species, $n$ is the number of sequences, each column $j$ represents a species, each row $i$ represents a sequence, $x_{ij}$ is the predicted count for that species in that sequence, and $y_{ij}$ is the ground truth count.

B. Animal Counting for a Sequence of Images (iWildCam 2022)

iWildCam 2022 focuses entirely on animal counting, regardless of animal species. The goal is to count individual animals across image bursts. The evaluation metric is Mean Absolute Error (MAE), as shown below,

\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i|
\]

where each $x_i$ represents the predicted count of animals in an image sequence $i$, $y_i$ represents the ground truth count for the sequence, and $n$ is the number of sequences in the test set.

C. iWildCam dataset

In this paper, we use the iWildCam datasets. There is a small difference between the dataset of iWildCam 2021 and that of iWildCam 2022. The iWildCam 2021 Wildlife Conservation Society (WCS) training set contains 203,314 images from 36,547 sequences, and the WCS test set contains 60,214 images from 11,057 sequences. The iWildCam 2022 WCS training set contains 201,399 images from 36,292 sequences, and the WCS test set contains 60,029 images from 11,028 sequences. The category information provided by the iWildCam 2021 dataset and the iWildCam 2022 dataset is the same, with a total of 205 classes, including 204 animal classes and an "empty" class.

Figure 1 shows the distribution of the number of images in each sequence. From the figure, we can see that the number of pictures contained in a sequence varies from 1 to 10, and the most common are three shots and ten shots. According to our statistics, the shortest interval between two adjacent pictures in the same sequence is less than 1 second, and the longest can be 80 seconds. Figure 2 shows some sample images of the WCS dataset. From the sample images we can see that, due to differences in shooting time, location, and equipment, the images of the WCS dataset vary greatly in pattern, resolution, perspective, etc.

Figure 1. Distribution of image number in each sequence of iWildCam 2022 training dataset (a), test dataset (b).

The iWildCam training dataset does not contain the ground truth for animal detection and animal counting of the training images. It only provides MegaDetector V4's detection results as a reference and the DeepMAC [20] segmentation results for each bounding box detected by MegaDetector.
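
The two competition metrics defined in Section III, MCRMSE (iWildCam 2021) and MAE (iWildCam 2022), can be illustrated with a minimal sketch. The function names `mcrmse` and `mae` and the nested-list input layout are assumptions made for illustration; they are not taken from the competition's scoring code.

```python
import math


def mcrmse(pred, truth):
    """Mean Columnwise RMSE.

    pred and truth are n x m nested lists: one row per sequence,
    one column per species (layout assumed for this sketch).
    """
    n, m = len(pred), len(pred[0])
    total = 0.0
    for j in range(m):
        # Squared error of species j summed over all sequences.
        sq = sum((pred[i][j] - truth[i][j]) ** 2 for i in range(n))
        total += math.sqrt(sq / n)
    return total / m


def mae(pred, truth):
    """Mean Absolute Error over per-sequence animal counts."""
    return sum(abs(x - y) for x, y in zip(pred, truth)) / len(pred)


if __name__ == "__main__":
    # Two sequences, two species: only species 0 in sequence 1 is wrong.
    print(mcrmse([[1, 0], [2, 1]], [[1, 0], [0, 1]]))
    # Three sequences, species ignored.
    print(mae([3, 0, 2], [2, 0, 4]))
```

MCRMSE averages one RMSE per species, so an error on a rare species weighs as much as an error on a common one. MAE treats every sequence equally and ignores species altogether.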

Figure 2. Sample images from the datasets. (a) to (d) are images with less color elements. Images on the left side contain a single animal. Images on the right side contain multiple animals.

IV. METHODS

A. Density Analysis

Since there is no ground truth, we analyzed the dataset based on the results of MegaDetector. We divided the images into two groups according to the number of animals detected by MegaDetector. For an image, if the number of detected animals is 8 or more, we consider the image to be a high-density image. Otherwise, it is treated as a low-density image. Figure 3 shows the distribution of animals detected by MegaDetector V4 in each image. In both the training dataset and the test dataset, MegaDetector did not detect animals in more than 40% of the images. In most of the pictures where animals were detected, MegaDetector detected only one animal. In the training dataset, 5,127 images with more than 10 animals were detected, accounting for 2.5%, while in the test dataset, this number was 368, accounting for 0.6%.

Figure 3. Distribution of detected animals by MegaDetector V4 in each image of iWildCam 2022 training dataset (a), test dataset (b).

B. Proposal-Density-Based filtering - FilterDetector

Based on our observation that MegaDetector behaves differently on images with low or high density of animals, we design a Proposal-Density-Based filtering method which applies different filtering principles to images with low or high density of animals. Algorithm 1 provides the whole process.

Algorithm 1 Proposal-Density-Based Filtering Analysis
Input:  P = {Collection of Proposals predicted by MegaDetector for one image in confidence-decreasing order}
        C = {Collection of Confidence of Proposals in P}
Output: F = {Collection of Proposals after filtered by Proposal-Density-Based Filtering}
F = {}
if Num(P) >= 8:
    F = NMS(P, IoU = 0.2)            # Non-Maximum Suppression
else:
    for i = 1, 2, ..., n do
        ndup(P_i) = 0
        for d in P / {P_i}:
            if IoU(P_i, d) > 0.5:    # Intersection over Union
                ndup(P_i) += 1
        end for
        if C_i > θ(ndup(P_i)):       # Adaptive confidence threshold
            Insert P_i into F
    end for

1) High-density images filtering

For high-density images, to count more animals we set the confidence threshold to 0.0. To remove duplicates we set a strict IoU threshold of 0.2 for the NMS method. The IoU is the Intersection over Union, which can be calculated with the formula below.

\[
\mathrm{IoU}(P_i, P_j) = \frac{\text{area of intersection of } P_i \text{ and } P_j}{\text{area of union of } P_i \text{ and } P_j}
\]

Figure 4 shows that this setting of parameters helps us to detect more animals than the default settings (confidence threshold = 0.95) and most of the predictions are correct. This method improves the public score to 0.253.

Figure 4. Result comparison of Proposal-Density-Based filtering (b) and filtered by confidence 0.95 (a) on high-density images.

2) Low-density images filtering

The filtering method on low-density images is more complicated. It is not a good idea to rely on an IoU threshold to filter out False Positive proposals because there are not so many overlapping bboxes. Our method is based on the assumption that proposals with overlapping duplications are more likely to be TP (True Positive) than those without any overlapping duplications, and that more duplications lead to a higher possibility of TP. We set 0.5 as the IoU threshold for proposal duplications. After a few experiments, we found that this adaptive confidence threshold performs better than a fixed threshold (0.95). The adaptive confidence threshold is computed as shown below.

\[
\theta(n_{dup}(P_i)) = \max(0.95 - \alpha \cdot n_{dup}(P_i),\ 0)
\]

In this formula, $P$ is the collection of all proposals from MegaDetector and $P_i$ is the $i$th proposal of $P$. $\alpha$ is the coefficient of confidence reduction and $n_{dup}$ is the number of duplications of $P_i$. If $P_i$ is selected as a final prediction, all of its duplicated proposals are removed from $P$. The results show that when $\alpha = 0.2$, we get the best public score, which is 0.247. Figure 5 is an example of our adaptive filtering score method.
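
The density split, Algorithm 1, and the adaptive threshold can be sketched together as follows. This is a minimal sketch, not the authors' implementation. The `(x1, y1, x2, y2)` box format, the function names, and the greedy NMS variant are assumptions. Counting duplicates once on the full proposal list, before any removal, is also an assumption, since the text does not say when the counts are taken.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(proposals, iou_thr):
    """Greedy NMS over (box, confidence) pairs, highest confidence first."""
    kept = []
    for box, conf in sorted(proposals, key=lambda p: p[1], reverse=True):
        if all(iou(box, kept_box) <= iou_thr for kept_box, _ in kept):
            kept.append((box, conf))
    return kept


def adaptive_threshold(n_dup, alpha=0.2, base=0.95):
    """theta(n_dup) = max(base - alpha * n_dup, 0)."""
    return max(base - alpha * n_dup, 0.0)


def filter_proposals(proposals, density_split=8, nms_iou=0.2, dup_iou=0.5, alpha=0.2):
    """Filter the (box, confidence) proposals of one image."""
    if len(proposals) >= density_split:
        # High density: no confidence cut, strict NMS removes duplicates.
        return nms(proposals, nms_iou)
    ordered = sorted(proposals, key=lambda p: p[1], reverse=True)
    # Low density: list the overlapping duplicates of every proposal.
    dups = [
        [j for j, (other, _) in enumerate(ordered) if j != i and iou(box, other) > dup_iou]
        for i, (box, _) in enumerate(ordered)
    ]
    kept, removed = [], set()
    for i, (box, conf) in enumerate(ordered):
        if i in removed:
            continue
        if conf > adaptive_threshold(len(dups[i]), alpha):
            kept.append((box, conf))
            removed.update(dups[i])  # duplicates of a kept proposal are dropped
    return kept


if __name__ == "__main__":
    low = [((0, 0, 10, 10), 0.80), ((1, 1, 11, 11), 0.60), ((50, 50, 60, 60), 0.90)]
    print(filter_proposals(low))
```

With $\alpha = 0.2$ the threshold is 0.95 for an isolated proposal, 0.75 with one duplicate, 0.55 with two, and 0 from five duplicates onward. In the example, the isolated 0.90 box is rejected while the 0.80 box is kept because a second proposal overlaps it.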
Figure 5. Result comparison of Proposal-Density-Based filtering (b) and filtered by confidence 0.95 (a) on low-density images.

Figure 6. Successful case of detection result comparison between MegaDetector (a) and Mask R-CNN (b).

Figure 7. Failed case of detection result comparison between MegaDetector (a) and Mask R-CNN (b).
 
C. Deep Learning Based Ensemble Method - DLEDetector

Our second method, DLEDetector, tries to strengthen the detection results through the combination of multiple neural networks. For high-density images, we implemented and trained two support models to assist the detection of MegaDetector.

1) Detection model for high-density images: In order to improve the performance of the detection on high-density images, we trained a detection model based on Mask R-CNN using Detectron2 [13].

First, we take the high-density images from the entire training dataset as the training set for the detection model. For an image, let the number of bboxes detected by MegaDetector with confidence score larger than 0.95 be $n$, and let the pixel coverage of these bboxes be $R_{covered}$. If $n$ is not less than 5 and $R_{covered}$ is not less than 0.1, we add the image to the training dataset. The size of the final training dataset is 2596.

During training, we use the bboxes detected by MegaDetector with a confidence score not less than 0.5 as the labels. A Mask R-CNN model is trained with Detectron2, using the pretrained weights of Mask R-CNN R50-FPN as a starting point, with learning rate 0.00025 and maximum number of iterations 50000. Figure 6 shows a success case in which our detector successfully detected an occluded bird, and Figure 7 is a failed case, where some scattered birds were not successfully detected. Later, after applying WBF algorithm

confidence score not less than 0.5. Next, we perform data cleaning on the cropped images to remove images that are too small, almost pure black, or severely imbalanced in aspect ratio. After cleaning, 188962 cropped images are marked as animal and used for binary classification model training together with 200000 background images. We trained a ResNet50 with learning rate 0.0001 and epoch number 50.

3) Bounding Box Ensemble with WBF: In this step, we apply the WBF algorithm to fuse the detection results of MegaDetector and our model. The IoU threshold used by the WBF algorithm is 0.5, the confidence threshold is 0.5, and the fusion weight of the two models is 1:1.

4) Sequential Analysis: In the sequential analysis step, for low-density images, if the time interval between two adjacent images is larger than 20 s and the number of animals detected is different, we assume that the animals in the two images are from different groups, and later we add up the animals in each group as the final counting result. For high-density images, we consider that the animals in the images are always from the same group. Algorithm 2 shows the pseudocode of our sequential analysis algorithm.

Algorithm 2 Sequential Analysis
D: {D_i: Animals detected for ith image in the sequence.}
T: {T_i: Taken time (s) of ith image in the sequence.}
A ← 0
DOJRULWKP to WR fuse
IXVH the
WKH 0� '
M D,
GHWHFWLRQUHVXOWVIURPWZRGHWHFWRUVWKHSUREOHPVROYHG
detection results from two detectors, the problem solved. IRUL
for 2, . . . , n do
i � «QGR
 Binary
2) %LQDU\&ODVVLILFDWLRQ0RGHO,QRUGHUWRHOLPLQDWHIDOVH
Classification Model: In order to eliminate false LI7if T,L±7
- T,_,L!
>� DQG'
20 and D,L!� 'L
D,_,:
SRVLWLYHV generated
positives JHQHUDWHG by E\ the
WKH detector,
GHWHFWRU we ZH train
WUDLQ aD binary
ELQDU\ $ A � $0
A+ M
0 M � ' D,L
FODVVLILFDWLRQ model
classification PRGHO to WR distinguish
GLVWLQJXLVK between
EHWZHHQ animals
DQLPDOV andDQG HOVH
else:
EDFNJURXQGV By
backgrounds. %\ going
JRLQJ through
WKURXJK theWKH test
WHVW GDWDVHW ZLWK the
dataset with WKH LI'
ifD,L!0
> M:
FODVVLILFDWLRQPRGHOXVHGIRUDQLPDOFODVVLILFDWLRQLQL:LOG&DP
classification model used for animal classification in iWildCam 0 'L
HQGIRU
end for
ZHHVWLPDWHWKDWRIWKHFDWHJRULHVDSSHDUHGLQWKH
202 1 , we estimate that 69 of the 204 categories appeared in the LI$
if A �� 
o:
WHVW dataset.
test GDWDVHW We
:H then
WKHQ pick
SLFN out
RXW images
LPDJHV that
WKDW contain
FRQWDLQ these
WKHVH 69
 UHWXUQ0
retum M
VSHFLHV of
species RI animals
DQLPDOV from
IURP the
WKH training
WUDLQLQJ dataset
GDWDVHW and
DQG cropped
FURSSHG the
WKH HOVH
else:
UHWXUQ$
return A
DQLPDOVEDVHGRQWKHGHWHFWHGEER[HVIURP0HJD'HWHFWRUZLWK
animals based on the detected bboxes from MegaDetector with
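The sequential analysis pseudocode (Algorithm 2) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `sequential_count` and the parameters `gap_s` and `include_last_group` are hypothetical. As printed, the pseudocode returns A without adding the running maximum M of the final group, whereas the prose says the counts of each group are added up. The optional flag `include_last_group` (off by default, to match the printed pseudocode) lets both readings be compared.

```python
from typing import Sequence


def sequential_count(
    counts: Sequence[int],
    times: Sequence[float],
    gap_s: float = 20.0,
    include_last_group: bool = False,
) -> int:
    """Estimate the animal count for one image sequence.

    counts[i] is the number of animals detected in image i (D_i) and
    times[i] is its capture time in seconds (T_i).
    """
    if len(counts) != len(times):
        raise ValueError("counts and times must have the same length")
    if not counts:
        return 0

    total = 0              # A: sum over groups that have been closed
    group_max = counts[0]  # M: largest count seen in the current group

    for i in range(1, len(counts)):
        long_gap = times[i] - times[i - 1] > gap_s
        if long_gap and counts[i] != counts[i - 1]:
            # A new group starts: close the previous group.
            total += group_max
            group_max = counts[i]
        elif counts[i] > group_max:
            group_max = counts[i]

    if total == 0:
        return group_max
    # Assumption: adding the final group's maximum is NOT in the printed
    # pseudocode; it is offered only as an optional variant.
    return total + group_max if include_last_group else total
```

For example, with counts `[2, 3, 1, 4]` taken at times `[0, 5, 60, 65]`, the printed pseudocode yields 3, while the variant that also adds the last group yields 7.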

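The rule for selecting high-density training images for the detection model (at least 5 MegaDetector bboxes with confidence above 0.95, covering at least 0.1 of the image pixels) can be sketched as below. This is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, each detection is assumed to be a dict with a `conf` score and a `bbox` given as normalized `[x_min, y_min, width, height]`, and pixel coverage is assumed to mean the area of the union of the boxes divided by the image area.

```python
from typing import Dict, Sequence


def pixel_coverage(bboxes: Sequence[Sequence[float]], width: int, height: int) -> float:
    """Fraction of image pixels covered by the union of the boxes.

    Assumption: each box is [x_min, y_min, box_width, box_height],
    normalized to the range [0, 1].
    """
    mask = [bytearray(width) for _ in range(height)]
    for x, y, w, h in bboxes:
        # Convert to pixel indices and clamp to the image bounds.
        x0 = min(max(int(round(x * width)), 0), width)
        x1 = min(max(int(round((x + w) * width)), 0), width)
        y0 = min(max(int(round(y * height)), 0), height)
        y1 = min(max(int(round((y + h) * height)), 0), height)
        if x1 <= x0:
            continue
        for row in mask[y0:y1]:
            row[x0:x1] = b"\x01" * (x1 - x0)
    covered = sum(sum(row) for row in mask)
    return covered / (width * height)


def is_high_density(
    detections: Sequence[Dict],
    width: int,
    height: int,
    conf_thr: float = 0.95,
    min_boxes: int = 5,
    min_coverage: float = 0.1,
) -> bool:
    """Apply the n >= 5 and R_covered >= 0.1 rule to one image."""
    boxes = [d["bbox"] for d in detections if d["conf"] > conf_thr]
    if len(boxes) < min_boxes:
        return False
    return pixel_coverage(boxes, width, height) >= min_coverage
```

Rasterizing into a mask counts overlapping boxes only once, which is one plausible reading of "pixel coverage"; summing box areas instead would overcount overlaps.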
Figure 8. Overall pipeline of DLEDetector

5) Overall animal counting process: Figure 8 shows the overall flow of our animal counting process. After getting a sequence of images, we first check the density level of the images in the sequence. For high-density images, we first get the detection results of the images from MegaDetector and our detection model, then crop the detected animals and feed them into our binary classification model to eliminate false positives. Next, we fuse the filtered bboxes with the WBF algorithm to get the final detection result. For low-density images, we directly extract the MegaDetector detections with confidence score not less than 0.95 and use the WBF algorithm to fuse the obtained bboxes as the final detection result. Finally, we apply our sequence analysis algorithm and obtain the final animal count for the sequence.

V. EXPERIMENT RESULTS

Due to the lack of true labels for both the training dataset and the test dataset, the experimental results are generated through submission to the iWildCam competition. In iWildCam 2022, the test dataset is divided into two parts, used to compute the 'public score' and the 'private score'. Both scores are calculated with approximately 50% of the test data. The public score is shown on the leaderboard during the competition and can be treated as a validation score used to fine-tune the method. The private score is shown after the submission deadline and is used as a real test score to evaluate the performance of different methods.

TABLE 1. MEAN ABSOLUTE ERROR FOR IWILDCAM 2022 TEST DATASET

Methods                                                             Public Score   Private Score
FilterDetector                                                      0.247          0.240
DLEDetector                                                         0.255          0.247
Benchmark: max num of MegaDetector v4 bboxes (confidence > 0.95)    0.276          0.264
Benchmark: iWildCam 2021 winner                                     0.283          0.264
Benchmark: max num of MegaDetector v3 bboxes (confidence > 0.98)    0.299          0.289
Benchmark: iWildCam 2020 winner                                     0.467          0.443

Table 1 shows the public score and private score of our methods together with the performance of different benchmark methods. As the goal this year is hard, only ten groups achieved better performance than the best benchmark method. Finally, our two methods ranked No. 1 and No. 3 in the competition. The leaderboard can be checked at https://www.kaggle.com/competitions/iwildcam2022-fgvc9/leaderboard.

VI. CONCLUSION AND FUTURE WORK

In this paper, we found that a key factor in improving animal counting based on MegaDetector is the density of animals in the image, and we proposed two new methods for animal counting in camera trap images based on MegaDetector V4. From the performance of our filter-based method, it can be seen that the animal counting accuracy of MegaDetector can be significantly improved by using different filters for images with different animal densities, combined with the NMS algorithm to remove redundancy. Furthermore, our deep learning-based approach shows that training a separate detection model for high-density images, using the binary classification model to remove false positives from the detection model and MegaDetector, and then using the WBF algorithm to fuse the detection results from the two detectors can also improve animal counting performance.

ACKNOWLEDGMENT

This work is partially supported by grants from the Missouri Department of Conservation.

REFERENCES

[1] A. F. O'Connell, J. D. Nichols, and K. U. Karanth, Camera traps in animal ecology: methods and analyses. Springer, 2011.
[2] F. Trolliet, C. Vermeulen, M.-C. Huynen, and A. Hambuckers, "Use of camera traps for wildlife studies: a review," Biotechnologie, Agronomie, Société et Environnement, vol. 18, no. 3, 2014.
[3] M. A. Tabak et al., "Machine learning to classify animal species in camera trap images: Applications in ecology," Methods in Ecology and Evolution, vol. 10, no. 4, pp. 585-590, 2019.
[4] M. S. Norouzzadeh et al., "Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning," Proceedings of the National Academy of Sciences, vol. 115, no. 25, pp. E5716-E5725, 2018.
[5] S. Beery, G. Van Horn, O. Mac Aodha, and P. Perona, "The iWildCam 2018 challenge dataset," arXiv preprint arXiv:1904.05986, 2019.
[6] S. Beery, A. Agarwal, E. Cole, and V. Birodkar, "The iWildCam 2021 competition dataset," arXiv preprint arXiv:2105.03494, 2021.
[7] R. Solovyev, W. Wang, and T. Gabruseva, "Weighted boxes fusion: Ensembling boxes from different object detection models," Image and Vision Computing, vol. 107, p. 104117, 2021.
[8] S. Christin, É. Hervet, and N. Lecomte, "Applications for deep learning in ecology," Methods in Ecology and Evolution, vol. 10, no. 10, pp. 1632-1644, 2019.
[9] S. Beery, D. Morris, and S. Yang, "Efficient pipeline for camera trap image review," arXiv preprint arXiv:1907.06772, 2019.
[10] S. Ren, K. He, R. Girshick, and J. Sun, "Faster r-cnn: Towards real-time object detection with region proposal networks," Advances in neural information processing systems, vol. 28, 2015.
[11] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
[12] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask r-cnn," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961-2969.
[13] Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, and R. Girshick, "Detectron2," 2019.
