Salient Object Detection With Importance Degree
Digital Object Identifier 10.1109/ACCESS.2020.3014886
ABSTRACT In this article, we introduce salient object detection with importance degree (SOD-ID), which is
a generalized technique for salient object detection (SOD), and propose an SOD-ID method. We define SOD-
ID as a technique that detects salient objects and estimates their importance degree values. Hence, it is more
effective for some image applications than SOD, which is shown via examples. The definition, evaluation
procedure, and data collection for SOD-ID are introduced and discussed, and we propose its evaluation
metric and data preparation, whose validity is discussed with the simulation results. Moreover, we propose
an SOD-ID method, which consists of three technical blocks: instance segmentation, saliency detection, and
importance degree estimation. The saliency detection block is proposed based on a convolutional neural
network using the results of the instance segmentation block. The importance degree estimation block
is achieved using the results of the other blocks. The proposed method accurately suppresses inaccurate
saliencies and estimates the importance degree for multi-object images. In the simulations, the proposed
method outperformed state-of-the-art methods with respect to the F-measure for SOD, and with respect to Spearman's and Kendall rank correlation coefficients and the proposed metric for SOD-ID.
INDEX TERMS Saliency detection, salient object detection, instance segmentation, convolutional neural
network (CNN), rank correlation metric.
based on fully convolutional network (FCN) architectures have successfully reduced inaccurate detection. Liu and Han proposed a deep hierarchical saliency network that realizes coarse-to-detailed estimation for salient objects [21]. Another method adopts a recurrent network to consider the connection of salient pixels [16].

Although a major SOD dataset contains the importance degree for objects [10], existing methods produce binary results; that is, they classify detected objects into salient or non-salient. The PASCAL-S dataset provides integer saliency values in [0, 255] with object contours. However, SOD methods disregard the priority of each object, and instead focus on estimating the contours of salient objects. Because detecting salient objects and their correct contours is a challenging task, researchers generally propose the estimation of the priority of each detected object as future work.

Semantic segmentation is a technique that identifies categories to which pixels belong, such as human, tree, and car [37]-[39]. Traditional semantic segmentation uses contour detection and the histogram of oriented gradients feature [37]. Recently, the FCN, which is a breakthrough approach for semantic segmentation, has been used to successfully detect image regions [38]. However, semantic segmentation methods cannot separate objects that belong to the same category.

Instance segmentation is derived from semantic segmentation and can identify not only object classes but also their instances [40], [41]. A basic instance segmentation method uses an FCN to detect small windows that each include one object [41]. Another method uses the recurrent architecture to iteratively detect object regions based on previous detection results [40]. Although instance segmentation and SOD similarly detect object contours, instance segmentation disregards their importance; therefore, the purposes of the two approaches are different.

III. FUNDAMENTALS OF SOD

A. PASCAL-S DATASET
The PASCAL-S dataset contains images, their fixation data, and their SOD maps with multiple values that can be used as ground truth (GT) for SOD-ID [10]. It contains 850 natural images whose full segmentation masks are provided in [42]. The fixation data were obtained by applying an eye-tracker to eight subjects that were instructed to perform a free-viewing task for the images. In the SOD experiment, 12 subjects were given images and asked to highlight salient objects by clicking on them. The pixels of the SOD maps have integer values in [0, 12], and they are linearly normalized in [0, 255] for the png format. Therefore, we believe that PASCAL-S is an SOD-ID dataset with 13 degrees.

The VGG architecture [43] consists of five blocks that each have two or three convolutional layers and a pooling layer. The FCN architecture is constructed by replacing the last layer of the VGG architecture with a one-channel convolutional layer. Some methods that apply merge and convolution layers to the FCN obtain superior results to past methods because the layers realize both shallow and deep convolutions; thereby, they can capture both global and local features [6], [38].

C. LOCATION-BIASED DETECTION
In SD and SOD, the location assumption is generally used as prior information [7], [11], [13], [15], [24], [36]. Photographers generally center interesting objects in images, and thus natural images often present salient areas at their center. To exploit this tendency, some SOD methods apply higher weights to salient pixels closer to the center of images [13], [24]. Following this strategy, in an SD method, a location-biased convolution layer was introduced in the FCN, which obtained superior results [7].

D. RSOD
The CNN model detects the contours of salient objects and estimates their multiple saliency values because of its architecture [25]. The architecture recursively calculates saliency maps from coarse to fine levels, and finally fuses the resultant saliency maps. The calculation units are learned using the multi-stage GT of the saliency maps that is generated from PASCAL-S by thresholding its saliency maps at various values. Therefore, the fused maps have various pixel values that reflect saliency levels from coarse to fine.

As an additional process, the method estimates the importance score for each salient object from the output saliency map [25]. In basic terms, the score value is calculated by averaging the saliency values of pixels within the object as

Rank(S(X)) = ( Σ_{i∈𝒳} χ_i ) / N_X,  (1)

where S, X, 𝒳, χ_i, and N_X denote a predicted saliency map, a candidate salient object, the set of indices of pixels that belong to X, the saliency value of the i-th pixel, and the total number of pixels in X, respectively. It is unknown whether the calculated values are normalized because this is not clearly described in [25]. Note that the authors used the GT segmentation masks in PASCAL-S in this process.

In experiments, the method simply uses conventional methods for evaluation. Spearman's rank correlation coefficient [32] is used as the evaluation metric, and the resultant scores are linearly normalized in [0, 1]. PASCAL-S without images used in training is directly used for testing the method.
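The averaging in (1) is straightforward to express numerically. The following NumPy sketch is our own illustration of that per-object averaging under the notation above; the function and array names are hypothetical and it is not code from [25].

import numpy as np

def rank_score(saliency_map, object_mask):
    # Average saliency over the pixels of one candidate object X, as in (1):
    # the chi_i values are the predicted saliencies inside X, and N_X is the
    # number of pixels in X.
    values = saliency_map[object_mask]
    return values.sum() / values.size

# Toy example: a 3x3 saliency map and a two-pixel object mask.
s = np.array([[0.9, 0.8, 0.1],
              [0.2, 0.1, 0.0],
              [0.0, 0.0, 0.0]])
m = np.zeros_like(s, dtype=bool)
m[0, :2] = True
print(rank_score(s, m))  # (0.9 + 0.8) / 2 = 0.85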
Spearman's rank correlation coefficient, which is used in [25], is unsuitable for calculating the importance degree. In this article, we propose an evaluation metric for the importance degree of SOD-ID. As the evaluation metric for segmentation, conventional methods, for example, the F-measure, can be used. An evaluation metric for SOD-ID is defined as a linear combination of the F-measure and the proposed metric, or the parallel use of them. The proposed metric F is defined based on simply combining metrics for the correlation and score similarity as

F(v_p, v_t) = α R(v_p, v_t) + (1 − α) I(v_p, v_t),  (2)

where R, I, α, v_p, and v_t denote the correlation and similarity metrics, a balancing free parameter, and vectors for which each element is the score value of each object, respectively. We use the Kendall rank correlation coefficient as R [33] because it straightforwardly evaluates the correlation and therefore is more suitable than Spearman's rank correlation coefficient. For I, we use the squared error and define it as

I(v_p, v_t) = (1/N) Σ_{i=1}^{N} exp( −(v_pi − v_ti)² / (2σ²) ),  (3)

where N, v_pi, and v_ti denote the number of objects and the i-th elements of v_p and v_t, respectively, and σ is a free parameter that controls the variance of the Gaussian distribution. R, which outputs real values in [−1, 1], is linearly normalized in [0, 1], I has real values in [0, 1] because of (3), and α is restricted in [0, 1]; hence F outputs real values in [0, 1]. The metric proposition requires much experimental evidence, but because of the limited space in this article, the validity of F is briefly shown in Section VI and a detailed discussion on this topic remains as future work.

D. DATASET PREPARATION
To create SOD-ID datasets, the procedure of PASCAL-S mentioned in Section III-A is suitable. The segmentation masks are simply obtained manually, and the importance degree is determined as follows: By the strict rules, the subjects of experiments are asked to collect and rank interesting objects in one image. The strict procedure requires several subjects, but unfortunately, it is a difficult task for them. By contrast, the procedure of PASCAL-S only asks subjects to collect interesting objects. For an object, the number of subjects that recognize it as salient is directly determined as its value of the importance degree, and to create a GT map of SOD-ID, pixels within each salient object are uniformly given its score based on the segmentation mask. If M subjects are applied, the resultant map has M degrees. This is simple and useful, but a large number of subjects are required to create general datasets.

To avoid experiments using subjects, we introduce a preparation procedure for the SOD-ID dataset based on existing SD data. As mentioned above, subjective experiments have the troublesome characteristic of requiring many people and large costs. To avoid this, we use existing SD data to produce the SOD-ID maps. The proposed procedure calculates the sum of pixel values within objects in the GT maps of SD, and the resultant values are considered as their scores of the importance degree, which is defined in one image as

Deg_i = ( Σ_{j∈𝒳_i} s_j ) / ( max_i { Σ_{j∈𝒳_i} s_j } ),  (4)

where Deg_i, s_j, and 𝒳_i denote the score of the i-th object, the j-th pixel value of the SD map, and the set of indices of pixels within the i-th object, respectively. To produce the SOD-ID map, pixel values within the i-th object are uniformly set as Deg_i, and the resultant map is linearly quantized using N. Because the GT maps of SD represent the degree of saliency for each pixel, the summation values within an object are approximately recognized as the degree of interest for the object. Similarly, a pixel value within an object in the GT maps of SD is approximately considered as the number of subjects that recognize the object and categorize it as salient, and therefore, in the case of a large number of subjects, the summation procedure is recognized as the same as that of PASCAL-S for SOD mentioned in Section III-A. Based on the above assumptions, we believe that the proposed procedure is valid for creating SOD-ID datasets.

We experimentally show that the proposed procedure mentioned above has high validity compared with the RSOD procedure mentioned in Section III-D [25]. Using these procedures, SOD-ID maps are produced using the full segmentation masks and fixation data of PASCAL-S. Table 2 shows this comparison, where "Sum." and "Ave." denote the results of the proposed and RSOD procedures; that is, they show values of the evaluation metrics between the SOD maps of PASCAL-S and their resultant maps, respectively. For simplicity, we use Spearman's and Kendall rank correlation coefficients as the metrics [32], [33]. From Table 2, the proposed procedure is clearly better than the RSOD procedure, and thus our opinions mentioned above have been shown to be valid.

TABLE 2. Scores for the estimation methods of the importance degree for the PASCAL-S dataset [10].

V. PROPOSED SOD-ID METHOD

A. OVERVIEW
The proposed SOD-ID method is briefly shown in Fig. 4. The system consists of three technical blocks: instance segmentation, SD, and importance degree estimation. First, instance segmentation is applied to an input image to detect object contours, and an arbitrary method can be used here, such as that in [40], [41], [47], [48]. Second, the salient regions of the input image are detected by the proposed CNN method using the object contours detected in the first block. Finally, using the results of the first and second blocks, the proposed method
outputs an SOD-ID map with N degrees through the estimation block of the importance degree. The technical blocks can be independently developed, and therefore the system provides suitable expandability and serves as a fundamental design of SOD-ID methods.

B. PROPOSED CNN METHOD FOR SD
In this section, we explain the proposed CNN method for SD in the second block, which uses the detected contours of the first block. The architecture uses the contours as a part of the input and extracts their multi-resolution features to estimate the saliency values. The loss function imposes different weights for object and background regions based on the contours. Note that the proposed CNN method considers location bias similar to conventional SD and SOD methods.

1) ARCHITECTURE
Fig. 5 and Table 3 show the architecture of the proposed CNN method and its parameters, respectively. Figs. 5 (a)-(c) correspond to Tables 3 (a)-(c), respectively. In Table 3, "Conv.", "Pool.", and "p*" indicate convolution layers, max pooling layers, and the pyramid pooling module, respectively. The rectified linear unit [49] is used as the activation function in the convolution layers. A VGG-based method is used to extract image features in Fig. 5 (a). The results of the first block and the features after Pool.3, Pool.4, and Conv.5-3 are merged along the channel direction, and the merged signals are input into Conv. 6-1. The signals after Conv. 6-2 are transformed using the pyramid pooling module proposed in [39], and the resultant signals are resized to the same size as the signals after Conv. 6-2. Finally, the resized signals and those after Conv. 6-2 are merged along the channel direction and processed through Conv. 7-1, 2, and 3.

TABLE 3. Construction details of the proposed CNN architecture.

2) LOSS FUNCTION
The loss function of the proposed CNN method assigns high and medium weights for salient and object regions, respectively, and by contrast, low weights to background regions because they are generally uninteresting. The loss function L is formulated as

L(w) = (1/N) Σ_{i=1}^{N} ( φ(x_i)/max φ(x_i) − y_i ) / ( β − (O_i + y_i) ),  (5)

where y_i, x_i, O_i, φ(·), and β denote true saliencies, estimated saliencies, object region masks, a normalization function, and a free parameter, respectively. The masks are produced by binarizing signals of the instance segmentation results. φ(·) normalizes the estimated saliency values in [0, 1]. We generally set β to 2 or a value that is the maximum of O_i + y_i. If the i-th pixel is in a salient object, β − (O_i + y_i) is a low value, and hence this pixel is assigned a high weight.

3) TRAINING
For training, the loss function in Section V-B2 and the training datasets of COCO and SALICON were used [45], [46]. COCO contains natural images and their segmentation masks, and SALICON has saliency maps that correspond to them. The maps were binarized using a threshold value of τ = 0.15, and their elements corresponding to background pixels, which were detected by the masks, were set to zero. Stochastic gradient descent was used for optimization, where the Nesterov momentum, the weight decay, and the learning rate were set to 0.9, 0.5, and 10^-3, respectively [50]. β in the loss function was set to 2.3, which was experimentally determined from the ratios of salient, object, and background regions.

C. ESTIMATION OF THE IMPORTANCE DEGREE
In the proposed method, the estimation block process is defined similarly to the proposed procedure in Section IV-D. Object contours are already detected in the first block and their saliency values are estimated in the second block. In the third block, the values within one object contour are summed and the result is its score of the importance degree as given in (4). Similar to the proposed procedure, SOD-ID maps are created based on the resultant scores and linearly quantized with N.

VI. SIMULATION
In this section, we compare the performance of the proposed method and state-of-the-art methods for SOD and SOD-ID. We present the comparisons in Sections VI-B and VI-C, respectively, and before that, we discuss the validity of the proposed metric in Section VI-A by presenting some examples. For this simulation, we used the instance segmentation method proposed in [40] in the first block of the proposed method because it is not recent but has high accuracy. Based on Section IV-D, we introduced a dataset from the test sets of COCO and SALICON, which contain images with segmentation masks and their SD maps, respectively, where the proposed dataset is called a SALICON-based dataset in this section. Note that the proposed method is also represented by Prop. in this section.

A. VALIDITY OF THE PROPOSED METRIC
As mentioned in Section IV-C, the validity of the proposed metric is briefly shown in this section. Table 4 shows scores of pairs of arbitrary vectors in Spearman's and Kendall rank correlation coefficients, and the proposed metric. In Table 4, the pairs from the top to the bottom, respectively, indicate various scenarios as follows: same rank and slightly different value, slightly different rank and value, same rank and quite different value, and quite different rank and slightly different value. As mentioned in Section IV-C, SOD-ID metrics have to simultaneously evaluate the rank correlation and the value similarity. In that sense, from the first and third pairs, only the proposed metric satisfies the above property. We observed from the second and fourth pairs that the Kendall coefficient is too sensitive to the rank difference to be used as the SOD-ID metric. The fourth pair shows that the rank correlation is quite different, but its values are almost the same and hence the importance of objects is also considered to be comparable. However, the score obtained using Spearman's coefficient is rather bad and its weight for the rank correlation and the value similarity has been shown to be unbalanced. The proposed

TABLE 4. Correlation scores of pairs of arbitrary vectors.

TABLE 5. F-measure scores of the SOD methods for the DUTS dataset [44].

TABLE 6. F-measure scores of the SOD methods for the PASCAL-S dataset [10].

TABLE 7. F-measure scores of the SOD methods for the SALICON-based dataset [45], [46].

TABLE 8. Scores for the estimation of the importance degree for the PASCAL-S dataset [10].

TABLE 9. Scores for the estimation of the importance degree for the SALICON-based dataset [45], [46].

FIGURE 7. Resultant saliency maps for the SALICON-based dataset [45], [46].

method often detected nothing for DUTS because of its above characteristic, as shown in the upper half of Table 5. However, the results of Prop. except that case were equivalent to those of the other methods. Prop. can solve this problem using an efficient instance segmentation method that accurately detects objects.

the PASCAL-S and SALICON-based datasets, and the results were evaluated using Spearman's and Kendall rank correlation coefficients and the proposed metric (2), where α and σ were experimentally set to 0.5 and 2.0, respectively. Clearly, the GT and resultant maps were uniformly normalized with N = 7.
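To make the evaluation just described concrete, the following sketch computes the proposed metric F in (2) with the similarity term I in (3), using the settings quoted above (α = 0.5, σ = 2.0). It is a minimal illustration under our own assumptions; the use of SciPy's kendalltau for R and the function name are ours, not code released with the paper.

import numpy as np
from scipy.stats import kendalltau

def proposed_metric(v_p, v_t, alpha=0.5, sigma=2.0):
    # R: Kendall rank correlation coefficient in [-1, 1], linearly normalized to [0, 1].
    v_p = np.asarray(v_p, dtype=float)
    v_t = np.asarray(v_t, dtype=float)
    tau, _ = kendalltau(v_p, v_t)
    r = (tau + 1.0) / 2.0
    # I: Gaussian similarity of the per-object scores, as in (3).
    i = np.mean(np.exp(-((v_p - v_t) ** 2) / (2.0 * sigma ** 2)))
    # F: weighted combination of rank correlation and value similarity, as in (2).
    return alpha * r + (1.0 - alpha) * i

# Example: predicted vs. ground-truth importance scores of four objects.
print(proposed_metric([4.0, 3.0, 2.0, 1.0], [4.0, 2.9, 2.1, 1.0]))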
importance degree. Note that the rows in Tables 8 and 9 correspond to those in Figs. 6 and 7, respectively. From Tables 8 and 9, Prop. clearly outperformed RSOD in terms of the metrics. From Figs. 6 and 7, Prop. accurately estimated the importance degree of objects. Particularly, in "Party," "Woman," and "Man," Prop. estimated the importance degree of small objects that had low saliency scores and were located in highly salient objects.

VII. CONCLUSION
In this article, we introduced SOD-ID by discussing its definition, significance, dataset condition, and evaluation metric property, and proposed its dataset, metric, and method. The proposed metric consists of the Kendall rank correlation coefficient and mean squared error, and simultaneously evaluates the rank correlation and value similarity for SOD-ID. The proposed dataset is generated using the proposed procedure based on the COCO and SALICON datasets. The proposed method of SOD-ID consists of three processing blocks: instance segmentation, SD, and importance degree estimation. We proposed a CNN-based SD method for the second block that uses the results of the first block. With this strategy, the proposed method objectively outperformed state-of-the-art methods with respect to SOD and achieved an accurate SOD-ID.

ACKNOWLEDGMENT
We thank Irina Entin, M. Eng., and Maxine Garcia, Ph.D., from Edanz Group (www.edanzediting.com/ac) for editing a draft of this manuscript.

REFERENCES
[1] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254–1259, 1998.
[2] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," in Proc. Neural Inf. Process. Syst., 2006, pp. 545–552.
[3] X. Hou and L. Zhang, "Saliency detection: A spectral residual approach," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1–8.
[4] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1597–1604.
[5] J. Zhang and S. Sclaroff, "Exploiting surroundedness for saliency detection: A Boolean map approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 5, pp. 889–902, May 2016.
[6] M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, "A deep multi-level network for saliency prediction," in Proc. 23rd Int. Conf. Pattern Recognit. (ICPR), Dec. 2016, pp. 3488–3493.
[7] S. S. S. Kruthiventi, K. Ayush, and R. V. Babu, "DeepFix: A fully convolutional neural network for predicting human eye fixations," IEEE Trans. Image Process., vol. 26, no. 9, pp. 4446–4456, Sep. 2017.
[8] F. Perazzi, P. Krahenbuhl, Y. Pritch, and A. Hornung, "Saliency filters: Contrast based filtering for salient region detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 733–740.
[9] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, "Salient object detection: A discriminative regional feature integration approach," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 2083–2090.
[10] Y. Li, X. Hou, C. Koch, J. M. Rehg, and A. L. Yuille, "The secrets of salient object segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 280–287.
[11] J. Sun, H. Lu, and X. Liu, "Saliency region detection based on Markov absorption probabilities," IEEE Trans. Image Process., vol. 24, no. 5, pp. 1639–1649, May 2015.
[12] R. Zhao, W. Ouyang, H. Li, and X. Wang, "Saliency detection by multi-context deep learning," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1265–1274.
[13] N. Tong, H. Lu, X. Ruan, and M.-H. Yang, "Salient object detection via bootstrap learning," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1884–1892.
[14] Y. Qin, H. Lu, Y. Xu, and H. Wang, "Saliency detection via cellular automata," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 110–119.
[15] J. Kim, D. Han, Y.-W. Tai, and J. Kim, "Salient region detection via high-dimensional color transform and local spatial support," IEEE Trans. Image Process., vol. 25, no. 1, pp. 9–23, Jan. 2016.
[16] L. Wang, L. Wang, H. Lu, P. Zhang, and X. Ruan, "Saliency detection with recurrent fully convolutional networks," in Proc. Eur. Conf. Comput. Vis., Springer, 2016, pp. 825–841.
[17] T. Wang, L. Zhang, H. Lu, C. Sun, and J. Qi, "Kernelized subspace ranking for saliency detection," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 450–466.
[18] L. Zhang, C. Yang, H. Lu, R. Xiang, and M.-H. Yang, "Ranking saliency," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 9, pp. 1892–1904, Sep. 2017.
[19] C. Sheth and R. V. Babu, "Object saliency using a background prior," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2016, pp. 1931–1935.
[20] J. Yang and M.-H. Yang, "Top-down visual saliency via joint CRF and dictionary learning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 3, pp. 576–588, Mar. 2017.
[21] N. Liu and J. Han, "DHSNet: Deep hierarchical saliency network for salient object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 678–686.
[22] L. Zhang, J. Ai, B. Jiang, H. Lu, and X. Li, "Saliency detection via absorbing Markov chain with learnt transition probability," IEEE Trans. Image Process., vol. 27, no. 2, pp. 987–998, Feb. 2018.
[23] G. Li, Y. Xie, L. Lin, and Y. Yu, "Instance-level salient object segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2386–2395.
[24] Q. Hou, M.-M. Cheng, X. Hu, A. Borji, Z. Tu, and P. Torr, "Deeply supervised salient object detection with short connections," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3203–3212.
[25] M. A. Islam, M. Kalash, and N. D. B. Bruce, "Revisiting salient object detection: Simultaneous detection, ranking, and subitizing of multiple salient objects," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 7142–7150.
[26] C. Aytekin, A. Iosifidis, and M. Gabbouj, "Probabilistic saliency estimation," Pattern Recognit., vol. 74, pp. 359–372, Feb. 2018.
[27] R. Fan, M.-M. Cheng, Q. Hou, T.-J. Mu, J. Wang, and S.-M. Hu, "S4Net: Single stage salient-instance segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 6103–6112.
[28] L. Marchesotti, C. Cifarelli, and G. Csurka, "A framework for visual saliency detection with applications to image thumbnailing," in Proc. IEEE 12th Int. Conf. Comput. Vis., Sep. 2009, pp. 2232–2239.
[29] M. Rubinstein, D. Gutierrez, O. Sorkine, and A. Shamir, "A comparative study of image retargeting," ACM Trans. Graph., vol. 29, no. 6, pp. 160–169, 2010.
[30] A. Mansfield, P. Gehler, L. V. Gool, and C. Rother, "Scene carving: Scene consistent image retargeting," in Proc. Eur. Conf. Comput. Vis., Springer, 2010, pp. 143–156.
[31] A. Jose and I. Heisterklaus, "Bag of Fisher vectors representation of images by saliency-based spatial partitioning," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2017, pp. 1762–1766.
[32] C. Spearman, "The proof and measurement of association between two things," Tech. Rep., 1961.
[33] M. G. Kendall, "A new measure of rank correlation," Biometrika, vol. 30, nos. 1–2, pp. 81–93, Jun. 1938.
[34] N. Imamoglu, C. Zhang, W. Shmoda, Y. Fang, and B. Shi, "Saliency detection by forward and backward cues in deep-CNN," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 430–434.
[35] R. Monroy, S. Lutz, T. Chalasani, and A. Smolic, "SalNet360: Saliency maps for omni-directional images with CNN," Signal Process., Image Commun., vol. 69, pp. 26–34, Nov. 2018.
[36] H. Li, H. Lu, Z. Lin, X. Shen, and B. Price, "Inner and inter label propagation: Salient object detection in the wild," IEEE Trans. Image Process., vol. 24, no. 10, pp. 3176–3186, Oct. 2015.
[37] B. Hariharan, P. Arbelaez, L. Bourdev, S. Maji, and J. Malik, "Semantic contours from inverse detectors," in Proc. Int. Conf. Comput. Vis., Nov. 2011, pp. 991–998.
[38] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
[39] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2881–2890.
[40] B. Romera-Paredes and P. H. S. Torr, "Recurrent instance segmentation," in Proc. Eur. Conf. Comput. Vis., Springer, 2016, pp. 312–329.
[41] Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei, "Fully convolutional instance-aware semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2359–2367.
[42] R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille, "The role of context for object detection and semantic segmentation in the wild," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 891–898.
[43] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–14.
[44] L. Wang, H. Lu, Y. Wang, M. Feng, D. Wang, B. Yin, and X. Ruan, "Learning to detect salient objects with image-level supervision," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 136–145.
[45] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Proc. Eur. Conf. Comput. Vis., Springer, 2014, pp. 740–755.
[46] M. Jiang, S. Huang, J. Duan, and Q. Zhao, "SALICON: Saliency in context," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1072–1080.
[47] K. Li, B. Hariharan, and J. Malik, "Iterative instance segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 3659–3667.
[48] A. Arnab and P. H. S. Torr, "Pixelwise instance segmentation with a dynamically instantiated network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 441–450.
[49] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. Int. Conf. Mach. Learn., 2010, pp. 807–814.
[50] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[51] R. Szeliski, Computer Vision: Algorithms and Applications. London, U.K.: Springer, 2010.

YO UMEKI received the B.Eng. and M.Eng. degrees from the Nagaoka University of Technology, Nagaoka, Japan, in 2015 and 2019, respectively. He is currently pursuing the Ph.D. degree with the Department of Information Science and Control Engineering. His main research interest includes saliency detection.

ISANA FUNAHASHI received the B.Eng. and M.Eng. degrees from the Nagaoka University of Technology, Nagaoka, Japan, in 2017 and 2019, respectively. He is currently pursuing the Ph.D. degree with the Department of Computer and Network Engineering, The University of Electro-Communications, Tokyo, Japan. His research interests include image processing and computer vision.

TAICHI YOSHIDA (Member, IEEE) received the B.Eng., M.Eng., and Ph.D. degrees in engineering from Keio University, Yokohama, Japan, in 2006, 2008, and 2013, respectively. In 2014, he joined the Nagaoka University of Technology. In 2018, he joined the University of Electro-Communications, where he is currently an Assistant Professor with the Department of Communication Engineering and Informatics. His research interests include filter bank design and image coding applications.

MASAHIRO IWAHASHI (Senior Member, IEEE) received the B.Eng., M.Eng., and D.Eng. degrees in electrical engineering from Tokyo Metropolitan University, in 1988, 1990, and 1996, respectively. In 1990, he joined Nippon Steel Company Ltd. From 1991 to 1992, he was seconded to Graphics Communication Technology Company Ltd. In 1993, he joined the Nagaoka University of Technology, where he is currently a Professor with the Department of Electrical Engineering, Faculty of Technology. From 1995 to 2001, he was also a Lecturer with the Nagaoka Technical College. From 1998 to 2001, he relocated to Thammasat University, Thailand and the Electronic Engineering Polytechnic Institute of Surabaya, Indonesia, as a JICA Expert. His research interests include digital signal processing, multi-rate systems, and image compression. He served as an Editorial Committee Member of the IEICE Transactions on Fundamentals of Electronics, Communications, and Computer Sciences, from 2007 to 2011.