Automation in Construction 71 (2016) 271–282

Contents lists available at ScienceDirect

Automation in Construction

journal homepage: www.elsevier.com/locate/autcon

Data-driven scene parsing method for recognizing construction site objects in the whole image

Hongjo Kim, Kinam Kim, Hyoungkwan Kim ⁎
Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea

Article info

Article history: Received 12 February 2016; Received in revised form 2 June 2016; Accepted 15 August 2016; Available online 25 August 2016.

Keywords: Construction site; Computer vision; Scene parsing; Object recognition; Label transfer.

Abstract

Although efforts have been made toward automated monitoring of construction sites, comprehensive understanding of a whole image remains a difficult task. Conventional vision-based monitoring methods have shortcomings in obtaining semantic information regarding an entire image because these methods are not scalable to the number of recognizable objects and training data. Most methods use a parametric model to recognize objects, involving cumbersome parameter tuning. This study presents a data-driven scene parsing method to recognize various objects in a construction site image. To identify the object information of a query image, the monitoring system retrieves the images most relevant to the query image using nearest neighbors and scale-invariant feature transform flow matching, and transfers the labels of the relevant images to the query image. This study demonstrated reasonable system performance on construction site images, recording an average pixel-wise recognition rate of 81.48% with a small number of similar images. The scene parsing method would enrich the raw information of a construction site image, thereby facilitating information use for various management applications.

© 2016 Elsevier B.V. All rights reserved.

⁎ Corresponding author. E-mail addresses: [email protected] (H. Kim), [email protected] (K. Kim), [email protected] (H. Kim).

http://dx.doi.org/10.1016/j.autcon.2016.08.018
0926-5805/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Information retrieved from images is widely used for construction management applications such as progress monitoring, productivity analysis, safety management, and facility condition assessment [1–7]. Advanced monitoring methods essentially involve object recognition for extracting contextual information. Previous studies have successfully identified construction workers, vehicles, structures, and materials for monitoring construction sites with computer vision technology [8–32]. However, the environment of a construction site has not received enough attention from researchers, despite its usefulness for understanding the construction status in a holistic way. Furthermore, the number of recognizable objects in previous studies is small and fixed; thus, changing the number of target object classes entails a tedious process that involves gathering data, training a system, or tuning parameters. These nonscalable systems capture limited information and may lose important data because the number of object categories and the construction site environment vary over time.

To recognize various object classes, monitoring systems mainly rely on artificial intelligence (AI) to which supervised learning is applied. In supervised learning, developers provide training data to their model, and the model learns particular patterns in the given data in order to identify target objects. A sufficient amount of training data is the most important basis of the performance of a model because the representative patterns of objects cannot be learned from only a few examples [33].

The construction industry lacks publicly available image data, which is a major obstacle to implementing an intelligent monitoring system [4]. Therefore, developers have to prepare training data for their recognition targets. This time-consuming and labor-intensive process of collecting training data is a daunting task for a few individuals or organizations. As a result, previous monitoring systems were only capable of identifying a limited number of object classes. Although a state-of-the-art system increases the number of recognizable object classes up to 23 [16], a large amount of training data is still required to recognize various construction objects and generate a model with robust generalization capability (generalization capability is the performance of a model on unseen data). Diverse photographing methods, including closed-circuit television, mobile phones, camcorders, unmanned aerial vehicles, and wearable recording devices, have augmented the types of image data available; hence, monitoring systems require more training data to deal with a large degree of appearance variance of objects.

To address these issues, a global recognition system for construction site imagery was proposed. The initial idea of this paper was presented at the 33rd ISARC (International Symposium on Automation and Robotics in Construction) [34]. The system uses a nonparametric scene parsing method [35], which labels all pixels of a query image with their category by transferring labels of similar images in a database. The method comprises three modules: (1) a scene retrieval module, which determines a set of similar images to the query image using a database and GIST
matching [36]; (2) a dense scene alignment module, which selects the most related images from the retrieved images by scale-invariant feature transform (SIFT) flow [37]; and (3) a scene parsing module, which conveys the labels of the most related images to the query image. This method demonstrates a reasonable performance of global recognition on a particular construction site image when used with a small number of pre-prepared similar images (less than three completely labeled images per monitoring target scene). Furthermore, recognizable object categories are scalable to images, which means that the system can identify a varying number of object categories in an image during construction operation.

Our experiment was designed to demonstrate the performance of our system in global recognition and scalability to target object classes when used with a limited data set. The experiments were conducted using a database that contained 211 completely annotated construction site images. The images were annotated by five different individuals, three of whom came from outside the construction industry. A recognition rate of 81.48% was reported for the 42 test images. Experimental settings and details are described in Section 4.

The main contributions of the scene parsing system comprise three main components: (1) the global recognition capability, which enriches the information quality obtained from construction sites, thereby enabling a more comprehensive understanding of the context of job sites and advanced management applications; (2) the scalability, which maintains the performance of a monitoring system during construction operation; and (3) a practical solution for monitoring a construction site with limited data, requiring only a minimal effort in preparing a small amount of training data to obtain good recognition performance. An additional advantage of our system is the small number of parameters, only four, which require limited tuning, making this a highly practical system to use.

2. Related work

2.1. Previous vision-based monitoring systems recognizing objects in construction sites

Vision-based monitoring systems have been used in previous studies to monitor target objects for various purposes such as progress monitoring [12,14,16,31,32,38–42], productivity analysis [8,18,20,26,43], safety management [24,44–49], facility condition assessment [13,50–58], and monitoring technology used on a construction site [10,19,21,22,27–29,59–64]. Most of the studies have one or two recognizable object classes. In general, this is sufficient because the number of recognition targets required for their research objectives is typically less than three. Nevertheless, their monitoring performance is limited to a specific application because diverse applications are not possible without comprehensive recognition of entire image areas.

To date, only a few studies have more than two recognizable objects [16,22,32,42]. Chi and Caldas [22] tested the performance of two classifiers, the naïve Bayes classifier and neural networks, as means to identify a worker, loader, and backhoe. Son et al. [32] identified major construction materials, including concrete, steel, and wood, using an ensemble technique on multiple classifiers. Han and Golparvar-Fard [16] and Dimitrov and Golparvar-Fard [42] classified construction materials of over 20 classes, forming a joint probability distribution of material appearances for feature generation and using multiple binary support vector machines for object classification. Despite these successes, the abovementioned methods have insufficient recognizable objects for diverse applications and are not scalable to the number of recognizable objects; thus, the methods involve cumbersome re-adjustment of the system or gathering additional training data to change the number of identifiable object categories. The naïve Bayes classifier used in [22] is scalable; however, strong assumptions must be made on representative features of an object. Because the establishment and validation of each assumption requires complicated reasoning processes, this method is not practical for recognizing various materials within an environment.

2.2. Scalable nonparametric scene parsing method

Conventional approaches to object recognition necessitate defining an object model using template matching, bags of features, or shape models [35]. These methods require training for each class; thus, repetitive, time-consuming processes are inevitable, varying by the size of the object category or training data available. To overcome these limitations, Liu et al. [35] proposed a data-driven scene parsing method called label transfer, which is a nonparametric and scalable technique. Scene parsing provides a semantic understanding of whole image areas by segmenting and recognizing independent regions of an image.

Implementing this method is a promising means of monitoring construction sites in terms of global recognition capability, scalability, and practicality. This method transfers labels (information of object identity) to a query image from similar images, thereby enabling scaling to the number of object categories, as each image has its own number of object classes. If a few images similar to a particular job site can be prepared, this method can successfully identify the environment of the construction site in a query image. Because the method is a nonparametric model, which makes no assumptions on the distributions of an object's features, only a few parameters remain to be trained. The details of the algorithm can be found in Section 3.

A similar method has been used for roadway asset monitoring in [65]. The study used the superparsing algorithm [66] for segmenting and recognizing roadway assets in video frames. The superparsing method is very similar to the label transfer method used in [35]. The main difference between the methods is the basic element of the image. The superparsing algorithm uses superpixels (small clusters of pixels) as an image primitive, whereas the label transfer method uses the original pixel grid. Using superpixels, superparsing algorithms gain significant computational efficiency. The generation of superpixels involves an image segmentation method (e.g., graph cut). Because numerous materials used in construction sites appear in various sizes in each image and may have a similar appearance depending on the monitoring distance and viewpoint, designing image segmentation techniques and setting parameters to generate optimal superpixels can be cumbersome. Thus, this study adopts the image parsing method used in [35].

3. Scene parsing system for construction site monitoring

3.1. Overview of the global recognition system

The scene parsing system aims to recognize whole image areas, and the label transfer method described in [35] is used in this study. The task involves segmentation and recognition. Segmentation denotes separating independent regions of an image, whereas recognition means identifying the information required to determine an object's identity. The system performs the two processes in a recognition-by-matching scheme in which labels of similar images are transferred to a query image [35]. The database holds completely labeled images obtained from a web-based image labeling tool. The completely labeled images are used as training data from which the system can exploit the prior information of the per-pixel frequency of each object category, to be used in Eq. (5) in Section 3.1.3. By searching for the most similar images to the query image, some of the training images are used as the bases for transferring their labels to the query image. Fig. 1 shows the system architecture, and the outline of the scene parsing process is shown in Fig. 2. As shown in Fig. 1, the main component of the system comprises three modules, i.e., scene retrieval, dense scene alignment, and scene parsing. Details are presented in Sections 3.1.1–3.1.3.
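For illustration, the control flow of the three modules can be read end to end in the following Python sketch. It is illustrative only and is not the authors' MATLAB implementation: a coarse downsampled intensity vector stands in for the GIST descriptor, an identity flow stands in for SIFT flow, and a per-pixel nearest-candidate vote stands in for the full MRF inference of Section 3.1.3.

```python
import numpy as np

def toy_descriptor(image):
    """Stand-in for the GIST descriptor: a coarse 8x8 average of intensities."""
    h, w = image.shape
    return image.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3)).ravel()

def retrieve(query_desc, db_descs, K=5, eps=0.5):
    """Scene retrieval (Section 3.1.1): (K, eps)-NN by Euclidean distance."""
    d = np.linalg.norm(db_descs - query_desc, axis=1)
    order = np.argsort(d)
    return [i for i in order[:K] if d[i] <= (1 + eps) * d[order[0]]]

def align_and_rank(query, candidates, M=2):
    """Dense scene alignment (Section 3.1.2), with the identity flow as a
    stand-in for SIFT flow: rank candidates by pixel-wise matching cost."""
    costs = [np.abs(query - c).sum() for c in candidates]
    return list(np.argsort(costs)[:M])

def parse(query, final_images, final_labels):
    """Scene parsing (Section 3.1.3) reduced to a per-pixel vote: each pixel
    takes the label of the final candidate whose pixel value matches best."""
    stack = np.stack([np.abs(query - img) for img in final_images])  # M x H x W
    best = stack.argmin(axis=0)                                      # H x W
    labels = np.stack(final_labels)                                  # M x H x W
    return np.take_along_axis(labels, best[None], axis=0)[0]

rng = np.random.default_rng(0)
db_imgs = rng.random((20, 64, 64))            # toy database images
db_lbls = rng.integers(1, 5, (20, 64, 64))    # toy per-pixel label maps
query = db_imgs[3] + 0.01 * rng.random((64, 64))

db_descs = np.stack([toy_descriptor(im) for im in db_imgs])
nn = retrieve(toy_descriptor(query), db_descs)
final = [nn[i] for i in align_and_rank(query, [db_imgs[i] for i in nn])]
result = parse(query, [db_imgs[i] for i in final], [db_lbls[i] for i in final])
print(result.shape, np.unique(result))
```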

Fig. 1. Scene parsing system architecture that was adapted from [35].

Fig. 2. Outline of the scene parsing process. (a) a query image; (b) an image of the final candidate images from the scene retrieval and dense scene alignment step; (c) an integer image representing the labels on (b); (d) transferred labels from the final candidates to (a) using a probabilistic Markov random field model; (e) an integer image representing the labels on (d); and (f) the ground-truth labels of (a).

3.1.1. Scene retrieval

When a query image comes into the system, images similar to the query image are retrieved using nearest neighbor (NN) classifiers. Among the variants of NN, a combination of K-NN and ε-NN is used in [35], which determines the K closest neighbors to the query (K-NN) within (1 + ε) times the minimum distance from the query (ε-NN). The (K, ε)-NN is defined as

\[
\mathcal{N}(x) = \{\, y_i \mid \operatorname{dist}(x, y_i) \le (1+\varepsilon)\operatorname{dist}(x, y_1),\ y_1 = \arg\min_{y_i} \operatorname{dist}(x, y_i),\ i < K \,\}, \qquad (1)
\]

where x is a query image, y_i is one of the closest neighbors, and dist(·,·) calculates the distance between two images [35]. The distance function dist(·,·) measures image similarities, and this study adopts the Euclidean distance of GIST [36] as a scene descriptor. The GIST descriptor is a representation of the global scene structure of an image, computing a set of perceptual dimensions, such as naturalness, openness, roughness, expansion, and ruggedness, using a Fourier transform and principal component analysis.
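A minimal sketch of the (K, ε)-NN rule in Eq. (1) is shown below, assuming the GIST descriptors of the database images have already been computed (random vectors stand in for them here); dist(·,·) is the Euclidean distance used in this study.

```python
import numpy as np

def k_eps_nn(query_gist, db_gists, K=20, eps=0.5):
    """(K, eps)-NN of Eq. (1): keep at most K database images whose GIST
    distance to the query is within (1 + eps) of the minimum distance."""
    dists = np.linalg.norm(db_gists - query_gist, axis=1)   # Euclidean dist(x, y_i)
    order = np.argsort(dists)                                # nearest first
    d_min = dists[order[0]]
    neighbors = [i for i in order[:K] if dists[i] <= (1.0 + eps) * d_min]
    return neighbors, dists

# Toy usage: 211 database images with 512-dimensional GIST-like descriptors.
rng = np.random.default_rng(1)
db_gists = rng.random((211, 512))
query_gist = rng.random(512)
neighbors, dists = k_eps_nn(query_gist, db_gists, K=20, eps=0.5)
print(len(neighbors), [round(dists[i], 3) for i in neighbors[:5]])
```

Only the images kept by this rule are passed to the dense scene alignment module described next.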
3.1.2. Dense scene alignment

To select the most relevant images to a query image from those retrieved, visual features are extracted using the scale-invariant keypoint descriptor [67]. Different from the original procedure of extracting features called SIFT [67], only the keypoint descriptor stage is applied to every pixel of an image to characterize local image structure. The SIFT keypoint is a salient feature of construction site images because the feature is invariant to scale, rotation, affine distortion, and illumination. To align the pixel-to-pixel correspondence between two images (one of the retrieved images and the query image), the SIFT flow method [37] is used. The procedure is formulated as an energy function with three key assumptions: 1) brightness constancy, 2) small motion, and 3) spatial coherence between corresponding pixels in two images:

\[
E(\mathbf{w}) = \sum_{p} \min\!\big( \lVert s_1(p) - s_2(p + \mathbf{w}(p)) \rVert_1,\, t \big) \qquad (2)
\]
\[
\phantom{E(\mathbf{w}) =} + \sum_{p} \eta \big( |u(p)| + |v(p)| \big) \qquad (3)
\]
\[
\phantom{E(\mathbf{w}) =} + \sum_{(p,q)\in\varepsilon} \min\!\big( \lambda |u(p) - u(q)|,\, d \big) + \min\!\big( \lambda |v(p) - v(q)|,\, d \big), \qquad (4)
\]

where Eq. (2) is the data term, Eq. (3) is the small displacement term, Eq. (4) is the smoothness term, p = (x, y) is the spatial coordinate of a pixel, w(p) = [u(p), v(p)] is the flow vector at p, s_1 and s_2 are the per-pixel SIFT descriptors of the two images, and ε is the set of four-connected spatial neighborhoods [35]. The data term in Eq. (2) maintains brightness constancy by matching the SIFT descriptors along the flow vector w(p). The small displacement term in Eq. (3) limits the magnitude of the flow vectors to be small. The smoothness term in Eq. (4) constrains the flow vectors of adjacent pixels to be similar with a spatial regularization constant λ. For minimizing the energy function, the sequential belief propagation (BP-S) algorithm [68] is used, with t and d as the thresholds for matching outliers and flow discontinuities, respectively. From this step, M final candidates (M ≤ K) are selected for transferring their labels to a query image by ranking the retrieved images in ascending order of energy. Fig. 2(b) is an example of the final candidates for the query image in Fig. 2(a).
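To make the role of the three terms in Eqs. (2)–(4) concrete, the following sketch evaluates the energy of a given flow field on dense per-pixel descriptors. It only scores a candidate flow; the actual minimization in [35,37] uses the BP-S algorithm, which is not reproduced here, and random arrays stand in for the dense SIFT descriptors.

```python
import numpy as np

def sift_flow_energy(s1, s2, u, v, t=10.0, d=2.0, eta=0.01, lam=0.7):
    """Energy of Eqs. (2)-(4) for a flow field w(p) = [u(p), v(p)].

    s1, s2: dense per-pixel descriptors, shape (H, W, D).
    u, v:   integer flow components, shape (H, W).
    """
    H, W, _ = s1.shape
    ys, xs = np.mgrid[0:H, 0:W]
    yq = np.clip(ys + v, 0, H - 1)          # p + w(p), clipped to the image
    xq = np.clip(xs + u, 0, W - 1)

    # Data term (Eq. 2): truncated L1 descriptor difference along the flow.
    data = np.minimum(np.abs(s1 - s2[yq, xq]).sum(axis=2), t).sum()

    # Small displacement term (Eq. 3).
    small = eta * (np.abs(u) + np.abs(v)).sum()

    # Smoothness term (Eq. 4): truncated differences over 4-neighborhoods.
    def pair(f):
        return (np.minimum(lam * np.abs(np.diff(f, axis=0)), d).sum()
                + np.minimum(lam * np.abs(np.diff(f, axis=1)), d).sum())
    smooth = pair(u) + pair(v)

    return data + small + smooth

rng = np.random.default_rng(2)
s1 = rng.random((32, 32, 128))
s2 = rng.random((32, 32, 128))
zero_flow = np.zeros((32, 32), dtype=int)
print(sift_flow_energy(s1, s2, zero_flow, zero_flow))
```

In the system, the retrieved images with the best (lowest-energy) alignments become the M final candidates whose labels are transferred.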
3.1.3. Scene parsing

Scene parsing is the process of segmenting a query image into independent regions and recognizing them by transferring the labels of the final candidate images. To reconcile the multiple candidate labels at a pixel and impose spatial smoothness, a probabilistic Markov random field (MRF) model is built [35], which is a graphical model in which pixels are influenced only by their adjacent pixels (the Markov property). A graph structure represents an image as a network of pixels with edge and node components in place of a pixel grid. In an MRF, Bayes' rule can be applied to determine the most relevant label of a pixel. The posterior probability, which contains the likelihood, prior, and smoothness terms, is formulated as follows:

\[
-\log P\big( c \mid I, s, \{ s_i, c_i, w_i \} \big) = \sum_{p} \psi\big( c(p); s, s_i \big) + \alpha \sum_{p} \lambda\big( c(p) \big) + \beta \sum_{\{p,q\}\in\varepsilon} \phi\big( c(p), c(q); I \big) + \log Z, \qquad (5)
\]

where Z is the normalization constant, {s_i, c_i, w_i}_{i=1:M} is the set of final candidates, s_i is the SIFT image, c_i is an integer image where c_i(p) ∈ {1, …, L} is the index of the object category of pixel p, and w_i is the SIFT flow field from s to s_i [35].

The likelihood term ψ is defined as

\[
\psi\big( c(p) = l \big) =
\begin{cases}
\min_{i \in \Omega_{p,l}} \lVert s(p) - s_i(p + w(p)) \rVert, & \Omega_{p,l} \neq \emptyset, \\
\tau, & \Omega_{p,l} = \emptyset,
\end{cases}
\qquad (6)
\]

where Ω_{p,l} = {i; c_i(p + w(p)) = l}, l = 1, …, L, is the index set of the final candidate images whose transferred label at pixel p is l [35]. τ is the value of the maximum difference of the SIFT feature of a query image to one of the final candidate images (τ = max_{s1,s2,p} ||s_1(p) − s_2(p)||) [35].

The prior term λ is defined as

\[
\lambda\big( c(p) = l \big) = -\log \operatorname{hist}_l(p), \qquad (7)
\]

where λ(c(p) = l) is the prior probability that object category l exists at pixel p, obtained by counting the occurrence of each object class at each pixel in the training set [35]. hist_l(p) is the spatial histogram of object category l [35].

The smoothness term ϕ is defined as

\[
\phi\big( c(p), c(q) \big) = \delta\big[ c(p) \neq c(q) \big] \, \frac{\xi + \exp\!\big( -\gamma \lVert I(p) - I(q) \rVert^2 \big)}{\xi + 1}, \qquad (8)
\]

where the image contrast variable γ = (2⟨||I(p) − I(q)||²⟩)^{-1}, ⟨·⟩ denotes an average over the image, and [·] is the zero-one indicator function [35,68,69]. This term constrains the label at pixel p to be similar to those of neighboring pixels when brightness values are similar, or to be different when the values differ greatly. When there is no label information for a pixel, a label of a neighbor is assigned through this equation.

By minimizing Eq. (5) using the BP-S algorithm [68], all pixels of a query image obtain their own labels. The posterior probability function has four parameters: K and M control the mode of the model, α controls the influence of the spatial prior, and β controls the impact of smoothness. Samples of scene parsing results are shown in Fig. 2(e).
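The sketch below assembles the terms of Eq. (5) for a toy problem: the likelihood of Eq. (6) from candidate label maps, the prior of Eq. (7) from per-pixel class histograms, and the contrast-sensitive pairwise weight of Eq. (8). It stops short of the BP-S inference used in [35,68]; as a crude stand-in it picks, per pixel, the label minimizing the unary cost alone, and it assumes an identity flow instead of the SIFT flow warp.

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, L, M = 24, 24, 5, 2                    # toy image size, classes, candidates

# Inputs that the earlier modules would provide (random stand-ins here):
s_query = rng.random((H, W, 8))              # dense descriptors of the query
s_cand = rng.random((M, H, W, 8))            # descriptors of the final candidates
c_cand = rng.integers(1, L + 1, (M, H, W))   # integer label maps of the candidates
hist = rng.random((L, H, W)) + 1e-6          # spatial histograms hist_l(p) from training
image = rng.random((H, W))                   # query intensities for Eq. (8)

# Likelihood term, Eq. (6): per pixel and per class, the smallest descriptor
# difference among candidates that vote for that class (tau if none votes).
tau = 8.0                                    # stand-in for the max descriptor difference
psi = np.full((L, H, W), tau)
diff = np.abs(s_query[None] - s_cand).sum(axis=3)        # M x H x W (identity flow assumed)
for l in range(1, L + 1):
    votes = (c_cand == l)                                 # which candidates vote for class l
    masked = np.where(votes, diff, np.inf)
    best = masked.min(axis=0)
    psi[l - 1] = np.where(np.isfinite(best), best, tau)

# Prior term, Eq. (7): negative log of the normalized spatial class histogram.
lam_prior = -np.log(hist / hist.sum(axis=0, keepdims=True))

# Pairwise weight of Eq. (8) for horizontal neighbors (vertical is analogous).
xi = 0.1
gamma = 1.0 / (2.0 * np.mean(np.diff(image, axis=1) ** 2))
w_pair = (xi + np.exp(-gamma * np.diff(image, axis=1) ** 2)) / (xi + 1.0)

# Stand-in inference: unary-only labeling (BP-S would also use w_pair).
alpha = 0.06
labels = (psi + alpha * lam_prior).argmin(axis=0) + 1
print(labels.shape, np.unique(labels))
```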

3.2. Web-based dataset generation platform

The data-driven scene parsing method requires completely labeled image data for identifying objects in a query image. Labeled image data contains separated objects within their boundaries and the corresponding object information. This data is used for 1) the prior information of the per-pixel frequency of each object class and 2) transferring labels of final candidate images to a query image by matching similar pixels across images. To generate labeled image data, a web-based image labeling platform for monitoring construction sites was developed on a private webserver, employing functions of the LabelMe online annotation tool [71]. On the website, users can designate a region of an object by clicking the vertices of a polygon and can annotate its name. Fig. 3 shows an interface of the website with an example of a completely labeled image.

Fig. 3. Web-based image labeling tool.
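The labeled images exported by such a LabelMe-style tool are polygon outlines with class names; before they can serve as the integer label maps c_i used in Section 3.1.3, the polygons have to be rasterized. A minimal sketch with the Pillow library is shown below; the polygon coordinates, class names, and the use of Pillow are illustrative assumptions, not part of the authors' toolchain.

```python
from PIL import Image, ImageDraw

def polygons_to_label_map(size, annotations, class_index):
    """Rasterize polygon annotations into an integer label map (0 = unlabeled).

    size:         (width, height) of the image.
    annotations:  list of (class_name, [(x1, y1), (x2, y2), ...]) tuples,
                  assumed to be drawn from back to front.
    class_index:  dict mapping class names to integer labels starting at 1.
    """
    label_map = Image.new("I", size, 0)          # 32-bit integer image
    draw = ImageDraw.Draw(label_map)
    for name, vertices in annotations:
        draw.polygon(vertices, fill=class_index[name])
    return label_map

# Hypothetical example: a ground region with an excavator and a worker on top.
classes = {"ground": 1, "excavator": 2, "worker": 3}
annots = [
    ("ground", [(0, 300), (960, 300), (960, 540), (0, 540)]),
    ("excavator", [(100, 200), (300, 200), (300, 380), (100, 380)]),
    ("worker", [(600, 260), (640, 260), (640, 360), (600, 360)]),
]
labels = polygons_to_label_map((960, 540), annots, classes)
print(labels.size, labels.getpixel((150, 350)), labels.getpixel((700, 100)))
```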
3.3. Taxonomy of construction site objects

Because humans label an image, there is no guarantee of obtaining similar labeled results from different people. This is because perceptions of an object boundary vary among individuals. For example, some individuals label a building as a whole appearance, whereas others separate the windows of the building to represent independent regions. Another issue is that labelers can tag different names on the same object (e.g., reinforcement bar and rebar). Such issues arise from the different object perceptions of labelers. A further problem is that if a labeler has insufficient knowledge of construction site objects, incorrect labels might be assigned. Inconsistent training data caused by these issues would yield poor performance of the data-driven scene parsing system. In other words, different labels on the same object produce redundant object classes; this effect is undesirable because the limited object samples are split across these classes. Section 4.3.3 (Performance of the scene parsing system) shows the importance of a sufficient amount of object samples, as it supports a high recognition rate.

To minimize these problems, it is worth directing labelers by presenting a construction site object taxonomy. Because the monitoring applications of this study are not limited to a specific purpose, fundamental categories are established for the general purpose of monitoring. Construction site objects are classified into the following four groups: 1) moving objects (e.g., worker, excavator, or truck), 2) material types (e.g., H-beam or pipe), 3) structure (e.g., temporary building), and 4) environment (e.g., ground or sky), as shown in Fig. 4. The proposed taxonomy is set for the specific job sites used in the experiment and does not include comprehensive object categories for the construction industry. Thus, depending on the monitoring application, the taxonomy and the level of detail should be changed. For example, if monitoring applications include safety management employing posture analysis, labelers have to tag each part of the human body, for which the taxonomy should contain the details of human body parts to capture their motion. A list of commonly used temporary construction resources in [5] can be referred to.

4. Experiments

4.1. Experimental setting

The scene parsing was conducted on a desktop computer with an Intel i7-4770 CPU and 32 GB RAM running the Windows 10 64-bit operating system. The study used the open-source code of label transfer [35] for scene parsing. The code was executed in a MATLAB environment. Functions of LabelMe [71], built on the Ubuntu 14.04 operating system, were used to build a web-based image labeling tool to generate training data. The 169 training images and 42 test images were generated on the website by five labelers. Three of them were from outside the construction industry, and thus, the taxonomy of construction site objects was presented as a reference for object names, as shown in Fig. 4. The five labelers annotated object tags on whole image areas, taking an average of 30 min/image. Samples of the labeled images are shown in Fig. 5. The original image size was 1920 × 1080, and the size was reduced to half (960 × 540) for computational efficiency.

4.2. Construction site image dataset

Construction site images used in this experiment were obtained from two construction projects at Yonsei University from which images and videos were recorded for years: 1) the Baekyang-ro renovation project and 2) the engineering building extension project. A camcorder, camera, and mobile phone were used to acquire construction site images at a range of locations, e.g., from the rooftop of a building to the ground. In addition, an unmanned aerial vehicle with a rotary wing captured aerial images of the construction sites. A number of construction operations were captured, such as earthwork, building construction, and temporary facility construction, under various illumination and weather conditions, as shown in Fig. 6. In general, construction site images in the database were congested with several objects.

4.3. Results

4.3.1. Performance evaluation criterion

To evaluate the performance of the scene parsing system, the average pixel-wise recognition rate r was computed by the following equation [35]:

\[
r = \frac{1}{\sum_i m_i} \sum_i \sum_{p \in \Lambda_i} 1\big( o(p) = a(p),\ a(p) > 0 \big), \qquad (9)
\]

where p is a pixel in image i, a(p) is the ground-truth label [for unlabeled pixels, a(p) = 0], o(p) is the output, Λ_i is the pixel grid of test image i, and m_i = Σ_{p∈Λ_i} 1(a(p) > 0) is the number of labeled pixels in image i [35]. To calculate the recognition rates for each object category, the per-class average recognition rates were computed by the following equation [35]:

\[
r_l = \frac{\sum_i \sum_{p \in \Lambda_i} 1\big( o(p) = a(p),\ a(p) = l \big)}{\sum_i \sum_{p \in \Lambda_i} 1\big( a(p) = l \big)}, \qquad l = 1, \ldots, L. \qquad (10)
\]
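Eqs. (9) and (10) reduce to simple counting once the output and ground-truth label maps are stacked as arrays; a minimal numpy sketch is given below, with random maps standing in for the 42 test results.

```python
import numpy as np

def pixelwise_rate(outputs, truths):
    """Average pixel-wise recognition rate r of Eq. (9); 0 marks unlabeled pixels."""
    labeled = truths > 0
    return np.sum((outputs == truths) & labeled) / np.sum(labeled)

def per_class_rates(outputs, truths, num_classes):
    """Per-class recognition rates r_l of Eq. (10)."""
    rates = {}
    for l in range(1, num_classes + 1):
        denom = np.sum(truths == l)
        if denom > 0:
            rates[l] = np.sum((outputs == truths) & (truths == l)) / denom
    return rates

# Toy stand-in for the test set: 42 label maps with up to 119 classes.
rng = np.random.default_rng(4)
truths = rng.integers(0, 120, (42, 54, 96))      # 0 = unlabeled
outputs = np.where(rng.random((42, 54, 96)) < 0.8,
                   truths, rng.integers(1, 120, (42, 54, 96)))
print(round(pixelwise_rate(outputs, truths), 4))
print(len(per_class_rates(outputs, truths, 119)))
```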

Fig. 4. Taxonomy of construction site objects.



Fig. 5. Samples of labeled images.

4.3.2. Parameter searching

Four parameters, the number of neighbors K, the number of final candidates M, the prior weight α, and the spatial weight β, control the performance of the scene parsing system. The spatial regularization parameter λ of the SIFT flow field in Eq. (4) was selected as 0.7, which is the optimal value reported in [35]. The parameters K = 5, 10, 15, and 20; M = 1, 2, 3, 5, and 10; α = 0.02, 0.04, 0.06, 0.08, 0.1, and 0.12; and β = 5, 10, 20, 30, and 40 were tested to determine the optimal solution. The various values of M were first tested on the basis of K = 20, α = 0.06, and β = 20, and M = 2 exhibited the best performance. Likewise, K, α, and β were tested, and the best parameter set (K = 20, M = 2, α = 0.06, and β = 20) was empirically obtained; the testing results are shown in Fig. 7. For selecting K, values under 20 were tested because of the small size of the training dataset.
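The parameter search is an exhaustive evaluation over the listed values; the sketch below enumerates them with itertools, where evaluate() stands in for a full test-set run of the scene parsing system returning the average pixel-wise recognition rate (a synthetic score is used here so the loop runs). The authors tuned one parameter at a time starting from K = 20, α = 0.06, β = 20, which the same loop supports by fixing the other values.

```python
import itertools

K_values = [5, 10, 15, 20]
M_values = [1, 2, 3, 5, 10]
alpha_values = [0.02, 0.04, 0.06, 0.08, 0.1, 0.12]
beta_values = [5, 10, 20, 30, 40]

def evaluate(K, M, alpha, beta):
    """Stand-in for a full test-set run of the scene parsing system; returns a
    synthetic score peaked near the parameter set reported in the paper."""
    return -((K - 20) ** 2 + 5 * (M - 2) ** 2
             + 1e4 * (alpha - 0.06) ** 2 + (beta - 20) ** 2)

best_score, best_params = float("-inf"), None
for K, M, alpha, beta in itertools.product(K_values, M_values, alpha_values, beta_values):
    score = evaluate(K, M, alpha, beta)
    if score > best_score:
        best_score, best_params = score, (K, M, alpha, beta)

print(best_params)   # (20, 2, 0.06, 20) with this stand-in score
```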
4.3.3. Performance of the scene parsing system

The performance of the scene parsing system is demonstrated with two indices: 1) the average pixel-wise recognition rate and 2) the per-class average recognition rates. A total of 42 images were used for the test set, comprising 20% of the total images in the database. The average pixel-wise recognition rate was 81.48% for the 42 test images, with a median value of 84.47% and a standard deviation of 13.12%. The average processing time was 7.93 s/query image. The per-class average recognition rates are selectively shown in Tables 1, 2, and 3 from the total of 119 object classes. The average per-class recognition rate for the test set was 59.82%, with a median value of 67.37%. Fig. 8 shows the distribution of the per-class recognition rates with respect to the object tag counts. Fig. 9 shows the box plots of the pixel-wise recognition rates of the test images and the per-class recognition rates of the construction objects in the whole test set. Fig. 10 illustrates expanded samples of the scene parsing results.

5. Discussion

5.1. Analysis of the experimental result

From the experiment, the scene parsing system shows global recognition capability on the construction site images, recording an average pixel-wise recognition rate of 81.48%. Even an unlabeled pixel is labeled using the neighborhood information of the final candidate images through the constraint of the MRF model. However, the performance varies depending on whether similar images are retrieved or not. In Fig. 2, the two query images of the upper two rows have similar candidate images; thus, their performances are rather good. In contrast, the performances of the lower two query images are low because the system failed to retrieve similar candidate images. The result implies that good performance on a query image can be guaranteed when similar training images exist in the database and are correctly retrieved.

The system demonstrates scalability to the number of recognizable object classes. By selecting the final candidate images, objects in a query image are labeled without retraining the whole system to adjust for the number of recognizable object categories. For example, each row in Fig. 2 has a query image with a different number of object categories.

Several images similar to a particular job site yield a good recognition performance, because only two final candidate images transferred their labels into a query image in the experiment.

Fig. 6. Samples of construction images in the database.

This means that the small effort of preparing a few images for the monitoring target areas can assure promising recognition performance. Nonetheless, an abundant amount of data can be helpful for recognizing various objects with high recognition rates. This claim is supported by the increasing tendency of per-class recognition rates with the number of tag counts, as shown in Fig. 8.

Fig. 7. Average pixel-wise recognition rates for each parameter: M candidates, K neighbors, prior weight α, and spatial weight β.

Table 1
Top 20 objects in descending order of per-class recognition rates.

Object name                    Tag counts   Recognition rate
Building_under_construction    91           97.54%
Loader                         1            96.81%
Dust                           6            95.63%
Materials_sack                 39           94.32%
Fork_lift                      2            92.86%
Crane                          69           92.77%
Temporary_building             47           92.32%
Materials_rope                 13           91.48%
Safety_fence                   476          91.36%
Waste_re-bar                   5            91.32%
Banner                         142          91.12%
H_beam                         1578         90.53%
Window                         1044         90.32%
Materials_drum                 40           90.30%
Panel                          131          87.51%
Tube                           140          87.01%
Ground                         739          86.75%
Steel_pipe                     822          86.67%
Building                       344          86.52%
Garbage                        1            86.41%

Table 2
Bottom 20 objects in increasing order of per-class recognition rates.

Object name         Tag counts   Recognition rate
Hole                19           0.00%
Waste               17           0.00%
Soil                13           0.00%
Water_tank          13           0.00%
Motorcycle          10           0.00%
Gas_tank            9            0.00%
Bus                 6            0.00%
Waste_rock          68           4.10%
Generator_car       38           6.10%
Wood                52           11.74%
Box                 3            20.42%
Sewer               14           22.59%
Concrete_pipe       127          26.06%
Wall                109          26.71%
Door                81           27.07%
Materials_pocket    11           27.46%
Materials_box       5            31.83%
Rubber_cone         37           32.61%
Excavator_bucket    45           32.88%
Bicycle             4            37.25%

Table 3
Per-class recognition rates of major construction objects.

Object name                    Tag counts   Recognition rate
Building_under_construction    91           97.54%
Loader                         1            96.81%
Fork_lift                      2            92.86%
Crane                          69           92.77%
H_beam                         1578         90.53%
Ground                         739          86.75%
Building                       344          86.52%
Boring_machine                 46           84.00%
Worker                         550          76.90%
Concrete                       46           75.44%
Excavator                      165          69.53%
Truck                          95           62.39%
Truck_concrete_mixer           21           61.76%
Pillar                         212          54.87%
Concrete_form                  65           51.51%
Concrete_column                44           42.82%
Scaffold                       305          39.17%
Re-bar                         216          38.20%
Wood                           52           11.74%
Soil                           13           0.00%

Fig. 8. Per-class recognition rates in regard to object tag counts. A tag is one piece of an object segment.

5.2. Limitations and suggestions

The major limitations of the scene parsing method can be summarized into four main points.

The system missed some small objects that account for a tiny area in the images. Several small objects shown in Table 3, such as scaffold, rebar, and wood, recorded low recognition rates of 39.17%, 38.20%, and 11.74%, respectively, even though these objects had sufficient tag counts of 305, 216, and 52, respectively. This error is caused by inaccurate labeling of small objects. When labeling a tiny object, the labeled region tends to be larger than what is necessary to cover the whole area of the object; the resulting extra region represents inaccurate features of the object, thereby leading to a poor recognition rate. To minimize this problem, segmentation methods such as graph cut [72,73] or GrabCut [70,74] can assist users in extracting an exact region of interest using user inputs as prior information for segmentation. Above all, for some applications, a target object of interest should occupy a detectable and labelable size in an image.

In the experiment, the label transfer method could increase its recognition performance by reducing the number of final candidate images down to two. However, a small number of candidate images may contain insufficient object classes. Therefore, it may miss some object classes, such as a concrete mixer or dump truck, which enter a construction site as required. To cope with this problem of scalability, the number of final candidate images M should be increased to provide a sufficient number of transferable object categories. The label transfer system in [35] reports that M = 3, 5, 7, and 9 final candidate images yield high performance on the LabelMe outdoor database. However, increasing M showed a decreasing tendency of the average pixel-wise recognition rate in this study, as shown in the upper left graph in Fig. 7. Probable reasons for this phenomenon are that 1) the small number of images in the database is insufficient to supply images similar to a query image and 2) a construction site is congested with many items in a unique environment; therefore, it is difficult to find similar scenes even within the same type of construction project. Furthermore, some object classes might not be present in a few final candidate images. The implementation of complementary modules for detecting entities that occasionally appear in images should be introduced in the future.

The system is not suitable for real-time applications because the average processing time exceeds a second. This is because the system determines the labels of every pixel.

Fig. 9. Box plots of pixel-wise and per-class recognition rates.

For a real-time application, 1) parallel processing techniques can be used or 2) variants of the data-driven scene parsing method can be adopted (e.g., the superparsing technique [66], successfully used in roadway asset monitoring [65]).

In this study, a small number of construction site images were used for training the system. Although the system performance on a whole image was reasonable, recording an average pixel-wise recognition rate of 81.48%, the average per-class recognition rate was relatively low (59.82%), showing high variance among the construction objects, as shown in Fig. 9. The discrepancy between the average pixel-wise and per-class recognition rates was caused by some successfully identified objects taking up a large portion of an image (e.g., building, ground, window, and safety fence, as shown in Table 1). A lack of training images caused poor performance for some objects shown in Table 2. A small amount of training data taken in a fixed period of time can be used for monitoring only limited types of construction sites.

Fig. 10. Samples of the scene parsing result. (a) query image, (b) labeled result, and (c) ground-truth label.

The monitoring system might fail to produce a similar accuracy for a new image coming from a future construction operation with a unique scenery that is not contained in the existing database. Liu et al. [35] indicate that the performance of the scene parsing system can be improved with an increasing number of training images. A construction image database should therefore contain more labeled construction site images for robust performance. Moreover, because all construction sites are unique and congested with various objects, labeling takes a significant amount of time (approximately 30 min/image in this study). A large amount of labeled image data can be obtained by using a crowdsourcing platform (e.g., Amazon Mechanical Turk), generating synthetic images from 3D CAD models [75], or enlarging training data using image transformation techniques [76].

For labeling construction site images, the expertise of labelers regarding construction site objects is necessary to name exact object classes. Because the perception of labelers varies, labeling rules or a taxonomy should be provided for acquiring consistent results. It is also possible to allow labelers to freely annotate an object name according to their own perceptions. However, system developers pay for this autonomy by having to develop extra post-processing steps to match synonyms or to link superordinate and subordinate terms in a semantic hierarchy. Russell et al. [71] suggested a method for establishing semantic relationships between object labels based on WordNet [77], an electronic dictionary with semantic hierarchies of words. One can refer to that method for the post-processing.

6. Conclusion

This study presented a recognition system for a construction site, which identified whole image areas using the scene parsing method proposed in [35]. The system could recognize a varying number of object categories by transferring labels of the final candidate images to a query image. As the system had few parameters, it was easy to optimize the objective function, a probabilistic MRF for scene parsing. Implementing the data-driven scene parsing method on a construction site was novel, demonstrating the global recognition capability and scalability of this monitoring system. The scene parsing system recorded an average pixel-wise recognition rate of 81.48% on the 42 test images using 169 training images. The experiment demonstrated that a high recognition rate could be attained with only two final candidate images per query image. Likewise, one can monitor a construction site using the scene parsing system by preparing a small number of completely labeled images for a particular job site scene. Through semantic understanding of entire image areas, abundant knowledge can be generated for various construction management applications.

Currently, the scene parsing method has limitations in recognizing various construction objects. Therefore, a large amount of labeled image data is still required for implementing a complementary module such as a detector for particular object classes. Further studies should be performed in two directions for preparing a large amount of image data: 1) a publicly open image labeling/sharing platform for the construction industry and 2) an image data generation platform for specific construction objects to train AI with supervised learning. With a large amount of data, the vision-based monitoring system can be significantly improved with respect to its recognition capability for various construction objects.

Acknowledgement

We would like to thank the anonymous reviewers for their valuable comments that helped improve the paper. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP; Ministry of Science, ICT and Future Planning) (NRF-2014R1A2A1A11052499 and No. 2011-0030040). The authors would like to thank Yonsei University for granting access to the construction site of the Baekyang-ro renovation project.

References

[1] C. Koch, K. Georgieva, V. Kasireddy, B. Akinci, P. Fieguth, A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure, Adv. Eng. Inform. 29 (2) (2015) 196–210.
[2] J. Yang, M.-W. Park, P.A. Vela, M. Golparvar-Fard, Construction performance monitoring via still images, time-lapse photos, and video streams: now, tomorrow, and the future, Adv. Eng. Inform. 29 (2) (2015) 211–224.
[3] V. Pătrăucean, I. Armeni, M. Nahangi, J. Yeung, I. Brilakis, C. Haas, State of research in automatic as-built modelling, Adv. Eng. Inform. 29 (2) (2015) 162–171.
[4] J. Seo, S. Han, S. Lee, H. Kim, Computer vision techniques for construction safety and health monitoring, Adv. Eng. Inform. 29 (2) (2015) 239–251.
[5] J. Teizer, Status quo and open challenges in vision-based sensing and tracking of temporary resources on infrastructure construction sites, Adv. Eng. Inform. 29 (2) (2015) 225–238.
[6] H. Son, F. Bosché, C. Kim, As-built data acquisition and its use in production monitoring and automated layout of civil infrastructure: a survey, Adv. Eng. Inform. 29 (2) (2015) 172–183.
[7] H. Fathi, F. Dai, M. Lourakis, Automated as-built 3D reconstruction of civil infrastructure using computer vision: achievements, opportunities, and challenges, Adv. Eng. Inform. 29 (2) (2015) 149–161.
[8] J. Gong, C.H. Caldas, An object recognition, tracking, and contextual reasoning-based video interpretation method for rapid productivity analysis of construction operations, Autom. Constr. 20 (8) (2011) 1211–1226.
[9] S. Chi, C.H. Caldas, D.Y. Kim, A methodology for object identification and tracking in construction based on spatial modeling and image matching techniques, Comput. Aided Civ. Infrastruct. Eng. 24 (3) (2009) 199–211.
[10] I. Brilakis, L. Soibelman, Y. Shinagawa, Material-based construction site image retrieval, J. Comput. Civ. Eng. 19 (4) (2005) 341–355.
[11] M. Golparvar-Fard, F. Peña-Mora, C.A. Arboleda, S. Lee, Visualization of construction progress monitoring with 4D simulation model overlaid on time-lapsed photographs, J. Comput. Civ. Eng. 23 (6) (2009) 391–404.
[12] Y.H. Wu, H. Kim, C. Kim, S.H. Han, Object recognition in construction-site images using 3D CAD-based filtering, J. Comput. Civ. Eng. 24 (1) (2010) 56–64.
[13] Z. Zhu, I. Brilakis, Parameter optimization for automated concrete detection in image data, Autom. Constr. 19 (7) (2010) 944–953.
[14] M. Golparvar-Fard, F. Peña-Mora, S. Savarese, Automated progress monitoring using unordered daily construction photographs and IFC-based building information models, J. Comput. Civ. Eng. 29 (1) (2012) 147–165.
[15] H. Son, C. Kim, C. Kim, Automated color model-based concrete detection in construction-site images by using machine learning algorithms, J. Comput. Civ. Eng. 26 (3) (2012) 421–433.
[16] K.K. Han, M. Golparvar-Fard, Appearance-based material classification for monitoring of operation-level construction progress using 4D BIM and site photologs, Autom. Constr. 53 (2015) 44–57.
[17] J. Teizer, C.H. Caldas, C.T. Haas, Real-time three-dimensional occupancy grid modeling for the detection and tracking of construction resources, J. Constr. Eng. Manag. 133 (11) (2007) 880–888.
[18] J. Zou, H. Kim, Using hue, saturation, and value color space for hydraulic excavator idle time analysis, J. Comput. Civ. Eng. 21 (4) (2007) 238–246.
[19] J. Teizer, P.A. Vela, Personnel tracking on construction sites using video cameras, Adv. Eng. Inform. 23 (4) (2009) 452–462.
[20] J. Gong, C.H. Caldas, Computer vision-based video interpretation model for automated productivity analysis of construction operations, J. Comput. Civ. Eng. 24 (3) (2010) 252–263.
[21] I. Brilakis, M.W. Park, G. Jog, Automated vision tracking of project related entities, Adv. Eng. Inform. 25 (4) (2011) 713–724.
[22] S. Chi, C.H. Caldas, Automated object identification using optical video cameras on construction sites, Comput. Aided Civ. Infrastruct. Eng. 26 (5) (2011) 368–380.
[23] E. Rezazadeh Azar, B. McCabe, Automated visual recognition of dump trucks in construction videos, J. Comput. Civ. Eng. 26 (6) (2011) 769–781.
[24] S. Chi, C.H. Caldas, Image-based safety assessment: automated spatial safety risk identification of earthmoving and surface mining activities, J. Constr. Eng. Manag. 138 (3) (2012) 341–351.
[25] M.W. Park, I. Brilakis, Construction worker detection in video frames for initializing vision trackers, Autom. Constr. 28 (2012) 15–25.
[26] E. Rezazadeh Azar, S. Dickinson, B. McCabe, Server-customer interaction tracker: computer vision-based system to estimate dirt-loading cycles, J. Constr. Eng. Manag. 139 (7) (2012) 785–794.
[27] E. Rezazadeh Azar, B. McCabe, Part based model and spatial–temporal reasoning to recognize hydraulic excavators in construction images and videos, Autom. Constr. 24 (2012) 194–202.
[28] M. Golparvar-Fard, A. Heydarian, J.C. Niebles, Vision-based action recognition of earthmoving equipment using spatio-temporal features and support vector machine classifiers, Adv. Eng. Inform. 27 (4) (2013) 652–663.
[29] M. Memarzadeh, M. Golparvar-Fard, J.C. Niebles, Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors, Autom. Constr. 32 (2013) 24–37.
[30] K. Ranaweera, J. Ruwanpura, S. Fernando, Automated real-time monitoring system to measure shift production of tunnel construction projects, J. Comput. Civ. Eng. 27 (1) (2013) 68–77.

[31] C. Kim, B. Kim, H. Kim, 4D CAD model updating using image processing-based construction progress monitoring, Autom. Constr. 35 (2013) 44–52.
[32] H. Son, C. Kim, N. Hwang, C. Kim, Y. Kang, Classification of major construction materials in construction environments using ensemble classifiers, Adv. Eng. Inform. 28 (1) (2014) 1–10.
[33] S. Kumar, Neural Networks: A Classroom Approach, 2nd ed., McGraw-Hill Education (India) Private Limited, New Delhi, 2013, pp. 1–735.
[34] H. Kim, K. Kim, H. Kim, Data-driven scene parsing method for construction site monitoring, 33rd International Symposium on Automation and Robotics in Construction, Auburn, AL, 2016 (in press).
[35] C. Liu, J. Yuen, A. Torralba, Nonparametric scene parsing via label transfer, IEEE Trans. Pattern Anal. Mach. Intell. 33 (12) (2011) 2368–2382, http://dx.doi.org/10.1109/TPAMI.2011.131.
[36] A. Oliva, A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, Int. J. Comput. Vis. 42 (3) (2001) 145–175.
[37] C. Liu, J. Yuen, A. Torralba, SIFT flow: dense correspondence across scenes and its applications, IEEE Trans. Pattern Anal. Mach. Intell. 33 (5) (2011) 978–994.
[38] M. Golparvar-Fard, F. Peña-Mora, S. Savarese, D4AR–a 4-dimensional augmented reality model for automating construction progress monitoring data collection, processing and communication, J. Inf. Technol. Constr. 14 (2009) 129–153.
[39] M. Ahmed, C. Haas, R. Haas, Using digital photogrammetry for pipe-works progress tracking, Can. J. Civ. Eng. 39 (9) (2012) 1062–1071.
[40] H. Son, C. Kim, 3D structural component recognition and modeling method using color and 3D data for construction progress monitoring, Autom. Constr. 19 (7) (2010) 844–854.
[41] L. Hui, M.-W. Park, I. Brilakis, Automated brick counting for façade construction progress estimation, J. Comput. Civ. Eng. 29 (6) (2015) 04014091-1–04014091-12.
[42] A. Dimitrov, M. Golparvar-Fard, Vision-based material recognition for automated monitoring of construction progress and generating building information modeling from unordered site image collections, Adv. Eng. Inform. 28 (1) (2014) 37–49.
[43] S. Siebert, J. Teizer, Mobile 3D mapping for surveying earthwork projects using an Unmanned Aerial Vehicle (UAV) system, Autom. Constr. 41 (2014) 1–14.
[44] M.-W. Park, N. Elsafty, Z. Zhu, Hardhat-wearing detection for enhancing on-site safety of construction workers, J. Constr. Eng. Manag. 141 (9) (2015) 04015024-1–04015024-16.
[45] H. Kim, K. Kim, H. Kim, Vision-based object-centric safety assessment using fuzzy inference: monitoring struck-by accidents with moving objects, J. Comput. Civ. Eng. (2015) 04015075-1–04015075-13 (published online).
[46] J. Seo, R. Starbuck, S. Han, S. Lee, T.J. Armstrong, Motion data-driven biomechanical analysis during construction tasks on sites, J. Comput. Civ. Eng. 29 (4) (2015) B4014005-1–B4014005-13.
[47] S. Han, S. Lee, F. Peña-Mora, Comparative study of motion features for similarity-based modeling and classification of unsafe actions in construction, J. Comput. Civ. Eng. 28 (5) (2013) A4014005-1–A4014005-11.
[48] S. Han, S. Lee, A vision-based motion capture and recognition framework for behavior-based safety management, Autom. Constr. 35 (2013) 131–141.
[49] S. Han, S. Lee, F. Peña-Mora, Vision-based detection of unsafe actions of a construction worker: case study of ladder climbing, J. Comput. Civ. Eng. 27 (6) (2012) 635–644.
[50] S.K. Sinha, P.W. Fieguth, Automated detection of cracks in buried concrete pipe images, Autom. Constr. 15 (1) (2006) 58–72.
[51] S.K. Sinha, P.W. Fieguth, Neuro-fuzzy network for the classification of buried pipe defects, Autom. Constr. 15 (1) (2006) 73–83.
[52] S.K. Sinha, P.W. Fieguth, Segmentation of buried concrete pipe images, Autom. Constr. 15 (1) (2006) 47–57.
[53] Z. Zhu, S. German, I. Brilakis, Detection of large-scale concrete columns for automated bridge inspection, Autom. Constr. 19 (8) (2010) 1047–1055.
[54] Z. Zhu, S. German, I. Brilakis, Visual retrieval of concrete crack properties for automated post-earthquake structural safety evaluation, Autom. Constr. 20 (7) (2011) 874–883.
[55] C. Koch, G.M. Jog, I. Brilakis, Automated pothole distress assessment using asphalt pavement video data, J. Comput. Civ. Eng. 27 (4) (2013) 370–378.
[56] C. Koch, I. Brilakis, Pothole detection in asphalt pavement images, Adv. Eng. Inform. 25 (3) (2011) 507–515.
[57] H. Son, N. Hwang, C. Kim, C. Kim, Rapid and automated determination of rusted surface areas of a steel bridge for robotic maintenance systems, Autom. Constr. 42 (2014) 13–24.
[58] T. Nishikawa, J. Yoshida, T. Sugiyama, Y. Fujino, Concrete crack detection by multiple sequential image filtering, Comput. Aided Civ. Infrastruct. Eng. 27 (1) (2012) 29–47.
[59] I.K. Brilakis, L. Soibelman, Y. Shinagawa, Construction site image retrieval based on material cluster recognition, Adv. Eng. Inform. 20 (4) (2006) 443–452.
[60] J. Yang, T. Cheng, J. Teizer, P.A. Vela, Z. Shi, A performance evaluation of vision and radio frequency tracking methods for interacting workforce, Adv. Eng. Inform. 25 (4) (2011) 736–747.
[61] J. Yang, O. Arif, P.A. Vela, J. Teizer, Z. Shi, Tracking multiple workers on construction sites using video cameras, Adv. Eng. Inform. 24 (4) (2010) 428–434.
[62] M.W. Park, A. Makhmalbaf, I. Brilakis, Comparative study of vision tracking methods for tracking of construction site resources, Autom. Constr. 20 (7) (2011) 905–915.
[63] M.-W. Park, C. Koch, I. Brilakis, Three-dimensional tracking of construction resources using an on-site camera system, J. Comput. Civ. Eng. 26 (4) (2012) 541–549.
[64] A. Rashidi, M. Sigari, M. Maghiar, D. Citrin, An analogy between various machine-learning techniques for detecting construction materials in digital images, KSCE J. Civ. Eng. 20 (4) (2015) 1178–1188.
[65] V. Balali, M. Golparvar-Fard, Segmentation and recognition of roadway assets from car-mounted camera video streams using a scalable non-parametric image parsing method, Autom. Constr. 49 (2015) 27–39.
[66] J. Tighe, S. Lazebnik, Superparsing, Int. J. Comput. Vis. 101 (2) (2013) 329–349.
[67] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110.
[68] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, C. Rother, A comparative study of energy minimization methods for Markov random fields with smoothness-based priors, IEEE Trans. Pattern Anal. Mach. Intell. 30 (6) (2008) 1068–1080.
[69] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context, Int. J. Comput. Vis. 81 (1) (2009) 2–23.
[70] C. Rother, V. Kolmogorov, A. Blake, GrabCut: interactive foreground extraction using iterated graph cuts, ACM Trans. Graph. 23 (3) (2004) 309–314.
[71] B.C. Russell, A. Torralba, K.P. Murphy, W.T. Freeman, LabelMe: a database and web-based tool for image annotation, Int. J. Comput. Vis. 77 (2008) 157–173.
[72] Y.Y. Boykov, M.P. Jolly, Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images, 8th International Conference on Computer Vision, vol. 1, Vancouver, BC, 2001, pp. 105–112, http://dx.doi.org/10.1109/iccv.2001.937505.
[73] L. Gorelick, O. Veksler, Y. Boykov, C. Nieuwenhuis, Convexity shape prior for segmentation, 13th European Conference on Computer Vision, vol. 8693, LNCS, Springer Verlag, Zurich, 2014, pp. 675–690, http://dx.doi.org/10.1007/978-3-319-10602-1_44.
[74] M. Tang, L. Gorelick, O. Veksler, Y. Boykov, GrabCut in one cut, 14th IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc., Sydney, NSW, 2013, pp. 1769–1776, http://dx.doi.org/10.1109/ICCV.2013.222.
[75] M.M. Soltani, Z. Zhu, A. Hammad, Automated annotation for visual recognition of construction resources using synthetic images, Autom. Constr. 62 (2016) 14–23.
[76] M. Paulin, J. Revaud, Z. Harchaoui, F. Perronnin, C. Schmid, Transformation pursuit for image classification, 27th IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2014, pp. 3646–3653, http://dx.doi.org/10.1109/CVPR.2014.466.
[77] C. Fellbaum, WordNet: An Electronic Lexical Database, Bradford Books, Cambridge, 1998.
