Automation in Construction 71 (2016) 271–282

Contents lists available at ScienceDirect

Automation in Construction

journal homepage: www.elsevier.com/locate/autcon

Data-driven scene parsing method for recognizing construction site objects in the whole image

Hongjo Kim, Kinam Kim, Hyoungkwan Kim ⁎
Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea

Article info

Article history: Received 12 February 2016; Received in revised form 2 June 2016; Accepted 15 August 2016; Available online 25 August 2016.

Keywords: Construction site; Computer vision; Scene parsing; Object recognition; Label transfer.

Abstract

Although efforts have been made toward automated monitoring of construction sites, comprehensive understanding of a whole image remains a difficult task. Conventional vision-based monitoring methods have shortcomings in obtaining semantic information regarding an entire image because these methods are not scalable to the number of recognizable objects and training data. Most methods use a parametric model to recognize objects, involving cumbersome parameter tuning. This study presents a data-driven scene parsing method to recognize various objects in a construction site image. To identify the object information of a query image, the monitoring system retrieves the images most relevant to the query image using nearest neighbors and scale-invariant feature transform flow matching, and transfers the labels of the relevant images to the query image. This study demonstrated reasonable system performance on construction site images, recording an average pixel-wise recognition rate of 81.48% with a small number of similar images. The scene parsing method would enrich the raw information of a construction site image, thereby facilitating information use for various management applications.

© 2016 Elsevier B.V. All rights reserved.

⁎ Corresponding author. E-mail addresses: [email protected] (H. Kim), [email protected] (K. Kim), [email protected] (H. Kim).

http://dx.doi.org/10.1016/j.autcon.2016.08.018
0926-5805/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Information retrieved from images is widely used for construction management applications such as progress monitoring, productivity analysis, safety management, and facility condition assessment [1–7]. Advanced monitoring methods essentially involve object recognition for extracting contextual information. Previous studies have successfully identified construction workers, vehicles, structures, and materials for monitoring construction sites with computer vision technology [8–32]. However, the environment of a construction site has not received enough attention from researchers, despite its usefulness for understanding the construction status in a holistic way. Furthermore, the number of recognizable objects in previous studies is small and fixed; thus, changing the number of target object classes entails a tedious process that involves gathering data, training a system, or tuning parameters. These nonscalable systems capture limited information and may lose important data because the number of object categories and the construction site environment vary over time.

To recognize various object classes, monitoring systems mainly rely on artificial intelligence (AI) to which supervised learning is applied. In supervised learning, developers provide training data to their model, and the model learns particular patterns in the given data in order to identify target objects. A sufficient amount of training data is the most important basis of the performance of a model because the representative patterns of objects cannot be learned from only a few examples [33].

The construction industry lacks publicly available image data, which is a major obstacle to implementing an intelligent monitoring system [4]. Therefore, developers have to prepare training data for their recognition targets. This time-consuming and labor-intensive process of collecting training data is a daunting task for a few individuals or organizations. As a result, previous monitoring systems were only capable of identifying a limited number of object classes. Although a state-of-the-art system increases the number of recognizable object classes up to 23 [16], a large amount of training data is still required to recognize various construction objects and generate a model with robust generalization capability (generalization capability is the performance of a model on unseen data). Diverse photographing methods, including closed-circuit television, mobile phones, camcorders, unmanned aerial vehicles, and wearable recording devices, have augmented the types of image data available; hence, monitoring systems require more training data to deal with a large degree of appearance variance of objects.

To address these issues, a global recognition system for construction site imagery was proposed. The initial idea of this paper was presented at the 33rd ISARC (International Symposium on Automation and Robotics in Construction) [34]. The system uses a nonparametric scene parsing method [35], which labels all pixels of a query image with their category by transferring labels of similar images in a database. The method comprises three modules: (1) a scene retrieval module, which determines a set of similar images to the query image using a database and GIST
matching [36]; (2) a dense scene alignment module, which selects the most related images from the retrieved images by scale-invariant feature transform (SIFT) flow [37]; and (3) a scene parsing module, which conveys the labels of the most related images to the query image. This method demonstrates a reasonable performance of global recognition on a particular construction site image when used with a small number of pre-prepared similar images (less than three completely labeled images per monitoring target scene). Furthermore, recognizable object categories are scalable to images, which means that the system can identify a varying number of object categories in an image during construction operation.

Our experiment was designed to demonstrate the performance of our system in global recognition and scalability to target object classes when used with a limited data set. The experiments were conducted using a database that contained 211 completely annotated construction site images. The images were annotated by five different individuals, three of whom came from outside the construction industry. A recognition rate of 81.48% was reported for the 42 test images. Experimental settings and details are described in Section 4.

The main contributions of the scene parsing system comprise three main components: (1) the global recognition capability, which enriches the information quality obtained from construction sites, thereby enabling a more comprehensive understanding of the context of job sites and advanced management applications; (2) the scalability, which maintains the performance of a monitoring system during construction operation; and (3) a practical solution for monitoring a construction site with limited data, requiring only a minimal effort in preparing a small amount of training data to obtain good recognition performance. An additional advantage of our system is the small number of parameters, only four, which require limited tuning, making this a highly practical system to use.

2. Related work

2.1. Previous vision-based monitoring systems recognizing objects in construction sites

Vision-based monitoring systems have been used in previous studies to monitor target objects for various purposes such as progress monitoring [12,14,16,31,32,38–42], productivity analysis [8,18,20,26,43], safety management [24,44–49], facility condition assessment [13,50–58], and monitoring technology used on a construction site [10,19,21,22,27–29,59–64]. Most of the studies have one or two recognizable object classes. In general, this is sufficient because the number of recognition targets required for their research objectives is typically less than three. Nevertheless, their monitoring performance is limited to a specific application because diverse applications are not possible without comprehensive recognition of entire image areas.

To date, only a few studies have more than two recognizable objects [16,22,32,42]. Chi and Caldas [22] tested the performance of two classifiers, the naïve Bayes classifier and neural networks, as means to identify a worker, loader, and backhoe. Son et al. [32] identified major construction materials, including concrete, steel, and wood, using an ensemble technique on multiple classifiers. Han and Golparvar-Fard [16] and Dimitrov and Golparvar-Fard [42] classified construction materials of over 20 classes, forming a joint probability distribution of material appearances for feature generation and using multiple binary support vector machines for object classification. Despite these successes, the abovementioned methods have insufficient recognizable objects for diverse applications and are not scalable to the number of recognizable objects; thus, the methods involve cumbersome re-adjustment of the system or gathering additional training data to change the number of identifiable object categories. The naïve Bayes classifier used in [22] is scalable; however, strong assumptions must be made on representative features of an object. Because the establishment and validation of each assumption requires complicated reasoning processes, this method is not practical for recognizing various materials within an environment.

2.2. Scalable nonparametric scene parsing method

Conventional approaches to object recognition necessitate defining an object model using template matching, bags of features, or shape models [35]. These methods require training for each class; thus, repetitive, time-consuming processes are inevitable, varying by the size of the object category or training data available. To overcome these limitations, Liu et al. [35] proposed a data-driven scene parsing method called label transfer, which is a nonparametric and scalable technique. Scene parsing provides a semantic understanding of whole image areas by segmenting and recognizing independent regions of an image.

Implementing this method is a promising means of monitoring construction sites in terms of global recognition capability, scalability, and practicality. This method transfers labels (information of object identity) to a query image from similar images, thereby enabling scaling to the number of object categories, as each image has its own number of object classes. If a few images similar to a particular job site can be prepared, this method can successfully identify the environment of the construction site in a query image. Because the method is a nonparametric model, which makes no assumptions on the distributions of an object's features, only a few parameters remain to be trained. The details of the algorithm can be found in Section 3.

A similar method has been used for roadway asset monitoring in [65]. The study used the superparsing algorithm [66] for segmenting and recognizing roadway assets in video frames. The superparsing method is very similar to the label transfer method used in [35]. The main difference between the methods is the basic element of the image. The superparsing algorithm uses superpixels (small clusters of pixels) as an image primitive, whereas the label transfer method uses the original pixel grid. Using superpixels, superparsing algorithms gain significant computational efficiency. The generation of superpixels involves an image segmentation method (e.g., graph cut). Because numerous materials used in construction sites appear in various sizes in each image and may have a similar appearance depending on the monitoring distance and viewpoint, designing image segmentation techniques and setting parameters to generate optimal superpixels can be cumbersome. Thus, this study adopts the image parsing method used in [35].

3. Scene parsing system for construction site monitoring

3.1. Overview of the global recognition system

The scene parsing system aims to recognize whole image areas, and the label transfer method described in [35] is used in this study. The task involves segmentation and recognition. Segmentation denotes separating independent regions of an image, whereas recognition means identifying the information required to determine an object's identity. The system performs the two processes in a recognition-by-matching scheme in which labels of similar images are transferred to a query image [35]. The database holds completely labeled images obtained from a web-based image labeling tool. The completely labeled images are used as training data from which the system can exploit the prior information of the per-pixel frequency of each object category, to be used in Eq. (5) in Section 3.1.3. By searching for the most similar images to the query image, some of the training images are used as the bases for transferring their labels to the query image. Fig. 1 shows the system architecture, and the outline of the scene parsing process is shown in Fig. 2. As shown in Fig. 1, the main component of the system comprises three modules, i.e., scene retrieval, dense scene alignment, and scene parsing. Details are presented in Sections 3.1.1–3.1.3.
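For illustration, the control flow of the three modules can be read end to end in the following Python sketch. It is illustrative only and is not the authors' MATLAB implementation: a coarse downsampled intensity vector stands in for the GIST descriptor, an identity flow stands in for SIFT flow, and a per-pixel nearest-candidate vote stands in for the full MRF inference of Section 3.1.3.

```python
import numpy as np

def toy_descriptor(image):
    """Stand-in for the GIST descriptor: a coarse 8x8 average of intensities."""
    h, w = image.shape
    return image.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3)).ravel()

def retrieve(query_desc, db_descs, K=5, eps=0.5):
    """Scene retrieval (Section 3.1.1): (K, eps)-NN by Euclidean distance."""
    d = np.linalg.norm(db_descs - query_desc, axis=1)
    order = np.argsort(d)
    return [i for i in order[:K] if d[i] <= (1 + eps) * d[order[0]]]

def align_and_rank(query, candidates, M=2):
    """Dense scene alignment (Section 3.1.2), with the identity flow as a
    stand-in for SIFT flow: rank candidates by pixel-wise matching cost."""
    costs = [np.abs(query - c).sum() for c in candidates]
    return list(np.argsort(costs)[:M])

def parse(query, final_images, final_labels):
    """Scene parsing (Section 3.1.3) reduced to a per-pixel vote: each pixel
    takes the label of the final candidate whose pixel value matches best."""
    stack = np.stack([np.abs(query - img) for img in final_images])  # M x H x W
    best = stack.argmin(axis=0)                                      # H x W
    labels = np.stack(final_labels)                                  # M x H x W
    return np.take_along_axis(labels, best[None], axis=0)[0]

rng = np.random.default_rng(0)
db_imgs = rng.random((20, 64, 64))            # toy database images
db_lbls = rng.integers(1, 5, (20, 64, 64))    # toy per-pixel label maps
query = db_imgs[3] + 0.01 * rng.random((64, 64))

db_descs = np.stack([toy_descriptor(im) for im in db_imgs])
nn = retrieve(toy_descriptor(query), db_descs)
final = [nn[i] for i in align_and_rank(query, [db_imgs[i] for i in nn])]
result = parse(query, [db_imgs[i] for i in final], [db_lbls[i] for i in final])
print(result.shape, np.unique(result))
```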

Fig. 1. Scene parsing system architecture that was adapted from [35].

Fig. 2. Outline of the scene parsing process. (a) a query image; (b) an image of the final candidate images from the scene retrieval and dense scene alignment step; (c) an integer image representing the labels on (b); (d) transferred labels from the final candidates to (a) using a probabilistic Markov random field model; (e) an integer image representing the labels on (d); and (f) the ground-truth labels of (a).

3.1.1. Scene retrieval

When a query image comes into the system, images similar to the query image are retrieved using nearest neighbor (NN) classifiers. Among the variants of NN, a combination of K-NN and ε-NN is used in [35], which determines the K closest neighbors to the query (K-NN) within (1 + ε) times the minimum distance from the query (ε-NN). The (K, ε)-NN is defined as

\[
\mathcal{N}(x) = \{\, y_i \mid \operatorname{dist}(x, y_i) \le (1+\varepsilon)\operatorname{dist}(x, y_1),\ y_1 = \arg\min_{y_i} \operatorname{dist}(x, y_i),\ i < K \,\}, \qquad (1)
\]

where x is a query image, y_i is one of the closest neighbors, and dist(·,·) calculates the distance between two images [35]. The distance function dist(·,·) measures image similarities, and this study adopts the Euclidean distance of GIST [36] as a scene descriptor. The GIST descriptor is a representation of the global scene structure of an image, computing a set of perceptual dimensions, such as naturalness, openness, roughness, expansion, and ruggedness, using a Fourier transform and principal component analysis.
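A minimal sketch of the (K, ε)-NN rule in Eq. (1) is shown below, assuming the GIST descriptors of the database images have already been computed (random vectors stand in for them here); dist(·,·) is the Euclidean distance used in this study.

```python
import numpy as np

def k_eps_nn(query_gist, db_gists, K=20, eps=0.5):
    """(K, eps)-NN of Eq. (1): keep at most K database images whose GIST
    distance to the query is within (1 + eps) of the minimum distance."""
    dists = np.linalg.norm(db_gists - query_gist, axis=1)   # Euclidean dist(x, y_i)
    order = np.argsort(dists)                                # nearest first
    d_min = dists[order[0]]
    neighbors = [i for i in order[:K] if dists[i] <= (1.0 + eps) * d_min]
    return neighbors, dists

# Toy usage: 211 database images with 512-dimensional GIST-like descriptors.
rng = np.random.default_rng(1)
db_gists = rng.random((211, 512))
query_gist = rng.random(512)
neighbors, dists = k_eps_nn(query_gist, db_gists, K=20, eps=0.5)
print(len(neighbors), [round(dists[i], 3) for i in neighbors[:5]])
```

Only the images kept by this rule are passed to the dense scene alignment module described next.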
3.1.2. Dense scene alignment

To select the most relevant images to a query image from those retrieved, visual features are extracted using the scale-invariant keypoint descriptor [67]. Different from the original procedure of extracting features called SIFT [67], only the keypoint descriptor stage is applied to every pixel of an image to characterize local image structure. The SIFT keypoint is a salient feature of construction site images because the feature is invariant to scale, rotation, affine distortion, and illumination. To align the pixel-to-pixel correspondence between two images (one of the retrieved images and the query image), the SIFT flow method [37] is used. The procedure is formulated as an energy function with three key assumptions: 1) brightness constancy, 2) small motion, and 3) spatial coherence between corresponding pixels in two images:

\[
E(\mathbf{w}) = \sum_{p} \min\!\big( \lVert s_1(p) - s_2(p + \mathbf{w}(p)) \rVert_1,\, t \big) \qquad (2)
\]
\[
\phantom{E(\mathbf{w}) =} + \sum_{p} \eta \big( |u(p)| + |v(p)| \big) \qquad (3)
\]
\[
\phantom{E(\mathbf{w}) =} + \sum_{(p,q)\in\varepsilon} \min\!\big( \lambda |u(p) - u(q)|,\, d \big) + \min\!\big( \lambda |v(p) - v(q)|,\, d \big), \qquad (4)
\]

where Eq. (2) is the data term, Eq. (3) is the small displacement term, Eq. (4) is the smoothness term, p = (x, y) is the spatial coordinate of a pixel, w(p) = [u(p), v(p)] is the flow vector at p, s_1 and s_2 are the per-pixel SIFT descriptors of the two images, and ε is the set of four-connected spatial neighborhoods [35]. The data term in Eq. (2) maintains brightness constancy by matching the SIFT descriptors along the flow vector w(p). The small displacement term in Eq. (3) limits the magnitude of the flow vectors to be small. The smoothness term in Eq. (4) constrains the flow vectors of adjacent pixels to be similar with a spatial regularization constant λ. For minimizing the energy function, the sequential belief propagation (BP-S) algorithm [68] is used, with t and d as the thresholds for matching outliers and flow discontinuities, respectively. From this step, M final candidates (M ≤ K) are selected for transferring their labels to a query image by ranking the retrieved images in ascending order of energy. Fig. 2(b) is an example of the final candidates for the query image in Fig. 2(a).
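To make the role of the three terms in Eqs. (2)–(4) concrete, the following sketch evaluates the energy of a given flow field on dense per-pixel descriptors. It only scores a candidate flow; the actual minimization in [35,37] uses the BP-S algorithm, which is not reproduced here, and random arrays stand in for the dense SIFT descriptors.

```python
import numpy as np

def sift_flow_energy(s1, s2, u, v, t=10.0, d=2.0, eta=0.01, lam=0.7):
    """Energy of Eqs. (2)-(4) for a flow field w(p) = [u(p), v(p)].

    s1, s2: dense per-pixel descriptors, shape (H, W, D).
    u, v:   integer flow components, shape (H, W).
    """
    H, W, _ = s1.shape
    ys, xs = np.mgrid[0:H, 0:W]
    yq = np.clip(ys + v, 0, H - 1)          # p + w(p), clipped to the image
    xq = np.clip(xs + u, 0, W - 1)

    # Data term (Eq. 2): truncated L1 descriptor difference along the flow.
    data = np.minimum(np.abs(s1 - s2[yq, xq]).sum(axis=2), t).sum()

    # Small displacement term (Eq. 3).
    small = eta * (np.abs(u) + np.abs(v)).sum()

    # Smoothness term (Eq. 4): truncated differences over 4-neighborhoods.
    def pair(f):
        return (np.minimum(lam * np.abs(np.diff(f, axis=0)), d).sum()
                + np.minimum(lam * np.abs(np.diff(f, axis=1)), d).sum())
    smooth = pair(u) + pair(v)

    return data + small + smooth

rng = np.random.default_rng(2)
s1 = rng.random((32, 32, 128))
s2 = rng.random((32, 32, 128))
zero_flow = np.zeros((32, 32), dtype=int)
print(sift_flow_energy(s1, s2, zero_flow, zero_flow))
```

In the system, the retrieved images with the best (lowest-energy) alignments become the M final candidates whose labels are transferred.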
3.1.3. Scene parsing

Scene parsing is the process of segmenting a query image into independent regions and recognizing them by transferring the labels of the final candidate images. To reconcile the multiple candidate labels at a pixel and impose spatial smoothness, a probabilistic Markov random field (MRF) model is built [35], which is a graphical model in which pixels are influenced only by their adjacent pixels (the Markov property). A graph structure represents an image as a network of pixels with edge and node components in place of a pixel grid. In an MRF, Bayes' rule can be applied to determine the most relevant label of a pixel. The posterior probability, which contains the likelihood, prior, and smoothness terms, is formulated as follows:

\[
-\log P\big( c \mid I, s, \{ s_i, c_i, w_i \} \big) = \sum_{p} \psi\big( c(p); s, s_i \big) + \alpha \sum_{p} \lambda\big( c(p) \big) + \beta \sum_{\{p,q\}\in\varepsilon} \phi\big( c(p), c(q); I \big) + \log Z, \qquad (5)
\]

where Z is the normalization constant, {s_i, c_i, w_i}_{i=1:M} is the set of final candidates, s_i is the SIFT image, c_i is an integer image where c_i(p) ∈ {1, …, L} is the index of the object category of pixel p, and w_i is the SIFT flow field from s to s_i [35].

The likelihood term ψ is defined as

\[
\psi\big( c(p) = l \big) =
\begin{cases}
\min_{i \in \Omega_{p,l}} \lVert s(p) - s_i(p + w(p)) \rVert, & \Omega_{p,l} \neq \emptyset, \\
\tau, & \Omega_{p,l} = \emptyset,
\end{cases}
\qquad (6)
\]

where Ω_{p,l} = {i; c_i(p + w(p)) = l}, l = 1, …, L, is the index set of the final candidate images whose transferred label at pixel p is l [35]. τ is the value of the maximum difference of the SIFT feature of a query image to one of the final candidate images (τ = max_{s1,s2,p} ||s_1(p) − s_2(p)||) [35].

The prior term λ is defined as

\[
\lambda\big( c(p) = l \big) = -\log \operatorname{hist}_l(p), \qquad (7)
\]

where λ(c(p) = l) is the prior probability that object category l exists at pixel p, obtained by counting the occurrence of each object class at each pixel in the training set [35]. hist_l(p) is the spatial histogram of object category l [35].

The smoothness term ϕ is defined as

\[
\phi\big( c(p), c(q) \big) = \delta\big[ c(p) \neq c(q) \big] \, \frac{\xi + \exp\!\big( -\gamma \lVert I(p) - I(q) \rVert^2 \big)}{\xi + 1}, \qquad (8)
\]

where the image contrast variable γ = (2⟨||I(p) − I(q)||²⟩)^{-1}, ⟨·⟩ denotes an average over the image, and [·] is the zero-one indicator function [35,68,69]. This term constrains the label at pixel p to be similar to those of neighboring pixels when brightness values are similar, or to be different when the values differ greatly. When there is no label information for a pixel, a label of a neighbor is assigned through this equation.

By minimizing Eq. (5) using the BP-S algorithm [68], all pixels of a query image obtain their own labels. The posterior probability function has four parameters: K and M control the mode of the model, α controls the influence of the spatial prior, and β controls the impact of smoothness. Samples of scene parsing results are shown in Fig. 2(e).
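The sketch below assembles the terms of Eq. (5) for a toy problem: the likelihood of Eq. (6) from candidate label maps, the prior of Eq. (7) from per-pixel class histograms, and the contrast-sensitive pairwise weight of Eq. (8). It stops short of the BP-S inference used in [35,68]; as a crude stand-in it picks, per pixel, the label minimizing the unary cost alone, and it assumes an identity flow instead of the SIFT flow warp.

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, L, M = 24, 24, 5, 2                    # toy image size, classes, candidates

# Inputs that the earlier modules would provide (random stand-ins here):
s_query = rng.random((H, W, 8))              # dense descriptors of the query
s_cand = rng.random((M, H, W, 8))            # descriptors of the final candidates
c_cand = rng.integers(1, L + 1, (M, H, W))   # integer label maps of the candidates
hist = rng.random((L, H, W)) + 1e-6          # spatial histograms hist_l(p) from training
image = rng.random((H, W))                   # query intensities for Eq. (8)

# Likelihood term, Eq. (6): per pixel and per class, the smallest descriptor
# difference among candidates that vote for that class (tau if none votes).
tau = 8.0                                    # stand-in for the max descriptor difference
psi = np.full((L, H, W), tau)
diff = np.abs(s_query[None] - s_cand).sum(axis=3)        # M x H x W (identity flow assumed)
for l in range(1, L + 1):
    votes = (c_cand == l)                                 # which candidates vote for class l
    masked = np.where(votes, diff, np.inf)
    best = masked.min(axis=0)
    psi[l - 1] = np.where(np.isfinite(best), best, tau)

# Prior term, Eq. (7): negative log of the normalized spatial class histogram.
lam_prior = -np.log(hist / hist.sum(axis=0, keepdims=True))

# Pairwise weight of Eq. (8) for horizontal neighbors (vertical is analogous).
xi = 0.1
gamma = 1.0 / (2.0 * np.mean(np.diff(image, axis=1) ** 2))
w_pair = (xi + np.exp(-gamma * np.diff(image, axis=1) ** 2)) / (xi + 1.0)

# Stand-in inference: unary-only labeling (BP-S would also use w_pair).
alpha = 0.06
labels = (psi + alpha * lam_prior).argmin(axis=0) + 1
print(labels.shape, np.unique(labels))
```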

3.2. Web-based dataset generation platform

The data-driven scene parsing method requires completely labeled image data for identifying objects in a query image. Labeled image data contains separated objects within their boundaries and the corresponding object information. This data is used for 1) the prior information of the per-pixel frequency of each object class and 2) transferring labels of final candidate images to a query image by matching similar pixels across images. To generate labeled image data, a web-based image labeling platform for monitoring construction sites was developed on a private webserver, employing functions of the LabelMe online annotation tool [71]. On the website, users can designate a region of an object by clicking the vertices of a polygon and can annotate its name. Fig. 3 shows an interface of the website with an example of a completely labeled image.

Fig. 3. Web-based image labeling tool.
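The labeled images exported by such a LabelMe-style tool are polygon outlines with class names; before they can serve as the integer label maps c_i used in Section 3.1.3, the polygons have to be rasterized. A minimal sketch with the Pillow library is shown below; the polygon coordinates, class names, and the use of Pillow are illustrative assumptions, not part of the authors' toolchain.

```python
from PIL import Image, ImageDraw

def polygons_to_label_map(size, annotations, class_index):
    """Rasterize polygon annotations into an integer label map (0 = unlabeled).

    size:         (width, height) of the image.
    annotations:  list of (class_name, [(x1, y1), (x2, y2), ...]) tuples,
                  assumed to be drawn from back to front.
    class_index:  dict mapping class names to integer labels starting at 1.
    """
    label_map = Image.new("I", size, 0)          # 32-bit integer image
    draw = ImageDraw.Draw(label_map)
    for name, vertices in annotations:
        draw.polygon(vertices, fill=class_index[name])
    return label_map

# Hypothetical example: a ground region with an excavator and a worker on top.
classes = {"ground": 1, "excavator": 2, "worker": 3}
annots = [
    ("ground", [(0, 300), (960, 300), (960, 540), (0, 540)]),
    ("excavator", [(100, 200), (300, 200), (300, 380), (100, 380)]),
    ("worker", [(600, 260), (640, 260), (640, 360), (600, 360)]),
]
labels = polygons_to_label_map((960, 540), annots, classes)
print(labels.size, labels.getpixel((150, 350)), labels.getpixel((700, 100)))
```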
3.3. Taxonomy of construction site objects

Because humans label an image, there is no guarantee of obtaining similar labeled results from different people. This is because perceptions of an object boundary vary among individuals. For example, some individuals label a building as a whole appearance, whereas others separate the windows of the building to represent independent regions. Another issue is that labelers can tag different names on the same object (e.g., reinforcement bar and rebar). Such issues arise from the different object perceptions of labelers. A further problem is that if a labeler has insufficient knowledge of construction site objects, incorrect labels might be assigned. Inconsistent training data caused by these issues would yield poor performance of the data-driven scene parsing system. In other words, different labels on the same object produce redundant object classes; this effect is undesirable because the limited object samples are split across these classes. Section 4.3.3 (Performance of the scene parsing system) shows the importance of a sufficient amount of object samples, as it supports a high recognition rate.

To minimize these problems, it is worth directing labelers by presenting a construction site object taxonomy. Because the monitoring applications of this study are not limited to a specific purpose, fundamental categories are established for the general purpose of monitoring. Construction site objects are classified into the following four groups: 1) moving objects (e.g., worker, excavator, or truck), 2) material types (e.g., H-beam or pipe), 3) structure (e.g., temporary building), and 4) environment (e.g., ground or sky), as shown in Fig. 4. The proposed taxonomy is set for the specific job sites used in the experiment and does not include comprehensive object categories for the construction industry. Thus, depending on the monitoring application, the taxonomy and the level of detail should be changed. For example, if monitoring applications include safety management employing posture analysis, labelers have to tag each part of the human body, for which the taxonomy should contain the details of human body parts to capture their motion. A list of commonly used temporary construction resources in [5] can be referred to.

4. Experiments

4.1. Experimental setting

The scene parsing was conducted on a desktop computer with an Intel i7-4770 CPU and 32 GB RAM running the Windows 10 64-bit operating system. The study used the open-source code of label transfer [35] for scene parsing. The code was executed in a MATLAB environment. Functions of LabelMe [71], built on the Ubuntu 14.04 operating system, were used to build a web-based image labeling tool to generate training data. The 169 training images and 42 test images were generated on the website by five labelers. Three of them were from outside the construction industry, and thus, the taxonomy of construction site objects was presented as a reference for object names, as shown in Fig. 4. The five labelers annotated object tags on whole image areas, taking an average of 30 min/image. Samples of the labeled images are shown in Fig. 5. The original image size was 1920 × 1080, and the size was reduced to half (960 × 540) for computational efficiency.

4.2. Construction site image dataset

Construction site images used in this experiment were obtained from two construction projects at Yonsei University from which images and videos were recorded for years: 1) the Baekyang-ro renovation project and 2) the engineering building extension project. A camcorder, camera, and mobile phone were used to acquire construction site images at a range of locations, e.g., from the rooftop of a building to the ground. In addition, an unmanned aerial vehicle with a rotary wing captured aerial images of the construction sites. A number of construction operations were captured, such as earthwork, building construction, and temporary facility construction, under various illumination and weather conditions, as shown in Fig. 6. In general, construction site images in the database were congested with several objects.

4.3. Results

4.3.1. Performance evaluation criterion

To evaluate the performance of the scene parsing system, the average pixel-wise recognition rate r was computed by the following equation [35]:

\[
r = \frac{1}{\sum_i m_i} \sum_i \sum_{p \in \Lambda_i} 1\big( o(p) = a(p),\ a(p) > 0 \big), \qquad (9)
\]

where p is a pixel in image i, a(p) is the ground-truth label [for unlabeled pixels, a(p) = 0], o(p) is the output, Λ_i is the pixel grid of test image i, and m_i = Σ_{p∈Λ_i} 1(a(p) > 0) is the number of labeled pixels in image i [35]. To calculate the recognition rates for each object category, the per-class average recognition rates were computed by the following equation [35]:

\[
r_l = \frac{\sum_i \sum_{p \in \Lambda_i} 1\big( o(p) = a(p),\ a(p) = l \big)}{\sum_i \sum_{p \in \Lambda_i} 1\big( a(p) = l \big)}, \qquad l = 1, \ldots, L. \qquad (10)
\]
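Eqs. (9) and (10) reduce to simple counting once the output and ground-truth label maps are stacked as arrays; a minimal numpy sketch is given below, with random maps standing in for the 42 test results.

```python
import numpy as np

def pixelwise_rate(outputs, truths):
    """Average pixel-wise recognition rate r of Eq. (9); 0 marks unlabeled pixels."""
    labeled = truths > 0
    return np.sum((outputs == truths) & labeled) / np.sum(labeled)

def per_class_rates(outputs, truths, num_classes):
    """Per-class recognition rates r_l of Eq. (10)."""
    rates = {}
    for l in range(1, num_classes + 1):
        denom = np.sum(truths == l)
        if denom > 0:
            rates[l] = np.sum((outputs == truths) & (truths == l)) / denom
    return rates

# Toy stand-in for the test set: 42 label maps with up to 119 classes.
rng = np.random.default_rng(4)
truths = rng.integers(0, 120, (42, 54, 96))      # 0 = unlabeled
outputs = np.where(rng.random((42, 54, 96)) < 0.8,
                   truths, rng.integers(1, 120, (42, 54, 96)))
print(round(pixelwise_rate(outputs, truths), 4))
print(len(per_class_rates(outputs, truths, 119)))
```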

Fig. 4. Taxonomy of construction site objects.



Fig. 5. Samples of labeled images.

4.3.2. Parameter searching

Four parameters, the number of neighbors K, the number of final candidates M, the prior weight α, and the spatial weight β, control the performance of the scene parsing system. The spatial regularization parameter λ of the SIFT flow field in Eq. (4) was selected as 0.7, which is the optimal value reported in [35]. The parameters K = 5, 10, 15, and 20; M = 1, 2, 3, 5, and 10; α = 0.02, 0.04, 0.06, 0.08, 0.1, and 0.12; and β = 5, 10, 20, 30, and 40 were tested to determine the optimal solution. The various values of M were first tested on the basis of K = 20, α = 0.06, and β = 20, and M = 2 exhibited the best performance. Likewise, K, α, and β were tested, and the best parameter set (K = 20, M = 2, α = 0.06, and β = 20) was empirically obtained; the testing results are shown in Fig. 7. For selecting K, values under 20 were tested because of the small size of the training dataset.
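The parameter search is an exhaustive evaluation over the listed values; the sketch below enumerates them with itertools, where evaluate() stands in for a full test-set run of the scene parsing system returning the average pixel-wise recognition rate (a synthetic score is used here so the loop runs). The authors tuned one parameter at a time starting from K = 20, α = 0.06, β = 20, which the same loop supports by fixing the other values.

```python
import itertools

K_values = [5, 10, 15, 20]
M_values = [1, 2, 3, 5, 10]
alpha_values = [0.02, 0.04, 0.06, 0.08, 0.1, 0.12]
beta_values = [5, 10, 20, 30, 40]

def evaluate(K, M, alpha, beta):
    """Stand-in for a full test-set run of the scene parsing system; returns a
    synthetic score peaked near the parameter set reported in the paper."""
    return -((K - 20) ** 2 + 5 * (M - 2) ** 2
             + 1e4 * (alpha - 0.06) ** 2 + (beta - 20) ** 2)

best_score, best_params = float("-inf"), None
for K, M, alpha, beta in itertools.product(K_values, M_values, alpha_values, beta_values):
    score = evaluate(K, M, alpha, beta)
    if score > best_score:
        best_score, best_params = score, (K, M, alpha, beta)

print(best_params)   # (20, 2, 0.06, 20) with this stand-in score
```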
4.3.3. Performance of the scene parsing system

The performance of the scene parsing system is demonstrated with two indices: 1) the average pixel-wise recognition rate and 2) the per-class average recognition rates. A total of 42 images were used for the test set, comprising 20% of the total images in the database. The average pixel-wise recognition rate was 81.48% for the 42 test images, with a median value of 84.47% and a standard deviation of 13.12%. The average processing time was 7.93 s/query image. The per-class average recognition rates are selectively shown in Tables 1, 2, and 3 from the total of 119 object classes. The average per-class recognition rate for the test set was 59.82%, with a median value of 67.37%. Fig. 8 shows the distribution of the per-class recognition rates with respect to the object tag counts. Fig. 9 shows the box plots of the pixel-wise recognition rates of the test images and the per-class recognition rates of the construction objects in the whole test set. Fig. 10 illustrates expanded samples of the scene parsing results.

5. Discussion

5.1. Analysis of the experimental result

From the experiment, the scene parsing system shows global recognition capability on the construction site images, recording an average pixel-wise recognition rate of 81.48%. Even an unlabeled pixel is labeled using the neighborhood information of the final candidate images through the constraint of the MRF model. However, the performance varies depending on whether similar images are retrieved or not. In Fig. 2, the two query images of the upper two rows have similar candidate images; thus, their performances are rather good. In contrast, the performances of the lower two query images are low because the system failed to retrieve similar candidate images. The result implies that good performance on a query image can be guaranteed when similar training images exist in the database and are correctly retrieved.

The system demonstrates scalability to the number of recognizable object classes. By selecting the final candidate images, objects in a query image are labeled without retraining the whole system to adjust for the number of recognizable object categories. For example, each row in Fig. 2 has a query image with a different number of object categories.

Several images similar to a particular job site yield a good recognition performance, because only two final candidate images transferred their labels into a query image in the experiment.

Fig. 6. Samples of construction images in the database.

This means that the small effort of preparing a few images for the monitoring target areas can assure promising recognition performance. Nonetheless, an abundant amount of data can be helpful for recognizing various objects with high recognition rates. This claim is supported by the increasing tendency of per-class recognition rates with the number of tag counts, as shown in Fig. 8.

Fig. 7. Average pixel-wise recognition rates for each parameter: M candidates, K neighbors, prior weight α, and spatial weight β.

Table 1
Top 20 objects in descending order of per-class recognition rates.

Object name                    Tag counts   Recognition rate
Building_under_construction    91           97.54%
Loader                         1            96.81%
Dust                           6            95.63%
Materials_sack                 39           94.32%
Fork_lift                      2            92.86%
Crane                          69           92.77%
Temporary_building             47           92.32%
Materials_rope                 13           91.48%
Safety_fence                   476          91.36%
Waste_re-bar                   5            91.32%
Banner                         142          91.12%
H_beam                         1578         90.53%
Window                         1044         90.32%
Materials_drum                 40           90.30%
Panel                          131          87.51%
Tube                           140          87.01%
Ground                         739          86.75%
Steel_pipe                     822          86.67%
Building                       344          86.52%
Garbage                        1            86.41%

Table 2
Bottom 20 objects in increasing order of per-class recognition rates.

Object name         Tag counts   Recognition rate
Hole                19           0.00%
Waste               17           0.00%
Soil                13           0.00%
Water_tank          13           0.00%
Motorcycle          10           0.00%
Gas_tank            9            0.00%
Bus                 6            0.00%
Waste_rock          68           4.10%
Generator_car       38           6.10%
Wood                52           11.74%
Box                 3            20.42%
Sewer               14           22.59%
Concrete_pipe       127          26.06%
Wall                109          26.71%
Door                81           27.07%
Materials_pocket    11           27.46%
Materials_box       5            31.83%
Rubber_cone         37           32.61%
Excavator_bucket    45           32.88%
Bicycle             4            37.25%

Table 3
Per-class recognition rates of major construction objects.

Object name                    Tag counts   Recognition rate
Building_under_construction    91           97.54%
Loader                         1            96.81%
Fork_lift                      2            92.86%
Crane                          69           92.77%
H_beam                         1578         90.53%
Ground                         739          86.75%
Building                       344          86.52%
Boring_machine                 46           84.00%
Worker                         550          76.90%
Concrete                       46           75.44%
Excavator                      165          69.53%
Truck                          95           62.39%
Truck_concrete_mixer           21           61.76%
Pillar                         212          54.87%
Concrete_form                  65           51.51%
Concrete_column                44           42.82%
Scaffold                       305          39.17%
Re-bar                         216          38.20%
Wood                           52           11.74%
Soil                           13           0.00%

Fig. 8. Per-class recognition rates in regard to object tag counts. A tag is one piece of an object segment.

5.2. Limitations and suggestions

The major limitations of the scene parsing method can be summarized into four main points.

The system missed some small objects that account for a tiny area in the images. Several small objects shown in Table 3, such as scaffold, rebar, and wood, recorded low recognition rates of 39.17%, 38.20%, and 11.74%, respectively, even though these objects had sufficient tag counts of 305, 216, and 52, respectively. This error is caused by inaccurate labeling of small objects. When labeling a tiny object, the labeled region tends to be larger than what is necessary to cover the whole area of the object; the resulting extra region represents inaccurate features of the object, thereby leading to a poor recognition rate. To minimize this problem, segmentation methods such as graph cut [72,73] or GrabCut [70,74] can assist users in extracting an exact region of interest using user inputs as prior information for segmentation. Above all, for some applications, a target object of interest should occupy a detectable and labelable size in an image.

In the experiment, the label transfer method could increase its recognition performance by reducing the number of final candidate images down to two. However, a small number of candidate images may contain insufficient object classes. Therefore, it may miss some object classes, such as a concrete mixer or dump truck, which enter a construction site as required. To cope with this problem of scalability, the number of final candidate images M should be increased to provide a sufficient number of transferable object categories. The label transfer system in [35] reports that M = 3, 5, 7, and 9 final candidate images yield high performance on the LabelMe outdoor database. However, increasing M showed a decreasing tendency of the average pixel-wise recognition rate in this study, as shown in the upper left graph in Fig. 7. Probable reasons for this phenomenon are that 1) the small number of images in the database is insufficient to supply images similar to a query image and 2) a construction site is congested with many items in a unique environment; therefore, it is difficult to find similar scenes even within the same type of construction project. Furthermore, some object classes might not be present in a few final candidate images. The implementation of complementary modules for detecting entities that occasionally appear in images should be introduced in the future.

The system is not suitable for real-time applications because the average processing time exceeds a second. This is because the system determines the labels of every pixel.

Fig. 9. Box plots of pixel-wise and per-class recognition rates.

For a real-time application, 1) parallel processing techniques can be used or 2) variants of the data-driven scene parsing method can be adopted (e.g., the superparsing technique [66], successfully used in roadway asset monitoring [65]).

In this study, a small number of construction site images were used for training the system. Although the system performance on a whole image was reasonable, recording an average pixel-wise recognition rate of 81.48%, the average per-class recognition rate was relatively low (59.82%), showing high variance among the construction objects, as shown in Fig. 9. The discrepancy between the average pixel-wise and per-class recognition rates was caused by some successfully identified objects taking up a large portion of an image (e.g., building, ground, window, and safety fence, as shown in Table 1). A lack of training images caused poor performance for some objects shown in Table 2. A small amount of training data taken in a fixed period of time can be used for monitoring only limited types of construction sites.

Fig. 10. Samples of the scene parsing result. (a) query image, (b) labeled result, and (c) ground-truth label.

The monitoring system might fail to produce a similar accuracy for a new image coming from a future construction operation with a unique scenery that is not contained in the existing database. Liu et al. [35] indicate that the performance of the scene parsing system can be improved with an increasing number of training images. A construction image database should therefore contain more labeled construction site images for robust performance. Moreover, because all construction sites are unique and congested with various objects, labeling takes a significant amount of time (approximately 30 min/image in this study). A large amount of labeled image data can be obtained by using a crowdsourcing platform (e.g., Amazon Mechanical Turk), generating synthetic images from 3D CAD models [75], or enlarging training data using image transformation techniques [76].

For labeling construction site images, the expertise of labelers regarding construction site objects is necessary to name exact object classes. Because the perception of labelers varies, labeling rules or a taxonomy should be provided for acquiring consistent results. It is also possible to allow labelers to freely annotate an object name according to their own perceptions. However, system developers pay for this autonomy by having to develop extra post-processing steps to match synonyms or to link superordinate and subordinate terms in a semantic hierarchy. Russell et al. [71] suggested a method for establishing semantic relationships between object labels based on WordNet [77], an electronic dictionary with semantic hierarchies of words. One can refer to that method for the post-processing.

6. Conclusion

This study presented a recognition system for a construction site, which identified whole image areas using the scene parsing method proposed in [35]. The system could recognize a varying number of object categories by transferring labels of the final candidate images to a query image. As the system had few parameters, it was easy to optimize the objective function, a probabilistic MRF for scene parsing. Implementing the data-driven scene parsing method on a construction site was novel, demonstrating the global recognition capability and scalability of this monitoring system. The scene parsing system recorded an average pixel-wise recognition rate of 81.48% on the 42 test images using 169 training images. The experiment demonstrated that a high recognition rate could be attained with only two final candidate images per query image. Likewise, one can monitor a construction site using the scene parsing system by preparing a small number of completely labeled images for a particular job site scene. Through semantic understanding of entire image areas, abundant knowledge can be generated for various construction management applications.

Currently, the scene parsing method has limitations in recognizing various construction objects. Therefore, a large amount of labeled image data is still required for implementing a complementary module such as a detector for particular object classes. Further studies should be performed in two directions for preparing a large amount of image data: 1) a publicly open image labeling/sharing platform for the construction industry and 2) an image data generation platform for specific construction objects to train AI with supervised learning. With a large amount of data, the vision-based monitoring system can be significantly improved with respect to its recognition capability for various construction objects.

Acknowledgement

We would like to thank the anonymous reviewers for their valuable comments that helped improve the paper. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP; Ministry of Science, ICT and Future Planning) (NRF-2014R1A2A1A11052499 and No. 2011-0030040). The authors would like to thank Yonsei University for granting access to the construction site of the Baekyang-ro renovation project.

References

[1] C. Koch, K. Georgieva, V. Kasireddy, B. Akinci, P. Fieguth, A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure, Adv. Eng. Inform. 29 (2) (2015) 196–210.
[2] J. Yang, M.-W. Park, P.A. Vela, M. Golparvar-Fard, Construction performance monitoring via still images, time-lapse photos, and video streams: now, tomorrow, and the future, Adv. Eng. Inform. 29 (2) (2015) 211–224.
[3] V. Pătrăucean, I. Armeni, M. Nahangi, J. Yeung, I. Brilakis, C. Haas, State of research in automatic as-built modelling, Adv. Eng. Inform. 29 (2) (2015) 162–171.
[4] J. Seo, S. Han, S. Lee, H. Kim, Computer vision techniques for construction safety and health monitoring, Adv. Eng. Inform. 29 (2) (2015) 239–251.
[5] J. Teizer, Status quo and open challenges in vision-based sensing and tracking of temporary resources on infrastructure construction sites, Adv. Eng. Inform. 29 (2) (2015) 225–238.
[6] H. Son, F. Bosché, C. Kim, As-built data acquisition and its use in production monitoring and automated layout of civil infrastructure: a survey, Adv. Eng. Inform. 29 (2) (2015) 172–183.
[7] H. Fathi, F. Dai, M. Lourakis, Automated as-built 3D reconstruction of civil infrastructure using computer vision: achievements, opportunities, and challenges, Adv. Eng. Inform. 29 (2) (2015) 149–161.
[8] J. Gong, C.H. Caldas, An object recognition, tracking, and contextual reasoning-based video interpretation method for rapid productivity analysis of construction operations, Autom. Constr. 20 (8) (2011) 1211–1226.
[9] S. Chi, C.H. Caldas, D.Y. Kim, A methodology for object identification and tracking in construction based on spatial modeling and image matching techniques, Comput. Aided Civ. Infrastruct. Eng. 24 (3) (2009) 199–211.
[10] I. Brilakis, L. Soibelman, Y. Shinagawa, Material-based construction site image retrieval, J. Comput. Civ. Eng. 19 (4) (2005) 341–355.
[11] M. Golparvar-Fard, F. Peña-Mora, C.A. Arboleda, S. Lee, Visualization of construction progress monitoring with 4D simulation model overlaid on time-lapsed photographs, J. Comput. Civ. Eng. 23 (6) (2009) 391–404.
[12] Y.H. Wu, H. Kim, C. Kim, S.H. Han, Object recognition in construction-site images using 3D CAD-based filtering, J. Comput. Civ. Eng. 24 (1) (2010) 56–64.
[13] Z. Zhu, I. Brilakis, Parameter optimization for automated concrete detection in image data, Autom. Constr. 19 (7) (2010) 944–953.
[14] M. Golparvar-Fard, F. Peña-Mora, S. Savarese, Automated progress monitoring using unordered daily construction photographs and IFC-based building information models, J. Comput. Civ. Eng. 29 (1) (2012) 147–165.
[15] H. Son, C. Kim, C. Kim, Automated color model-based concrete detection in construction-site images by using machine learning algorithms, J. Comput. Civ. Eng. 26 (3) (2012) 421–433.
[16] K.K. Han, M. Golparvar-Fard, Appearance-based material classification for monitoring of operation-level construction progress using 4D BIM and site photologs, Autom. Constr. 53 (2015) 44–57.
[17] J. Teizer, C.H. Caldas, C.T. Haas, Real-time three-dimensional occupancy grid modeling for the detection and tracking of construction resources, J. Constr. Eng. Manag. 133 (11) (2007) 880–888.
[18] J. Zou, H. Kim, Using hue, saturation, and value color space for hydraulic excavator idle time analysis, J. Comput. Civ. Eng. 21 (4) (2007) 238–246.
[19] J. Teizer, P.A. Vela, Personnel tracking on construction sites using video cameras, Adv. Eng. Inform. 23 (4) (2009) 452–462.
[20] J. Gong, C.H. Caldas, Computer vision-based video interpretation model for automated productivity analysis of construction operations, J. Comput. Civ. Eng. 24 (3) (2010) 252–263.
[21] I. Brilakis, M.W. Park, G. Jog, Automated vision tracking of project related entities, Adv. Eng. Inform. 25 (4) (2011) 713–724.
[22] S. Chi, C.H. Caldas, Automated object identification using optical video cameras on construction sites, Comput. Aided Civ. Infrastruct. Eng. 26 (5) (2011) 368–380.
[23] E. Rezazadeh Azar, B. McCabe, Automated visual recognition of dump trucks in construction videos, J. Comput. Civ. Eng. 26 (6) (2011) 769–781.
[24] S. Chi, C.H. Caldas, Image-based safety assessment: automated spatial safety risk identification of earthmoving and surface mining activities, J. Constr. Eng. Manag. 138 (3) (2012) 341–351.
[25] M.W. Park, I. Brilakis, Construction worker detection in video frames for initializing vision trackers, Autom. Constr. 28 (2012) 15–25.
[26] E. Rezazadeh Azar, S. Dickinson, B. McCabe, Server-customer interaction tracker: computer vision-based system to estimate dirt-loading cycles, J. Constr. Eng. Manag. 139 (7) (2012) 785–794.
[27] E. Rezazadeh Azar, B. McCabe, Part based model and spatial–temporal reasoning to recognize hydraulic excavators in construction images and videos, Autom. Constr. 24 (2012) 194–202.
[28] M. Golparvar-Fard, A. Heydarian, J.C. Niebles, Vision-based action recognition of earthmoving equipment using spatio-temporal features and support vector machine classifiers, Adv. Eng. Inform. 27 (4) (2013) 652–663.
[29] M. Memarzadeh, M. Golparvar-Fard, J.C. Niebles, Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors, Autom. Constr. 32 (2013) 24–37.
[30] K. Ranaweera, J. Ruwanpura, S. Fernando, Automated real-time monitoring system to measure shift production of tunnel construction projects, J. Comput. Civ. Eng. 27 (1) (2013) 68–77.

[31] C. Kim, B. Kim, H. Kim, 4D CAD model updating using image processing-based construction progress monitoring, Autom. Constr. 35 (2013) 44–52.
[32] H. Son, C. Kim, N. Hwang, C. Kim, Y. Kang, Classification of major construction materials in construction environments using ensemble classifiers, Adv. Eng. Inform. 28 (1) (2014) 1–10.
[33] S. Kumar, Neural Networks: A Classroom Approach, 2nd ed., McGraw-Hill Education (India) Private Limited, New Delhi, 2013, pp. 1–735.
[34] H. Kim, K. Kim, H. Kim, Data-driven scene parsing method for construction site monitoring, 33rd International Symposium on Automation and Robotics in Construction, Auburn, AL, 2016 (in press).
[35] C. Liu, J. Yuen, A. Torralba, Nonparametric scene parsing via label transfer, IEEE Trans. Pattern Anal. Mach. Intell. 33 (12) (2011) 2368–2382, http://dx.doi.org/10.1109/TPAMI.2011.131.
[36] A. Oliva, A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, Int. J. Comput. Vis. 42 (3) (2001) 145–175.
[37] C. Liu, J. Yuen, A. Torralba, SIFT flow: dense correspondence across scenes and its applications, IEEE Trans. Pattern Anal. Mach. Intell. 33 (5) (2011) 978–994.
[38] M. Golparvar-Fard, F. Peña-Mora, S. Savarese, D4AR–a 4-dimensional augmented reality model for automating construction progress monitoring data collection, processing and communication, J. Inf. Technol. Constr. 14 (2009) 129–153.
[39] M. Ahmed, C. Haas, R. Haas, Using digital photogrammetry for pipe-works progress tracking, Can. J. Civ. Eng. 39 (9) (2012) 1062–1071.
[40] H. Son, C. Kim, 3D structural component recognition and modeling method using color and 3D data for construction progress monitoring, Autom. Constr. 19 (7) (2010) 844–854.
[41] L. Hui, M.-W. Park, I. Brilakis, Automated brick counting for façade construction progress estimation, J. Comput. Civ. Eng. 29 (6) (2015) 04014091-1–04014091-12.
[42] A. Dimitrov, M. Golparvar-Fard, Vision-based material recognition for automated monitoring of construction progress and generating building information modeling from unordered site image collections, Adv. Eng. Inform. 28 (1) (2014) 37–49.
[43] S. Siebert, J. Teizer, Mobile 3D mapping for surveying earthwork projects using an Unmanned Aerial Vehicle (UAV) system, Autom. Constr. 41 (2014) 1–14.
[44] M.-W. Park, N. Elsafty, Z. Zhu, Hardhat-wearing detection for enhancing on-site safety of construction workers, J. Constr. Eng. Manag. 141 (9) (2015) 04015024-1–04015024-16.
[45] H. Kim, K. Kim, H. Kim, Vision-based object-centric safety assessment using fuzzy inference: monitoring struck-by accidents with moving objects, J. Comput. Civ. Eng. (2015) 04015075-1–04015075-13 (published online).
[46] J. Seo, R. Starbuck, S. Han, S. Lee, T.J. Armstrong, Motion data-driven biomechanical analysis during construction tasks on sites, J. Comput. Civ. Eng. 29 (4) (2015) B4014005-1–B4014005-13.
[47] S. Han, S. Lee, F. Peña-Mora, Comparative study of motion features for similarity-based modeling and classification of unsafe actions in construction, J. Comput. Civ. Eng. 28 (5) (2013) A4014005-1–A4014005-11.
[48] S. Han, S. Lee, A vision-based motion capture and recognition framework for behavior-based safety management, Autom. Constr. 35 (2013) 131–141.
[49] S. Han, S. Lee, F. Peña-Mora, Vision-based detection of unsafe actions of a construction worker: case study of ladder climbing, J. Comput. Civ. Eng. 27 (6) (2012) 635–644.
[50] S.K. Sinha, P.W. Fieguth, Automated detection of cracks in buried concrete pipe images, Autom. Constr. 15 (1) (2006) 58–72.
[51] S.K. Sinha, P.W. Fieguth, Neuro-fuzzy network for the classification of buried pipe defects, Autom. Constr. 15 (1) (2006) 73–83.
[52] S.K. Sinha, P.W. Fieguth, Segmentation of buried concrete pipe images, Autom. Constr. 15 (1) (2006) 47–57.
[53] Z. Zhu, S. German, I. Brilakis, Detection of large-scale concrete columns for automated bridge inspection, Autom. Constr. 19 (8) (2010) 1047–1055.
[54] Z. Zhu, S. German, I. Brilakis, Visual retrieval of concrete crack properties for automated post-earthquake structural safety evaluation, Autom. Constr. 20 (7) (2011) 874–883.
[55] C. Koch, G.M. Jog, I. Brilakis, Automated pothole distress assessment using asphalt pavement video data, J. Comput. Civ. Eng. 27 (4) (2013) 370–378.
[56] C. Koch, I. Brilakis, Pothole detection in asphalt pavement images, Adv. Eng. Inform. 25 (3) (2011) 507–515.
[57] H. Son, N. Hwang, C. Kim, C. Kim, Rapid and automated determination of rusted surface areas of a steel bridge for robotic maintenance systems, Autom. Constr. 42 (2014) 13–24.
[58] T. Nishikawa, J. Yoshida, T. Sugiyama, Y. Fujino, Concrete crack detection by multiple sequential image filtering, Comput. Aided Civ. Infrastruct. Eng. 27 (1) (2012) 29–47.
[59] I.K. Brilakis, L. Soibelman, Y. Shinagawa, Construction site image retrieval based on material cluster recognition, Adv. Eng. Inform. 20 (4) (2006) 443–452.
[60] J. Yang, T. Cheng, J. Teizer, P.A. Vela, Z. Shi, A performance evaluation of vision and radio frequency tracking methods for interacting workforce, Adv. Eng. Inform. 25 (4) (2011) 736–747.
[61] J. Yang, O. Arif, P.A. Vela, J. Teizer, Z. Shi, Tracking multiple workers on construction sites using video cameras, Adv. Eng. Inform. 24 (4) (2010) 428–434.
[62] M.W. Park, A. Makhmalbaf, I. Brilakis, Comparative study of vision tracking methods for tracking of construction site resources, Autom. Constr. 20 (7) (2011) 905–915.
[63] M.-W. Park, C. Koch, I. Brilakis, Three-dimensional tracking of construction resources using an on-site camera system, J. Comput. Civ. Eng. 26 (4) (2012) 541–549.
[64] A. Rashidi, M. Sigari, M. Maghiar, D. Citrin, An analogy between various machine-learning techniques for detecting construction materials in digital images, KSCE J. Civ. Eng. 20 (4) (2015) 1178–1188.
[65] V. Balali, M. Golparvar-Fard, Segmentation and recognition of roadway assets from car-mounted camera video streams using a scalable non-parametric image parsing method, Autom. Constr. 49 (2015) 27–39.
[66] J. Tighe, S. Lazebnik, Superparsing, Int. J. Comput. Vis. 101 (2) (2013) 329–349.
[67] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110.
[68] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, C. Rother, A comparative study of energy minimization methods for Markov random fields with smoothness-based priors, IEEE Trans. Pattern Anal. Mach. Intell. 30 (6) (2008) 1068–1080.
[69] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context, Int. J. Comput. Vis. 81 (1) (2009) 2–23.
[70] C. Rother, V. Kolmogorov, A. Blake, GrabCut: interactive foreground extraction using iterated graph cuts, ACM Trans. Graph. 23 (3) (2004) 309–314.
[71] B.C. Russell, A. Torralba, K.P. Murphy, W.T. Freeman, LabelMe: a database and web-based tool for image annotation, Int. J. Comput. Vis. 77 (2008) 157–173.
[72] Y.Y. Boykov, M.P. Jolly, Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images, 8th International Conference on Computer Vision, vol. 1, Vancouver, BC, 2001, pp. 105–112, http://dx.doi.org/10.1109/iccv.2001.937505.
[73] L. Gorelick, O. Veksler, Y. Boykov, C. Nieuwenhuis, Convexity shape prior for segmentation, 13th European Conference on Computer Vision, vol. 8693, LNCS, Springer Verlag, Zurich, 2014, pp. 675–690, http://dx.doi.org/10.1007/978-3-319-10602-1_44.
[74] M. Tang, L. Gorelick, O. Veksler, Y. Boykov, GrabCut in one cut, 14th IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc., Sydney, NSW, 2013, pp. 1769–1776, http://dx.doi.org/10.1109/ICCV.2013.222.
[75] M.M. Soltani, Z. Zhu, A. Hammad, Automated annotation for visual recognition of construction resources using synthetic images, Autom. Constr. 62 (2016) 14–23.
[76] M. Paulin, J. Revaud, Z. Harchaoui, F. Perronnin, C. Schmid, Transformation pursuit for image classification, 27th IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2014, pp. 3646–3653, http://dx.doi.org/10.1109/CVPR.2014.466.
[77] C. Fellbaum, WordNet: An Electronic Lexical Database, Bradford Books, Cambridge, 1998.
