Advanced Engineering Informatics 46 (2020) 101179

Contents lists available at ScienceDirect

Advanced Engineering Informatics

journal homepage: www.elsevier.com/locate/aei

Quantitative similarity assessment of construction projects using WBS-based metrics
Navid Torkanfar a,*, Ehsan Rezazadeh Azar b
a Department of Civil Engineering, Lakehead University, Thunder Bay, Canada
b Department of Civil Engineering, Lakehead University, Thunder Bay P7B5E1, Canada

ARTICLE INFO

Keywords:
Project similarity
Construction
Natural language processing
Work breakdown structure
Knowledge retrieval
Project planning

ABSTRACT

Lessons learned from completed projects are valuable resources for planning of new projects. A quantitative similarity measurement between construction projects can improve knowledge reuse practices. The information and documents of a similar past project can be retrieved to resolve the challenges in a new project. This paper introduces a novel method for measuring the similarity of construction projects based on semantic comparison of their work breakdown structure (WBS). WBS of a project should theoretically encompass a hierarchical decomposition of the total scope of project’s works, thus it could be used as an appropriate representative of the projects. The proposed method measures the semantic similarity between WBS of projects by means of natural language processing techniques. This method was implemented based on three metrics: node, structural, and total similarity. Each of these metrics calculates a quantitative similarity score between 0 and 1. The method was assessed using fifteen test samples with promising results in compliance with similarity properties. In addition, precision and recall of the method were evaluated in retrieving similar past projects. The results illustrate that the structural similarity slightly outperforms the other metrics.

* Corresponding author.
E-mail addresses: [email protected] (N. Torkanfar), [email protected] (E. Rezazadeh Azar).
https://doi.org/10.1016/j.aei.2020.101179
Received 27 May 2020; Received in revised form 2 September 2020; Accepted 17 September 2020
Available online 30 September 2020
1474-0346/© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

Project managers typically consider knowledge gained from previous projects in their decision making [43,11]. Effective reuse of the gained knowledge not only reduces the time and cost of solving problems, but also improves the quality of solutions [45]. In construction, various studies have investigated different methods to use past information and experiences in new projects. These studies have applied different techniques from other areas such as Knowledge Management (KM) and Artificial Intelligence (AI).

Knowledge management systems are information technology-based systems aiming at improving organizational knowledge processes, including creation, storage/retrieval, transfer, and application of knowledge [2]. One important step in knowledge management systems is to effectively search databases and find the relevant knowledge. The conventional retrieval of relevant knowledge can often be difficult and sometimes results in several irrelevant documents [14]. For instance, in construction, a method was proposed in which the intended information was retrieved through a simple Google™-like search or an advanced search function [44].

Case-Based Reasoning (CBR) is one of the popular techniques to solve a problem by reusing past information. A CBR system recalls a similar past situation to solve a new problem [21]. In construction, CBR has been used in various areas, such as cost estimation [3,35], safety [12], structural design [26], and planning [38,29]. The first step in CBR methods is to measure the similarity of a new case with previously stored cases to retrieve the most similar case(s) [6]. The retrieving process requires some predefined nominal and/or numerical attributes, such as type, size, structural system, and location of the project. In addition, a user-defined weight is considered for each attribute, which is used to calculate the similarity of cases and retrieve the most similar project(s). One of the existing challenges in this step is to find the most relevant attributes and their appropriate weights.

In addition to CBR, some AI methods have been used in construction to model different problems based on past data and information to predict important project information, such as cost, schedules, and safety plans. Neural networks and linear regression models have been implemented to estimate project costs [20] and facilitate project planning [47].

A quantitative similarity measurement of construction projects can help project managers find similar past projects and extract related information and documents. This can happen at various stages of a project,
such as planning and execution phases. Quantitative similarity assessment between projects can potentially improve current CBR and other AI methods by providing more comprehensive attributes that consider the entire project rather than focusing on certain attributes.

The research studies on quantitative measurement of similarity between construction projects, however, are still limited. For example, a recent attempt in the area of project bundling proposed a method to quantify construction project similarity by vectorizing the projects’ pay items and measuring the distance between vectors [34].

Scope management of a construction project requires comprehensive assessment of the project and a main outcome of this assessment, i.e. Work Breakdown Structure (WBS), is used by other project management areas, namely project time and cost management [32]. However, no research has attempted to use the WBS, as a hierarchical breakdown of the scope of a project, for similarity assessment of the projects. The outcome of this assessment can identify similar projects for better development of WBS and project planning of a new project.

The aim of this research study is to develop a method to assess the similarity of construction projects using their WBSs. It has been hypothesised that the tasks and services required during the construction phase can be used to develop metrics to measure the similarity of construction projects. Since the WBS of a project contains hierarchical information about its scope, WBS was considered as a potential representative of construction projects. Natural language processing (NLP) techniques were employed in the proposed method to extract semantic attributes of the work-packages. This method calculates a score between 0 and 1 to determine the semantic similarity of two WBSs.

2. Background

2.1. Work breakdown structure (WBS)

Project Management Institute (PMI) defines WBS as “a hierarchical decomposition of the total scope of work to be carried out by the project team to accomplish the project objectives and create the required deliverables. The WBS organizes and defines the total scope of the project and represents the work specified in the current approved project scope statement” [32]. In other words, the main goal of WBS is to present a complete and proper scope of the entire project work [17].

The highest level of the WBS hierarchy represents the entire project and is decomposed into smaller subjects, each representing tasks that should be performed for the higher-level subject to be completed. The process of subdividing continues until the tasks cannot be decomposed any further (or it is not reasonable to do that). The lowest level entries in this structure represent work packages. The responsibility of the performance of each work package is assigned to an individual, unit, or organization [16]. The project management body of knowledge [32] provides a generic guideline for creating an appropriate WBS; however, the complex and fragmented nature of construction projects, such as coordination of multiple players (e.g. subcontractors, contract administrators, and suppliers), brings about specific challenges in creating WBSs.

There are research studies on various aspects and applications of WBS in construction management, such as proposed methods for automated WBS development [40], WBS-based project documentation [31], combining off-site and on-site WBSs [42], and WBS-based integration of project cost and time [18]. Nonetheless, none of these studies have focused on WBS-based similarity assessment of the construction projects.

2.2. Text similarity measurements

Natural language processing (NLP) is a research area that focuses on enabling computers to understand natural language text and speech [7]. Measuring similarity between words, sentences, paragraphs, and documents has been used for a long time in several NLP related fields, such as information retrieval, text classification, document clustering, topic detection, topic tracking, question generation, question answering, essay scoring, short answer scoring, machine translation, and text summarization [13]. One of the first implementations for text similarity measurement aimed at ranking documents in the order of their similarity to the input query [39].

Two words can be similar either semantically or lexically. Lexically similar words contain strings with similar characters in their structures, and this similarity is evaluated through a couple of string-based methods, which are discussed in the following subsection. Semantically similar words, however, are related by means of different relations, such as being synonyms, antonyms, or their utilization in the same context [13]. In other words, semantic similarity determines the relation of words or concepts based on predetermined databases, which include the relations of the words.

There are several studies on semantic analysis of texts and documents in the construction management domain focused on information analysis and retrieval, and tested for some key applications, such as text classification and automated regulatory compliance checking using NLP methods [50,49]. A method was proposed to extract semantic knowledge from contract documents and to categorize and retrieve information in electronic document management systems using NLP [1]. Another method was proposed to partition multi-topic documents into several passages [24]. The partitioning approach generates passages based on domain ontology. Costa et al. [8] explored a method to enrich the semantic vectors by means of ontology concepts and relations. The semantic vectors were used to represent knowledge sources.

2.3. String-based similarity measurement

In order to measure the string similarity between two words, Levenshtein [23] proposed the edit distance method, which identifies the difference between two strings by the minimum number of changes (insertion, deletion, or substitution) needed to transform one string to another. For example, the distance between the strings “cat” and “hat” is one character (substitution of character “c” with “h”). The edit distance method does not consider the length of the strings. In another proposed method for syntactic similarity [25], the number of characters is also considered as shown in Eq. (1).

\[ \mathrm{sim}_{syn}(c_1, c_2) = \max\left(0, \frac{\min(|c_1|, |c_2|) - ed(c_1, c_2)}{\min(|c_1|, |c_2|)}\right) \tag{1} \]

These two approaches calculate similarity without considering semantics of inputs. Therefore, lexical similarity methods do not reliably provide an accurate similarity measurement. For instance, similarity ($\mathrm{sim}_{syn}$) between the concepts “reinforcement” and “rebar” would not return a high score of similarity, even though these two concepts are semantically related to a great degree. It was shown that semantic similarity algorithms outperform simple lexical methods with a 13% error rate reduction [28].

2.4. Semantic similarity measurement

Semantic similarity measurement methods have been developed using corpus-based and knowledge-based algorithms. A corpus is a large structured set including written or spoken texts for the purpose of language processing. The corpus-based semantic similarity determines the similarity of various words by utilizing a large corpus. Latent semantic analysis (LSA) [22] is one of the most popular methods for obtaining corpus-based similarity. LSA hypothesizes that the recurrence of the same words in similar pieces of text is an indication of their proximate meaning [22].

The knowledge-based similarity is another type of semantic similarity that measures similarity by using embedded information in semantic networks. A semantic network is a knowledge base which
represents semantic relations of concepts using networks [41].

WordNet is a popular software tool in the field of knowledge-based semantic similarity measurement, which was produced as a result of a comprehensive research program at Princeton University [30] and is utilized as a lexical reference of the English language. In WordNet, English nouns, verbs, and adjectives are organized in synonym sets and these sets are related together by means of semantic relations [27]. A variety of semantic relations have been developed in WordNet including (but not limited to) synonymy, antonymy, hyponymy, and membership [30]. By exploiting these relations, semantic hierarchy structures are developed and these hierarchies could be useful in semantic computations.

There are several different methods for semantic similarity measurements based on WordNet, such as path-based, information content-based [36], feature-based [46], and hybrid measurements. This research utilizes a method that calculates the similarity between two concepts based on their depth in the taxonomy [48]. It computes similarity based on the position of concepts $c_1$ and $c_2$, as well as the lowest common subsumer $lso(c_1, c_2)$. In Eq. (2), the function $len(c_1, c_2)$ measures the length of the shortest path from the concept $c_1$ to concept $c_2$, and the depth measures the length of the path from each concept to the root element [48].

\[ \mathrm{sim}_{WP}(c_1, c_2) = \frac{2 \cdot depth(lso(c_1, c_2))}{len(c_1, c_2) + 2 \cdot depth(lso(c_1, c_2))} \tag{2} \]

3. Methodology

This paper proposes a method to quantify similarity of construction projects based on semantic and structural metrics derived from their WBSs. The WBS of a project includes some nodes, which are labelled with tasks required to complete that project. The focus of this research is to quantify the similarity of construction projects based on the tasks required to build a project during its construction phase. A method was developed in the Python programming environment to compare documented construction projects with a targeted project, based on their WBSs. The following subsections describe the elements of this method.

3.1. WBS encoding

The WBS information is exported from the Microsoft Project file of the sample projects to a spreadsheet format file (such as Microsoft Excel). Fig. 1 shows the tasks and WBS codes in spreadsheet format for small parts of two simplified projects (drastically shortened for the purpose of representation) that belong to a “House project” and a “Bridge project”. Each node in the WBS hierarchy contains two main pieces of information: the node’s task, and the node’s code which locates each element in the hierarchy. WBS hierarchies of the projects were written in eXtensible Markup Language (XML) to encode this information into a machine-readable format [4]. Fig. 2 depicts a part of a WBS of a building project which is encoded in the XML format. As shown in Fig. 2, each element contains a text and a numerical attribute, where the text represents the task and is followed by the attribute of the level of the task in the hierarchy. For instance, the XML element in line 8 (in Fig. 2) contains a task which is called “earthworks” and its level is 1.2 (i.e. the second task in the second level of WBS).

Fig. 1. WBS codes and hierarchy of tasks; (a) “House project” (b) “Bridge project”.

Fig. 2. Segment of the written XML for WBS of a steel structure building.

3.2. Comparison of nodes

The first step in measuring the similarity of two WBSs is to compare the tasks within the WBS nodes. There are two important issues in measuring the similarity of the tasks. First, their naming is subjective to project managers. For instance, “Rebar placement” and “Reinforcement installation” are not the same strings, but both of them represent the
same task. Thus, two tasks need not contain exactly the same text to be considered similar. This problem can be addressed by including semantic similarity measurements of tasks instead of simple string measurements.

On the other hand, the semantic equivalence of tasks does not necessarily result in similarity of their nodes. For example, two nodes may both be labelled “concrete pouring” yet represent different tasks, where one represents concrete pouring for column(s) and the other one for beam(s).

To address the above-mentioned issues, the proposed method determines the similarity of two WBSs through the following three metrics.

(1) Semantic similarity, in which the semantic similarity of the tasks within the compared nodes is measured;
(2) Parent similarity, which measures the semantic similarity of the parents of the compared nodes;
(3) Siblings similarity, which measures the semantic similarity of siblings (nodes from a common parent) of the compared nodes.

3.3. Semantic similarity

WordNet [30] was utilized to measure the semantic similarity of the node’s tasks in the proposed method. Tasks are usually expressed as a phrase that contains a few words. There are different methods for measuring the semantic similarity of two sentences or phrases by averaging the semantic similarity of their words, such as a method proposed by Mihalcea et al. [28]. To measure the semantic similarity of two text segments $T_1$ and $T_2$, for each word $w$ in the segment $T_1$, this method uses one of the word-to-word similarity measures (previously explained) to find the most semantically similar word from segment $T_2$ ($maxSim(w, T_2)$). The same procedure will determine the most similar word in $T_1$ starting with the words in $T_2$. These similarities are then weighted with the corresponding word specificity. The specificity of words $idf(w)$ gives higher scores to specific words compared to generic concepts such as “get” or “become” [28]. This method measures the semantic similarity of two segments as presented in Eq. (3).

\[ \mathrm{sim}(T_1, T_2) = \frac{1}{2}\left( \frac{\sum_{w \in \{T_1\}} \big(maxSim(w, T_2) \cdot idf(w)\big)}{\sum_{w \in \{T_1\}} idf(w)} + \frac{\sum_{w \in \{T_2\}} \big(maxSim(w, T_1) \cdot idf(w)\big)}{\sum_{w \in \{T_2\}} idf(w)} \right) \tag{3} \]

This method can be adapted to the present case by eliminating the word specificity weight. The reason behind this decision is that in this case, the tasks are phrases with very few component words, which are mostly specific to the construction domain rather than being generic concepts.

Using the Wu and Palmer method [48] as a word-to-word semantic similarity measurement and considering the above-mentioned assumptions, the semantic similarity between $task_i$ and $task_j$ is calculated using Eq. (4). In this approach, for each word $w$ in $task_i$, the most semantically similar word from $task_j$ ($maxSim(w, task_j)_{wup}$) is found by means of the Wu and Palmer [48] method. The same procedure will determine the most similar word in $task_i$ starting with the words in $task_j$.

\[ \mathrm{sim}_{semantic}(task_i, task_j) = \frac{1}{2}\left( \frac{\sum_{w \in \{task_i\}} maxSim(w, task_j)_{wup}}{\sum_{w \in \{task_i\}} 1} + \frac{\sum_{w \in \{task_j\}} maxSim(w, task_i)_{wup}}{\sum_{w \in \{task_j\}} 1} \right) \tag{4} \]
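A minimal Python sketch of Eq. (4), assuming the word-to-word measure is passed in as a function. The names `phrase_similarity`, `max_sim`, and `toy_word_sim`, and the 0.8 score for “installation”/“placement”, are illustrative assumptions; they are not taken from the authors' implementation or from WordNet. A real run would plug in a Wu and Palmer score in place of the toy lookup.

```python
from typing import Callable, Sequence

# A word-to-word similarity function returning a score in [0, 1].
WordSim = Callable[[str, str], float]


def max_sim(word: str, phrase: Sequence[str], word_sim: WordSim) -> float:
    """Highest similarity between `word` and any word of `phrase`."""
    return max(word_sim(word, other) for other in phrase)


def phrase_similarity(
    task_i: Sequence[str], task_j: Sequence[str], word_sim: WordSim
) -> float:
    """Eq. (4): unweighted, symmetric average of the best word matches."""
    if not task_i or not task_j:
        return 0.0  # assumption: an empty label matches nothing
    forward = sum(max_sim(w, task_j, word_sim) for w in task_i) / len(task_i)
    backward = sum(max_sim(w, task_i, word_sim) for w in task_j) / len(task_j)
    return 0.5 * (forward + backward)


# Hypothetical word-level scores standing in for a WordNet-based measure.
TOY_SCORES = {frozenset(("installation", "placement")): 0.8}


def toy_word_sim(a: str, b: str) -> float:
    """Toy measure: 1.0 for identical words, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


score = phrase_similarity(
    "reinforcement installation".split(),
    "reinforcement placement".split(),
    toy_word_sim,
)
print(round(score, 2))  # 0.9
```

With the assumed 0.8 score, one imperfect word match out of two lowers the phrase-level score from 1.0 to 0.9 in both directions of the comparison.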

For example, Eq. (4) results in a similarity score of 0.9 for tasks labeled as “reinforcement installation” and “reinforcement placement”. This similarity is less than one, because the word-to-word similarity of “installation” and “placement” is lower than one, and therefore it decreases the total semantic similarity to 0.9.

3.4. Word-to-word semantic similarity measurements

Since in WordNet the relations between concepts are based on synsets, an algorithm was required to find the similarity between words rather than synsets [19]. WordNet defines synsets as sets of synonyms composed of nouns, verbs, adjectives, or adverbs that each express a unique concept [33]. Thus, to compute the semantic similarity of two words by utilizing WordNet, one synset from each word should be selected. The comparison of chosen synsets results in the semantic similarity of two words. In the construction domain, however, some of the applied words do not have a special meaning in regular vocabulary resources, such as WordNet. To address this issue, technical words were replaced by meaningful terms defined in WordNet. For example, the word “HVAC” was replaced with “Heating Ventilation and Air Conditioning” or the word “rebar” was replaced with “reinforcement”. In addition, the words that are not defined in WordNet were compared lexically by the string-based method which was introduced in Eq. (1).

A simplified method has been used in this study to measure the semantic similarity of two words. In this approach, the system approximates the similarity of two words by using the pair of their synsets that results in maximum similarity, as shown in Eq. (5).

\[ \text{word-to-word similarity}(w_1, w_2) = \max\big(\mathrm{similarity}(C_1, C_2)\big), \quad C_1 \in synsets(w_1),\; C_2 \in synsets(w_2) \tag{5} \]

Using the word-to-word similarity measure and the proposed method for measuring semantic similarity of two phrases, semantic similarity of two tasks was calculated. Assume $WBS^{L_1}_{N_1}$ and $WBS^{L_2}_{N_2}$ are two WBSs, in which $L_1$ and $L_2$ represent the total number of levels that each WBS hierarchy contains (e.g. the WBS illustrated in Fig. 5 has three levels and its $L$ is three). Moreover, $N_1$ and $N_2$ represent the finite sets of the WBSs’ nodes ($N_1: (n_1, n_2, \cdots, n_N)$ and $N_2: (m_1, m_2, \cdots, m_M)$). The results of these pairwise comparisons between nodes $n_i$ and $m_j$ from $WBS^{L_1}_{N_1}$ and $WBS^{L_2}_{N_2}$ will form the matrix shown in Eq. (6). This matrix represents the semantic similarity of tasks between nodes $n_i$ and $m_j$.

\[ \mathrm{sim}_{nodes}\left(WBS^{L_1}_{N_1}, WBS^{L_2}_{N_2}\right) = \begin{bmatrix} \mathrm{sim}_{semantic}(n_1, m_1) & \cdots & \mathrm{sim}_{semantic}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ \mathrm{sim}_{semantic}(n_i, m_1) & \cdots & \mathrm{sim}_{semantic}(n_i, m_j) \end{bmatrix} \tag{6} \]

The proposed method considers two nodes semantically similar if they have a semantic similarity more than a user-defined threshold between 0 and 1. In addition, to reduce computation effort, the system only computes the other node similarity metrics (parent similarity and siblings similarity) for the nodes that are semantically similar (more than the threshold). The effects of different thresholds on the accuracy of the system are explored in section 4 “Experimental results”.

3.5. Comparison of the nodes’ parents

In a WBS, except the root element (i.e. the highest level), each node is subdivided from an upper-level element, which is the parent of the node. Also, each parent is generated from an upper-level element, which creates a sequence of parents for each node. This metric determines the similarity of the sequence of parents and is calculated by averaging their semantic similarity. Therefore, this method reduces the similarity of the tasks which belong to different parts of compared projects, such as two matched “concrete pouring” tasks belonging to foundation and shear wall construction, respectively.

Since considering all ancestors of a node requires a large amount of calculations, the least similar parent (LSP) is defined. LSPs are the first pair of parents in the sequence of two given nodes’ parents that are not semantically similar (less than the defined threshold). This method only considers the parents up to the LSP. Given nodes $n$ and $m$ from $WBS^{L_1}_{N_1}$ and $WBS^{L_2}_{N_2}$ respectively, the parent similarity between them ($\mathrm{sim}_{parents}(n, m)$) is calculated using Eq. (7). $L_{LSP} - L_n$ is the difference between the levels of node $n$ and its LSP, and $\mathrm{sim}_{semantic}(i\text{th parents})$ is the semantic similarity between the $i$th parents of nodes $n$ and $m$.

\[ \mathrm{sim}_{parents}(n, m) = \frac{\sum_{i=1}^{L_{LSP} - L_n} \big(L_{LSP} - L_n - (i - 1)\big) \times \mathrm{sim}_{semantic}(i\text{th parents})}{\sum_{i=1}^{L_{LSP} - L_n} i} \tag{7} \]

For instance, Fig. 3 shows the first two parents of nodes $n_1$ and $m_1$ with a semantic similarity of 0.8, which is more than an arbitrarily defined threshold of 0.5. In this example, the next two parents have a similarity of 0.2 (i.e. less than the threshold of 0.5), and therefore they are defined as the LSP. In this case, parent similarity is calculated as follows.

\[ \mathrm{sim}_{parents}(n_1, m_1) = \frac{2 \cdot 0.8 + 1 \cdot 0.2}{2 + 1} = 0.6 \]

Fig. 3. Parent similarity between nodes n1 and m1.

The results of parent similarity between nodes from $WBS^{L_1}_{N_1}$ and $WBS^{L_2}_{N_2}$ are presented using the matrix shown in Eq. (8).

\[ \mathrm{sim}_{parents}\left(WBS^{L_1}_{N_1}, WBS^{L_2}_{N_2}\right) = \begin{bmatrix} \mathrm{sim}_{parents}(n_1, m_1) & \cdots & \mathrm{sim}_{parents}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ \mathrm{sim}_{parents}(n_i, m_1) & \cdots & \mathrm{sim}_{parents}(n_i, m_j) \end{bmatrix} \tag{8} \]

3.6. Comparison of the nodes’ siblings

In a WBS, nodes generated from the same parent are called siblings. Similarity of the nodes’ siblings in two WBSs can also enhance the possibility that the nodes’ tasks are rather similar. To calculate the
sibling similarity between nodes $n_i$ and $m_j$, their siblings are compared one by one and any two siblings which are semantically similar (i.e. $\mathrm{sim}_{semantic} >$ threshold) are considered matched together. Thus, $(sibling_{n_i}, sibling_{m_j})$ can be defined as a tuple that includes the pairs of matched siblings from nodes $n_i$ and $m_j$ (Eq. (9)).

\[ \mathrm{matched\ siblings}(n_i, m_j) = \left(sibling_{n_i}, sibling_{m_j}\right) \tag{9} \]

As a result, the sibling similarity score between nodes $n_i$ and $m_j$ is calculated using Eq. (10), which is obtained by dividing the total number of matched siblings by the total number of siblings.

\[ \mathrm{sim}_{siblings}(n_i, m_j) = \frac{\left|\mathrm{matched\ siblings}(n_i, m_j)\right|}{\left|siblings_{n_i}\right| + \left|siblings_{m_j}\right|} \tag{10} \]

For example, the sibling similarity between nodes $n_i$ and $m_j$ with only one pair of matched siblings in Fig. 4 is calculated as shown below (the one matched pair is counted as two matched siblings, one under each node).

\[ \mathrm{sim}_{siblings}(n_i, m_j) = \frac{2 \cdot 1}{2 + 2} = 0.5 \]

Fig. 4. Sibling similarity between nodes ni and mj.

A sibling similarity matrix, which contains a pairwise comparison of nodes $n_i$ and $m_j$ from $WBS^{L_1}_{N_1}$ and $WBS^{L_2}_{N_2}$, can be expressed as shown in Eq. (11).

\[ \mathrm{sim}_{siblings}\left(WBS^{L_1}_{N_1}, WBS^{L_2}_{N_2}\right) = \begin{bmatrix} \mathrm{sim}_{siblings}(n_1, m_1) & \cdots & \mathrm{sim}_{siblings}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ \mathrm{sim}_{siblings}(n_i, m_1) & \cdots & \mathrm{sim}_{siblings}(n_i, m_j) \end{bmatrix} \tag{11} \]

3.7. Average similarity of compared nodes

The average similarity matrix represents the average node to node similarity between nodes of $WBS^{L_1}_{N_1}$ and $WBS^{L_2}_{N_2}$, which is calculated by means of Eq. (12) and presented as Eq. (13).

\[ \mathrm{sim}_{average} = \frac{\mathrm{sim}_{nodes} + \mathrm{sim}_{parents} + \mathrm{sim}_{siblings}}{3} \tag{12} \]

\[ \mathrm{sim}_{average}\left(WBS^{L_1}_{N_1}, WBS^{L_2}_{N_2}\right) = \begin{bmatrix} \mathrm{sim}_{average}(n_1, m_1) & \cdots & \mathrm{sim}_{average}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ \mathrm{sim}_{average}(n_i, m_1) & \cdots & \mathrm{sim}_{average}(n_i, m_j) \end{bmatrix} \tag{13} \]

3.8. Mapping of nodes

Each node from the first WBS will be mapped to the node from the second WBS with the highest average similarity. The highest average similarity must be more than the determined threshold. This threshold is considered to prevent mapping of irrelevant nodes which have a semantic similarity score below the threshold.

In some cases, there could be more than one node with the same highest $\mathrm{sim}_{average}$. In these cases, the system prefers the nodes with a closer level of detail. The level of detail of a node depends on its level in the WBS hierarchy. The level of detail decreases from the lowest to the highest level. Since the lowest level of a WBS usually contains the tasks with the highest level of detail, the level of detail of each node is assessed based on the distance between its level and the lowest level in the WBS hierarchy. For this purpose, the system defines a weight between 0 and 1, which reflects the distance between the levels of detail of two nodes.

$bottom_{level}$ is the level of a node numbered starting from the lowest level in the hierarchy. For example, the $bottom_{level}$ and the regular level of nodes in a WBS (“House project”) are indicated in Fig. 5.

Fig. 5. WBS of the “House project”; (a): Regular level, (b): Bottom level.

This weight is calculated from the absolute difference between the $bottom_{level}$ of two nodes, divided by the maximum number of levels that the two WBSs have (Eq. (14)).

\[ \mathrm{level}_{scores}(n_i, m_j) = \left| \frac{\left| bottom_{level}(n_i) - bottom_{level}(m_j) \right|}{\max(L_1, L_2)} - 1 \right| \tag{14} \]

For example, $\mathrm{level}_{scores}$ for nodes “Columns” and “Road” from the “House project” and “Bridge project” in Fig. 1 is calculated as

\[ \mathrm{level}_{scores}(\text{“Columns”}, \text{“Road”}) = \left| \frac{|1 - 2|}{3} - 1 \right| \approx 0.66 \]

and for “Columns” and “Girders” it is calculated as

\[ \mathrm{level}_{scores}(\text{“Columns”}, \text{“Girders”}) = \left| \frac{|1 - 1|}{3} - 1 \right| = 1 \]

This score increases the chance of node “Columns” being mapped to “Girders” instead of the node “Road”, which has a lower $\mathrm{level}_{scores}$. The following matrix is used to contain node to node $\mathrm{level}_{scores}$ for the nodes of two WBSs (see Eq. (15)).

\[ \mathrm{level}_{scores}\left(WBS^{L_1}_{N_1}, WBS^{L_2}_{N_2}\right) = \begin{bmatrix} \mathrm{level}_{scores}(n_1, m_1) & \cdots & \mathrm{level}_{scores}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ \mathrm{level}_{scores}(n_i, m_1) & \cdots & \mathrm{level}_{scores}(n_i, m_j) \end{bmatrix} \tag{15} \]

By multiplying the matrices $\mathrm{level}_{scores}$ and $\mathrm{sim}_{average}$, a matrix is formed which contains the scores that are used to find the mapped nodes (see Eq. (16)).

\[ \mathrm{mapping}_{scores}\left(WBS^{L_1}_{N_1}, WBS^{L_2}_{N_2}\right) = \begin{bmatrix} \mathrm{mapping}_{score}(n_1, m_1) & \cdots & \mathrm{mapping}_{score}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ \mathrm{mapping}_{score}(n_i, m_1) & \cdots & \mathrm{mapping}_{score}(n_i, m_j) \end{bmatrix} \tag{16} \]

The system searches through the $\mathrm{mapping}_{scores}$ matrix to find the highest mapping score. When the highest score is found, the system maps the corresponding nodes and removes them before searching for the other matched pairs in the next runs. The system continues this procedure until all the possible nodes are mapped.

$\mathrm{mapped}_{nodes}$ is a list of tuples $(n_i, m_j, \mathrm{sim}_{average})$, in which $n_i$ and $m_j$ are mapped together with the average similarity of $\mathrm{sim}_{average}$ (Eq. (17)).

\[ \mathrm{mapped}_{nodes} = \left\{ \left(n_i, m_j, \mathrm{sim}_{average}\right) \right\}, \quad n_i \in N_1 \tag{17} \]
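The mapping procedure of Section 3.8 can be sketched in Python as a greedy selection over mapping scores. This is a sketch under stated assumptions, not the authors' code: the names `level_score` and `map_nodes` are hypothetical, the mapping score is taken as the element-wise product of the average similarity and the level score, the threshold is applied to the average similarity, and the similarity values in the example are made up. The bottom levels for “Columns”, “Girders”, and “Road” follow the level-score examples in the text; the “Earthworks” bottom level is assumed.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def level_score(bottom_i: int, bottom_j: int, max_levels: int) -> float:
    """Eq. (14): 1.0 for equal bottom levels, lower as the levels diverge."""
    return abs(abs(bottom_i - bottom_j) / max_levels - 1)


def map_nodes(
    sim_average: Dict[Pair, float],
    level_scores: Dict[Pair, float],
    threshold: float,
) -> List[Tuple[str, str, float]]:
    """Greedily map nodes one-to-one by descending mapping score."""
    # Assumption: only pairs whose average similarity exceeds the
    # threshold are allowed to compete for a mapping.
    candidates = [
        (sim * level_scores[pair], pair, sim)
        for pair, sim in sim_average.items()
        if sim > threshold
    ]
    # Sorting once and skipping already-used nodes is equivalent to
    # repeatedly taking the highest remaining score and removing its
    # row and column from the mapping-score matrix.
    candidates.sort(key=lambda c: c[0], reverse=True)
    used_n, used_m = set(), set()
    mapped = []
    for _, (n, m), sim in candidates:
        if n in used_n or m in used_m:
            continue
        used_n.add(n)
        used_m.add(m)
        mapped.append((n, m, sim))
    return mapped


# Hypothetical average similarities between "House" and "Bridge" nodes.
sim_average = {
    ("Columns", "Girders"): 0.6,
    ("Columns", "Road"): 0.6,
    ("Earthworks", "Road"): 0.7,
}
# Bottom levels: Columns=1, Girders=1, Road=2; Earthworks=2 is assumed.
level_scores = {
    ("Columns", "Girders"): level_score(1, 1, 3),
    ("Columns", "Road"): level_score(1, 2, 3),
    ("Earthworks", "Road"): level_score(2, 2, 3),
}
mapped_nodes = map_nodes(sim_average, level_scores, threshold=0.5)
print(mapped_nodes)
# [('Earthworks', 'Road', 0.7), ('Columns', 'Girders', 0.6)]
```

In this toy run “Columns” ties between “Girders” and “Road” on average similarity, and the level score breaks the tie in favour of “Girders”, which is the behaviour the level-score weight is meant to produce.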

m j ∈ N2 different graph-edit operations which can be used here. Node deletion or

insertion, and node substitution were considered in this study.
The structural similarity measurements [9] start with the mapped
3.9. Node similarity score nodes that were found in the previous stage. The node deletion or
insertion cost (or effort) can be defined as the required operations to
Overall Node similarity score between WBSLN11 and WBSLN22 is the delete unmapped nodes. This cost was defined as Deletion Effort (DE),
average of simaverage of all the mapped nodes. This score is calculated by and can be computed by the total number of Unmapped Nodes (|UN|)
means of Eq. (18). divided by the total number of nodes in WBSLN11 and WBSLN22 (see Eq. (19)).
∑ ( ( ))
( ) 2 × (ni ,mj )∈mappednodes simavg ni , mj ( ) |UN|
L1
Nodesimilarity WBSN1 , WBSN2 =L2 DE WBSLN11 , WBSLN22 = (19)
|N1 | + |N2 | |N1 | + |N2 |
(18) The Substitution Effort (SE) can be explained as the required effort to
map the nodes. In other words, the required effort to map two similar
3.10. Structural similarity score nodes is lower than the required effort to map two less similar nodes.
Therefore, for each pair of mapped nodes in mappednodes list, the SE is
The second similarity measurement is the structural similarity, calculated by one minus their similarity (see Eq. (20)).
which examines the hierarchy structure of two WBSs. This metric is ( ) ( ) ( )
for ni , mj ∈ mappednodes, SE ni , mj = 1 − simaverage ni , mj (20)
defined based on graph-edit-distance method [15,9] for the structure of
two WBSs. The graph-edit-distance measures the minimum required And, the total SE effort [9] between WBSLN11 and WBSLN22 over all the
operations to change the structure of one WBS to another. There are

mapped nodes can be calculated by Eq. (21).

$$SE\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = \frac{2 \times \sum_{(n_i, m_j) \in mappednodes} \left(1 - sim_{average}(n_i, m_j)\right)}{|N_1| + |N_2| - |UN|} \tag{21}$$

The structural similarity between $WBS_{N_1}^{L_1}$ and $WBS_{N_2}^{L_2}$ is defined as one minus the average of the two aforementioned efforts (DE and SE), as shown in Eq. (22). A smaller amount of effort required to transform the structure of the first WBS into the second one results in a higher structural similarity, and vice versa.

$$\text{Structural similarity}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = 1 - \frac{DE\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) + SE\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right)}{2} \tag{22}$$

Table 1
The developed samples by the experts.

Experts    Developed samples            Represented by
Expert 1   Bridgeconstruction1          B1
           concretestructurebuilding1   C1
           steelstructurebuilding1      S1
           Roadmaintenance1             M1
           hotelbuilding1               H1
Expert 2   Bridgeconstruction2          B2
           concretestructurebuilding2   C2
           steelstructurebuilding2      S2
           Roadmaintenance2             M2
           hotelbuilding2               H2
Expert 3   Bridgeconstruction3          B3
           concretestructurebuilding3   C3
           steelstructurebuilding3      S3
           Roadmaintenance3             M3
           hotelbuilding3               H3

3.11. Total similarity score

The final score determines the Total similarity between $WBS_{N_1}^{L_1}$ and $WBS_{N_2}^{L_2}$, which is calculated as the average (see Eq. (23)) of the Node similarity (Eq. (18)) and Structural similarity (Eq. (22)) scores. This final measurement produces a score between 0 and 1, in which 0 would hypothetically result from the comparison of two completely different projects, and 1.0 results from the comparison of two identical projects.

$$\text{Total similarity}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = \frac{\text{Node similarity}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) + \text{Structural similarity}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right)}{2} \tag{23}$$
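The scores of Eqs. (18) to (23) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' released program: the function name `similarity_scores`, the tuple layout of `mapped_nodes`, and the dictionary keys are hypothetical. It also assumes a one-to-one mapping, so that the number of unmapped nodes is $|N_1| + |N_2|$ minus twice the number of mapped pairs, and it takes SE as 0 when no nodes are mapped, a case the equations do not cover.

```python
from typing import Dict, List, Tuple

# Hypothetical layout of one mapped pair: (n_i, m_j, sim_average).
MappedNode = Tuple[str, str, float]


def similarity_scores(mapped_nodes: List[MappedNode], n1: int, n2: int) -> Dict[str, float]:
    """Compute the scores of Eqs. (18)-(23) for two WBSs with n1 and n2 nodes."""
    total_nodes = n1 + n2
    if total_nodes == 0:
        raise ValueError("both WBSs are empty")

    # Assumption: the mapping is one-to-one, so each pair covers one node per WBS.
    unmapped = total_nodes - 2 * len(mapped_nodes)
    if unmapped < 0:
        raise ValueError("more mapped pairs than a one-to-one mapping allows")

    sims = [sim for _, _, sim in mapped_nodes]

    # Eq. (18): node similarity.
    node_similarity = 2 * sum(sims) / total_nodes

    # Eq. (19): deletion effort.
    deletion_effort = unmapped / total_nodes

    # Eqs. (20)-(21): substitution effort over all mapped pairs.
    # Assumption: SE is taken as 0 when nothing is mapped (zero denominator).
    mapped_count = total_nodes - unmapped
    if mapped_count:
        substitution_effort = 2 * sum(1 - sim for sim in sims) / mapped_count
    else:
        substitution_effort = 0.0

    # Eq. (22): structural similarity.
    structural_similarity = 1 - (deletion_effort + substitution_effort) / 2

    # Eq. (23): total similarity.
    total_similarity = (node_similarity + structural_similarity) / 2

    return {
        "node_similarity": node_similarity,
        "deletion_effort": deletion_effort,
        "substitution_effort": substitution_effort,
        "structural_similarity": structural_similarity,
        "total_similarity": total_similarity,
    }


if __name__ == "__main__":
    # Two WBSs with four nodes each, of which two pairs were mapped.
    example = [("1.1", "1.1", 0.8), ("1.2", "1.3", 0.6)]
    print(similarity_scores(example, n1=4, n2=4))
```

For this made-up input, the node similarity is about 0.35, DE is 0.5, SE is about 0.3, the structural similarity is about 0.6, and the total similarity is about 0.475. Two identical WBSs whose nodes are all mapped with a similarity of 1.0 yield 1.0 for all three similarity scores.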

4. Experimental results

This section presents a set of experiments to evaluate the performance of the defined WBS similarity metrics in distinguishing construction projects and retrieving relevant samples. The experiments were carried out on fifteen WBS test samples of construction projects. These WBSs were developed for the construction phase of the projects, and other phases, such as the feasibility study and design phases, were excluded. Three-dimensional models of five different construction projects were given to three experts in the construction management industry to develop their WBS samples. To ensure that the experts followed the same basic standards for creating their WBS samples, they were provided with some guidelines. The guidelines include the 100% rule, the level of details, and the coding scheme. The 100% rule specifies that the total amount of work covered by the child elements has to be exactly the same as the work content of the parent element. The experts were also asked to decompose each task to a level at which it is not reasonable to further decompose the work package. In addition, the coding criteria, discussed in the “3.1 WBS encoding” subsection, were defined for the experts to follow.

Sample projects consisted of a bridge construction (steel girder with composite concrete slab), a steel-framed office building, a reinforced concrete-framed residential building, a road widening project, and a steel-framed hotel building. For instance, Fig. 6 shows the 3D model for the steel-framed building. Table 1 shows the developed samples, which are represented by B, S, C, M and H, respectively.

Fig. 6. The 3D model of the steel structure building project (roof was sectioned to provide internal details).

4.1. Results

This section discusses the overall similarity scores measured by three metrics of node, structural, and total similarity. These results are explored in two parts: compliance of the results with similarity measure properties and performance of the method in the retrieval process. In the latter part, the WBS samples were sorted based on their calculated similarity scores to find the most similar projects to the queried sample. This evaluation was performed based on precision and recall measures. In addition, the effect of the variations of threshold in measuring node to node similarity metrics was examined. The results were obtained through experiments in which the threshold varied in the range of 0.50 to 0.80 with 0.05 intervals.

Before discussing the results, it is important to explain the precision and recall measures. These two terms can be defined based on the binary relevance judgment in which every retrievable sample is recognizably “relevant” or “not relevant” [5]. Hence, in a search result, each sample is either “relevant” or “not relevant” and either “retrieved” or “not retrieved” [5].

For any given retrieved set of items, recall is defined as the number of retrieved relevant items as a proportion of all relevant items. In other words, recall is a measure of performance in including relevant items in the retrieved set. Precision is defined as the number of retrieved relevant items as a proportion of retrieved items. Therefore, precision is a measure of excluding the nonrelevant items from the retrieved set [5].

4.2. Properties of similarity measures

The similarity measurements must fulfil the properties of symmetry and reflexivity [37,10]. A function $Sim: S \times S \to [0,1]$ on a set $S$, measuring the degree of similarity between two elements, is called a similarity measure if, $\forall X, Y \in S$, Eqs. (24) and (25) hold.

$$Sim(X, Y) = Sim(Y, X) \quad \text{[symmetry]} \tag{24}$$

$$Sim(X, X) = 1 \quad \text{[reflexivity]} \tag{25}$$

4.2.1. Symmetry

To determine the symmetry fulfilment, the symmetry error for two WBSs, such as A and B, is computed using Eq. (26). In this equation, the sim can be one of the three overall similarity measurements (total similarity, node similarity or structural similarity).

$$\text{Symmetry error}(A, B) = \frac{|sim(A, B) - sim(B, A)|}{\text{average}[sim(A, B), sim(B, A)]} \tag{26}$$

The symmetry errors of all the possible pairwise comparisons from the test samples were measured and averaged for different overall similarity metrics. Fig. 7 presents the averages of the symmetry errors. As can be observed in Fig. 7, the node similarity had larger errors, and the structural similarity measurement performed better than the other two metrics in the symmetry analysis.

Fig. 7. The average of the symmetry errors.

4.2.2. Reflexivity

The reflexivity error (e.g. for the sample $B_1$) is calculated using Eq. (27) [37]:

$$\text{Reflexivity error} = 1 - sim(B_1, B_1) \tag{27}$$

The reflexivity errors were obtained by comparing the test samples with themselves. The average values of the reflexivity errors are presented in Fig. 8. The results confirm that the system generated promising results and the reflexivity errors were negligible across the various thresholds.

Fig. 8. The average of the reflexivity errors.

4.3. Retrieval precision and recall

This section discusses the performance of the similarity metrics in retrieving the stored cases that are similar to the query sample (given sample). For this purpose, the samples B1, C1, S1, M1, and H1 were chosen as the query samples and the rest of the samples were considered as the stored samples. The reason for choosing different types of samples as query samples was to study the effect of various types of test samples in manual task labelling and structuring of WBSs.

The performance of the three overall similarity metrics was evaluated by the precision score in retrieving the relevant stored samples. For this purpose, each query sample (B1, C1, S1, M1, and H1) was compared with all the stored samples and the results were ranked from the highest to the lowest similarity score. For instance, Table 2 shows the results with the threshold equal to 0.65. In this table, B1 was the query sample and the results are ordered by the Total similarity score.

Table 2
Comparing B1 with the stored samples using the threshold of 0.65.

Query    Documented   Total        Node         Structural
sample   sample       similarity   similarity   similarity
                      score        score        score
B1       B2           0.72         0.64         0.80
B1       S2           0.69         0.56         0.81
B1       B3           0.63         0.48         0.79
B1       C1           0.60         0.47         0.73
B1       S1           0.59         0.40         0.78
B1       H3           0.56         0.39         0.73
B1       S3           0.56         0.35         0.76
B1       H2           0.56         0.42         0.69
B1       H1           0.54         0.34         0.74
B1       C2           0.53         0.39         0.67
B1       C3           0.51         0.34         0.68
B1       M2           0.44         0.13         0.75
B1       M3           0.43         0.13         0.74
B1       M1           0.40         0.05         0.74

The highest similarity score belongs to a relevant sample (B2); however, the second score in this list belongs to a non-relevant sample (S2). The limited number of relevant samples in our test case makes the retrieval process highly competitive. One of the two relevant samples received the highest (B2) and the other one had the third (B3) similarity score.

Fig. 10. Average precision scores with a higher weight for semantic similarity.
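The ranking step described in this section can be sketched as follows, using the Total similarity scores reported in Table 2 for the query sample B1. This is only an illustration: the names `total_similarity_b1` and `rank_stored_samples` are hypothetical, and tied scores are simply left in their input order (Python's sort is stable), since the paper does not state how ties are broken.

```python
from typing import Dict, List, Tuple

# Total similarity of each stored sample to query B1 (Table 2, threshold 0.65).
total_similarity_b1: Dict[str, float] = {
    "B2": 0.72, "S2": 0.69, "B3": 0.63, "C1": 0.60, "S1": 0.59,
    "H3": 0.56, "S3": 0.56, "H2": 0.56, "H1": 0.54, "C2": 0.53,
    "C3": 0.51, "M2": 0.44, "M3": 0.43, "M1": 0.40,
}


def rank_stored_samples(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (sample, score) pairs sorted from highest to lowest similarity."""
    # Assumption: ties keep their input order; the paper gives no tie-break rule.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_stored_samples(total_similarity_b1)
top_three = [name for name, _ in ranking[:3]]
print(top_three)  # ['B2', 'S2', 'B3']
```

The top three entries reproduce the situation discussed above: the relevant samples B2 and B3 are ranked first and third, with the non-relevant sample S2 between them.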
The retrieving precision is calculated using Eq. (28), which measures the number of retrieved relevant samples as a proportion of retrieved samples. In this part, the relevance judgment is not challenging, as the test samples were identified as relevant if they were developed for the same project by different experts. Therefore, for each query sample, two relevant samples exist among the stored samples. For example, the samples B2 and B3 are the relevant samples to the query sample B1.

$$\text{Retrieving precision} = \frac{|\{\text{Relevant samples}\} \cap \{\text{Retrieved samples}\}|}{|\{\text{Retrieved samples}\}|} \tag{28}$$

The number of retrieved samples in each query is determined by the recall score, where the retrieval process continues until the number of retrieved relevant samples fulfils the recall score. Since there was a small set of stored WBS samples in this study, only two recall thresholds (i.e. 0.5 and 1) were considered. For the recall score of 0.5, retrieval of stored samples continues until one of the two relevant samples is retrieved. The other tested recall score was 1.0, in which both relevant samples should be retrieved.

For example, the precision scores of retrieving B1 were obtained as follows. For the recall score of 1.0, the retrieval process continues until both samples relevant to B1 (B2 and B3) are retrieved. As we can see in Table 2, this results in two retrieved relevant samples out of three retrieved samples (B2, S2, and B3) and the precision is 0.66 (see Eq. (30)). For the recall score of 0.5, only one of the samples relevant to B1 must be retrieved, which was achieved by retrieving only the first sample from Table 2, and it results in a precision score of one (see Eq. (29)). The average values of the precision scores are presented in Fig. 9, in which it is evident that the Structural similarity measurement provides higher precision scores than the Total and Node similarity metrics. In particular, it provides the highest precision in the thresholds between 0.7 and 0.75. The reason is that the structural similarity considers the way that the WBSs are structured in addition to the semantics of the tasks within the WBSs. As a result, it provides a more comprehensive metric. The average precision in this range was about 0.7, which means that the two relevant samples were mostly among the top three retrieved cases. This approach gives users the opportunity to assess the top-ranked retrieved items and use the most relevant one(s).

$$\text{Recall} = 0.5, \quad \text{Retrieving precision} = \frac{|\{B_2\} \cap \{B_2\}|}{|\{B_2\}|} = \frac{1}{1} = 1.0 \tag{29}$$

$$\text{Recall} = 1.0, \quad \text{Retrieving precision} = \frac{|\{B_2, B_3\} \cap \{B_2, S_2, B_3\}|}{|\{B_2, S_2, B_3\}|} = \frac{2}{3} \approx 0.66 \tag{30}$$

The weight of the semantic similarity measurement (i.e. $sim_{nodes}$) was increased from one to two in Eq. (12) to investigate the effect of the weights in this equation on the average retrieving precision. Fig. 10 shows the results of average retrieving precision with the increased weight of semantic similarity in Eq. (12). As illustrated in Fig. 10, the precision scores were affected (reduced in lower thresholds), and the structural similarity with thresholds around 0.75 had the highest precision scores.

These WBS samples were developed according to the Project Management Institute’s guidelines to encompass the entire scope of the project [32]; however, this might not be followed in all actual construction projects, which can hinder the performance of the proposed method. In addition, performance of the developed system heavily depends on the employed semantic network, which was WordNet in this research. Since this vocabulary source is a generic database for the English language, it might occasionally fail to provide desirable results in technical domains such as construction. Therefore, it would be valuable to develop and use a customized semantic network for construction, which would be able to match the related concepts in this domain.
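The evaluation measures used in Section 4 (the symmetry error of Eq. (26), the reflexivity error of Eq. (27), and the retrieving precision of Eq. (28) with its recall-driven stopping rule) can be sketched in Python as below. The function names are hypothetical and this is not the authors' released code. Treating the symmetry error as 0 when both similarities are 0 is an assumption made here to avoid a division by zero.

```python
import math
from typing import List, Set


def symmetry_error(sim_ab: float, sim_ba: float) -> float:
    """Eq. (26): |sim(A, B) - sim(B, A)| divided by the average of the two."""
    mean = (sim_ab + sim_ba) / 2
    # Assumption: the error is taken as 0 when both similarities are 0.
    return abs(sim_ab - sim_ba) / mean if mean else 0.0


def reflexivity_error(sim_aa: float) -> float:
    """Eq. (27): one minus the similarity of a sample with itself."""
    return 1 - sim_aa


def precision_at_recall(ranked: List[str], relevant: Set[str], recall: float) -> float:
    """Eq. (28): walk down the ranked list until the target recall is reached."""
    if not relevant or not 0 < recall <= 1:
        raise ValueError("need at least one relevant sample and 0 < recall <= 1")
    # Number of relevant samples that must be retrieved to reach the recall.
    needed = math.ceil(recall * len(relevant))
    hits = 0
    for retrieved, sample in enumerate(ranked, start=1):
        if sample in relevant:
            hits += 1
        if hits >= needed:
            return hits / retrieved
    raise ValueError("target recall cannot be reached with this ranking")


if __name__ == "__main__":
    # Ranking of the stored samples for query B1, in the order of Table 2.
    ranked_b1 = ["B2", "S2", "B3", "C1", "S1", "H3", "S3",
                 "H2", "H1", "C2", "C3", "M2", "M3", "M1"]
    relevant_b1 = {"B2", "B3"}
    print(precision_at_recall(ranked_b1, relevant_b1, recall=0.5))
    print(precision_at_recall(ranked_b1, relevant_b1, recall=1.0))
```

With the ranking of Table 2, the sketch stops after one retrieved sample for a recall of 0.5 (precision 1.0) and after three retrieved samples for a recall of 1.0 (precision 2/3), which agrees with Eqs. (29) and (30).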

Fig. 9. Average precision scores.

4.4. Sample application

The proposed method can be used to develop a comprehensive WBS for construction projects during the planning stage. For example, a preliminary WBS can be developed by junior schedulers for a construction project, then the proposed method runs a query to find the WBS of similar past project(s) to complete the preliminary WBS. To evaluate the performance of the proposed method in this context, the sample S3 was altered: 16 tasks (out of 53) were omitted and six of the remaining tasks were reworded. The omitted tasks mostly included operations that might not be visible in the project outcome, such as surveying and concrete curing, and that can be missed by novice schedulers. This experiment was performed to represent a scenario where an incomplete WBS is provided to find similar projects. Table 3 shows the results of comparing the altered S3 (S3_a) with the stored samples. The results for four different thresholds are sorted from the largest to the lowest similarity based on the total similarity metric. The results show that S1 and S2 were among the top three retrieved cases in all thresholds, which indicates that the method was able to retrieve the projects similar to an incomplete WBS with a precision score between 0.66 and 1.00, depending on the applied threshold.

Table 3
Comparing S3_a with the stored samples using different thresholds.

Queried sample: S3_a

Threshold 0.60          Threshold 0.65          Threshold 0.70          Threshold 0.75
Retrieved  Total        Retrieved  Total        Retrieved  Total        Retrieved  Total
sample     similarity   sample     similarity   sample     similarity   sample     similarity
S1         0.69         S1         0.66         S1         0.64         H3         0.62
S2         0.64         S2         0.63         H3         0.63         S1         0.56
B2         0.63         H3         0.63         S2         0.62         S2         0.52
H3         0.63         B2         0.60         B1         0.51         B1         0.48
B1         0.60         B1         0.55         B2         0.51         H1         0.48
B3         0.58         H1         0.55         H1         0.50         B3         0.47
H1         0.56         B3         0.53         B3         0.50         B2         0.46
M3         0.53         C1         0.49         C1         0.48         C3         0.45
M2         0.51         C3         0.48         C3         0.48         H2         0.42
C1         0.50         H2         0.47         H2         0.46         C1         0.41
C3         0.48         M3         0.44         C2         0.43         C2         0.38
H2         0.48         C2         0.44         M2         0.39         M2         0.37
C2         0.45         M2         0.42         M1         0.36         M1         0.34
M1         0.42         M1         0.39         M3         0.35         M3         0.34

5. Conclusion

Reuse of the knowledge and experiences gained from completed construction projects can improve planning of new projects. In order to reuse knowledge, finding similar past projects is critical. This research was undertaken to develop quantitative similarity metrics to measure the similarity of construction projects using the WBS as their representative. These metrics were implemented using NLP techniques written in the Python programming language. The similarity metrics were evaluated based on two sets of experiments: first, the metrics were tested for the fulfilment of similarity properties, including symmetry and reflexivity; second, the metrics were tested to search among test samples and to find the relevant cases to the given samples.

The results show promising outcomes in compliance with similarity properties (i.e. symmetry and reflexivity) with small errors. The results of the second part of the experiments, which were the main focus of this research, revealed that the structural similarity metric had the best performance in retrieval of similar projects with thresholds in the range of 0.7 to 0.75.

6. Future works

The proposed method could identify similar projects using their WBSs. Future research can investigate the inclusion of major quantitative attributes, such as the work quantity of the tasks and their duration, to enhance the similarity assessment of construction projects. The vocabulary source in this research (i.e. WordNet) is a general and comprehensive source, and might not be able to provide flawless similarity assessments for some technical terms. Developing a specialized resource for construction technical words is a valuable opportunity for future research. Lastly, this study focused on development of a method to quantify the similarity of construction projects using their WBS and did not explore retrieval of the information and documents of projects. Future work can investigate integration of the proposed method with existing knowledge retrieval systems, namely case-based reasoning.

7. Data availability

The source code of the developed program (in the Python programming language) is publicly available and can be found at: https://osf.io/b8qvy/. Data generated or analysed during the experiments are available from the corresponding author upon reasonable request.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research project was funded by Discovery grant RGPIN-2015-03812 from Natural Sciences and Engineering Research Council of Canada.

References

[1] M. Al Qady, A. Kandil, Concept relation extraction from construction documents using natural language processing, J. Construct. Eng. Manage. 136 (3) (2010) 294–302.
[2] M. Alavi, D.E. Leidner, Knowledge management and knowledge management systems: conceptual foundations and research issues, MIS Quarterly (2001) 107–136.
[3] S.H. An, G.H. Kim, K.I. Kang, A case-based reasoning cost estimating model using experience by analytic hierarchy process, Build. Environ. 42 (7) (2007) 2573–2579.
[4] T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler, 2000. Extensible Markup Language (XML) 1.0. W3C Recommendation 6 October 2000. Available via the World Wide Web at http://www.w3.org/TR/1998/REC-xml-19980210.
[5] M. Buckland, F. Gey, The relationship between recall and precision, J. Am. Soc. Inform. Sci. 45 (1) (1994) 12–19.
[6] S.H. Chen, A.J. Jakeman, J.P. Norton, Artificial intelligence techniques: an introduction to their use for modelling environmental systems, Math. Comput. Simul 78 (2–3) (2008) 379–400.
[7] G.G. Chowdhury, Natural language processing, Ann. Rev. Inform. Sci. Technol. 37 (2003) 51–89.
[8] R. Costa, C. Lima, J. Sarraipa, R. Jardim-Gonçalves, Facilitating knowledge sharing and reuse in building and construction domain: an ontology-based approach, J. Intell. Manuf. 27 (1) (2016) 263–282.
[9] R. Dijkman, M. Dumas, B. Van Dongen, R. Käärik, J. Mendling, Similarity of business process models: metrics and evaluation, Inform. Syst. 36 (2) (2011) 498–516.
[10] M. Ehrig, A. Koschmider, A. Oberweis, Measuring similarity between semantic business process models, in: Proceedings of the fourth Asia-Pacific Conference on Conceptual modelling, vol. 67, 2007, pp. 71–80.
[11] S. Gasik, A model of project knowledge management, Project Manage. J. 42 (3) (2011) 23–44.

[12] Y.M. Goh, D.K.H. Chua, Case-based reasoning approach to construction safety hazard identification: adaptation and utilization, J. Construct. Eng. Manage. 136 (2) (2010) 170–178.
[13] W.H. Gomaa, A.A. Fahmy, A survey of text similarity approaches, Int. J. Comput. Appl. 68 (13) (2013) 13–18.
[14] J. Hahn, M. Subramani, A framework of knowledge management systems: issues and challenges for theory and practice, ICIS 2000 Proc. 28 (2000).
[15] P.E. Hart, N.J. Nilsson, B. Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Trans. Syst. Sci. Cybernetics 4 (2) (1968) 100–107.
[16] G.T. Haugan, Effective Work Breakdown Structures, Berrett-Koehler Publishers, 2001.
[17] Y.M. Ibrahim, A.P. Kaka, E. Trucco, M. Kagioglou, A. Ghassan, Semi-automatic development of the work breakdown structure (WBS) for construction projects. In: Proceedings of the 4th International SCRI Research Symposium, Salford, UK, 2007.
[18] Y. Jung, S. Woo, Flexible work breakdown structure for integrated cost and schedule control, J. Construct. Eng. Manage. 130 (5) (2004) 616–625.
[19] D. Jurafsky, J.H. Martin, 2014. Speech and Language Processing, vol. 3.
[20] G.H. Kim, S.H. An, K.I. Kang, Comparison of construction cost estimating models based on regression analysis, neural networks, and case-based reasoning, Build. Environ. 39 (10) (2004) 1235–1242.
[21] J.L. Kolodner, An introduction to case-based reasoning, Artif. Intell. Rev. 6 (1) (1992) 3–34.
[22] T.K. Landauer, S.T. Dumais, A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge, Psychol. Rev. 104 (2) (1997) 211.
[23] V.I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady, vol. 10, no. 8, 1966, pp. 707–710.
[24] H.T. Lin, N.W. Chi, S.H. Hsieh, A concept-based information retrieval approach for engineering domain-specific technical documents, Adv. Eng. Inf. 26 (2) (2012) 349–360.
[25] A. Maedche, S. Staab, Measuring similarity between ontologies. In: International conference on knowledge engineering and knowledge management, Springer, Berlin, Heidelberg, 2002, pp. 251–263.
[26] M.L. Maher, A.G. de Silva Garza, Developing case-based reasoning for structural design, IEEE Expert 11 (3) (1996) 42–52.
[27] L. Meng, R. Huang, J. Gu, A review of semantic similarity measures in wordnet, Int. J. Hybrid Inform. Technol. 6 (1) (2013) 1–12.
[28] R. Mihalcea, C. Corley, C. Strapparava, Corpus-based and knowledge-based measures of text semantic similarity, AAAI 6 (2006) 775–780.
[29] E. Mikulakova, M. König, E. Tauscher, K. Beucke, Knowledge-based schedule generation and evaluation, Adv. Eng. Inf. 24 (4) (2010) 389–403.
[30] G.A. Miller, WordNet: a lexical database for English, Commun. ACM 38 (11) (1995) 39–41.
[31] J. Park, H. Cai, WBS-based dynamic multi-dimensional BIM database for total construction as-built documentation, Autom. Constr. 77 (2017) 15–23.
[32] PMBOK® Guide, 2017. Sixth edition, Project Management Institute.
[33] Princeton University “About WordNet.” WordNet. Princeton University, 2010.
[34] Y. Qiao, J.D. Fricker, S. Labi, Quantifying the similarity between different project types based on their pay item compositions: application to bundling, J. Construct. Eng. Manage. 145 (9) (2019) 04019053.
[35] B. Raphael, B. Domer, S. Saitta, I.F. Smith, Incremental development of CBR strategies for computing project cost probabilities, Adv. Eng. Inf. 21 (3) (2007) 311–321.
[36] P. Resnik, Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007, 1995.
[37] M.M. Richter, Classification and learning of similarity measures, in: Information and Classification, Springer, Berlin, Heidelberg, 1993, pp. 323–334.
[38] H.G. Ryu, H.S. Lee, M. Park, Construction planning method using case-based reasoning (CONPLA-CBR), J. Comput. Civil Eng. 21 (6) (2007) 410–422.
[39] G. Salton, M.E. Lesk, Computer evaluation of indexing and text processing, J. ACM (JACM) 15 (1) (1968) 8–36.
[40] E. Siami-Irdemoosa, S.R. Dindarloo, M. Sharifzadeh, Work breakdown structure (WBS) development for underground construction, Autom. Constr. 58 (2015) 85–94.
[41] J.F. Sowa, Semantic networks, 2012.
[42] M. Sutrisna, C.D. Ramanayaka, J.S. Goulding, Developing work breakdown structure matrix for managing offsite construction projects, Arch. Eng. Des. Manage. 14 (5) (2018) 381–397.
[43] J.H.M. Tah, V. Carr, R. Howes, Information modelling for case-based construction planning of highway bridge projects, Adv. Eng. Softw. 30 (7) (1999) 495–509.
[44] H.C. Tan, P.M. Carrillo, C.J. Anumba, N. Bouchlaghem, J.M. Kamara, C.E. Udeaja, Development of a methodology for live capture and reuse of project knowledge in construction, J. Manage. Eng. 23 (1) (2007) 18–26.
[45] H.P. Tserng, Y.C. Lin, Developing an activity-based knowledge management system for contractors, Autom. Constr. 13 (6) (2004) 781–802.
[46] A. Tversky, Features of similarity, Psychol. Rev. 84 (4) (1977) 327.
[47] Y.R. Wang, G.E. Gibson Jr, A study of preproject planning and project success using ANNs and regression models, Autom. Constr. 19 (3) (2010) 341–346.
[48] Z. Wu, M. Palmer, 1994. Verb semantics and lexical selection. arXiv preprint cmp-lg/9406033.
[49] J. Zhang, N.M. El-Gohary, 2013. Information transformation and automated reasoning for automated compliance checking in construction. In: Computing in civil engineering, 2013, pp. 701–708.
[50] J. Zhang, N.M. El-Gohary, Semantic NLP-based information extraction from construction regulatory documents for automated compliance checking, J. Comput. Civil Eng. 30 (2) (2016) 04015014.