ARTICLE INFO
Keywords: Project similarity; Construction; Natural language processing; Work breakdown structure; Knowledge retrieval; Project planning
ABSTRACT
Lessons learned from completed projects are valuable resources for planning new projects. A quantitative similarity measurement between construction projects can improve knowledge reuse practices. The information and documents of a similar past project can be retrieved to resolve the challenges in a new project. This paper introduces a novel method for measuring the similarity of construction projects based on semantic comparison of their work breakdown structures (WBS). The WBS of a project should theoretically encompass a hierarchical decomposition of the total scope of the project’s works; thus, it could be used as an appropriate representative of the project. The proposed method measures the semantic similarity between the WBSs of projects by means of natural language processing techniques. This method was implemented based on three metrics: node, structural, and total similarity. Each of these metrics calculates a quantitative similarity score between 0 and 1. The method was assessed using fifteen test samples with promising results in compliance with similarity properties. In addition, precision and recall of the method were evaluated in retrieving similar past projects. The results illustrate that the structural similarity slightly outperforms the other metrics.
* Corresponding author.
E-mail addresses: [email protected] (N. Torkanfar), [email protected] (E. Rezazadeh Azar).
https://doi.org/10.1016/j.aei.2020.101179
Received 27 May 2020; Received in revised form 2 September 2020; Accepted 17 September 2020
Available online 30 September 2020
1474-0346/© 2020 Elsevier Ltd. All rights reserved.
N. Torkanfar and E. Rezazadeh Azar Advanced Engineering Informatics 46 (2020) 101179
such as planning and execution phases. Quantitative similarity assessment between projects can potentially improve current CBR and other AI methods by providing more comprehensive attributes that consider the entire project rather than focusing on certain attributes.
Research on quantitative measurement of similarity between construction projects, however, is still limited. For example, a recent attempt in the area of project bundling proposed a method to quantify the similarity of construction projects by vectorizing the projects’ pay items and measuring the distance between the vectors [34].
Scope management of a construction project requires comprehensive assessment of the project, and a main outcome of this assessment, i.e. the Work Breakdown Structure (WBS), is used by other project management areas, namely project time and cost management [32]. However, there has been no research attempt to use the WBS, as a hierarchical breakdown of the scope of a project, for similarity assessment of projects. The outcome of such an assessment can identify similar projects for better development of the WBS and project planning of a new project.
The aim of this research study is to develop a method to assess the similarity of construction projects using their WBSs. It has been hypothesised that the tasks and services required during the construction phase can be used to develop metrics to measure the similarity of construction projects. Since the WBS of a project contains hierarchical information about its scope, the WBS was considered as a potential representative of construction projects. Natural language processing (NLP) techniques were employed in the proposed method to extract semantic attributes of the work packages. This method calculates a score between 0 and 1 to determine the semantic similarity of two WBSs.
2. Background
2.1. Work breakdown structure (WBS)
Project Management Institute (PMI) defines WBS as “a hierarchical decomposition of the total scope of work to be carried out by the project team to accomplish the project objectives and create the required deliverables. The WBS organizes and defines the total scope of the project and represents the work specified in the current approved project scope statement” [32]. In other words, the main goal of the WBS is to present a complete and proper scope of the entire project work [17].
The highest level of the WBS hierarchy represents the entire project and is decomposed into smaller subjects, each representing tasks that should be performed for the higher-level subject to be completed. The process of subdividing continues until the tasks cannot be decomposed any further (or it is not reasonable to do so). The lowest-level entries in this structure represent work packages. The responsibility for the performance of each work package is assigned to an individual, unit, or organization [16]. The project management body of knowledge [32] provides a generic guideline for creating an appropriate WBS; however, the complex and fragmented nature of construction projects, such as the coordination of multiple players (e.g. subcontractors, contract administrators, and suppliers), brings about specific challenges in creating WBSs.
There are research studies on various aspects and applications of WBS in construction management, such as proposed methods for automated WBS development [40], WBS-based project documentation [31], combining off-site and on-site WBSs [42], and WBS-based integration of project cost and time [18]. Nonetheless, none of these studies has focused on WBS-based similarity assessment of construction projects.
2.2. Text similarity measurements
Natural language processing (NLP) is a research area that focuses on enabling computers to understand natural language text and speech [7]. Measuring similarity between words, sentences, paragraphs, and documents has been used for a long time in several NLP related fields, such as information retrieval, text classification, document clustering, topic detection, topic tracking, question generation, question answering, essay scoring, short answer scoring, machine translation, and text summarization [13]. One of the first implementations of text similarity measurement aimed at ranking documents in the order of their similarity to the input query [39].
Two words can be similar either semantically or lexically. Lexically similar words contain strings with similar characters in their structures, and this similarity is evaluated through a couple of string-based methods, which are discussed in Section 2.3. Semantically similar words, however, are related by means of different relations, such as being synonyms, antonyms, or their utilization in the same context [13]. In other words, semantic similarity determines the relation of words or concepts based on predetermined databases, which include the relations of the words.
There are several studies on semantic analysis of texts and documents in the construction management domain that focused on information analysis and retrieval and were tested for some key applications, such as text classification and automated regulatory compliance checking using NLP methods [50,49]. A method was proposed to extract semantic knowledge from contract documents and to categorize and retrieve information in electronic document management systems using NLP [1]. Another method was proposed to partition multi-topic documents into several passages [24]. The partitioning approach generates passages based on a domain ontology. Costa et al. [8] explored a method to enrich the semantic vectors by means of ontology concepts and relations. The semantic vectors were used to represent knowledge sources.
2.3. String-based similarity measurement
In order to measure the string similarity between two words, Levenshtein [23] proposed the edit distance method, which identifies the difference between two strings by the minimum number of changes (insertion, deletion, or substitution) needed to transform one string into another. For example, the distance between the strings “cat” and “hat” is one character (substitution of character “c” with “h”). The edit distance method does not take the length of the strings into account. In another proposed method for syntactic similarity [25], the number of characters is also considered, as shown in Eq. (1).
$$sim_{syn}(c_1, c_2) = \max\left(0, \frac{\min(|c_1|, |c_2|) - ed(c_1, c_2)}{\min(|c_1|, |c_2|)}\right) \tag{1}$$
These two approaches calculate similarity without considering the semantics of the inputs. Therefore, lexical similarity methods do not reliably provide an accurate similarity measurement. For instance, the similarity ($sim_{syn}$) between the concepts “reinforcement” and “rebar” would not return a high score, even though these two concepts are semantically related to a great degree. It was shown that semantic similarity algorithms outperform simple lexical methods with a 13% error rate reduction [28].
2.4. Semantic similarity measurement
Semantic similarity measurement methods have been developed using corpus-based and knowledge-based algorithms. A corpus is a large structured set of written or spoken texts compiled for the purpose of language processing. Corpus-based semantic similarity determines the similarity of words by utilizing a large corpus. Latent semantic analysis (LSA) [22] is one of the most popular methods for obtaining corpus-based similarity. LSA hypothesizes that the recurrence of the same words in similar pieces of text is an indication of their proximate meaning [22].
Knowledge-based similarity is another type of semantic similarity that measures similarity by using information embedded in semantic networks. A semantic network is a knowledge base which
represents semantic relations of concepts using networks [41].
WordNet is a popular software tool in the field of knowledge-based semantic similarity measurement, which was produced as a result of a comprehensive research program at Princeton University [30] and is utilized as a lexical reference of the English language. In WordNet, English nouns, verbs, and adjectives are organized in synonym sets, and these sets are related together by means of semantic relations [27]. A variety of semantic relations have been developed in WordNet including (but not limited to) synonymy, antonymy, hyponymy, and membership [30]. By exploiting these relations, semantic hierarchy structures are developed, and these hierarchies could be useful in semantic computations.
There are several different methods for semantic similarity measurement based on WordNet, such as path-based, information content-based [36], feature-based [46], and hybrid measurements. This research utilizes a method that calculates the similarity between two concepts based on their depth in the taxonomy [48]. It computes similarity based on the position of concepts $c_1$ and $c_2$, as well as their lowest common subsumer $lso(c_1, c_2)$. In Eq. (2), the function $len(c_1, c_2)$ measures the length of the shortest path from concept $c_1$ to concept $c_2$, and $depth$ measures the length of the path from a concept to the root element [48].
$$sim_{WP}(c_1, c_2) = \frac{2 \cdot depth(lso(c_1, c_2))}{len(c_1, c_2) + 2 \cdot depth(lso(c_1, c_2))} \tag{2}$$
This paper proposes a method to quantify the similarity of construction projects based on semantic and structural metrics derived from their WBSs. The WBS of a project includes a number of nodes, which are labelled with the tasks required to complete that project. The focus of this research is to quantify the similarity of construction projects based on the tasks required to build a project during its construction phase. A method was developed in the Python programming environment to compare documented construction projects with a targeted project, based on their WBSs. The following subsections describe the elements of this method.
3.1. WBS encoding
The WBS information is exported from the Microsoft Project file of the sample projects to a spreadsheet file (such as Microsoft Excel). Fig. 1 shows the tasks and WBS codes in spreadsheet format for small parts of two simplified projects (drastically shortened for the purpose of representation) that belong to a “House project” and a “Bridge project”. Each node in the WBS hierarchy contains two main pieces of information: the node’s task, and the node’s code, which locates the element in the hierarchy. The WBS hierarchies of the projects were written in eXtensible Markup Language (XML) to encode this information into a machine-readable format [4]. Fig. 2 depicts a part of the WBS of a building project which is encoded in the XML format. As shown in Fig. 2, each element contains a text and a numerical attribute, where the text represents the task and is followed by the attribute giving the level of the task in the hierarchy. For instance, the XML element in line 8 (in Fig. 2) contains a task which is called “earthworks” and its level is 1.2 (i.e. the second task in the second level of WBS).
The first step in measuring the similarity of two WBSs is to compare the tasks within the WBS nodes. There are two important issues in measuring the similarity of the tasks. First, their naming is subjective and depends on the project managers. For instance, “Rebar placement” and “Reinforcement installation” are not the same strings, but both of them represent the
Fig. 1. WBS codes and hierarchy of tasks; (a) “House project” (b) “Bridge project”.
Fig. 2. Segment of the written XML for WBS of a steel structure building.
same task. Thus, two tasks need not contain exactly the same text to be considered similar. This problem can be addressed by using semantic similarity measurements of tasks instead of simple string measurements.
On the other hand, the semantic equivalence of tasks does not necessarily result in similarity of their nodes. For example, two nodes may both be labelled “concrete pouring” yet represent different tasks, where one represents concrete pouring for column(s) and the other one for beam(s).
To address the above-mentioned issues, the proposed method determines the similarity of two WBSs through the following three metrics.
(1) Semantic similarity, in which the semantic similarity of the tasks within the compared nodes is measured;
(2) Parent similarity, which measures the semantic similarity of the parents of the compared nodes;
(3) Siblings similarity, which measures the semantic similarity of siblings (nodes from a common parent) of the compared nodes.
3.3. Semantic similarity
WordNet [30] was utilized to measure the semantic similarity of the nodes’ tasks in the proposed method. Tasks are usually expressed as a phrase that contains a few words. There are different methods for measuring the semantic similarity of two sentences or phrases by averaging the semantic similarity of their words, such as a method proposed by Mihalcea et al. [28]. To measure the semantic similarity of two text segments $T_1$ and $T_2$, for each word $w$ in the segment $T_1$, this method uses one of the word-to-word similarity measures (previously explained) to find the most semantically similar word from segment $T_2$ ($maxSim(w, T_2)$). The same procedure determines the most similar word in $T_1$ starting with the words in $T_2$. These similarities are then weighted with the corresponding word specificity. The specificity of words $idf(w)$ gives higher scores to specific words compared to generic concepts such as “get” or “become” [28]. This method measures the semantic similarity of two segments as presented in Eq. (3).
$$sim(T_1, T_2) = \frac{1}{2}\left(\frac{\sum_{w \in \{T_1\}} \left(maxSim(w, T_2) \cdot idf(w)\right)}{\sum_{w \in \{T_1\}} idf(w)} + \frac{\sum_{w \in \{T_2\}} \left(maxSim(w, T_1) \cdot idf(w)\right)}{\sum_{w \in \{T_2\}} idf(w)}\right) \tag{3}$$
This method can be adjusted to a more appropriate one by eliminating the word specificity weight. The reason behind this decision is that in this case the tasks are phrases with very few component words, which are mostly specific to the construction domain rather than being generic concepts.
Using the Wu and Palmer method [48] as the word-to-word semantic similarity measurement and considering the above-mentioned assumptions, the semantic similarity between $task_i$ and $task_j$ is calculated using Eq. (4). In this approach, for each word $w$ in $task_i$, the most semantically similar word from $task_j$ ($maxSim(w, task_j)_{wup}$) is found by means of the Wu and Palmer [48] method. The same procedure determines the most similar word in $task_i$ starting with the words in $task_j$.
$$sim_{semantic}(task_i, task_j) = \frac{1}{2}\left(\frac{\sum_{w \in \{task_i\}} maxSim(w, task_j)_{wup}}{\sum_{w \in \{task_i\}} 1} + \frac{\sum_{w \in \{task_j\}} maxSim(w, task_i)_{wup}}{\sum_{w \in \{task_j\}} 1}\right) \tag{4}$$
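As a minimal sketch of how Eq. (1) and Eq. (4) fit together, the following Python code averages, in both directions, the best word-level match of every word in one task against the other task. The function names (`edit_distance`, `sim_syn`, `sim_semantic`) and the `word_sim` parameter are illustrative assumptions, not names from the authors' program. The word-level measure is passed in as a function; by default the sketch uses the string-based measure of Eq. (1) as a stand-in so that it runs without WordNet, whereas the method described here uses the Wu and Palmer measure for that role.

```python
from typing import Callable, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def sim_syn(c1: str, c2: str) -> float:
    """String-based similarity of Eq. (1), clipped at zero."""
    shortest = min(len(c1), len(c2))
    if shortest == 0:
        return 0.0
    return max(0.0, (shortest - edit_distance(c1, c2)) / shortest)


def sim_semantic(task_i: Sequence[str],
                 task_j: Sequence[str],
                 word_sim: Callable[[str, str], float] = sim_syn) -> float:
    """Phrase similarity of Eq. (4): mean best-match score, both directions.

    word_sim is an assumed plug-in point for the word-to-word measure.
    """
    if not task_i or not task_j:
        return 0.0

    def directed(src: Sequence[str], dst: Sequence[str]) -> float:
        # For every word in src, keep its best match in dst, then average.
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return 0.5 * (directed(task_i, task_j) + directed(task_j, task_i))
```

With the default string-based measure, `sim_syn("reinforcement", "rebar")` evaluates to 0.0, which illustrates why a lexical measure alone is not sufficient. If a hypothetical word-level function that returns 1.0 for identical words and 0.8 otherwise is passed as `word_sim`, the tasks `["reinforcement", "installation"]` and `["reinforcement", "placement"]` score 0.9.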
For example, Eq. (4) results in a similarity score of 0.9 for tasks labeled as “reinforcement installation” and “reinforcement placement”. This similarity is less than one because the word-to-word similarity of “installation” and “placement” is lower than one, and therefore it decreases the total semantic similarity to 0.9.
3.4. Word-to-word semantic similarity measurements
Since in WordNet the relations between concepts are based on synsets, an algorithm was required to find the similarity between words rather than synsets [19]. WordNet defines synsets as sets of synonyms composed of nouns, verbs, adjectives, or adverbs that each express a unique concept [33]. Thus, to compute the semantic similarity of two words by utilizing WordNet, one synset from each word should be selected. The comparison of the chosen synsets results in the semantic similarity of the two words. In the construction domain, however, some of the applied words do not have a special meaning in regular vocabulary resources, such as WordNet. To address this issue, technical words were replaced by meaningful terms defined in WordNet. For example, the word “HVAC” was replaced with “Heating Ventilation and Air Conditioning” and the word “rebar” was replaced with “reinforcement”. In addition, the words that are not defined in WordNet were compared lexically by the string-based method which was introduced in Eq. (1).
A simplified method has been used in this study to measure the semantic similarity of two words. In this approach, the system approximates the similarity of two words by using the pair of their synsets that results in maximum similarity, as shown in Eq. (5).
$$\text{word-to-word similarity}(w_1, w_2) = \max_{C_1 \in synsets(w_1),\ C_2 \in synsets(w_2)} similarity(C_1, C_2) \tag{5}$$
Using the word-to-word similarity measure and the proposed method for measuring the semantic similarity of two phrases, the semantic similarity of two tasks was calculated. Assume $WBS_{N_1}^{L_1}$ and $WBS_{N_2}^{L_2}$ are two WBSs, in which $L_1$ and $L_2$ represent the total number of levels that each WBS hierarchy contains (e.g. the WBS illustrated in Fig. 5 has three levels and its $L$ is three). Moreover, $N_1$ and $N_2$ represent the finite sets of the WBSs’ nodes ($N_1 : (n_1, n_2, \cdots, n_N)$ and $N_2 : (m_1, m_2, \cdots, m_M)$). The results of the pairwise comparisons between nodes $n_i$ and $m_j$ from $WBS_{N_1}^{L_1}$ and $WBS_{N_2}^{L_2}$ form the matrix shown in Eq. (6). This matrix represents the semantic similarity of tasks between nodes $n_i$ and $m_j$.
$$sim_{nodes}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = \begin{bmatrix} sim_{semantic}(n_1, m_1) & \cdots & sim_{semantic}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ sim_{semantic}(n_i, m_1) & \cdots & sim_{semantic}(n_i, m_j) \end{bmatrix} \tag{6}$$
The proposed method considers two nodes semantically similar if they have a semantic similarity greater than a user-defined threshold between 0 and 1. In addition, to reduce computation effort, the system only computes the other node similarity metrics (parent similarity and siblings similarity) for the nodes that are semantically similar (more than the threshold). The effects of different thresholds on the accuracy of the system are explored in section 4 “Experimental results”.
In a WBS, except for the root element (i.e. the highest level), each node is subdivided from an upper-level element, which is the parent of the node. Also, each parent is generated from an upper-level element, which creates a sequence of parents for each node. This metric determines the similarity of the sequences of parents and is calculated by averaging their semantic similarity. Therefore, this method reduces the similarity of tasks which belong to different parts of the compared projects, such as two matched “concrete pouring” tasks belonging to foundation and shear wall construction, respectively.
Since considering all ancestors of a node requires a large amount of calculation, the least similar parent (LSP) is defined. LSPs are the first pair of parents in the sequences of two given nodes’ parents that are not semantically similar (less than the defined threshold). This method only considers the parents up to the LSP. Given nodes $n$ and $m$ from $WBS_{N_1}^{L_1}$ and $WBS_{N_2}^{L_2}$ respectively, the parent similarity between them ($sim_{parents}(n, m)$) is calculated using Eq. (7). $L_{LSP} - L_n$ is the difference between the levels of node $n$ and its LSP, and $sim_{semantic}(i\text{th parents})$ is the semantic similarity between the $i$th parents of nodes $n$ and $m$.
$$sim_{parents}(n, m) = \frac{\sum_{i=1}^{L_{LSP} - L_n} \left(L_{LSP} - L_n - (i - 1)\right) \times sim_{semantic}(i\text{th parents})}{\sum_{i=1}^{L_{LSP} - L_n} i} \tag{7}$$
For instance, Fig. 3 shows the first two parents of nodes $n_1$ and $m_1$ with a semantic similarity of 0.8, which is more than an arbitrarily defined threshold of 0.5. In this example, the next two parents have a similarity of 0.2 (i.e. less than the threshold of 0.5), and therefore they are defined as the LSP. In this case, the parent similarity is calculated as follows.
$$sim_{parents}(n_1, m_1) = \frac{2 \times 0.8 + 1 \times 0.2}{2 + 1} = 0.6$$
Fig. 3. Parent similarity between nodes n1 and m1.
The results of parent similarity between nodes from $WBS_{N_1}^{L_1}$ and $WBS_{N_2}^{L_2}$ are presented using the matrix shown in Eq. (8).
$$sim_{parents}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = \begin{bmatrix} sim_{parents}(n_1, m_1) & \cdots & sim_{parents}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ sim_{parents}(n_i, m_1) & \cdots & sim_{parents}(n_i, m_j) \end{bmatrix} \tag{8}$$
3.6. Comparison of the nodes’ siblings
In a WBS, nodes generated from the same parent are called siblings. Similarity of the nodes’ siblings in two WBSs can also increase the likelihood that the nodes’ tasks are similar. To calculate the
sibling similarity between nodes $n_i$ and $m_j$, their siblings are compared one by one, and any two siblings which are semantically similar (i.e. $sim_{semantic} > threshold$) are considered matched together. Thus, $(sibling_{n_i}, sibling_{m_j})$ can be defined as a tuple that includes the pairs of matched siblings from nodes $n_i$ and $m_j$ (Eq. (9)).
$$\text{matched siblings}(n_i, m_j) = \left(sibling_{n_i}, sibling_{m_j}\right) \tag{9}$$
As a result, the sibling similarity score between nodes $n_i$ and $m_j$ is calculated using Eq. (10), which is obtained by dividing the total number of matched siblings by the total number of siblings.
$$sim_{siblings}(n_i, m_j) = \frac{\left|\text{matched siblings}(n_i, m_j)\right|}{\left|siblings_{n_i}\right| + \left|siblings_{m_j}\right|} \tag{10}$$
For example, the sibling similarity between nodes $n_i$ and $m_j$ with only one pair of matched siblings in Fig. 4 is calculated as shown below.
$$sim_{siblings}(n_i, m_j) = \frac{2 \times 1}{2 + 2} = 0.5$$
A sibling similarity matrix, which contains a pairwise comparison of nodes $n_i$ and $m_j$ from $WBS_{N_1}^{L_1}$ and $WBS_{N_2}^{L_2}$, can be expressed as shown in Eq. (11).
$$sim_{siblings}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = \begin{bmatrix} sim_{siblings}(n_1, m_1) & \cdots & sim_{siblings}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ sim_{siblings}(n_i, m_1) & \cdots & sim_{siblings}(n_i, m_j) \end{bmatrix} \tag{11}$$
3.7. Average similarity of compared nodes
The average similarity matrix represents the average node-to-node similarity between nodes of $WBS_{N_1}^{L_1}$ and $WBS_{N_2}^{L_2}$, which is calculated by means of Eq. (12) and presented as Eq. (13).
$$sim_{average} = \frac{sim_{nodes} + sim_{parents} + sim_{siblings}}{3} \tag{12}$$
$$sim_{average}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = \begin{bmatrix} sim_{average}(n_1, m_1) & \cdots & sim_{average}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ sim_{average}(n_i, m_1) & \cdots & sim_{average}(n_i, m_j) \end{bmatrix} \tag{13}$$
3.8. Mapping of nodes
Each node from the first WBS is mapped to the node from the second WBS with the highest average similarity. The highest average similarity must be greater than the determined threshold. This threshold is applied to prevent mapping of irrelevant nodes which have a semantic similarity score below the threshold.
In some cases, there could be more than one node with the same highest $sim_{average}$. In these cases, the system prefers the nodes with a closer level of detail. The level of detail of a node depends on its level in the WBS hierarchy. Detail in the hierarchy decreases from the lowest to the highest level. Since the lowest level of a WBS usually contains the tasks with the highest level of detail, the level of detail of each node is assessed based on the distance between its level and the lowest level in the WBS hierarchy. For this purpose, the system defines a weight between 0 and 1, which reflects the distance between the levels of detail of two nodes.
$bottom_{level}$ is the level of a node numbered starting from the lowest level in the hierarchy. For example, the $bottom_{level}$ and regular level of the nodes in a WBS (“House project”) are indicated in Fig. 5.
This weight is calculated from the absolute difference between the $bottom_{level}$ of two nodes, divided by the maximum number of levels that the two WBSs have (Eq. (14)).
$$level_{scores}(n_i, m_j) = \left|\frac{\left|bottom_{level}(n_i) - bottom_{level}(m_j)\right|}{\max(L_1, L_2)} - 1\right| \tag{14}$$
For example, $level_{scores}$ for nodes “Columns” and “Road” from the “House project” and “Bridge project” in Fig. 1 is calculated as
$$level_{scores}(\text{"Columns"}, \text{"Road"}) = \left|\frac{|1 - 2|}{3} - 1\right| = 0.66$$
and for “Columns” and “Girders” it is calculated as
$$level_{scores}(\text{"Columns"}, \text{"Girders"}) = \left|\frac{|1 - 1|}{3} - 1\right| = 1$$
This score increases the chance of node "Columns" being mapped to "Girders" instead of the node "Road", which has a lower $level_{scores}$. The following matrix is used to contain the node-to-node $level_{scores}$ for the nodes of two WBSs (see Eq. (15)).
$$level_{scores}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = \begin{bmatrix} level_{scores}(n_1, m_1) & \cdots & level_{scores}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ level_{scores}(n_i, m_1) & \cdots & level_{scores}(n_i, m_j) \end{bmatrix} \tag{15}$$
By multiplying the matrices $level_{scores}$ and $sim_{average}$, a matrix is formed which contains the scores that are used to find the mapped nodes (see Eq. (16)).
$$mapping_{scores}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = \begin{bmatrix} mapping_{score}(n_1, m_1) & \cdots & mapping_{score}(n_1, m_j) \\ \vdots & \ddots & \vdots \\ mapping_{score}(n_i, m_1) & \cdots & mapping_{score}(n_i, m_j) \end{bmatrix} \tag{16}$$
The system searches through the $mapping_{scores}$ matrix to find the highest mapping score. When the highest score is found, the system maps the corresponding nodes to each other and removes them from the search for the remaining matched pairs in the next runs. The system continues this procedure until all the possible nodes are mapped.
$mapped_{nodes}$ is a list of tuples $(n_i, m_j, sim_{average})$, in which $n_i$ and $m_j$ are mapped together with the average similarity of $sim_{average}$ (Eq. (17)).
$$mapped_{nodes} = \left\{\left(n_i, m_j, sim_{average}\right)\right\}, \quad n_i \in N_1 \tag{17}$$
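The level score of Eq. (14) and the greedy search over the mapping scores of Eq. (16) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, nodes are referred to by row and column index, the matrices are multiplied element by element, the threshold is applied to the average similarity, and ties are broken by the order in which pairs are visited.

```python
from typing import List, Tuple


def level_score(bottom_i: int, bottom_j: int,
                levels_1: int, levels_2: int) -> float:
    """Eq. (14): 1 minus the normalised gap between two bottom levels."""
    return abs(abs(bottom_i - bottom_j) / max(levels_1, levels_2) - 1)


def map_nodes(sim_average: List[List[float]],
              level_scores: List[List[float]],
              threshold: float) -> List[Tuple[int, int, float]]:
    """Greedy one-to-one mapping on sim_average * level_scores (Eq. (16)).

    Returns tuples (i, j, sim_average[i][j]) in the spirit of Eq. (17).
    """
    candidates = []
    for i, row in enumerate(sim_average):
        for j, avg in enumerate(row):
            # Assumption: pairs at or below the threshold are never mapped.
            if avg > threshold:
                candidates.append((avg * level_scores[i][j], i, j, avg))

    # Visiting pairs from the highest mapping score downwards and skipping
    # nodes that are already mapped is equivalent to repeatedly picking the
    # maximum and removing its row and column.
    candidates.sort(key=lambda c: c[0], reverse=True)
    used_i, used_j, mapped = set(), set(), []
    for _, i, j, avg in candidates:
        if i not in used_i and j not in used_j:
            mapped.append((i, j, avg))
            used_i.add(i)
            used_j.add(j)
    return mapped
```

For instance, if “Columns” had the same hypothetical average similarity of 0.8 to both “Road” and “Girders”, calling `map_nodes([[0.8, 0.8]], [[level_score(1, 2, 3, 3), level_score(1, 1, 3, 3)]], 0.5)` would map it to the second candidate (“Girders”), because that pair has the higher level score.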
Fig. 5. WBS of the “House project”; (a): Regular level, (b): Bottom level.
$$\text{Structural similarity}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = 1 - \frac{DE\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) + SE\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right)}{2} \tag{22}$$
Expert 3: Bridge construction 3 (B3); Concrete structure building 3 (C3); Steel structure building 3 (S3); Road maintenance 3 (M3); Hotel building 3 (H3)
The final score determines the Total similarity between $WBS_{N_1}^{L_1}$ and $WBS_{N_2}^{L_2}$, which is calculated as the average (see Eq. (23)) of the Node similarity (Eq. (18)) and Structural similarity (Eq. (22)) scores. This final measurement produces a score between 0 and 1, in which 0 hypothetically results from the comparison of two completely different projects, and 1.0 results from the comparison of two identical projects.
$$\text{Total similarity}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) = \frac{\text{Node similarity}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right) + \text{Structural similarity}\left(WBS_{N_1}^{L_1}, WBS_{N_2}^{L_2}\right)}{2} \tag{23}$$
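The arithmetic of Eq. (22) and Eq. (23) is small enough to state directly in Python. The sketch below is illustrative only: the function names are hypothetical, and the DE, SE, and node similarity values are taken as given inputs that are assumed to lie between 0 and 1.

```python
def structural_similarity(de: float, se: float) -> float:
    """Eq. (22): one minus the mean of the DE and SE terms."""
    return 1 - (de + se) / 2


def total_similarity(node_similarity: float, structural: float) -> float:
    """Eq. (23): mean of the node and structural similarity scores."""
    return (node_similarity + structural) / 2
```

Because both scores are averages of values in the range 0 to 1, the total similarity stays in that range as well, which is consistent with the interpretation of 0 as completely different and 1.0 as identical projects.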
Fig. 6. The 3D model of the steel structure building project (roof was sectioned to provide internal details).
4.1. Results
Table 3
Comparing S3_a with the stored samples using different thresholds.
Columns: Queried sample; Thresholds (for each threshold: Retrieved sample, Total similarity)
largest to the lowest similarity based on the total similarity metric. The results show that S1 and S2 were among the top three retrieved cases at all thresholds, which indicates that the method was able to retrieve the projects similar to an incomplete WBS with a precision score between 0.66 and 1.00, depending on the applied threshold.
5. Conclusion
Reuse of the knowledge and experience gained from completed construction projects can improve planning of new projects. In order to reuse knowledge, finding similar past projects is critical. This research was undertaken to develop quantitative similarity metrics to measure the similarity of construction projects using the WBS as their representative. These metrics were implemented using NLP techniques written in the Python programming language. The similarity metrics were evaluated based on two sets of experiments: first, the metrics were tested for fulfilment of the similarity properties, including symmetry and reflexivity; second, the metrics were tested to search among test samples and to find the cases relevant to the given samples.
The results show promising outcomes in compliance with similarity properties (i.e. symmetry and reflexivity) with small errors. The results of the second part of the experiments, which were the main focus of this research, revealed that the structural similarity metric had the best performance in retrieval of similar projects with thresholds in the range of 0.7 to 0.75.
6. Future works
The proposed method could identify similar projects using their WBSs. Future research can investigate the inclusion of major quantitative attributes, such as the work quantity of the tasks and their duration, to enhance the similarity assessment of construction projects. The vocabulary source in this research (i.e. WordNet) is a general and comprehensive source, and might not be able to provide flawless similarity assessments for some technical terms. Developing a specialized resource for construction technical words is a valuable opportunity for future research. Lastly, this study focused on development of a method to quantify the similarity of construction projects using their WBSs and did not explore retrieval of the information and documents of projects. Future work can investigate integration of the proposed method with existing knowledge retrieval systems, namely case-based reasoning.
7. Data availability
The source code of the developed program (in the Python programming language) is publicly available and can be found at: https://osf.io/b8qvy/ Data generated or analysed during the experiments are available from the corresponding author upon reasonable request.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This research project was funded by Discovery grant RGPIN-2015-03812 from Natural Sciences and Engineering Research Council of Canada.
References
[1] M. Al Qady, A. Kandil, Concept relation extraction from construction documents using natural language processing, J. Construct. Eng. Manage. 136 (3) (2010) 294–302.
[2] M. Alavi, D.E. Leidner, Knowledge management and knowledge management systems: conceptual foundations and research issues, MIS Quarterly (2001) 107–136.
[3] S.H. An, G.H. Kim, K.I. Kang, A case-based reasoning cost estimating model using experience by analytic hierarchy process, Build. Environ. 42 (7) (2007) 2573–2579.
[4] T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler, 2000. Extensible Markup Language (XML) 1.0. W3C Recommendation 6 October 2000. Available via the World Wide Web at http://www.w3.org/TR/1998/REC-xml-19980210.
[5] M. Buckland, F. Gey, The relationship between recall and precision, J. Am. Soc. Inform. Sci. 45 (1) (1994) 12–19.
[6] S.H. Chen, A.J. Jakeman, J.P. Norton, Artificial intelligence techniques: an introduction to their use for modelling environmental systems, Math. Comput. Simul 78 (2–3) (2008) 379–400.
[7] G.G. Chowdhury, Natural language processing, Ann. Rev. Inform. Sci. Technol. 37 (2003) 51–89.
[8] R. Costa, C. Lima, J. Sarraipa, R. Jardim-Gonçalves, Facilitating knowledge sharing and reuse in building and construction domain: an ontology-based approach, J. Intell. Manuf. 27 (1) (2016) 263–282.
[9] R. Dijkman, M. Dumas, B. Van Dongen, R. Käärik, J. Mendling, Similarity of business process models: metrics and evaluation, Inform. Syst. 36 (2) (2011) 498–516.
[10] M. Ehrig, A. Koschmider, A. Oberweis, Measuring similarity between semantic business process models, in: Proceedings of the fourth Asia-Pacific Conference on Conceptual modelling, vol. 67, 2007, pp. 71–80.
[11] S. Gasik, A model of project knowledge management, Project Manage. J. 42 (3) (2011) 23–44.
[12] Y.M. Goh, D.K.H. Chua, Case-based reasoning approach to construction safety hazard identification: adaptation and utilization, J. Construct. Eng. Manage. 136 (2) (2010) 170–178.
[13] W.H. Gomaa, A.A. Fahmy, A survey of text similarity approaches, Int. J. Comput. Appl. 68 (13) (2013) 13–18.
[14] J. Hahn, M. Subramani, A framework of knowledge management systems: issues and challenges for theory and practice, ICIS 2000 Proc. 28 (2000).
[15] P.E. Hart, N.J. Nilsson, B. Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Trans. Syst. Sci. Cybernetics 4 (2) (1968) 100–107.
[16] G.T. Haugan, Effective Work Breakdown Structures, Berrett-Koehler Publishers, 2001.
[17] Y.M. Ibrahim, A.P. Kaka, E. Trucco, M. Kagioglou, A. Ghassan, Semi-automatic development of the work breakdown structure (WBS) for construction projects. In: Proceedings of the 4th International SCRI Research Symposium, Salford, UK, 2007.
[18] Y. Jung, S. Woo, Flexible work breakdown structure for integrated cost and schedule control, J. Construct. Eng. Manage. 130 (5) (2004) 616–625.
[19] D. Jurafsky, J.H. Martin, 2014. Speech and Language Processing, vol. 3.
[20] G.H. Kim, S.H. An, K.I. Kang, Comparison of construction cost estimating models based on regression analysis, neural networks, and case-based reasoning, Build. Environ. 39 (10) (2004) 1235–1242.
[21] J.L. Kolodner, An introduction to case-based reasoning, Artif. Intell. Rev. 6 (1) (1992) 3–34.
[22] T.K. Landauer, S.T. Dumais, A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge, Psychol. Rev. 104 (2) (1997) 211.
[23] V.I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady, vol. 10, no. 8, 1966, pp. 707–710.
[24] H.T. Lin, N.W. Chi, S.H. Hsieh, A concept-based information retrieval approach for engineering domain-specific technical documents, Adv. Eng. Inf. 26 (2) (2012) 349–360.
[25] A. Maedche, S. Staab, Measuring similarity between ontologies. In: International conference on knowledge engineering and knowledge management, Springer, Berlin, Heidelberg, 2002, pp. 251–263.
[26] M.L. Maher, A.G. de Silva Garza, Developing case-based reasoning for structural design, IEEE Expert 11 (3) (1996) 42–52.
[27] L. Meng, R. Huang, J. Gu, A review of semantic similarity measures in WordNet, Int. J. Hybrid Inform. Technol. 6 (1) (2013) 1–12.
[28] R. Mihalcea, C. Corley, C. Strapparava, Corpus-based and knowledge-based measures of text semantic similarity, AAAI 6 (2006) 775–780.
[29] E. Mikulakova, M. König, E. Tauscher, K. Beucke, Knowledge-based schedule generation and evaluation, Adv. Eng. Inf. 24 (4) (2010) 389–403.
[30] G.A. Miller, WordNet: a lexical database for English, Commun. ACM 38 (11) (1995) 39–41.
[31] J. Park, H. Cai, WBS-based dynamic multi-dimensional BIM database for total construction as-built documentation, Autom. Constr. 77 (2017) 15–23.
[32] PMBOK® Guide, 2017. Sixth edition, Project Management Institute.
[33] Princeton University “About WordNet.” WordNet. Princeton University, 2010.
[34] Y. Qiao, J.D. Fricker, S. Labi, Quantifying the similarity between different project types based on their pay item compositions: application to bundling, J. Construct. Eng. Manage. 145 (9) (2019) 04019053.
[35] B. Raphael, B. Domer, S. Saitta, I.F. Smith, Incremental development of CBR strategies for computing project cost probabilities, Adv. Eng. Inf. 21 (3) (2007) 311–321.
[36] P. Resnik, Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007, 1995.
[37] M.M. Richter, Classification and learning of similarity measures, in: Information and Classification, Springer, Berlin, Heidelberg, 1993, pp. 323–334.
[38] H.G. Ryu, H.S. Lee, M. Park, Construction planning method using case-based reasoning (CONPLA-CBR), J. Comput. Civil Eng. 21 (6) (2007) 410–422.
[39] G. Salton, M.E. Lesk, Computer evaluation of indexing and text processing, J. ACM (JACM) 15 (1) (1968) 8–36.
[40] E. Siami-Irdemoosa, S.R. Dindarloo, M. Sharifzadeh, Work breakdown structure (WBS) development for underground construction, Autom. Constr. 58 (2015) 85–94.
[41] J.F. Sowa, Semantic networks, 2012.
[42] M. Sutrisna, C.D. Ramanayaka, J.S. Goulding, Developing work breakdown structure matrix for managing offsite construction projects, Arch. Eng. Des. Manage. 14 (5) (2018) 381–397.
[43] J.H.M. Tah, V. Carr, R. Howes, Information modelling for case-based construction planning of highway bridge projects, Adv. Eng. Softw. 30 (7) (1999) 495–509.
[44] H.C. Tan, P.M. Carrillo, C.J. Anumba, N. Bouchlaghem, J.M. Kamara, C.E. Udeaja, Development of a methodology for live capture and reuse of project knowledge in construction, J. Manage. Eng. 23 (1) (2007) 18–26.
[45] H.P. Tserng, Y.C. Lin, Developing an activity-based knowledge management system for contractors, Autom. Constr. 13 (6) (2004) 781–802.
[46] A. Tversky, Features of similarity, Psychol. Rev. 84 (4) (1977) 327.
[47] Y.R. Wang, G.E. Gibson Jr, A study of preproject planning and project success using ANNs and regression models, Autom. Constr. 19 (3) (2010) 341–346.
[48] Z. Wu, M. Palmer, 1994. Verb semantics and lexical selection. arXiv preprint cmp-lg/9406033.
[49] J. Zhang, N.M. El-Gohary, 2013. Information transformation and automated reasoning for automated compliance checking in construction. In: Computing in civil engineering, 2013, pp. 701–708.
[50] J. Zhang, N.M. El-Gohary, Semantic NLP-based information extraction from construction regulatory documents for automated compliance checking, J. Comput. Civil Eng. 30 (2) (2016) 04015014.