Abstract: New forms of data science, including machine learning and data analytics, are enabled by machine-readable information but are
not widely deployed in construction. A qualitative study of information flow in three projects using building information modeling (BIM) in
the late design and construction phase is used to identify the challenges of codification that limit the application of data science. Despite
substantial efforts to codify information with common data environment (CDE) platforms to structure and transfer digital information within
and between teams, participants work across multiple media in both structured and unstructured ways. Challenges of codification identified in
this paper relate to software usage (interoperability, information loss during conversion, multiple modelling techniques), information sharing
(unstructured information sharing, drawing and file based sharing, document control bottlenecks, lack of process change), and construction
process information (loss of constraints and low level of detail). This paper contributes to the current understanding of data science in
construction by articulating the codification challenges and their implications for data quality dimensions, such as accuracy, completeness,
accessibility, consistency, timeliness, and provenance. It concludes with practical implications for developing and using machine-readable
information and directions for research to extract insight from data and support future automation. DOI: 10.1061/(ASCE)CO.1943-
7862.0001846. © 2020 American Society of Civil Engineers.
Author keywords: Building information modeling (BIM); Codification; Artificial intelligence (AI); Automation; Data science; Machine
readability; Construction.
Across sectors, poor data quality is costing $3.1 trillion in the United States (Quintero et al. 2015). In addition, the poor quality of data is increasing operational costs, decreasing revenue, and resulting in missed commercial opportunities (Loshin 2010). Within construction, Sacks et al. (2017) have described how the quality of input information influences semantic enrichment of BIM when using machine learning. Moreover, Farias et al. (2018) have shown the effects of poor quality data resulting in wrong inferences when they tried to extract building views using a rule-based method. Whyte et al. (2016) articulated how managing change in large datasets becomes a focus in an era of big data in which project information is increasingly characterized by volume, velocity, and variety. Recent work further characterizes such data as also including veracity and value (e.g., Younas 2019). These issues of data quality occur because construction data is heterogeneous, and its veracity is not always known. Data cleaning related to the variety (heterogeneity) and veracity characteristics of big data is especially difficult when compared to data cleaning related to other characteristics such as volume and velocity (Fan 2015; Janssen et al. 2017). Therefore, there is a need to keep the data of the highest quality and in a machine-readable format (maintaining the data relationships) as far as possible to have the best inferencing.

Quality and Machine Readability

What constitutes a good quality dataset? According to Wang and Strong (1996), a good quality dataset is one that has enough information embedded in it for a particular use by the user. Researchers have set out multiple dimensions to assess data quality concerning big data analytics (Batini and Scannapieco 2016; Cai and Zhu 2015; Delone and McLean 2003; Naumann and Rolker 2000; Wang and Strong 1996). For this paper, the focus is on the following data quality dimensions based on Batini and Scannapieco (2016) because they best reflect the implications of codification challenges. Accuracy is the closeness of the measured/represented data to reality. There are two kinds of accuracy: semantic and syntactic. Semantic accuracy relates to the closeness of the data value to reality, whereas syntactic accuracy refers to the closeness of the data representation to the expected data type/model. Completeness is the measure of the information content present in the data compared to the extent of information content required to be present in the data to perform a particular task. Temporal dimensions refer to currency, volatility, and timeliness. Currency relates to the promptness of data updates. Volatility refers to the frequency at which the data variance occurs. Timeliness refers to the suitability of the current data to perform a task. Consistency refers to uniformity and constancy of data with respect to the semantic rules defined over multiple data items. Accessibility refers to the ability of data to be accessed by a user (human user or computer program) and generate information from it.

Berners-Lee (2006) set out rules for publishing data so that it can be interpreted easily by machines. The first one is to index the data to make it digitally accessible by storing it on online servers so that it can be easily accessed by computers as well as people. This relates to the accessibility dimension of data quality. Indexing the data and storing it online increases accessibility because it is easy to find. The second rule is to structure the data with relevant schemas for easy interpretation by machines. This step makes the data structured such that semantic relations are embedded within it, resulting in better inferencing and thus improving the syntactic accuracy of the data. The third rule is to make the schemas public and machine readable by using open-source schemas to describe the data model so that interpretations can be made by computers without any proprietary data interfaces. Proprietary data formats limit data inferencing as the schema by which data is modeled is only accessible to few applications. Therefore, using an open-source schema would increase the accessibility dimension because more applications can use the open schema to derive the context of data for inferencing. The last rule is to link the data with other datasets so that better inferences can be made by deriving the context information. This improves the consistency dimension of data quality because the same data is linked to multiple datasets. Linking would ensure that there are no conflicts in the data about a concept stored in multiple databases. Based on these rules, structured information can be classified into five types in increasing order of machine readability, as shown in Table 1. Increasing the machine readability of the data, in turn, increases the data quality because the dimensions relating to accessibility, accuracy, and consistency are improved in the process.

Table 1. Level of machine readability of the data based on linked data principles set out by Berners-Lee (2006)

Quality of data   Principles for publishing a machine-readable dataset
1-star            Data is available on the web
2-star            1-star data structured in a proprietary format
3-star            1-star data structured in a nonproprietary format
4-star            3-star data that is published using open standards
5-star            4-star data with links to other 4-star datasets

Limits of Existing Research

What is limiting the generation of good quality machine-readable information within the sector? It does not appear to be technical development, because novel technical solutions are being developed by construction informatics researchers with a focus on the integration of data in the sector; for example, through the use of data standards (Krijnen and Beetz 2017; Pazlar and Turk 2008), cloud-based BIM (Beetz et al. 2010; Singh et al. 2011; Zhang et al. 2017a), and linked-data technologies (Kim et al. 2018; Pauwels et al. 2015).
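To make these levels concrete, the following minimal sketch (in Python, assuming the rdflib package is available; the namespace, identifiers, and property names are hypothetical and not drawn from the studied projects) contrasts a flat, application-specific record (roughly 2-star: structured, but its meaning lives in the authoring tool) with the same facts expressed as linked data, where the schema is explicit and the record points to other resources:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

# 2-star style: structured, but column meanings are implicit in the authoring tool.
flat_row = {"doc": "DRG-0153", "element": "slab-L3-014", "status": "approved", "date": "2018-07-16"}
print(flat_row)

# 4/5-star style: the same facts with an explicit schema and links to other resources.
EX = Namespace("http://example.org/project/")  # hypothetical project namespace
g = Graph()
g.bind("ex", EX)
drawing = EX["document/DRG-0153"]
g.add((drawing, RDF.type, EX.Drawing))
g.add((drawing, EX.describes, EX["element/slab-L3-014"]))      # link to a resource in another dataset
g.add((drawing, EX.approvalStatus, Literal("approved")))
g.add((drawing, EX.approvedOn, Literal("2018-07-16", datatype=XSD.date)))
print(g.serialize(format="turtle"))  # rdflib 6+ returns the Turtle text as a string

Because the second representation names its schema and links to other identifiers, another application can interpret and join the data without access to the originating tool, which is what moves a dataset up the scale in Table 1.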
• Multistory residential student apartment block (Case 1): To examine this case, the first author reviewed construction documents and models, and studied the software tools used to understand the information embedded in the BIMs, construction schedules, and other reports such as design calculations, method statements, and so on. These documents were centrally hosted on a CDE, and the first author had access to it while being there at the office.
• Metro rail infrastructure project in India (Case 2): To examine this case, documents such as the BIM execution plan, presentation documents for training, and press releases were studied to understand digital information management practices. The project manager, chief site engineer, casting yard engineer, and BIM consultants, who form a cohort of the key decision makers during the construction stage, were interviewed informally to gain insight into the extent of codification in the information flow during their daily work practices. Field notes were taken during the interviews. In addition to these interviews, the casting yard, viaduct construction site, and a station site were visited to understand the on-ground practice of various activities. Further insight into this case was obtained through a workshop, co-organized by the authors, with 40 participants, including client representatives of six major Indian metro-rail projects along with technology providers and delivery teams. The workshop provided a perspective on the digitization of this project in the broader landscape of Indian metro rail construction.
• Water infrastructure project in the United Kingdom (Case 3): To understand the codification challenges in this case, eight semistructured interviews were conducted. All eight interviewees had more than 10 years of experience in the construction sector and had worked with different major projects in the United Kingdom and abroad. The interviewees' areas of expertise covered design, planning, project engineering, digital engineering, prefabricated construction, and information management. All interviewees had teams working with them on their areas of specialization and also interacted with the other stakeholders in the project. These characteristics make them ideal for case-based research. Following the semistructured approach ensured participants would talk broadly about their experiences with information flow using digital collaboration tools. Seven out of the eight interviews were recorded, and transcripts were made from the recordings. In addition to the interviews, the first author conducted multiple visits over 2 weeks to the project office, observing meetings and the work practice. The first author also had access to the CDE and documents such as a construction program, look-ahead schedule, and method statements.

Data analysis took place in three steps. First, each of the cases was separately analyzed. Second, the cases were compared and contrasted. The initial analyses were conducted during data collection, so early analyses focused and directed later data collection. The within-case analyses and the cross-case comparisons also led to the identification of issues in each case. In the metro project, such issues were also present, but there were also issues of unstructured communication channels, document control, and lack of skills to adopt digital technologies. In the water infrastructure project, additional issues were identified concerning problems with the CDE, lack of process codification, and long processing times. In the water infrastructure project, the design workflow, work package plan, construction program, BIM models, and drawings in the CDE were studied to understand the level of detail and machine readability of the documents. Coding was done on the field notes and interview transcripts to identify different issues related to codification and information sharing. Software was used to track the patterns emerging from these data. These codes were organized to find themes. The identified themes were then analyzed based on the data quality dimensions to understand their implications for data quality.

Codification Challenges in Construction

Table 2 summarizes the codification challenges observed from studying the projects. Low machine readability of data is a significant challenge for codification, which was observed across the projects. Product information is well codified through BIM, CAD drawings, analysis models, and so on in all the projects. However, the codified information is distributed amongst different formats and databases, limiting the application of analytics. In addition, multiple modes of communication, multiple CDEs, and lack of process change also limit the codification of information in the projects studied. The different codification challenges observed in the cases are mapped in Table 2. These topics are discussed in detail in this section.

Table 2. Codification challenges across the cases (X = observed; — = not observed)

Codification challenges   Observations                            Student apartment   Metro project   Water project
Software usage            Interoperability                        X                   X               X
                          Information loss during conversions     X                   —               X
                          Modeling technique                      —                   —               X
Information sharing       Unstructured information sharing        X                   X               X
                          Drawing and file-based sharing          X                   X               X
                          Document control bottleneck             —                   X               X
                          Lack of process change                  X                   X               X
Process information       Loss of constraints                     X                   X               X
                          Low level of detail                     X                   X               X
Document Control Bottlenecks

the document control there sends it to whoever the lead is, and the lead then sends it on to whoever's doing the work–and that may take a week or two. And that's completely wasted time, and no one in the middle of that process has done any work [...] And actually, by the time it gets to the people who are reviewing the actual technical data, it may be three or four weeks later [...] In fact, I did it yesterday, I sent a load of RFIs through to [designer] informally, five minutes after I'd sent it through my document control. (Principal engineer, C3I8)

To bypass this obstacle, workers send the information through an unstructured channel in addition to the structured workflows. This is partly because of poor understanding amongst the project participants of the document control workflows and of why these structured workflows and the CDE are required. The slow processes and the need to complete tasks before deadlines also force employees not to follow document control workflows.

There is generally a poor understanding of document control requirements, certification requirements [...] we're finding that general good practice that people should have brought with them from other projects is being conveniently put to one side for the purposes of expediting the work that people are being asked to do. (Information manager, C3I4)

In addition, the workflows in the CDE are complex and not intuitive, making it difficult for users to follow the protocols for document control.

Just because of the way they need to store it in certain places and stuff like that, and it can't be done [...] The way that [CDE1] is set up here, I believe it is not easy to use (Project engineer, C3I1)

It seems very complex, you open them up, there's lots of things going on [...] I just want to know where I can get my latest drawing (Digital engineer, C3I3)

This has resulted in employees bypassing the workflow, which leads to system conflicts, further delays in the processes and, at times, the information not being stored on the CDE. "I think someone within the doc management system had obviously circumnavigated it somehow, to get the drawings out. And then when we were trying to get the said revisions for our set out, the system wouldn't allow it because the directory hadn't been properly created." (Technical manager, C3I5). Not following the document control workflows leads to information loss in the CDE. This is a major setback to codification because the data is stored in an unstructured way that is difficult to access. This problem was found in the metro project as well, where users move back to the older ways of information sharing to expedite the task. When users find it difficult to utilize the CDE for structured information flow, they bypass the workflows to get the work expedited. This leads to the loss of metadata, document trails, and information dependencies because these unstructured communication channels offer limited or no codification. In addition, the document control workflow itself makes the process slow. Document control bottlenecks have multiple implications for data quality. Firstly, the value for the timeliness dimension is lowered, as the data that is published might not be the latest. Hence, inferences are based on old data, which leads to false interpretations. Secondly, this lowers the semantic accuracy of the data because its attributes might no longer be true. This also introduces the problem of consistency. Depending on which database employees look at, they see different values. For instance, one CDE to which the information was packaged would have the latest value, while the one that must go through another document controller would have a different value. This tampers with the idea of a "single source of truth." In addition, when users circumnavigate the workflow, there are more data quality issues related to unstructured information sharing, such as accessibility and provenance.

Lack of Process Change

Even though structured information flow is digitized through the introduction of a CDE, the process enabling information flow remains unchanged. For example, for a piece of information to be approved, it must be printed, associated with a cover sheet, and signed.

We're actually going to export that out of the CDE, we're going to print it out, we're going to staple it together, we're going to put our own cover sheet on the front of it, with the exact same details on the back and we're going to go off and go and get three signatures, scan it back in, put it back into [CDE1] and submit it. (Project engineer, C3I1)

Printing and scanning the document results in loss of metadata. A scanned document in a PDF format has little machine-readable information in it. Inferring the contents from a scanned document is also resource intensive when compared to its original form. In the process of printing and scanning, the content becomes digitally accessible but not machine readable, thereby limiting the application of data science.

What I'm finding now is it's not actually speeding everything up, it's sort of making everything a lot slower; which I find very frustrating (Principal engineer, C3I8)

The lack of change in the processes reduces the value of adopting a CDE, as it increases the time to do these tasks rather than reducing it.
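The "single source of truth" problem described above can be made concrete with a small sketch (Python; the document numbers, revisions, and register structure are hypothetical) that compares the revision registers exported from two CDEs and flags documents whose latest revision differs, which is the kind of inconsistency and stale data that document control bottlenecks and parallel channels produce:

from datetime import date

# Hypothetical revision registers (document number -> (latest revision, date issued)).
cde_client = {
    "DRG-0153": ("P03", date(2018, 7, 2)),
    "DRG-0167": ("P05", date(2018, 7, 9)),
}
cde_designer = {
    "DRG-0153": ("P04", date(2018, 7, 11)),  # newer revision never pushed to the client CDE
    "DRG-0167": ("P05", date(2018, 7, 9)),
}

def find_conflicts(register_a, register_b):
    """Return documents whose latest revision differs between the two registers."""
    conflicts = []
    for doc in sorted(set(register_a) & set(register_b)):
        if register_a[doc][0] != register_b[doc][0]:
            conflicts.append((doc, register_a[doc], register_b[doc]))
    return conflicts

for doc, in_client, in_designer in find_conflicts(cde_client, cde_designer):
    print(f"{doc}: client CDE holds {in_client[0]} ({in_client[1]}), "
          f"designer CDE holds {in_designer[0]} ({in_designer[1]})")

Such a check is only possible where revision metadata is codified in both environments; once information moves through e-mail, spreadsheets, or scanned documents, there is no register left to compare.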
Our client prefers [CDE1], and we have the designer who stores things in [CDE2], so we have both of those tools, and we have to balance between those two. That can be very confusing when we have two platforms (Technical manager, C3I5).

For the information submissions to the clients, CDE1 was used; for the information from the design consultants to the contractor, CDE2 was used; and for internal file handling and sharing with the contractor, CDE3 was used. CDE1 and CDE3 came from the same vendor. However, CDE2 was from a different vendor.

Everything had to be taken out of one data environment and pushed into another. One of the issues with that is the consistency or the compliance or knowing the latest versions of information (Digital engineer, C3I6)

The existence of multiple CDEs within a project introduces the problem of data inconsistencies. Documents must be taken out of one CDE and placed in another. When the volume of information is huge, with each file having multiple versions, it is difficult to maintain consistency of documents across multiple CDEs. This means that information in a CDE might not be accurate and up to date, leading to incorrect interpretations when data analytics is performed on it.

As the contractor, then we have to deliver it to a completely separate, disconnected CDE [...] we're double-handling (Digital engineer, C3I3)

When the CDEs are disconnected, the document trail is lost when a document is moved from one CDE to another, leading to the loss of traceability.

Lack of process change has multiple implications for data quality as well. Printing and scanning remove metadata and data relationships from the files. A scanned version of a file also has very little machine-readable information embedded in it and requires resource-intensive methods to extract insights from it. This reduces data quality dimensions such as accessibility (i.e., the metadata and data relationships are removed), completeness (information is not complete), and provenance (the document trail is lost in the process). In addition, having multiple CDEs introduces the data quality issue associated with provenance because files residing in multiple CDEs are disconnected, and the relationships of a particular file with another file are lost in the transition process. There are further issues with synchronization of the information when it is distributed in multiple CDEs. This introduces the data quality issues associated with consistency, which limits the quality of findings made by inference algorithms.

Construction Process Information

This section presents the codification challenges related to construction process information. The data analysis shows that product information is relatively well structured as BIM models, analysis models, and CAD drawings. However, construction process information is much less well codified.

Loss of Constraint Information

In construction, the constraints for any activity execution are discussed during team meetings because the constraints span between different teams. For example, logistics constraints span amongst prefabrication, logistics, and site teams. These discussions lead to the removal of constraints by rearranging the start and end times for the activities. These are then translated to Gantt charts as an output.

So, from an engineering perspective, we have to interpret engineering information, whether it be drawing or written constraints, written narratives and interpret those into a Gantt chart. So, we physically need that information to know what we're building and what the constraints in building it are. (Project planner, C3I2)

The above statement from the project planner provides evidence of the processes used to convert the constraints into a Gantt chart for communication. However, during this process, many of the constraints themselves, and thus the context information for rearranging the activities, are lost. This is because Gantt charts can hold only precedence constraints. Other constraints, such as disjunctive (where activities cannot overlap) and logical constraints, are not embedded into the Gantt model. Instead, they are retained only as tacit information by the project participants involved in team meetings. This is a case of incomplete information within the dataset because this information is only accessible to the meeting participants. For example, one of the meetings in the water project had an issue with piling, where the pile-driving equipment did not have access to the site for a specific date because there was another activity going on which limited the width of the site access road. At the meeting, this was raised:

Access chamber works will conflict with access road for pile work, piling work package has to be moved back 2 weeks. (Progress review meeting, C3M3)

Here, there is a dependency between the access chamber works and the piling work package, as the access chamber work would reduce the road width. Therefore, the piling activity was delayed to a later date. The constraint was removed. However, the knowledge that there was a constraint is not recorded, and thus the presence of that constraint is not codified. This means such constraints are not machine readable because access to this information was limited to the participants of a particular meeting. If an automatic scheduler is used to reschedule these activities, it would not have access to this information, resulting in an unrealistic schedule.

Similar issues were observed in all cases. This issue introduces data quality problems associated with accessibility (information is limited to people who attended the meeting) and data completeness (the model does not include any constraint information; hence, it is incomplete).
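The difference between what the meeting decided and what the Gantt chart records can be illustrated with a minimal scheduling sketch (Python; durations and activity names are hypothetical stand-ins for the piling example above). With only precedence links, an automatic scheduler happily overlaps the access chamber works and the piling; only when the disjunctive (no-overlap) constraint raised in the meeting is represented does it reproduce the two-week delay:

# Activities with durations in working days (hypothetical figures).
activities = {
    "access_road":    {"duration": 5,  "predecessors": []},
    "access_chamber": {"duration": 10, "predecessors": ["access_road"]},
    "piling":         {"duration": 15, "predecessors": ["access_road"]},
}

def earliest_start_schedule(acts, no_overlap=()):
    """Naive earliest-start scheduler: respects precedence links and, optionally,
    pairs of activities that must not overlap (a disjunctive constraint)."""
    start = {}
    for name in acts:  # the dict is assumed to be in a valid topological order
        est = max((start[p] + acts[p]["duration"] for p in acts[name]["predecessors"]), default=0)
        for a, b in no_overlap:
            other = b if name == a else a if name == b else None
            if other in start:  # push this activity after the conflicting one
                est = max(est, start[other] + acts[other]["duration"])
        start[name] = est
    return start

print(earliest_start_schedule(activities))
# {'access_road': 0, 'access_chamber': 5, 'piling': 5}  -> piling overlaps the chamber works
print(earliest_start_schedule(activities, no_overlap=[("access_chamber", "piling")]))
# {'access_road': 0, 'access_chamber': 5, 'piling': 15} -> piling pushed back two weeks

The precedence-only model is exactly what survives in the Gantt chart; the no_overlap argument stands for the constraint that, in the cases studied, was retained only as tacit knowledge by the meeting participants.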
Fig. 1. Visual representation showing the effect of low level of detail in process information: the deck is completed before the completion of the supporting pier.
Low Level of Detail

The precedence information codified into Gantt charts is linked with BIM to create 4D BIM simulations. However, the lack of detail in work packaging and associated information (such as constraints and resources) results in misinterpretation from the 4D models. The metro project follows a 5D BIM workflow (digital project management PPT, C2D2), where the schedule is linked to a BIM model, bill of quantities (BOQ), and an enterprise resource planning (ERP) system to compare the cost based on the quantities versus the cost stated in the work orders from the subcontractors. The progress information is also linked to this model to ensure that the work is done before sanctioning the bills for the work orders.

Work package for three spans were linked to a work order. Model showed the deck for a span was completed before the pier supporting it was completed because the work package for the first span was reported as completed. (Field notes, BIM Consultant 2, C2I5)

The deck of the metro can be completed only when the pier supporting it is completed, as shown in Fig. 1. However, the system recorded the deck assembly as completed when the pier was not completed. This was because the work packages for the deck and pier were different, since they were done by different subcontractors, and the level of detail of the work package is low. The subcontractor who dealt with deck assembly had completed a part of the work package, but the lack of detail in work packaging triggered the computer to record the whole work package as completed, resulting in the error. This is a clear case of lack of detail in the model leading to wrong inferencing.

Similar issues were observed in the student apartment (Case 1) by examining the schedule data in the construction program (C1D1) and in the water project (Case 3) by examining the construction program update (C3D4) and observing the review meeting (C3M3). These issues are caused by low data quality due to incomplete information related to the completeness dimension.
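The rollup error described above can be reproduced with a short sketch (Python; the work packages, elements, and dependency mapping are hypothetical simplifications of the deck-and-pier case). Because progress is reported at the level of a coarse work package, every element it covers inherits the "complete" flag, and a simple check against the physical dependencies exposes the implausible state:

# Hypothetical progress records: one coarse work package per subcontractor,
# each covering several physical elements.
work_packages = {
    "WP-DECK-01": {"elements": ["deck-span-1", "deck-span-2", "deck-span-3"], "reported_complete": True},
    "WP-PIER-01": {"elements": ["pier-1", "pier-2", "pier-3"], "reported_complete": False},
}

# Physical dependency: each deck span needs its supporting pier.
supports = {"deck-span-1": "pier-1", "deck-span-2": "pier-2", "deck-span-3": "pier-3"}

def element_status(wps):
    """Roll the coarse work-package flag down to every element it covers."""
    status = {}
    for wp in wps.values():
        for element in wp["elements"]:
            status[element] = wp["reported_complete"]
    return status

status = element_status(work_packages)
for deck, pier in supports.items():
    if status[deck] and not status[pier]:
        print(f"Implausible progress record: {deck} reported complete before {pier}")

Had progress been codified at the element (or span) level, the single completed span would not have marked the whole deck as finished, and the 4D/5D model would not have contradicted the physical sequence shown in Fig. 1.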
Discussion

This section discusses the software usage, information sharing, and construction process information codification challenges that limit the uptake of data science in construction, drawing on the evidence from the empirical study. The discussion relates the findings to the literature on BIM use in practice (e.g., Dossick and Neff 2010; Harty and Whyte 2010) and other strands of research on data quality, machine readability, and BIM adoption and implementation to articulate how these new analyses contribute by extending the understanding of codification challenges. Furthermore, building on and extending Batini and Scannapieco (2016), it shows how these codification challenges are mapped to their data quality dimensions, such as accuracy, completeness, timeliness, consistency, accessibility, and data provenance.

Software Usage

The findings on software usage show that, despite significant digitization of work processes, data remains fragmented into different domains and formats because of the multiple software tools in use across the organizations involved in construction. Work in the construction information technology community is pioneering new data management solutions to improve interoperability (Hu et al. 2016; Pauwels et al. 2010; Pazlar and Turk 2008; Redmond et al. 2012), and it is disappointing to find that construction projects still suffer from poor quality data as a result of problems of interoperability caused by the existence of multiple domain-specific tools and modeling practices. In their work, Dossick and Neff (2010) have previously shown how the organizational and cultural divisions between the designers and builders, contractors, and subcontractors stifle collaborative work. This paper shows these issues are not resolved. In the projects studied, organizational and cultural divisions between the firms involved in the late design and construction stages of projects cause software usage problems (interoperability, information loss during format conversion, and multiple modeling techniques). While there may be shared norms and tools within a firm for modeling information, these norms differ across the firms that modeled project information. Multiple modeling techniques and data created using different software result in datasets that are not interoperable and require format conversions, resulting in loss of information and low machine readability. The water project (Case 3) had multiple firms working on the data over different phases of the project, with interoperability problems more prevalent in this case in comparison with the student apartment (Case 1), in which a single leading firm was involved with the creation and use of the model. Although the metro project (Case 2) had different firms over the different phases of the project, digital data creation was handled through a single owner support organization, limiting the impact of this problem. Software usage problems are found to lead to challenges of codification for data science; hence, this work extends prior insights by Dossick and Neff (2010) to show how organizational and cultural divisions between designers and builders not only stifle collaborative work and joint problem-solving but also limit the quality of the data shared between firms. To achieve better-quality data in projects, practitioners must focus beyond the individual scope of their multiple firms towards the common goals of the project.

Information Sharing

The analyses suggest that the construction sector has not yet made the transition from document-based to model-based ways of organizing digital data. The use of drawings and file-based sharing, unstructured information sharing, printing and scanning of documents, multiple CDEs, and so forth in information sharing has a significant impact on the machine readability of data. Paper-based practices are institutionalized in the sector, and while they are being replaced by digital ways of working, this change is slow, with users of construction information still conditioned to work with drawings and PDFs and unstructured information sharing. Even in projects that are championing newer BIM-based workflows using CDEs, this work finds it is difficult to replace these practices, as evidenced by the problems associated with information sharing (i.e., the section "Information Sharing"). The complexity and long processing times involved in these workflows force users to shift back to existing practices and workarounds to expedite their work. The findings from this paper also support the previous characterization of users in construction as combining new structured methods of information sharing with the prior practice of unstructured information sharing when they are hindered by bottlenecks in processes, such as document control. Thus, we can characterize the project participants' use of a range of new and existing practices together as "hybrid practices" (Harty and Whyte 2010), and their circumnavigation of workflows results in unstructured information sharing [as shown in a previous discussion in Whyte et al. (2016)]. However, this paper goes beyond such studies to characterize the implications for data quality and to highlight, building on Hartmann (2008), the potential to develop newer workflows and digitally enabled processes which address the challenges faced by practitioners.

Construction Process Information

Giretti et al. (2012) have further reported the lack of correlation between the resources employed hourly and work progress. This led to the decomposition of tasks into subtasks to determine causal relationships between the involved variables so that the whole progress could be determined. This study suggests that to overcome such reported issues, the methods for codifying construction process information must be more detailed. The institutionalized practice of planning being limited to master planning and phase planning, without a focus on granular planning such as look-ahead planning and weekly planning, is causing this codification challenge, with the lack of semantic relationships embedded in the model limiting the application of automatic schedulers. These issues suggest both a change in the modeling of process information in construction, with the need to develop tools that support modeling of complex constraint information, and also a change in practice to codify the process information in greater detail so that data science could be employed to augment decision making in construction.

Machine Readability of Construction Datasets

This section discusses the machine readability of the construction datasets. Common construction datasets are classified based on the set of rules for creating structured data described by Berners-Lee (2006) in Table 4.
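One way to operationalize such a classification is sketched below (Python; the field names and example dataset descriptions are hypothetical, and the scoring is a simplified reading of the levels in Table 1 rather than a formal implementation of the Berners-Lee criteria):

def machine_readability_stars(record):
    """Score a dataset description against the five levels summarized in Table 1."""
    if not record.get("available_online"):
        return 0
    stars = 1
    if record.get("structured"):
        stars = 2
        if record.get("non_proprietary_format"):
            stars = 3
            if record.get("published_with_open_standards"):
                stars = 4
                if record.get("linked_to_other_datasets"):
                    stars = 5
    return stars

datasets = {
    "scanned approval sheet (PDF on the CDE)": {"available_online": True, "structured": False},
    "native BIM model (proprietary format on the CDE)": {"available_online": True, "structured": True},
    "IFC model published with open standards": {"available_online": True, "structured": True,
                                                "non_proprietary_format": True,
                                                "published_with_open_standards": True},
}
for name, record in datasets.items():
    print(f"{machine_readability_stars(record)}-star: {name}")

Applied to the kinds of records observed in the cases, such a scoring makes explicit why most project information plateaus at the 1- and 2-star levels discussed below.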
Table 5. Codification challenges mapped to data quality dimensions, with illustrative quotes from the cases

Completeness: Information loss during conversions: "extract 2D drawings from 3D BIM models, those drawings are not as correct and as detailed as they used to be" (Technical manager, C3I5); Lack of process change: "We're going to print it out, we're going to staple it together [...] get three signatures, scan it back in, put it back into [CDE1] and submit it." (Project engineer, C3I1); "it's not actually speeding everything up, it's sort of making everything a lot slower; which I find very frustrating" (Principal engineer, C3I8); "can be very confusing when we have two platforms" (Technical manager, C3I5); "Everything had to be taken out of one data environment and pushed into another. One of the issues with that is the consistency or the compliance or knowing the latest versions of information" (Digital engineer, C3I6); "As the contractor, then we have to deliver it to a completely separate, disconnected CDE [...] we're double-handling" (Digital engineer, C3I3); Loss of constraint information: "we physically need that information to know what we're building and what the constraints in building it are." (Project planner, C3I2); "Access chamber works will conflict with access road for pile work, piling work package has to be moved back 2 weeks." (Progress review meeting, C3M3); Low level of detail: "Work package for three spans were linked to a work order. Model showed the deck for a span was completed before the pier supporting it was completed because the work package for the first span was reported as completed." (Field notes, BIM Consultant 2, C2I5)

Timeliness: Unstructured information sharing: "when I have finished everything-by the way, we have this spread sheet" (Technical manager, C3I5); Document control bottlenecks: "it's no longer the most current version anymore by the time I'm reviewing it" (Project engineer, C3I1); "he keeps on updating but he hasn't he hasn't put it on the [CDE1]." (Technical manager, C3I5); "I just want to know where I can get my latest drawing" (Digital engineer, C3I3); "I think someone within the doc management system had obviously circumnavigated it somehow, to get the drawings out. And then when we were trying to get the said revisions for our set out, the system wouldn't allow it because directory hadn't been properly created." (Technical manager, C3I5); Lack of process change: "We're going to print it out, we're going to staple it together [...] get three signatures, scan it back in, put it back into [CDE1] and submit it." (Project engineer, C3I1); "it's not actually speeding everything up, it's sort of making everything a lot slower; which I find very frustrating" (Principal engineer, C3I8); "can be very confusing when we have two platforms" (Technical manager, C3I5); "Everything had to be taken out of one data environment and pushed into another. One of the issues with that is the consistency or the compliance or knowing the latest versions of information" (Digital engineer, C3I6); "As the contractor, then we have to deliver it to a completely separate, disconnected CDE [...] we're double-handling" (Digital engineer, C3I3)

Consistency: Document control bottlenecks: "it's no longer the most current version anymore by the time I'm reviewing it" (Project engineer, C3I1); "he keeps on updating but he hasn't he hasn't put it on the [CDE1]." (Technical manager, C3I5); "I just want to know where I can get my latest drawing" (Digital engineer, C3I3); "I think someone within the doc management system had obviously circumnavigated it somehow, to get the drawings out. And then when we were trying to get the said revisions for our set out, the system wouldn't allow it because directory hadn't been properly created." (Technical manager, C3I5)

Accessibility: Interoperability: "transferring things [...], you lose data" (Technical manager, C3I5); Unstructured information sharing: "when I have finished everything-by the way, we have this spread sheet" (Technical manager, C3I5); Drawings and file-based sharing: "I use all the navigator tools that we've got here. But I prefer to use AutoCAD because I find it a lot easier" (Project engineer, C3I1); "It's not the best way because we haven't got the technology. I haven't got a big screen" (Project engineer, C3I1); Lack of process change: "We're going to print it out, we're going to staple it together [...] get three signatures, scan it back in, put it back into [CDE1] and submit it." (Project engineer, C3I1); "it's not actually speeding everything up, it's sort of making everything a lot slower; which I find very frustrating" (Principal engineer, C3I8); "can be very confusing when we have two platforms" (Technical manager, C3I5); "Everything had to be taken out of one data environment and pushed into another. One of the issues with that is the consistency or the compliance or knowing the latest versions of information" (Digital engineer, C3I6); "As the contractor, then we have to deliver it to a completely separate, disconnected CDE [...] we're double-handling" (Digital engineer, C3I3); Loss of constraint information: "we physically need that information to know what we're building and what the constraints in building it are." (Project planner, C3I2); "Access chamber works will conflict with access road for pile work, piling work package has to be moved back 2 weeks." (Progress review meeting, C3M3); Low level of detail: "Work package for three spans were linked to a work order. Model showed the deck for a span was completed before the pier supporting it was completed because the work package for the first span was reported as completed." (Field notes, BIM Consultant 2, C2I5)
Most of the construction information observed from the cases satisfies the requirement for the 1-star category. The observed projects use a CDE for storing and managing project data, resulting in indexing the data and storing it on online servers, resulting in 1-star data. The CDE makes the data easier to retrieve for computers to make inferences on. However, the complexity of the new structured methods for information sharing using the CDE, and the poor understanding of workflows across the teams, lead to the use of a combination of structured and unstructured channels for information sharing, as discussed in the section "Information Sharing." This reduces the machine readability of the information distributed over unstructured channels because the information is not indexed nor available on a common server. The same issue occurs when users circumnavigate the workflows to get the work expedited. Similarly, codification challenges associated with construction process information also lower machine readability because the information is not recorded (lack of detail and loss of constraint information) and, hence, not indexed or stored on online servers. These issues make the information inaccessible for inferencing.

With regard to the structure of the construction information, construction datasets in the form of BIM, project management information in project management software (Primavera P6, Asta Powerproject, etc.), outputs from Microsoft tools such as Excel, and so forth, are structured, satisfying the requirements for 2-star data. However, construction data are also unstructured in the forms of PDF documents, drawings, and other file-based formats, as described in the sections "Drawings and File-Based Sharing" and "Lack of Process Change." The lack of structure in these datasets makes inferencing from them difficult, leading to the need for complex algorithms. Where the construction data is structured, the data structure is in proprietary formats, which require APIs to access the semantic relationships within the data. The observed projects do not use open formats or open standards for publishing the data. Proprietary tools for the authorship of construction data are far more advanced and easier to use than the open-source tools. Hence, construction projects resort to using the tested and robust proprietary tools, resulting in the issues associated with interoperability and loss of information presented in the section "Software Usage." Thus, the construction information rarely achieves the 3-star classification described by Berners-Lee (2006). In conclusion, the maximum level of machine readability of the construction datasets in the observed projects is 2-star, with most of the information at a 1-star rating.

Low machine readability of the data has implications for data quality. When the construction information is not stored on servers due to information sharing issues or lack of detail in the models, the accessibility of that data is affected, thus reducing the accessibility dimension of data quality. When information is stored in a CDE (satisfying the conditions for 1-star data) as PDF documents, the structure of the data is not maintained, resulting in data quality issues associated with syntactic accuracy and consistency. Lack of a data structure removes the context from the information, thus resulting in the need for complex algorithms to infer the context and to infer from the data. This issue also introduces problems associated with consistency, as the data value for a field might be different in different files, and the lack of context prevents computer programs from detecting it. This problem is further worsened because the datasets are not linked, since the links are lost when a file is moved from one CDE to another. Storing the information in proprietary formats also reduces the accessibility dimension of data quality (Table 5).

Implications for Data Quality

The codification challenges discussed earlier have many significant implications for data quality (refer to Table 5). To unpack these in this section, they are mapped onto the different quality dimensions.

Accuracy: The organizational and cultural divisions between different teams result in problems associated with multiple modeling techniques, leading to data quality issues concerning the syntactic accuracy of the data. This was evident from the dataset when different people had different perceptions of the model, as in the case of the preceding slab example. Similarly, the hybrid practices associated with information sharing lower the semantic accuracy of the data because the data with which inferences are made are not accurate due to inefficiencies in information sharing. When there are syntactic errors in the data, this leads to incorrect insights (for example, if a slab is modeled as a generic geometric object with attributes attached to it, and a software tool is used to compute the quantity take-off for all the slabs in the model, the output would be a quantity of zero because the program fails to identify the geometric object as a slab). If this software is integrated with a costing tool used for cash flow analytics, this error gets propagated into that tool. These errors can be removed to an extent using semantic enrichment programs. However, even the accuracy of the inferences of semantic enrichment programs depends on the quality of the input datasets (Sacks et al. 2017).

Completeness: Concerning the completeness of the data, organizational and cultural divisions between the teams, resulting in problems such as interoperability, format conversions, and multiple modeling techniques, and the implications of hybrid practices, such as printing and scanning documents, play a role in reducing the data quality. For example, the software usage issues caused by the organizational divisions lead to format conversions, resulting in data loss and incomplete datasets. Lack of process change also leads to similar problems, such as loss of metadata when documents are printed and scanned. The institutionalized practices of process modeling with low levels of detail and the practice of not codifying constraints in the model result in incomplete data. Inferring insights from incomplete datasets reduces the quality of the output. For example, if the constraints are not codified in a schedule, an automatic scheduler would create an unrealistic schedule. This leads to further problems down the line. In the case of a product model, an incomplete dataset used for a structural capacity prediction would give incorrect results.
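The slab example in the accuracy paragraph above can be written out as a short sketch (Python; the element records and volumes are hypothetical, and the take-off routine is deliberately simplistic). An element exported under a generic class, such as IfcBuildingElementProxy, is silently dropped from the take-off, and the error then flows into any costing tool fed by this output:

# Hypothetical, simplified element records exported from two authoring tools.
elements = [
    {"id": "slab-01", "ifc_class": "IfcSlab",                 "volume_m3": 12.4},
    {"id": "slab-02", "ifc_class": "IfcBuildingElementProxy", "volume_m3": 11.8},  # slab modeled as a generic proxy
]

def slab_quantity_takeoff(model):
    """Sum the volumes of elements the program can recognize as slabs."""
    return sum(e["volume_m3"] for e in model if e["ifc_class"] == "IfcSlab")

print(slab_quantity_takeoff(elements))  # 12.4 -> slab-02 is missed because of its generic class

Semantic enrichment could reclassify the proxy element, but, as noted above, the reliability of that step itself depends on the quality of the input data (Sacks et al. 2017).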
Data sources (Sources; Source ID; Details; Time in the field, documentation/research records, and/or dates)

Case 1: Multistory residential student apartment block
  Documents                  C1D1   Construction program                     3 years of planning data
                             C1D2   BIM model                                1.5 gigabytes
                             C1D3   BIM training document                    5 pages
                             C1D4   Access to common data environment        2 weeks of access
  Interviews                 C1I1   Digital engineer                         August 15, 2017; 2 h; field notes (3 pages)
                             C1I2   Planner                                  August 15, 2017; 1 h; field notes (2 pages)
  Office visit               C1S1   Contractor's office                      Multiple visits over 2 weeks; field notes (7 pages); August 14, 2017–August 18, 2017

Case 2: Metro rail project
  Documents                  C2D1   BIM execution plan                       17 pages
                             C2D2   Digital project management (PPT)         26 slides
                             C2D3   Employee information requirements        32 pages
  Informal interviews        C2I1   Project manager                          October 1, 2018; 30 min; field notes (4 pages)
                             C2I2   Chief site engineer                      October 1, 2018; 30 min; field notes (3 pages)
                             C2I3   Casting yard engineer                    October 1, 2018; 20 min; field notes (3 pages)
                             C2I4   BIM consultant 1                         November 1, 2018; 20 min; field notes (2 pages)
                             C2I5   BIM consultant 2                         November 1, 2018; 20 min; field notes (2 pages)
                             C2I6   BIM consultant 3                         November 1, 2018; 20 min; field notes (2 pages)

Case 3: Water infrastructure project
  Semistructured interviews  C3I1   Project engineer                         July 9, 2018; 1 h; taped and transcribed (T&T) (10 pages)
                             C3I2   Project planner lead                     July 10, 2018; 1 h; T&T (10 pages)
                             C3I3   Digital engineer lead                    July 11, 2018; 1 h; T&T (11 pages)
                             C3I4   Information manager                      July 11, 2018; 1 h; T&T (15 pages)
                             C3I5   Technical manager/DfMA design lead       July 12, 2018; 1 h; T&T (15 pages)
                             C3I6   Senior digital engineer                  July 12, 2018; 1 h; T&T (8 pages)
                             C3I7   Design manager                           July 13, 2018; 1 h; field notes (4 pages)
                             C3I8   Principal engineer                       July 17, 2018; 1 h; T&T (12 pages)
  Office visit               C3S1   Contractor's office                      July 5, 2018–July 19, 2018; multiple visits over 2 weeks; field notes (5 pages)
  Meetings                   C3M2   Temporary works design meeting           July 13, 2018; 2 h; field notes (3 pages)
                             C3M3   Progress review meeting                  July 16, 2018; 4 h; field notes (4 pages)
Data Availability Statement

Data generated or analyzed during the study are available from the corresponding author by request. Information about the Journal's data-sharing policy can be found here: http://ascelibrary.org/doi/10.1061/(ASCE)CO.1943-7862.0001263.

Acknowledgments

The authors are grateful to the research participants in the three case studies. The Ph.D. research of the first author is cofunded by Bentley Systems UK and through a Skempton Scholarship from the Department of Civil and Environmental Engineering, Imperial College London. During the development of this paper, this author was supported by the Ph.D. enrichment scholarship from the Alan Turing Institute, the United Kingdom's National Institute for Data Science and AI. The second author acknowledges the support of Laing O'Rourke and the Royal Academy of Engineering for cosponsoring her Professorship, and the Lloyd's Register Foundation/ATI Data Centric Engineering Programme.

References

Akintola, A., S. Venkatachalam, and D. Root. 2017. "New BIM roles' legitimacy and changing power dynamics on BIM-enabled projects." J. Constr. Eng. Manage. 143 (9): 04017066. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001366.
Batini, C., and M. Scannapieco. 2016. Data and information quality. Cham, Switzerland: Springer.
Beetz, J., V. L. L. Berlo, D. R. Laat, and V. D. P. Helm. 2010. "BIMSERVER.ORG—An open source IFC model server." In Proc., 27th Int. Conf. CIB W78 2010. Cairo, Egypt: International Council for Research and Innovation in Building and Construction.
Berners-Lee, T. 2006. "Linked data—Design issues." Accessed March 19, 2019. https://www.w3.org/DesignIssues/LinkedData.html.
Bilal, M., L. O. Oyedele, J. Qadir, K. Munir, S. O. Ajayi, O. O. Akinade, H. A. Owolabi, H. A. Alaka, and M. Pasha. 2016. "Big data in the construction industry: A review of present status, opportunities, and future trends." Adv. Eng. Inf. 30 (3): 500–521. https://doi.org/10.1016/j.aei.2016.07.001.
Bolton, A., et al. 2018. The Gemini principles: Guiding values for the national digital twin and information management framework. Cambridge, UK: Centre for Digital Built Britain and Digital Framework Task Group.
BSI (British Standards Institution). 2018. "ISO 19650-1 & 2: Organization and digitization of information about buildings and civil engineering works, including building information modelling (BIM). Information management using building information modelling." Accessed March 19, 2019. https://bsol.bsigroup.com/Bibliographic/BibliographicInfoData/000000000030333754.
Cai, L., and Y. Zhu. 2015. "The challenges of data quality and data quality assessment in the big data era." Data Sci. J. 14 (2): 1–10. https://doi.org/10.5334/dsj-2015-002.
Cao, D., H. Li, and G. Wang. 2014. "Impacts of isomorphic pressures on BIM adoption in construction projects." J. Constr. Eng. Manage. 140 (12): 04014056. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000903.
Carrillo, P., J. Harding, and A. Choudhary. 2011. "Knowledge discovery from post-project reviews." Constr. Manage. Econ. 29 (7): 713–723. https://doi.org/10.1080/01446193.2011.588953.
Chang, C.-Y., W. Pan, and R. Howard. 2017. "Impact of building information modeling implementation on the acceptance of integrated delivery systems: Structural equation modeling analysis." J. Constr. Eng. Manage. 143 (8): 04017044. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001335.
Chegu Badrinath, A., and S.-H. Hsieh. 2019. "Empirical approach to identify operational critical success factors for BIM projects." J. Constr. Eng. Manage. 145 (3): 04018140. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001607.
Choi, B., H.-S. Lee, M. Park, Y. K. Cho, and H. Kim. 2014. "Framework for work-space planning using four-dimensional BIM in construction data exchange integrity of building information models." Autom. Constr. 85 (Jan): 249–262. https://doi.org/10.1016/j.autcon.2017.08.010.
Fan, W. 2015. "Data quality: From theory to practice." ACM SIGMOD Rec. 44 (3): 7–18. https://doi.org/10.1145/2854006.2854008.
Farias, T. M. D., A. Roxin, and C. Nicolle. 2018. "A rule-based methodology to extract building model views." Autom. Constr. 92 (Aug): 214–229. https://doi.org/10.1016/j.autcon.2018.03.035.
Giretti, A., A. Carbonari, G. Novembri, and F. Robuffo. 2012. "Estimation of job-site work progress through on-site monitoring." In Proc., 9th Int. Symp. of Automation and Robotics in Construction, ISARC 2012. Eindhoven, Netherlands: International Association for Automation and Robotics in Construction.
Goedert, J. D., and P. Meadati. 2008. "Integrating construction process documentation into building information modeling." J. Constr. Eng. Manage. 134 (7): 509–516. https://doi.org/10.1061/(ASCE)0733-9364(2008)134:7(509).
Gu, N., and K. London. 2010. "Understanding and facilitating BIM adoption in the AEC industry." Autom. Constr. 19 (8): 988–999. https://doi.org/10.1016/j.autcon.2010.09.002.
Han, K. K., and M. Golparvar-Fard. 2017. "Potential of big visual data and building information modeling for construction performance analytics: An exploratory study." Autom. Constr. 73 (Jan): 184–198. https://doi.org/10.1016/j.autcon.2016.11.004.
Hartmann, T. 2008. "A grassroots model of decision support system implications by construction project teams." Ph.D. dissertation, Dept. of Civil and Environmental Engineering, Stanford Univ.
Harty, C., and J. Whyte. 2010. "Emerging hybrid practices in construction design work: Role of mixed media." J. Constr. Eng. Manage. 136 (4): 468–476. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000146.
Hendler, J., and T. A. Pardo. 2012. "A primer on machine readability for online documents and data." Accessed October 10, 2019. https://www.data.gov/developers/blog/primer-machine-readability-online-documents-and-data.
Hu, Z.-Z., X.-Y. Zhang, H.-W. Wang, and M. Kassem. 2016. "Improving interoperability between architectural and structural design models: An industry foundation classes-based approach with web-based tools." Autom. Constr. 66 (Jun): 29–42. https://doi.org/10.1016/j.autcon.2016.02.001.
Hwang, B.-G., X. Zhao, and K. W. Yang. 2019. "Effect of BIM on rework in construction projects in Singapore: Status quo, magnitude, impact, and strategies." J. Constr. Eng. Manage. 145 (2): 04018125. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001600.
ISO. 2019. "ISO/DIS 21597: Information container for data drop—Exchange specification." Accessed March 19, 2019. https://www.iso.org/standard/74389.html.
Janssen, M., H. van der Voort, and A. Wahyudi. 2017. "Factors influencing big data decision-making quality." J. Bus. Res. 70 (1): 338–345. https://doi.org/10.1016/j.jbusres.2016.08.007.
Jaradat, S., J. Whyte, and R. Luck. 2013. "Professionalism in digitally mediated project work." Build. Res. Inf. 41 (1): 51–59. https://doi.org/10.1080/09613218.2013.743398.
Jayawardene, V., S. Sadiq, and M. Indulska. 2015. An analysis of data quality dimensions. ITEE Technical Report. St Lucia, Australia: Univ. of Queensland.
Jordani, D. A. 2010. "BIM and FM: The portal to lifecycle facility management." J. Build. Inf. Model. 13: 16.
Loshin, D. 2010. The practitioner's guide to data quality improvement. Burlington, MA: Morgan Kaufmann.
Mirarchi, C., and A. Pavan. 2019. "Building information models are dirty." In Proc., 2019 European Conf. of Computing in Construction (2019 EC3). Crete, Greece: European Council for Computing in Construction.
Naumann, F., and C. Rolker. 2000. "Assessment methods for information quality criteria." In Proc., Fifth Conf. on Information Quality (IQ 2000). Cambridge, MA: MIT Sloan School of Management.
Oti, A. H., J. H. M. Tah, and F. H. Abanda. 2018. "Integration of lessons learned knowledge in building information modeling." J. Constr. Eng. Manage. 144 (9): 04018081. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001537.
Pauwels, P., R. De Meyer, and J. Van Campenhout. 2010. "Interoperability for the design and construction industry through semantic web technology." Lect. Notes Comput. Sci. 6725 (1): 143–158.
Pauwels, P., S. Törmä, J. Beetz, M. Weise, and T. Liebich. 2015. "Linked data in architecture and construction." Autom. Constr. 57 (Sep): 175–177. https://doi.org/10.1016/j.autcon.2015.06.007.
Pazlar, T., and Ž. Turk. 2008. "Interoperability in practice: Geometric data exchange using the IFC standard." J. Inf. Technol. Constr. 13 (1): 362–380.
Pedro, A., D. Y. Lee, R. Hussain, and C. S. Park. 2017. "Linked data system for sharing construction safety information." In Proc., 34th Int. Symp. on Automation and Robotics in Construction (ISARC 2017). Taipei, Taiwan: International Association for Automation and Robotics in Construction.
Preidel, C., A. Borrmann, C. Oberender, and M. Tretheway. 2016. "Seamless integration of common data environment access into BIM authoring applications: The BIM integration framework." In Proc., 11th European Conf. on Product and Process Modelling (ECPPM 2016), 119. Limassol, Cyprus: European Association of Product and Process Modeling.
Quintero, D., W. Genovese, K. Kim, M. Li, F. Martins, A. Nainwal, D. Smolej, M. Tabinowski, and A. Tiwary. 2015. IBM software defined environment. IBM Redbooks.
Redmond, A., A. Hore, M. Alshawi, and R. West. 2012. "Exploring how information exchanges can be enhanced through cloud BIM." Autom. Constr. 24 (Jul): 175–183. https://doi.org/10.1016/j.autcon.2012.02.003.
Sacks, R., C. Eastman, G. Lee, and P. Teicholz. 2018. BIM handbook: A guide to building information modeling for owners, designers, engineers, contractors, and facility managers. Hoboken, NJ: Wiley.
Sacks, R., L. Ma, R. Yosef, A. Borrmann, S. Daum, and U. Kattel. 2017. "Semantic enrichment for building information modeling: Procedure for compiling inference rules and operators for complex geometry." J. Comput. Civ. Eng. 31 (6): 04017062. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000705.
Sebastian, R. 2011. "Changing roles of the clients, architects and contractors through BIM." Eng. Constr. Archit. Manage. 18 (2): 176–187. https://doi.org/10.1108/09699981111111148.