Estimating Project S-Curves Using Polynomial Function and Neural Networks Chao, Chien
Li-Chung Chao1 and Ching-Fa Chien2
Downloaded from ascelibrary.org by UNIVERSITY OF VIRGINIA on 11/04/13. Copyright ASCE. For personal use only; all rights reserved.
Abstract: The S-curve is a graphical representation of a construction project’s cumulative progress from start to finish. While S-curves
for project control during construction should be estimated analytically based on a schedule of activity times, empirical estimation
methods using various mathematical S-curve formulas have been developed for initial planning at predesign stages, with the mean for past
similar projects often used as the basis of prediction. In an attempt to make an improvement, a succinct cubic polynomial function for
generalizing S-curves is proposed and a comparison with existing formulas shows its advantages of accuracy and simplicity. Based on an
analysis of the attributes and actual progress of 101 projects, four factors, i.e., contract amount, duration, type of work, and location, are
then used as the inputs of a model developed for estimating S-curves as represented by the polynomial parameters. For model develop-
ment, it is proposed to use neural networks for their ability to perform complex nonlinear mapping. The neural network model is
compared with statistical models with respect to modeling and testing accuracy. The results show that the presented methodology can
achieve error reduction consistently, thereby being potentially useful for owners and contractors in early financial planning and checking
schedule-based estimates.
DOI: 10.1061/(ASCE)0733-9364(2009)135:3(169)
CE Database subject headings: Construction management; Neural networks; Curve fitting; Polynomials; Estimates.
1 Associate Professor, Dept. of Construction Engineering, National Kaohsiung First Univ. of Science and Technology, Kaohsiung 824, Taiwan, ROC. E-mail: [email protected]
2 Ph.D. Student, Institute of Engineering Science and Technology, National Kaohsiung First Univ. of Science and Technology, Kaohsiung 824, Taiwan, ROC. E-mail: [email protected]
Note. Discussion open until August 1, 2009. Separate discussions must be submitted for individual papers. The manuscript for this paper was submitted for review and possible publication on September 18, 2007; approved on September 11, 2008. This paper is part of the Journal of Construction Engineering and Management, Vol. 135, No. 3, March 1, 2009. ©ASCE, ISSN 0733-9364/2009/3-169–177/$25.00.

Introduction

The S-curve is a graphical representation of the cumulative progress of a construction project from start to finish, with the horizontal scale showing time and the vertical scale showing cumulative project progress in dollars or in percent complete. For a project of n activities, the cumulative progress at time point t, P_t, is defined as

P_t = \sum_{i=1}^{n} w_i \cdot p_{ti}    (1)

where w_i = percent weight of activity i in the project; p_{ti} = percent complete of activity i at t.
The shape of the S-curve, normally with a smaller slope at the beginning and near the end and a larger slope in the middle, indicates that progress is slow in mobilization and demobilization periods but faster when the bulk of the work takes place. Although the shape generally applies to any project consisting of activities whose times overlap to some extent, each individual project being a unique undertaking will have an S-curve with differing geometric properties, such as the relative length and slope of each section. In addition, curves constructed from actual cumulative progress are not smooth and often are highly uneven; Fig. 1 shows an example in which time is standardized (divided) by project duration to the range of 0–1.
Estimated S-curves are widely used by owners and contractors for project planning and control. During the preconstruction phases, an estimated S-curve is used as the basis for forecasting cash flows in making financial arrangements. During the construction stage, an S-curve agreed in the contract is used as the target against which the actual progress of the project at any point can be evaluated to establish whether it is overall behind schedule and to assess the amount of delay. Since construction contracts often contain clauses stipulating that a delay should not exceed a certain percentage, the S-curve estimate will affect the determination of violation of a contract and the resulting penalty. In recent years, some schools of thought such as lean construction have questioned the practice of using the S-curve to establish the progress target for construction control, citing the possibility that contractors under the threat of penalty will speed up nonurgent activities so as to offset delays of earned value in critical activities, to undesirable effects (Kim and Ballard 2000). However, the S-curve, being able to tell overall project progress in single numbers, has the inherent advantage of simplicity and remains handy for many project cases in construction. On balance, use of the S-curve for financial planning is unquestionable, but, admittedly, unqualified use of it as the chief control during construction may give cause for concern due to oversimplified information, especially for large projects where additional means such as milestones should be used.
Different methods are used at different stages of a project to obtain reasonable estimates of project progress. When design and detailed project information is available, the normal and accepted approach to estimating an S-curve is analytical, i.e., based on a schedule of planned activity times and progress calculation using Eq. (1). However, S-curve estimation at predesign stages with only sketchy project definition has to use historical-data-based
empirical methods, the development of which has attracted much research interest. Over the years, many alternative ways of determination of S-curves have been studied, along with various mathematical formulas for generalizing cumulative project progress as a function of time (Skitmore 1992; Navon 1996). Since such methods give estimates of progress that are not produced from a schedule according to project-specific information, their results are intended mainly for initial preparation of financing, not for control purposes during construction. However, because a construction project is subject to the influence of many factors and a schedule is often not very certain, it is a prudent practice to compare a schedule-based S-curve estimate with one obtained according to historical realities, and an empirical method will also serve for the checking purpose.
In light of the backdrop mentioned above, the objective of the research presented herein is to develop an improved approach based on a proposed polynomial function and neural networks, whose application is in line with an empirical method’s. In the following, existing methods for the early estimation of project progress as well as existing S-curve formulas are reviewed first. The proposed function and the solution to its parameters are presented next, along with a comparison of closeness of fit with existing functions. Based on collected cases of completed construction projects, a neural network model for estimating S-curves as represented by the polynomial function is then proposed, whose accuracy is compared with that of statistical models. Relevance to industry practitioners is discussed before conclusions are drawn at the end.

Fig. 1. Example of actual cumulative progress versus fitted curve
Fig. 2. Example of envelope curves built as 90% confidence interval of mean progress

Review of Existing Methods and Formulas

90 percentiles of progress for the sample. An example built from real projects is shown in Fig. 2, in which there are limits <0 or >1 as a result of the broad confidence interval. Methods of this kind do not use a generalized mathematical form to represent S-curves, nor do they consider the factors that may influence project progress.

Mathematical Formulas

Peer (1982) proposed five S-curve formulas for building construction projects in which percent progress is made a function of percent time with all parameters predetermined. However, the more common form adopted in other research is a progress-versus-time relation in which a few parameters are left to be determined for an individual project by using some curve-fitting methods. Navon (1996) gave a summary of the formulas proposed by a number of researchers, some of which fail to meet the boundary conditions of 0% progress at 0% time and 100% progress at 100% time. Skitmore (1992) fitted each of four two-parameter formulas to 27 case projects and evaluated their closeness of fit. Since the proposed formula in the present research will be compared with these four formulas later, they are briefly reviewed next in Eqs. (2)–(7), where y and x denote standardized progress and standardized time, respectively, and a, b are the parameters to be determined.
1. The Department of Health and Social Security (DHSS) formula, which was developed for hospital projects, has the form of Eq. (2)

y = x + a x^2 - a x - (6x^3 - 9x^2 + 3x)/b    (2)
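As a quick illustration of the boundary conditions mentioned above (0% progress at 0% time and 100% progress at 100% time), the following minimal sketch evaluates Eq. (2) at a few standardized times. The parameter values a = 0.4 and b = 4.0 are hypothetical and chosen only for demonstration; the function name `dhss` is likewise an assumption of this sketch.

```python
def dhss(x, a, b):
    """Standardized progress y at standardized time x according to Eq. (2)."""
    return x + a * x**2 - a * x - (6 * x**3 - 9 * x**2 + 3 * x) / b

# Hypothetical parameter values, not taken from the paper.
a, b = 0.4, 4.0

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}  y = {dhss(x, a, b):.4f}")
```

Because every term other than the leading x vanishes at x = 0 and at x = 1, Eq. (2) meets both boundary conditions for any a and any nonzero b.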
squared error method. The error for time x_t, e_t, is defined as the difference between the actual progress y_t and the calculated progress from Eq. (8) as

e_t = y_t - a x_t^3 - b x_t^2 - (1 - a - b) x_t    (9)

The sum of squared errors for all x_t in the set then is

a basis for model performance evaluation later: mean square error (MSE) and root-mean-square error (RMSE), as defined next

\mathrm{MSE} = \frac{\sum_{j=1}^{d} (\mathrm{calculated}_j - \mathrm{actual}_j)^2}{d}    (20)
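The error definition in Eq. (9) implies the cubic form y = a x^3 + b x^2 + (1 - a - b) x, which is linear in a and b. The sketch below fits a and b by ordinary least squares through the 2 x 2 normal equations and then reports MSE as in Eq. (20). It is only an illustration under stated assumptions: it is not claimed to reproduce the paper's own solution steps, the monthly progress data are made up, RMSE is taken here as the square root of MSE, and all function names are hypothetical.

```python
import math

def s_curve(x, a, b):
    """Cubic S-curve implied by Eq. (9); passes through (0, 0) and (1, 1)."""
    return a * x**3 + b * x**2 + (1 - a - b) * x

def fit_ab(xs, ys):
    """Least-squares a, b from y - x = a (x^3 - x) + b (x^2 - x)."""
    u = [x**3 - x for x in xs]
    v = [x**2 - x for x in xs]
    r = [y - x for x, y in zip(xs, ys)]
    suu = sum(ui * ui for ui in u)
    svv = sum(vi * vi for vi in v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    sur = sum(ui * ri for ui, ri in zip(u, r))
    svr = sum(vi * ri for vi, ri in zip(v, r))
    det = suu * svv - suv * suv
    if det == 0:
        raise ValueError("need at least two distinct times strictly between 0 and 1")
    a = (sur * svv - suv * svr) / det
    b = (suu * svr - suv * sur) / det
    return a, b

def mse(xs, ys, a, b):
    """Mean square error between calculated and actual progress, as in Eq. (20)."""
    return sum((s_curve(x, a, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def rmse(xs, ys, a, b):
    """Root-mean-square error, assumed here to be the square root of MSE."""
    return math.sqrt(mse(xs, ys, a, b))

# Made-up monthly cumulative progress for a 10-month project (standardized to 0-1).
xs = [i / 10 for i in range(1, 11)]
ys = [0.03, 0.09, 0.18, 0.30, 0.45, 0.60, 0.74, 0.86, 0.95, 1.00]

a_hat, b_hat = fit_ab(xs, ys)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}, RMSE = {rmse(xs, ys, a_hat, b_hat):.4f}")
```

Writing the coefficient of x as (1 - a - b) builds the boundary conditions into the curve, so only two parameters remain to be estimated.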
collected. Of this data, nine projects were discarded because of their unusual delays, resulting in a usable set of 101 projects. The projects are spread all over Taiwan covering a variety of work, such as roads, bridges, and service areas, and vary greatly in contract amount and in duration. As is the common practice of valuation of work done, progress measurement for all the projects took place monthly. For each project, the progress versus time data was first standardized by its contract amount and project duration, with the number of pairs of (x, y) obtained equaling the number of measurements or months. a and b of Eq. (8) were then solved using Eqs. (13)–(19) and a fitted S-curve was obtained. See Table 2 for statistics of contract amount, project duration, and values of a and b for the 101 projects (note: NT$ 1 ~ US$ 0.03).
An analysis of one-to-one correlation among the quantifiable project attributes, i.e., contract amount and duration, and parameters a, b of Eq. (8) for the 101 projects was performed. The coefficients of correlation (COE) obtained are shown in Table 3. The strong and positive correlation between contract amount and project duration appears reasonable. The weak but noticeable correlations between contract amount and a (or b) and between duration and a (or b) may be attributed to effects of these attributes on project progress reflected by the geometric properties of the S-curve. Interestingly enough, a and b are closely and negatively correlated (COE = −0.9643). The above would have implications for development of a model for estimating S-curves with the proposed formula.
The accuracy of Eq. (8) in fitting to the 101 projects was compared with that of Eq. (3), which was solved using two methods: regression with truncated data in Kenley and Wilson (1986) and manual parameter adjustment after regression suggested by Evans and Kaka (1998). For solving Eq. (8), in addition to using

a par, considering the errors of the two solving methods for each formula together. Therefore, although mathematical proof is not possible and more tests are required to judge conclusively, for the two sets of cases in Skitmore (1992) and herein, the proposed S-curve formula [Eq. (8)], with the advantages of a simpler form and the accompanying convenience in use, is shown to be at least as good as Eq. (3).

Description of Model

The idea of the model presented herein is to represent the S-curve by parameters a, b of polynomial Eq. (8) and use neural networks to acquire the ability to predict a, b from actual progress data, with the aim of producing a better early S-curve estimate for a few given project conditions. The attainment of the goal will be evaluated by comparing the accuracy of the model with that of the multiple regression and average curve methods.

Input Factors

The project data was filtered to set up input factors for a model for estimating S-curves. The two quantifiable attributes, contract amount and duration, together measure a project’s relative intensity that affects arrangement of activities, i.e., more work is done concurrently or sequentially, so they were selected. Categorical attributes comprising type of work, address of project, and identity of contractor were considered next. Type of work affects number of trades, lead time, and site character, so it has a bearing on distribution of work and project progress. Project location in Taiwan, an island with significant regional differences in rainy seasons and terrain, also has an effect. Although contractor performance certainly influences progress, the limited information available does not allow a separate indicator for it to be set up. Therefore, only four factors were used as inputs in model development: contract amount, duration, type of work, and location. While contract amount and duration being real numbers can be used as they are, type of work and location being categorical variables require a classification scheme in a model. With respect to type of work, a project is classified into one of three groups: bridges/elevated roads, embankment roads, and service areas/toll

Table 3. Coefficients of Correlation among Contract Amount, Duration, Parameter a, and Parameter b for Collected 101 Projects
                  Contract amount  Project duration  Parameter a  Parameter b
Contract amount   1
Project duration  0.6317           1
Parameter a       0.0566           0.1860            1
Parameter b       −0.0805          −0.229            −0.9643      1

Table 4. Comparison between Eqs. (3) and (8) in Accuracy of Fitting to Collected 101 Projects
S-curve formula  Parameters-solving method       Mean MSE  Maximum MSE  Mean RMSE  Maximum RMSE
Eq. (3)          Regression/truncated data       0.000698  0.005399     0.0242     0.0735
Eq. (3)          Ditto + manual adjustment       0.000586  0.003446     0.0224     0.0587
Eq. (8)          Optimization by Eqs. (13)–(19)  0.000654  0.002869     0.0236     0.0536
Eq. (8)          Ditto + rectification           0.000625  0.002869     0.0230     0.0536
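The figures in Table 4 can be cross-checked with a little arithmetic. The sketch below assumes that each project's RMSE is the square root of its MSE and that "mean" and "maximum" are taken across the 101 projects; both are readings of the table, not statements from the text. Under those assumptions the maximum RMSE should equal the square root of the maximum MSE, while the mean RMSE can be no larger than the square root of the mean MSE, because the square root is a concave function.

```python
import math

# Rows of Table 4: (formula, solving method, mean MSE, max MSE, mean RMSE, max RMSE)
TABLE_4 = [
    ("Eq. (3)", "Regression/truncated data", 0.000698, 0.005399, 0.0242, 0.0735),
    ("Eq. (3)", "Ditto + manual adjustment", 0.000586, 0.003446, 0.0224, 0.0587),
    ("Eq. (8)", "Optimization by Eqs. (13)-(19)", 0.000654, 0.002869, 0.0236, 0.0536),
    ("Eq. (8)", "Ditto + rectification", 0.000625, 0.002869, 0.0230, 0.0536),
]

for formula, method, mean_mse, max_mse, mean_rmse, max_rmse in TABLE_4:
    root_of_max = math.sqrt(max_mse)    # should match the reported maximum RMSE
    root_of_mean = math.sqrt(mean_mse)  # upper bound on the reported mean RMSE
    print(f"{formula:8s} {method:32s} "
          f"sqrt(max MSE) = {root_of_max:.4f} vs {max_rmse:.4f}; "
          f"mean RMSE = {mean_rmse:.4f} <= sqrt(mean MSE) = {root_of_mean:.4f}")
```

The check only confirms that the table is internally consistent under the stated reading; it says nothing about which formula fits better.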
the networks, but they perform better than before; their modeling and testing RMSE are shown in Table 7. Two issues are noted. First, the testing error is sensitive to the weights at the start of training and a network was retrained a few times with different initial weights to get the lowest error, which is somewhat biased. However, because model performance is judged by the average error of the six networks for the six random samples, the results of such retraining can be interpreted as the best achievable. Second, for some projects, a slight change in the values of a, b will lead to a large total error that is disproportionate to the change. For the total error to be acceptable, the mapping error should be sufficiently low, but it is not necessarily the least-mean squared error for a and b, since in any case a network trained to the point of lowest total error should be adopted.

Discussions

Comparison with Multiple Regression and Average Curve Models

Instead of the existing standard S-curve model based on some methods for grouping projects, multiple regression is the benchmark for the neural network model, as it represents a more general averaging technique, while the average curve for all projects is used for another comparison. Corresponding to each neural network above, two multiple regression equations, one with a as the dependent variable and the other with b, were built from the same data of 90 projects as for training, except that binary representation was adopted for the categorical variables. The resulting R^2 ranges between 0.20 and 0.26. As before, the regression equations were then used to estimate a and b for 11 testing projects and the total errors in RMSE for all modeling and testing projects were obtained. Likewise, an average S-curve based on the mean a and b for the same 90 projects was formed as the third model and used for obtaining the RMSE for each project. As an illustration, for a small project in the first testing sample, the fitted S-curve versus the estimated S-curves from the neural network, multiple regression, and average curve models are shown in Fig. 5, along with their total errors.
The three models’ mean and maximum RMSE for each of the modeling and testing samples are shown in Table 7. For all the samples, the neural network model consistently outperforms the regression and average curve models in terms of mean RMSE, although for a few samples it has a slightly higher maximum RMSE. The neural network model’s overall averages of 5.52 and 12.79%, for mean and maximum RMSE, respectively, are both lower than those of the other two models, and, hence, can represent an improvement in modeling and prediction accuracy. The neural networks’ edge over their regression and average curve counterparts can be attributed to their being adaptive to the data in dealing with the demanding mapping of function parameters.

Effects of Changes in Inputs on Estimated S-Curves

To examine further how the trained neural networks work, a sensitivity analysis is performed using a hypothetical but representative case: a middle-size project with a contract amount of

Fig. 5. Fitted S-curve versus estimated S-curves from neural network, multiple regression, and average curve models for a project of NT$60 million, 20 months, type of work 3, and location 2
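One possible way to see why errors of similar size in a and b can produce very different total errors is to compare curves directly. For the cubic form implied by Eq. (9), shifting the parameters by Δa and Δb changes the curve by Δa (x^3 - x) + Δb (x^2 - x), so shifts of opposite sign partly cancel while shifts of the same sign add up. The sketch below is an illustration only, not the authors' analysis: the parameter values are hypothetical, and the error is measured between two curves on an even grid rather than against actual progress data.

```python
import math

def s_curve(x, a, b):
    """Cubic S-curve implied by Eq. (9)."""
    return a * x**3 + b * x**2 + (1 - a - b) * x

def curve_rmse(params_1, params_2, n_points=100):
    """RMSE between two S-curves sampled on an even grid over [0, 1]."""
    xs = [i / n_points for i in range(n_points + 1)]
    squared = [(s_curve(x, *params_1) - s_curve(x, *params_2)) ** 2 for x in xs]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical fitted parameters and two equally sized parameter errors.
fitted = (-1.5, 2.0)
offsetting = (fitted[0] + 0.3, fitted[1] - 0.3)  # errors of opposite sign
same_sign = (fitted[0] + 0.3, fitted[1] + 0.3)   # errors of the same sign

print(f"RMSE, offsetting errors: {curve_rmse(fitted, offsetting):.4f}")
print(f"RMSE, same-sign errors:  {curve_rmse(fitted, same_sign):.4f}")
```

This is in line with the point made above that the lowest mapping error for a and b need not give the lowest total error for the curve.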