Parametric Cost Estimation Model
The parametric cost estimation model is a linear regression function. It is a mathematical function that
employs a defined set of variables related to the features of a proposed highway project. The variables
are divided into two categories: objective and subjective. Objective variables included in the model are
length, number of lanes, earthwork volume, number of intersections, number of grade-separated
interchanges, and construction cost index. The subjective variables used in the model include functional
classification, location, work type, and incorporation of technology. For someone to make quality
estimates based on these variables, three conditions must be met to create a reliable cost estimation
function. The first condition focuses on the selection of the variables. Each project will need to be
reviewed and adjusted to properly calculate pre-design costs. The second condition relies on the
collection and retention of historical data, along with its accessibility. For this method to work properly,
there must be past data from which to construct the pre-design cost function. This condition, however, appears to be
a problem in Ethiopia because project data are still recorded mainly on paper and are available
principally within the district where the project took place. The final condition is to test and calibrate
the regression equation using data from relevant projects to ensure reliable results. As with any fitted
equation, its results should be used with caution.
Two techniques that are commonly used in parametric estimating are (1) analogous (similar) projects
and (2) historical cost data. The multiple linear regression model takes the general form:
Y = β0 + β1X1 + β2X2 + … + βnXn
Where,
• X1 to Xn represent the independent variables.
• Y is the dependent variable.
• β1 is the regression coefficient of variable X1.
• β2 is the regression coefficient of variable X2.
• β0 is the intercept, the point where the regression line crosses the y-axis.
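As a minimal sketch of how the coefficients of such a model can be estimated, the following pure-Python example solves the ordinary least squares normal equations. The function name `fit_linear_regression` and the small data set are illustrative assumptions; the data are synthetic and are not taken from the road projects.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows (without the intercept column),
    y is the list of responses. Returns [b0, b1, ..., bn].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic (hypothetical) data generated from Y = 2 + 3*X1 - 1.5*X2
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 + 3 * x1 - 1.5 * x2 for x1, x2 in X]
beta = fit_linear_regression(X, y)
print([round(b, 6) for b in beta])
```

Because the synthetic responses contain no noise, the fit should recover the intercept and the two slopes used to generate them. With real project data the coefficients are only estimates, which is why the summary statistics discussed later are needed.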
The goal of parametric estimating by using the regression technique is to create a statistically valid cost
estimating relationship (CER) using historical data. A CER defines cost as a function of one or more
technical parameters, such as physical characteristics or operating characteristics. A CER is the
foundation of the art and science of estimating resource needs in projects using parametric methods.
The parametric method comprises collecting historical cost data and reducing them to mathematical
forms that can be used to estimate similar activities in future projects. These mathematical forms are
called CERs.
The necessary steps for developing a cost estimating relationship between variables using
multiple regression are:
Step 1: - Identify key road project parameters (variables)
Previous studies show that many qualitative and quantitative factors affect the total cost of a road
construction project. It is striking, however, that most past construction cost models used only a few of them,
because little information is available in the early stages of a project and information about the
qualitative factors surrounding each project is difficult to obtain. Key parameters, also
known as cost drivers or cost factors, for predicting the construction cost and duration of different road
projects have been identified. Some of them are listed below; the list is not exhaustive.
• Project scope.
• Year of construction.
• Duration of the project.
• Project location.
• Length of the project.
• Price of labor.
• Price of materials.
• Price of equipment.
• Changes in standards or specifications.
• Asphalt haul distance.
• Base course haul distance.
• Soil suitability condition.
• Pavement type (asphalt or concrete).
• Pavement width.
• Shoulder width.
• Pavement thickness.
• Earthwork volume.
• Average daily traffic volume.
• Topography.
• Weather condition.
• Construction budget.
• Terrain type (flat, rolling, mountainous/escarpment).
• Geometric design standard.
Dependent Variable
▪ Total Project Budget
Step 2: - Identify and collect the data
The data were collected from contracts awarded by ERA, the client for federal road construction
projects. The data set comprises 11 ERA projects, of which ten are upgrading projects and the rest
are new projects. This provides data on the cost and cost drivers of similar
projects in the past. Through the parametric method, the significant parameters, also
known as cost drivers, are identified, and the relationships between these parameters and cost are
established based on the historical project data.
Step 3: - Normalization of Project Cost
Cost indexes are particularly important for cost analysis techniques that rely on historical
or past information. A road construction project is a unique undertaking. Because of this, it is
extremely difficult to collect and compile large quantities of road cost data relating to the same
reference point (that is, the same time, location, and conditions) in order to analyze trends and patterns
for any cost research effort. The objective of cost index development is to measure changes in the
cost of an item or group of items from one point to another. As such, the index can also be used to
adjust costs from one point to another, or to a common reference point. By using indexes to bring cost
information obtained at several different points to a common base, a much larger sample of
data can be compiled and analyzed. This process is also referred to as the normalization of project costs.
The project cost is therefore adjusted for size, location, and time.
The conceptual cost estimate for a proposed project is prepared from cost records of a project completed
at a different time and a different location with a different size. The estimator must adjust the previous
cost information for the combination of time, location, and size. According to the Central Statistics Agency of
Ethiopia (CSA), Ethiopia’s current inflation rate is 18.7%. The following formula, which incorporates
the time value of money, is therefore used to compute the adjusted cost of each project.
➢ Proposed cost= Previous cost × Time adjustment × Location adjustment × Size adjustment
As an example:
The Federal Democratic Republic of Ethiopia, as part of its Road Sector Development Program, has
allocated a sufficient budget for the construction of the 76.31 km (original contract length) Adi
Arkay – Telemt Road to DC4 standard.
The road project is located in the Amhara National Regional State, Northern Gondar Zone. The
route connects two Woredas (provinces) of the zone, namely Adi Arkay Woreda and Telemt
Woreda. Therefore:
Project Location: Amhara
• Project location adjustment factor: 1.32
Tender price levels vary according to the region of the country where the work is carried out. The use
of cost information from a previous project to forecast the cost of a proposed project will not be reliable
unless a proportional adjustment is made that represents the difference in cost between the locations
of the two projects. The adjustment should represent the relative difference in the costs of material,
equipment, and labor between the two locations. The indices used here, which express the relative
difference in construction costs by geographical location, are taken from ERA documents and
research.
Year of Construction: 2018
• Time adjustment factor = (1 + 18.7/100)^(construction year of the proposed
project - construction year of the completed road)
= (1.187)^(2020 - 2018) = 1.41
Length of the project: 76.31 km
• Size adjustment factor = Length of the proposed project / Length of the completed project
= 60 km / 76.31 km = 0.79
Therefore,
Proposed project cost = Previous (completed) project cost × Location adjustment factor × Time
adjustment factor × Size adjustment factor.
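The three factors of the worked example can be reproduced with a few lines of Python. This is only a sketch: the function names are illustrative, and the 60 km length and the year 2020 for the proposed project are taken from the arithmetic shown in the example.

```python
INFLATION_RATE = 0.187   # annual inflation rate quoted from CSA in the text
LOCATION_FACTOR = 1.32   # location adjustment factor quoted for Amhara


def time_factor(proposed_year, completed_year, rate=INFLATION_RATE):
    """Compound the annual inflation rate over the years between the two projects."""
    return (1.0 + rate) ** (proposed_year - completed_year)


def size_factor(proposed_length_km, completed_length_km):
    """Ratio of the proposed project length to the completed project length."""
    return proposed_length_km / completed_length_km


def adjusted_cost(previous_cost, location, time, size):
    """Proposed cost = previous cost x location x time x size adjustment."""
    return previous_cost * location * time * size


t = time_factor(2020, 2018)
s = size_factor(60.0, 76.31)
# Combined multiplier applied to one unit of the completed project's cost
combined = adjusted_cost(1.0, LOCATION_FACTOR, t, s)
print(round(t, 2), round(s, 2), round(combined, 3))
```

Rounded to two decimals, the time and size factors agree with the 1.41 and 0.79 computed by hand in the example.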
Finally, the costs of all projects have been normalized (adjusted) as shown in the following table.
| ID No. | Project Name | Road Length (in Km) X1 | Project Location | Year of Construction | Previous Project Cost | Size Adjustment Factor | Location Adjustment Factor | Time Adjustment Factor | ATPC Y |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Adi Arkay-Tselemt Road Project | 76.31 | Amhara | 2017 | 1,283,338,584.50 | 0.79 | 1.32 | 1.67 | 2,227,599,628.23 |
| 2 | Tulu Bolo - Kela Road Project | 79.9 | Oromia & SNNP | 2013 | 695,545,730.60 | 0.75 | 1.72 | 3.32 | 2,982,727,922.47 |
| 3 | Gondar Debark Road Upgrading Pr. (AC) | 99.2 | Amhara | 2017 | 483,546,135.51 | 0.60 | 1.32 | 1.67 | 645,659,555.91 |
| 4 | Woreta to Meklle | 53 | Amhara | 2018 | 1,981,378,049.64 | 1.13 | 1.32 | 1.41 | 4,171,748,296.95 |
| 5 | Morka- Gircha- Chencha Road Project | 72.6 | Oromia & SNNP | 2017 | 1,309,798,352.48 | 0.83 | 1.72 | 1.67 | 3,113,864,234.58 |
| 6 | Shire - Shiraro - Humera Lugdi lot.1(AC) | 156 | Amhara & Tigray | 2008 | 1,246,200,728.65 | 0.38 | 1.32 | 7.82 | 4,949,908,962.15 |
| 7 | Fik- Hamero- Imi Road Project | 81 | Somalia | 2018 | 1,502,371,329.85 | 0.74 | 1.11 | 1.41 | 1,740,475,584.87 |
| 8 | Robe- Gassra- Ginir Road Project | 60.84 | Oromia | 2011 | 504,800,000.00 | 0.99 | 1.71 | 4.68 | 3,982,299,985.70 |
| 9 | Mintamir - Metehabila - Metehara Road Project | 81.59 | Amhara | 2011 | 2,971,456.25 | 0.74 | 1.32 | 4.68 | 13,493,172.09 |
| 10 | Nazareth Assela Dodola & Shashemene Goba Cont. 2 Assela Dodola Junc. ipc dodola Hard copy (AC) | 79 | Oromia | 2010 | 421,350,000.00 | 0.76 | 1.71 | 5.55 | 3,038,578,476.92 |
| 11 | Metema - Abrajira Road Projects | 117.3 | Amhara | 2013 | 819,129,220.55 | 0.51 | 1.32 | 3.32 | 1,836,261,889.61 |
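The normalized costs in the table can be cross-checked by recomputing them from the listed lengths, years, and location factors. The sketch below does this for three of the rows, assuming (as the worked example suggests) a 60 km proposed project, a base year of 2020, and an 18.7% annual inflation rate; the constant and function names are illustrative.

```python
INFLATION_RATE = 0.187       # annual inflation rate quoted in the text
BASE_YEAR = 2020             # assumed year of the proposed project
PROPOSED_LENGTH_KM = 60.0    # assumed length of the proposed project

# (project, length in km, location factor, year, previous cost, ATPC listed in the table)
ROWS = [
    ("Adi Arkay-Tselemt", 76.31, 1.32, 2017, 1283338584.50, 2227599628.23),
    ("Woreta to Meklle", 53.0, 1.32, 2018, 1981378049.64, 4171748296.95),
    ("Fik-Hamero-Imi", 81.0, 1.11, 2018, 1502371329.85, 1740475584.87),
]


def normalize(length_km, location_factor, year, previous_cost):
    """Apply the size, location, and time adjustments with unrounded factors."""
    size = PROPOSED_LENGTH_KM / length_km
    time = (1.0 + INFLATION_RATE) ** (BASE_YEAR - year)
    return previous_cost * size * location_factor * time


recomputed = [normalize(l, loc, yr, cost) for _, l, loc, yr, cost, _ in ROWS]
for (name, *_rest, listed), value in zip(ROWS, recomputed):
    print(f"{name}: recomputed {value:,.2f}, listed {listed:,.2f}")
```

The factors are kept unrounded in the calculation, so small differences from a hand calculation with the two-decimal factors shown in the table are to be expected.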
Step 4: - Run the regression using SPSS Statistics 23 and interpret the summary output
The input data for the regression are shown in the above table. I assumed that the town section of each
project is 10% of the project length and that the road width is 7 m for each project. Once the
variables to be included in the estimating equation had been identified, a series of models was
developed using multiple regression analysis. Regression models are intended to
find the linear combination of independent variables that best correlates with the dependent
variable. At the project level, we postulate that total project cost (T.P.Cj) is a function of bill
of quantities items, such as earthwork volume, sub-base and base course, and pavement quantities, and
of project size (i.e. road length and road width), and the regression equation is expressed
as a linear combination of these variables.
In summary, when estimating the cost of a project or program, the estimator needs to
know more than a single method, cost model, or technique.
❖ The output from the regression analysis is shown as follows.
❖ Multiple regression model of total project cost against project scope, road width, road length, town section, terrain
classification, and number of bridge structures
SUMMARY OUTPUT
Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.967657569 |
| R Square | 0.93636117 |
| Adjusted R Square | 0.777777778 |
| Standard Error | 2181423294 |
| Observations | 11 |
ANOVA
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 8 | 4.0021E+11 | 4.45E+10 | 3.012312645 | 0.011560404 |
| Residual | 3 | 3.71161E+11 | 1.24E+11 | | |
| Total | 12 | 4.27409E+11 | | | |
Coefficients
| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | -1.12E+11 | 2.52099E-11 | -4.61785 | 0.001258193 | -1.73E+10 | -5.94E+11 | -1.73E+10 | -5.94E+11 |
| Road Length (in Km) X1 | 6579.166443 | 7272.126521 | 0.431521 | 0.00520348 | 21115.12473 | 21115.12473 | 21115.12473 | 21115.12473 |
| Road width (in m) X2 | 4036.11133 | 21762.02453 | 0.612215 | 0.002155668 | -21858.01145 | 42130.52512 | -21858.01145 | 42130.52512 |
| Town section (in km) X3 | 2.83548E+11 | 6.90285E+11 | 3.853499 | 0.001848883 | 1.86921E+11 | 4.70437E+11 | 1.86921E+11 | 4.70437E+11 |
| Project Scope (X4) | 136571979.3 | 528352736.3 | 0.258486 | 0.820201187 | -2136746364 | 2409890323 | -2136746364 | 2409890323 |
| Flat (X5) | 3522893000 | 1500407932 | 2.347957 | 0.143383836 | -2932841284 | 9978627285 | -2932841284 | 9978627285 |
| Rolling (X6) | -2020420219 | 2864792914 | -0.70526 | 0.553722185 | -14346629271 | 10305788833 | -14346629271 | 10305788833 |
| Mountainous & Escarpment (X7) | 750144257.9 | 1055543778 | 0.710671 | 0.550985809 | -3791494060 | 5291782576 | -3791494060 | 5291782576 |
| No. Bridge Section (X8) | 107077668.3 | 154736552.4 | 0.692 | 0.56047897 | -558699981.1 | 772855317.7 | -558699981.1 | 772855317.7 |
❖ Multiple regression among total project cost and activities’ quantity
SUMMARY OUTPUT
Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.931366122 |
| R Square | 0.867442853 |
| Adjusted R Square | 0.697012235 |
| Standard Error | 1316671221 |
| Observations | 11 |
ANOVA
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 7 | 1.89016E+19 | 2.70022E+18 | 5.089713 | 0.021650904 |
| Residual | 3 | 2.88842E+18 | 9.62806E+17 | | |
| Total | 10 | 2.179E+19 | | | |
Coefficients
| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 3971301556 | 1092413270 | 2.375869828 | 0.001805 | 23530393.94 | 9919072717 | 23530393.94 | 9919072717 |
| Earth works (m3) X1 | -5869.755776 | 7361.447529 | -0.661521495 | 0.009203 | -22276.81313 | 12537.30158 | -22276.81313 | 12537.30158 |
| Base course (m3) X2 | 6036.11133 | 11881.01398 | 0.592214717 | 0.003156 | -21058.02246 | 35130.24512 | -21058.02246 | 35130.24512 |
| Pavement (m2) X3 | -2195.128985 | 3545.339208 | -0.337098629 | 0.012053 | -9578.524057 | 7188.266086 | -9578.524057 | 7188.266086 |
The parametric project cost equation from the above output is:
❖ Y = 3971301556 - 5869.755776X1 + 6036.111335X2 - 2195.128985X3
❖ where X1 is the earthwork volume (m3), X2 the base course volume (m3), and X3 the pavement area (m2). Assuming X1 = 355944.44, X2 = 36345.17, and X3 = 420000, the cost of the project
❖ is Y = ETB 1,179,424,176.36, or ETB 19,657,069.61 per km.
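The prediction above can be checked with a short Python sketch that evaluates the fitted equation for the assumed quantities. The function and constant names are illustrative, and the 60 km project length used for the per-kilometre figure is an assumption inferred from the size adjustment example, not a value stated with the equation.

```python
# Coefficients copied from the fitted equation in the text
INTERCEPT = 3971301556.0
COEF_EARTHWORK = -5869.755776
COEF_BASE_COURSE = 6036.111335
COEF_PAVEMENT = -2195.128985

ASSUMED_LENGTH_KM = 60.0  # assumption: length of the proposed project


def predict_cost(earthwork_m3, base_course_m3, pavement_m2):
    """Evaluate the linear cost estimating relationship for one project."""
    return (
        INTERCEPT
        + COEF_EARTHWORK * earthwork_m3
        + COEF_BASE_COURSE * base_course_m3
        + COEF_PAVEMENT * pavement_m2
    )


y_pred = predict_cost(355944.44, 36345.17, 420000.0)
cost_per_km = y_pred / ASSUMED_LENGTH_KM
print(f"Predicted cost: ETB {y_pred:,.2f}")
print(f"Cost per km:    ETB {cost_per_km:,.2f}")
```

The two printed values should agree with the figures quoted in the text to within rounding of the coefficients.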
❖ Interpreting the Basic Outputs of Multiple Linear Regression.
I. Multiple R-squared
The R-squared (R2) ranges from 0 to 1 and represents the proportion of variation in the outcome
(dependent) variable that can be explained by the model predictor (independent) variables. For a simple
linear regression, R2 is the square of the Pearson correlation coefficient between the outcome and the
predictor variable. In multiple linear regression, R2 is the square of the correlation coefficient between
the observed outcome values and the predicted values. R2 measures how well the model fits the
data: the higher the R2, the better the fit. A problem with R2, however, is that it never
decreases when more variables are added to the model, even if those variables are only weakly associated
with the outcome. A solution is to adjust R2 by taking into account the number of predictor variables.
In general, the most common interpretation of R-squared is how well the regression model fits the observed
data. For these projects, an R-squared of 0.867 indicates that 86.7% of the variation in total project cost is
explained by the regression model. In most cases, a higher R-squared indicates a better fit, but not always.
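To make the definition concrete, the sketch below computes R-squared in two equivalent ways: from observed and predicted values, and as the ratio of the regression sum of squares to the total sum of squares. The function name and the four-point data set are hypothetical; the two sums of squares are the ones reported in the ANOVA table of the activity-quantity model.

```python
def r_squared(observed, predicted):
    """R2 = 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


# Hypothetical observed and predicted values, for illustration only
r2_toy = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])

# Sums of squares as reported in the ANOVA table (regression and total)
SS_REGRESSION = 1.89016e19
SS_TOTAL = 2.179e19
r2_from_anova = SS_REGRESSION / SS_TOTAL

print(round(r2_toy, 4), round(r2_from_anova, 3))
```

The ratio of the reported sums of squares reproduces the R Square of about 0.867 shown in the summary output.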
II. Adjusted R-squared
Adjusted R-squared, a modified version of R-squared, adds precision and reliability by accounting for the
additional independent variables that tend to inflate the R-squared measurement.
The Adjusted R Square value in the summary output is a correction for the number
of x variables included in the predictive model. You should therefore mainly consider the adjusted R-squared,
which is R2 penalized for a higher number of predictors. An (adjusted) R2 that is close to 1 indicates
that a large proportion of the variability in the outcome has been explained by the regression model. A
number near 0 indicates that the regression model did not explain much of the variability in the
outcome. In our case, the adjusted R-squared of about 0.697 indicates that, after adjusting for the number of
predictors, 69.7% of the variability in the outcome has been explained by the regression model.
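The penalty for additional predictors can be illustrated with the usual adjustment formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below uses hypothetical values (R2 = 0.90, n = 30), not the project results, to show how the adjusted value falls as predictors are added while R2 is held fixed.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical inputs: the same R2 and sample size, different numbers of predictors
adj_5 = adjusted_r_squared(0.90, 30, 5)
adj_10 = adjusted_r_squared(0.90, 30, 10)
print(round(adj_5, 4), round(adj_10, 4))
```

With the same R2, the model that uses ten predictors receives a lower adjusted value than the one that uses five, which is the behaviour the paragraph above describes.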
III. Standard Error
The standard error refers to the standard deviation of a sample statistic, such as the
mean or median. The standard error of the mean is the standard deviation of the distribution of
sample means taken from a population. The smaller the standard error, the more representative the
sample is of the overall population. The more data points involved in the calculation of the mean,
the smaller the standard error tends to be. When the standard error is small, the data are said to be more
representative of the true mean. Where the standard error is large, the data may have some
notable irregularities. In the regression summary output, the standard error is the standard deviation of the
residuals, that is, the typical distance between the observed and the predicted project costs.
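As an illustration, the sketch below computes the standard error of the mean for a small sample, and the standard error of a regression from its residuals. Both data sets are hypothetical and the function names are illustrative; the regression version divides the residual sum of squares by n - p - 1 degrees of freedom.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def regression_standard_error(residuals, n_predictors):
    """Square root of the residual sum of squares over n - p - 1."""
    dof = len(residuals) - n_predictors - 1
    return math.sqrt(sum(e * e for e in residuals) / dof)


# Hypothetical data, for illustration only
sem = standard_error_of_mean([10, 12, 14, 16, 18])
se_reg = regression_standard_error([1.0, -1.0, 2.0, -2.0, 0.0], n_predictors=1)
print(round(sem, 4), round(se_reg, 4))
```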
IV. Observations:
The number of data samples considered in this case is 11 and I considered 6 key parameters to analyze
parametric cost estimation.
V. F-Statistic
The F-test evaluates the null hypothesis that all regression coefficients are equal to zero against the
alternative that at least one is not. Thus, the F-test determines whether the proposed relationship
between the response variable and the set of predictors is statistically reliable. The higher the F value,
the stronger the evidence for rejecting the null hypothesis. The test concerns the significance of the
overall model, not of any specific parameter. R-squared tells you how
well your model fits the data, and the F-test is related to it. A large F-statistic corresponds to a
statistically significant p-value (p < 0.05). In our example, the F-statistic equals 5.09, producing a p-value
of 0.022, which is significant at the 0.05 level. The object of the F-test is to find out whether two independent
estimates of population variance differ significantly, or whether the two samples may be regarded as
drawn from normal populations having the same variance. That is why it is also known as the Variance Ratio Test.
In the classical variance ratio test the larger variance estimate is placed in the numerator, so F is at least 1;
in regression, F is the ratio of the regression mean square to the residual mean square and can take any non-negative value.
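The relationship between the F-statistic, the sums of squares, and R-squared can be sketched as follows. The numbers are hypothetical and chosen for easy arithmetic, and the function names are illustrative. Converting F to a p-value requires the F distribution, for example `scipy.stats.f.sf(F, p, n - p - 1)` if SciPy is available; that step is left out here to keep the sketch dependency-free.

```python
def f_from_sums_of_squares(ss_regression, ss_residual, n_obs, n_predictors):
    """F = (SS_regression / p) / (SS_residual / (n - p - 1))."""
    ms_regression = ss_regression / n_predictors
    ms_residual = ss_residual / (n_obs - n_predictors - 1)
    return ms_regression / ms_residual


def f_from_r_squared(r2, n_obs, n_predictors):
    """Equivalent form: F = (R2 / p) / ((1 - R2) / (n - p - 1))."""
    return (r2 / n_predictors) / ((1.0 - r2) / (n_obs - n_predictors - 1))


# Hypothetical ANOVA values: SSR = 80, SSE = 20, n = 13 observations, p = 2 predictors
f_ss = f_from_sums_of_squares(80.0, 20.0, 13, 2)
f_r2 = f_from_r_squared(80.0 / (80.0 + 20.0), 13, 2)
print(round(f_ss, 6), round(f_r2, 6))
```

Both routes give the same F value, which shows why a model with a high R-squared relative to its number of predictors also has a large F-statistic.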
VI. Coefficients
Coefficients are simply the numbers in the equation of the fitted line, i.e. the Y-intercept
and the slope for each independent variable. In general, the coefficients describe the
mathematical relationship between each independent variable and the dependent variable.
VII. P-value
The p-values for the coefficients indicate whether these relationships are statistically significant. The
p-value is the probability that the test statistic will take on a value at least as extreme as the observed value,
assuming that the null hypothesis is true (i.e., the regression coefficient is equal to zero). If the p-value is
less than alpha, say 0.05, the coefficient is statistically significant. In this project, all p-values of the
activity-quantity model are less than 0.05, so using those
parameters as independent variables is meaningful. It is also preferable for each p-value to be lower than the
Significance F. Variables whose p-values are close to one can be removed and the analysis redone until a
satisfactory result is obtained.
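The screening step described here can be sketched in Python using the p-values reported in the coefficient table of the first model (project scope, road width, road length, town section, terrain, and bridges). The 0.05 threshold and the variable names in the dictionary are illustrative choices.

```python
ALPHA = 0.05  # significance level used for screening

# P-values copied from the coefficient table of the first regression model
p_values = {
    "Road length (X1)": 0.00520348,
    "Road width (X2)": 0.002155668,
    "Town section (X3)": 0.001848883,
    "Project scope (X4)": 0.820201187,
    "Flat (X5)": 0.143383836,
    "Rolling (X6)": 0.553722185,
    "Mountainous & escarpment (X7)": 0.550985809,
    "No. of bridge sections (X8)": 0.56047897,
}

significant = [name for name, p in p_values.items() if p < ALPHA]
candidates_to_drop = [name for name, p in p_values.items() if p >= ALPHA]

print("Keep:", significant)
print("Consider removing:", candidates_to_drop)
```

In practice the variables would be removed one at a time, with the regression rerun after each removal, because the remaining p-values change when a predictor is dropped.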
VIII. Degrees of Freedom
The degrees of freedom is the number of dimensions associated with a term. Note that each
observation can be interpreted as a dimension in n-dimensional space. The degrees of freedom for the
intercept, model, error, and adjusted total are 1, p, n - p - 1, and n - 1, respectively, where n is the number
of observations and p is the number of predictors.
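A small sketch makes the bookkeeping explicit. With n = 11 observations and p = 7 predictors, the values correspond to the df column of the ANOVA table for the activity-quantity model; the function name is illustrative.

```python
def degrees_of_freedom(n_obs, n_predictors):
    """Degrees of freedom for the intercept, model, error, and adjusted total."""
    return {
        "intercept": 1,
        "model": n_predictors,
        "error": n_obs - n_predictors - 1,
        "total": n_obs - 1,
    }


dof = degrees_of_freedom(n_obs=11, n_predictors=7)
print(dof)
```

The model and error degrees of freedom always sum to the adjusted total, which is a quick consistency check on any ANOVA table.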