Prediction Accuracy - A Measure of Simulation Reality
Abstract: Nowadays, simulations are increasingly being used in many contexts, such as
training and education in business and economics. The validity of the simulation outcomes is a key
issue in simulations. Procedures and protocols for simulation model verification and validation are
an ongoing field of academic study, research and development in simulations technology and
practice. The present paper discusses the accuracy of simulation models and how to measure and
improve it in order to achieve better simulation results and provide more reliable insights and
predictions about real-life processes or systems. It presents results from international research
done in Europe, Australia, and most recently in the United States.
Keywords: accuracy; simulations; predictions; artificial neural networks; statistical learning
networks; group method of data handling; multi-layered networks of active neurons
1. INTRODUCTION
Nowadays, simulations are increasingly being used in many contexts, such as
training and education in business and economics. One of the most common and simplest
definitions of a simulation is “an approximate imitation of the operation of a process or
system” (Banks et al., 2010, p. 3). In this regard, if a simulation does not accurately
represent (imitate) the real process or system, then the knowledge that the users will gain
about this real-life process is questionable.
A business simulation, according to Greenlaw et al. (1962), is a sequential
decision-making exercise structured around a model of a business operation, in which
participants assume the role of managing the simulated operation.
Simulations are often used to gain insights into business systems functioning and
structure. Also, simulation models help to predict what the business system can expect
in the future, make it possible to run simulation experiments with the system, and to apply
“what-if” analysis. Therefore, if the simulation model is not accurate and the predictions
made by its users are not close enough to the real-life case, then learning will be minimal
(Motzev, 2018, p. 295).
The validity of the simulation outcomes is a key issue in simulation. Procedures and
protocols for simulation model verification and validation are an ongoing field of academic
study, research and development in simulations technology and practice. Simulation
model validation is usually defined to mean “substantiation that a computerized model
within its domain of applicability possesses a satisfactory range of accuracy consistent
with the intended application of the model” (Schlesinger et al., 1979).
For years, precision was measured with respect to detail and accuracy was
measured with respect to reality (Acken, 1997, pp. 281-306). A shift in the meaning of
these terms appeared with the publication of the International Organization for
Standardization (ISO) 5725 series of standards in 1994, which was last reviewed and
confirmed in 2018. The purpose of ISO 5725-1 “is to outline the general principles to be
understood when assessing accuracy (trueness and precision) of measurement methods
and results, and in applications, and to establish practical estimations of the various
measures by experiment” (ISO 5725-1, 1994, p.1).
The new ISO standard defines accuracy as describing a combination of both types of
observational error (random and systematic); therefore, high accuracy requires both high
precision and high trueness (reality). In the current paper, we use and understand the
term “accuracy” according to ISO 5725-1.
The present paper discusses the accuracy of simulation models and how to measure and
improve it in order to achieve better simulation results and provide more reliable insights
and predictions about real-life processes or systems. It presents results from
international research done in Europe, Australia, and most recently in the United States.
Fig. 1 Distinction between Accuracy and Precision (source: Lucey, 1991, p. 17)
One important point here is that accuracy should not be confused with precision.
Information may be inaccurate but precise or vice versa (Fig. 1). To explain the difference
between accuracy and precision, we can use the target-shooting analogy. In
this analogy, repeated measurements are compared to arrows that are shot at a target.
Accuracy describes the closeness of arrows to the bull’s-eye at the target center. Arrows
that strike closer to the bull's-eye are considered more accurate. The closer a system's
measurements are to the accepted value, the more accurate the system is considered to be.
To continue the analogy, if a large number of arrows are shot, precision would be
the size of the arrow cluster. When only one arrow is shot, precision is the size of the
cluster one would expect if this were repeated many times under the same conditions.
When all arrows are grouped tightly together, the cluster is considered precise, since they
all struck close to the same spot, even if not necessarily near the bull's-eye. The results are
precise, though not necessarily accurate.
However, it is not possible to reliably achieve accuracy without precision. If the
arrows are not grouped close to one another, they cannot all be close to the bull’s-eye.
Their average position might be an accurate estimation of the bull’s-eye, but the individual
arrows are inaccurate.
In the fields of engineering and statistics (Dodge, 2003) the accuracy of a
measurement system is the degree of closeness of measurements of a quantity to its real
(true) value. The precision of a measurement system, also called reproducibility or
repeatability, is the degree to which repeated measurements under unchanged conditions
show the same results. The field of statistics prefers to use the terms bias and variability
instead of accuracy and precision. The bias is the amount of inaccuracy and the variability
is the amount of imprecision.
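To illustrate these statistical terms, the short sketch below (our own minimal example with invented numbers, not taken from the cited sources) simulates repeated measurements of a known true value; the bias is the mean deviation from the true value (the amount of inaccuracy) and the variability is the standard deviation of the measurements (the amount of imprecision):

```python
import random
import statistics

random.seed(42)

TRUE_VALUE = 100.0        # the accepted ("true") value being measured
SYSTEMATIC_OFFSET = 3.0   # a constant instrument error -> bias (inaccuracy)
NOISE_SD = 1.5            # random scatter -> variability (imprecision)

# Simulate 1000 repeated measurements under unchanged conditions.
measurements = [TRUE_VALUE + SYSTEMATIC_OFFSET + random.gauss(0.0, NOISE_SD)
                for _ in range(1000)]

bias = statistics.fmean(measurements) - TRUE_VALUE   # amount of inaccuracy
variability = statistics.stdev(measurements)         # amount of imprecision

print(f"bias (inaccuracy):         {bias:.2f}")        # close to 3.0
print(f"variability (imprecision): {variability:.2f}")  # close to 1.5
```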
In the past, accuracy was most commonly used to describe systematic errors, as a
measure of statistical bias; i.e., accuracy was considered the proximity of results to the
true value, and precision the degree to which repeated experiments, under unchanged
conditions, show the same results.
Fig. 2 A) Low accuracy due to poor precision. B) Low accuracy due to poor trueness
(Source:https://commons.wikimedia.org/wiki/File:Accuracy_and_precision-highaccuracylowprecision.png)
As mentioned above, a shift in the meaning of these terms appeared with the
publication of ISO 5725-1, whose purpose was to outline the general principles to be
understood when assessing accuracy. According to the new international standard,
accuracy describes a combination of random and systematic error, and high accuracy
requires both high precision and high trueness.
In this regard, the chart in Fig. 1 should be refined as shown in Fig. 2, where accuracy
requires both trueness and precision.
Cross-validation works by partitioning the available data into complementary subsets: a dataset
of known data on which training is run (the training dataset), and a dataset of unknown data
(or first-seen data) against which the model is tested (the testing or validating dataset).
There are two possible algorithms in cross-validation. In the first one, we select a
testing set of m observations (about 20–30% of the total sample), depending on how big the
sample is and how far ahead we want to forecast. It should be at least as large as the
maximum forecast horizon required, but no more than 30% of the total number of
observations. The testing dataset could be selected in different ways depending on the
model purpose; for instance, it can be the last m observations, a random selection, or
another scheme, as presented in (Madala and Ivakhnenko, 1994; Müller and Lemke, 2003)
and elsewhere.
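As a minimal sketch of this first algorithm (the function name and the exact handling of the 20–30% bounds are our own illustrative assumptions), the testing set can be taken either as the last m observations or by random selection:

```python
import random

def holdout_split(data, horizon, max_share=0.3, scheme="last"):
    """Select a testing set of m observations: at least as large as the
    maximum forecast horizon, but no more than max_share of the sample."""
    n = len(data)
    m = max(horizon, int(0.2 * n))   # start near the 20% lower bound
    m = min(m, int(max_share * n))   # cap at 30% of the observations
    m = max(m, horizon)              # never below the forecast horizon
    if scheme == "last":             # e.g. for time series data
        return data[:-m], data[-m:]
    # random selection, e.g. for cross-sectional data
    test_idx = set(random.sample(range(n), m))
    train = [x for i, x in enumerate(data) if i not in test_idx]
    test = [x for i, x in enumerate(data) if i in test_idx]
    return train, test

sample = list(range(100))            # 100 observations
train, test = holdout_split(sample, horizon=12, scheme="last")
print(len(train), len(test))         # 80 20
```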
There is another, more sophisticated algorithm of training/testing sets selection in
cross-validation. For cross-sectional data it works as follows:
1. Select observation i for the testing set (the selection could be random), and use the
remaining observations in the training set. Compute the error on the test observation.
2. Repeat the above step for each of the N observations (i = 1, 2, …, N), where N is the
total number of observations.
3. Compute the forecast accuracy measures based on all errors obtained.
This is a much more efficient use of the available data, as we only omit one
observation at each step. However, it can be very time consuming to implement.
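A minimal sketch of this leave-one-out procedure for cross-sectional data is given below (the mean-based predictor is an assumption for illustration; any model-fitting routine could be substituted):

```python
import statistics

def leave_one_out_errors(observations, fit_predict):
    """Hold out each observation i in turn, train on the remaining
    observations, and record the error on the held-out observation."""
    errors = []
    for i in range(len(observations)):            # i = 1, 2, ..., N
        train = observations[:i] + observations[i + 1:]
        prediction = fit_predict(train)
        errors.append(observations[i] - prediction)
    return errors

data = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
errors = leave_one_out_errors(data, fit_predict=statistics.fmean)
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
print(f"LOOCV RMSE: {rmse:.3f}")
```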
In time series analysis, cross-validation is similar to the procedure for
cross-sectional data described above, but for time series data the training set consists only
of observations that occurred prior to the observations that form the testing set. Thus, no
future observations can be used in constructing the forecast. However, since it is not
possible to get a reliable forecast based on a very small training set, the earliest
observations are not used as testing sets.
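The corresponding sketch for time series data might look as follows (the naïve last-value forecast and the minimum training-set size of five observations are our own illustrative assumptions):

```python
def time_series_cv_errors(series, min_train=5):
    """For each test point, train only on observations that occurred
    before it; the earliest observations never serve as testing sets."""
    errors = []
    for t in range(min_train, len(series)):
        train = series[:t]          # strictly prior observations only
        forecast = train[-1]        # naive one-step forecast F_t = y_{t-1}
        errors.append(series[t] - forecast)
    return errors

sales = [20, 22, 21, 24, 26, 25, 28, 30, 29, 33]
errs = time_series_cv_errors(sales)
mae = sum(abs(e) for e in errs) / len(errs)
print(f"one-step MAE: {mae:.2f}")
```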
In real life, the two most important factors used to evaluate a simulation are its
Accuracy and Cost. Other important factors include the time to gather and analyze the
data, computational power and software, availability of historical data, and time horizon of
the simulation. How cost and accuracy increase with sophistication is presented in Fig. 6,
which charts them against the corresponding cost of prediction errors, given some general
assumptions. The most sophisticated technique that can be economically justified is the
one that falls in the region where the sum of the two costs is minimal.
There are also a few more points that should be considered when identifying the
simulation accuracy, such as “A model which fits the data well does not necessarily
forecast well”, “A perfect fit with zero prediction error can always be obtained by using a
model with a large enough number of parameters (as shown in Fig. 4)”, “Overfitting a model
to data is as bad as failing to identify the systematic pattern in these data” (quoted from
Madala, H. and Ivakhnenko A. G., 1994 and Mueller J-A. and Lemke F., 2003).
Fig. 6 Cost of Forecasting Versus Cost of Inaccuracy for a Medium-Range Forecast, Given
Data Availability (Source: Chambers, J., Mullick, S. and Smith, D., 1971. How to Choose the Right
Forecasting Technique, Harvard Business Review, July)
$$e_t = y_t - F_t \qquad (1)$$
where $e_t$ is the simulation (prediction) error at period $t$, $y_t$ is the actual value, and $F_t$ is the simulated (forecast) value.
• Mean Percentage Error (MPE) – see equation (2):
$$\mathrm{MPE}\,(\%) = \frac{1}{N}\sum_{t=1}^{N}\frac{e_t}{y_t} \times 100 \qquad (2)$$
MPE shows the direction of the errors that occurred: errors of opposite signs affect
each other and cancel out. For a good simulation with minimum bias, the calculated MPE
should be as close to zero as possible.
• Mean Absolute Percentage Error (MAPE) – see equation (3). This
measure puts errors in perspective. It is useful when the size of the simulated variable is
important in the evaluation. It provides an indication of how large the simulation errors are
in comparison to the actual values. It is also useful for comparing the accuracy of different
simulation models on the same or different data.
$$\mathrm{MAPE}\,(\%) = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{e_t}{y_t}\right| \times 100 \qquad (3)$$
MAPE represents the percentage of average absolute error and is independent of
the scale of measurement but it is affected by data transformations. It does not show the
direction of error and does not penalize extreme deviations. For a good simulation, the
calculated MAPE should be as small as possible.
• Mean Squared Error (MSE) – see equation (4). It is the average of the
squared errors, i.e. of the differences between the actual and the simulated values at period
t:
$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N} e_t^2 = \frac{1}{N}\sum_{t=1}^{N}\left(y_t - F_t\right)^2 \qquad (4)$$
Technically, the MSE is the second moment about the origin of the error, and thus
incorporates both the variance of the estimator and its bias. For an unbiased estimator, the
MSE is the variance of the estimator. Like the variance, MSE has the same units of
measurement as the square of the quantity being estimated. It is also sensitive to the
change of scale and data transformations. Because of all these properties, researchers
mostly use the square root of the MSE.
• Root Mean Squared Error (RMSE) – see equation (5). This is the square
root of the calculated MSE. In analogy to the standard deviation, taking the square root of
MSE yields the root-mean-squared error (RMSE), which has the same units as the
quantity being estimated:
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{N}\sum_{t=1}^{N} e_t^2} \qquad (5)$$
Most importantly, RMSE (like MSE) penalizes the extreme errors occurring in
simulation, since it squares each error. It emphasizes the fact that the total error is strongly
affected by large individual errors; i.e., large errors are much more expensive than
small errors.
The RMSE serves to aggregate the magnitudes of the errors in predictions for
various times into a single measure of predictive power. Thus, it is a good measure of
accuracy, but only to compare simulation errors of different models for a particular variable
and not between variables, as it is scale-dependent.
• Coefficient of Variation of the RMSE, CV(RMSE) – see equation (6). It is
defined as the RMSE normalized to the mean of the real values:
$$\mathrm{CV(RMSE)} = \frac{\mathrm{RMSE}}{\bar{y}}, \qquad \bar{y} = \frac{1}{N}\sum_{t=1}^{N} y_t \qquad (6)$$
It is the same concept as the coefficient of variation (CV) in statistics except that
RMSE replaces the standard deviation. The CV is useful because the standard deviation
of data must always be understood in the context of the mean of these data. In contrast,
the actual value of the CV is independent of the unit in which the measurement has been
taken, so it is a dimensionless number. For comparison between datasets with different
units or widely different means, we should use the CV instead of the standard deviation.
The smaller the CV(RMSE) value, the better the simulation.
• Mean Absolute Scaled Error (MASE) – based on the scaled error $q_t$
proposed by Hyndman (2006) – see equation (8):
$$q_t = \frac{e_t}{\dfrac{1}{n-1}\sum_{i=2}^{n}\left|y_i - y_{i-1}\right|} \qquad (8)$$
where the numerator $e_t$ is the prediction error for a given period $t$, and the
denominator is the average absolute error of the one-step naïve forecast method, which
uses the actual value from the prior period as the forecast, i.e. $F_t = y_{t-1}$. The
Mean Absolute Scaled Error (MASE) is the average of the absolute scaled errors $|q_t|$.
According to this study, the scale-free error metric "can be used to compare
forecast methods on a single series and also to compare forecast accuracy between
series. This metric is well suited to intermittent-demand series, because it never gives
infinite or undefined values except in the irrelevant case where all historical data are equal”
(Hyndman, 2006, p. 46).
The example above shows that there have been many attempts to improve prediction-
error metrics, including adjustments, modifications, and even new formulas and criteria.
However, none of them is good enough to replace the existing set of measures of
accuracy. We have discussed this topic in more detail in (Motzev, 2016, pp. 67-94).
In summary, we have to conclude that there is a variety of measures of simulation
accuracy, which have different properties and can be used for different purposes. Though
each case is particular and has its specific goals, we can summarize some general rules
for using the measures of accuracy (a computational sketch of these measures follows the
list below):
• To measure simulation usefulness or its reliability, researchers most frequently use
RMSE and MPE.
• To compare the accuracy of two different techniques, the most common measures
are MAPE (MASE could also be used) and the normalized RMSE, CV(RMSE).
• There are some important points that need clarification as well, in terms of the
specific properties of the simulation technique used, like the validity of the
simulation technique assumptions and the significance of the simulation model
parameter estimations.
• Lastly, it is also important whether the simulation technique is simple to use and
easy for decision makers to understand.
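As a compact illustration of the measures discussed in this section, the sketch below computes equations (1)–(6) and the scaled error (8) for a pair of short actual and simulated series (all numbers are invented for demonstration):

```python
import math

def accuracy_measures(actual, simulated):
    n = len(actual)
    errors = [y - f for y, f in zip(actual, simulated)]           # (1) e_t = y_t - F_t
    mpe = 100.0 * sum(e / y for e, y in zip(errors, actual)) / n  # (2) MPE
    mape = 100.0 * sum(abs(e / y) for e, y in zip(errors, actual)) / n  # (3) MAPE
    mse = sum(e * e for e in errors) / n                          # (4) MSE
    rmse = math.sqrt(mse)                                         # (5) RMSE
    cv_rmse = rmse / (sum(actual) / n)                            # (6) CV(RMSE)
    # (8) denominator: average absolute error of the one-step naive forecast
    scale = sum(abs(actual[i] - actual[i - 1]) for i in range(1, n)) / (n - 1)
    mase = sum(abs(e) for e in errors) / n / scale                # mean of |q_t|
    return {"MPE%": mpe, "MAPE%": mape, "MSE": mse,
            "RMSE": rmse, "CV(RMSE)": cv_rmse, "MASE": mase}

actual = [100.0, 104.0, 101.0, 108.0, 110.0, 107.0]
simulated = [98.0, 105.0, 103.0, 106.0, 111.0, 108.0]
for name, value in accuracy_measures(actual, simulated).items():
    print(f"{name:9s} {value:8.3f}")
```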
Deep neural networks (DNNs) can produce highly accurate
simulation models, but at the same time DNNs are difficult to develop and hard to
understand.
Statistical Learning Networks (SLNs) can address the common problems of DNNs,
such as the difficulty of interpreting results (DNNs are implicit models with no
explanation component by default), the problem of overfitting, the design of the DNN
topology, which is in general a trial-and-error process, and the lack of rules for using
theoretical a priori knowledge in DNN design (Müller and Lemke, 2003).
4.2. Statistical Learning Networks and the Group Method of Data Handling
In (Motzev, 2017, 2018) we presented a highly automated procedure for developing
SLNs for business simulations in the form of Multi-Layered Networks of Active Neurons
(MLNAN) using the Group Method of Data Handling (GMDH), and in (Motzev and
Pamukchieva, 2019) we discussed the problem of accuracy in business simulations,
providing some insights into how to address it in a cost-effective way using SLNs in the
form of MLNAN. In this paper, we briefly review this technique and concentrate
on its possible applications and the advantages and benefits it provides in
business simulations.
Statistical learning theory deals with the problem of finding a predictive function
based on data. Typically, it is a framework for machine learning drawing from the fields of
statistics and functional analysis. One of the most successful methods in SLNs has
proven to be the Group Method of Data Handling (GMDH). It was introduced in 1968 by the
Soviet scientist Alexey G. Ivakhnenko as an inductive approach to model building
based on self-organization principles (see GMDH.net).
In GMDH algorithms, models are generated adaptively from input data, in the form of
an ANN of active neurons, through a repetitive generation of populations of competing partial
models of growing complexity. A limited number of them is selected from generation to
generation by cross-validation, until an optimal complex model is finalized. This modeling approach
grows a tree-like network out of data of input and output variables in a pair-wise
combination and competitive selection from a single neuron to a final output – a model
without predefined characteristics (Fig. 7). Here, neither the number of neurons and the
number of layers in the network, nor the actual behavior of each created neuron is
predefined. The modeling is self-organizing because the number of neurons, the number
of layers, and the actual behavior of each created neuron are identified during the learning
process from layer to layer.
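To illustrate the principle, the following toy sketch (our own drastically simplified illustration, not the authors' implementation; real GMDH algorithms typically use polynomial partial models) builds linear partial models for every pair of inputs, ranks them on a separate validation set (the external criterion), and feeds the outputs of the best neurons into the next layer until the criterion stops improving:

```python
import itertools
import random

def fit_linear(pairs, y):
    """Least-squares fit of z = w0 + w1*x1 + w2*x2 via the normal equations."""
    rows = [[1.0, x1, x2] for x1, x2 in pairs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    for i in range(3):                            # Gaussian elimination
        p = max(range(i, 3), key=lambda k: abs(ata[k][i]))
        ata[i], ata[p] = ata[p], ata[i]
        aty[i], aty[p] = aty[p], aty[i]
        for k in range(i + 1, 3):
            f = ata[k][i] / ata[i][i]
            ata[k] = [a - f * b for a, b in zip(ata[k], ata[i])]
            aty[k] -= f * aty[i]
    w = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                           # back substitution
        w[i] = (aty[i] - sum(ata[i][j] * w[j] for j in range(i + 1, 3))) / ata[i][i]
    return w

def gmdh(train_X, train_y, val_X, val_y, width=4, max_layers=5):
    """Grow layers of pairwise partial models; keep the `width` best neurons
    per layer by validation RMSE (the external criterion) and stop when the
    criterion no longer improves."""
    best_err = float("inf")
    for _ in range(max_layers):
        candidates = []
        for i, j in itertools.combinations(range(len(train_X[0])), 2):
            w = fit_linear([(r[i], r[j]) for r in train_X], train_y)
            neuron = lambda r, w=w, i=i, j=j: w[0] + w[1] * r[i] + w[2] * r[j]
            rmse = (sum((neuron(r) - t) ** 2
                        for r, t in zip(val_X, val_y)) / len(val_y)) ** 0.5
            candidates.append((rmse, neuron))
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:width]
        if survivors[0][0] >= best_err:           # external criterion got worse
            break
        best_err = survivors[0][0]
        # outputs of the surviving neurons become inputs to the next layer
        train_X = [[n(r) for _, n in survivors] for r in train_X]
        val_X = [[n(r) for _, n in survivors] for r in val_X]
    return best_err

random.seed(1)
X = [[random.uniform(0.0, 1.0) for _ in range(4)] for _ in range(60)]
y = [2.0 * r[0] + 3.0 * r[1] - r[2] + random.gauss(0.0, 0.05) for r in X]
print(f"best validation RMSE: {gmdh(X[:40], y[:40], X[40:], y[40:]):.4f}")
```

Here neither the number of layers nor the set of surviving neurons is fixed in advance; both emerge from the data, mirroring the self-organizing behavior described above.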
SLNs, such as the MLNAN described above, help to address most of the problems in
ANNs and deep learning. For example, ANNs cannot explain their results, and this is their
biggest drawback in a business decision support context. In situations where explaining
the rules may be critical, such as denying loan applications, general ANNs are not a good
choice. Müller and Lemke (1995) provided comparisons and pointed out that, in distinction
to ANNs, the results of GMDH algorithms are explicit mathematical models, generated in a
relatively short time even on the basis of small samples. Another problem is that deep
learning and neural network algorithms can be prone to overfitting. Following Beer (1959),
only external criteria, calculated on new, independent information, can produce the
minimum of the simulation model error. GMDH algorithms for MLNAN address this problem
with the cross-validation technique (Fig. 8).
In summary, algorithms like MLNAN combine in a powerful and cost-effective way
the best features of ANNs and statistical techniques.
5. APPLICATIONS
5.1. Simulation Models in Business and Economy
MLNAN have been successfully applied in many areas, including business
simulations. When developing the working prototype of the MLNAN (Motzev and Marchev,
1988), we used it to build a series of increasingly complex simulation models, as linear
systems of simultaneous equations (SE), of the Bulgarian national economy, and the
results showed a very high level of accuracy (Table 1). Other successful applications in
predictive modeling with similar SLNs were made in Germany (Müller and Lemke, 2003),
the United States (Klein et al., 1980) and other countries. The comparisons of the results
from different simulations and models, presented in (Motzev, 2016), show almost
insignificant differences in their predictions and a very high level of accuracy.
Just recently, the framework proposed in (Motzev, 2017) for developing MLNANs
was used in the Business Forecasting and Predictive Analytics class at the Walla Walla
University School of Business. For three years in a row (2017-2019), we developed many
predictive models using techniques of varying complexity. In summary, the predictions
made with the MLNAN always had the smallest errors (i.e. the highest accuracy), as
presented in Tables 2 and 3.
All results so far prove the advantages of utilizing SLNs in business simulations.
SLNs such as MLNAN provide opportunities both for shortening the design time and
reducing the cost and effort of simulation model building, and for reliably developing
even complex models with a high level of accuracy.
Tab. 1 Macroeconomic simulation models in Bulgaria

SIMUR 0 (1977)
Main purpose and accuracy achieved: First step in developing complex models in the form of Simultaneous Equations (SE). Indirect Least Squares used for estimation. CV(RMSE) = 14%.
Model characteristics: A one-product macro-economic model developed as a system of five SE. Contains five endogenous, one exogenous, and five lag variables.

SIMUR I (1981)
Main purpose and accuracy achieved: Analysis of possibilities for automated model building. GMDH algorithm for MLNAN used. CV(RMSE) = 2.7%.
Model characteristics: A one-product macro-economic model developed as a system of five SE. Contains the same set of variables.

SIMUR II (1985)
Main purpose and accuracy achieved: Design and verification of a program system for simulation experiments with SE. Analysis of validation criteria for accuracy evaluation of SE. GMDH algorithm for MLNAN used. CV(RMSE) = 2.0%.
Model characteristics: Aggregated macroeconomic model in the form of 12 interdepending SE. Contains 12 endogenous, 5 exogenous and 26 lag variables with a time lag of up to 3 years.

SIMUR III (1987)
Main purpose and accuracy achieved: Improving the MLNAN for synthesis of SE with a large number of equations. Simulation of main macroeconomic variables. Average CV(RMSE) < 1%.
Model characteristics: Macroeconomic simulation model. Contains 39 SE and 39 endogenous, 7 exogenous and 82 lag variables with a time lag of up to 5 years.

Source: Own data
The same data and set of variables were used to build the model SIMUR I using
the MLNAN. The new model was much more accurate (its CV(RMSE) was about five times
smaller, as shown in Table 4), providing a more reliable basis for simulations and what-if
analysis. Similar results were obtained when applying SLNs and MLNAN in many other
simulation games (Motzev, 2013).
Recently, the MLNAN was used in the process of developing the “NEW PRODUCT”
business game series, which is an integrated, role-playing, model-based simulation game
designed for the purposes of business training and education (Motzev, 2012). The latest
versions of the game cover all major stages in the process of new product planning and
development, production and operations management, sales and marketing. The main
purpose of the game is to serve as an educational tool for teaching business, but it may
also be used for training in general management, inventory and stock control,
production, and small business management.
6. CONCLUSIONS
It is important to note that, despite the fact that simulations are based on models, the
use of good and accurate models does not guarantee a good decision. Unqualified users
may not comprehend the rules for using the simulation model, may apply it incorrectly, or
may misinterpret the results. Also, no single technique/model works in every situation, and
selecting the most appropriate (i.e. cost-effective) one out of many possible models is a
never-ending task in business simulations.
It is true that the future cannot always be predicted based on history. Data fitting
somewhat limits the value of the results, and decision makers have to judge the overall
importance of the simulation outputs. However, techniques such as SLNs (and the MLNAN
in particular) provide processed data that are needed in the business context, and the
extracted information is useful to business because it creates value or predicts market
behavior in a way that leads to competitive advantages.
7. REFERENCES
Acken, J., 1997. Encyclopedia of Computer Science and Technology. 36,
pp. 281-306.
Banks, J., Carson, J., Nelson, B. and Nicol, D., 2010. Discrete-Event System
Simulation. Fifth Edition, Upper Saddle River, Pearson Education, Inc.
Beer, S., 1959. Cybernetics and Management, English University Press, London, p.
280.
Dodge, Y., 2003. The Oxford Dictionary of Statistical Terms, OUP.
Greenlaw, P., Herron, L. and Rawdon, R., 1962. Business Simulation in Industrial
and University Education. Prentice-Hall.
Hyndman, R., 2006. Another Look at Forecast-Accuracy Metrics for Intermittent
Demand. Foresight, Issue 4, June, pp. 43-46.
ISO 5725-1, 1994. Accuracy (trueness and precision) of measurement methods and
results - Part 1: General principles and definitions, p. 1. Retrieved from
https://www.iso.org/obp/ui/#iso:std:iso:5725:-1:ed-1:v1:en
Klein, L., Müller, J-A. and Ivakhnenko, A.G., 1980. Modeling of the Economics of
the USA by Self-organization of the System of Equations, Soviet Automatic Control 13 (1),
pp. 1-8.
Lucey, T., 1991. Management Information Systems. DP Publications Ltd.
Madala, H. and Ivakhnenko, A. G., 1994. Inductive Learning Algorithms for Complex
Systems Modelling. CRC Press Inc., Boca Raton.
Motzev, M. and Marchev, A., 1988. Multi-Stage Selection Algorithms in Simulation.
In: XII IMACS World Congress Proceedings, Paris, vol. 4, pp. 533-535.
Motzev, M., 2010. Intelligent Techniques In Business Games And Simulations – A
Hybrid Approach. In: Martin Beran (ed.) Changing the world through meaningful play.
Proceedings of the Annual ISAGA Conference, Spokane, WA, pp. 81-86.
Motzev, M., 2012. New Product – An Integrated Simulation Game in Business
Education. In: Bonds & Bridges. Proceedings of the Annual ISAGA Conference, pp. 63-75.
Motzev, M., 2013. Model-Based Simulation Games in Business Training. In:
Electronic Proceedings of the Annual ISAGA Conference, Stockholm, Sweden.