Big Data Analytics Using Multiple Criteria Decision-Making Models (2017)
The Operations Research Series
Series Editor: A. Ravi Ravindran
Professor, Department of Industrial and Manufacturing Engineering
The Pennsylvania State University, University Park, PA
Published Titles
Multiple Criteria Decision Analysis for Industrial Engineering:
Methodology and Applications
Gerald William Evans
Multiple Criteria Decision Making in Supply Chain Management
A. Ravi Ravindran
Operations Planning: Mixed Integer Optimization Models
Joseph Geunes
Introduction to Linear Optimization and Extensions with MATLAB ®
Roy H. Kwon
Supply Chain Engineering: Models and Applications
A. Ravi Ravindran & Donald Paul Warsing
Analysis of Queues: Methods and Applications
Natarajan Gautam
Integer Programming: Theory and Practice
John K. Karlof
Operations Research and Management Science Handbook
A. Ravi Ravindran
Operations Research Applications
A. Ravi Ravindran
Operations Research: A Practical Introduction
Michael W. Carter & Camille C. Price
Operations Research Calculations Handbook, Second Edition
Dennis Blumenfeld
Operations Research Methodologies
A. Ravi Ravindran
Probability Models in Operations Research
C. Richard Cassady & Joel A. Nachlas
Big Data Analytics Using Multiple Criteria Decision-Making Models
Edited by Ramakrishnan Ramanathan, Muthu Mathirajan, and A. Ravi Ravindran
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors
and publishers have attempted to trace the copyright holders of all materials reproduced in this
publication and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged, please write and let us know so we
may rectify in any future reprint.
Except as permitted under the U.S. Copyright Law, no part of this book may be reprinted, repro-
duced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now
known or hereafter invented, including photocopying, microfilming, and recording, or in any infor-
mation storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC),
222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Names: Ramanathan, R., 1966- editor. | Mathirajan, M., editor. | Ravindran, A.,
1944- editor.
Title: Big data analytics using multiple criteria decision-making models /
edited by Ramakrishnan Ramanathan, Muthu Mathirajan, A. Ravi Ravindran.
Description: Boca Raton : Taylor & Francis, CRC Press, 2017. | Series: The
operations research series | Includes bibliographical references and index.
Identifiers: LCCN 2016056409| ISBN 9781498753555 (hardback : alk. paper) |
ISBN 9781498753753 (ebook)
Subjects: LCSH: Big data. | Multiple criteria decision making. | Business
logistics--Decision making
Classification: LCC QA76.9.B45 B5535 2017 | DDC 005.7--dc23
LC record available at https://lccn.loc.gov/2016056409
Preface..................................................................................................................... vii
Editors.......................................................................................................................ix
Acknowledgments............................................................................................... xiii
Contributors............................................................................................................xv
Preface
modeling-based tools for BA, with exclusive focus on the subfield of MCDM
within the domain of operational research.
We believe that the two themes of the book, MCDM and big data, address
a very valuable research gap. While there are several textbooks and research
materials in the field of MCDM, there is no book that discusses MCDM in the
context of emerging big data. Thus, the present volume addresses the knowl-
edge gap on the paucity of MCDM models in the context of big data and BA.
There was an instant response from Professor Ravindran’s students and
colleagues to the call for contributions to the book. A total of 15 chapters
were considered in the first round of review. Though all of them were of
good quality, after careful review and evaluation of their fit with the theme
of the book, it was decided to include 13 chapters in this volume. At least
five of these chapters have been authored by students and close associates
of Professor Ravindran. There are contributions from authors based in the
United States (5 chapters), from the United Kingdom (2), and from India (6).
This volume starts with a fitting Festschrift in Honor of Professor
Ravindran by Professor Adedeji B. Badiru. The rest of the volume is broadly
divided into three sections. The first section, consisting of Chapters 2 and 3,
is intended to provide the basics of MCDM and big data analytics. The next
section, comprising Chapters 4 through 10, discusses applications of tradi-
tional MCDM methods. The last section, comprising the final three chapters,
discusses the application of more sophisticated MCDM methods, namely,
data envelopment analysis (DEA) and the analytic hierarchy process.
Due to the topical nature of the theme of big data, it has been a challenge
to ensure that the contributions of this volume, from traditional MCDM
researchers, had adequate treatment of big data. We believe that the chapters
of this book illustrate how MCDM methods can be fruitfully employed in
exploiting big data, and will kindle further research avenues in this exciting
new field. We also believe that the book will serve as a reference for MCDM
methods, big data, and linked applications.
Ramakrishnan Ramanathan
Muthu Mathirajan
A. Ravi Ravindran
Editors
Profile: http://www.beds.ac.uk/research-ref/bmri/centres/bisc/people/
ram-ramanathan
http://www.beds.ac.uk/howtoapply/departments/business/staff/
prof-ramakrishnan-ramanathan
LinkedIn: http://www.linkedin.com/pub/ramakrishnan-ramanathan/
12/50a/204
ResearcherID: http://www.researcherid.com/rid/H-5206-2012
Google Scholar: http://scholar.google.co.uk/citations?user=1CBQZA8
AAAAJ
Facebook: http://www.facebook.com/ProfRamanathan
Twitter: @ProfRamanathan
First and foremost, the editors acknowledge the extraordinary effort, dedica-
tion, and leadership of Dr. P. Balasubramanian, Founder and CEO of Theme
Work Analytics, who was instrumental in organizing the international sym-
posium in 2015 in Bangalore to honor Professor Ravi Ravindran. Participants
of the symposium, particularly Dr. M. Mathirajan, Chief Research Scientist
at IISc, Bangalore (one of the editors of this book); Professor S. Sadogopan,
the Director of IIIT-Bangalore; and Mr. Harsha Kestur, the Vice President of
National Education Foundation, Bangalore, were instrumental for the gen-
esis of this book.
Next, we would like to thank the authors, who have worked diligently in
producing the book chapters that are comprehensive, concise, and easy to
read, bridging the theory and practice. The development and evolution of
this book have also benefitted substantially from the advice and counsel of
our colleagues and friends in academia and industry, who are too numerous
to acknowledge individually.
Finally, we would like to thank Cindy Carelli, senior acquisition editor and
Ashley Weinstein, project coordinator at CRC Press for their help from the
book’s inception until its publication.
Contributors
A. B. Badiru
CONTENTS
1.1 Background...................................................................................................... 1
1.2 The Tinker Projects.........................................................................................2
1.3 Reprint of a Tinker Project Article...............................................................3
1.4 Conclusion..................................................................................................... 18
References................................................................................................................ 18
1.1 Background
I completed my PhD in industrial engineering at the University of Central
Florida (UCF) in Orlando, Florida in December 1984, which was an off-season
PhD completion cycle for the purpose of securing academic positions. The
graduation ceremony at UCF was on Friday, December 14, 1984. Having been
offered a job, bigheartedly, by Ravi, my wife and I packed up and set out
for Norman, Oklahoma with our two young children the same afternoon
after the graduation ceremony. After 3 days of driving a combination of a
U-Haul Truck and our old family car, we arrived in Norman on Monday,
December 17 in the morning. I called Ravi to say that I had arrived in town, and
he immediately informed me that the last department meeting of the year
was taking place that very afternoon and he would like me to attend. Casual
as that invitation might seem, it said a lot about the caliber of the go-getter
that Ravi was and still is. Ravi never missed an early opportunity to put
us to work constructively as a way to start building our faculty profiles on
our march toward earning tenure at the University of Oklahoma. Although
I declined to attend that meeting, partly because I was tired from the long
road trip and partly because I feared getting to work right away, that engage-
ment opened my eyes to the need to be ready to take advantage of Ravi’s
mentoring ways. He would cajole, persuade, entice, and/or coax the young
assistant professors under his charge into pursuing the latest line of research
FIGURE 1.1
Edelman Laureate Ribbon recognizing Ravi Ravindran’s team accomplishment with the
Tinker Air Force Base Project in 1988.
in November 2015. So, even after 30 years, the Tinker Projects continue to
bear intellectual fruits.
* Reprinted verbatim with Permission from: Ravindran, A.; Foote, B. L.; Badiru, Adedeji B.;
Leemis, L. M.; and Williams, Larry (1989), “An Application of Simulation and Network
Analysis to Capacity Planning and Material Handling Systems at Tinker Air Force Base,”
TIMS Interfaces, Vol. 19, No. 1, Jan.–Feb., 1989, pp. 102–115.
accessories, and it manages selected Air Force assets worldwide. This engine
overhaul facility is responsible for logistical support for a series of Air Force
engines. Engines are returned from service activities for periodic overhaul
or to complete a modification or upgrade. The engine is disassembled, and
each part is inspected for wear and possible repair. Individual parts are
repaired or modified to a like-new condition or are condemned and replaced
with a new part. The majority of the parts are overhauled and returned to
service for a fraction of the cost of a new part. A major overhaul may cost less
than five percent of the cost of a new engine in terms of labor, material, and
replaced parts. Between November 11 and 14, 1984, a fire devastated Building
3001, which contained the Propulsion (Engine) Division in the Directorate of
Maintenance. The division consists of over 2800 employees and produces
over 10 million earned hours to support Department of Defense overhaul
requirements each year. In February 1985, the Air Force published a state-
ment of work requesting assistance from industry to model and develop a
simulation of the engine overhaul process to assist in the redesign and lay-
out of approximately 900,000 square feet of production floor space. Three
commercial firms attended an onsite prebid conference to learn the scope
of the project, the nature of the data the Air Force could provide, and the
time frame in which a finished product had to be delivered. The model was
expected to predict the number and type of machines, the personnel, the
queuing space required, the material-handling distribution, and the volume
between and within organizations. The facility engineers needed various
management reports to help them to lay out the plant. The project was to be
completed within 120 days. Of the three firms, one elected not to respond,
the second bid $225,000 with the first report in nine months, and the third
quoted $165,000 to study the problem with an expected projected cost of
over $300,000. Each was a highly reputable organization with considerable
expertise and success in the field. Since TAFB had budgeted only $80,000
and time was running out, TAFB contacted the University of Oklahoma. It
had not been considered earlier because of conflicts with class schedules.
As the month of May approached, the university became a potential vendor.
A contract was let on May 1, 1985, and the first product was delivered by June
15, 1985.
Project Scope
The scope of the project was to take advantage of the disaster and forge a
state-of-the-art facility for overhauling engines with the most efficient and
cost-effective organizational structure and physical layout. The relocation
team was charged with developing and implementing a total change in the
philosophy of engine overhaul that would maximize flexibility while mini-
mizing facility and plant equipment costs. Of equal importance was the task
of developing a means to predict and forecast resource requirements as work
load mixes changed.
Multicriteria Leadership and Decisions
Prior to the fire, the division was organized along functional operational
lines with each department responsible for a specific process, such as machin-
ing, welding, cleaning, or inspection. This organization structure was devel-
oped in 1974 when engine overhaul functions were consolidated into one
organization. At that time, such functional shop layouts maximized equip-
ment utilization and skill concentrations since a typical long-flow part would
require 30 to 50 production operations and change organizations only seven
to 10 times. Today, the same part requires over 120 production operations
and changes organizations as many as 30 to 50 times. This increase has been
caused by incremental introduction of technology and by improved repair
procedures that offset wear of critical engine parts and reduce replacement
costs. The additional repairs increased routing that overburdened the mecha-
nized conveyor system. Since 1974, the only major change was an experiment
three years prior to the fire to consolidate one part-type family, combustion
cans, into a partially self-contained work center.
The reconstruction period after the fire gave TAFB a unique opportunity
to design a modern production system to replace the one destroyed. TAFB
manufacturing system analysts changed the repair process from a process
specialization type of operation to a family (group) type of operation. Staff
from the University of Oklahoma helped to solve the problems associated
with long flow types, lack of clear responsibility for quality problems, and
excessive material handling. The plan for reconstruction was based on the
concept of a modular repair center (MRC), a concept similar to the group
technology cell (GTC) concept except that it is more interrelated with other
centers than a GTC.
We created and defined the modular repair center concept as a single orga-
nization to inspect and repair a collection of parts with similar geometries
and industrial processes so as to provide the most economical assignment
of equipment and personnel to facilitate single point organizational respon-
sibility and control. An example of such a center is the blade MRC, which
repairs all turbine blades from all engine types. With the exception of initial
chemical cleaning, disassembly, plating, paint, and high temperature heat
treatment, all industrial equipment and processes were available for assign-
ment to an MRC.
Since TAFB lost an entire overhead conveyor system in the fire, imple-
menting the MRC concept required a new conveyor design in terms of rout-
ing, size, and location of up and down elevators. The new system needed
a conveyor to move parts to their respective MRCs from the disassembly
area and to special areas such as heat treatment, painting, or plating and
back to engine reassembly. When an engine arrives for repair, its turbine
blades are removed and routed via the overhead conveyor to the blade
MRC, out to heat treatment, painting, and plating, back to the MRC, and
finally returned to be assembled back into an engine. A stacker (mecha-
nized inventory storage system) in each MRC handles excess in-process
queues that are too large for the finite buffer storage at each machine. One
of the functions of the simulation model was to compute the capacity of the
buffers and stacker.
Data Analysis
Standard sources at TAFB provided the information for analysis. The first
source, the work control documents (WCD), gives the operation sequences for
all the parts. It tells which MRC a part goes to and the sequence of machines
the part will visit within the MRC. There are 2,600 different WCDs, with as
few as 11 assigned to combustion cans and as many as 700 assigned to the
general shop.
TABLE 1.1
Material Handling Codes

Weight code:

  Alphanumeric Code    Weight of Item (lbs.)
  A                    0–1
  B                    1–5
  C                    5–10
  D                    10–25
  E                    25–50
  F                    >50

Size code (entries give the size code for each combination of length L
and width W of the base of the part, in inches):

  L \ W     0–6   6–12   12–24   24–48   >48
  0–6        1     2      4       7      11
  6–12       2     3      5       8      12
  12–24      4     5      6       9      13
  24–48      7     8      9      10      14
  >48       11    12     13      14      15
The second data source, the engine repair plan, showed how many engines
of each type were expected to be repaired each year. We used the fiscal ’85
requirements and a projected annual work load of 2000 engine equivalents
to determine how many units of each family type would enter the system.
Table 1.1 presents the material handling codes. The top part of the Table
gives the code for the six different weight categories while the lower part (the
matrix) expresses the code for 15 different categories of length and width of
the base of the part, which rests on the pallet. Each number represents a com-
bination of length (L) and width (W), measured in inches. D9, for instance,
means a part that weighs 10–25 pounds and has a base whose length is
between 12 and 24 inches and whose width is between 24 and 48 inches.
The third source of data, the TAFB standard material handling (MH) cod-
ing of each part, was based on the size and weight of each part. Parts move
on pallets at TAFB. We used the MH coding to estimate the number of parts
per pallet (see Table 1.1). TAFB engineers had decided on the shop configura-
tion and location of the MRCs but had not determined their physical dimen-
sions prior to our analysis. The configuration was based on groupings of jet
engine parts with similar geometries, metal types, and repair processes (for
example, major cases, rotating components). The MRCs are N-nozzle, S-seal,
B-bearing housing, GX-gear box, TR-turbine compressor rotor, K-combustion
can, BR-blade, AB-after burner, C-case, CR-compressor rotor, ZH-general
handwork, ZM-general machining, ZW-general welding. In addition, gen-
eral purpose shops handle painting, plating, heat treatment, blasting and
cleaning. Since several hundred units of each WCD are processed, the facil-
ity handles over one-half million units annually. Each WCD is assigned to
one of the MRCs and goes through several processes, comprising 25 to 100
operations each. Each MRC handles from 11 to 700 WCDs and has between
19 and 83 processes assigned to it.
TABLE 1.2
Parts per Pallet Material Handling Codes

              Weight Code
  Size Code    A    B    C   D–E    F
   1          50   30   20   10     5
   2           8    8    8    4     4
   3           8    8    8    4     4
   4           8    6    5    3     2
   5           8    6    5    3     2
   6           8    6    5    3     2
   7           8    6    5    3     2
   8           4    4    4    2     2
   9           4    4    4    2     2
  10           4    4    4    2     2
  11           4    4    4    2     2
  12           2    2    2    1     1
  13           2    2    2    1     1
  14           2    2    2    1     1
  15           2    2    2    1     1
FIGURE 1.2
Conveyor system pre- and postfire. [Figure not reproduced: two floor plans of
Building 3001. The prefire plan is laid out by function (disassembly, inspection,
cleaning, heavy machining, general machining, weld, heat treat, plating, paint,
blast, NDI, stacker, rotor, gearbox, blades, admin). The postfire plan is laid out
by MRC (seals, combustion can, case, nozzle, gearbox, turbine rotor, bearing
housing, compressor rotor, blades, afterburner, general MRC, N.C., material
control, assembly), with a stacker and a conveyor with up elevators.]
the WCDs flowing through one particular MRC. Features of the TIPS model
include three shifts, transfer to other MRC operations (that is, painting, plat-
ing, and heat treatment), and stackers to model WCD storage when machine
queue lengths are exceeded. The simulation model is capable of storing 70,000
entities (concurrent WCDs) in an MRC. Despite this, three of the MRCs were
so large that they had to be broken into smaller family groups.
Figure 1.3 illustrates the system concept of the MRC and how material flows
inside and to external shops. This allows the stacker to be sized by the simu-
lation; the maximum load will determine the size of the stacker installed.
The part shown in Figure 1.3 has a 1-4-3-painting-5 machine sequence.
FIGURE 1.3
Sample five-machine MRC configuration. [Figure not reproduced: five numbered
machines with finite queues (Q), a route out of the MRC to painting with a
4-hour material handling delay, and an infinite-capacity stacker. The legend
distinguishes machines available on the first shift only from those available
on the first and second shifts.]
The in-process queue area at the machine is limited. When this is full, the
overflow goes to the infinite capacity stacker. The simulation computed the
maximum stacker storage requirement needed. A route out after machine
number three to painting and return after an 8-hour material handling delay
is shown. The cross hatching on the machines in the diagram indicated the
shifts when they are available. For example, there are six machines of type
number one available during the day shift, and only four available during
the second shift. If a WCD is on a machine when the shift change occurs,
it is assumed that the machine completes processing the WCD prior to the
changeover.
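The overflow logic described above can be sketched as follows. This toy model is not the original SLAM/TIPS simulation; the queue capacity, arrival burst, and service pattern are invented for illustration, but the mechanism — finite machine queue, overflow to an infinite-capacity stacker, and tracking the peak stacker load used to size it — follows the text.

```python
# Toy sketch: finite in-process queue overflowing into an infinite-capacity
# stacker, recording the peak stacker occupancy used for sizing.
from collections import deque

QUEUE_CAP = 3                 # assumed finite queue space at the machine
queue, stacker = deque(), deque()
peak_stacker = 0

def arrive(wcd):
    """A WCD arrives; overflow beyond the machine queue goes to the stacker."""
    global peak_stacker
    if len(queue) < QUEUE_CAP:
        queue.append(wcd)
    else:
        stacker.append(wcd)
        peak_stacker = max(peak_stacker, len(stacker))

def finish_one():
    """A machine completes one WCD; refill the queue from the stacker."""
    if queue:
        queue.popleft()
    if stacker and len(queue) < QUEUE_CAP:
        queue.append(stacker.popleft())

for t in range(8):            # a burst of 8 arrivals before service catches up
    arrive(f"WCD-{t}")
for _ in range(8):
    finish_one()

print(peak_stacker)  # -> 5: the stacker would need room for 5 pallets here
```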
Tinker Air Force Base supplied the data used to determine the rate of flow
of WCDs through each MRC. The data for each MRC came in two sets, the
1985 fiscal year data and the data for 2000 engine equivalents (when the facil-
ity would run at full capacity). Both data sets contained a list of the WCDs for
the MRC, the operations sequence for each WCD, the corresponding machine
process time for each WCD, the corresponding standard labor time for each
WCD, the UPA (units per assembly) number for each WCD, and a vector
containing the relative frequencies of each WCD. In addition, the data
included the projected size of each MRC (for example, number of machines
of each type) and information needed to calculate a From-To matrix (for
inter- and intra-MRC transfers). We transformed all the data to a format that allowed
SLAM to execute the discrete event model.
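A From-To matrix of the kind mentioned above can be derived from routing data by counting moves between centers across all operation sequences, weighted by annual volume. The sketch below is illustrative only: the WCD names, center names, routings, and volumes are invented, not taken from the TAFB data.

```python
# Hypothetical derivation of a From-To matrix from WCD routings.
from collections import Counter

routings = {                 # WCD -> sequence of centers visited
    "WCD-A": ["DISASSY", "BLADE", "HEAT", "BLADE", "ASSY"],
    "WCD-B": ["DISASSY", "SEAL", "PLATE", "SEAL", "ASSY"],
}
annual_volume = {"WCD-A": 300, "WCD-B": 100}   # units per year (invented)

from_to = Counter()
for wcd, route in routings.items():
    # each consecutive pair in the route is one inter-center move
    for a, b in zip(route, route[1:]):
        from_to[(a, b)] += annual_volume[wcd]

print(from_to[("BLADE", "HEAT")])   # -> 300 moves per year on that link
```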
Two features of the TIPS simulation model make it unique. First, the model
was so large that it used the SLAM language at its maximum configura-
tion to run a single MRC. We had to consult with Pritsker and Associates to
determine how to extend SLAM’s storage limits in the source code. Second,
the model integrated both physical (machines) and skill (labor) resources
Verification
To determine whether the simulation model was working as intended, we
took the following verification steps:
expectations?), consistency (does the output remain about the same for
similar inputs?), reasonable run time (does the program run longer
than expected for the given MRC?), and output (a 10 percent increase
in load should show more than a 10 percent increase in waiting time).
Validation
To validate the model, we made a diagnostic check of how closely the simula-
tion model matched the actual system, taking the following steps:
Output
The output from TIPS consisted of two documents: the standard SLAM sum-
mary report and a custom printout generated by a FORTRAN subroutine.
The custom output presented the SLAM output in a format and at a level of
detail suitable for prompt managerial decision making. The statistics in the
output included:
FIGURE 1.4
Central role of TIPS in shop management. [Figure not reproduced: TIPS at the
center, linked to the management process, engineering process, design process,
process capability analysis, capacity planning, process monitoring, production
planning, and parts flow analysis.]
monitoring to determine if production goals are met, and to meet the needs
of engineering in designing new overhaul procedures.
Design Constraints
Because the fire was so destructive and because slowing maintenance opera-
tions for a long time would seriously affect the national defense, TAFB set
a time limit of three months for designing the analytic computer models.
We developed quick approximations of such items as pallet flow so that we
could test model validity quickly. The closeness of predicted need to actual
need showed that these approximations were acceptable based on the crite-
rion specified by TAFB (plus or minus five percent). The predicted require-
ments have ranged from 100 to 115 percent of actual need.
The program checked for missing processes to see if other processes were
required to follow or precede a given process based on technological con-
straints, and then determined whether the process plan met this constraint.
Errors found by use of the above rules and others were reported to
TAFB personnel for analysis and correction, if necessary. TAFB printed
out violations of the rules and corrected the errors. Some errors were
transpositions and easily corrected; others necessitated quick time studies
or verification of process sequence. Some data rejected by the tests were
actually correct. This phase took two months, but it overlapped the design
of the simulation and material-handling models. The simulation had to
be developed in some detail to calculate the space needed for sequen-
tial queues as parts moved from machine to machine and to capture the
material-handling sequence and routes as we played what-if games with
resources and work-load assignments. We also needed this detail to docu-
ment and solve bottleneck problems within the flow of a single part or for
a combination of parts.
Project Summary
The Air Force started the project in January 1985, approved the organizational
concept in February 1985, developed the industrial process code concept and
started data collection for the data base in late January, and created rough-
cut capacity plans and organizations in March–April 1985; it required the
data to meet material and scheduling lead times by June 15, 1985. We had to
complete all simulations by September 1985 to finalize the resource alloca-
tions and to allow for design lead times. The simulations were used to allo-
cate personnel, machines, and floor space to the various organizations.
Our analysis to aid the transition to the MRC layout included format-
ting the TIPS simulation program, designing the overhead conveyor sys-
tem, laying out the plant, and making a routing analysis of the inter-MRC
transfers. The TIPS program proved valuable in aiding the transition to
the new layout by estimating performance measures (for example, flow-
time and queue statistics) that helped determine the number of machines
of each type to place in an MRC. In one particular instance, the nozzle
MRC, the Production and Engineering Department called for 24 work sta-
tions of a particular type. The TIPS analysis indicated that between 11 and
13 work stations were needed. Based on the TIPS results, only 12 work
stations were installed and this has proved to be sufficient. At Tinker Air
Force Base, simulation proved to be a valuable tool in assessing the effec-
tiveness of the new plans and in determining the right parameters for
each new MRC.
The fire at TAFB was a disaster that turned out to be a great opportunity.
It would have been very hard to justify dislocating the entire facility for over
a year and expending the sum of money required to redesign the facility. In
the long run, the benefits of the new system may pay back the cost of the fire
with interest.
The University of Oklahoma design team responded to the emergency
with accomplishments we are very proud of. We developed and verified a
general simulation model in three months, when national consulting firms
estimated four times that long. The new system met design expectations,
which is rare. Part flow times have been reduced by 35 to 50 percent depend-
ing on size. Labor savings have been $1.8 million in 1987, $2.1 million in 1988,
and continue to rise. $4.3 million was saved in equipment purchases.
Space requirements were reduced by 30,000 square feet. The percent defec-
tive had dropped three percent in 1987 and five percent further in 1988. The
conveyor system has had no jam-ups due to overloading. Finally, other Air
Logistics Centers have adopted the TIPS concept to plan redesign of their
facilities and the report (Ravindran et al., 1986) has been distributed for
review by over 60 organizations at their request. We have proved that mod-
ern management science techniques can be applied quickly and with great
impact.
Appendix
The overall pallet factor is computed by

\[
\mathrm{OPF} = \frac{\sum_{i=1}^{M} W_i P_i}{\sum_{i=1}^{M} W_i}
\]

where
P_i = estimate of pallets/part given its MH code. For example, if WCD i has code
1A, P_i = 1/50 = 0.02 (see Table 1.2), i = 1, 2, …, M,
W_i = (N_i)(UPA_i), i = 1, 2, …, M,
N_i = number of WCDs of type i per year, i = 1, 2, …, M,
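A small worked example of the OPF calculation may help. The WCD counts, UPA values, and MH codes below are invented for illustration; the parts-per-pallet figures follow Table 1.2 (e.g., code 1A holds 50 parts per pallet, so P_i = 1/50).

```python
# Worked sketch of the overall pallet factor (OPF) with invented WCD data.
wcds = [
    # (N_i: WCDs per year, UPA_i: units per assembly, parts per pallet)
    (1000, 2, 50),   # e.g., MH code 1A -> P_i = 1/50
    (400,  1, 4),    # e.g., MH code 8B -> P_i = 1/4
]

num = den = 0.0
for n, upa, ppp in wcds:
    w = n * upa              # W_i = N_i * UPA_i
    num += w * (1.0 / ppp)   # accumulate W_i * P_i
    den += w                 # accumulate W_i

opf = num / den
print(round(opf, 3))  # -> 0.058 pallets per part, on average
```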
1.4 Conclusion
As can be seen in the case example of the Tinker Project presented in this
festschrift, Ravi Ravindran has made several multicriteria leadership con-
tributions to several people and several organizations. Under his unflinch-
ing leadership, several PhD and MSc students graduated on the basis of the
Tinker Project. Those individuals continue to make professional and intel-
lectual contributions internationally. The seminal publications that resulted
from Ravi’s leadership of the project continue to guide management science
and operations research practitioners around the world. My personal expo-
sure to Ravi’s academic leadership actually started before I went to work
for him at the University of Oklahoma. In 1982, I was offered admission
to Purdue University for my PhD studies. Ravi Ravindran was at Purdue
University at that time and he was assigned as my initial academic adviser,
as was the practice for all new incoming students at that time. Although I
opted to attend the UCF instead of Purdue University, for financial assistant-
ship package reasons, the written communications with Ravi played a key
role in my mind as I continued to dedicate myself to the challenges of doctoral
education. The Purdue admission letter, dated May 17, 1982 and signed by
Professor James W. Barany, associate head, introduced me to Professor Ravi
Ravindran, assistant head. Ravi engaged with me positively through a series
of written communications to encourage me to attend Purdue. It was fortu-
itous for me that Ravi later left Purdue University to become the department
head at the University of Oklahoma, where I ended up working for and with
him for several years until he left to go to the Pennsylvania State University
around 1997. Ravi’s mentoring and nurturing ways continue to influence my
professional activities even today. I am, thus, delighted to be able to contribute to this festschrift in his honor.
References
Badiru, A. B., B. L. Foote, L. Leemis, A. Ravindran, and L. Williams. 1993. Recovering from a crisis at Tinker Air Force Base. PM Network. 7(2): 10–23.
Dreyfus, S. E. 1969. An appraisal of some shortest-path algorithms. Operations Research. 17(3): 395–412.
Multicriteria Leadership and Decisions 19
CONTENTS
2.1 Introduction
2.2 MCDM Terminologies
2.3 Classification of Different MCDM Approaches
2.4 Multi-Objective Decision Making: Some Basic Concepts
2.4.1 Efficient, Non-Dominated, or Pareto Optimal Solution
2.4.2 Determining an Efficient Solution (Geoffrion, 1968)
2.4.3 Test for Efficiency
2.5 Multi-Attribute Decision-Making Methods: Some Common Characteristics
2.6 Overview of Some MCDM Methods
2.6.1 MADM Methods
2.6.1.1 Multi-Attribute Utility Theory
2.6.1.2 Multi-Attribute Value Theory
2.6.1.3 Simple Multi-Attribute Rating Technique
2.6.1.4 Analytic Hierarchy Process
2.6.1.5 ELECTRE Methods
2.6.1.6 PROMETHEE Methods
2.6.1.7 Fuzzy Set Theory
2.6.2 MODM Methods
2.6.2.1 Multi-Objective Linear Programming
2.6.2.2 Goal Programming (Ravindran et al., 2006)
2.6.2.3 Method of Global Criterion and Compromise Programming
2.6.2.4 Compromise Programming
2.6.2.5 Interactive Methods
2.6.2.6 Data Envelopment Analysis
2.6.3 Other MCDM Methods
2.7 A General Comparative Discussion of MCDM Methodologies
2.8 MCDM in the Era of Big Data
2.9 Summary
References
2.1 Introduction
Multi-criteria decision making (MCDM) is a subfield of operations research.
It is a special case of the so-called decision-making problems. A decision-
making problem is characterized by the need to choose one or a few from
among a number of alternatives. The person who is to choose the alter-
natives is normally called the decision maker (DM). His preferences will
have to be considered in choosing the right alternative(s). In MCDM, the
DM chooses his most preferred alternative(s) on the basis of two or more
criteria or attributes (Dyer et al., 1992). The terms criteria, attributes, and
objectives are closely related and will be discussed in more detail later in
this chapter.
The field of MCDM has been succinctly defined as making decisions in
the face of multiple conflicting objectives (Zionts, 1992, 2000). According to
Korhonen (1992), the ultimate purpose of MCDM is “to help a decision maker
to find the ‘most preferred’ solution for his/her decision problem.” Several
perspectives are available in the literature to characterize a good decision-
making process. Stewart (1992) suggests that “the aim of any MCDM tech-
nique is to provide help and guidance to decision maker in discovering his
or her most desired solution to the problem” (in the sense of the course of
action which best achieves the DM’s long-term goals). According to French
(1984), a good decision aid should help the DM explore not just the problem
but also himself. Keeney (1992), in his famous book on value-focused think-
ing, says that we should spend more of our decision-making time concen-
trating on what is important, and that we should evaluate more carefully
the desirability of the alternatives. He also mentions that we should articulate and understand our values, and use these values to select meaningful decisions to ponder and to create better alternatives. Howard
(1992) describes decision analysis as a “quality conversation about a decision
designed to lead to clarity of action.” Finally, Henig and Buchanan (1996)
say that a good decision process will force the DM to understand his or her
preferences and allow the set of alternatives to be expanded. Thus, a good
decision-making process should not only improve the clarity of the problem
to the DM, but it should also shed new light into the problem by generating
newer alternatives.
We are normally concerned with a single DM. If more than one DM is involved, it is important to aggregate their preferences, leading to a group decision-making (GDM) situation.
Starr and Zeleny (1977) provide a brief historical sketch of the early devel-
opments in MCDM. Some special issues of journals have been devoted to the
field of MCDM, including Management Science (Vol. 30, No. 1, 1984), Interfaces
(Vol. 22, No. 6, 1992) (devoted to decision and risk analysis), and Computers and
Operations Research (Vol. 19, No. 7, 1994). The Journal of Multi-Criteria Decision
Analysis, starting from the year 1992, publishes articles entirely devoted to the field.
Multi-Criteria Decision Making 23
Alternatives are normally compared with each other in terms of the so-
called criteria. Identification of the criteria for a particular problem is subjec-
tive, that is, varies for each problem. Criteria are normally developed in a
hierarchical fashion, starting from the broadest sense (usually called the goal
of the problem) and refined into more and more precise sub- and sub-sub
goals. There is no unique definition for the term “criterion,” but a useful gen-
eral definition is from Bouyssou (1990), who has defined criterion as a tool
allowing comparison of alternatives according to a particular significance
axis or point of view. Edwards (1977) calls criteria as the relevant dimensions of
24 Big Data Analytics Using Multiple Criteria Decision-Making Models
value for evaluation of alternatives. Henig and Buchanan (1996) consider criteria
to be the raison d’être of the DM. The term “criterion” is defined by Roy (1999)
as a tool constructed for evaluating and comparing potential actions accord-
ing to a well-defined point of view.
In general, some rules should be followed in identifying criteria for any
decision problem (Keeney and Raiffa, 1976; Saaty, 1980; von Winterfeldt and
Edwards, 1986). They have to be mutually exclusive or independent, collec-
tively exhaustive, and should have operational clarity of definition.
Criteria of a decision problem are usually very general, abstract, and often ambiguous, and it can be impossible to associate criteria directly with alternatives. Each criterion can normally be represented by a surrogate measure of performance, expressed in some measurable unit, called an attribute, of the consequences arising from implementation of any particular decision alternative. Thus, while warmth is a criterion, temperature measured in a suit-
alternative. Thus while warmth is a criterion, temperature measured in a suit-
able (say Celsius or Fahrenheit) scale is an attribute. Attributes are objective
and measurable features of the alternatives. Thus, the choice of attributes
reflects both the objectively measurable components of the alternatives and
the DM’s subjective criteria. Attributes of alternatives can be measured inde-
pendently from DM’s desires and expressed as mathematical functions of
the decision variables.
Objectives, used in mathematical programming problems, represent
directions of improvement of the attributes. A maximizing objective refers
to the case where “more is better,” while a minimizing objective refers to the
case where “less is better.” For example, profit is an attribute, while maxi-
mizing profit is an objective. The term “criterion” is a general term compris-
ing the concepts of attributes and objectives. It can represent either attribute
or objective depending on the nature of the problem. Perhaps, that is why
MCDM is considered to encompass two distinct fields, namely, multi-
attribute decision making (MADM) and multi-objective decision making
(MODM) (e.g., Triantaphyllou, 2013). These fields are discussed in the next
section.
FIGURE 2.1
Classification of MCDM methods.
max f1(x)
subject to: gj(x) ≤ 0, for j = 1, …, m   (2.1)

S is called the decision space and Y is called the criteria or objective space.
Methods for solving single-objective mathematical programming problems have been studied extensively for the past 40 years. However, almost every important real-world problem involves more than one objective.
A general MODM problem has the following form:

Max F(x) = [f1(x), f2(x), …, fk(x)]
subject to: gj(x) ≤ 0, for j = 1, …, m,

where x is an n-vector of decision variables and fi(x), i = 1, …, k are the k criteria/objective functions.
S is called the decision space and Y is called the criteria or objective space of the MODM problem. Without loss of generality, we can assume all the objective functions to be maximizing. Thus, the MODM problem is similar to an
TABLE 2.1
Hypothetical Options

           Cost ($)   Time (h)   Emissions (kg of Aggregated Pollutants)
Option A   50         16         10
Option B   40         20         12
Option C   60         16         12
Minimize f1(x) = Cost
Minimize f2(x) = Time
Minimize f3(x) = Emissions
Subject to some constraints
Let us assume that we have three options (Table 2.1) that satisfy the con-
straints, that is, the feasible options, identified somehow.
Note that option C fares no better than option A on any objective, and strictly worse on cost and emissions; hence it should not be considered anymore. Options A and B are incomparable, as neither is at least as good as the other in terms of all the objectives. While option A results in less time and lower emissions compared with option B, it is more expensive. Hence, options A and B are said to be non-dominated options while option C is called a dominated option.
In any MODM exercise, we are first interested in identifying the
non-dominated options. Note that there may be more than one non-domi-
nated option for any MODM problem.
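The dominance screening just described is mechanical enough to automate. A minimal sketch using the Table 2.1 data (all three objectives minimized):

```python
# Identify non-dominated (Pareto optimal) options among the Table 2.1
# alternatives; every objective -- cost, time, emissions -- is minimized.

options = {
    "A": (50, 16, 10),
    "B": (40, 20, 12),
    "C": (60, 16, 12),
}

def dominates(u, v):
    """u dominates v if it is at least as good on every objective
    and strictly better on at least one (minimization)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

non_dominated = sorted(
    name for name, u in options.items()
    if not any(dominates(v, u) for other, v in options.items() if other != name)
)
# A dominates C (equal time, lower cost and emissions), so only A and B remain.
```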
EXAMPLE 2.1
Consider the following bicriteria linear program (BCLP):
Max Z1 = 5x1 + x2
Max Z2 = x1 + 4 x2
Subject to: x1 ≤ 5
x2 ≤ 3
x1 + x2 ≤ 6
x1 , x2 ≥ 0
Solution
The decision space and the objective space are given in Figures 2.2 and 2.3,
respectively. Corner points C and D are efficient solutions, while cor-
ner points A, B, and E are dominated. The set of all efficient solutions is
given by the line segment CD in both figures.
Ideal solution is the vector of individual optima obtained by optimiz-
ing each objective function separately ignoring all other objectives. In
Example 2.1, the maximum value of Z1, ignoring Z2, is 26 and occurs at
point D. Similarly, maximum Z2 of 15 is obtained at point C. Thus, the
ideal solution is (26, 15) but is not feasible or achievable.
Note: One of the popular approaches to solving MODM problems is
to find an efficient solution that comes “as close as possible” to the ideal
solution. We will discuss this approach later in Section 2.6.2.4.
FIGURE 2.2
Decision space (Example 2.1). The feasible decision space has corner points A (0,0), B (0,3), C (3,3), D (5,1), and E (5,0); C is optimal for Z2 and D is optimal for Z1.
FIGURE 2.3
Objective space (Example 2.1). The achievable objective values have corner points A (0,0), B (3,12), C (18,15), D (26,9), and E (25,5); the ideal point is (26,15).
Max Z = Σ(i=1 to k) λi fi(x)

Subject to: x ∈ S   (2.3)

Σ(i=1 to k) λi = 1 and λi ≥ 0.

Theorem 2.1

Let λi > 0 for all i be specified. If x° is an optimal solution for the Pλ problem (Equation 2.3), then x° is an efficient solution to the MODM problem.
In Example 2.1, if we set λ1 = λ2 = 0.5 and solve the Pλ problem, the optimal
solution will be at D, which is an efficient solution.
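This can be verified numerically. Since a linear program attains its optimum at a corner point, the following sketch simply evaluates the weighted objective of Example 2.1 at the corner points read off Figure 2.2:

```python
# Weighted-sum (P-lambda) scalarization of Example 2.1 with
# lambda1 = lambda2 = 0.5, evaluated at the corner points of Figure 2.2.

corners = {"A": (0, 0), "B": (0, 3), "C": (3, 3), "D": (5, 1), "E": (5, 0)}

def z1(x1, x2):
    return 5 * x1 + x2

def z2(x1, x2):
    return x1 + 4 * x2

lam = (0.5, 0.5)
weighted = {name: lam[0] * z1(*pt) + lam[1] * z2(*pt) for name, pt in corners.items()}
best = max(weighted, key=weighted.get)   # "D": 0.5 * 26 + 0.5 * 9 = 17.5
```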
Warning: Theorem 2.1 is only a sufficient condition and is not necessary. For example, there could be efficient solutions to the MODM problem that cannot be obtained as optimal solutions to the Pλ problem for any λ > 0.
To test whether a given solution x̄ is efficient, solve:

Max W = Σ(i=1 to k) di

Subject to: fi(x) ≥ fi(x̄) + di for i = 1, 2, …, k
x ∈ S
di ≥ 0.
Theorem 2.2
Note: If Max W > 0, then at least one of the di’s is positive. This implies that at least one objective can be improved without sacrificing any of the other objectives.
* Note that DEA can only identify the non-dominated alternatives, and not the best one captur-
ing DM’s preferences.
U(A) = Σi wi ui(ai), where 0 ≤ ui(ai) ≤ 1, Σi wi = 1, wi ≥ 0,
where ui (ai) is the utility function describing preferences with respect to the
attribute i, ai represents the performance of the alternative A in terms of the
attribute i, wi are scaling factors which define acceptable trade-offs between
different attributes, and U(A) represents the overall utility function of the
alternative A when all the attributes are considered together. This form of
additive aggregation is valid if and only if (sometimes referred to as “iff” in
this book) the DM’s preferences satisfy the mutual preferential independence.
Suppose that there are a set of attributes, X. Let Y be a subset of X, and let Z
be its complement, that is, Z = X − Y. The subset Y is said to be preferentially
independent of Z, if preferences relating to the attributes contained in Y do
not depend on the level of attributes in Z.
The condition of utility independence is a stronger assumption. More details can be obtained from Keeney and Raiffa (1976). A brief discussion of MAUT is available in Kirkwood (1992).
The utility functions may also be used as objective functions for solving
mathematical programming problems.
V(A) = Σi wi vi(ai), where 0 ≤ vi(ai) ≤ 1, Σi wi = 1, wi ≥ 0.
Once the above two measures are available, the overall performance of an
alternative i can be aggregated using the simple weighted average,
Ui = Σj wj uij,
where Ui is the overall performance rating of alternative i, wj is the relative
importance of criterion j, and uij is the rating of the alternative i with respect
to the criterion j. The alternative that has the maximum Ui is the most pre-
ferred alternative to achieve the goal of the decision problem. The values of
Ui can be used to provide the overall rankings of the alternatives.
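A small numerical sketch of this weighted-average aggregation (the criteria, weights, and ratings below are invented for illustration, not taken from the text):

```python
# SMART-style aggregation: U_i = sum_j w_j * u_ij, with weights summing to 1
# and ratings u_ij scaled to [0, 1]. All data below are illustrative only.

weights = {"cost": 0.5, "quality": 0.3, "delivery": 0.2}
ratings = {                      # u_ij for each alternative i and criterion j
    "supplier1": {"cost": 0.9, "quality": 0.6, "delivery": 0.8},
    "supplier2": {"cost": 0.5, "quality": 0.9, "delivery": 0.7},
}

def overall(u):
    """Overall performance U_i of one alternative."""
    return sum(weights[j] * u[j] for j in weights)

scores = {i: overall(u) for i, u in ratings.items()}
best = max(scores, key=scores.get)   # the most preferred alternative
```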
MAUT, or its simplified versions MAVT or SMART, has been used for several practical applications. For example, MAUT or its variants have been used for planning a government research program (Edwards, 1977). Jones et al. (1990) have
applied MAVT for the study of UK energy policy. Keeney and McDaniels
(1999) have used this technique for identifying and structuring values for
integrated resource planning, while Keeney (1999) has used the technique
to create and organize a complete set of objectives for a large software orga-
nization. Duarte (2001) has used MAUT to identify appropriate technologi-
cal alternatives to implement to treat industrial solid residuals. Some more
MAUT applications are discussed by Bose et al. (1997).
Aw = λ_max w.
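The equation above says that the AHP weight vector w is the principal eigenvector of the pairwise comparison matrix A. It can be approximated by power iteration; a minimal sketch (the 3×3 matrix below is an invented, perfectly consistent example built from weights 0.6, 0.3, 0.1):

```python
# Approximate the principal eigenvector of a pairwise comparison matrix A
# by power iteration, normalized so the weights sum to 1.

def ahp_weights(A, iters=100):
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]
    return w

# Consistent matrix built from true weights (0.6, 0.3, 0.1): a_ij = w_i / w_j.
A = [[1,   2,   6],
     [0.5, 1,   3],
     [1/6, 1/3, 1]]
w = ahp_weights(A)   # approximately [0.6, 0.3, 0.1]
```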
• Rank reversal (Belton and Gear, 1983; Dyer, 1990): The ranking of
alternatives determined by the original AHP may be altered by the
addition of another alternative for consideration. For example, when
AHP is used for a technology selection problem, it is possible that the
rankings of the technologies get reversed when a new technology is
added to the list of technologies. One way to overcome this problem
is to include all possible technologies and criteria at the beginning
of the AHP exercise, and not to add or remove technologies while
or after completing the exercise. However, MAHP, the multiplica-
tive variant of AHP, does not suffer from this type of rank reversal
(Lootsma, 1999).
• Number of comparisons: AHP uses redundant judgments for checking
consistency, and this can exponentially increase the number of judg-
ments to be elicited from DMs. For example, to compare eight alterna-
tives on the basis of one criterion, a total of 28 judgments are needed. If
there are n criteria, then the total number of judgments for comparing
alternatives on the basis of all these criteria will be 28n. This is often a
tiresome and exerting exercise for the DM. Some methods have been
developed to reduce the number of judgments needed (Millet and
Harker, 1990). Also, some modifications, such as MAHP, can compute
weights even when all the judgments are not available.
For each ordered pair (a, b), a concordance index c(a, b) is calculated as

c(a, b) = (1/W) Σ{j: gj(a) ≥ gj(b)} wj,

where W = Σj wj.
Alternative a outranks alternative b if:
c(a, b) ≥ s, and
∀j such that gj(a) < gj(b), the interval (gj(a), gj(b)) is smaller than vj(gj(a)).
One can multiply all the weights by the same number, and if the new
weights are integers, the building of the outranking relations in ELECTRE I
can be interpreted as a voting procedure with a special majority rule (char-
acterized by the concordance level).
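A sketch of the concordance computation for a single ordered pair. The criterion scores and weights are invented; following the usual ELECTRE I convention, the concordant coalition is the set of criteria on which a does at least as well as b:

```python
# Concordance index c(a, b): the weighted share of criteria on which
# alternative a performs at least as well as b. Data are illustrative only.

weights = [3, 2, 1]                      # w_j; W = 6
g = {"a": [10, 5, 7], "b": [8, 9, 7]}    # performance g_j(.) per criterion

def concordance(x, y):
    W = sum(weights)
    return sum(w for w, gx, gy in zip(weights, g[x], g[y]) if gx >= gy) / W

c_ab = concordance("a", "b")   # criteria 1 and 3 favor a: (3 + 1) / 6
```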
ELECTRE methods have received many practical applications. Roy
et al. (1986) have used ELECTRE for determining which Paris metro sta-
tions should be renovated. An application of the ELECTRE TRI method
π(a, b) = (1/W) Σj wj Fj(a, b),

where W = Σj wj
The six forms of preference function Fj(a, b), each a function of the difference gj(a) − gj(b):
1st form: immediate strict preference; no parameter to be determined.
2nd form: there exists an indifference threshold q which must be fixed.
3rd form: preference increases up to a preference threshold p which should be determined.
4th form: there exist an indifference threshold q and a preference threshold p which must be fixed; between the two, preference is average (Fj = ½).
5th form: there exist an indifference threshold q and a preference threshold p which must be fixed; between the two, preference increases.
6th form: preference increases following a normal distribution, the standard deviation σj of which must be fixed.

FIGURE 2.4
Preference functions used in PROMETHEE methods. (From Vincke, P. 1999.)
Just like the ELECTRE methods, two complete preorders are built: one consists of ranking the actions following the decreasing order of the numbers ϕ+(a), the other following the increasing order of the numbers ϕ−(a).
where x is an n-vector of decision variables and fi(x), i = 1, … , k are the k criteria/
objective functions.
S is called the decision space and Y is called the criteria or objective space of
the MODM problem.
In MODM problems, there are often an infinite number of efficient solu-
tions and they are not comparable without the input from the DM. Hence,
it is generally assumed that the DM has a real-valued preference function
defined on the values of the objectives, but it is not known explicitly. With
this assumption, the primary objective of the MODM solution methods is to
find the best compromise solution, which is an efficient solution that maximizes
the DM’s preference function.
In the last three decades, most MCDM research has been concerned with developing solution methods based on different assumptions and approaches to measure or derive the DM’s preference function. Thus, the
MODM methods can be categorized by the basic assumptions made with
respect to the DM’s preference function as follows:
x1 + x2 + d1− − d1+ = 3
Note that, if d1− > 0, then x1 + x2 < 3, and if d1+ > 0, then x1 + x2 > 3.
By assigning suitable weights w1− and w1+ on d1− and d1+ in the objective
function, the model will try to achieve the sum x1 + x2 as close as possible
to 3. If the goal were to satisfy x1 + x2 ≥ 3, then only d1− is assigned a positive
weight in the objective, while the weight on d1+ is set to zero.
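The deviational-variable bookkeeping for the goal x1 + x2 = 3 can be sketched as follows (a plain evaluation of the deviations, not an optimization):

```python
# For a goal "x1 + x2 = 3", split the deviation from the target into
# underachievement d_minus and overachievement d_plus, both nonnegative:
# x1 + x2 + d_minus - d_plus = 3, with at most one of them positive.

def deviations(value, target):
    d_minus = max(target - value, 0.0)   # how far below the goal
    d_plus = max(value - target, 0.0)    # how far above the goal
    return d_minus, d_plus

d_minus, d_plus = deviations(2.0 + 0.5, 3.0)   # x1 = 2.0, x2 = 0.5
# d_minus = 0.5, d_plus = 0.0: the goal is underachieved by 0.5
```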
Minimize Z = Σi (wi+ di+ + wi− di−)   (2.4)
Minimize Z = Σp Pp Σi (wip+ di+ + wip− di−),   (2.8)

where Pp represents priority p with the assumption that Pp is much larger than Pp+1, and wip+ and wip− are the weights assigned to the ith deviational variables at priority p. In this manner, lower priority goals are considered only
after attaining the higher priority goals. Thus, preemptive GP is essentially
a sequence of single-objective optimization problems, in which successive
optimizations are carried out on the alternate optimal solutions of the previ-
ously optimized goals at higher priority.
In both preemptive and non-preemptive GP models, the DM has to spec-
ify the targets or goals for each objective. In addition, in the preemptive GP
models, the DM specifies a preemptive priority ranking on the goal achieve-
ments. In the non-preemptive case, the DM has to specify relative weights for
goal achievements.
To illustrate, consider the following BCLP:
Subject to: 4 x1 + 3 x2 ≤ 12
x1, x2 ≥ 0
3 ≤ f1 ≤ 4
0 ≤ f 2 ≤ 3.
Let the DM set the goals for f1 and f2 as 3.5 and 2, respectively. Then the
GP model becomes:
4 x1 + 3 x2 ≤ 12 (2.11)
Let the ideal values of the objectives f1, f2, … , fk be f1*, f2*, … , fk*. The method
of global criterion finds an efficient solution that is “closest” to the ideal solu-
tion in terms of the Lp distance metric. It also uses the ideal values to normal-
ize the objective functions. Thus the MODM problem reduces to:
Minimize Z = Σ(i=1 to k) [(fi* − fi)/fi*]^p

subject to: x ∈ S.
The values of fi* are obtained by maximizing each objective fi subject to the
constraints x ∈S, but ignoring the other objectives. The value of p can be 1, 2,
3, …, etc. Note that p = 1 implies equal importance to all deviations from the
ideal. As p increases, larger deviations have more weight.
where λi’s are weights that have to be specified or assessed subjectively. Note
that λi could be set to 1/(fi*).
Theorem 2.3
Any point x* that minimizes Lp (Equation 2.13) for λi > 0 for all i, Σλi = 1 and
1 ≤ p < ∞ is called a compromise solution. Zeleny (1982) has proved that these
compromise solutions are non-dominated. As p → ∞, Equation 2.13 becomes
Min L∞ = Min Max_i λi (fi* − fi)
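Applying the metric to Example 2.1 (ideal point (26, 15)), restricted to the corner points of the objective space for illustration, with equal weights λ1 = λ2 = 0.5:

```python
# Compromise distances L_p from the ideal objective vector (26, 15),
# evaluated at the corner points of Example 2.1's objective space.

ideal = (26, 15)
points = {"A": (0, 0), "B": (3, 12), "C": (18, 15), "D": (26, 9), "E": (25, 5)}
lam = (0.5, 0.5)

def L(p, z):
    """Weighted L_p distance of objective vector z from the ideal point."""
    devs = [l * (f_star - f) for l, f_star, f in zip(lam, ideal, z)]
    if p == float("inf"):
        return max(devs)
    return sum(d ** p for d in devs) ** (1.0 / p)

best_L1 = min(points, key=lambda k: L(1, points[k]))              # "D"
best_Linf = min(points, key=lambda k: L(float("inf"), points[k])) # "D"
```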
1. Binary pairwise comparison: The DM must compare a pair of two-dimensional vectors at each interaction.
2. Pairwise comparison: The DM must compare a pair of p-dimensional vectors and specify a preference.
3. Vector comparison: The DM must compare a set of p-dimensional vectors and specify the best, the worst, or the order of preference (note that this can be done by a series of pairwise comparisons).
4. Precise local trade-off ratio: The DM must specify precise values of local trade-off ratios at a given point. It is the marginal rate of substitution between objectives fi and fj; in other words, the trade-off ratio is how much the DM is willing to give up in objective j for a unit increase in objective i at a given efficient solution.
5. Interval trade-off ratio: The DM must specify an interval for each local trade-off ratio.
6. Comparative trade-off ratio: The DM must specify his preference for a given trade-off ratio.
7. Index specification and value trade-off: The DM must list the indices of objectives to be improved or sacrificed, and specify the amount.
8. Aspiration levels (or reference point): The DM must specify or adjust the values of the objectives which indicate his/her optimistic wish concerning the outcomes of the objectives.
Shin and Ravindran (1991) also provide a detailed survey of MODM inter-
active methods. Their survey includes the following:
Lofti et al. (1992) have compared the performance of interactive methods with
other methods such as AHP (implemented on the computer using the software
Expert Choice), and according to them, interactive methods outperformed the
latter on numerous measures, but for no measure was the reverse true.
separate section on DEA. One of the eight parts and three of the 21 papers
included in the book are on DEA. Incorporating preference information in
a DEA analysis provides a natural extension of DEA toward MCDM. This
issue has been discussed for a long time in DEA literature (Halme et al., 1999;
Thanassoulis and Dyson, 1992).
Since DEA was proposed more than two decades ago, the methodology
has received numerous traditional as well as novel applications. Seiford
(1996) has presented one of the recent bibliographies on DEA. A very com-
prehensive DEA bibliography is maintained at the University of Warwick
by Emrouznejad (1995–2001). This bibliography is available on the Internet
at the site, http://www.deazone.com/. Some prominent applications of DEA
include the education sector (Ganley and Cubbin, 1992; Ramanathan, 2001b),
banks (Yeh, 1996), comparative risk assessment (Ramanathan, 2001c), health
sector (Bates et al., 1996), transport (Ramanathan, 2000), energy (Lv et al.,
2015), environment (Bi et al., 2015; Fare et al., 1996; Zhao et al., 2016), and
international comparisons on carbon emissions (Ramanathan, 2002).
2.9 Summary
In this chapter, the basic concepts and terminologies of MCDM have been
reviewed. A classification of MCDM methods is outlined. The broad area of
MCDM can be divided into two general categories, MADM and MODM. The
former involves cases in which the set of alternatives is defined explicitly by
a finite list from which one or a few alternatives should be chosen that reflect
DM’s preference structure, while the latter involves cases in which the set of
alternatives is defined implicitly by a mathematical programming structure
with objective functions. Some of the basic concepts of MODM and some common characteristics of MADM methods have then been presented, followed by an overview of some important MCDM methods.
The methods covered in the overview include MAUT, AHP, the ELECTRE
methods, PROMETHEE methods, fuzzy set theory (all MADM methods),
MOLP, GP, AIM, compromise programming, and DEA (MODM methods).
The overview has provided the basic idea behind each method along with
important introductory and review articles, strengths and criticisms, and
some important applications. Other methods, such as TOPSIS, reference
point approach, PRIME, and MACBETH, have been briefly mentioned in
the overview. Finally, the different MCDM methods have been compared
in terms of several vital parameters, including their ability to handle uncer-
tainty, incomplete information, number of responses required from the DM,
ability to be used in group decisions, user friendliness, software and Internet
support, and the nature of their outputs. The authors hope that this over-
view provides enough information about the field of MCDM to kindle fur-
ther interest in this exciting field, and encourage more practical applications.
References
Alley, W. M. 1983. Comment on Multiobjective river basin planning with qualitative
criteria by M. Gershon, L. Duckstein, and R. McAniff. Water Resources Research.
19(1): 293–294.
Ammar, S. and R. Wright. 2000. Applying fuzzy-set theory to performance evalua-
tion. Socio-Economic Planning Sciences. 34: 285–302.
Aouni, B. and O. Kettani. 2001. Goal programming: A glorious history and a promis-
ing future. European Journal of Operational Research. 133: 225–241.
Arthur, J. L. and A. Ravindran. 1978. An efficient goal programming algorithm using
constraint partitioning and variable elimination. Management Science. 24(8):
867–868.
Arthur, J. L. and A. Ravindran. 1980a. PAGP: An efficient algorithm for linear goal pro-
gramming problems. ACM Transactions on Mathematical Software. 6(3): 378–386.
Arthur, J. L. and A. Ravindran. 1980b. A branch and bound algorithm with con-
straint partitioning for integer goal programs. European Journal of Operational
Research. 4: 421–425.
Arthur, J. L. and A. Ravindran. 1981. A multiple objective nurse scheduling model.
Institute of Industrial Engineers Transactions. 13: 55–60.
Ballestero, E. and C. Romero. 1998. Multiple Criteria Decision Making and its Applications
to Economic Problems. Kluwer Academic Publishers, Boston.
Bana e Costa, C. A. and J. C. Vansnick. 1995. General overview of the MACBETH
approach, In: P. M. Pardalos, Y. Siskos and C. Zopounidis (Eds.), 93–100.
Advances in Multicriteria Analysis. Kluwer Academic Publishers, Dordrecht, The
Netherlands.
Bates, J. M., D. B. Baines, and D. K. Whynes. 1996. Measuring the efficiency of pre-
scribing by general practitioners. Journal of the Operational Research Society. 47:
1443–1451.
Bellman, R. E. and L. A. Zadeh. 1970. Decision making in a fuzzy environment.
Management Science. 17: B141–B164.
Belton, V. 1999. Multi-criteria problem structuring and analysis in a value theory
framework, In: T. Gal, T. J. Stewart, and T. Hanne (Eds.), Multicriteria Decision
Making: Advances in MCDM Models, Algorithms, Theory, and Applications, Kluwer
Academic Publishers, Boston, pp. 12.1–12.32.
Belton, V. and T. Gear. 1983. On a shortcoming of Saaty’s method of analytic hierarchies. Omega. 11(3): 228–230.
Belton, V. and T. J. Stewart. 1999. DEA and MCDA: Competing or complementary
approaches? In: N. Meskens and M. Roubens (Eds.), 87–104. Advances in Decision
Analysis. Kluwer Academic Publishers, Dordrecht, Netherlands.
Bi, G., Y. Luo, J. Ding, and L. Liang. 2015. Environmental performance analysis of
Chinese industry from a slacks-based perspective. Annals of Operations Research.
228(1): 65–80.
Bisdorff, R. 2000. Logical foundation of fuzzy preferential systems with application to
the electre decision aid methods. Computers & Operations Research. 27: 673–687.
Bose, U., A. M. Davey, and L. O. David. 1997. Multi-attribute utility methods in group
decision making: Past applications and potential for inclusion in GDSS. Omega.
25(6): 691–706.
Bouyssou, D. 1990. Building criteria: A prerequisite for MCDA, In: C. A. Bana e Costa
(Ed.), 58–80. Readings in Multiple Criteria Decision Aid. Springer, Berlin.
Bouyssou, D. and Ph. Vincke. 1997. Ranking alternatives on the basis of preference
relations: A progress report with special emphasis on outranking relations.
Journal of Multi-Criteria Decision Analysis. 6: 77–85.
Brans, J. P., B. Mareschal, and Ph. Vincke. 1984. Promethee: A new family of outrank-
ing methods in multicriteria analysis, In: J. P. Brans (Ed.), 408–421. Operational
Research ’84. Elsevier Science Publishers B.V, North-Holland.
Buede, D. M. 1992. Software review: Overview of the MCDA software market. Journal
of Multi-Criteria Decision Analysis. 1: 59–61.
Charnes, A. and W. W. Cooper. 1961. Management Models and Industrial Applications of
Linear Programming. Wiley, New York.
Charnes, A. W. and W. Cooper, and R. O. Ferguson. 1955. Optimal estimation of
executive compensation by linear programming. Management Science. 1(2):
138–151.
Charnes, A., W. W. Cooper, and E. Rhodes. 1978. Measuring the efficiency of decision
making units. European Journal of Operational Research. 2: 429–444.
Chen, H., R. H. Chiang, and V. C. Storey. 2012. Business intelligence and analytics:
From big data to big impact. MIS Quarterly. 36(4): 1165–1188.
Doyle, J. and R. Green. 1993. Data envelopment analysis and multiple criteria deci-
sion making. Omega. 21(6): 713–715.
Duarte, B. P. M. 2001. The expected utility theory applied to an industrial decision
problem—What technological alternative to implement to treat industrial solid
residuals. Computers & Operations Research. 28: 357–380.
Dyer, J. S. 1990. Remarks on the analytic hierarchy process. Management Science. 36/3:
249–258.
Dyer, J. S., P. C. Fishburn, R. E. Steuer, J. Wallenius, and S. Zionts. 1992. Multiple
criteria decision making, multiattribute utility theory: The next ten years.
Management Science. 38(5): 645–654.
Edwards, W. 1977. How to use multiattribute utility measurement for social decision
making. IEEE Transactions on Systems, Man, and Cybernetics. 7/5: 326–340.
Elkarni, F. and I. Mustafa. 1993. Increasing the utilization of solar energy
technologies (SET) in Jordan: Analytic hierarchy process. Energy Policy. 21:
978–984.
Emrouznejad, A. 1995–2001. Ali Emrouznejad’s DEA HomePage. Warwick Business
School, Coventry CV4 7AL, UK.
Evans, G. W. 1984. An overview of techniques for solving multiobjective mathemati-
cal programs. Management Science. 30(11): 1268–1282.
Fare, R., S. Grosskopf, and D. Tyteca. 1996. An activity analysis model of the environ-
mental performance of firms—application to fossil-fuel-fired electric utilities.
Ecological Economics. 18: 161–175.
French, S. 1984. Interactive multi-objective programming: Its aims, applications and
demands. Journal of the Operational Research Society. 35: 827–834.
French, S., R. Hartley, L. C. Thomas, and D. J. White. 1983. Multi-Objective Decision
Making. Academic Press, London.
Ganley, J. A. and J. S. Cubbin. 1992. Public Sector Efficiency Measurement: Applications
of Data Envelopment Analysis. North-Holland, Amsterdam.
Geoffrion, A. 1968. Proper efficiency and theory of vector maximum. Journal of
Mathematical Analysis and Applications. 22: 618–630.
Multi-Criteria Decision Making 61
Gershon, M. and L. Duckstein. 1983. Reply. Water Resources Research. 19(1): 295–296.
Golden, B. L., E. A. Wasil, and D. E. Levy. 1989. Applications of the analytic hierarchy
process: A categorized, annotated bibliography, In: B. L. Golden, E. A. Wasil,
and P. T. Harker (Eds.), 37–58. The Analytic Hierarchy Process: Applications and
Studies. Springer-Verlag, Berlin.
Govindan, K., S. Rajendran, J. Sarkis, and P. Murugesan. 2015. Multi criteria decision
making approaches for green supplier evaluation and selection: A literature
review. Journal of Cleaner Production. 98: 66–83.
Graves, S. B., J. L. Ringuest, and J. F. Bard. 1992. Recent developments in screening
methods for nondominated solutions in multiobjective optimization. Computers
& Operations Research. 19(7): 683–694.
Guitouni, A. and J.-M. Martel. 1998. Tentative guidelines to help choosing an appro-
priate MCDA method. European Journal of Operational Research. 109: 501–521.
Halme, M., T. Joro, P. Korhonen, S. Salo, and J. Wallenius. 1999. A value efficiency
approach to incorporating preference information in data envelopment analy-
sis. Management Science. 45: 103–115.
Hanne, T. 1999. Meta decision problems in multiple criteria decision making, In:
T. Gal, T. J. Stewart, and T. Hanne (Eds.), Multicriteria Decision Making: Advances
in MCDM Models, Algorithms, Theory, and Applications, Kluwer Academic
Publishers, Boston, pp. 6.1–6.25.
Henig, M. I. and J. T. Buchanan. 1996. Solving MCDM problems: Process concepts.
Journal of Multi-Criteria Decision Analysis. 5: 3–21.
Ho, W., X. Xu, and P. K. Dey. 2010. Multi-criteria decision making approaches for
supplier evaluation and selection: A literature review. European Journal of
Operational Research. 202(1): 16–24.
Hokkanen, J. and P. Salminen. 1997. ELECTRE III and IV decision aids in an environ-
mental problem. Journal of Multi-Criteria Decision Analysis. 6: 215–226.
Howard, R. 1992. Heathens, heretics and cults: The religious spectrum of decision
aiding. Interfaces. 22(6): 15–27.
Hwang, C. L. and A. Masud. 1979. Multiple Objective Decision Making—Methods and
Applications: A State of the Art Survey. Lecture notes in economics and math-
ematical systems. 164, Springer-Verlag, Berlin.
Jacquet-Lagrèze, E. and J. Siskos. 1982. Assessing a set of additive utility functions for
multicriteria decision making: The UTA method. European Journal of Operational
Research. 10: 151–164.
Jacquet-Lagrèze, E. and Y. Siskos. 2001. Preference disaggregation: 20 years of MCDA
experience. European Journal of Operational Research. 130: 233–245.
Jato-Espino, D., E. Castillo-Lopez, J. Rodriguez-Hernandez, and J. C. Canteras-
Jordana. 2014. A review of application of multi-criteria decision making meth-
ods in construction. Automation in Construction. 45: 151–162.
Jones, M., C. Hope, and R. Hughes. 1990. A multi-attribute value model for the
study of UK energy policy. Journal of the Operational Research Society. 41(10):
919–929.
Joro, T., P. Korhonen, and J. Wallenius. 1998. Structural comparison of data envelop-
ment analysis and multiple objective linear programming. Management Science.
40: 962–970.
Kabir, G., R. Sadiq, and S. Tesfamariam. 2014. A review of multi-criteria decision-
making methods for infrastructure management. Structure and Infrastructure
Engineering. 10(9): 1176–1210.
62 Big Data Analytics Using Multiple Criteria Decision-Making Models
Karwan, M. H., J. Spronk, and J. Wallenius. (Eds.). 1997. Essays in Decision Making: A
Volume in Honour of Stanley Zionts. Springer-Verlag, Berlin.
Keeney, R. L. 1992. Value Focused Thinking: A Path to Creative Decision-making. Harvard
University Press, Cambridge, MA.
Keeney, R. L. 1999. Developing a foundation for strategy at Seagate software.
Interfaces. 29/6: 4–15.
Keeney, R. L. and T. L. McDaniels. 1999. Identifying and structuring values to guide
integrated resource planning at bc gas. Operations Research. 47/5: 651–662.
Keeney, R. L. and H. Raiffa. 1976. Decisions with Multiple Objectives: Preferences and
Value Tradeoffs. Wiley, New York. Another edition available in 1993 from
Cambridge University Press.
Kirkwood, C. W. 1992. An overview of methods for applied decision analysis.
Interfaces. 22(6): 28–39.
Korhonen, P. 1992. Multiple criteria decision support: The state of research and future
directions. Computers & Operations Research. 19(7): 549–551.
Koundinya, S., D. Chattopadhyay, and R. Ramanathan. 1995. Combining qualitative
objectives in integrated resource planning: A combined AHP—Compromise
programming model. Energy Sources. 17(5): 565–581.
Kuriger, G. and A. Ravindran. 2005. Intelligent search methods for nonlinear goal
programs. Information Systems and Operational Research. 43: 79–92.
Larichev, O. I. 1999. Normative and descriptive aspects of decision making, In: T.
Gal, T. J. Stewart, and T. Hanne (Eds.), Multicriteria Decision Making: Advances
in MCDM Models, Algorithms, Theory, and Applications, Kluwer Academic
Publishers, Boston, pp. 5.1–5.24.
Larichev, O. I. 2000. Qualitative comparison of multicriteria alternatives, In: Yong,
S. and M. Zeleny (Eds.), New Frontiers of Decision Making for the Information
Technology Era, World Scientific Publishing Co. Pte. Ltd., Singapore, pp. 207–224.
Larichev, O. I. and H. Moshkovich. 1995. ZAPROS-LM—A method and system for
ordering multiattribute alternatives. European Journal of Operational Research.
82(3): 503–521.
Lee, H., Y. Shi, S. M. Nazem, S. Y. Kang, T. H. Park, and M. H. Sohn. 2001. Multicriteria
hub decision making for rural area telecommunication networks. European
Journal of Operational Research. 133: 483–495.
Lee, S. M. and D. L. Olson. 1999. Goal programming, In: T. Gal, T. J. Stewart, and
T. Hanne (Eds.), Multicriteria Decision Making: Advances in MCDM Models,
Algorithms, Theory, and Applications, Kluwer Academic Publishers, Boston,
pp. 8.1–8.33.
Leung, P. S., K. Heen, and H. Bardarson. 2001. Regional economic impacts of fish
resources utilization from the Barents Sea: Trade-offs between economic
rent, employment, and income. European Journal of Operational Research. 133:
432–446.
Lofti, V., T. J. Stewart, and S. Zionts. 1992. An aspiration-level interactive model
for multiple criteria decision making. Computers & Operations Research. 19(7):
671–681.
Lootsma, F. A. 1999. Multi-Criteria Decision Analysis Via Ratio and Difference Judgement.
Dordrecht: Kluwer Academic Publishers.
Lv, W., X. Hong, and K. Fang. 2015. Chinese regional energy efficiency change and its
determinants analysis: Malmquist index and Tobit model. Annals of Operations
Research. 228(1): 9–22.
Multi-Criteria Decision Making 63
Soltani, A., Hewage, K., Reza, B., and Sadiq, R. 2015. Multiple stakeholders in multi-
criteria decision-making in the context of municipal solid waste management:
A review. Waste Management. 35: 318–328.
Starr, M. K. and M. Zeleny. 1977. MCDM—State and future of the arts, In: M. K.
Starr and M. Zeleny (Eds.), 5–29. Multiple Criteria Decision Making: Studies
in the Management Sciences, Vol. 6. North-Holland Publishing Company,
Amsterdam.
Steuer, R. E. 1986. Multiple Criteria Optimization: Theory, Computation and Application.
John Wiley & Sons, New York.
Steuer, R. E., L. R. Gardiner, and J. Gray. 1996. A bibliographic survey of the activities
and international nature of multiple criteria decision making. Journal of Multi-
Criteria Decision Analysis. 5: 195–217.
Stewart, T. J. 1992. A critical survey on the status of multiple criteria decision mak-
ing theory and practice. Omega—The International Journal of Management Science.
20(5/6): 569–586.
Stewart, T. J. 1994. Data envelopment analysis and multiple criteria decision making:
A response. Omega. 22(2): 205–206.
Tamiz, M., D. F. Jones, and E. Eldarzi. 1995. A review of goal programming and its
applications. Annals of Operations Research. 58: 39–53.
Thanassoulis, E. and R. G. Dyson. 1992. Estimating preferred target input-output
levels using data envelopment analysis. European Journal of Operational Research.
56: 80–97.
Triantaphyllou, E. 2013. Multi-Criteria Decision Making Methods: A Comparative Study,
Vol. 44. Springer Science & Business Media, Dordrecht, Netherlands.
Urli, B. and R. Nadeau. 1999. Evolution of multi-criteria analysis: A scientometric
analysis. Journal of Multi-Criteria Decision Analysis. 8: 31–43.
Vansnick, J. C. 1986. On the problems of weights in MCDM: The noncompensatory
approach. European Journal of Operational Research. 24: 288–294.
Vargas, L. G. 1990. An overview of the analytic hierarchy process and its applica-
tions. European Journal of Operational Research. 48: 2–8.
Vincke, P. 1999. Outranking approach, In: T. Gal, T. J. Stewart, and T. Hanne (Eds.),
Multicriteria Decision Making: Advances in MCDM Models, Algorithms, Theory, and
Applications, Kluwer Academic Publishers, Boston, pp. 11.1–11.29.
von Winterfeldt, D. and Edwards, W. 1986. Decision Analysis and Behavioral Research.
Cambridge University Press, Cambridge.
Wang, J. J., Y. Y. Jing, C. F. Zhang, and J. H. Zhao. 2009. Review on multi-criteria
decision analysis aid in sustainable energy decision-making. Renewable and
Sustainable Energy Reviews. 13(9): 2263–2278.
White, D. J. 1990. A bibliography on the applications of mathematical programming
multiple-objective methods. Journal of the Operational Research Society. 41(8):
669–691.
Wierzbicki, A. P. 1998. Reference Point Methods in Vector Optimization and Decision
Support. Interim Report IR-98-017, International Institute for Applied Systems
Analysis, Laxenburg, Austria.
Wierzbicki, A. P. 1999. Reference point approaches, In: T. Gal, T. J. Stewart, and
T. Hanne (Eds.), Multicriteria Decision Making: Advances in MCDM Models,
Algorithms, Theory, and Applications, Kluwer Academic Publishers, Boston,
pp. 9.1–9.39.
66 Big Data Analytics Using Multiple Criteria Decision-Making Models
Yeh, Q. 1996. The application of data envelopment analysis in conjunction with finan-
cial ratios for bank performance evaluation. Journal of the Operational Research
Society. 47: 980–988.
Zadeh, L. 1965. Fuzzy sets. Information and Control. 8: 338–353.
Zanakis, S. H., A. Solomon, N. Wishart, and S. Dublish. 1998. Multi-attribute deci-
sion making: A simulation comparison of select methods. European Journal of
Operational Research. 107: 507–529.
Zeleny, M. 1980. The pros and cons of goal programming. Computers & Operations
Research. 8: 357–359.
Zeleny, M. 1982. Multiple Criteria Decision Making (McGraw-Hill Series In Quantitative
Methods For Management). McGraw-Hill Book Co., New York.
Zhao, L., Y. Zha, K. Wei, and L. Liang. 2016. A target-based method for energy saving
and carbon emissions reduction in China based on environmental data envel-
opment analysis. Annals of Operations Research. (in press).
Zimmermann, H. J. 1983. Using fuzzy sets in operational research. European Journal
of Operational Research. 13: 201–216.
Zionts, S. 1992. Some thoughts on research in multiple criteria decision making.
Computers & Operations Research. 19(7): 567–570.
Zionts, S. 2000. Some thoughts about multiple criteria decision making for ordinary
decisions, In: Yong, S. and M. Zeleny (Eds.), New Frontiers of Decision Making
for the Information Technology Era, World Scientific Publishing Co. Pte. Ltd.,
Singapore, pp. 17–28.
Zoupounidis, C. and A. I. Dimitras. 1998. Multicriteria Decision Aid Methods for the
Prediction of Business Failure. Kluwer Academic Publishers, Boston.
3
Basics of Analytics and Big Data
CONTENTS
3.1 Introduction................................................................................................... 67
3.2 Analytics........................................................................................................ 69
3.2.1 Descriptive Analytics....................................................................... 71
3.2.2 Predictive Analytics......................................................................... 72
3.2.3 Prescriptive Analytics...................................................................... 73
3.3 Big Data: Volume, Variety, Velocity, and Veracity.................................... 74
3.4 Limitations of Traditional Technologies for Big Data.............................. 75
3.5 Analytics Life Cycle...................................................................................... 76
3.5.1 Data Capture, Store, Prepare, Analyze, and Share...................... 76
3.5.2 NoSQL Systems................................................................... 78
3.5.3 Data Store: Distributed File Systems.............................................. 79
3.6 Prepare and Analyze in Big Data...............................................................80
3.6.1 MapReduce Paradigm......................................................................80
3.6.2 Hadoop Ecosystem...........................................................................80
3.7 Big Data Analytics, Multicriteria Decision Making, and BI................... 82
3.8 Conclusions....................................................................................................84
Bibliography...........................................................................................................84
3.1 Introduction
Business analytics (BA) and big data have become essential components
that every organization should possess to compete effectively in the market.
Hopkins et al. (2010) claimed that analytics sophistication is one of the primary
differentiators between high-performing and low-performing organizations.
BA is a set of statistical, mathematical, and machine-learning tools and
processes used to analyze past data and uncover hidden trends, which can
assist in problem solving and/or drive fact-based decision making in an
organization. In the 1980s, many organizations either did not collect data or
did not hold it in a form from which insights could be derived. Organizations
at that point in time found decision making and/or problem solving arduous
due to the nonavailability of data; with the advent of enterprise resource planning
(ERP) systems, most organizations have since ensured the availability of data,
which can be called upon whenever needed. However, for effective and efficient
problem solving and decision making, the data stored within the ERP systems
needed to be analyzed, and this gave birth to the use of analytics.
Analytics can be grouped into three categories: descriptive analytics, pre-
dictive analytics, and prescriptive analytics. Descriptive analytics deals with
describing past data using descriptive statistics and data visualization; useful
insights may be derived using descriptive analytics. Predictive analytics aims
to predict future events such as demand for a product/service, customer churn,
and loan default. Prescriptive analytics, on the other hand, provides an optimal
solution to a given problem or identifies the best among several alternatives.
In other words, descriptive analytics captures what happened, predictive
analytics predicts what is likely to happen, and prescriptive analytics
prescribes the best course of action. Although all three components of
analytics are important, their value-add and usage differ, as shown in
Figure 3.1. For all the hype around analytics, the vast majority of
organizations use only descriptive analytics, in the form of business
intelligence (BI). A significantly smaller group of organizations use predictive
analytics, mainly for forecasting, and the number of organizations using
prescriptive analytics is minimal at this point in time in comparison with
descriptive and predictive analytics. However, it is interesting to note that
the value-add to a company increases manyfold if organizations use predictive
and prescriptive analytics conjointly rather than descriptive analytics alone.
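The contrast among the three categories can be illustrated with a small sketch in plain Python; the demand numbers, the linear-trend forecast, and the profit model are hypothetical illustrations, not taken from the text:

```python
# Descriptive, predictive, and prescriptive analytics on toy monthly
# demand data for a single product (all numbers are made up).

demand = [100, 110, 125, 130, 150, 160]  # past six months (hypothetical)

# Descriptive: summarize what happened.
mean_demand = sum(demand) / len(demand)

# Predictive: fit a least-squares trend line and forecast month 7.
n = len(demand)
x_bar, y_bar = (n - 1) / 2, mean_demand
slope = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(demand)) / sum(
    (x - x_bar) ** 2 for x in range(n)
)
forecast = y_bar + slope * (n - x_bar)  # trend value at the next period

# Prescriptive: choose the stocking level that maximizes profit against the
# forecast, assuming unit profit 5 on sold items and unit cost 2 on leftovers.
def profit(stock, sales):
    sold = min(stock, sales)
    return 5 * sold - 2 * max(stock - sales, 0)

best_stock = max(range(100, 201, 10), key=lambda s: profit(s, round(forecast)))
```

The same data thus answers three different questions: what happened (the mean), what is likely to happen (the forecast), and what to do about it (the stocking decision).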
Today, with the ever-growing use of the Internet, social media platforms,
smartphones, and Internet of things (IOTs), the amount of data that gets
FIGURE 3.1
Types of analytics solutions and their value-add. (The figure plots analytics
type, from descriptive to prescriptive, against value-add to the organization.)
generated every day has increased several thousandfold over the past few
years. An estimate by www.vcloudnews.com claims that 2.5 exabytes of data
are generated every day and that this figure will increase exponentially in the
future; these data offer a great advantage to enterprises and organizations
around the world that seek to leverage them for diverse insights.
Enterprises and organizations today can better understand their dynamic
business environments and their customers' behavior and preferences, predict
market trends and weather more accurately, and thereby optimize resources at
granular levels to increase efficiency to an extent they never believed was
possible earlier.
One noteworthy case study is Google Flu Trends, which predicted flu
outbreaks in real time even before public health organizations such as the
Centers for Disease Control and Prevention (CDC) knew of them. Google built a
system that extracted search terms and their search frequencies, by region and
time, from the several billion historical search requests it had received over
a few years, and correlated them with the actual incidents reported by the
CDC; after correlating millions of search terms with reported incidents, it
found about 45 search terms that were highly correlated. Since 2009, Google
has used the frequencies of those very search terms in real time to predict
flu trends.
Unlike in the past, enterprises today do not use only transactional data to
gain insights into their business and customers; they also use other sources
such as weblogs or clickstreams, social feeds, and emails to develop a deeper
understanding of customers. For example, e-commerce companies (which have
mushroomed in recent times) do not wait until a new customer makes a purchase
to understand his or her preferences; they learn them from browsing patterns
(i.e., from the different links that the customer may have visited or spent
time on). From this example, one can gauge the data size generated every day
by these e-commerce sites. To put numbers on it, consider Amazon, which has
about 188 million visitors (Anon, 2016) to its site every month; it stores
information about every single link they click, their wish lists, and their
purchases. With this humongous amount of data, it is a real challenge for
e-commerce companies to capture, store, and finally analyze the data in order
to gain insights into their customers, understand their preferences, and
thereby make purchase recommendations.
In this chapter, we discuss in detail various aspects of analytics and big
data, with a few examples of real-life applications. Finally, we end with an
example of how analytics is used in multicriteria decision making.
3.2 Analytics
The primary objective of analytics is to enable informed decision making and
to solve business problems. Organizations would like to understand the
association between key performance indicators (KPIs) and the factors that
significantly affect those KPIs for effective management. Knowledge of the
relationship between KPIs and factors then provides the decision maker with
appropriate actionable items. Analytics is thus a knowledge repository
consisting of statistical and mathematical tools, machine-learning algorithms,
data management processes such as data extraction, transformation, and loading
(ETL), and computing technologies such as Hadoop that create value by deriving
actionable items from data. Davenport and Harris (2009) reported a high
correlation between the use of analytics and business performance: a majority
of high performers (measured in terms of profit, shareholder return, revenue,
etc.) apply analytics strategically in their daily operations, as compared to
low performers.
The theory of bounded rationality proposed by Herbert Simon (1972) is
becoming very evident in the current context of managing organizations and
competing in the market. The increasing complexity of business problems, the
existence of several alternative solutions, and the paucity of time available
for decision making demand a highly structured decision-making process that
uses past data for the effective management of organizations. There are
several reasons for the existence of bounded rationality, such as uncertainty,
incomplete information about alternatives, and lack of knowledge about
cause-and-effect relationships between parameters of importance. Although
decisions are occasionally made using the "highest paid person's opinion"
(HiPPO) algorithm, especially in group decision-making scenarios, or by
Flipism (making every decision by flipping a coin), there has been a
significant shift toward data-driven decision making in several companies.
Many companies use analytics as a competitive strategy, and many more are
likely to do so in the near future. A typical data-driven decision-making
process uses the steps shown in Figure 3.2.
FIGURE 3.2
Data-driven decision-making flow diagram. The stages are:
Stage 1: Identify problems or improvement opportunities.
Stage 2: Identify the sources of data required for the problem identified in Stage 1.
Stage 3: Process the data for missing or incorrect values and prepare it for analytics model building.
Stage 4: Build the best analytical model and validate it.
Stage 5: Communicate the analytics output.
Stage 6: Implement the solution/decision.
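Stage 3 of the flow above can be sketched minimally in Python; the records, the drop rule, and mean imputation are hypothetical illustrations of "processing for missing/incorrect data," not a prescribed method:

```python
# Minimal Stage-3 sketch: drop clearly incorrect records, impute missing
# numeric fields with the mean of observed values (all data are made up).

raw_records = [
    {"customer": "A", "age": 34, "spend": 120.0},
    {"customer": "B", "age": None, "spend": 80.0},   # missing age
    {"customer": "C", "age": -5, "spend": 60.0},     # incorrect age
    {"customer": "D", "age": 41, "spend": None},     # missing spend
]

# Rule 1: drop records with clearly incorrect values (negative ages).
records = [r for r in raw_records if r["age"] is None or r["age"] > 0]

# Rule 2: impute missing numeric fields with the mean of observed values.
def impute(records, field):
    observed = [r[field] for r in records if r[field] is not None]
    mean = sum(observed) / len(observed)
    for r in records:
        if r[field] is None:
            r[field] = mean
    return records

records = impute(impute(records, "age"), "spend")
```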
1. Most shoppers turn toward their right when they enter a retail store.
2. Men who kiss their wife before going to work earn more and live
longer than those who do not.
* Source: https://gramener.com/
† Source: http://articles.timesofindia.indiatimes.com/2014-01-30/mad-mad-world/46827501_
1_free-meals-single-ticket-issued-ticket
and its ability to store and process the data. There is a need for an
alternative solution or platform that can scale to accommodate and process the
exponentially increasing volume of data.
Variety in big data refers to the different types of data; not all the data
captured nowadays can be arranged into rows and columns, or in other words,
not all data are structured. For example, data such as free text, images, and
machine-generated data cannot, in their original form, be arranged into
traditional rows and columns. Such unstructured data can take many forms,
including XML, JSON, free text, images, audio, and video, and these data need
to be analyzed for insights; for example, the sentiments customers express
about a product or service are very important for improving that product or
service.
Velocity is the rate of data growth. We saw a little earlier that data is
growing exponentially, and thus it is imperative to analyze data faster,
almost in real time. As data becomes dated, its value diminishes, hampering
organizations' ability to study and analyze the data that would have enabled
them to improve service delivery and decision making. Veracity refers to the
quality and reliability of the data. Though we can leverage big data for new
insights, the real challenge lies in how to capture, store, and process this
ever-increasing volume of data. Can the traditional platforms cope, or do we
need a fresh approach to deal with big data?
• Data capture
• Data store
• Prepare
• Analyze
• Share
output is shared for visualization or fed to another system for usage. There
could be a big data problem in any one or more of the above stages. We will
discuss each stage in detail in later sections. The analytics life cycle is shown
in Figure 3.3.
Data from multiple sources arrives through various channels, such as files,
web services, message queues, plain Transmission Control Protocol (TCP)
sockets, or Rich Site Summary (RSS) feeds. Data also flows at different rates:
a few dumps every hour, or continuously streaming messages every second, as
with social feeds, transactional records, and messages from devices or online
transaction processing (OLTP) systems. If the data inflow rate is very high,
the systems that capture these data are likely to run out of space and may
eventually collapse; thus, newer systems need to be designed and deployed as a
queue or buffer between the source systems and the data capture systems.
Finally, the data capture systems need to be scaled out to support high-volume
writes. The overall architecture of data capture in a typical big data
environment is shown in Figure 3.4.
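The buffer-or-queue idea can be sketched with Python's standard library; the bounded in-process queue below merely stands in for a distributed queuing system such as Kafka, and all names and sizes are illustrative assumptions:

```python
# A bounded queue absorbs bursts from a fast source so a slower capture
# system is not overwhelmed (a single-process stand-in for a real queue).
import queue
import threading

buffer = queue.Queue(maxsize=100)  # bounded: producers block when full
captured = []

def source(n_messages):
    for i in range(n_messages):
        buffer.put({"id": i, "payload": "event"})  # blocks if buffer is full
    buffer.put(None)  # sentinel: no more data

def capture_system():
    while True:
        msg = buffer.get()
        if msg is None:
            break
        captured.append(msg)  # in practice: write to a scaled-out store

producer = threading.Thread(target=source, args=(500,))
consumer = threading.Thread(target=capture_system)
producer.start(); consumer.start()
producer.join(); consumer.join()
```

The bounded `maxsize` is the design point: when the consumer falls behind, producers block instead of exhausting memory, which is the role the intermediate layer in Figure 3.4 plays at scale.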
In summary, data capture poses three challenges:
FIGURE 3.3
Analytics life cycle.

FIGURE 3.4
Data capture in a big data environment. (Sources such as transactions and logs
feed an intermediate layer that integrates all sources, provides a buffer or
queue, and transforms or enriches data before writing to large-scale
distributed systems optimized for high volumes of reads and writes.)
There are open-source tools such as Apache Flume, which acts as an
intermediary that listens to multiple sources and aggregates, transforms or
enriches, and writes data to the systems capturing it. Tools such as Apache
Kafka work as a queuing system between sources and destinations with different
rates of data arrival and capture, and can scale out to support millions of
messages flowing through the system per second. Most Not-only-SQL (NoSQL)
systems are designed to read and write at very high volume (millions of reads
and writes per second) by scaling out across systems. If a table is very large
and cannot be stored on a single machine, a NoSQL system splits the table into
multiple shards and distributes these shards across several servers. Each
server then manages all reads and writes for its set of shards. By splitting
tables appropriately into several hundred shards, we can load-balance high
volumes of incoming reads and writes across different servers and hence
support large-scale data ingestion.
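The sharding scheme described above can be sketched as follows; the hash function, shard count, and server names are illustrative assumptions, not the design of any specific NoSQL product:

```python
# Key-based sharding: a stable hash of the record key picks a shard,
# and each shard is owned by one server that handles its reads/writes.
import hashlib

N_SHARDS = 8
SERVERS = ["server-0", "server-1", "server-2"]

def shard_of(key: str) -> int:
    # Stable hash so the same key always maps to the same shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % N_SHARDS

def server_of(key: str) -> str:
    # Shards are spread round-robin across the available servers.
    return SERVERS[shard_of(key) % len(SERVERS)]

# All reads and writes for a key are routed to the server owning its shard.
stores = {s: {} for s in SERVERS}

def put(key, value):
    stores[server_of(key)][key] = value

def get(key):
    return stores[server_of(key)].get(key)

put("user:42", {"name": "Alice"})
```

Because routing depends only on the key, any client can compute the owning server without coordination, which is what lets reads and writes be load-balanced across hundreds of shards.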
There are four types of NoSQL databases:
• Key-value store: These databases are designed to store data in a key-
value fashion, like a map. Each record consists of an indexed
key and a value. Examples: DynamoDB, Redis, Riak, BerkeleyDB.
FIGURE 3.5
Hadoop distributed file system, with an active and a standby NameNode and
file blocks (1, 2, 3) replicated across the DataNodes.
FIGURE 3.6
MapReduce approach to a simple word count task.
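The word-count task of Figure 3.6 can be sketched as a single-process Python analogue of the three MapReduce phases; the input lines are made up, and a real cluster would run the map and reduce functions in parallel across machines:

```python
# Map emits (word, 1) pairs, shuffle groups pairs by key,
# and reduce sums each group — the classic word-count pattern.
from collections import defaultdict

def map_phase(line):
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big impact", "big analytics"]
pairs = [p for line in lines for p in map_phase(line)]  # map
counts = reduce_phase(shuffle(pairs))                   # shuffle + reduce
```

The key property is that `map_phase` and `reduce_phase` only see local data, so the framework can distribute them freely; only the shuffle moves data between nodes.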
FIGURE 3.7
Hadoop and its ecosystem. (Flume/Kafka import streaming data into the
cluster; Sqoop imports or exports data.)
all criteria. One such example is performance-based contracts (PBC), which are
becoming very popular among capital equipment users, especially in the defense
and aerospace industries. In PBC, customers demand performance as measured
through multiple criteria such as reliability, availability, total cost of
ownership, and logistics footprint.
Predicting reliability and availability involves collecting historical failure
and maintenance data and fitting probability distributions to the time to
failure and the time to repair. Estimating the cost of ownership involves
breaking down the total cost into its components and predicting future costs
(such as operation and maintenance costs) over the life of the system. The
original equipment manufacturer will have to use prescriptive analytics
techniques to optimize the various KPIs in order to provide equipment under
PBC.
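The availability part of this calculation can be sketched as follows; the hour values are made-up illustrations, and steady-state availability is taken here as the standard ratio MTBF / (MTBF + MTTR):

```python
# Estimate mean time between failures (MTBF) and mean time to repair
# (MTTR) from historical records, then compute steady-state availability.
# All hour values below are hypothetical.

times_between_failures = [400.0, 520.0, 610.0, 470.0]  # hours
repair_times = [8.0, 12.0, 10.0, 10.0]                 # hours

mtbf = sum(times_between_failures) / len(times_between_failures)
mttr = sum(repair_times) / len(repair_times)
availability = mtbf / (mtbf + mttr)
```

With these numbers the equipment is available about 98% of the time; under a PBC, the manufacturer would trade off design reliability (raising MTBF) against logistics investments (lowering MTTR) to meet the contracted availability target.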
More examples are available in the remaining chapters of this book. The role
of MCDM in big data is illustrated in Figure 3.8. Big data is important
because the modern digital economy generates huge volumes of data every day,
and these data can be used to understand customers and ultimately help
businesses. It has to be noted that data as such will not be sufficient for
making business decisions. There are a number of analytics tools, and MCDM
models are one class within this whole range. However, big data and MCDM
models (and other analytics tools) will still not be sufficient; one
FIGURE 3.8
Big data analytics, MCDM, and business intelligence. (The figure relates big
data, MCDM tools, and business intelligence.)
needs a shrewd business mind to make sense of the results from these ana-
lytics tools to generate BI insights (Ramanathan et al., 2012).
3.8 Conclusions
As enterprises try to understand their respective businesses more deeply to
constantly deliver value to their customers, data is going to be the core
focus in the time to come. As Angela Ahrendts, CEO of Burberry, says, "whoever
unlocks the reams of data and uses it strategically will win" (Andrew Gill,
2013). Hence, enterprises will have to deal with the challenges of managing
and analyzing big data. As technologies evolve over time, enterprises need to
adopt them and learn quickly in order to benefit and remain competitive.
Similarly, governments are adopting digital platforms to administer and
deliver services and quality of life to their citizens. One such example is
the universal identification (UID) service implemented by the Government of
India. It has reached a billion users and promises to be the core platform for
managing and delivering services under various social service programs.
Bibliography
Anon 2012. 2.5 Quintillion bytes created each day. Storage Newsletter.com, http://
www.vcloudnews.com/every-day-big-data-statistics-2-5-quintillion-bytes-of-
data-created-daily/, accessed on April 6, 2016.
Anon 2016. Most popular websites in the United States as of September 2015 ranked by
visitors. http://www.statista.com/statistics/271450/monthly-unique-visitors-
to-us-retail-websites/, accessed on April 6, 2016.
Brody, H., Rip, M.R., Johansen, P.V., Paneth, N., and Rachman, S. 2000. Map mak-
ing and myth making in Broad Street: The London cholera epidemic, 1854.
The Lancet, Vol. 356, pp. 64–69.
Bruhadeeswaran, R. 2012. Shubham Housing Development Finance raises $7.8M
from Elevar, Helion, Others. Vccircle. www.vccircle.com/news/2012/11/12/
shubham-housing-development-finance-raises-78m-elevar-helion-others.
Coles, P.A., Lakhani, K.R., and Mcafee, A.P. 2007. Prediction markets at Google.
Harvard Business School Case (No 9-607-088).
Davenport, T.H. 2006. Competing on analytics. Harvard Business Review, January,
pp. 1–10.
Davenport, T.H. and Harris, J.G. 2009. Competing on Analytics—The New Science of
Winning, Harvard Business School Press, Boston, MA.
Davenport, T.H., Iansiti, M., and Serels, A. 2013. Managing with analytics at Procter
and Gamble. Harvard Business School Case (Case Number 9-613-045).
Basics of Analytics and Big Data 85
Davenport, T.H. and Patil, D.J. 2012. Data scientist: The sexiest job of the 21st century.
Harvard Business Review, pp. 2–8.
Duhigg, C. 2012a. How companies learn your secret. New York Times, February 16,
2012.
Duhigg, C. 2012b. The Power of Habit: Why We Do What We Do in Life and Business.
William Heinemann, London.
Grill, A. 2013. IBM CEO Ginni Rometty believes big data and social will change
everything—How about other CEOs. Leadership, Social Business and Social Media.
http://londoncalling.co/2013/03/ibm-ceo-ginni-rometty-believes-big-data-
and-social-will-change-everything-how-about-other-ceos/
Hayes, B. 2013. First links in the Markov chain. American Scientist, Vol. 101,
pp. 92–97.
Hays, C.L. 2004. What Wal-Mart knows about customers’ habits. New York Times,
November 14, 2004.
Hopkins, M.S., LaValle, S., Balboni, F., Kruschwitz, N., and Shockley, R. 2010.
10 insights: A first look at the new intelligence enterprise survey on winning
with data. MIT Sloan Management Review, Vol. 52, pp. 21–31.
Howard, R. 2002. Comments on the origin and applications of Markov decision
processes. Operations Research, Vol. 50, No. 1, pp. 100–102.
Kant, G., Jacks, M., and Aantjes, C. 2008. Coca-Cola enterprises optimizes vehicle
routes for efficient product delivery. Interfaces, Vol. 38, pp. 40–50.
Leachman, R.C., Kang, J., and Lin, V. 2002. SLIM: Short cycle time and low inventory
in manufacturing at Samsung Electronics. Interfaces, Vol. 32, pp. 61–77.
Lewis, M. 2003. Moneyball: The Art of Winning an Unfair Game. W W Norton &
Company, London, UK.
Mahadevan, B., Sivakumar, S., Dinesh Kumar, D., and Ganeshram, K. 2013.
Redesigning mid-day meal logistics for the Akshaya Patra foundation: OR at
work in feeding hungry school children. Interfaces, Vol. 43, No. 6, pp. 530–546.
Mayer-Schonberger, V. and Cukier, K. 2013. Big Data: A Revolution That Will Transform
How We Live, Work and Think. John Murray, New York.
Ramanathan, R., Duan, Y., Cao, G., and Philpott, E. 2012. Diffusion and impact of
business analytics in UK retail: Theoretical underpinnings and a qualita-
tive study. EurOMA Service Operations Management Forum, Cambridge,
September 19–20, 2012.
Savant, M.V. 1990. Ask Marilyn. Parade Magazine, p. 16, September 9, 1990.
Siegel, E. 2013. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die.
John Wiley and Sons, Hoboken, NJ.
Simon, H. 1972. Theories of bounded rationality. In Decisions and Organizations.
McGuire, C.B., and Radner, R. (Eds.). North-Holland Publishing Company,
New York, pp. 161–176.
Sirkin, H.L., Keenan, P., and Jackson, A. 2005. The hard side of change management.
Harvard Business Review, Vol. 83, No. 10, pp. 109–118.
Snow, S.J. 1999. Death by Water: John Snow and cholera in the 19th century. Liverpool
Medical Institution. http://www.lmi.org.uk/Data/10/Docs/11/11Snow.pdf
Stelzner, M.A. 2013. 2013 Social Media Marketing Industry Report—How Marketers Are
Using Social Media to Grow Their Businesses. Social Media Examiner Report 2013.
Suhruta, K., Makija, K., and Dinesh Kumar, U. 2013. 1920 Evil Returns—Bollywood
and Social Media Marketing, IIMB Case Number IMB437.
Tandon, R., Chakraborty, A., Srinivasan, G., Shroff, M., Abdulla, A., Shamsundar,
B., Sinha, R., Subramaniam, S., Hill, D., and Dhore, P. 2013. Hewlett Packard:
Delivering profitable growth for HPDirect.Com using operations research.
Interfaces, Vol. 43, No. 1, pp. 48–61.
Tufte, E. 2001. The Visual Display of Quantitative Information, Graphics Press, Connecticut.
Underhill, P. 2009. Why We Buy: The Science of Shopping. Simon & Schuster (paper-
back), New York.
4
Linear Programming (LP)-Based Two-
Phase Classifier for Solving a Classification
Problem with Multiple Objectives
CONTENTS
4.1 Introduction................................................................................................... 88
4.2 Research Problem Description and Assumptions................................... 89
4.2.1 Proposed MILP Model..................................................................... 91
4.2.2 Proposed LP-Based Classifier (LP Model 1): Based on the
Conventional Approach of Considering All the Attributes
with Power Index Equal to 1........................................................... 92
4.2.3 Proposed LP-Based Classifier with a Crisp Boundary
(LP Model 2)....................................................................................... 93
4.2.4 Numerical Illustration for the Constraints in Proposed
LP-Based Classifier (LP Model 1).................................................... 96
4.2.5 Numerical Illustration for the Constraints in Proposed
LP-Based Classifier (LP Model 2).................................................... 97
4.3 Proposed LP-Based Two-Phase Classifier................................................. 98
4.3.1 Proposed LP Model for Two-Phase Classifier.............................. 98
4.3.2 Algorithm for the Proposed LP-Based Two-Phase
Classifier......................................................................................100
4.3.3 Numerical Illustration for the Constraints in Proposed
LP-Based Two-Phase Classifier..................................................... 102
4.4 Results and Discussion.............................................................................. 102
4.4.1 Comparison Study of the Proposed LP-Based Classifiers
with the LP Classifier with Fuzzy Measure and the
Choquet Integral by Yan et al. (2006)........................................... 103
4.4.2 Comparison Study of the Proposed LP-Based Classifiers
for the Recommendation Data Set with Artificial Neural
Networks.......................................................................................... 105
4.5 Summary...................................................................................................... 111
Acknowledgment................................................................................................. 112
References.............................................................................................................. 112
4.1 Introduction
Classification models are used to predict the group/category, to which a
new observation belongs, based on the training data set or set of observa-
tions for which the categories are known in advance. Linear programming
(LP)-based classifiers for the classification problem were discussed by Freed
and Glover (1986), where the objective function of the classifier/LP model
was set either to minimize the sum of errors or to minimize the maximum
error over the training data set. A multiobjective LP model was used
by Shi et al. (2001) for data mining in portfolio management. LP-based
classifiers provide good results in terms of accuracy when the data set is
linearly separable. To handle and to improve the accuracy of the classifier
when the data set is not linearly separable, researchers and practitioners
mostly consider logistic regression, support vector machines, and artificial
neural networks (ANN) as classifiers. The problem of maximizing classification
accuracy is computationally hard, and methods such as the LP-based classifier,
logistic regression, support vector machines, and ANN each optimize their own
specific objective function, thereby indirectly improving the accuracy of the
classification. In order to improve the accuracy of the conventional LP model when
the data is not linearly separable, the concept of fuzzy measure was intro-
duced and utilized by Yan et al. (2006).
Big data is generally characterized by the volume, velocity, variety, and
value of data. Technological advancements and the Internet of things
(IoT) enable the massive collection of data at high velocity from various
sources. Big data analytics is the process of understanding the hidden
patterns in this large volume of data to gain better insights about the data.
Multiple criteria decision making (MCDM) is a topic in Operations
Research that deals with decision problems involving multiple objectives.
To solve decision problems with multiple objectives, researchers and
practitioners consider goal programming or epsilon constraint-based models.
MCDM models can effectively be applied in the field of big data analytics
to solve the underlying decision problems.
This chapter first proposes a mixed integer linear programming (MILP)
model in order to maximize the accuracy of the classification. Since the
problem is computationally hard, the MILP model can solve only small data
sets (e.g., on the order of 100 samples) within reasonable execution time.
In big data analytics, we often encounter classification problems that deal
with large data sets. The study therefore concentrates on the development
of computationally efficient LP-based classifiers that can handle a large
volume of data and that can also identify the non-dominated set of solutions
with respect to multiple objectives.
In order to handle the large volume of data, batch processing technologies
are used (e.g., Apache Hadoop) and to handle the high velocity of data, stream
$$\text{normalize}(a_{i,j}) = \frac{a_{i,j} - \min_{i' \in N}\{a_{i',j}\}}{\max_{i' \in N}\{a_{i',j}\} - \min_{i' \in N}\{a_{i',j}\}}.$$
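The min-max normalization above can be sketched as a small Python helper (hypothetical; it assumes the attribute is not constant over the observations, so the denominator is nonzero):

```python
def normalize_column(values):
    """Min-max normalize one attribute column a_{.,j} over all observations."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

column = [0, 5, 9]  # attribute j over three observations
print(normalize_column(column))  # -> [0.0, 0.5555555555555556, 1.0]
```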
Decision Variables
x''_j      An unrestricted variable to capture the coefficient for the term a_{i,j}
y          An unrestricted variable which acts as a surrogate constant to improve the accuracy
success_i  A binary variable to indicate whether the prediction for observation i is correct or not
Maximize

$$Z = \sum_{i=1}^{N} \text{success}_i \quad (4.1)$$
subject to the following constraints, for all i:
when observation i is of category 0 (i.e., $\sum_{j=1}^{A}(\text{normalize}(a_{i,j}) \times x''_j) + y$ should be less than or equal to 0.5 in the case of a category-0 observation), we have

$$\sum_{j=1}^{A} \left(\text{normalize}(a_{i,j}) \times x''_j\right) + y \le 0.5 + (1 - \text{success}_i) \times M, \quad (4.2)$$

and when observation i is of category 1 (i.e., the same expression should be greater than or equal to (0.5 + epsilon) in the case of a category-1 observation), we have

$$\sum_{j=1}^{A} \left(\text{normalize}(a_{i,j}) \times x''_j\right) + y \ge 0.5 + \text{epsilon} - (1 - \text{success}_i) \times M. \quad (4.3)$$
In the above constraints (4.2) and (4.3), x′′j and y are unrestricted in sign,
and successi is a binary variable. The decision variable successi determines
whether the observation i is predicted correctly with respect to the corre-
sponding category, and objective function (4.1) maximizes the total accuracy
of the prediction.
Decision Variable
error_i    A real continuous variable that captures the deviation of observation i from its category threshold (if any)

Minimize

$$Z = \sum_{i=1}^{N} \text{error}_i \quad (4.4)$$

subject to the following constraints, for all i: when observation i is of category 0 (i.e., $\sum_{j=1}^{A}(\text{normalize}(a_{i,j}) \times x''_j) + y$ should be less than or equal to 0.5 in the case of a category-0 observation), we have

$$\text{error}_i \ge \sum_{j=1}^{A} \left(\text{normalize}(a_{i,j}) \times x''_j\right) + y - 0.5, \quad (4.5)$$

and when observation i is of category 1 (i.e., the same expression should be greater than or equal to (0.5 + epsilon) in the case of a category-1 observation), we have

$$\text{error}_i \ge (0.5 + \text{epsilon}) - \left(\sum_{j=1}^{A} \left(\text{normalize}(a_{i,j}) \times x''_j\right) + y\right). \quad (4.6)$$
In the above constraints (4.5) and (4.6), x′′j and y are unrestricted in sign,
and errori is a real continuous variable. The decision variable errori captures
the amount of deviation from their respective threshold for each observation
(if present), and objective function (4.4) minimizes the total errors of all the
observations in the training data set, so as to improve the accuracy of the
prediction.
The proposed classifier (LP Model 1) splits the data set into training data
set and test data set, and then uses the training data set to train the model,
and uses the test data set to validate the model.
The proposed classifier (LP Model 1) runs the above LP model for the
training data set to train the model. Then the proposed classifier (LP Model
1) makes use of the trained model to predict the category for the test data set.
Note that the decision variable errori is used only in the above LP model (LP
Model 1) for the training data set to train the model and then the value for
the expression $\sum_{j=1}^{A} \left(\text{normalize}(a_{i,j}) \times x''_j\right) + y$ is calculated for each observation
i in the test data set to predict the category. If the value of the expression
is less than or equal to 0.5, then the corresponding observation is predicted
as category 0; and if the value of the expression is greater than or equal to
(0.5 + epsilon), then the corresponding observation is predicted as category 1.
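As an illustrative sketch of this train-and-predict procedure, the snippet below formulates LP Model 1 for a tiny linearly separable toy data set and solves it with `scipy.optimize.linprog` (our choice for illustration, not necessarily the solver used by the authors); we assume epsilon = 0.01 and treat error_i as nonnegative.

```python
import numpy as np
from scipy.optimize import linprog

# Toy training set: two attributes; the category is decided by attribute 1,
# so the data are linearly separable and the optimal total error is zero.
data = np.array([[1.0, 5.0], [2.0, 4.0], [8.0, 1.0], [9.0, 2.0]])
labels = np.array([0, 0, 1, 1])
eps = 0.01  # assumed value for epsilon

# Min-max normalize each attribute over the training observations.
Z = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
n, A = Z.shape

# Variable layout: [x''_1 .. x''_A (free), y (free), error_1 .. error_n (>= 0)].
c = np.concatenate([np.zeros(A + 1), np.ones(n)])  # minimize sum of error_i
A_ub, b_ub = [], []
for i in range(n):
    row = np.zeros(A + 1 + n)
    if labels[i] == 0:
        # error_i >= Z_i.x'' + y - 0.5   ->   Z_i.x'' + y - error_i <= 0.5
        row[:A], row[A], row[A + 1 + i] = Z[i], 1.0, -1.0
        b_ub.append(0.5)
    else:
        # error_i >= (0.5 + eps) - (Z_i.x'' + y)
        #   ->  -Z_i.x'' - y - error_i <= -(0.5 + eps)
        row[:A], row[A], row[A + 1 + i] = -Z[i], -1.0, -1.0
        b_ub.append(-(0.5 + eps))
    A_ub.append(row)

bounds = [(None, None)] * (A + 1) + [(0, None)] * n
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)

x, y = res.x[:A], res.x[A]
scores = Z @ x + y  # the expression used to predict the category
print(res.fun, scores)
```

For separable data the optimal objective value is zero; when the data are not linearly separable, the positive error_i values measure how far the misclassified observations fall on the wrong side of their thresholds.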
Then the accuracy with respect to test data set is calculated for the follow-
ing objectives:
Parameters

$$\text{normalize}\left(a_{i,j}^{p_j}\right) = \frac{a_{i,j}^{p_j} - \min_{i' \in N}\left\{a_{i',j}^{p_j}\right\}}{\max_{i' \in N}\left\{a_{i',j}^{p_j}\right\} - \min_{i' \in N}\left\{a_{i',j}^{p_j}\right\}}$$

and

$$\text{normalize}\left(a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}\right) = \frac{a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}} - \min_{i' \in N}\left\{a_{i',j'}^{q_{j'}} \times a_{i',j''}^{q_{j''}}\right\}}{\max_{i' \in N}\left\{a_{i',j'}^{q_{j'}} \times a_{i',j''}^{q_{j''}}\right\} - \min_{i' \in N}\left\{a_{i',j'}^{q_{j'}} \times a_{i',j''}^{q_{j''}}\right\}}.$$

Decision Variables
x_{j,p_j}                   An unrestricted variable to capture the coefficient for the term $a_{i,j}^{p_j}$
x'_{j',j'',q_{j'},q_{j''}}  An unrestricted variable to capture the coefficient for the interaction term $a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}$
y                           An unrestricted variable which acts as a surrogate constant to improve the accuracy
error_i                     A real continuous variable that captures the deviation of observation i from its category threshold (if any)

Minimize

$$Z = \sum_{i=1}^{N} \text{error}_i \quad (4.10)$$
subject to the following constraints, for all i: when observation i is of category 0 (i.e.,

$$\sum_{j=1}^{A} \sum_{p_j \in P_j} \left(\text{normalize}\left(a_{i,j}^{p_j}\right) \times x_{j,p_j}\right) + \sum_{j'=1}^{A-1} \sum_{j''=j'+1}^{A} \sum_{q_{j'} \in Q_{j'}} \sum_{q_{j''} \in Q_{j''}} \left(\text{normalize}\left(a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}\right) \times x'_{j',j'',q_{j'},q_{j''}}\right) + y$$

should be less than or equal to 0.5 in the case of a category-0 observation), we have

$$\text{error}_i \ge \sum_{j=1}^{A} \sum_{p_j \in P_j} \left(\text{normalize}\left(a_{i,j}^{p_j}\right) \times x_{j,p_j}\right) + \sum_{j'=1}^{A-1} \sum_{j''=j'+1}^{A} \sum_{q_{j'} \in Q_{j'}} \sum_{q_{j''} \in Q_{j''}} \left(\text{normalize}\left(a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}\right) \times x'_{j',j'',q_{j'},q_{j''}}\right) + y - 0.5. \quad (4.11)$$
When observation i is of category 1 (i.e., the same expression should be greater than or equal to (0.5 + epsilon) in the case of a category-1 observation), we have

$$\text{error}_i \ge (0.5 + \text{epsilon}) - \left(\sum_{j=1}^{A} \sum_{p_j \in P_j} \left(\text{normalize}\left(a_{i,j}^{p_j}\right) \times x_{j,p_j}\right) + \sum_{j'=1}^{A-1} \sum_{j''=j'+1}^{A} \sum_{q_{j'} \in Q_{j'}} \sum_{q_{j''} \in Q_{j''}} \left(\text{normalize}\left(a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}\right) \times x'_{j',j'',q_{j'},q_{j''}}\right) + y\right). \quad (4.12)$$
In the above constraints (4.11) and (4.12), $x_{j,p_j}$, $x'_{j',j'',q_{j'},q_{j''}}$, and y are
unrestricted in sign, and error_i is a real variable. Constraints (4.11) and (4.12)
capture the contribution of attributes from their higher-order polynomial
degrees, and also capture the interaction effects among the attributes. The
decision variable errori captures the amount of deviation from their respec-
tive threshold for each observation (if present). Objective function (4.10)
minimizes the total errors of all the observations in the training data set, so
as to improve the accuracy of the prediction.
The proposed classifier (LP Model 2) splits the data set into training data
set and test data set, and then uses the training data set to train the model,
and uses the test data set to validate the model.
The proposed classifier (LP Model 2) runs the above LP model for the train-
ing data set to train the model. Then this classifier makes use of the trained
model to predict the category for the test data set. Note that the decision vari-
able errori is used only in the above LP model (LP Model 2) for the training
data set to train the model, and then the value for the expression
$$\sum_{j=1}^{A} \sum_{p_j \in P_j} \left(\text{normalize}\left(a_{i,j}^{p_j}\right) \times x_{j,p_j}\right) + \sum_{j'=1}^{A-1} \sum_{j''=j'+1}^{A} \sum_{q_{j'} \in Q_{j'}} \sum_{q_{j''} \in Q_{j''}} \left(\text{normalize}\left(a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}\right) \times x'_{j',j'',q_{j'},q_{j''}}\right) + y$$
is calculated for each observation i in the test data set to predict the category.
If the value of the expression is less than or equal to 0.5, then the correspond-
ing observation is predicted as category 0; and if the value of the expression
is greater than or equal to (0.5 + epsilon), then the corresponding observation
is predicted as category 1. Then the accuracy with respect to test data set is
calculated for the objectives, Objective 1, Objective 2, and Objective 3.
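The prediction rule just described can be sketched as a tiny Python function (hypothetical; the chapter does not say how a score falling strictly inside the band (0.5, 0.5 + epsilon) is treated, so it is left undecided here):

```python
def predict(score, eps=0.01):
    """Predict the category from the value of the classifier expression.

    Category 0 if the score is at most 0.5, category 1 if it is at least
    0.5 + eps; a score strictly inside the band (0.5, 0.5 + eps) is left
    undecided (None) -- the chapter does not specify this case.
    """
    if score <= 0.5:
        return 0
    if score >= 0.5 + eps:
        return 1
    return None

print(predict(0.42), predict(0.73))  # -> 0 1
```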
$$\text{normalize}(5) = \frac{5 - 0}{9 - 0} = 0.56.$$
For the samples in Table 4.1, the constraints of the proposed LP model are
expressed as follows:
With respect to sample 1, LP model Constraint (4.5) appears as follows:
TABLE 4.1
Samples for the Numerical Illustration
Sample Attribute 1 Attribute 2 Category
1 1 5 0
2 4 9 1
As the values of attributes are in the range [0, 9], for attribute 2 of sample 1
with power term 2, the normalize function is expressed as follows:
$$\text{normalize}(5^2) = \frac{(5 \times 5) - 0}{(9 \times 9) - 0} = 0.31.$$
For the samples in Table 4.1, the constraints of the proposed LP model are
expressed as follows:
With respect to sample 1, LP model Constraint (4.11) appears as follows:
errori ≥ (0.11 × x1,1 + 0.01 × x1,2 + 0.00 × x1,3 + 0.56 × x2 ,1 + 0.31 × x2 ,2 + 0.17
× x2 ,3 + 0.06 × x1′ ,2 ,1,1 + 0.03 × x1′ ,2 ,1,2 + 0.01 × x1′ ,2 ,2 ,1 + 0.00 × x1′ ,2 ,2 ,2 + y ) − 0.5.
(4.19)
Note: The coefficients are presented throughout this chapter to two decimal
places of precision; hence the expression 0.00 × x1,3 in Constraint (4.19)
actually represents a small nonzero coefficient for the variable x1,3.
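The coefficients of Constraint (4.19) can be reproduced with a short Python sketch (illustrative; it assumes, as in the text, that both attributes range over [0, 9], that P_j = {1, 2, 3} and Q_j = {1, 2}, and that the minimum and maximum of each power and product term occur at the range endpoints):

```python
# Rebuild the normalized coefficient vector of Constraint (4.19) for
# sample 1 (attribute values a1 = 1, a2 = 5), assuming both attributes
# range over [0, 9], with P_j = {1, 2, 3} and Q_j = {1, 2}.
lo, hi = 0.0, 9.0
a = [1.0, 5.0]
P = [1, 2, 3]
Q = [1, 2]

def norm_power(v, p):
    """normalize(a^p) when the attribute range is [lo, hi]."""
    return (v**p - lo**p) / (hi**p - lo**p)

def norm_pair(v1, q1, v2, q2):
    """normalize(a1^q1 * a2^q2); the min/max of the product over the
    data are assumed to occur at the range endpoints."""
    return (v1**q1 * v2**q2) / (hi**q1 * hi**q2)

coeffs = [norm_power(v, p) for v in a for p in P]
coeffs += [norm_pair(a[0], q1, a[1], q2) for q1 in Q for q2 in Q]
print([round(c, 2) for c in coeffs])
# -> [0.11, 0.01, 0.0, 0.56, 0.31, 0.17, 0.06, 0.03, 0.01, 0.0]
```

Rounded to two decimals, the vector matches the coefficients of x1,1 through x'_{1,2,2,2} in Constraint (4.19).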
With respect to sample 2, LP model Constraint (4.12) appears as follows:
errori ≥ (0.5 + epsilon) − (0.44 × x1,1 + 0.20 × x1, 2 + 0.09 × x1, 3 + 1.00 × x2 ,1
+ 1.00 × x2 , 2 + 1.00 × x2 , 3 + 0.44 × x1′ , 2 ,1,1 + 0.44 × x1′ , 2 ,1, 2 + 0.20 × x1′ , 2 , 2 ,1
+ 0.20 × x1′ , 2 , 2 , 2 + y ).
(4.20)
Minimize

$$Z = \sum_{i=1}^{N} \text{error}_i \quad (4.21)$$

subject to the following constraints, for all i: when observation i is of category 0 (i.e.,

$$\sum_{j=1}^{A} \sum_{p_j \in P_j} \left(\text{normalize}\left(a_{i,j}^{p_j}\right) \times x_{j,p_j}\right) + \sum_{j'=1}^{A-1} \sum_{j''=j'+1}^{A} \sum_{q_{j'} \in Q_{j'}} \sum_{q_{j''} \in Q_{j''}} \left(\text{normalize}\left(a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}\right) \times x'_{j',j'',q_{j'},q_{j''}}\right) + y$$

should be less than or equal to b in the case of a category-0 observation), we have

$$\text{error}_i \ge \sum_{j=1}^{A} \sum_{p_j \in P_j} \left(\text{normalize}\left(a_{i,j}^{p_j}\right) \times x_{j,p_j}\right) + \sum_{j'=1}^{A-1} \sum_{j''=j'+1}^{A} \sum_{q_{j'} \in Q_{j'}} \sum_{q_{j''} \in Q_{j''}} \left(\text{normalize}\left(a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}\right) \times x'_{j',j'',q_{j'},q_{j''}}\right) + y - b, \quad (4.22)$$

and when observation i is of category 1 (i.e., the same expression should be greater than or equal to (b + 1) in the case of a category-1 observation), we have

$$\text{error}_i \ge (b + 1) - \left(\sum_{j=1}^{A} \sum_{p_j \in P_j} \left(\text{normalize}\left(a_{i,j}^{p_j}\right) \times x_{j,p_j}\right) + \sum_{j'=1}^{A-1} \sum_{j''=j'+1}^{A} \sum_{q_{j'} \in Q_{j'}} \sum_{q_{j''} \in Q_{j''}} \left(\text{normalize}\left(a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}\right) \times x'_{j',j'',q_{j'},q_{j''}}\right) + y\right). \quad (4.23)$$
In the above constraints (4.22) and (4.23), $x_{j,p_j}$, $x'_{j',j'',q_{j'},q_{j''}}$, and y are
unrestricted in sign, and error_i and b are real variables. We have "1" in Constraint
(4.23), since we have dichotomous classification. Constraints (4.22) and (4.23)
capture the contribution of attributes from their higher-order polynomial
degrees, and also capture the interaction effects among the attributes. The
decision variable b captures the classification threshold/cut-off for the respec-
tive category. The decision variable errori captures the amount of deviation
from their respective threshold for each observation (if present). Objective
function (4.21) minimizes the total errors of all the observations in the train-
ing data set, so as to improve the accuracy of the prediction.
Phase 0:
• In this phase, the proposed classifier splits the data set into three:
training data set, test data set, and validation data set.
Step 1: Split the data set into training data set, test data set, and valida-
tion data set.
Phase 1:
• In this phase, the proposed classifier uses the training data set to train
the model (i.e., to obtain the values of the decision variables b, $x_{j,p_j}$,
$x'_{j',j'',q_{j'},q_{j''}}$, and y) and also identifies the bandwidth of the boundary [b, (b + 1)].
Step 1: Run the proposed LP model (the LP model in the proposed two-
phase classifier) for the training data set to get the LP solution
in terms of the value of the variable b, and also the values of the
variables $x_{j,p_j}$, $x'_{j',j'',q_{j'},q_{j''}}$, and y, to compute the following:

$$\sum_{j=1}^{A} \sum_{p_j \in P_j} \left(\text{normalize}\left(a_{i,j}^{p_j}\right) \times x_{j,p_j}\right) + \sum_{j'=1}^{A-1} \sum_{j''=j'+1}^{A} \sum_{q_{j'} \in Q_{j'}} \sum_{q_{j''} \in Q_{j''}} \left(\text{normalize}\left(a_{i,j'}^{q_{j'}} \times a_{i,j''}^{q_{j''}}\right) \times x'_{j',j'',q_{j'},q_{j''}}\right) + y. \quad (4.24)$$
Phase 2:
• The second phase of the proposed classifier comprises two parts.
Part 1:
• In this part, the proposed classifier uses the test data set to trans-
form the bandwidth of boundary into a crisp boundary b + c. In this
process, the proposed classifier identifies a non-dominated set of
solutions with respect to multiple objectives, and captures the cor-
responding set of c values.
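Part 1 can be sketched as a grid scan over candidate c values followed by a Pareto (non-dominated) filter; the snippet below is a hypothetical Python illustration in which the objective values per c are supplied directly (the numbers loosely echo Table 4.8):

```python
def dominates(u, v):
    """u dominates v if u is no worse on every objective (maximization)
    and strictly better on at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def non_dominated(solutions):
    """Keep only the (c, objectives) pairs not dominated by any other."""
    return [(c, obj) for c, obj in solutions
            if not any(dominates(other, obj) for _, other in solutions)]

# Hypothetical (c, (Objective 1, Objective 2, Objective 3)) candidates.
candidates = [
    (0.00, (72.5, 93.8, 50.9)),
    (0.40, (79.6, 83.4, 75.8)),
    (0.45, (70.0, 80.0, 70.0)),  # dominated by the c = 0.40 solution
    (0.50, (80.2, 79.9, 80.4)),
]
pareto = non_dominated(candidates)
print([c for c, _ in pareto])  # -> [0.0, 0.4, 0.5]
```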
Note:

$$P_j = \{1, 2, 3, 4, 5\} \quad \forall j, \quad (4.27)$$
$$Q_j = \{1\} \quad \forall j. \quad (4.28)$$
errori ≥ (0.06 × x1,1 + 0.00 × x1,2 + 0.00 × x1,3 + 0.00 × x1, 4 + 0.00 × x1,5
+ 0.43 × x2 ,1 + 0.19 × x2 ,2 + 0.08 × x2 ,3 + 0.04 × x2 , 4 + 0.02 × x2 ,5
+ 0.04 × x1′ ,2 ,1,1 + y ) − 0.5.
(4.31)
TABLE 4.2
Range of Values with Respect to Attributes
in Data Set Presented in the Paper by
Yan et al. (2006)
Attribute 1 Attribute 2
[0.01,0.99] [0.01,0.99]
TABLE 4.3
Two Samples from the Data Set Presented in the Paper
by Yan et al. (2006)
Sample Attribute 1 Attribute 2 Category
1 0.07 0.43 0
2 0.39 0.92 1
For the samples in Table 4.3, the constraints of the proposed LP-based two-
phase classifier are expressed as follows: With respect to sample 1, LP model
Constraint (4.22) appears as follows:
errori ≥ (0.06 × x1,1 + 0.00 × x1,2 + 0.00 × x1,3 + 0.00 × x1, 4 + 0.00 × x1,5
+ 0.43 × x2 ,1 + 0.19 × x2 ,2 + 0.08 × x2 ,3 + 0.04 × x2 , 4
+ 0.02 × x2 ,5 + 0.04 × x1′ ,2 ,1,1 + y ) − b. (4.33)
TABLE 4.4
Performance of the Proposed LP-Based Classifiers (LP Model 1, LP Model 2, and the
LP-Based Two-Phase Classifier) for the Data Set by Yan et al. (2006)
Objective 1 (%) Objective 2 (%) Objective 3 (%)
LP Model 1 48.00 100.00 1.89
LP Model 2 98.00 98.94 97.17
Two-phase classifier 100.00 100.00 100.00
Note: LP Model 1 and LP Model 2 cannot address multiobjective optimization; a single
non-dominated solution of the proposed two-phase classifier is reported (for the validation data set).
is not linearly separable. Note that Yan et al. (2006) included all the samples in
the data set for training the model and the accuracy of their classifier was 100%.
errori ≥ (0.25 × x1′′+ 0.09 × x2′′ + 0.21 × x3′′ + 0.11 × x 4′′ + 0.26 × x5′′
+ 0.05 × x6′′ + 0.42 × x7′′ + 0.31 × x8′′ + y ) − 0.5. (4.35)
errori ≥ (0.5 + epsilon) − (0.28 × x1′′ + 0.22 × x2′′ + 0.09 × x3′′ + 0.22 × x4′′
+ 0.14 × x5′′ + 0.05 × x6′′ + 0.12 × x7′′ + 0.06 × x8′′ + y). (4.36)
TABLE 4.5
Range of Values with Respect to Attributes in Recommendation Data Set
Attribute 1  Attribute 2  Attribute 3  Attribute 4  Attribute 5  Attribute 6  Attribute 7  Attribute 8
[0,36]       [0,23]       [0,47]       [0,18]       [0,35]       [0,21]       [0,26]       [0,16]
TABLE 4.6
Two Samples from the Recommendation Data Set
Sample  Attribute 1  Attribute 2  Attribute 3  Attribute 4  Attribute 5  Attribute 6  Attribute 7  Attribute 8  Category
1       9            2            10           2            9            1            11           5            0
2       10           5            4            4            5            1            3            1            1
For the samples in Table 4.6, the constraints of the proposed LP-based clas-
sifier (LP Model 2) are expressed as follows:
With respect to sample 1, LP model Constraint (4.11) appears as follows:
errori ≥ (0.25 × x1,1 + 0.06 × x1,2 + 0.02 × x1,3 + 0.00 × x1,4 + 0.00 × x1,5
+0.09 × x2 ,1 + 0.01 × x2 ,2 + 0.00 × x2 ,3 + 0.00 × x2 ,4 + 0.00 × x2 ,5
+0.21 × x3 ,1 + 0.05 × x3 ,2 + 0.01 × x3 ,3 + 0.00 × x3 ,4 + 0.00 × x3 ,5
+0.11 × x 4 ,1 + 0.01 × x 4 ,2 + 0.00 × x 4 ,3 + 0.00 × x 4 , 4 + 0.00 × x 4 ,5
+0.26 × x5 ,1 + 0.07 × x5 ,2 + 0.02 × x5 ,3 + 0.00 × x5 ,4 + 0.00 × x5 ,5
+0.05 × x6 ,1 + 0.00 × x6 ,2 + 0.00 × x6 ,3 + 0.00 × x6 , 4 + 0.00 × x6 ,5
+0.42 × x7 ,1 + 0.18 × x7 ,2 + 0.08 × x7 ,3 + 0.03 × x7 ,4 + 0.01 × x7 ,5
+0.31 × x8 ,1 + 0.10 × x8 ,2 + 0.03 × x8 ,3 + 0.01 × x8 ,4 + 0.00 × x8 ,5
+0.04 × x1′,2 ,1,1 + 0.17 × x1′,3 ,1,1 + 0.08 × x1′,4 ,1,1 + 0.12 × x1′,5 ,1,1 + 0.03 × x1′,6 ,1,1
+0.32 × x1′,7 ,1,1 + 0.24 × x1′,8 ,1,1 + 0.03 × x2′ ,3 ,1,1 + 0.02 × x2′ ,4 ,1,1 + 0.04 × x2′ ,5 ,1,1
+0.00 × x2′ ,6 ,1,1 + 0.13 × x2′ ,7 ,1,1 + 0.07 × x2′ ,8 ,1,1 + 0.03 × x3′ , 4 ,1,1 + 0.14 × x3′ ,5 ,1,1
+0.03 × x3′ ,6 ,1,1 + 0.29 × x3′ ,7 ,1,1 + 0.16 × x3′ ,8 ,1,1 + 0.06 × x 4′ ,5 ,1,1 + 0.02 × x 4′ ,6 ,1,1
+0.12 × x 4′ ,7 ,1,1 + 0.12 × x 4′ ,8 ,1,1 + 0.02 × x5′ ,6 ,1,1 + 0.20 × x5′ ,7 ,1,1 + 0.09 × x5′ ,8 ,1,1
+0.04 × x6′ ,7 ,1,1 + 0.02 × x6′ ,8 ,1,1 + 0.28 × x7′ ,8 ,1,1 + y ) − 0.5.
(4.37)
With respect to sample 2, LP model Constraint (4.12) appears as follows:
For the samples in Table 4.6, the constraints of the proposed LP-based two-
phase classifier are expressed as follows:
With respect to sample 1, LP model Constraint (4.22) appears as follows:
errori ≥ (0.25 × x1,1 + 0.06 × x1,2 + 0.02 × x1,3 + 0.00 × x1,4 + 0.00 × x1,5
+0.09 × x2 ,1 + 0.01 × x2 ,2 + 0.00 × x2 ,3 + 0.00 × x2 ,4 + 0.00 × x2 ,5
+0.21 × x3 ,1 + 0.05 × x3 ,2 + 0.01 × x3 ,3 + 0.00 × x3 ,4 + 0.00 × x3 ,5
+0.11 × x 4 ,1 + 0.01 × x 4 ,2 + 0.00 × x 4 ,3 + 0.00 × x 4 , 4 + 0.00 × x 4 ,5
+0.26 × x5 ,1 + 0.07 × x5 ,2 + 0.02 × x5 ,3 + 0.00 × x5 ,4 + 0.00 × x5 ,5
+0.05 × x6 ,1 + 0.00 × x6 ,2 + 0.00 × x6 ,3 + 0.00 × x6 , 4 + 0.00 × x6 ,5
+0.42 × x7 ,1 + 0.18 × x7 ,2 + 0.08 × x7 ,3 + 0.03 × x7 ,4 + 0.01 × x7 ,5
+0.31 × x8 ,1 + 0.10 × x8 ,2 + 0.03 × x8 ,3 + 0.01 × x8 ,4 + 0.00 × x8 ,5
+0.04 × x1′,2 ,1,1 + 0.17 × x1′,3 ,1,1 + 0.08 × x1′,4 ,1,1 + 0.12 × x1′,5 ,1,1 + 0.03 × x1′,6 ,1,1
+0.32 × x1′,7 ,1,1 + 0.24 × x1′,8 ,1,1 + 0.03 × x2′ ,3 ,1,1 + 0.02 × x2′ ,4 ,1,1 + 0.04 × x2′ ,5 ,1,1
+0.00 × x2′ ,6 ,1,1 + 0.13 × x2′ ,7 ,1,1 + 0.07 × x2′ ,8 ,1,1 + 0.03 × x3′ , 4 ,1,1 + 0.14 × x3′ ,5 ,1,1
+0.03 × x3′ ,6 ,1,1 + 0.29 × x3′ ,7 ,1,1 + 0.16 × x3′ ,8 ,1,1 + 0.06 × x 4′ ,5 ,1,1 + 0.02 × x 4′ ,6 ,1,1
+0.12 × x 4′ ,7 ,1,1 + 0.12 × x 4′ ,8 ,1,1 + 0.02 × x5′ ,6 ,1,1 + 0.20 × x5′ ,7 ,1,1 + 0.09 × x5′ ,8 ,1,1
+0.04 × x6′ ,7 ,1,1 + 0.02 × x6′ ,8 ,1,1 + 0.28 × x7′ ,8 ,1,1 + y ) − b.
(4.39)
With respect to sample 2, LP model Constraint (4.23) appears as follows:
errori ≥ (b + 1)
−( x1,1 × 0.28 + x1,2 × 0.08 + x1,3 × 0.02 + x1,4 × 0.01 + x1,5 × 0.00
+ x2 ,1 × 0.22 + x2 ,2 × 0.05 + x2 ,3 × 0.01 + x2 ,4 × 0.00 + x2 ,5 × 0.00
+ x3 ,1 × 0.09 + x3 ,2 × 0.01 + x3 ,3 × 0.00 + x3 ,4 × 0.00 + x3 ,5 × 0.00
+ x 4 ,1 × 0.22 + x 4 ,2 × 0.01 + x 4 ,3 × 0.01 + x 4 , 4 × 0.00 + x 4 ,5 × 0.00
+ x5 ,1 × 0.14 + x5 ,2 × 0.02 + x5 ,3 × 0.00 + x5 ,4 × 0.00 + x5 ,5 × 0.00
+ x6 ,1 × 0.05 + x6 ,2 × 0.00 + x6 ,3 × 0.00 + x6 , 4 × 0.00 + x6 ,5 × 0.00
+ x7 ,1 × 0.12 + x7 ,2 × 0.01 + x7 ,3 × 0.00 + x7 ,4 × 0.00 + x7 ,5 × 0.00
+ x8 ,1 × 0.06 + x8 ,2 × 0.00 + x8 ,3 × 0.00 + x8 ,4 × 0.00 + x8 ,5 × 0.00
+ x1′,2 ,1,1 × 0.10 + x1′,3 ,1,1 × 0.08 + x1′,4 ,1,1 × 0.18 + x1′,5 ,1,1 × 0.08 + x1′,6 ,1,1 × 0.03
+ x1′,7 ,1,1 × 0.10 + x1′,8 ,1,1 × 0.05 + x2′ ,3 ,1,1 × 0.03 + x2′ ,4 ,1,1 × 0.08 + x2′ ,5 ,1,1 × 0.05
+ x2′ ,6 ,1,1 × 0.01 + x2′ ,7 ,1,1 × 0.09 + x2′ ,8 ,1,1 × 0.03 + x3′ ,4 ,1,1 × 0.03 + x3′ ,5 ,1,1 × 0.03
+ x3′ ,6 ,1,1 × 0.01 + x3′ ,7 ,1,1 × 0.03 + x3′ ,8 ,1,1 × 0.01 + x 4′ ,5 ,1,1 × 0.07 + x 4′ ,6 ,1,1 × 0.04
+ x 4′ ,7 ,1,1 × 0.07 + x 4′ ,8 ,1,1 × 0.05 + x5′ ,6 ,1,1 × 0.01 + x5′ ,7 ,1,1 × 0.03 + x5′ ,8 ,1,1 × 0.01
+ x6′ ,7 ,1,1 × 0.01 + x6′ ,8 ,1,1 × 0.00 + x7′ ,8 ,1,1 × 0.02 + y ).
(4.40)
TABLE 4.7
Performance of the Proposed LP-Based Classifiers (LP Model 1 and LP Model 2)
for the Recommendation Data Set
LP-Based Classifier (LP Model 1) LP-Based Classifier (LP Model 2)
Objective 1 49.35 Objective 1 52.22
Objective 2 50.39 Objective 2 52.11
Objective 3 48.30 Objective 3 52.32
Note: LP Model 1 and LP Model 2 cannot address multiobjective optimization.
TABLE 4.8
Performance of the Proposed LP-Based Two-Phase Classifier for the Test Data Set
(2500 Observations) with Respect to Three Objectives
Non-Dominated Set of Solutions
Solution c Value Objective 1 Objective 2 Objective 3
1 0.00 72.52 93.80 50.93
2 0.05 73.84 92.77 54.63
3 0.10 74.64 91.82 57.21
4 0.15 75.76 90.95 60.35
5 0.20 77.04 89.52 64.38
6 0.25 78.44 88.64 68.09
7 0.30 78.76 86.74 70.67
8 0.35 79.32 84.83 73.73
9 0.40 79.64 83.40 75.83
10 0.45 79.96 81.41 78.49
11 0.50 80.16 79.90 80.42
12 0.55 80.16 77.12 83.24
13 0.60 80.20 75.06 85.41
14 0.65 80.32 72.99 87.75
15 0.70 80.16 71.01 89.44
16 0.75 79.52 68.39 90.81
17 0.80 78.44 64.89 92.18
18 0.85 77.40 61.87 93.15
19 0.90 76.04 58.38 93.96
20 0.95 75.08 55.76 94.68
21 1.00 74.00 52.74 95.57
FIGURE 4.1
Non-dominated set of solutions with respect to test data set. (Three-dimensional plot of Objective 1, Objective 2, and Objective 3.)
FIGURE 4.2
Non-dominated set of solutions with respect to validation data set. (Three-dimensional plot of Objective 1, Objective 2, and Objective 3.)
and the set of non-dominated solutions for the validation data set (see Part 2
of Phase 2) is listed in Table 4.9, and the same is shown in Figure 4.2.
We can also observe from Table 4.8 that (in the case of a single objective) if
we choose Objective 1 as the primary objective under consideration, then the
proposed classifier (in Part 1 of Phase 2) selects 0.50 as the best c value; and if
we choose Objective 2 as the primary objective under consideration, then the
proposed classifier (in Part 1 of Phase 2) selects 0.00 as the best c value; and
if we choose Objective 3 as the primary objective under consideration, then
the proposed classifier (in Part 1 of Phase 2) selects 1.00 as the best c value.
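The claim that Table 4.8 lists a non-dominated set can be checked mechanically: for every pair of rows, neither may be at least as good on all three objectives and strictly better on one. A minimal sketch (Python; the data are transcribed from Table 4.8, and the function name is ours):

```python
# Rows of Table 4.8: (c, Objective 1, Objective 2, Objective 3).
solutions = [
    (0.00, 72.52, 93.80, 50.93), (0.05, 73.84, 92.77, 54.63),
    (0.10, 74.64, 91.82, 57.21), (0.15, 75.76, 90.95, 60.35),
    (0.20, 77.04, 89.52, 64.38), (0.25, 78.44, 88.64, 68.09),
    (0.30, 78.76, 86.74, 70.67), (0.35, 79.32, 84.83, 73.73),
    (0.40, 79.64, 83.40, 75.83), (0.45, 79.96, 81.41, 78.49),
    (0.50, 80.16, 79.90, 80.42), (0.55, 80.16, 77.12, 83.24),
    (0.60, 80.20, 75.06, 85.41), (0.65, 80.32, 72.99, 87.75),
    (0.70, 80.16, 71.01, 89.44), (0.75, 79.52, 68.39, 90.81),
    (0.80, 78.44, 64.89, 92.18), (0.85, 77.40, 61.87, 93.15),
    (0.90, 76.04, 58.38, 93.96), (0.95, 75.08, 55.76, 94.68),
    (1.00, 74.00, 52.74, 95.57),
]

def dominates(a, b):
    """True if a is at least as good as b on every objective, better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

objs = [row[1:] for row in solutions]
pareto = not any(dominates(a, b) for a in objs for b in objs if a is not b)
print(pareto)  # True: the 21 solutions are mutually non-dominated
```

The check succeeds because Objective 2 strictly decreases and Objective 3 strictly increases as c grows, so any two rows trade off against each other.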
We also compare the accuracy of the proposed LP-based classifiers with that of an ANN. The ANN model uses the sigmoid function as the activation function for its artificial neurons and has one hidden layer; the number of input neurons is set to 8 to match the number of attributes in the data set, the number of output neurons is set to 2 to match the number of categories, and the hidden layer has 8 neurons. We train the ANN using the training data set (70%: 17,500 observations), and we present the results of the ANN with respect to the test
TABLE 4.9
Performance of the Proposed LP-Based Two-Phase Classifier for the Validation Data
Set (7500 Observations) with Respect to Three Objectives
Non-Dominated Set of Solutions
Solution c Value Objective 1 Objective 2 Objective 3
1 0.00 72.85 94.40 51.54
2 0.05 73.88 93.22 54.75
3 0.10 74.99 92.23 57.93
4 0.15 76.01 90.97 61.22
5 0.20 77.00 89.87 64.27
6 0.25 77.79 88.58 67.11
7 0.30 78.40 87.02 69.87
8 0.35 79.08 85.55 72.68
9 0.40 79.43 83.51 75.38
10 0.45 79.96 81.55 78.38
11 0.50 80.01 79.30 80.72
12 0.55 79.96 77.00 82.89
13 0.60 79.83 74.88 84.72
14 0.65 79.57 72.49 86.58
15 0.70 78.97 69.62 88.22
16 0.75 78.31 66.81 89.68
17 0.80 77.57 63.78 91.22
18 0.85 76.68 60.78 92.41
19 0.90 76.07 58.39 93.55
20 0.95 75.15 55.58 94.51
21 1.00 74.27 52.90 95.41
TABLE 4.10
Performance of the Artificial Neural Networks for the Recommendation Data Set
Solution Objective 1 Objective 2 Objective 3
1 79.00% 81.47% 76.51%
data set (30%: 7500 observations) for the three objectives (Objective 1, Objective 2, and Objective 3) in Table 4.10. The results indicate that the proposed LP-based two-phase classifier performs well, finding a non-dominated set of solutions with respect to the multiple objectives. We can also observe that solution 10 (in Table 4.9) obtained by the proposed LP-based two-phase classifier clearly dominates the solution obtained by the ANN.
• Note that in Figures 4.1 and 4.2, Objective 1 maximizes the total
accuracy with respect to both categories, Objective 2 maximizes
the accuracy with respect to category 1, and Objective 3 maximizes
the accuracy with respect to category 0.
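The dominance comparison with the ANN can be made explicit. Using the validation-set rows of Table 4.9 and the ANN figures of Table 4.10, the sketch below (Python; variable names are ours) finds which LP-based solutions dominate the ANN on all three objectives:

```python
# Table 4.9 rows: solution number -> (Objective 1, Objective 2, Objective 3).
lp_solutions = {
    1: (72.85, 94.40, 51.54), 2: (73.88, 93.22, 54.75),
    3: (74.99, 92.23, 57.93), 4: (76.01, 90.97, 61.22),
    5: (77.00, 89.87, 64.27), 6: (77.79, 88.58, 67.11),
    7: (78.40, 87.02, 69.87), 8: (79.08, 85.55, 72.68),
    9: (79.43, 83.51, 75.38), 10: (79.96, 81.55, 78.38),
    11: (80.01, 79.30, 80.72), 12: (79.96, 77.00, 82.89),
    13: (79.83, 74.88, 84.72), 14: (79.57, 72.49, 86.58),
    15: (78.97, 69.62, 88.22), 16: (78.31, 66.81, 89.68),
    17: (77.57, 63.78, 91.22), 18: (76.68, 60.78, 92.41),
    19: (76.07, 58.39, 93.55), 20: (75.15, 55.58, 94.51),
    21: (74.27, 52.90, 95.41),
}
ann = (79.00, 81.47, 76.51)  # Table 4.10

def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

dominators = [s for s, objs in lp_solutions.items() if dominates(objs, ann)]
print(dominators)  # [10]
```

Solution 10 (c = 0.45) beats the ANN on all three objectives simultaneously; every other row concedes at least one objective to the ANN.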
4.5 Summary
This chapter proposes an MILP-based classifier and LP-based classifiers for binary classification. When we compare the accuracy of all the proposed LP-based classifiers with that of an ANN, the results indicate that the proposed LP-based two-phase classifier gives better results. Moreover, the proposed LP-based two-phase classifier is able to handle data that are not inherently linearly separable, unlike the conventional MILP-based and LP-based classifiers.
The salient contributions of the proposed LP-based two-phase classifier are treating the decision variables as unrestricted in sign; accounting for the contribution of attributes through their interaction effects and their higher-order polynomial degrees; treating the classification threshold/cut-off as a decision variable; and converting the bandwidth of the threshold boundary to a crisp boundary with the consideration of multiple objectives. The proposed LP-based two-phase classifier considers such multiple objectives because, in application areas such as medical diagnosis, the absence of an alarm (failing to predict category 1) is more serious than a false alarm. The proposed LP-based two-phase classifier is efficient in terms of its ability to solve the underlying classification problem (as a linear programming model), and thus it can be effectively used in conjunction with more sophisticated and computationally demanding approaches such as random forest, support vector machines, and ANN to further improve accuracy on data that have high variety and high volume.
Acknowledgment
We are thankful to the reviewers and the editors for their valuable comments
and suggestions to improve our chapter.
5
Multicriteria Evaluation of
Predictive Analytics for Electric
Utility Service Management
CONTENTS
5.1 Introduction................................................................................................. 114
5.2 Layout of Electric Supply Distribution Network................................... 115
5.3 Predictive Analysis..................................................................................... 119
5.3.1 Decision Trees................................................................................. 120
5.3.2 Logistic Regression......................................................................... 120
5.3.3 Boosting............................................................................................ 120
5.3.4 Random Forest................................................................................ 120
5.3.5 Support Vector Machines.............................................................. 121
5.3.6 Artificial Neural Networks........................................................... 121
5.4 Evaluating Prediction Models Using Multicriteria
Decision-Making Techniques................................................................... 122
5.4.1 Performance of Classification Model........................................... 122
5.4.2 Criteria for Model Evaluation....................................................... 123
5.4.3 Multicriteria Ranking Techniques............................................... 123
5.5 Data Description......................................................................................... 124
5.5.1 Weather Data................................................................................... 124
5.5.2 Power Outage Data......................................................................... 125
5.6 Experimental Results................................................................................. 125
5.7 Smart Staffing.............................................................................................. 130
5.7.1 Objective Functions........................................................................ 130
5.7.1.1 Objective 1: Minimize Staffing Costs............................ 130
5.7.1.2 Objective 2: Minimize Power Restoration Time.......... 131
5.7.2 Model Constraints.......................................................................... 131
5.7.2.1 Restriction on Workforce Capacity................................ 131
5.7.2.2 Worker Type Requirement.............................................. 131
5.7.2.3 Nonnegativity Restriction.............................................. 131
5.8 Conclusions.................................................................................................. 132
References.............................................................................................................. 132
5.1 Introduction
Utility companies are responsible for the infrastructure of the power
delivery system and power interruptions disrupt customers as well as cause
significant economic losses. In the United States, the estimated cost of power
interruptions is $79 billion per year (LaCommare and Eto, 2006). The out-
age costs are directly proportional to the customer’s dependence upon elec-
tricity during an outage. With annual electricity use in a typical U.S. home
increasing 61% since 1970, it is becoming increasingly important to reduce
and prevent outages (Swaminathan and Sen, 1998). Outage costs vary sig-
nificantly depending upon the outage attributes such as frequency, duration,
and intensity of the outage. In this chapter, an outage is considered to be
a complete or total loss of service, typically resulting from a distribution-
related cause or a transmission failure.
The priority of every organization is twofold: providing customers the best possible product at the lowest cost while maintaining quality. The situation is not very different in a service industry. Electric utility
companies aim to optimize every form of the electric service system. Every
company tries to provide the most reliable service in the form of consistent
and uninterrupted service to its customers while reducing the cost of supply
and maximizing profits. Companies often have to deal with problems such
as downtime or outage time due to a failure in the supply system. Therefore,
better recovery planning and forecasting of outages is necessary to provide
reliable service to customers.
An electric power outage can be attributed to several causes, ranging from equipment failure and overloading of the line to weather-related events. Power systems are most vulnerable to storms and extreme
weather events. Seasonal storms combined with wind, snow, rain, ice, etc.
can cause significant outages. Data on weather-related outages have been
used in the past to estimate the costs of an outage and the impact it has
on consumers. According to past weather-related outage data, 90% of cus-
tomer outage-minutes are owing to events which affect the local distribu-
tion systems, while the remaining 10% are from generation and transmission
problems (Campbell, 2012). Electric utility companies can reduce the outages and damages resulting from severe weather conditions by enhancing the overall condition of the power delivery system and by better predicting outages.
There has been a lot of research that focuses on predicting outages as shown
in Table 5.1.
However, most of the previous work does not use hourly weather fore-
cast to predict short-term outages. Increasingly, companies are looking
to tackle outages due to both local distribution systems and larger trans-
mission systems by developing strategies to reduce or prevent outages.
Short-term forecasts of an electric power outage and the cost param-
eters associated with the outage would help companies optimize their
TABLE 5.1
Summary of Past Research Done
Columns (left to right): cost analysis of power outages; predicting power outage for extreme weather conditions; predicting power-related damages for extreme weather conditions.
Author Name
Balijepalli et al. (2005) ✓
Cerruti and Decker (2012) ✓
Davidson et al. (2003) ✓
DeGaetano et al. (2008) ✓
Huang et al. (2001) ✓
J. Douglas (2000) ✓
LaCommare and Eto (2006) ✓
Li et al. (2010) ✓
Liu et al. (2007) ✓
Reed (2008) ✓ ✓
Reed et al. (2010) ✓ ✓
Sullivan et al. (1996) ✓
Winkler et al. (2010) ✓ ✓
Zhou et al. (2006) ✓
Zhu et al. (2007) ✓
FIGURE 5.1
Service supply chain. (Supplier, service provider, service system, and customer, connected by information flow and service product flow.)
The electric power supply industry is an integral part of the service system
industry and also important for the economy as all businesses rely heavily
on electric power to operate. The Energy Information Administration
predicts that there would be an increase of 29% in the electricity demand in
the United States between 2012 and 2040 (Sieminski, 2014). The power gen-
erators own and operate electricity-generating facilities or power plants and
sell the power produced to the utility service providers. These service pro-
viders are the key players in the network as they are responsible for provid-
ing a reliable source of electric power while ensuring uninterrupted service
at an affordable cost.
The utility service provider is directly involved in the design of the ser-
vice and is answerable directly to the customer. For instance, the products
and services in an electric power utility service supply chain network are
limited to the electric power supplied and transmission services provided.
The service utility in a particular geographical area handles the transmis-
sion and distribution of electricity to the end users through a vertically
integrated structure. The various decision makers in this system such as the
power generators, the power suppliers, the transmitters, and the customers
operate in a decentralized system. A depiction of the distribution network
for electric power is shown in Figure 5.2 (Nagurney and Matsypura, 2007;
Nagurney et al., 2007).
Understanding the outline of the electric power supply system is crucial
for identifying the critical points in the service supply network:
FIGURE 5.2
Basic electrical distribution system, from generator step-up transformer through transmission (120 kV and 240 kV) to secondary customers. (From U.S.–Canada Power System Outage Task Force, Final Report on the August 14, 2003, Blackout in the United States and Canada: Causes and Recommendations, April 2004.)
p(Y = 1) = e^(β₀ + β·X) / (1 + e^(β₀ + β·X)) = 1 / (1 + e^(−(β₀ + β·X))).  (5.1)
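The two forms in Equation 5.1 are algebraically identical, as a quick numerical check shows (Python sketch; the coefficient and predictor values are arbitrary illustrations, not fitted values from the chapter):

```python
import math

def p_outage(beta0, beta, x):
    """P(Y = 1) under logistic regression, second form of Equation 5.1."""
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative coefficients: intercept plus weights on wind speed and precipitation.
beta0, beta = -3.0, [0.15, 2.0]
x = [20.0, 0.5]  # 20 mph wind, 0.5 in precipitation
z = beta0 + sum(b * xi for b, xi in zip(beta, x))
first_form = math.exp(z) / (1.0 + math.exp(z))
assert abs(first_form - p_outage(beta0, beta, x)) < 1e-12
```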
5.3.3 Boosting
Boosting is one of the most recent and important developments in the
domain of predictive classification techniques. It works on the principle of
sequentially applying a classification algorithm to reweighted versions of the
training data, and then taking a weighted majority vote of the sequence of
classifiers produced as a result of this sequential process. For the two-class
problem, boosting can be viewed as an approximation to additive modeling
on the logistic scale using maximum Bernoulli likelihood as a criterion. This
simple strategy yields drastic improvements in results for many classification algorithms, owing to the statistical principles of additive modeling and maximum likelihood estimation.
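The reweight-and-vote loop can be sketched with one-dimensional threshold stumps (Python; the data set and round count are toy choices of ours, not from the chapter):

```python
import math

def adaboost_stumps(xs, ys, rounds):
    """AdaBoost with threshold stumps on 1-D data: fit, reweight, weighted vote."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # (alpha, threshold, polarity)
    thresholds = [x - 0.5 for x in xs] + [max(xs) + 0.5]
    for _ in range(rounds):
        # Choose the stump minimizing the current weighted error.
        t, p = min(((t, p) for t in thresholds for p in (1, -1)),
                   key=lambda tp: sum(wi for wi, x, y in zip(w, xs, ys)
                                      if (tp[1] if x > tp[0] else -tp[1]) != y))
        preds = [p if x > t else -p for x in xs]
        err = sum(wi for wi, pr, y in zip(w, preds, ys) if pr != y)
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        ensemble.append((alpha, t, p))
        # Reweight: misclassified points gain weight for the next round.
        w = [wi * math.exp(-alpha * y * pr) for wi, y, pr in zip(w, ys, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (p if x > t else -p) for a, t, p in ensemble)
    return 1 if score > 0 else -1

xs, ys = [0, 1, 2, 3], [-1, 1, 1, -1]  # not separable by any single stump
model = adaboost_stumps(xs, ys, rounds=3)
assert all(predict(model, x) == y for x, y in zip(xs, ys))
```

No single stump classifies this alternating-interval pattern, yet the weighted vote of three sequentially reweighted stumps does, illustrating the boosting principle described above.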
TABLE 5.2
Confusion Matrix
Actual Outcome
Predicted outcome 0 (no outage) 1 (outage)
0 (no outage) TN FN
1 (outage) FP TP
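From the four cells of Table 5.2, the evaluation criteria used later in the chapter follow from the standard definitions. A sketch (Python; the cell counts are made-up illustrations):

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard criteria computed from the confusion matrix of Table 5.2."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # all correct predictions
        "sensitivity": tp / (tp + fn),   # outages correctly predicted
        "specificity": tn / (tn + fp),   # non-outages correctly predicted
        "precision": tp / (tp + fp),     # predicted outages that occurred
    }

m = classification_metrics(tp=40, tn=30, fp=20, fn=10)
assert m["accuracy"] == 0.70 and m["sensitivity"] == 0.80
```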
• Rating method
• Borda count
• L2 metric method
L2(k) = [ Σ_{j=1}^{n} (x_jk − y_j)² ]^(1/2),  (5.3)

where
k denotes the prediction model, j denotes the criterion, and n is the total number of criteria;
x_jk is the value of criterion j for prediction model k; and
y_j is the ideal value of criterion j.
Therefore, each prediction model will have a score, and the model with the
least L2 score is ranked first, followed by the next smallest L2 score, etc.
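Applied to the scaled criteria values reported later in Table 5.5 (after scaling, the ideal value of every criterion is 1), the L2 ranking can be computed as follows (Python sketch; the dictionary is transcribed from Table 5.5):

```python
scaled = {  # Table 5.5: accuracy, AUC, sensitivity, specificity, precision
    "decision tree":              (0.979, 0.918, 0.984, 0.676, 0.963),
    "random forest":              (1.000, 0.962, 0.955, 1.000, 1.000),
    "boosting":                   (0.997, 1.000, 0.966, 0.921, 0.983),
    "support vector machines":    (0.990, 0.934, 1.000, 0.652, 0.963),
    "logistic regression":        (0.958, 0.986, 0.970, 0.699, 0.927),
    "artificial neural networks": (0.991, 0.988, 0.983, 0.764, 0.971),
}

def l2_distance(values, ideal=1.0):
    """Distance of a model's scaled criteria from the ideal point (Equation 5.3)."""
    return sum((ideal - v) ** 2 for v in values) ** 0.5

ranking = sorted(scaled, key=lambda m: l2_distance(scaled[m]))
print(ranking[:2])  # ['random forest', 'boosting']
```

Sorting by ascending distance reproduces the L2-metric ranks of Table 5.7: random forest, boosting, ANN, logistic regression, decision tree, support vector machines.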
TABLE 5.3
Weather Data Variables Description
Variable Name Variable Description Variable Type
Temperature Temperature of the location (°F) Numeric
Heat index Index combining effect of temperature and humidity (°F) Numeric
Dew point Dew point of the location (°F) Numeric
Humidity Humidity of the location (%) Numeric
Pressure Atmospheric pressure (in) Numeric
Visibility Visibility of the location (miles) Numeric
Wind speed Speed of the wind flowing at the location (mph) Numeric
Gust speed Gust speed of the wind at the location (mph) Numeric
Precipitation Amount of precipitation at the location (in) Numeric
Event Special weather condition at the location Binary
event types is 1. The final data set has over 8700 instances with 697 instances
experiencing a weather-related event. A sample of the final data set is shown
in Figure 5.3.
5/2/2013 7 73.90 0.00 69.10 0.85 29.89 10.00 11.50 0.00 0.04 1 1 0 0 0
5/2/2013 8 75.55 0.00 71.35 0.87 29.91 10.00 11.55 0.00 0.00 0 0 0 0 0
5/2/2013 9 73.75 0.00 71.03 0.91 29.91 10.00 10.95 5.48 0.00 1 1 1 0 0
5/2/2013 10 75.83 0.00 71.48 0.87 29.91 10.00 9.20 0.00 0.00 1 1 0 0 0
5/2/2013 11 74.43 0.00 71.03 0.89 29.91 6.50 15.28 11.50 0.00 1 1 0 0 0
5/2/2013 12 74.43 0.00 70.13 0.92 29.91 0.85 9.80 0.00 0.26 1 1 1 0 0
5/2/2013 13 71.68 0.00 69.84 0.94 29.90 2.60 8.30 0.00 0.82 1 1 1 1 0
5/2/2013 14 71.43 0.00 69.87 0.95 29.91 5.00 3.87 0.00 0.03 1 1 1 0 0
5/2/2013 15 69.57 0.00 66.43 0.90 29.90 3.33 8.47 6.13 0.16 1 1 1 0 0
FIGURE 5.3
Sample data table for historical weather.
Temp Heat Index Dew Point Humidity Pressure Visibility Wind Speed Gust Speed Precipitation Event Rain Thunderstorm Fog Tornado Binary MI
73.9 0 61 0.64 30.18 10 8.1 0 0 0 0 0 0 0 0
73.2 0 63.3 0.71 30.2 8.5 8.1 0 0 1 1 0 0 0 1
70 0 66 0.87 30.21 3 6.9 0 0.02 1 1 0 0 0 1
68 0 64.9 0.9 30.18 6 4.6 0 0.05 1 1 0 0 0 0
69.1 0 66 0.9 30.15 5 5.8 0 0.02 1 1 0 0 0 0
69.27 0 65.53 0.88 30.13 4.33 5.8 0 0.06 1 1 0 0 0 1
71.1 0 66 0.84 30.12 7 0 0 0 1 1 0 0 0 0
70 0 66.9 0.9 30.11 7 0 0 0.01 0 0 0 0 0 1
71.35 0 66.1 0.84 30.11 10 3.5 0 0 0 0 0 0 0 1
69.45 0 66.1 0.89 30.12 6.5 6.35 0 0 1 1 0 0 0 0
FIGURE 5.4
Sample data table for prediction model.
TABLE 5.4
Criteria Values for Prediction Models
Prediction Model Accuracy AUC Sensitivity Specificity Precision
Decision tree 0.698 0.661 0.923 0.211 0.718
Random forest 0.713 0.693 0.896 0.313 0.746
Boosting 0.710 0.720 0.906 0.288 0.734
Support vector machines 0.706 0.672 0.938 0.204 0.718
Logistic regression 0.683 0.710 0.910 0.219 0.691
Artificial neural networks 0.706 0.711 0.922 0.239 0.724
TABLE 5.5
Summary of Scaled Results
Prediction Model Accuracy AUC Sensitivity Specificity Precision
Decision tree 0.979 0.918 0.984 0.676 0.963
Random forest 1.000 0.962 0.955 1.000 1.000
Boosting 0.997 1.000 0.966 0.921 0.983
Support vector machines 0.990 0.934 1.000 0.652 0.963
Logistic regression 0.958 0.986 0.970 0.699 0.927
Artificial neural networks 0.991 0.988 0.983 0.764 0.971
TABLE 5.6
Criteria Weights Using Rating and Borda Count Method
Weights Using Weights Using Borda
Criteria Rating Method Count Method
Accuracy 0.17 0.35
AUC 0.22 0.32
Sensitivity 0.22 0.07
Specificity 0.15 0.09
Precision 0.24 0.17
The overall score and the corresponding rank for each prediction model
obtained using the three MCDM ranking methods are presented in
Table 5.7.
Note that the final score of each prediction model under the rating method and the Borda count method is calculated by multiplying each criterion weight by the corresponding scaled criterion value and summing over the criteria.
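For instance, the rating-method scores in Table 5.7 can be reproduced by combining the scaled values of Table 5.5 with the rating weights of Table 5.6 (Python sketch, shown for two of the models):

```python
weights = (0.17, 0.22, 0.22, 0.15, 0.24)  # rating-method weights, Table 5.6

scaled = {  # Table 5.5 rows: accuracy, AUC, sensitivity, specificity, precision
    "decision tree": (0.979, 0.918, 0.984, 0.676, 0.963),
    "random forest": (1.000, 0.962, 0.955, 1.000, 1.000),
}

def weighted_score(values):
    """Sum of criterion weight times scaled criterion value."""
    return sum(w * v for w, v in zip(weights, values))

# Matches the rating-method final scores reported in Table 5.7.
assert abs(weighted_score(scaled["decision tree"]) - 0.9174) < 5e-4
assert abs(weighted_score(scaled["random forest"]) - 0.9817) < 5e-4
```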
Based on the analysis of data presented in Table 5.7, it is evident that random forest, boosting, and ANNs are the best methods as they are consistently
TABLE 5.7
Results from MCDM Ranking Methods
                               Rating Method       Borda Count        L2 Metric
Prediction Model             Final Score  Rank   Final Score  Rank  Final Score  Rank
Decision tree 0.9174 6 0.9298 6 0.540 5
Random forest 0.9817 1 0.9847 2 0.8240 1
Boosting 0.9761 2 0.9866 1 0.7650 2
Support vector machines 0.9227 4 0.9378 5 0.537 6
Logistic regression 0.9205 5 0.9392 4 0.560 4
Artificial neural networks 0.9497 3 0.9657 3 0.637 3
ranked in the top three. Since random forest is ranked first under the rating
method and L2 metric method, it is regarded as the best classifier to predict
outages for the given data set.
Parameters
Rwf Total man-hours required by worker of type w to repair failure type f
Aw Total workers of type w available for repairs
Qwf 1 if failure type f requires worker of type w; 0 otherwise
M Large positive number
Decision variables
xwf Number of workers of type w hired to repair failure type f
Minimize z1 = Σ_{f∈F} Σ_{w∈W} x_wf.  (5.4)

Minimize z2 = Σ_{f∈F} Σ_{w∈W} (R_wf / x_wf).  (5.5)

Σ_{f∈F} x_wf ≤ A_w,  ∀ w ∈ W.  (5.6)
The model can be solved using solution techniques such as goal pro-
gramming and ε-constraint method to obtain a set of efficient solutions
that provide a trade-off between staffing costs and restoration time. The set
of efficient solutions can then be presented to a decision maker to obtain
the best compromise solution that meets the need of the electric utility
company.
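As a concrete illustration of how such a set of efficient solutions might be generated, the sketch below enumerates integer staffing levels for a deliberately tiny instance (one worker type, two failure types; all data are invented) and keeps the (z1, z2) pairs that are not dominated. This brute-force enumeration stands in for goal programming or the ε-constraint method, which would be needed on realistic instances:

```python
from itertools import product

# Hypothetical toy data: man-hours R[f] required per failure type; capacity A.
R = [12.0, 8.0]  # failure types f = 0, 1
A = 10           # workers of the single type available

candidates = []  # (z1 total workers, z2 restoration-time proxy, staffing x)
for x in product(range(1, A + 1), repeat=2):
    if sum(x) > A:  # workforce capacity, constraint (5.6)
        continue
    z1 = sum(x)                              # objective (5.4)
    z2 = sum(r / xi for r, xi in zip(R, x))  # objective (5.5)
    candidates.append((z1, z2, x))

# Keep only non-dominated (z1, z2) pairs: fewer workers vs faster restoration.
pareto = [s for s in candidates
          if not any((t[0] <= s[0] and t[1] < s[1]) or
                     (t[0] < s[0] and t[1] <= s[1]) for t in candidates)]
```

The surviving solutions trace the trade-off curve: each additional worker hired (higher z1) strictly reduces the restoration-time proxy (lower z2), and the decision maker picks the best compromise from this frontier.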
5.8 Conclusions
In the United States, power outages cause billions of dollars in losses. This
chapter aims at predicting the power outage occurrences accurately for
an electric utility company that serves over 9 million people in the United
States. Several machine-learning classifiers are used to predict the outages
using the hourly weather forecasts. The machine-learning classifiers are
later ranked based on different metrics, namely, accuracy, AUC, sensitiv-
ity, specificity, and precision using MCDM techniques.
Based on the analysis using multicriteria ranking methods, it was evi-
dent that random forest was the best method to predict the power outage
occurrences as it was ranked first by two out of the three MCDM ranking
techniques. In addition, an MCMP model was presented to determine the
appropriate staffing levels using the outputs of the prediction models. In
future work, we plan to develop prediction models to measure the intensity
of the outage, and solve the MCMP model to obtain the set of efficient solu-
tions that presents the trade-off between staffing costs and restoration time.
References
Balijepalli, N., Venkata, S. S., Richter Jr, C. W., Christie, R. D., and Longo, V. J. 2005.
Distribution system reliability assessment due to lightning storms. IEEE
Transactions on Power Delivery, 20(3), 2153–2159.
Boger, Z. and Guterman, H. 1997. Knowledge extraction from artificial neural network
models (Vol. 4, pp. 3030–3035). IEEE International Conference on Systems, Man,
and Cybernetics. Computational Cybernetics and Simulation, IEEE, Orlando, FL.
Braspenning, P. J., Thuijsman, F., and Weijters, A. J. M. M. 1995. Artificial Neural
Networks: An Introduction to ANN Theory and Practice (Vol. 931). Springer Science
& Business Media, Berlin, Germany.
Breiman, L. 2001. Random forests. Machine Learning, 45(1), 5–32.
Campbell, R. J. 2012. Weather-related power outages and electric system resiliency.
Congressional Research Service, Library of Congress, Washington, DC.
CONTENTS
6.1 Introduction................................................................................................. 136
6.1.1 Seasonality....................................................................................... 137
6.1.2 Stationarity and Nonstationarity................................................. 137
6.1.3 Overview of ARMA Models......................................................... 137
6.1.4 Brief Literature Review on ARIMA Models............................... 138
6.2 Proposed Multiobjective Deterministic Pseudo-Evolutionary
Algorithm..................................................................................................... 139
6.2.1 Step-by-Step Procedure of the Proposed MDPEA..................... 139
6.2.1.1 Phase 1: MAPE Being the Primary Objective.............. 140
6.2.1.2 Phase 2: MaxAPE Being the Primary Objective.......... 142
6.2.1.3 Phase 3: Generating the Combined Netfront with
Respect to the Training Period....................................... 144
6.2.1.4 Phase 4: Generating the Combined Netfront
with Respect to the Test Data Set (of Size I″
Time Periods) from the Models Which Form the
Netfront Corresponding to the Training Period......... 145
6.2.1.5 Phase 5: Stop..................................................................... 147
6.3 Computational Evaluation of the Proposed MDPEA............................ 148
6.3.1 Data Set............................................................................................. 148
6.3.2 Multiobjective Netfront for a Retail Segment Sales Data
(90:10 with Respect to the Split of Training Data Set:
Test Data Set)................................................................................... 148
6.4 Summary and Conclusions....................................................................... 152
Acknowledgments............................................................................................... 152
References.............................................................................................................. 152
6.1 Introduction
The autoregressive integrated moving average (ARIMA) method is a generalization of the autoregressive moving average (ARMA) models developed by Box and Jenkins (1970). The ARIMA method provides a parsimonious description of
the stationary data in terms of two polynomials, one for the autoregression
and the other for the moving average. In case of nonstationary data, an ini-
tial differencing step (corresponding to the integrated part of the model) is
applied to reduce nonstationarity. Nonseasonal ARIMA models are denoted
by ARIMA (p, d, q) where parameters p, d, and q are nonnegative integers,
p is the order of the autoregressive part, d is the degree of differencing,
and q is the order of the moving average part of ARIMA model. Seasonal
ARIMA models are denoted as ARIMA (p, d, q) (P, D, Q)m, where m refers to
the number of periods in each season, and P, D, Q refer to the autoregres-
sive, differencing, and the moving average terms for the seasonal part of the
ARIMA model.
In big data analytics, we encounter the forecasting problem that deals with
large data sets. In most business scenarios, the older data might be less use-
ful in building the forecast models and more weightage has to be given to
the immediate past; this is where ARIMA, seasonal ARIMA, and hybrid
ARIMA models come into play. Using batch-processing technologies such
as Apache’s Hadoop to handle the volume component of big data and using
stream-processing technologies such as Apache’s Spark to handle the veloc-
ity component of big data, we could convert the unstructured data into struc-
tured data. Once the data is structured, a forecasting model can be used to
arrive at forecasts, either with a single objective or with respect to multiple
objectives.
The usual industry requirement for time series forecasting has objectives
such as reducing the average error in predictions for a time period, as well
as restricting the maximum error that can pop up in any given period. A
typical example would be inventory management where the objective of the
time series model would be that the average inventory of products in any
quarter should be minimum and at any point of time, the inventory should
not go beyond a threshold based on the capacity of the warehouse. At times,
such objectives are conflicting, since a model which has the least mean absolute percentage error (MAPE) does not necessarily give the least maximum absolute percentage error (MaxAPE); this is true in most cases. This chapter
addresses the multiple criteria decision analysis involved in the time series
forecasting, by using the ARIMA model. The algorithm detailed in Section
6.2 is scalable to any number of time series, and it can cater to the present industrial requirement of forecasting sales at the stock-keeping-unit level (the number of which is very large in fast-moving consumer goods [FMCG] and other consumer goods sectors).
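The conflict between the two error measures is easy to exhibit numerically. In the sketch below (Python; the series are invented), forecast B has the lower MAPE while forecast A has the lower MaxAPE, so neither dominates the other:

```python
def mape(actual, forecast):
    """Mean absolute percentage error."""
    return 100 * sum(abs(f - a) / a for a, f in zip(actual, forecast)) / len(actual)

def maxape(actual, forecast):
    """Maximum absolute percentage error over the horizon."""
    return 100 * max(abs(f - a) / a for a, f in zip(actual, forecast))

actual     = [100, 100, 100, 100]
forecast_a = [98, 98, 98, 98]     # steady 2% error everywhere
forecast_b = [100, 100, 100, 94]  # exact except one 6% miss

assert mape(actual, forecast_b) < mape(actual, forecast_a)      # 1.5 < 2.0
assert maxape(actual, forecast_a) < maxape(actual, forecast_b)  # 2.0 < 6.0
```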
6.1.1 Seasonality
Seasonal components consist of effects that are reasonably stable with respect
to timing, direction, and magnitude. Seasonality in a time series can be iden-
tified by regularity in crests and troughs, which have consistent direction
and magnitude, relative to the trend. Commonly employed approaches to
modeling seasonal patterns include the Holt–Winters exponential smooth-
ing model and ARIMA model.
r_k = [ Σ_{i=1}^{N−k} (Y_i − Ȳ)(Y_{i+k} − Ȳ) ] / [ Σ_{i=1}^{N} (Y_i − Ȳ)² ].  (6.1)
The PACF gives the partial correlation of a time series with its own lagged
values, controlling for the values of the time series at all shorter lags.
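The sample autocorrelation of Equation 6.1 can be sketched directly (Python; the function name is ours):

```python
def acf(y, k):
    """Sample autocorrelation at lag k, per Equation 6.1."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[i] - mean) * (y[i + k] - mean) for i in range(n - k))
    den = sum((yi - mean) ** 2 for yi in y)
    return num / den

# A strictly alternating series is maximally negatively correlated at lag 1.
series = [1, -1] * 5
print(acf(series, 1))  # -0.9, i.e., -(N - 1)/N for N = 10
```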
y_t = c + α₁ y_{t−1} + ⋯ + α_p y_{t−p} + e_t;  or  y_t = c + Σ_{i=1}^{p} α_i y_{t−i} + e_t,  (6.2)

y_t = α₀ + α₁ y_{t−1} + ⋯ + α_p y_{t−p} − β₀ e_t − β₁ e_{t−1} − ⋯ − β_q e_{t−q}.  (6.4)
Seasonal ARIMA (that is, SARIMA(p, d, q)(P, D, Q)m) model can be
expressed as below, where p, d, q, P, D, and Q are nonnegative integers and
m is periodicity:
exponential smoothing model. Empirical results from the study showed that
the SARIMA model produced more accurate short-term forecasts.
In this chapter, we attempt to improve on the basic ARIMA model with
the consideration of multiple objectives such as the minimization of MAPE
and MaxAPE. We develop offspring time series from the best parent ARIMA
models by considering the fitness values of parent ARIMA models. These
fitness values are computed using the appropriate primary objective func-
tion, namely, MAPE/MaxAPE; and considering the parent ARIMA models,
coupled with their relative fitness values, we deterministically generate the
offspring time series.
p ∈ {0, 1, 2, 3, 4, 5},
q ∈ {0, 1, 2, 3, 4, 5},
d ∈ {0, 1},
(P, D, Q) ∈ {(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1)}, and
m = 7.
Step 2: Run the SARIMA model for every combination of (p, d, q)(P, D, Q)m, and choose the best N parent models with respect to MAPE and arrange them in the nondecreasing order of their respective MAPE.
Denote the following:
• Actual data point of the time series as y_i, where i = 1, 2, 3, …, I′, corresponding to the training set.
• Predicted value of the nth parent model, with respect to data point i (i.e., time period i) corresponding to the training set, as y1_{n,i}^{p,I′} for n = 1, 2, 3, …, N, and i = 1, 2, 3, …, I′.
• The MAPE with respect to the nth parent time series considering the training set is given as follows:

E1_n^{p,I′} = [ Σ_{i=1}^{I′} ( |y1_{n,i}^{p,I′} − y_i| / y_i ) ] × 100 × (1/I′).  (6.6)
• The MaxAPE with respect to the nth parent time series (chosen with the primary objective of minimizing the MAPE) considering the training set is given as follows:

ξ1_n^{p,I′} = max_i ( |y1_{n,i}^{p,I′} − y_i| / y_i × 100 ).  (6.7)
Step 3: Do the following to generate offspring time series with the primary
objective of minimizing the MAPE, by considering ( n′ − 1) parent time series
at a time to generate a corresponding offspring time series, and by consider-
ing the relative fitness of these chosen parents:
Step 3.3: Calculate the relative fitness of the nth parent time series, where 1 ≤ n ≤ n′ − 1:

f1′_{n,n′} = f1_{n,n′} / Σ_{n″=1}^{n′−1} f1_{n″,n′}.  (6.9)
Step 3.4: Generate the predicted values with respect to the offspring, that is, the offspring time series n″ with respect to the training set, where n″ = n′ − 2:

y1_{n″,i}^{o,I′} = Σ_{n=1}^{n′−1} ( y1_{n,i}^{p,I′} × f1′_{n,n′} ),  for i = 1, 2, 3, …, I′.  (6.10)
Step 3.5: Set n′ = n′ + 1 and repeat step 3.2 through 3.5 up to n′ = N.
Step 4: Calculate MAPE and MaxAPE values for the n″th offspring time series with respect to the training set:

E1_{n″}^{o,I′} = [ Σ_{i=1}^{I′} ( |y1_{n″,i}^{o,I′} − y_i| / y_i ) ] × 100 × (1/I′),  (6.11)

ξ1_{n″}^{o,I′} = max_i ( |y1_{n″,i}^{o,I′} − y_i| / y_i × 100 )  for n″ = 1, 2, 3, …, (N − 2).  (6.12)
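Steps 3.3 and 3.4 amount to a fitness-weighted average of the parent forecasts. A minimal sketch (Python; the parent predictions and MAPE values are invented, and taking fitness as the reciprocal of MAPE is our assumption, since the fitness definition falls in steps omitted from this excerpt):

```python
def offspring_forecast(parent_preds, mapes):
    """Combine parent forecasts using normalized reciprocal-MAPE fitness,
    in the spirit of Equations 6.9 and 6.10."""
    fitness = [1.0 / e for e in mapes]          # reciprocal MAPE (assumption)
    total = sum(fitness)
    weights = [f / total for f in fitness]      # relative fitness, Eq. 6.9
    horizon = len(parent_preds[0])
    return [sum(w * preds[i] for w, preds in zip(weights, parent_preds))
            for i in range(horizon)]            # weighted combination, Eq. 6.10

parents = [[100.0, 110.0, 120.0], [104.0, 114.0, 126.0]]
child = offspring_forecast(parents, mapes=[2.0, 4.0])  # weights 2/3 and 1/3
# The offspring always lies pointwise between its parents.
assert all(min(a, b) <= c <= max(a, b) for a, b, c in zip(*parents, child))
```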
p ∈ {0, 1, 2, 3, 4, 5},
q ∈ {0, 1, 2, 3, 4, 5},
d ∈ {0, 1},
(P , D, Q) ∈ {(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1)}, and
m = 7.
Step 2: Run SARIMA models for every combination of (p, d, q)(P, D, Q)m and choose the best N parent models with respect to MaxAPE and arrange them in nondecreasing order of their respective MaxAPE.
Denote the following:
• The MAPE with respect to the nth parent time series considering the
training set:

E2_n^{p,I′} = (1/I′) ∑_{i=1}^{I′} ( |y2_{n,i}^{p,I′} − y_i| / y_i ) × 100.    (6.13)
• The MaxAPE with respect to the nth parent time series considering the
training period is:

ξ2_n^{p,I′} = max_i ( |y2_{n,i}^{p,I′} − y_i| / y_i × 100 ).    (6.14)
Step 3: Do the following to generate offspring time series with the primary
objective of minimizing the MaxAPE, by considering (n′ − 1) parent time
series:
Step 3.3: Calculate the relative fitness of the nth parent model, 1 ≤ n ≤ n′ − 1,
as follows, given n′:

f2′_{n,n′} = f2_{n,n′} / ∑_{n″=1}^{n′−1} f2_{n″,n′}.    (6.16)
Step 3.4: Generate the predicted values with respect to the offspring, that
is, the offspring time series n″ with respect to the training set, where
n″ = n′ − 2:

y2_{n″,i}^{o,I′} = ∑_{n=1}^{n′−1} ( y2_{n,i}^{p,I′} × f2′_{n,n′} ), for i = 1, 2, 3, …, I′.    (6.17)
Step 3.5: Set n′ = n′ + 1 and repeat steps 3.2 through 3.5 until n′ = N.
Step 4: Calculate MAPE and MaxAPE values for the n″th offspring time series
with respect to the training set:

E2_{n″}^{o,I′} = (1/I′) ∑_{i=1}^{I′} ( |y2_{n″,i}^{o,I′} − y_i| / y_i ) × 100,    (6.18)

ξ2_{n″}^{o,I′} = max_i ( |y2_{n″,i}^{o,I′} − y_i| / y_i × 100 ) for n″ = 1, 2, 3, …, (N − 2).    (6.19)
• The performance of the parent time series models in the test data
set (with MAPE as primary objective), which have earlier entered
the netfront corresponding to the training data set, is denoted by
(E1_n^{p,I″}, ξ1_n^{p,I″}), n ∈ Ψ1p. Note that using the parent time series
models and their corresponding (p, d, q)(P, D, Q)m values obtained
from the training data set, we now generate the parent time series
(forecasts) for the test data set, and their performance is denoted as
above. The generated parent time series are checked for possible entry
into the nondominated netfront corresponding to the test period. The
nondominated parents which enter the netfront corresponding to the
test period are denoted by their respective performance (E1_n^{p,I″}, ξ1_n^{p,I″}),
n ∈ Ω1p.
• The performance of the parent time series models in the test data
set (with MaxAPE as primary objective), which have earlier entered
the netfront corresponding to the training data set, is denoted by
(E2_n^{p,I″}, ξ2_n^{p,I″}), n ∈ Ψ2p. Note that using the parent time series
models and their corresponding (p, d, q)(P, D, Q)m values obtained
from the training data set, we generate the parent time series (forecasts)
for the test data set. The generated parent time series are checked
for possible entry into the nondominated netfront corresponding
to the test period. The nondominated parents which enter the
netfront corresponding to the test period are denoted by (E2_n^{p,I″}, ξ2_n^{p,I″}),
n ∈ Ω2p.
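Entry into a netfront is a standard nondominated check over the two objectives. A minimal Pareto filter, with both objectives (MAPE, MaxAPE) minimized and hypothetical points:

```python
def nondominated(points):
    """Return the points not dominated in (MAPE, MaxAPE), both minimized.

    A distinct point q dominates p if q is no worse than p in both
    objectives (equal-valued duplicates are also filtered out).
    """
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return front

# Hypothetical (MAPE %, MaxAPE %) pairs for candidate time series.
series = [(3.0, 20.0), (3.2, 19.0), (3.5, 21.0), (2.9, 22.0)]
print(nondominated(series))  # (3.5, 21.0) is dominated by (3.2, 19.0)
```

The same filter is applied twice in the procedure: once on the training period to admit series into the netfront, and again on the test period to see which of those survive.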
The following steps give the procedure to obtain the offspring time series
with respect to the test data set.
Step 1: MAPE as the primary objective:
Considering the offspring time series in Ψ1o one by one, and using the
corresponding set of parent time series models and their fitness values
that have led to the generation of this offspring time series, we do the
following.
Step 1.1: Generate the forecasted time series for the test data set by
considering every parent time series model in the set of parent time
series models that have led to the generation of this offspring time
series; denote such a forecasted time series given by every such
parent model as y1_{n,i}^{p,I″} with respect to the time period i in the test
data set.
Step 1.2: Using the above and the parents’ corresponding fitness val-
ues (that have been obtained in the training period) with respect to
this offspring in Ψ1o and this offspring’s corresponding n″ and n′, we
obtain the corresponding forecasted offspring time series for the test
data set as follows:
y1_{n″,i}^{o,I″} = ∑_{n=1}^{n′−1} ( y1_{n,i}^{p,I″} × f1′_{n,n′} ), for i = 1, 2, 3, …, I″.    (6.20)

Note that f1′_{n,n′} is inherited from the training data set (Equation 6.9).
Step 1.3: Calculate:

E1_{n″}^{o,I″} = (1/I″) ∑_{i=1}^{I″} ( |y1_{n″,i}^{o,I″} − y_i| / y_i ) × 100;    (6.21)

ξ1_{n″}^{o,I″} = max_i ( |y1_{n″,i}^{o,I″} − y_i| / y_i × 100 ).    (6.22)
By repeating the above for every offspring in Ψ1o, we get the corresponding
set of offspring time series with respect to the test period. This set of
offspring time series with respect to the test period is checked for possible
entry into the set of nondominated solutions with respect to the test period.
Let the set of nondominated offspring time series in the test period be
denoted by Ω1o. Let the performance of the nondominated offspring which
enter the netfront corresponding to the test period be denoted by
(E1_{n″}^{o,I″}, ξ1_{n″}^{o,I″}), n″ ∈ Ω1o.
Step 2: MaxAPE as the primary objective:
Considering the offspring time series in Ψ2o one by one, and using the
corresponding set of parent time series models and their fitness values
that have led to the generation of this offspring time series, we do the
following.
Step 2.1: Generate the forecasted time series for the test data set by
considering every parent time series model in the set of parent time series
models that have led to the generation of this offspring time series;
denote such a forecasted time series given by every such parent model
as y2_{n,i}^{p,I″} with respect to the time period i in the test data set.
Step 2.2: Using the above and the parents’ fitness values (that have been
obtained in the training period) with respect to this offspring from
Ψ2o and this offspring’s corresponding n″ and n′, we obtain the cor-
responding forecasted offspring time series for the test data set as
follows:
y2_{n″,i}^{o,I″} = ∑_{n=1}^{n′−1} ( y2_{n,i}^{p,I″} × f2′_{n,n′} ), for i = 1, 2, 3, …, I″.    (6.23)

Note that f2′_{n,n′} is inherited from the training data set (Equation 6.16).
Step 2.3: Calculate:

E2_{n″}^{o,I″} = (1/I″) ∑_{i=1}^{I″} ( |y2_{n″,i}^{o,I″} − y_i| / y_i ) × 100;    (6.24)

ξ2_{n″}^{o,I″} = max_i ( |y2_{n″,i}^{o,I″} − y_i| / y_i × 100 ).    (6.25)
By repeating the above for every offspring in Ψ2o, we get the corresponding
set of offspring time series with respect to the test period. This set of
offspring time series with respect to the test period is checked for possible
entry into the set of nondominated solutions with respect to the test period.
Let the set of nondominated offspring time series in the test period be
denoted by Ω2o. Let the performance of the nondominated offspring which
enter the netfront corresponding to the test period be denoted by
(E2_{n″}^{o,I″}, ξ2_{n″}^{o,I″}), n″ ∈ Ω2o.
[Line chart: daily sales from 0.0K to 14.0K units over days 1–364; x axis: 1 cm = 1 day, y axis: 1 cm = 2000 units.]
FIGURE 6.1
Real-life sales data of a retail segment; time period: 1 year at a day-level granularity.
TABLE 6.1
Labels for Parent and Offspring Time Series
Label   Top 20 Parents with MAPE as Primary Objective   Label   Top 20 Parents with MaxAPE as Primary Objective
P-1 ARIMA(5, 0, 3)(1, 1, 1) P-21 ARIMA(0, 0, 0)(0, 1, 1)
P-2 ARIMA(4, 1, 5)(0, 1, 1) P-22 ARIMA(0, 0, 0)(1, 1, 1)
P-3 ARIMA(5, 1, 4)(0, 1, 1) P-23 ARIMA(0, 0, 4)(0, 1, 1)
P-4 ARIMA(5, 0, 2)(0, 1, 1) P-24 ARIMA(0, 0, 4)(1, 1, 1)
P-5 ARIMA(5, 0, 5)(1, 1, 1) P-25 ARIMA(1, 0, 4)(0, 1, 1)
P-6 ARIMA(5, 1, 5)(1, 1, 1) P-26 ARIMA(2, 0, 3)(1, 1, 1)
P-7 ARIMA(4, 0, 4)(1, 1, 1) P-27 ARIMA(2, 0, 4)(0, 1, 1)
P-8 ARIMA(5, 0, 2)(1, 1, 1) P-28 ARIMA(2, 0, 4)(1, 1, 1)
P-9 ARIMA(4, 0, 5)(0, 1, 1) P-29 ARIMA(2, 0, 5)(1, 1, 1)
P-10 ARIMA(4, 0, 3)(0, 1, 1) P-30 ARIMA(3, 0, 4)(0, 1, 1)
P-11 ARIMA(5, 0, 5)(0, 1, 1) P-31 ARIMA(4, 0, 2)(0, 1, 1)
P-12 ARIMA(5, 1, 3)(0, 1, 1) P-32 ARIMA(4, 0, 2)(1, 1, 1)
P-13 ARIMA(5, 1, 5)(0, 1, 1) P-33 ARIMA(4, 0, 3)(1, 1, 1)
P-14 ARIMA(5, 1, 4)(1, 1, 1) P-34 ARIMA(4, 1, 5)(1, 1, 0)
P-15 ARIMA(4, 0, 4)(0, 1, 1) P-35 ARIMA(4, 1, 5)(1, 1, 1)
P-16 ARIMA(5, 1, 3)(1, 1, 1) P-36 ARIMA(5, 0, 4)(0, 1, 1)
P-17 ARIMA(3, 1, 4)(1, 1, 1) P-37 ARIMA(5, 0, 4)(1, 1, 0)
P-18 ARIMA(5, 1, 1)(1, 1, 1) P-38 ARIMA(5, 0, 4)(1, 1, 1)
P-19 ARIMA(5, 1, 2)(1, 1, 1) P-39 ARIMA(5, 0, 5)(1, 1, 0)
P-20 ARIMA(4, 1, 3)(1, 1, 1) P-40 ARIMA(5, 0, 5)(1, 1, 1)
Label Parent description (MAPE as primary objective)
P-1 to P-20 Top 20 parent SARIMA time series with respect to MAPE as primary
objective
Label Parent description (MaxAPE as primary objective)
P-21 to P-40 Top 20 parent SARIMA models with respect to MaxAPE as primary
objective
Label Offspring time series description (MAPE as primary objective)
O-41 Offspring obtained from top 3 parents with MAPE as primary objective
O-42 Offspring obtained from top 4 parents with MAPE as primary objective
… …
O-70 Offspring obtained from top 32 parents with MAPE as primary objective
Label Offspring time series description (MaxAPE as primary objective)
O-71 Offspring obtained from top 3 parents with MaxAPE as primary objective
O-72 Offspring obtained from top 4 parents with MaxAPE as primary objective
… …
O-100 Offspring obtained from top 32 parents with MaxAPE as primary objective
150 Big Data Analytics Using Multiple Criteria Decision-Making Models
Scenario 1:
In this scenario, we see the formation of nondominated solutions when we
consider only 20 offspring with respect to MAPE and MaxAPE. We compare
the O-41 to O-60 and O-71 to O-90 offspring time series with the P-1 to P-40
parent time series over the training period and create a netfront of
nondominated time series (Figure 6.2). The parent and offspring time series
that enter this netfront are evaluated in the test period, and the final netfront
is created as shown in the netfront test period chart (Figure 6.3). We refrain
from labeling all the data points to avoid congestion.
As we can see from Figures 6.2 and 6.3, both parent and offspring time
series have entered the netfront of nondominated solutions in the training
set; however, of those time series that have entered the netfront in the training
set, only one offspring time series has entered the netfront of nondominated
[Scatter plot: MAPE (2.95%–3.45%) versus MaxAPE (18.60%–20.60%); labeled nondominated points include parents P-1, P-4, P-33 and offspring O-42, O-45, O-48, O-49, O-50, O-71, O-72, O-76, O-77, O-89, O-90.]
FIGURE 6.2
Netfront corresponding to the training period with consideration of (O-41 to O-60) and (O-71
to O-90) offspring time series, (P-1 to P-20) and (P-21 to P-40) parent time series.
[Scatter plot: MAPE (2.95%–5.95%) versus MaxAPE (18.60%–30.60%); labeled point: O-72.]
FIGURE 6.3
Netfront corresponding to the test period.
[Scatter plot: MAPE (2.80%–3.60%) versus MaxAPE (18.60%–20.60%); labeled nondominated points include parents P-1, P-4, P-33 and offspring O-42, O-45, O-48, O-49, O-50, O-68, O-69, O-70, O-71, O-72, O-76, O-77, O-99, O-100.]
FIGURE 6.4
Netfront corresponding to the training period with consideration of (O-41 to O-70) and (O-71 to
O-100) offspring time series, and (P-1 to P-20) and (P-21 to P-40) parent time series.
solutions in the test set, when dealing with multiple objectives, that is, MAPE
and MaxAPE.
Scenario 2:
In this scenario, we see the formation of nondominated solutions when
we consider a set of 30 offspring with respect to MAPE and MaxAPE. We
compare the O-41 to O-70 and O-71 to O-100 offspring time series with the
P-1 to P-40 parent time series over the training period and create a netfront
of nondominated time series as shown in the netfront training period chart
(Figure 6.4). The parent and offspring time series that enter this netfront
are evaluated in the test period, and the final netfront is created as shown
in the netfront test period chart (Figure 6.5).
[Scatter plot: MAPE (2.80%–6.00%) versus MaxAPE (18.60%–30.60%); labeled point: O-72.]
FIGURE 6.5
Netfront corresponding to test period for offspring.
As we can see from Figures 6.4 and 6.5, both parent and offspring time
series have entered the netfront of nondominated solutions in the training
set; however, of those time series that have entered the netfront in the
training set, only one offspring time series has entered the netfront of
nondominated solutions in the test set, when dealing with multiple
objectives, that is, MAPE and MaxAPE.
1. Accuracy: It is evident from the computational experiments that the
MDPEA is able to produce offspring time series that perform better
than the parent time series with respect to the multiple objectives of
minimizing the MAPE and MaxAPE.
2. Speed: The MDPEA gets the best of both worlds; that is, it exploits the
simplicity of a deterministic model and uses the inheritance feature
of evolutionary algorithms.
3. Compatibility: The offspring forecasts from this algorithm can be
used as inputs to an ANN to improve performance.
Acknowledgments
The authors are grateful to the editors and reviewers for their suggestions
and comments to improve the earlier version of the chapter.
References
Box, G.E.P. and Jenkins, G.M. 1970. Time Series Analysis Forecasting and Control.
Holden-Day, San Francisco, CA.
CONTENTS
7.1 Introduction................................................................................................. 155
7.2 Energy Models for Microgrids.................................................................. 158
7.2.1 Energy Operations Model............................................................. 159
7.2.1.1 Microgrid Operations Optimization Example............ 161
7.2.2 Microgrids Design Optimization................................................. 164
7.2.2.1 Example for Microgrids Design..................................... 165
7.2.3 Multiperiod Energy Model............................................................ 166
7.2.3.1 Multiperiod Energy Planning Optimization
Example............................................................................. 167
7.2.4 Bicriteria Optimization of Microgrids......................................... 169
7.2.4.1 Bicriteria Microgrid Operations Example.................... 169
7.3 Conclusion................................................................................................... 171
References.............................................................................................................. 171
7.1 Introduction
A microgrid is defined as a set of local energy generators, energy
transmitters, energy storages, and users that can work either independently
of the main grid (islanded mode) or in connection with the main grid. In
practice, main grids can function as a backup system for microgrids. The
microgrid concept was originally presented as a solution for creating a
sustainable, green, and efficient energy model; see Zamora and Srivastava
(2010).
Renewable energy resources have seen an increasing and accelerating
penetration rate in microgrids since the turn of the century. However, the
intermittent nature of renewables such as solar and wind has always been a
challenge for their integration in microgrids. For example, electricity
generated from solar panels is affected by factors such as weather conditions
and time of day. Fortunately, energy storages can decrease this impact by
adding more flexibility to the system and by alleviating the imbalance
between energy supply and demand. There has been promising progress
toward building and deploying energy storages at microgrid scale and at
reasonable cost.
Microgrids are becoming more and more intelligent and connected to each
other. Generators, storages, and users are equipped with sensors that can
measure and monitor parameters of the system at every spot and every
second. Entities are also connected to each other through the Internet or an
intranet, over which they can communicate with each other. A controller
(such as the heating, ventilation, and air conditioning [HVAC] system in
buildings) can then receive these data and adjust the parameters of the
system (e.g., energy usage) in real time. These sets of sensors generate huge
amounts of data that can grow exponentially within a short span of time.
Energy big data and renewable resources amplify the urgency for designing
more efficient and flexible microgrids.
Energy big data contains a variety of data sets, including but not limited
to weather data, energy market data, geographical data (e.g., global
positioning system [GPS] information), and field measurement data (e.g.,
device status, electricity consumption, storage level, etc.). These sets of
data are blended together and then classified in order to generate useful
information and insights about the whole system. For example, GPS
data help to visualize the locational marginal price in a geographic context.
Common sources of big data in microgrids are smart appliances and
metering points throughout the grid. Metering points can generate a new
set of data as frequently as every 5–15 minutes. These points generate a
large volume of data, which makes communication and data storage
complex, inefficient, and costly; see Diamantoulakis et al. (2015). Data are
also being generated by generators, transformers, and local distributors. In
addition, information from grid monitoring and maintenance is generated
on a regular basis. The volume of data collected from a grid can easily reach
terabytes over a year. As a result, data-mining and machine-learning tools
are required to refine the data, discover meaningful patterns in it, and
generate useful information from it. Information and insights are then
applied to analyze energy price fluctuations in real time, to analyze energy
generator status, and to plan and monitor the energy system's status.
According to Zhou et al. (2016), the seven steps for managing big energy
data are: data collection, data cleaning, data integration, data mining,
visualization, intelligent decision making, and smart energy management.
Given the large volume of generated data and its dynamic nature, data
processing can be a daunting and challenging task.
Data-mining methods such as regression, clustering, neural networks, and
support vector machines are important tools for extracting useful and
relevant information from a large pool of data. Regression models, time
series models, and state–space models are among the most popular short-
term forecasting methods; see Kyriakides and Polycarpou (2007). Also,
forecasting can predict the demand accurately, and optimization tools can achieve
A Class of Models for Microgrid Optimization 157
FIGURE 7.1
A simple microgrid.
[Diagram: six interconnected microgrids, Microgrid 1 through Microgrid 6.]
FIGURE 7.2
System of microgrids.
Alarcon-Rodriguez et al. (2010), Shabanpour and Seifi (2015), and Sharafi
and ELMekkawy (2014).
Net flow = All outgoing energy flows − All incoming energy flows
The sign of net flow for each entity signifies the nature of that entity.
Specifically,
TABLE 7.1
Notations and Symbols Used for Microgrids

Parameters:
  K_i: maximum capacity of generator i
  D_j: energy demand of user j
  g_i: cost per unit of energy for generator i
  q_{p,max}: maximum capacity of storage p
  q_{p,min}: minimum capacity of storage p
  Δs_p: projected gain (in dollars) of the energy price for the next period
  c_ij: energy transportation cost from entity i to entity j

Indices:
  i: origin entity
  j: destination entity
  p: storage

Decision variables:
  x_ij: amount of energy from entity i to entity j
  q_p: energy level of storage p
and applications of Model 1, see Malakooti et al. (2013), Malakooti (2014),
Sheikh and Malakooti (2011, 2012), Sheikh (2013), and Sheikh et al. (2014, 2015).

Model 1: Energy operation optimization

Minimize f1 = ∑_{all i} ∑_{all j} c_ij x_ij    (7.1)
            + ∑_{all i} g_i ∑_{all j} x_ij    (7.2)
            − ∑_{all p} Δs_p q_p    (7.3)

Subject to:

∑_{all i} x_ij ≥ D_j for all j (demand constraint for users)    (7.4)

∑_{all j} x_ij ≤ K_i for all i (capacity constraint for generators)    (7.5)

q_{p,0} + ∑_{all i} x_ip − ∑_{all j} x_pj = q_p for all p (energy level of storages)    (7.6)
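Before handing Model 1 to a solver (the chapter uses Lingo 13), it helps to be able to evaluate the objective terms (7.1)–(7.2) and check the constraints (7.4)–(7.5) for a candidate flow. A sketch with hypothetical data; the storage terms (7.3) and (7.6) are omitted for brevity, and a real solver would search over x rather than take it as given:

```python
def evaluate_model1(x, c, g, demand, capacity):
    """Evaluate Model 1 for a candidate flow x[i][j] (generator i -> user j).

    Returns (f1, feasible): transport cost (7.1) plus generation cost (7.2),
    and whether demand (7.4) and capacity (7.5) constraints hold.
    """
    transport = sum(c[i][j] * x[i][j]
                    for i in range(len(x)) for j in range(len(x[0])))
    generation = sum(g[i] * sum(x[i]) for i in range(len(x)))
    meets_demand = all(sum(x[i][j] for i in range(len(x))) >= demand[j]
                       for j in range(len(demand)))
    within_capacity = all(sum(x[i]) <= capacity[i] for i in range(len(x)))
    return transport + generation, meets_demand and within_capacity

# Hypothetical 2-generator, 2-user instance.
c = [[6.0, 5.0], [7.0, 4.0]]       # c_ij: transportation cost
g = [44.0, 43.0]                   # g_i: generation cost per unit
demand = [400.0, 450.0]            # D_j
capacity = [1000.0, 950.0]         # K_i
x = [[400.0, 0.0], [0.0, 450.0]]   # candidate flows
f1, ok = evaluate_model1(x, c, g, demand, capacity)
print(f1, ok)  # 41150.0 True
```

Since the model is a linear program, any LP solver can minimize f1 over all feasible x; the evaluator above is only the building block that a solver optimizes.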
[Diagram: Gen. 1 (K1 = 1000), Gen. 2 (K2 = 950), and Gen. 3 (K3 = 120) supply Savers 1 and 2 and Users 1–3 (D1 = 400, D2 = 450, D3 = 350).]
FIGURE 7.3
Illustration for systems operations example with Δs1 = +1 and Δs2 = +2.
TABLE 7.2
Data for Microgrid Operations

c_ij (Gen. i to User j):       User 1  User 2  User 3
  Gen. 1                         6       5       8
  Gen. 2                         7       4       5
  Gen. 3                         5       4       6

c_pj (Storage p to User j):    User 1  User 2  User 3
  Storage 1                      1       2       1
  Storage 2                      2       0.5     1.5

Demand D_j                      400     450     350
Δs2 = +2. The next period’s energy price is predicted to be higher for storages
1 and 2 (i.e., buy energy or Δsp > 0 for all storages).
The formulation for the microgrid operation is shown below. This
formulation is used to find the optimal solution that minimizes the total
cost for the energy operation example.
The optimal distribution of energy for the above microgrid is as follows;
Lingo 13 was used to find the optimal solution for this problem.
[Diagram: optimal energy flows of 400, 100, 450, and 90 units.]
FIGURE 7.4
Solution for systems operations example.
[Diagram: optimal energy flows of 400, 100, 100, and 130 units.]
FIGURE 7.5
Solution for systems operations example with Δs1 = −1 and Δs2 = −2.
If the future prices were predicted to be lower than the current prices,
Δs1 = −1 and Δs2 = −2, the solution as presented in Figure 7.5 would be: q1 = 50,
q2 = 20, and f1 = $47,040.
+ ∑_{i=1}^{I} c_i u_i.    (7.10)

∑_{all j} x_ij ≤ K_i u_i for i = 1, …, I (generator investment constraints),    (7.11)
where
For further explanation regarding Models 1 and 2, see Malakooti et al.
(2013, 2014).
Model 2: Microgrid design optimization
[Diagram: candidate generators with capacities 500, 900, and 950, two storages (Savers 1 and 2), and a user demand of 1300 units.]
FIGURE 7.6
Energy systems design.
TABLE 7.3
Information for Microgrids Design

c_ij (Gen. i to User j):       User 1  User 2
  Gen. 1                         4       4
  Gen. 2                         5       3
  Gen. 3                         5       4
  Gen. 4                         4       6

c_pj (Storage p to User j):    User 1  User 2
  Storage 1                      3.5     3
  Storage 2                      4       3.5

Demand D_j                      1300    1400
TABLE 7.4
Solution for Microgrids Design
[Diagram: optimal energy flows of 950, 1000, 400, 350, and 350 units.]
FIGURE 7.7
Solution for energy design example.
q_{p,t−1} + ∑_{all i} x_ipt − ∑_{all j} x_pjt = q_{p,t} for all p (energy level of storages),    (7.12)

where q_{p,t−1} and q_{p,t} represent the energy level in storage p at the end of
period t − 1 (or start of period t) and the end of period t, respectively. The
total gain from future savings is not used in this formulation since the future
prices of the given periods are known.
We introduce two approaches for solving the multiperiod energy model:
(1) the period-by-period approach, which is used when the energy prices and
demands of a period are known only at the beginning of that period; in this
case, the energy model is solved at the beginning of each period; and (2) the
aggregate multiperiod approach, which is used when the energy prices and
demands of a given number of future periods are known at the beginning of
the first period. If the data for all upcoming periods are known at the
beginning, the multiperiod energy operation can be solved in one step. In
this case, the energy stored in storages at the end of period t − 1 is considered
as the initial energy level for period t.
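Equation 7.12 chains the storage level from one period to the next. A sketch rolling the balance forward, with hypothetical night/day inflows and outflows:

```python
def storage_levels(q0, inflow, outflow):
    """Roll Equation 7.12 forward: q_t = q_{t-1} + inflow_t - outflow_t."""
    levels = []
    q = q0
    for f_in, f_out in zip(inflow, outflow):
        q = q + f_in - f_out
        levels.append(q)
    return levels

# Hypothetical plan: the night period charges the storage, the day period
# discharges it when prices are predicted to fall.
print(storage_levels(q0=0.0, inflow=[120.0, 0.0], outflow=[0.0, 100.0]))  # [120.0, 20.0]
```

This coupling is exactly why the aggregate approach can beat the period-by-period one: energy bought cheaply at night is available to offset expensive daytime generation.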
TABLE 7.5
Information for Night-Period Energy Operation
Predicted Predicted
Change Change
in Energy in Energy Gen- Gen- Gen-
Price for Price for erator 1 erator 2 erator 3
Demand Demand Demand Storage 1, Storage 2, Costs Costs Costs
Period User 1 User 2 User 3 Δs1,t Δs2,t (g1,t) (g 2,t) (g3,t)
1, Night 200 250 100 4 3 44.5 44 43.5
TABLE 7.6
Information for Day-Period Energy Operation
Predicted Predicted
Change Change
in Energy in Energy Gen- Gen- Gen-
Price for Price for erator 1 erator 2 erator 3
Demand Demand Demand Storage 1, Storage 2, Costs Costs Costs
Period User 1 User 2 User 3 Δs1,t Δs2,t (g1,t) (g 2,t) (g3,t)
2, Day 500 600 300 −4 −3 44.5 44 43.5
TABLE 7.7
Solution for Multiperiod Energy Problem

Problem                                                                      Solution
Problem I: Minimize f1 subject to constraints of energy operation model      f1,min, f2,max
Problem II: Minimize f2 subject to constraints of energy operation model     f1,max, f2,min
Varying f2 over its range will result in many alternatives with a wide range
of solutions. The decision maker will be able to select the best alternative by
specifying a value for f2.
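The procedure just described is the ε-constraint method: minimize f1 subject to f2 ≤ ε, sweeping ε over the range of f2. On a small discrete set of candidate solutions this reduces to filtering and taking the minimum; in the sketch below the two endpoint candidates use the single-objective values from Table 7.9, while the middle candidate is invented for illustration:

```python
def epsilon_constraint(solutions, epsilons):
    """For each bound eps on f2, minimize f1 over solutions with f2 <= eps.

    `solutions` is a list of (f1, f2) pairs; an infeasible bound yields None.
    """
    results = {}
    for eps in epsilons:
        feasible = [s for s in solutions if s[1] <= eps]
        results[eps] = min(feasible, key=lambda s: s[0]) if feasible else None
    return results

# (total cost f1, environmental impact f2) for candidate operating plans;
# the middle pair is hypothetical.
candidates = [(83886.0, 1651.6), (90000.0, 1000.0), (94806.0, 384.4)]
for eps, best in epsilon_constraint(candidates, [384.4, 1021.5, 1651.6]).items():
    print(eps, best)  # tighter eps forces higher cost, lower impact
```

In the actual model each ε value means re-solving the LP with the added constraint f2 ≤ ε, which traces out the risk-free portion of the cost-impact trade-off curve for the decision maker.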
TABLE 7.8
Information for Bicriteria Microgrid Example

c_ij, trans. cost from i to j:   User 1  User 2  User 3  User 4
  Gen. 1                           6       5       8       4
  Gen. 2                           7       4       5       7

c_pj, trans. cost from p to j:   User 1  User 2  User 3  User 4
  Storage 1                        1       2       1       2
  Storage 2                        2       0.5     1.5     1

Demand D_j                        600     800     400     300
presented for this example. Note that environmental impacts of storages are
not considered in this example (Table 7.8).
First, the range of the criteria must be found. Then, the criteria must be
calculated when minimizing only f1, the total cost. This model is as follows:
In the next step, an evenly distributed range of values for f2 is found. The
following five evenly distributed values of f2 are used: 384.4, 637.5, 1021.5,
1405.5, and 1651.6.
TABLE 7.9
Optimal Solution Considering Single Objectives

Problem                                                                      f1       f2
Problem I: Minimize f1 subject to constraints of energy operation model      83,886   1651.6
Problem II: Minimize f2 subject to constraints of energy operation model     94,806   384.4
TABLE 7.10
Solution for Bicriteria Microgrid Example
For example, for f2 = 1651.6, the following model is solved to find the opti-
mal solution for bicriteria energy operations example.
The solution to this bicriteria microgrids example is shown in Table 7.10.
The total cost (f1) and total environmental impact (f2) for this solution are
f1 = $83,886, and f2 = 1651.6, respectively. This solution shows that generator
1 generates a total of 316 units of energy while generator 2 generates 1800
units of energy.
7.3 Conclusion
Integrating renewable energy resources with microgrids requires the
development of flexible and efficient energy systems. In this chapter, the
energy operation optimization problem was used as a base for developing
other applicable models, such as the multiperiod microgrid operation model
and the bicriteria operation model. The linear structure of these models
enables us to solve medium- to large-scale problems in a matter of seconds
to minutes. All the presented models can be applied as a base for more
complex microgrid operation and design problems where parameter values
are subject to change every few minutes. The bicriteria energy model can
also be expanded to a multicriteria model by incorporating additional
objectives, such as the reliability of the energy system or the thermal comfort
of energy users; see Sheikh (2013) and Sheikh et al. (2014).
References
Alarcon-Rodriguez, A., Ault, G., and Galloway, S. 2010. Multi-objective planning of
distributed energy resources: A review of the state-of-the-art. Renewable and
Sustainable Energy Reviews, 14(5), 1353–1366.
CONTENTS
8.1 Introduction................................................................................................. 176
8.1.1 Impact of Banks on the Society..................................................... 176
8.1.2 Risks Associated with Lending.................................................... 176
8.1.3 Loan Approval Process.................................................................. 177
8.1.4 Problem Statement and Objective................................................ 177
8.2 Literature Review....................................................................................... 179
8.2.1 Predictive Analytics Approach for Loan Approvals................. 180
8.2.2 Motivation........................................................................................ 181
8.3 Methodology............................................................................................... 181
8.3.1 Predicting the Risk Associated with Each Applicant................ 182
8.3.1.1 Data Description.............................................................. 182
8.3.1.2 Data Cleansing................................................................. 183
8.3.2 Training and Testing Using Machine-Learning Algorithms......184
8.3.2.1 Logistic Regression.......................................................... 184
8.3.2.2 Artificial Neural Network.............................................. 184
8.3.2.3 Random Forests................................................................ 186
8.3.2.4 Stochastic Gradient-Boosted Decision Trees................ 188
8.3.2.5 Stacking............................................................................. 189
8.3.3 Evaluating a Learning Algorithm................................................ 189
8.3.4 MCMP Model.................................................................................. 191
8.3.4.1 Model Objectives.............................................................. 192
8.3.4.2 Model Constraints........................................................... 193
8.3.4.3 Solution Approach: Multiobjective Optimization
Using ε-Constraint Method............................................ 195
8.3.4.4 Demand Uncertainty in Loan Application Requests....196
8.4 Experimental Results................................................................................. 197
8.4.1 Input Parameters of Machine-Learning Algorithms................. 197
8.4.2 Analysis............................................................................................ 198
8.1 Introduction
In the United States, one in three Americans experiences financial problems
and may not be able to completely fulfill his/her needs using his/her savings
or income (Soergel, 2015). Financial institutions, such as banks, lend money
to qualified borrowers to help them meet their credit needs, and the
borrowers are expected to repay the loan in installments over a certain time
period. Irrespective of the size of the bank, loans contribute a major portion
of the bank's total assets. For large commercial banks, such as Wells Fargo,
JP Morgan, Citibank, and Bank of America, loans as a percentage of asset
size are nearly 45%. For medium-sized banks, such as Capital One and PNC,
loans contribute about half of their total assets (Perez, 2015). Therefore,
banks use a major portion of deposited funds to issue different types of
loans (e.g., credit card loans, mortgage loans, auto loans) and earn a profit
by collecting interest on the loaned amount. The profit earned from a loan
is the difference between the total amount of money collected from the
borrower throughout the loan repayment period and the amount lent to the
borrower.
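This profit definition can be made concrete with the standard annuity (level-installment) formula; the loan terms below are hypothetical:

```python
def monthly_payment(principal, annual_rate, months):
    """Level installment for a fully amortizing loan (standard annuity formula)."""
    r = annual_rate / 12.0
    return principal * r / (1.0 - (1.0 + r) ** -months)

def loan_profit(principal, annual_rate, months):
    """Bank's profit: total repaid over the term minus the amount lent."""
    return monthly_payment(principal, annual_rate, months) * months - principal

# Hypothetical terms: $10,000 at 6% annual interest over 60 months.
pay = monthly_payment(10_000, 0.06, 60)
print(round(pay, 2), round(loan_profit(10_000, 0.06, 60), 2))  # roughly 193.33 and 1599.68
```

Of course, this is the profit only if the borrower repays in full; a default or late payments can turn the loan into a loss, which is exactly the risk the rest of the chapter tries to quantify.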
Banks have suffered huge losses due to late payments or loan defaults,
resulting in financial instability. For instance, the net loss for Bank of India
in the second quarter of 2015 due to bad loans was about $170 million
(Tripathy, 2015). In the late 2000s, banks issued large sums of money toward
home and personal loans without considering the risk of bad loans. Because
of this, banks could not derive profits from all the approved loans, leading
to a financial crisis and a significant decrease in the loan approval rate. The
decrease in the loan approval rate led to a fall in house prices, and as a
result, borrowers had to sell more assets to repay their loans. This was one
of the major causes of the recession in 2009 and could have been avoided if
the banks had carefully chosen their borrowers.
[Flowchart: if the preapproval documents fail initial verification, the loan application is denied; otherwise, the underwriter decides whether to approve; if yes, the loan is approved, and if not, it is denied.]
FIGURE 8.1
Steps involved in loan approval process.
A Data-Driven Approach for Multiobjective Loan Portfolio Optimization 179
• Analyze existing loan applicant data to predict the risk type (high
risk or low risk) of forthcoming applicants using machine-learning
algorithms.
• Develop a set of efficient loan portfolios that present a risk–return
trade-off and meet the strategic goals of the financial institution
using multicriteria mathematical programming (MCMP).
approvals may vary depending on the type of loan. For instance, factors
such as loan amount, total income, number of dependents, and real estate
securities are major factors that impact the approval of auto loans (Hill,
2014), and factors such as history of late payments, credit scores, payment
for existing debt, and loan tenure impact the approval of mortgage loans
(Choi, 2011).
8.2.2 Motivation
In reality, financial institutions offer different types of loans (such as
mortgage loans, commercial loans) to serve the varying customer needs.
The return varies depending on the type of loan and the risk type of the
borrower. However, most existing research primarily focuses on
developing recommender systems for loan approvals and does not take into
account the different types of loans. Moreover, the strategic goals of the
financial institution (e.g., restrictions on the total approvals for a particular
type of loan) are seldom taken into consideration. Also, most previous
research considers only single loan applications and ignores the
possibility of joint loan applications (applications with a coborrower). If the
applicant has a bad credit score, then the probability of loan acceptance can
be increased by having a cosigner with a good credit score because most
banks use the better of the two credit scores to determine the eligibility for
loan approval (Somers and Hollis, 1996). Therefore, the present work aims
to address these gaps in the literature. The proposed two-stage decision
support system involves the integration of data analytics with multiple
criteria decision making (MCDM). In the first stage, we evaluate different
machine-learning algorithms to classify the risk type of the applicants. In
the second stage, we propose an integer programming model with mul-
tiple objectives to construct the loan portfolio. The MCMP model uses the
risk type of the borrower as one of the inputs and constructs the portfolio
by considering the different types of loans, strategic goals of the banks,
and both single and joint applications.
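For illustration, the better-of-two-scores screening rule for joint applications can be sketched as follows (the function name and the 650 cutoff are assumptions made for this example, not a policy from the chapter):

```python
def passes_screening(primary_score, cosigner_score=None, cutoff=650):
    """Screen an application on the better of the two credit scores."""
    best = primary_score if cosigner_score is None else max(primary_score, cosigner_score)
    return best >= cutoff

# A primary borrower with a weak score can still pass with a strong cosigner.
print(passes_screening(590))        # False
print(passes_screening(590, 720))   # True
```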
8.3 Methodology
The proposed methodology is illustrated using Figure 8.2.
In the first stage, the borrowers are categorized as high risk or low risk by
different machine-learning classifiers using factors such as borrower’s age
and late payment history. The best machine-learning classifier is selected
based on the output performance measures discussed in Section 8.3.3.
It is important for banks to differentiate between high- and low-risk bor-
rowers because the interest rate for all borrowers depends on their risk type.
High-risk borrowers are often charged a higher interest rate to compensate for
the high probability of loan defaulting (Diette, 2000). In the second stage,
a diversified loan portfolio is developed with the objective of achieving
higher interest rate returns and lower risk using an MCMP model. Since the
risk type of the potential borrowers impacts the return and the risk of the
portfolio, the solutions obtained in the first stage using machine-learning
techniques are used as an input to the second stage.
182 Big Data Analytics Using Multiple Criteria Decision-Making Models
[Flowchart: raw data → data cleansing → train the classifier → predict the
output of the trained classifier → evaluate the classifier using the
testing-data target outputs and predicted outputs → optionally choose and
evaluate another classifier → compare the performance of all evaluated
classifiers. The MCMP stage is subject to: achieve diversification on
different loan types; achieve age diversification; limit the portfolio's
average number of dependents, debt-to-income ratio, and number of open
credits.]
FIGURE 8.2
Overview of research methodology.
TABLE 8.1
Description of Data
Variable Name   Description                                                                              Type
Risk            Risk associated with the borrower                                                        Binary (high risk or low risk)
Age             Age of the borrower (in years)                                                           Integer
Debt ratio      Ratio of monthly debt payments to monthly gross income                                   Continuous between 0 and 1
LOC             Number of open loans and lines of credit                                                 Integer
Income          Monthly income of the borrower                                                           Continuous
MREL            Number of mortgage and real estate loans                                                 Integer
Dependents      Number of dependents of the borrower                                                     Integer
Utilization     Ratio of total balance on lines of credit to the total credit limits                     Continuous between 0 and 1
30 days         Number of times the borrower has been 30–59 days past the due date in the last 2 years   Integer
60 days         Number of times the borrower has been 60–89 days past the due date in the last 2 years   Integer
90 days         Number of times the borrower has been equal to or more than 90 days past the due date    Integer
For instance, some monthly income values were erroneously entered as $1. In addition, there were 29,731 instances in which the monthly
income was missing, and 3924 instances in which the number of dependents
was missing. The presence of erroneous or missing data can significantly
impact the results of the analyses, and therefore, it is important to cleanse the
data before using them as inputs to the machine-learning algorithm. Section
8.3.1.2 describes the various data-cleansing techniques used to handle the
user input errors and missing values.
Discard Variable and Substitute with Median: The features with too many
missing values are removed, and the data entry errors and missing values for
the remaining features are then substituted with the median value of that
feature.
Discard Incomplete Rows: The rows containing error or missing values are
removed from the data set.
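A minimal pandas sketch of three of these alternatives (made-up values; the $1 income entry mimics the data entry error described above):

```python
import pandas as pd

# Toy borrower records; a monthly income of $1 is a known data entry error.
df = pd.DataFrame({
    "Income": [4500.0, None, 1.0, 5200.0],
    "Dependents": [2.0, 0.0, None, 1.0],
})
df.loc[df["Income"] == 1.0, "Income"] = None  # treat the error as missing

a1 = df.fillna(-1)           # A1: substitute errors/missing values with -1
a2 = df.fillna(df.median())  # A2: substitute with the column median
a5 = df.dropna()             # A5: discard incomplete rows
print(len(a1), len(a2), len(a5))  # 4 4 2
```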
P(Risk Type = High) = 1 / (1 + e^−(b0 + b1 × Age + b2 × Debt Ratio + b3 × LOC + b4 × Income + ⋯ + b10 × 90 Days))   (8.1)
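Equation 8.1 can be evaluated directly once the coefficients are estimated; the coefficients below are made up purely for illustration:

```python
import math

def p_high_risk(features, coefs, intercept):
    """P(high risk) = 1 / (1 + exp(-(b0 + sum of b_i * x_i))), as in Equation 8.1."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

# Two features only (Age, Debt ratio) with invented coefficients.
p = p_high_risk([45, 0.35], coefs=[-0.02, 2.0], intercept=-0.5)
print(round(p, 3))  # 0.332
```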
[Diagram: the features (Age, Debt ratio, LOC, …, 90 days) in the training data
are weighted by the coefficients b0–b10 and passed through the logistic
function to produce the final prediction.]
FIGURE 8.3
Representation of the logistic regression algorithm.
An artificial neural network (ANN) consists of layers, namely, an input layer, a hidden layer, and an output layer, and each
layer has a certain number of nodes. Each node (i) in a given layer (l) is con-
nected to each node (j) in the next layer (l + 1) by a connection weight (wij).
In order to train the classifier, the features are given as inputs to the input
layer. Each input value is multiplied by a weight at the input layer, and the
weighted input is relayed to each node in the hidden layer. Each node in
the hidden layer will combine the weighted inputs that it receives, use it
with the activation function (e.g., sigmoid activation), and relay the value
to the nodes in the output layer. The output layer then determines the net-
work output (risk type) by performing a weighted sum of the outputs of the
hidden layer. The process of using the training inputs, hidden layers, and
activation function to compute the response variable (risk type) is called
feed-forwarding or forward pass. Initially, the training process begins with ran-
dom weights. At the end of each feedforward step, the predicted output is
compared with the actual output. If the predicted risk type is the same as the
actual risk type, then the neural network’s weights are reinforced. On the
other hand, if the predicted risk type is incorrect, then the neural network’s
weights are adjusted based on feedback. This process is called backward
pass. The forward pass and the backward pass are repeated for different
[Diagram: input features connected through weighted links (W) to hidden
nodes and on to the output node producing the final prediction.]
FIGURE 8.4
Representation of a three-layered neural network algorithm.
training samples until the ANN classifier is fully trained. ANNs can be useful
for uncovering complex relationships between the features and the output.
However, they require more parameters to be estimated and, therefore, may
require more time for training.
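As a sketch of the training loop described above (scikit-learn on synthetic data standing in for the borrower data; the layer size and iteration count are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the borrower data: 10 features, binary risk label.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# fit() alternates forward passes with backpropagation weight updates.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.predict_proba(X[:1]).shape)  # (1, 2): one probability per class
```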
If pi denotes the probability of class i, then Equations 8.3 and 8.4
give the entropy and information gain, respectively.
Each decision tree provides an output class and the final output class is the
plurality voting of all the decision trees. Figure 8.5 is a schematic representa-
tion of the RF algorithm.
FIGURE 8.5
Representation of the random forests algorithm.
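The plurality-voting step can be shown in isolation (hypothetical votes from five trained trees):

```python
from collections import Counter

# Hypothetical class votes cast by five decision trees for one applicant.
tree_votes = ["high risk", "low risk", "high risk", "high risk", "low risk"]

# The forest's final prediction is the most common vote.
final_prediction = Counter(tree_votes).most_common(1)[0][0]
print(final_prediction)  # high risk
```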
FIGURE 8.6
Representation of the SGBDT algorithm.
The SGBDT randomly samples a subset of the training data to train each decision
tree. The final output is a weighted vote of the decision trees.
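In scikit-learn terms this row subsampling corresponds to setting `subsample` below 1.0 on a gradient boosting model (a sketch on synthetic data, not the chapter's implementation):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# subsample=0.5: each tree is fit on a random half of the training rows,
# which is what makes the boosting "stochastic".
sgbdt = GradientBoostingClassifier(n_estimators=50, subsample=0.5, random_state=0)
sgbdt.fit(X, y)
print(sgbdt.score(X, y))
```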
8.3.2.5 Stacking
Stacking is the idea of combining the predictions of multiple classifiers and
it involves two phases. In the first phase, the features are independently
trained using different classifiers (e.g., logistic regression, RFs). The classi-
fiers used to train the features are called base-level classifiers. In the second
phase, the predicted outputs obtained from the base-level classifiers are
used as inputs to a second-phase classifier (e.g., ANN, RFs) called the
meta-level classifier. In other words, the second phase combines the individual
predictions of the base-level classifiers. The final class is the output obtained
from the meta-level classifier. A schematic representation of the stacking
algorithm is shown in Figure 8.7.
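A sketch of this two-phase scheme using scikit-learn's `StackingClassifier` on synthetic data, mirroring the base-level (RF and SGBDT) and meta-level (ANN) choices used later in this chapter; all hyperparameters here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

stack = StackingClassifier(
    # Phase 1: base-level classifiers trained on the features.
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("sgbdt", GradientBoostingClassifier(n_estimators=50,
                                                     subsample=0.5, random_state=0))],
    # Phase 2: meta-level classifier combines the base-level predictions.
    final_estimator=MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                                  random_state=0),
)
stack.fit(X, y)
print(stack.predict(X[:3]))
```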
FIGURE 8.7
Representation of stacking.
A threshold is used to map the probability value to either high risk or low risk. For instance, a threshold of
0.50 indicates that any value below 0.50 is low risk and above 0.50 is high risk.
A classifier may have any threshold value (e.g., 0.70) depending on the data.
Therefore, the area under the curve (AUC) value for receiver operating charac-
teristics (ROC) is used to obtain the best threshold value for a given classifica-
tion problem. The AUC value quantifies the overall ability of the classifier to
discriminate between the risk types of the applicant. A random classifier has
an AUC value of 0.50 and a perfectly accurate classifier has an AUC value of
1.0. The ROC curve plots the true-positive rate (TPR) versus the false-positive
rate (FPR) for various threshold settings as shown in Figure 8.8. Therefore,
each point on the ROC curve represents the TPR/FPR value corresponding
to a particular threshold. The dotted line has an AUC value of 0.50 and is the
performance of a random classifier. A perfect classifier would yield a point in
the upper left corner or the coordinate (0,1) of the ROC curve. Hence, using the
ROC curve, a decision maker can choose a threshold value that is closest to the
(0,1) coordinate. In other words, the threshold should be chosen in such a way
that the TPR is high and the FPR is low. Once the threshold value is obtained,
the probability values below the threshold are categorized as low-risk bor-
rowers and values above the threshold are categorized as high-risk borrowers.
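The closest-to-(0,1) rule can be sketched with scikit-learn's `roc_curve` (toy labels and scores):

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])   # 1 = high risk
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Choose the threshold whose (FPR, TPR) point lies closest to the ideal (0, 1).
distances = np.sqrt(fpr ** 2 + (1 - tpr) ** 2)
best_threshold = thresholds[np.argmin(distances)]
print(best_threshold)
```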
The accuracy of the classifier is determined by using the actual output and
the predicted output by constructing a confusion matrix. A model has high
accuracy if the actual outputs are the same as the predicted outputs for many
instances. As shown in Figure 8.9, the confusion matrix has four categories,
namely, true positive, true negative, false positive, and false negative.
If a classifier predicts the risk type of a potential borrower as high risk
and if the actual risk type of the potential borrower is low risk, then it is a
false positive or Type I error. Type I error may result in loss of customers to
the financial institution because the borrower may go to a competitor who
FIGURE 8.8
A sample diagram of the ROC curve.
                             Actual output
                             High risk                        Low risk
Predicted    High risk       True positive                    False positive (Type I error)
output       Low risk        False negative (Type II error)   True negative
FIGURE 8.9
Elements of a confusion matrix.
offers a lower interest rate. If a classifier predicts the risk type of a potential
borrower as low risk and if the actual risk type of the borrower is high risk,
then it is a false negative or Type II error. Type II error may result in loss of
revenue for the bank since the borrower has a higher chance of defaulting on
the loan. It is important to avoid both Type I and Type II errors because
the loan portfolio model seeks to reduce the risk and increase the return.
If a classifier predicts the risk type of a potential borrower as high risk and
if the actual borrower is high risk, then it is a true positive. If a classifier
predicts the risk type of a potential borrower as low risk and if the actual
borrower is low risk, then it is a true negative. True positive and true nega-
tive both improve the accuracy of the classifier. Therefore, three metrics
derived from the confusion matrix, namely, TPR (sensitivity), true negative
rate (specificity), and accuracy, are used in the present work to evaluate a
classifier and are given below:
TPR (Sensitivity) = ∑ True Positive / (∑ True Positive + ∑ False Negative)   (8.5)

True Negative Rate (Specificity) = ∑ True Negative / (∑ True Negative + ∑ False Positive)   (8.6)

Accuracy = (∑ True Positive + ∑ True Negative) / (∑ True Positive + ∑ False Negative + ∑ False Positive + ∑ True Negative).   (8.7)
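Equations 8.5 through 8.7 translate directly into code (the counts below are made up):

```python
def classifier_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and accuracy per Equations 8.5 through 8.7."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# e.g., 70 true positives, 80 true negatives, 20 false positives, 30 false negatives
print(classifier_metrics(70, 80, 20, 30))  # (0.7, 0.8, 0.75)
```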
Decision Variables
Δa = 1 if application a is approved; 0 otherwise
δa,b = 1 if b is the primary borrower for application a; 0 otherwise
Max Z1 = ∑_{a∈A} (I_a × ∆_a).   (8.8)

Min Z2 = ∑_{a∈A} ∑_{b∈B(a)} (R_{a,b} × δ_{a,b}).   (8.9)
[∑_{a∈A} ∑_{b∈B(a)} (DR_{a,b} × δ_{a,b})] / [∑_{a∈A} ∑_{b∈B(a)} δ_{a,b}] ≤ T^DR.   (8.10)

∑_{a∈N(t)} ∆_a ≥ LP_t × |A|   ∀ t ∈ T   (8.11)

∑_{a∈N(t)} ∆_a ≤ UP_t × |A|   ∀ t ∈ T.   (8.12)
[∑_{a∈A} ∑_{b∈B(a)} (L_{a,b} × δ_{a,b})] / [∑_{a∈A} ∑_{b∈B(a)} δ_{a,b}] ≤ T^L.   (8.13)

[∑_{a∈A} ∑_{b∈B(a)} (D_{a,b} × δ_{a,b})] / [∑_{a∈A} ∑_{b∈B(a)} δ_{a,b}] ≤ T^D.   (8.14)
8.3.4.2.5 Limit the Average Open Credits Associated with the Borrowers
Opening a new line of credit may indicate that the borrower is unable to repay
his/her debts, and too many open credits may be riskier for borrowers who
do not have an established credit history. Hence, financial institutions do
not prefer borrowers with many open credits (Dykstra and Wade, 1997).
Constraint (8.15) ensures that, on average, the number of open credits
of the borrowers in the portfolio must be within the bank-specified open
credits value (T^O).
[∑_{a∈A} ∑_{b∈B(a)} (O_{a,b} × δ_{a,b})] / [∑_{a∈A} ∑_{b∈B(a)} δ_{a,b}] ≤ T^O.   (8.15)
[∑_{a∈A} ∑_{b∈B(a)} (E_{a,b,c} × δ_{a,b})] / [∑_{a∈A} ∑_{b∈B(a)} δ_{a,b}] ≥ T^E_c   ∀ c ∈ C.   (8.16)

∑_{b∈B(a)} δ_{a,b} ≤ 1   ∀ a ∈ A   (8.17)

∑_{b∈B(a)} δ_{a,b} = ∆_a   ∀ a ∈ A   (8.18)
subject to
f2(x) ≤ ε   (8.21)
The RHS of the constrained objective function (ε) in Equation 8.21 is varied
to obtain the efficient frontier.
First, the model is solved as a single objective problem considering only
maximizing return and ignoring the risk constraint. The resulting total risk
and return are the upper bounds on the value of risk and return, respectively.
Next, the model is again solved as a single objective problem considering only
minimizing risk and ignoring the return. The resulting total risk and return
are the lower bounds on the value of risk and return, respectively. Finally,
the return is maximized as the risk is varied (the value of ε is varied) from
its lower bound to its upper bound. The model using ε-constraint method is
given below:
Objective
Max Z1 = ∑_{a∈A} ∑_{b∈B(a)} (I_a × δ_{a,b}).
Subject to:
Constraints (8.10) through (8.19), and
∑_{a∈A} ∑_{b∈B(a)} (R_{a,b} × δ_{a,b}) ≤ ε.
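On a toy instance the ε-constraint sweep looks as follows; brute-force enumeration stands in for the integer programming solve, and the returns and risk labels are invented:

```python
from itertools import combinations

# Toy applications: (interest return, risk = 1 if the borrower is high risk).
apps = [(120, 1), (80, 0), (200, 1), (60, 0), (150, 1)]

def best_return(eps):
    """Maximize total return subject to total risk <= eps (brute force)."""
    best = 0
    for r in range(len(apps) + 1):
        for subset in combinations(apps, r):
            if sum(a[1] for a in subset) <= eps:
                best = max(best, sum(a[0] for a in subset))
    return best

# Sweeping eps from its lower to its upper bound traces the efficient frontier.
frontier = [(eps, best_return(eps)) for eps in range(0, 4)]
print(frontier)  # [(0, 140), (1, 340), (2, 490), (3, 610)]
```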
TABLE 8.2
Summary of Data Parameters for Machine-Learning Algorithms
Data Value
Training data 100,000
Testing data 50,000
Population of potential borrowers 100,000
Number of features 10
Number of outputs 1
Number of classifiers evaluated 5
Number of base-level classifiers used in stacking 2
Base-level classifiers in stacking RF and SGBDT
Meta-level classifier in stacking ANN
Types of loans considered 5
Percentage of individual loan applications 80%
Percentage of joint loan applications 20%
8.4.2 Analysis
The five classifiers are trained and tested using the caret package in R, and the
MCMP model is coded in Visual C++ and executed using the IBM CPLEX® 12.4.0.0
optimizer. The experiments were conducted on a computer with 8 GB RAM and an
Intel i5 2.50 GHz processor running Windows 10.
Section 8.3.1.2 discusses five different alternatives for data cleansing:
substitute with −1 (A1), substitute with median (A2), discard variable and
substitute with unique value (A3), discard variable and substitute with
median value (A4), and discard incomplete rows (A5). Each alternative is
then used to train and test each of the five classifiers. Finally, the AUC values
for each alternative for the five classifiers are obtained and are shown in
Table 8.3. The best approach for data cleansing is the alternative with the
highest AUC value.
Based on the values in Table 8.3, A1 has the highest AUC value for all the
classifiers except neural network. Neural network has the highest AUC value
for the alternative A3. However, the difference between the AUC value for
A3 and A1 obtained using the neural network is very small. Therefore, A1
is chosen as the best alternative for data cleansing, and erroneous and missing
values are substituted with −1. Two-thirds of the cleansed data is used for
training and the remainder for testing. The training data is resampled
using a fourfold cross-validation technique in which the training data is
internally split into train/test runs to determine the optimal parameters of the
classifier. The AUC value is then obtained for each classifier for the testing
set. It is important to note that the entries in the training set and testing set
are randomly sampled and the AUC value changes depending on the entries
in the training and testing set. Therefore, to estimate the accuracy value of
each classifier, the procedure of estimating the AUC values is repeated 50
times. Figure 8.10 shows the average AUC value over 50 replications along
with its standard deviation for the five classifiers.
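The replication procedure can be sketched as follows (synthetic data, ten replications instead of 50, and a single gradient-boosting classifier; all settings are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

aucs = []
for rep in range(10):
    # Resample the train/test split in each replication, as in the chapter.
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=1/3, random_state=rep)
    clf = GradientBoostingClassifier(n_estimators=50, random_state=rep).fit(Xtr, ytr)
    aucs.append(roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))

print(round(float(np.mean(aucs)), 3), round(float(np.std(aucs)), 3))
```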
It is observed that the standard deviation of the AUC values of the classi-
fiers is very low and therefore, all the classifiers are robust in their classifica-
tion. The average AUC value for logistic regression is very low compared
with the other classifiers. Hence, logistic regression is no longer considered
TABLE 8.3
AUC Values for Different Alternatives
     Logistic Regression   Neural Network   Random Forest   SGBDT     Stacking
A1 0.69522 0.85519 0.84529 0.86210 0.86351
A2 0.69319 0.84344 0.84526 0.86148 0.86192
A3 0.68583 0.85856 0.84084 0.86082 0.86126
A4 0.69343 0.84344 0.84527 0.86148 0.86188
A5 0.68190 0.83376 0.83534 0.85182 0.85332
FIGURE 8.10
Average AUC values for each learning method.
in choosing the best classifier. In order to choose the best classifier among
the other four, the class probabilities should be converted to class labels
by using a specified threshold value. The ROC curve for the classifiers is
shown in Figure 8.11 and can be used to select an appropriate threshold
value. The threshold value impacts the sensitivity, specificity, and accuracy,
and hence, different threshold values are evaluated to obtain a trade-off
between sensitivity, specificity, and accuracy.
Using the ROC curve and by experimenting with different threshold values, a
threshold of 0.82 is chosen for RF, and a threshold of 0.80 is chosen for SGBDT
and the neural network to obtain the respective class labels. Since stacking uses
the neural network as its meta-level classifier, its threshold is also set to 0.80.
The class label is low risk if the class probability is less than the threshold
and is high risk if it is greater than the threshold.
The sensitivity, specificity, and accuracy values for each classifier are
obtained using the confusion matrix and the time required to train each
classifier is recorded, as shown in Table 8.4. It is to be noted that the classifiers
must be retrained periodically to learn any new patterns that emerge as the
environment changes. An ideal method will have sensitivity, specificity,
and accuracy values close to 1 and will require low training time.
All the methods perform well in classifying the risk type and have
accuracy values close to each other. It can be observed that stacking yields
better results in terms of sensitivity, specificity, and accuracy when com-
pared with neural network, RFs, and SGBDT. However, the total training
[Plot: ROC curves for random forest, gradient boosting, neural network, and
stacking.]
FIGURE 8.11
ROC curve for comparing the learning algorithms.
TABLE 8.4
Performance of Classification Algorithms
Algorithm Sensitivity Specificity Accuracy Training Time (in sec)
Neural network 0.7154 0.8308 0.8200 172
Random Forests 0.7212 0.8313 0.8200 229
SGBDT 0.7431 0.8231 0.8278 130
Stacking 0.7433 0.8314 0.8296 648
TABLE 8.5
Summary of Data Parameters for Mathematical Programming Model
Data Value
Bound on percentage of time the borrowers must be above 50 years of age 25%
Bound on number of dependents 2
Bound on debt ratio 0.4
Bound on late payments 5
Bound on open credits 10
Percentage of loan issued to loan type 1 29%
Percentage of loan issued to loan type 2 16%
Percentage of loan issued to loan type 3 13%
Percentage of loan issued to loan type 4 12%
Percentage of loan issued to loan type 5 30%
Interest rate of loan issued to loan type 1 5%
Interest rate of loan issued to loan type 2 3%
Interest rate of loan issued to loan type 3 2.5%
Interest rate of loan issued to loan type 4 2.5%
Interest rate of loan issued to loan type 5 5%
card loans, consumer loans, commercial real estate loans, and commer-
cial loans. The interest rate and percentage of these five different types
of loans considered are obtained from the literature (Dilworth, 2015;
Issa, 2015).
[Plot: total portfolio return versus total portfolio risk, one efficient
frontier for each of Scenarios 1–10.]
FIGURE 8.12
Efficient frontiers for different scenarios.
[Panels (a)–(c): percentage of each loan type, including credit card loans (b)
and consumer loans (c), versus portfolio risk (no. of high-risk borrowers).]
FIGURE 8.13
Impact of portfolio risk on percentage of different types of loans. (Continued)
[Panels (d) and (e): percentage of commercial loans and of other loans versus
portfolio risk (no. of high-risk borrowers).]
8.5 Conclusions
Financial institutions, such as banks, cater to the needs of various customers
by offering a variety of loans, and the revenue generated by loans is one of
the largest income sources for any bank. However, there is also a large risk
associated with the approval of bad loans, which may sometimes even lead to
bankruptcy. Therefore, in this work, a decision support recommender system is
developed for loan approvals taking into account the different types of loans.
The developed recommender system is a two-stage system. In the first
stage, the best machine-learning classifier is selected among the five classifiers
[Panels (a) and (b): loan approval rate (in %) and portfolio return (in
millions) versus portfolio risk (no. of high-risk borrowers).]
FIGURE 8.14
Impact of the portfolio risk on the loan approval rate and portfolio return.
considered to classify the potential borrowers as high risk or low risk. Several
factors of the applicant, such as age, late payment history, and number of open
credits, are given as inputs to the classifiers. The five different classifiers used
are logistic regression, RFs, neural networks, gradient boosting, and stacking.
The sensitivity, specificity, and accuracy values for each classifier are obtained
using the confusion matrix, and the time required to train each classifier is
recorded. The method that has sensitivity, specificity, and accuracy values
close to 1 and the least training time is selected as the best classifier. The
output of the best classifier in the first stage, along with other attributes of the
applicants, is given as input to the second stage of the model. In the second
stage, a diversified loan portfolio is developed considering the types of loans,
return, and risk associated with the borrower. The strategic goals and con-
straints of the bank are also given as input to the second stage.
The data of about 150,000 samples with ten predictors and one response
variable were downloaded from the Kaggle competition site (Kaggle, 2011).
SGBDT performs the best for the given data set. The findings indicate that
neither the portfolio risk nor the uncertainties in the total number of
applications received in a planning horizon significantly impact the percentage
of the different types of loans. The loan approval rate and return increase
as the portfolio risk increases, since the financial institution is willing to
tolerate more high-risk borrowers, resulting in an increased loan approval
rate and portfolio return. Finally, the efficient frontier is developed with the
objective of maximizing the return and minimizing the risk; using it, the
underwriter can estimate the maximum possible return for each scenario at a
bank-specified risk value.
References
Altman, E. I. 1980. Commercial bank lending: Process, credit scoring, and costs of errors
in lending. Journal of Financial and Quantitative Analysis, Vol. 15(4), pp. 813–832.
Apilado, V. P., Warner, D. C., and Dauten, J. J. 1974. Evaluative techniques in consumer
finance—Experimental results and policy implications for financial institu-
tions. Journal of Financial and Quantitative Analysis, Vol. 9(2), pp. 275–283.
Bell, T. B., Ribar, G. S., and Verchio, J. 1990. Neural nets versus logistic regression:
A comparison of each model’s ability to predict commercial bank failures.
In Proceedings of the 1990 Deloitte and Touche/University of Kansas Symposium of
Auditing Problems, Lawrence, KS, pp. 29–58.
Bierman, H., Jr. and Hausman, W. H. 1970. The credit granting decision. Management
Science, Vol. 16(8), pp. B519–B532.
Breiman, L. 2001. Random forests. Machine Learning, Vol. 45(1), pp. 5–32.
Brown, I. and Mues, C. 2012. An experimental comparison of classification algorithms
for imbalanced credit scoring data sets. Expert Systems with Applications,
Vol. 39(3), pp. 3446–3453.
Burke, A. E. and Hanley, A. 2003. How do banks pick safer ventures? A theory
relating the importance of risk aversion and collateral to interest margins and
credit rationing. The Journal of Entrepreneurial Finance, Vol. 8(2), pp. 13–24.
Carr, D. 2013. Why older minds make better decisions. http://www.forbes.com/sites/
nextavenue/2013/04/29/why-older-minds-make-better-decisions/, accessed on
December 6, 2015.
Chapman, J. M. 1940. Factors affecting credit risk in personal lending. In Commercial
Banks and Consumer Installment Credit. New York: National Bureau of Economic
Research, pp. 109–139.
Choi, C. 2011. Mortgage approved: 5 factors that lenders consider on home loan
applications in tighter financial market. http://www.postandcourier.com/
article/20110703/PC05/307039941/, accessed on December 2, 2015.
Sirignano, J., Tsoukalas, G., and Giesecke, K. 2015. Large-scale loan portfolio selection.
Available at SSRN: http://dx.doi.org/10.2139/ssrn.2641301
Soergel, A. 2015. 1 in 3 Americans near financial disaster. http://www.usnews.com/
news/blogs/data-mine/2015/02/23/study-suggests-1-in-3-americans-flirting-
with-financial-disaster/, accessed on December 15, 2015.
Somers, P. and Hollis, J. M. 1996. Student loan discharge through bankruptcy.
American Bankruptcy Institute Law Review, Vol. 4, pp. 457.
Srinivasan, V. and Kim, Y. H. 1987. Credit granting: A comparative analysis of
classification procedures. Journal of Finance, Vol. 42(3), pp. 665–681.
Sullivan, B. 2014. 4 student loan debt collection tricks. http://www.cbsnews.com/
media/4-student-loan-debt-collection-tricks/, accessed on December 15, 2015.
Taylor, W. F. 1980. Meeting the equal credit opportunity act’s specificity requirement:
Judgmental and statistical scoring systems. Buffalo Law Review, Vol. 29, pp. 73.
Tripathy, D. 2015. Bank of India sinks to Q2 loss as bad debts jump. http://in.reuters.
com/article/bank-of-india-q2-results-idINKCN0SY1A020151109, accessed on
December 15, 2015.
Van Leuvensteijn, M., Bikker, J. A., Van Rixtel, A. A., and Kok Sorensen, C. 2007.
A new approach to measuring competition in the loan markets of the euro area.
Available at SSRN: http://ssrn.com/abstract=991604
Wang, G. and Ma, J. 2012. A hybrid ensemble approach for enterprise credit risk
assessment based on Support Vector Machine. Expert Systems with Applications,
Vol. 39(5), pp. 5325–5331.
Wang, G., Ma, J., Huang, L., and Xu, K. 2012. Two credit scoring models based on dual
strategy ensemble trees. Knowledge-Based Systems, Vol. 26, pp. 61–68.
Wiginton, J. C. 1980. A note on the comparison of logit and discriminant models
of consumer credit behavior. Journal of Financial and Quantitative Analysis,
Vol. 15(3), pp. 757–770.
Zurada, J. and Barker, R. M. 2011. Using memory-based reasoning for predicting
default rates on consumer loans. Review of Business Information Systems (RBIS),
Vol. 11(1), pp. 1–16.
9
Multiobjective Routing in a Metropolitan
City with Deterministic and Dynamic
Travel and Waiting Times, and
One-Way Traffic Regulation
CONTENTS
9.1 Introduction................................................................................................. 212
9.2 Literature Review....................................................................................... 213
9.3 Problem Definition: Multiobjective Shortest Path Problem
with Time Dependency.............................................................................. 215
9.4 Mathematical Model for the Time-Dependent Shortest Path
Problem When the Travel Times and Waiting Times, and One-
Way Traffic Regulation Are Dynamic and Time-Dependent in a
City Network............................................................................................... 217
9.4.1 Terminology.................................................................................... 217
9.4.2 Mathematical Model....................................................................... 219
9.4.3 Multiobjective Optimization Algorithm to Obtain
the Strictly Nondominated Solutions.......................................... 223
9.4.3.1 Terminology......................................................................223
9.4.3.2 Step-by-Step Procedure to Generate a Set of
Nondominated (Pareto-Optimal) Solutions,
Given εt............................................................................. 223
9.5 Experimental Settings, Results, and Discussion.................................... 226
9.5.1 Data with Respect to Distance and Travel Times along the
Roads and Waiting Times at Junctions........................................ 226
9.5.2 Sample Network Topology............................................................ 232
9.6 Implementation of the Proposed Multiobjective Model
to the Complete Chennai Network..........................................................234
9.7 Summary...................................................................................................... 236
Acknowledgment................................................................................................. 238
References.............................................................................................................. 241
212 Big Data Analytics Using Multiple Criteria Decision-Making Models
9.1 Introduction
A transportation network is a network of roads and streets, which permits
vehicles to move from one place to another. A directed or undirected graph
can be used to represent a transportation network, where edges of the graph
have a capacity and receive a flow. The vertices of the graph are called nodes
and the edges of the graph are called arcs. In any transportation network,
the vehicle starts from the source node and moves toward the sink node.
There are several types of network problems/models available in the literature, for example, transportation problem, shortest path problem, minimum spanning tree problem, maximum flow problem, and minimum flow problem. The shortest path problem has tremendous importance in network flow
models because it is applicable to any type of transportation network seen
in real life. One of the most important issues affecting the performance of a
transportation network is routing. The goal of routing between two points
in a network is to reach the destination as quickly as possible (shortest time
path problem) or as cheaply as possible (minimum cost or distance path
problem).
The shortest path problem is one of the most studied problems among network optimization problems (Bertsekas, 1991; Ahuja et al., 1993; Schrijver, 2003). Two kinds of labelling approaches, namely, the label-setting algorithm and the label-correcting algorithm, are used to solve the shortest path problem. Label-setting algorithms apply only to networks with nonnegative arc costs (or to acyclic networks), whereas label-correcting algorithms are applicable to networks with both negative and nonnegative arc costs, provided there are no negative cycles.
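The distinction can be made concrete with a small sketch (ours, not from the chapter): Dijkstra's algorithm is the classic label-setting method, and the permanent-label step below is exactly what fails if arc costs may be negative. The four-node network `adj` is an invented example.

```python
import heapq

def dijkstra(adj, source):
    """Label-setting shortest paths from `source`. `adj` maps each node to
    a list of (neighbor, cost) pairs with nonnegative costs. Once a node is
    popped from the heap its label becomes permanent and is never revised;
    this is why nonnegative arc costs are required."""
    dist = {source: 0.0}
    permanent = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in permanent:
            continue
        permanent.add(u)                 # label of u is now permanent
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd             # temporary label, may still improve
                heapq.heappush(heap, (nd, v))
    return dist

# Made-up four-node directed network.
adj = {1: [(2, 1.0), (3, 4.0)], 2: [(4, 2.0)], 3: [(4, 1.0)], 4: []}
print(dijkstra(adj, 1))  # {1: 0.0, 2: 1.0, 3: 4.0, 4: 3.0}
```

A label-correcting method (e.g., Bellman-Ford) would instead keep all labels temporary and re-scan arcs until no label improves, which is what allows negative arc costs.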
The shortest path problem with time dependency has been treated as a single-objective problem by several researchers, and one can classify the single-objective problem into the minimum cost path problem and the fastest path problem. The minimum cost path problem is solved to find the path having the minimum length with respect to cost by considering the travel time, while in the fastest path problem, the objective is to find the path having the minimum length with respect to time-dependent travel time. Building on Bellman’s optimality principle (Bellman, 1958), the single-objective shortest path problem has been adapted to the setting where arc travel times are nonnegative integers for every time period, yielding all-to-one fastest paths to the single destination node from all other nodes. Dreyfus (1969) proposed a modification of Dijkstra’s static shortest path algorithm to obtain the fastest path between two vertices for a given departure time. The algorithm of Dreyfus (1969) is suitable for solving first-in-first-out (FIFO) network problems. Ziliaskopoulos and Mahmassani (1993) and Wardell and Ziliaskopoulos (2000) proposed label-correcting algorithms to obtain all-to-one minimum cost paths and all-to-one fastest paths for all departure times. Chabini (1998) proposed a label-setting algorithm that runs backward over the set of time periods to obtain all-to-one minimum cost paths and all-to-one fastest paths for all departure times without the first-in-first-out assumption.
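Dreyfus's adaptation can be sketched as a Dijkstra variant in which each arc cost is a function of the departure time; the sketch below is our illustration, valid only under the FIFO assumption (leaving later never means arriving earlier). The step-function travel time and clock convention (minutes from midnight) are invented for the example.

```python
import heapq

def td_fastest_path(adj, source, t0):
    """Time-dependent fastest arrival times from `source`, departing at t0.
    `adj` maps node -> list of (neighbor, travel_time_fn), where
    travel_time_fn(t) is the travel time when entering the arc at clock
    time t. Correct for FIFO networks, per Dreyfus's Dijkstra variant."""
    arrive = {source: t0}
    heap = [(t0, source)]
    done = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, fn in adj.get(u, []):
            ta = t + fn(t)               # arrival depends on departure time
            if ta < arrive.get(v, float("inf")):
                arrive[v] = ta
                heapq.heappush(heap, (ta, v))
    return arrive

# Travel on (1,2) takes 10 min off-peak, 18 min from minute 480 (8 a.m.).
peaked = lambda t: 18.0 if t >= 480 else 10.0
adj = {1: [(2, peaked)], 2: []}
print(td_fastest_path(adj, 1, 470)[2])  # 480.0: depart 7.50 a.m., off-peak
```

Note that `peaked` is FIFO: the arrival time t + peaked(t) is nondecreasing in the departure time t, which is the property Dreyfus's method relies on.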
with consideration of minimizing the total travel time and the total distance
simultaneously.
In this work, we consider the multiobjective shortest path problem with real-life constraints such as time-dependent dynamic and deterministic travel times, time-dependent dynamic and deterministic waiting times, and time-dependent one-way traffic, and we propose a mathematical model to solve it. The biggest advantage of developing a mathematical model is the flexibility of the resulting model: many cost functions can be chosen, and many constraints can be added that would otherwise be difficult to satisfy with a Dijkstra-like approach. For example, we address the one-way traffic regulation in the proposed mathematical model. Dynamic programming, by contrast, becomes time consuming and computationally inefficient owing to the curse of dimensionality. Another motivation for the MILP model is that the same model can be extended to develop a multiobjective optimization algorithm.
Technological advancements in big data (e.g., the Internet of Things) enable us to collect data about the entire transportation network of a city. We evaluate the proposed model with multiple objectives using the real-world network of major roads in Chennai city, consisting of 1658 nodes and 4224 links; the graph is mostly undirected, except for the roads involving one-way traffic. The following aspects are taken into consideration while collecting the travel data and are incorporated into the MILP model with multiple objectives:
• Consider the source node as n′ and the destination node as n″, and let the arrival time at node n′ be denoted by A_n′.
• As travelers do not wait at the origin, we assume the waiting time at the source node to be 0.
• A day consists of a given number of travel-time intervals.
• Unit of the distance is 1 km and unit of the time is 1 min.
• Traffic corresponds to a given number of congestion levels, thereby
influencing the travel time along a road and waiting time at a
junction.
• Waiting time at a junction or node i depends on the time interval
during which the actual arrival at node i takes place; waiting time is
dynamic and deterministic.
• Travel time along the road (i,j) depends on the time interval in the
day during which the actual travel takes place; travel time is dynamic
and deterministic.
• Entry into the road (i,j) during its one-way traffic intervals should be prevented inherently by the MILP model. Most roads allow two-way traffic; however, some roads allow two-way traffic for most periods of the day except for certain periods during which they carry one-way traffic; for example, traffic flow is allowed along a given road (i,j) during 8 a.m. to 10 a.m., but no traffic is allowed along arc (j,i) during the same period, whereas traffic is allowed along the road (j,i) during 6 p.m. to 8 p.m., but no traffic is allowed along arc (i,j) during the same period.
• Two objectives, namely, the minimization of total travel time
(including waiting times at junctions) and the minimization of total
distance traveled are considered.
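The time-interval structure above can be illustrated with a small lookup (a sketch of ours, reusing the four-interval travel-time example given in the Section 9.4.1 terminology notes, with minute 0 corresponding to 7.30 a.m. in a 1440-min day):

```python
# [LL, UL, travel time in minutes] for one road (i, j), matching the
# four travel-time intervals of the Section 9.4.1 example
# (minute 0 = 7.30 a.m.).
INTERVALS = [(0, 210, 10.0), (211, 570, 8.0), (571, 810, 9.0), (811, 1440, 7.0)]

def travel_time(minute_of_day, intervals=INTERVALS):
    """Return (interval index k, travel time t_ijk) for the interval whose
    [LL, UL] window contains the arrival minute at the head of the road."""
    for k, (ll, ul, t) in enumerate(intervals, start=1):
        if ll <= minute_of_day <= ul:
            return k, t
    raise ValueError("minute outside the travel day")

print(travel_time(100))   # (1, 10.0) -- within the first window
print(travel_time(600))   # (3, 9.0)  -- within the evening window
```

The same lookup pattern applies to the waiting-time intervals at junctions and to the one-way regulation windows; only the limit tables change.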
9.4.1 Terminology

Parameters and their descriptions:

n: Number of nodes in the network
n′: Origin node of travel
n″: Destination node of travel
i, j: A pair of nodes
d_ij: Distance from node i to node j (note: it is not necessarily symmetric)
k: Index for a time interval
∇^w_i: Number of waiting-time intervals with respect to node i
ϕ^t_ij: Number of travel-time intervals with respect to road (i,j)
Ω^o_ij: Number of one-way and two-way traffic intervals with respect to road (i,j)
t_ijk: Travel time from node i to node j in the time interval k
W_ik: Waiting time at node i during the time interval k
LL^w_ik: Lower limit for the time interval k with respect to node i, to define the time-dependent waiting time
UL^w_ik: Upper limit for the time interval k with respect to node i, to define the time-dependent waiting time
LL^t_ijk: Lower limit for the time interval k with respect to road (i,j), to define the time-dependent travel time
UL^t_ijk: Upper limit for the time interval k with respect to road (i,j), to define the time-dependent travel time
    (Note: for a given road (i,j), the actual travel time can vary depending on the actual time of arrival at the head of the road (i,j). For example, in a day of 1440 min with ϕ^t_ij = 4: if interval 1 corresponds to 7.30 a.m. to 11 a.m., we set LL^t_ij1 = 0, UL^t_ij1 = 210, and t_ij1 = 10; interval 2 corresponds to 11 a.m. to 5 p.m., so LL^t_ij2 = 211, UL^t_ij2 = 570, and t_ij2 = 8; interval 3 corresponds to 5 p.m. to 9 p.m., so LL^t_ij3 = 571, UL^t_ij3 = 810, and t_ij3 = 9; and interval 4 corresponds to 9 p.m. to 7.30 a.m., so LL^t_ij4 = 811, UL^t_ij4 = 1440, and t_ij4 = 7.)
LL^o_ijk: Lower limit of the allowed traffic regulation during the time interval k with respect to road (i,j), to define the time-dependent one-way traffic regulation
UL^o_ijk: Upper limit of the allowed traffic regulation during the time interval k with respect to road (i,j), to define the time-dependent one-way traffic regulation
    (Note: for a given road (i,j) that has no traffic between 7.30 a.m. and 11 a.m., two-way traffic between 11 a.m. and 5 p.m., one-way traffic between 5 p.m. and 9 p.m., and two-way traffic between 9 p.m. and 7.30 a.m., we have Ω^o_ij = 4; LL^o_ij1 = 0 (i.e., 7.30 a.m.), UL^o_ij1 = 210 (i.e., 11 a.m.), and we set Δ^o_ij1 = 0; LL^o_ij2 = 211, UL^o_ij2 = 570, and Δ^o_ij2 ∈ {0, 1}; LL^o_ij3 = 571, UL^o_ij3 = 810, and we set Δ^o_ij3 = 0; and LL^o_ij4 = 811, UL^o_ij4 = 1440, and Δ^o_ij4 ∈ {0, 1}.)
J(i): Set of nodes or junctions to which there exists a direct connectivity from/to node i
M: A large value; set to 10,000 in this study

Decision variables and their descriptions:

Y_ij: A binary variable that takes the value 1 if the road (i,j) is chosen in the travel route, and 0 otherwise (note: if there exists no direct connectivity between nodes/junctions i and j, then we set d_ij = ∞ and/or Y_ij = 0)
δ_i: An indicator (binary) variable that takes the value 1 if node i is visited in the travel route, and 0 otherwise
Δ^w_ik: An indicator (binary) variable that takes the value 1 if node i is reached during the interval k in the travel route, and 0 otherwise
Δ^t_ijk: An indicator (binary) variable that takes the value 1 if road (i,j) is traversed during the interval k in the travel route, and 0 otherwise
Δ^o_ijk: An indicator (binary) variable that takes the value 1 if one-way travel along road (i,j) is allowed during the interval k in the travel route, and 0 otherwise
ω_i: Waiting time at node i, which takes a value only if δ_i = 1 (note: ω_n′ = ω_n″ = 0)
A_i: Arrival time at node i, which takes a value only if δ_i = 1 (note: A_n′ is given as input)
τ_ij: Travel time along the road (i,j), which takes a value only if Y_ij = 1
A^w_ik: Arrival time at node i in the interval k, which takes a value only if δ_i = 1; all other A^w_ik's are equal to 0
A^t_ijk: Arrival time on the road (i,j) in the interval k, which takes a value only if Y_ij = 1; all other A^t_ijk's are equal to 0
A^o_ijk: Arrival time on the road (i,j) (with the possible one-way traffic regulation) in the interval k, which takes a value only if Y_ij = 1; all other A^o_ijk's are equal to 0
Minimize Z_2 = time = Σ_{i=1}^{n} Σ_{j∈J(i)} τ_ij + Σ_{i=1}^{n} ω_i.  (9.2)
Σ_{j∈J(i)} Y_ij ≤ 1   ∀ i = 1, …, n, i ≠ n′ and i ≠ n″.  (9.3)

Σ_{j∈J(i)} Y_ji ≤ 1   ∀ i = 1, …, n, i ≠ n′ and i ≠ n″.  (9.4)

Σ_{j∈J(i)} Y_ij = Σ_{j∈J(i)} Y_ji   ∀ i = 1, …, n, i ≠ n′ and i ≠ n″.  (9.5)

Σ_{j∈J(i)} Y_ji = δ_i   ∀ i = 1, …, n, i ≠ n′ and i ≠ n″.  (9.6)
/* Constraints (9.8) through (9.12) ensure that there is only one outflow and no inflow for the source node n′, and only one inflow and no outflow for the destination node n″ */

Σ_{j∈J(n′)} Y_n′j = 1.  (9.8)

Σ_{j∈J(n″)} Y_jn″ = 1.  (9.9)

Σ_{j∈J(n′)} Y_jn′ = 0.  (9.10)

Σ_{j∈J(n″)} Y_n″j = 0.  (9.11)

δ_n′ = δ_n″ = 1.  (9.12)
/* Constraints (9.13) through (9.21) capture the waiting time that is dependent on the arrival time at node i with respect to the corresponding interval Δ^w_ik */

ω_n′ = ω_n″ = 0.  (9.13)

A_i ≤ M δ_i   ∀ i = 1, …, n.  (9.15)

LL^w_ik Δ^w_ik ≤ A^w_ik ≤ UL^w_ik Δ^w_ik   ∀ i = 1, …, n, i ≠ n′ and i ≠ n″, and k = 1, 2, …, ∇^w_i.  (9.18)

Σ_{k=1}^{∇^w_i} Δ^w_ik = δ_i   ∀ i = 1, …, n, i ≠ n′, and i ≠ n″.  (9.19)

Σ_{k=1}^{∇^w_i} A^w_ik = A_i   ∀ i = 1, …, n, i ≠ n′, and i ≠ n″.  (9.20)

ω_i = Σ_{k=1}^{∇^w_i} (Δ^w_ik W_ik)   ∀ i = 1, …, n, i ≠ n′, and i ≠ n″.  (9.21)
/* Constraints (9.22) through (9.29) capture the arrival time and travel time along the road (i,j) based on the departure time at node i, that is, the actual arrival time at the road (i,j), defined with respect to the ϕ^t_ij time intervals */

Σ_{k=1}^{ϕ^t_ij} Δ^t_ijk = Y_ij   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.23)

Σ_{k=1}^{ϕ^t_ij} A^t_ijk ≤ A_i + ω_i + M(1 − Y_ij)   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.24)

Σ_{k=1}^{ϕ^t_ij} A^t_ijk ≥ A_i + ω_i − M(1 − Y_ij)   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.25)

Σ_{k=1}^{ϕ^t_ij} A^t_ijk ≤ M Y_ij   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.26)

τ_ij ≤ Σ_{k=1}^{ϕ^t_ij} (Δ^t_ijk t_ijk) + M(1 − Y_ij)   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.28)

τ_ij ≥ Σ_{k=1}^{ϕ^t_ij} (Δ^t_ijk t_ijk) − M(1 − Y_ij)   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.29)
/* Constraints (9.30) through (9.34) ensure travel along the road (i,j) only during the allowed traffic-time intervals when we arrive at node i. The mathematical model should inherently avoid entry into roads having a one-way traffic regulation in a particular interval (see Section 9.4.1 for the settings of Δ^o_ijk). */

LL^o_ijk Δ^o_ijk ≤ A^o_ijk ≤ UL^o_ijk Δ^o_ijk   ∀ i = 1, …, n, i ≠ n″, ∀ j ∈ J(i), and k = 1, …, Ω^o_ij.  (9.30)

Σ_{k=1}^{Ω^o_ij} Δ^o_ijk = Y_ij   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.31)

Σ_{k=1}^{Ω^o_ij} A^o_ijk ≤ M Y_ij   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.32)

Σ_{k=1}^{Ω^o_ij} A^o_ijk ≤ A_i + ω_i + M(1 − Y_ij)   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.33)

Σ_{k=1}^{Ω^o_ij} A^o_ijk ≥ A_i + ω_i − M(1 − Y_ij)   ∀ i = 1, …, n, i ≠ n″, and ∀ j ∈ J(i).  (9.34)
node. Constraints (9.9) and (9.11) ensure that there is only one inflow and no outflow at the destination node. Constraint (9.12) ensures the selection of
the source and destination nodes as part of the path. Constraints (9.15)
through (9.17) determine the arrival time at node j from node i. Constraints
(9.18) through (9.21) capture the time-dependent dynamic waiting time on
the basis of the arrival time interval k at node i. Constraints (9.22) through
(9.29) capture the time-dependent dynamic travel time along the arc (i,j)
on the basis of the departure time from node i. Constraints (9.30) through
(9.34) ensure that the travel takes place during the allowed traffic periods
along road (i,j) during the day.
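For a fixed route, the effect of constraints (9.13) through (9.29) can be mimicked by a simple forward simulation; the sketch below is our illustration, not the chapter's code, and the `wait` and `travel` callbacks stand in for the interval tables W_ik and t_ijk.

```python
def evaluate_route(route, start_time, wait, travel):
    """Forward-simulate a fixed route, mirroring what constraints (9.13)
    through (9.29) enforce inside the MILP.

    route      : list of nodes [n', ..., n'']
    start_time : arrival time A_{n'} at the source (given as input)
    wait(i, t)      -> waiting time at node i when arriving at time t
    travel(i, j, t) -> travel time on road (i, j) when entering it at time t
    Waiting time at the source (and destination) is 0, as in (9.13)."""
    arrival = start_time
    total_travel = total_wait = 0.0
    for idx in range(len(route) - 1):
        i, j = route[idx], route[idx + 1]
        w = 0.0 if idx == 0 else wait(i, arrival)   # omega_{n'} = 0
        tau = travel(i, j, arrival + w)             # depart at A_i + omega_i
        total_wait += w
        total_travel += tau
        arrival = arrival + w + tau                 # arrival time at node j
    return arrival, total_travel, total_wait

# Toy check with invented data: 1.7-min waits at intermediate junctions
# and constant 1-min roads.
A, tt, tw = evaluate_route([1, 2, 3], 0.0,
                           lambda i, t: 1.7, lambda i, j, t: 1.0)
print(round(A, 3), tt, tw)  # 3.7 2.0 1.7
```

The MILP does the same bookkeeping declaratively: the big-M pairs such as (9.24)/(9.25) pin the road's arrival time to A_i + ω_i whenever Y_ij = 1 and relax it otherwise.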
9.4.3.1 Terminology

Variables and their descriptions:

iter: Iteration number
dist^1_iter: Distance obtained in the iteration iter
time^1_iter: Minimum travel time obtained in the iteration iter, corresponding to dist^1_iter
time*: Optimal travel time
{δel_ij}^iter: Solution from the MILP, associated with dist^1_iter and time^1_iter
γ^1_iter: Solution set with {δel_ij}^iter, dist^1_iter, and time^1_iter
Z_2 = time^1_iter = Σ_{i=1}^{n} Σ_{j∈J(i)} τ_ij + Σ_{i=1}^{n} ω_i

and subject to all constraints in the MILP given in Section 9.4.2, and with the following add-on constraint:

Σ_{i=1}^{n} Σ_{j∈J(i)} d_ij Y_ij = dist^1_iter.  (9.35)

Z_2* = time* = Σ_{i=1}^{n} Σ_{j∈J(i)} τ_ij + Σ_{i=1}^{n} ω_i
Step 2:
/* Do this step to get further Pareto-optimal solutions, given ε_t (with respect to the time decrement) */
With respect to the original MILP given in Section 9.4.2, do the following:

Set: Minimize Z_1 = Σ_{i=1}^{n} Σ_{j∈J(i)} d_ij Y_ij = dist^1_iter

if (time^1_{iter−1} − ε_t > time*)
then
add: Σ_{i=1}^{n} Σ_{j∈J(i)} τ_ij + Σ_{i=1}^{n} ω_i ≤ time^1_{iter−1} − ε_t  (9.36)
else
add: Σ_{i=1}^{n} Σ_{j∈J(i)} τ_ij + Σ_{i=1}^{n} ω_i = time*  (9.37)

Then minimize

Z_2 = time^1_iter = Σ_{i=1}^{n} Σ_{j∈J(i)} τ_ij + Σ_{i=1}^{n} ω_i

subject to all constraints in the MILP given in Section 9.4.2, and with the following additional constraint:

Σ_{i=1}^{n} Σ_{j∈J(i)} d_ij Y_ij = dist^1_iter.  (9.38)
Denote the solution from the above MILP as {δel_ij}^iter.

Step 2.3: Set γ^1_iter = {δel_ij}^iter, associated with dist^1_iter, time^1_iter, and their respective Y_ij's.

Step 3: If time^1_iter = time*, then proceed to Step 4; else return to Step 2.

Step 4: STOP: the set of strictly Pareto-optimal solutions is obtained, denoted by {γ^1_iter}, ∀ iter, with the corresponding dist^1_iter, time^1_iter, and their respective Y_ij's, for the given ε_t.
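The step-by-step procedure can be sketched in pure Python over an enumerated set of feasible (distance, time) pairs; the chapter instead solves a MILP at each iteration. Using the seven Pareto points later reported in Table 9.8 with ε_t = 1 min, the loop recovers exactly those points.

```python
def epsilon_constraint_frontier(candidates, eps_t):
    """Sketch of the Step 1-4 loop: repeatedly pick the minimum-distance
    solution subject to a travel-time cap that tightens by eps_t each
    iteration, mirroring constraints (9.36) through (9.38)."""
    time_star = min(t for _, t in candidates)            # time* from Step 1
    frontier, cap = [], float("inf")
    while True:
        feas = [(d, t) for d, t in candidates if t <= cap]
        d_best = min(d for d, _ in feas)                 # min total distance
        t_best = min(t for d, t in feas if d == d_best)  # its min travel time
        frontier.append((d_best, t_best))                # gamma^1_iter
        if t_best <= time_star:                          # Steps 3-4: stop
            return frontier
        cap = max(t_best - eps_t, time_star)             # (9.36)/(9.37) cap

# The seven (distance, time) pairs of Table 9.8, with eps_t = 1 min.
pairs = [(10.54, 114.6), (10.77, 111.4), (11.42, 107.6), (11.72, 105.0),
         (11.95, 101.8), (12.60, 100.2), (12.83, 97.0)]
print(epsilon_constraint_frontier(pairs, 1.0) == pairs)  # True
```

Because the cap decreases by at least ε_t per iteration until time* is reached, the loop is guaranteed to terminate, exactly as in Step 4.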
FIGURE 9.1
Chennai road network topology: a sample (not drawn to scale). Note: origin and destination
nodes are denoted by node 1 and node 33, respectively.
9 p.m.), and low-traffic interval 4 (9 p.m. to 7.30 a.m.) in this study. We use
0.6 (i.e., 0.6 × FFT) as the multiplication factor for the low traffic and 1.8
(i.e., 1.8 × FFT) as the multiplication factor for the high traffic. The wait-
ing times with respect to normal-traffic interval at nodes are tabulated in
Table 9.3, and for other traffic levels, waiting times are generated using
these multiplication factors. Details of one-way traffic are shown in Table
9.4. Interested readers can obtain the time data from the authors at the
Indian Institute of Technology Madras.
TABLE 9.1
Distance between Nodes (in km): Chennai Road Network Topology (Sample)
(33 × 33 origin-destination distance matrix over nodes 1 through 33; entries not reproduced here.)

TABLE 9.2
Free-Flow Travel Time (FFT in Minutes): Chennai Road Network Topology (Sample)
(33 × 33 origin-destination free-flow travel-time matrix over nodes 1 through 33; entries not reproduced here.)
TABLE 9.3
Waiting Time (at Nodes/Traffic Junctions) of Chennai Road Network Topology

Node i:              1      2      3      4      5      6      7      8      9
Waiting time (min):  1.700  1.700  1.700  1.700  1.700  1.020  1.700  1.700  1.700

Node i:              10     11     12     13     14     15     16     17     18
Waiting time (min):  1.700  1.700  1.700  2.380  2.380  2.380  1.020  1.700  1.700

Node i:              19     20     21     22     23     24     25     26     27
Waiting time (min):  2.380  2.380  1.700  1.020  1.700  1.700  1.700  1.020  1.700

Node i:              28     29     30     31     32     33
Waiting time (min):  1.020  1.700  1.700  1.020  1.700  1.020
TABLE 9.4
One-Way Traffic along the Road (i,j) of Chennai Road Network Topology
One-Way Traffic along the Road (i,j) (with No Traffic
Road (i,j) Allowed along the Road (j,i)) during the Period(s)
(2,3) (7.30 a.m. to 11 a.m.); (11 a.m. to 5 p.m.); (5 p.m. to 9 p.m.).
(11,13) (7.30 a.m. to 11 a.m.); (5 p.m. to 9 p.m.).
(17,22) (7.30 a.m. to 11 a.m.); (5 p.m. to 9 p.m.).
(31,32) (7.30 a.m. to 11 a.m.); (5 p.m. to 9 p.m.).
Legend: Road (i,j) is blocked in the particular traffic interval.
TABLE 9.5
Optimal Solution with Respect to Minimum Total Distance, and the
Corresponding Travel Times, and Waiting Times; Travel Starts from Node 1
at 4 p.m.
Road (i,j)   Distance along the Road (km)   Time along the Road (min)   Waiting Time at the Junction (min)
(1,4) 0.28 0.764 1.020
(4,10) 0.30 0.818 1.700
(10,9) 0.28 0.764 1.700
(9,11) 0.39 1.064 1.700
(11,13) 0.91 2.482 2.380
(13,19) 0.68 1.855 2.380
(19,20) 0.55 1.500 2.380
(20,25) 0.11 0.300 1.700
(25,24) 0.17 0.464 1.700
(24,23) 0.21 0.573 1.700
(23,33) 0.69 1.882 0.000
Total distance = 4.57 km Total travel time = 30.826 min
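The totals in Table 9.5 can be checked directly from the per-leg values (a quick verification of ours, not part of the chapter):

```python
# Per-leg values of the minimum-distance route in Table 9.5:
# (road, distance km, travel time min, waiting time min).
legs = [
    ((1, 4), 0.28, 0.764, 1.020), ((4, 10), 0.30, 0.818, 1.700),
    ((10, 9), 0.28, 0.764, 1.700), ((9, 11), 0.39, 1.064, 1.700),
    ((11, 13), 0.91, 2.482, 2.380), ((13, 19), 0.68, 1.855, 2.380),
    ((19, 20), 0.55, 1.500, 2.380), ((20, 25), 0.11, 0.300, 1.700),
    ((25, 24), 0.17, 0.464, 1.700), ((24, 23), 0.21, 0.573, 1.700),
    ((23, 33), 0.69, 1.882, 0.000),
]
total_km = round(sum(km for _, km, _, _ in legs), 2)
total_min = round(sum(t + w for _, _, t, w in legs), 3)
print(total_km, total_min)  # 4.57 30.826
```

Note that the total travel time of 30.826 min includes the waiting times at the intermediate junctions, with zero wait at the destination node 33, consistent with constraint (9.13).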
FIGURE 9.2
Travel starts from node 1 at 4 p.m. without changeover (with respect to traffic conditions) of
travel time and waiting time (total distance = 4.57 km and total time covered = 30.826 min).
Note: optimal route (with respect to minimum total distance and the corresponding total time
covered/taken): {1–4–10–9–11–13–19–20–25–24–23–33}. Legend (distance, travel time): for exam-
ple (0.28, 0.764) in the above figure corresponds to distance and travel time from node 1 to node
4, and rest of the values follow suit.
TABLE 9.6
Travel Starts from Node 1 at 4.50 p.m.; Traffic Level Changes from 5 p.m. Onwards
Road (i,j)   Distance along the Road (km)   Time along the Road (min)   Waiting Time at the Junction (min)
(1,4) 0.28 0.764 1.020
(4,10) 0.30 0.818 1.700
(10,9) 0.28 0.764 1.700
(9,11) 0.39 1.064 1.700
(11,13) 0.91 2.482 4.280
(13,19) 0.68 3.339 4.280
(19,20) 0.55 2.700 4.280
(20,25) 0.11 0.540 3.060
(25,24) 0.17 0.835 3.060
(24,23) 0.21 1.031 3.060
(23,33) 0.69 3.388 0.000
Total distance = 4.57 km Total travel time = 45.865 min
FIGURE 9.3
Travel starts from node 1 at 4.50 p.m. with changeover of travel time and waiting time from
normal traffic period (up to 5 p.m.) to heavy traffic period (from 5 p.m. onwards) (total dis-
tance = 4.57 km and total time covered = 45.865 min). Note: optimal route (with respect to
minimum total distance and the corresponding total time covered/taken): {1–4–10–9–11–13–
19–20–25–24–23–33}. Legend (distance, travel time): For example (0.28, 0.764) in the above figure
corresponds to distance and travel time from node 1 to node 4, and rest of the values follow suit.
9.7 Summary
We propose a mathematical model for the shortest path problem considering multiple conflicting objectives, namely, minimizing the total distance covered and the total time taken, thereby bringing the problem closer to reality. We also consider real-life aspects such as time-dependent dynamic and deterministic travel times, time-dependent dynamic and deterministic waiting times, and
TABLE 9.7
Travel Starts from Node 1 at 4.50 p.m., with the One-Way Traffic Regulation
from 5 p.m. Onwards
Road (i,j)   Distance along the Road (km)   Time along the Road (min)   Waiting Time at the Junction (min)
(1,4) 0.28 0.764 1.020
(4,10) 0.30 0.818 1.700
(10,9) 0.28 0.764 1.700
(9,11) 0.39 1.064 1.700
(11,6) 0.14 0.688 1.840
(6,5) 0.39 1.915 3.060
(5,12) 0.62 2.480 3.060
(12,14) 0.62 2.480 4.280
(14,18) 0.73 2.405 3.060
(18,19) 0.96 4.712 4.280
(19,20) 0.55 2.700 4.280
(20,25) 0.11 0.540 3.060
(25,24) 0.17 0.835 3.060
(24,23) 0.21 1.031 3.060
(23,33) 0.69 3.388 0.000
Total distance = 6.20 km Total travel time = 65.744 min
Note: Due to the one-way traffic being operational from 5 p.m. onwards with respect to road
(11,13), a new route starting from (11,6) is generated.
FIGURE 9.4
Travel starts from node 1 at 4.50 p.m. with changeover of travel time and waiting time from the normal-traffic to the high-traffic period; the one-way regulation along (13,11) is operational, so no traffic is allowed along (11,13) between 5 p.m. and 9 p.m. (total distance = 6.20 km and total time covered = 65.744 min). Note: optimal route (with respect to minimum total distance and the corresponding total time covered/taken): {1–4–10–9–11–6–5–12–14–18–19–20–25–24–23–33}. Legend (distance, travel time): for example, (0.28, 0.764) in the above figure corresponds to the distance and travel time from node 1 to node 4, and the rest of the values follow suit.
Acknowledgment
The authors gratefully acknowledge the support from the Centre of Excellence
in Urban Transport at the Indian Institute of Technology Madras, which is
funded by the Ministry of Urban Development, Government of India. The
first author acknowledges the support from the DAAD for carrying out a
portion of this work at the University of Duisburg-Essen. The authors are
thankful to the reviewers and the editors for their valuable comments and
suggestions to improve our chapter.
FIGURE 9.5
An optimal solution with respect to the objective function (minimizing total distance) for the
travel from IIT Madras to Chennai Central Station.
FIGURE 9.6
An optimal solution with respect to the objective function (minimizing total travel time) for
the travel from IIT Madras to Chennai Central Station.
TABLE 9.8
A Set of Strictly Pareto-Optimal Solutions for Travel from IIT Madras to Chennai
Central Station, Given εt = 1 min
Distance (km) 10.54 10.77 11.42 11.72 11.95 12.60 12.83
Time (min) 114.6 111.4 107.6 105.0 101.8 100.2 97.0
FIGURE 9.7
Nondominated solutions (heuristically) for the travel from IIT Madras to Chennai Central
Station, given εt = 1 min.
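That the seven Table 9.8 solutions are strictly mutually nondominated can be verified with a short dominance check (our sketch, not part of the chapter):

```python
def dominates(a, b):
    """A (distance, time) pair a dominates b if it is no worse in both
    objectives and strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

# The seven Pareto points of Table 9.8.
solutions = [(10.54, 114.6), (10.77, 111.4), (11.42, 107.6), (11.72, 105.0),
             (11.95, 101.8), (12.60, 100.2), (12.83, 97.0)]

assert not any(dominates(a, b) for a in solutions for b in solutions)
print("all", len(solutions), "solutions are mutually nondominated")
```

Each point trades extra distance for less time, so no solution improves on another in both objectives, which is exactly the strict Pareto-optimality the algorithm of Section 9.4.3 targets.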
References
Ahuja, R., Magnanti, T. and Orlin, J. 1993. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, Englewood Cliffs, NJ.
Bellman, R.E. 1958. On a routing problem. Quarterly of Applied Mathematics 16: 87–90.
Bertsekas, D. 1991. Linear Network Optimization: Algorithms and Codes. M.I.T. Press, Cambridge, MA.
Brumbaugh-Smith, J. and Shier, D. 1989. An empirical investigation of some bicri-
terion shortest path algorithms. European Journal of Operational Research 43:
216–224.
Chabini, I. 1998. Discrete dynamic shortest path problems in transportation applications. Transportation Research Record 1645: 170–175.
Chitra, C. and Subbaraj, P. 2010. A non-dominated sorting genetic algorithm for
shortest path routing problem. International Journal of Electrical and Computer
Engineering 5: 53–63.
Dell’Amico, M., Iori, M. and Pretolani, D. 2008. Shortest paths in piecewise continuous time-dependent networks. Operations Research Letters 36: 688–691.
Dreyfus, S. 1969. An appraisal of some shortest-path algorithms. Operations Research
17: 395–412.
Ehrgott, M. and Wiecek, M. 2005. Multiobjective programming. In: J. Figueira, S.
Greco, and M. Ehrgott, editors, Multicriteria Decision Analysis: State of the Art
Survey, Springer Verlag, Boston, Dordrecht, London: 667–722.
Hamacher, H.W., Ruzika, S. and Tjandra, S.A. 2006. Algorithms for time-dependent
bicriteria shortest path problems. Discrete Optimization 3: 238–254.
Mohamed, C., Bassem, J. and Taicir, L. 2010. A genetic algorithm to solve the bicrite-
ria Shortest Path Problem. Electronic Notes in Discrete Mathematics 36: 851–858.
Müller-Hannemann, M. and Schnee, M. 2007. Finding all attractive train connec-
tions by multi-criteria Pareto search. In: F. Geraets, L. Kroon, A. Schoebel,
D. Wagner, and C. Zaroliagis, editors, Algorithmic Methods and Models for
Railways Optimization, Springer, Berlin, Heidelberg, 4359: 246–263.
Ng, M.W. and Waller, S.T. 2010. A computationally efficient methodology to char-
acterize travel time reliability using the fast Fourier transform. Transportation
Research Part B 44: 1202–1219.
Prakash, A. and Srinivasan, K.K. 2014. Sample-based algorithm to determine minimum robust cost path with correlated link travel times. Transportation Research Record: Journal of the Transportation Research Board 2467: 110–119.
Ruzika, S. and Wiecek, M.M. 2005. Approximation methods in multi-objective
programming. Journal of Optimization Theory and Applications 126: 473–501.
Schrijver, A. 2003. Combinatorial Optimization: Polyhedra and Efficiency, Springer
Verlag, Berlin.
Seshadri, R. and Srinivasan, K.K. 2010. Algorithm for determining most reliable
travel time path on network with normally distributed and correlated link
travel times. Transportation Research Record: Journal of the Transportation Research
Board 2196: 83–92.
Seshadri, R. and Srinivasan, K.K. 2012. An algorithm for the minimum robust cost
path on networks with random and correlated link travel times. In: D. M.
Levinson, H. X. Liu, and M. Bell, editors, Network Reliability in Practice, Springer,
New York: 171–208.
Sung, K., Bell, M.G.H., Seong, M. and Park, S. 2000. Shortest paths in a network with time-dependent flow speeds. European Journal of Operational Research 121: 32–39.
Tiwari, A. 2016. Multi-objective convoy routing problem: Exact and heuristic
methods, Unpublished MS thesis, IIT Madras, Chennai, India.
Wardell, W. and Ziliaskopoulos, A. 2000. An intermodal optimum path algorithm for
dynamic multimodal networks. European Journal of Operational Research 125:
486–502.
Ziliaskopoulos, A. and Mahmassani, H. 1993. Time-dependent shortest path algo-
rithm for real-time intelligent vehicle/highway system. Transportation Research
Record 1408: 94–104.
10
Designing Resilient Global Supply Chain
Networks over Multiple Time Periods within
Complex International Environments
Rodolfo C. Portillo
CONTENTS
10.1 Introduction................................................................................................. 243
10.2 Literature Review....................................................................................... 246
10.3 Problem Description................................................................................... 247
10.4 Model Features............................................................................................ 248
10.4.1 Multiperiod Model Features......................................................... 250
10.4.1.1 Notation............................................................................. 250
10.4.1.2 Variables............................................................................ 253
10.4.2 Objective Function..........................................................................254
10.4.3 Model Scaling..................................................................................254
10.4.4 Set of Goal Constraints.................................................................. 255
10.4.5 Set of Constraints............................................................................ 257
10.4.6 Objective Function Considering Currency Exchange Rates....... 260
10.5 Data Collection............................................................................................ 262
10.6 Case Study................................................................................................... 262
10.7 Conclusions and Future Research............................................................ 265
References.............................................................................................................. 266
10.1 Introduction
With increased globalization, as stated by Friedman (2005), “In this world, a
smart and fast global supply chain is becoming one of the most important
ways for a company to distinguish itself from its competitors.” Traditionally, the objective of every supply chain has been to maximize overall value by reducing procurement cost, increasing responsiveness to customers, and decreasing risk. The big change now is that global supply chain management involves a myriad of the company's worldwide interests, customers, and suppliers rather than just a domestic perspective. Besides the
financial aspects, companies now deal with a plethora of other factors when
doing business abroad. Within this environment, as part of the company’s
strategy to manage its global supply chain, it must make decisions such as its
overall sourcing plan, supplier selection, capacity and location of facilities,
modes of transportation, etc. The emphasis of this research is on develop-
ing mathematical models to determine optimal supply chain designs that
best support competitive strategies. Multicriteria mixed-integer linear pro-
gramming models were developed to aid in a multiple echelon supply chain
design. This work also includes the definition of a set of design selection
criteria integrating financial, customer service, risk, and strategic factors.
A supply chain consists of (1) a series of physical entities (e.g., suppliers,
plants, warehouses, and retailers) and (2) a coordinated set of activities
concerned with the procurement of raw material and parts, production of
intermediate and final products, and their distribution to the customers
(Ravindran and Warsing 2013). The various decisions involved in manag-
ing a supply chain can be grouped into three types: strategic, tactical, and
operational. Strategic decisions deal primarily with the design of the sup-
ply chain network, namely, the number and location of plants and ware-
houses and their respective capacities. They are made over a longer time
horizon and have a significant impact with respect to the company’s assets
and resources, such as opening, expanding, closing, and downsizing facili-
ties. Tactical decisions are primarily of a planning nature and made over a
horizon of one or two years. They involve purchasing, aggregate production
planning, inventory management, and distribution decisions. Finally, opera-
tional decisions are short term and made on a daily or weekly basis, such as
setting customer delivery and weekly production schedules as well as inven-
tory replenishment.
Optimal supply chain design needs to balance multiple conflicting objectives, such as efficiency in terms of costs and profitability as well as speed to source, produce, and distribute products to customers. Resiliency is also an
important objective and is measured in terms of the reliability of the supply
chain network when there are disruptions to the supply chain. The case study
presented in this chapter addresses strategic and tactical decisions in design-
ing and managing an agile global supply chain considering the effect of mul-
tiple foreign currency exchange rates over multiple periods of time.
Decision makers often need to consider multiple criteria in order to deter-
mine the best course of action to solve a particular problem. The relationship
among these decision criteria can be conflicting, which implies that trade-
offs need to be considered and carefully evaluated. The search for an opti-
mal solution for a multiobjective problem becomes a simultaneous process of
optimizing two or more conflicting objectives. Refer to Ravindran (2008) and
Ravindran et al. (2006) for further reading on multiple objective optimization
methods.
As described by Masud and Ravindran (2008), a multicriteria decision-
making problem in general can be represented as follows:
$$\max_{x \in X} \; F(x) = \{f_1(x), f_2(x), \ldots, f_K(x)\},$$

where X is the set of feasible solutions and f_1, …, f_K are the K criterion functions.
logistics aspects, while most models developed thus far have focused primarily on single-criterion financial measures. A vast majority have addressed only portions of the supply chain. Moreover, only a few have incorporated multinational
and global criteria. Research for supply chain design started early on with
a model developed for Hunt-Wesson Foods (Geoffrion and Graves 1974).
After a decade, a system was implemented for Nabisco Foods, Inc. (Brown
et al. 1987). Several applications followed for a petrochemical company (Van
Roy and Wolsey 1985), Libbey–Owens–Ford (Martin et al. 1993), and Ault Foods (Pooley 1994). A comprehensive global supply chain model (GSCM)
was applied to Digital Equipment Corporation (Arntzen et al. 1995). Later,
a supply chain restructuring was supported at Procter & Gamble using
mathematical optimization models (Camm et al. 1997). Other large-scale comprehensive models were implemented at Caterpillar (Rao et al. 2000), an agrochemicals company (Sousa et al. 2008), and a global chemicals firm (Tsiakis and Papageorgiou 2008). All, except for the one at Digital Equipment Corporation (Arntzen et al. 1995), were established for optimizing either costs or profitability alone. Surprisingly, despite the multiobjective nature of the problem, very little work has been devoted to multicriteria techniques. The relationship among criteria can be conflicting, implying trade-offs, so the search for optimality becomes a simultaneous process. An early multicriteria approach was presented at Netherlands
Car BV (Ashayeri and Rongen 1987). Analytic hierarchy process (AHP) and
ranking methods for facility relocation were also proposed for solving sup-
ply chain design problems (Melachrinoudis and Min 1999). These authors
later included two weighted objectives within a single nonpreemptive Goal Programming (GP) objective (Melachrinoudis et al. 2000). An AHP for optimizing the strategic importance of customers and their related risks was also
proposed (Korpela et al. 2002). More recently, a bicriteria nonlinear stochastic
optimization model was presented to determine the best supply chain (Gaur
and Ravindran 2006). Afterwards, a multiobjective model was presented for
solving a vendor selection problem (Wadhwa and Ravindran 2007). Arguably,
most attention has been paid to methodologies that break the problem into
pieces and simplify the inherent complexity of the supply chain.
Customer service level is measured using two factors: (1) demand fulfillment
and (2) speed of delivery. Demand fulfillment is defined as the portion of the
customer demand that is satisfied, namely the quantity that is effectively deliv-
ered to the customers. The ability to completely fulfill customer demand is
modeled as a goal constraint by specifying demand fulfillment targets for all
the combinations of products and customer zones. Speed of delivery is mea-
sured in terms of the lead time to deliver the products to the customers. This is
also modeled as a goal, by minimizing the quantity weighted lead time, based
on volume and the respective delivery lead times.* Weighted lead time targets,
for each customer zone, are explicitly considered in the GP model. In addition,
the multicriteria model considers the minimization of risk associated with
supply chain disruptions. Different measures of risk for domestic and global
sourcing are estimated for each manufacturing, converting, and distribution
location. These measures incorporate facility- and country-specific risk factors. Facility-specific risk factors are determined based on assessments performed by the decision makers. Country-specific risk factors are obtained by
considering the weighted average cost of capital rates for each country. The
objective of minimizing the risk measure is also modeled as a goal constraint
by setting the overall risk target value for the entire supply chain. Decisions
related to supply chain network design may also require the modeler to con-
sider strategic factors to open new markets, to increase market share, and to
strengthen relationships with customers. This model includes measures for
strategic factors for each facility, based on the ratings provided by the deci-
sion makers. A goal constraint is set to achieve the maximum possible overall
strategic measure for the entire supply chain network.
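As a concrete sketch of the goal-constraint mechanics described above, the fragment below expresses a single demand-fulfillment goal with under- and overachievement deviation variables and minimizes the underachievement. It is a minimal illustration, not the chapter's model; the demand figure, capacity bound, and solver choice are assumptions:

```python
from scipy.optimize import linprog

# Toy demand-fulfillment goal: choose shipment x against a demand target
# D = 100 when capacity limits x to 80.  Scaled goal constraint:
#     x / D + d_minus - d_plus = 1
# Minimizing d_minus (underachievement) mirrors a single GP goal.
D = 100.0
c = [0.0, 1.0, 0.0]                         # cost on d_minus only
A_eq = [[1.0 / D, 1.0, -1.0]]               # x/D + d_minus - d_plus = 1
b_eq = [1.0]
bounds = [(0.0, 80.0), (0.0, None), (0.0, None)]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
x, d_minus, d_plus = res.x                  # ships full capacity; 20% shortfall
```

In the full model, one such constraint exists for every product and customer-zone combination, and the deviation variables feed the weighted objective.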
Among other features, the model allows the evaluation of outsourcing
decisions as well as the consideration of different product mix and corre-
sponding productivity rates on different production lines and at different
locations. The model supports both strategic and tactical decisions.
On the strategic side, the focus is on the design of the supply chain net-
work, in which the optimization model determines the facilities that need
to be opened and their locations, as well as the facilities that negatively
affect profitability and therefore need to be closed. In the case where cur-
rent network capacity (measured in product units) is not sufficient to fulfill
customers’ demand, the model provides for manufacturing and distribution
decisions, evaluating where and how capacity should be expanded or outsourced. Note that, when combining products with significantly different specifications, a common standard unit of measure, such as weight (e.g., tons) or volume (e.g., cubic meters), may be defined within the enterprise for measuring capacity. Also, the ability to perform analysis at the production line
level facilitates decisions associated with the transfer of equipment among
facilities. Moreover, strategic decisions related to technological changes are
* Corresponding to each arc of the supply chain network that links a facility (plant/DC) to a
customer zone.
supported by the model, such as what technologies are more convenient for
the required expansions, or what specific equipment should be considered
for write-off and replacement. The model also assists in tactical decisions,
such as customer zone assignments to the DCs, the development of high-level
production and distribution plans, product allocation to specific equipment,
and cross-sourcing among production facilities. The objective then becomes
the minimization of the deviations from the specified criteria targets: profit,
demand fulfillment, lead time, disruption risk, and strategic factors.
A detailed description of the additions to the multicriteria base model (Portillo 2016) is provided in the following sections, together with some additional mathematical notation.
10.4.1.1 Notation
10.4.1.1.1 Index Sets
Index m is included to represent time periods that can be defined depend-
ing on the level of analysis required as months, quarters, years, or others.
Index o is added to represent the starting period for specific design alterna-
tives. By using the latter, it is possible, for example, to set different start-up
periods for a production line for which the model will determine the optimal
solution by considering the trade-offs from productivity learning curves,
different sourcing options, and machine installation and operating costs. In
this case, a particular production line under evaluation will have multiple
production capacity and operating costs in a specific time period depending
on when it started operations.
10.4.1.1.2 Parameters
This model considers a breakdown of customer demand as well as production and distribution capacities, balancing them in each period and optimizing the supply chain design and flows accordingly. Different values may be applied to sales prices, raw material and machine variable costs, transfer prices, duties, and freight rates over time, allowing for analysis of the impact that changes in these parameters have on the network, for example by creating cross-sourcing opportunities and improving gross profit. These parameters are:
10.4.1.2 Variables
Similarly, the variables corresponding to the manufacturing and converting
supply chain echelons include the m and o indices indicating the correspond-
ing flow between a pair of nodes at period m if the facility or production line
is opened at period o.
$x^{(1)}_{hiptmo}$ = production of category p at production line t at manufacturing facility h sent to converting facility i in time period m if the production line is opened at period o

$x^{(2)}_{hkptmo}$ = production of category p at production line t at manufacturing facility h sent directly to customer zone k in time period m if the production line is opened at period o
The binary variables include only index o, so multiple binary variables will exist for a particular facility or production line when there are different start-up period options. Index m is not included, since the objective is to decide whether or not to open or close an asset and, if so, to determine the optimal time to perform the corresponding action.
Nonpreemptive

$$z = w_1 d_1^- + \frac{w_2}{|MKS|}\sum_{k \in MKS} d_{2k}^+ + w_3 d_3^+ + w_4 d_4^- + \frac{w_5}{|MKS|\,|PRS|\,|PDS|}\sum_{m \in PDS}\sum_{k \in MKS}\sum_{p \in PRS} d_{5kpm}^-. \quad (10.1)$$

Preemptive

$$z = P_1 d_1^- + \frac{P_2}{|MKS|}\sum_{k \in MKS} d_{2k}^+ + P_3 d_3^+ + P_4 d_4^- + \frac{P_5}{|MKS|\,|PRS|\,|PDS|}\sum_{m \in PDS}\sum_{k \in MKS}\sum_{p \in PRS} d_{5kpm}^-. \quad (10.2)$$
The goal criteria below can be given in different units of measure and can vary significantly in magnitude: for example, gross profit can be given in millions of currency units, lead time in single- or at most double-digit numbers of days, demand fulfillment as a percentage, and risk and strategy as single-digit measures. The formulation incorporates each goal target for scaling purposes, ensuring that the deviation variable values satisfy 0 ≤ d ≤ 1. In the nonpreemptive model, the objective function minimizes the weighted sum of the scaled deviation variables. For the preemptive version of the model, scaling is not necessary; consequently, the goal constraints incorporate the target values on the right-hand side of the equation, allowing the deviation variables to take values in the base unit of measure of each goal (currency, days, percentage points, and risk and strategy units). In this case, the model sequentially optimizes each of the goals based on their priority.
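To make the scaling idea concrete, the sketch below scores candidate designs with the nonpreemptive rule: each deviation is scaled by its goal target so that heterogeneous units (currency, days, percentages) become comparable before weighting. All targets, weights, and design data are invented for illustration:

```python
# Nonpreemptive GP scoring sketch: each goal deviation is scaled by its
# target so heterogeneous units become comparable, then weighted.
# Targets, weights, and candidate designs are illustrative assumptions.
targets = {"profit": 10.0e6, "lead_time": 5.0, "fulfillment": 1.0, "risk": 2.0}
weights = {"profit": 0.4, "lead_time": 0.2, "fulfillment": 0.3, "risk": 0.1}

def scaled_deviation(goal, achieved):
    gap = (targets[goal] - achieved) / targets[goal]
    if goal in ("lead_time", "risk"):   # lower is better: penalize overshoot
        gap = -gap
    return max(0.0, gap)                # only the unwanted direction counts

def score(design):
    return sum(w * scaled_deviation(g, design[g]) for g, w in weights.items())

designs = {
    "regional_plant": {"profit": 9.5e6, "lead_time": 6.0,
                       "fulfillment": 0.98, "risk": 1.8},
    "local_dc": {"profit": 8.8e6, "lead_time": 4.0,
                 "fulfillment": 1.00, "risk": 1.5},
}
best = min(designs, key=lambda name: score(designs[name]))
```

In a preemptive run, by contrast, the goals would be optimized one priority level at a time rather than blended into a single weighted score.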
$$\frac{1}{T_{\mu}^{(1)}}\Bigg[\sum_{k \in MKS}\sum_{p \in PRSK_k}\sum_{j \in DCSK_k}\sum_{m \in PDS} z^{(2)}_{jkpm}\,P_{kpm} + \sum_{k \in MKS}\sum_{p \in PRSK_k}\sum_{i \in CFSK_k}\sum_{t \in CTSC_i}\sum_{m \in PDS}\sum_{o \in ODS} y^{(2)}_{ikptmo}\,P_{kpm} \quad (10.3)$$

$$+ \sum_{k \in MKS}\sum_{p \in PRSK_k}\sum_{h \in MFSK_k}\sum_{t \in CTSM_h}\sum_{m \in PDS}\sum_{o \in ODS} x^{(2)}_{hkptmo}\,P_{kpm} \quad (10.4)$$

$$- \sum_{h \in MFS}\sum_{t \in CTSM_h}\sum_{m \in PDS}\sum_{o \in ODS} FMM_{htm}\,\gamma^{(1)}_{hto} - \sum_{i \in CFS}\sum_{t \in CTSC_i}\sum_{m \in PDS}\sum_{o \in ODS} FCM_{itm}\,\gamma^{(2)}_{ito} \quad (10.5)$$

$$- \sum_{m \in PDS}\sum_{h \in MFS}\sum_{p \in PRSM_h}\sum_{o \in ODS} MRC_{hpm}\Bigg(\sum_{i \in CFSM_h}\sum_{t \in CTSM_h} x^{(1)}_{hiptmo} + \sum_{k \in MKSM_h}\sum_{t \in CTSM_h} x^{(2)}_{hkptmo}\Bigg) \quad (10.6)$$

$$- \sum_{m \in PDS}\sum_{o \in ODS}\sum_{h \in MFS}\sum_{t \in CTSM_h}\sum_{p \in PRST_t} MMC_{tmo}\,MPT_{ptmo}\Bigg(\sum_{i \in CFSM_h} x^{(1)}_{hiptmo} + \sum_{k \in MKSM_h} x^{(2)}_{hkptmo}\Bigg) \quad (10.7)$$

$$- \sum_{i \in CFS}\sum_{p \in PRSC_i}\sum_{m \in PDS} CRC_{ipm}\sum_{t \in CTSC_i}\sum_{o \in ODS}\Bigg(\sum_{j \in DCSC_i} y^{(1)}_{ijptmo} + \sum_{k \in MKSC_i} y^{(2)}_{ikptmo}\Bigg) \quad (10.8)$$

$$- \sum_{m \in PDS}\sum_{o \in ODS}\sum_{i \in CFS}\sum_{t \in CTSC_i}\sum_{p \in PRST_t} CMC_{tmo}\,CPT_{ptmo}\Bigg(\sum_{j \in DCSC_i} y^{(1)}_{ijptmo} + \sum_{k \in MKSC_i} y^{(2)}_{ikptmo}\Bigg) \quad (10.9)$$

$$- \sum_{m \in PDS}\sum_{j \in DCS} VOC_j\Bigg(\sum_{l \in DCSC_j}\sum_{p \in PRSD_l} z^{(1)}_{jlpm}\,DF^{(2)}_{jlp} + \sum_{k \in MKSD_j}\sum_{p \in PRSK_k} z^{(2)}_{jkpm}\,DF^{(2)}_{jkp}\Bigg) \quad (10.10)$$

$$- \sum_{m \in PDS}\sum_{h \in MFS}\sum_{i \in CFSM_h}\sum_{p \in PRSC_i}\sum_{t \in CTSC_h}\sum_{o \in ODS} T^{(2)}_{hipm}\,x^{(1)}_{hiptmo} - \sum_{m \in PDS}\sum_{i \in CFS}\sum_{j \in DCSC_i}\sum_{p \in PRSD_j}\sum_{t \in CTSC_i}\sum_{o \in ODS} T^{(4)}_{ijpm}\,y^{(1)}_{ijptmo} - \sum_{m \in PDS}\sum_{j \in DCS}\sum_{l \in DCSD_j}\sum_{p \in PRSD_l} T^{(6)}_{jlpm}\,z^{(1)}_{jlpm} \quad (10.11)$$

$$- \sum_{m \in PDS}\sum_{h \in MFS}\sum_{k \in MKSM_h}\sum_{p \in PRSK_k}\sum_{t \in CTSC_h}\sum_{o \in ODS} T^{(3)}_{hkpm}\,x^{(2)}_{hkptmo} - \sum_{m \in PDS}\sum_{i \in CFS}\sum_{k \in MKSC_i}\sum_{p \in PRSK_k}\sum_{t \in CTSC_i}\sum_{o \in ODS} T^{(5)}_{ikpm}\,y^{(2)}_{ikptmo} - \sum_{m \in PDS}\sum_{j \in DCS}\sum_{k \in MKSD_j}\sum_{p \in PRSK_k} T^{(7)}_{jkpm}\,z^{(2)}_{jkpm} \quad (10.12)$$

$$- \sum_{m \in PDS}\sum_{h \in MFS}\sum_{p \in PRSM_h}\sum_{i \in CFSM_h}\sum_{t \in CTSM_h}\sum_{o \in ODS} MUC_{hpm}\,I^{(2)}_{hipm}\,x^{(1)}_{hiptmo} - \sum_{m \in PDS}\sum_{i \in CFS}\sum_{p \in PRSC_i}\sum_{j \in DCSD_i}\sum_{t \in CTSC_i}\sum_{o \in ODS} CUC_{ipm}\,I^{(4)}_{ijpm}\,y^{(1)}_{ijptmo} - \sum_{m \in PDS}\sum_{j \in DCS}\sum_{p \in PRSD_j}\sum_{l \in DCSD_j} DUC_{jpm}\,I^{(6)}_{jlpm}\,z^{(1)}_{jlpm} \quad (10.13)$$

$$- \sum_{m \in PDS}\sum_{h \in MFS}\sum_{i \in CFSM_h}\sum_{p \in PRSC_i}\sum_{t \in CTSM_h}\sum_{o \in ODS} TI^{(2)}_{hipm}\,I^{(2)}_{hipm}\,x^{(1)}_{hiptmo} - \sum_{m \in PDS}\sum_{i \in CFS}\sum_{j \in DCSC_i}\sum_{p \in PRSD_j}\sum_{t \in CTSC_i}\sum_{o \in ODS} TI^{(4)}_{ijpm}\,I^{(4)}_{ijpm}\,y^{(1)}_{ijptmo} \quad (10.14)$$

$$- \sum_{m \in PDS}\sum_{j \in DCS}\sum_{l \in DCSD_j}\sum_{p \in PRSD_l} TI^{(6)}_{jlpm}\,I^{(6)}_{jlpm}\,z^{(1)}_{jlpm}\Bigg] + d_1^- - d_1^+ = 1. \quad (10.15)$$
$$\frac{1}{T_k^{(2)}}\;\frac{\displaystyle\sum_{i \in CFS} L^{(5)}_{ik}\sum_{o \in ODS}\sum_{m \in PDS}\sum_{p \in PRSK_k}\sum_{t \in CTSC_i} y^{(2)}_{ikptmo} + \sum_{j \in DCS} L^{(7)}_{jk}\sum_{m \in PDS}\sum_{p \in PRSK_k} z^{(2)}_{jkpm}}{\displaystyle\sum_{\substack{m \in PDS,\; p \in PRSK_k \\ D_{kpm} > 0}} D_{kpm}} + d_{2k}^- - d_{2k}^+ = 1 \quad \forall k \in MKS. \quad (10.16)$$
$$\frac{1}{T^{(3)}}\Bigg[\sum_{h \in MFS}\sum_{o \in ODS} R^{(1)}_h\,\delta^{(1)}_{ho} + \sum_{i \in CFS}\sum_{o \in ODS} R^{(2)}_i\,\delta^{(2)}_{io} + \sum_{j \in DCS}\sum_{o \in ODS} R^{(3)}_j\,\delta^{(3)}_{jo} + \sum_{h \in MFS}\sum_{o \in ODS} Rc^{(1)}_h\big(1 - \delta^{(1)}_{ho}\big) + \sum_{i \in CFS}\sum_{o \in ODS} Rc^{(2)}_i\big(1 - \delta^{(2)}_{io}\big) + \sum_{j \in DCS}\sum_{o \in ODS} Rc^{(3)}_j\big(1 - \delta^{(3)}_{jo}\big)\Bigg] + d_3^- - d_3^+ = 1. \quad (10.17)$$
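The risk goal can be read as a normalized sum of facility risk scores: open facilities contribute their operating risk rating, closed ones a residual rating, and the total is scaled by the risk target so the deviations stay near the [0, 1] range. A minimal numeric sketch, in which all facility names, ratings, and the target are hypothetical:

```python
# Risk-goal sketch: open facilities contribute operating risk R, closed ones
# residual risk Rc; the total is scaled by the risk target T3.  All names,
# ratings, and the target value are illustrative assumptions.
R = {"plant_A": 3.0, "plant_B": 5.0, "dc_C": 2.0}    # risk rating if opened
Rc = {"plant_A": 1.0, "plant_B": 0.5, "dc_C": 0.5}   # residual risk if closed

T3 = 6.0                                             # overall risk target

def risk_measure(open_facilities):
    total = sum(R[f] if f in open_facilities else Rc[f] for f in R)
    return total / T3

measure = risk_measure({"plant_A", "dc_C"})          # plant_B stays closed
d3_minus = max(0.0, 1.0 - measure)                   # risk below target
d3_plus = max(0.0, measure - 1.0)                    # excess risk over target
```

In the optimization model the open/closed status is of course decided by the binary variables rather than fixed in advance; the sketch only shows how a given design maps to the scaled risk measure.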
$$\frac{\displaystyle\sum_{h \in MFSK_k}\sum_{t \in CTSM_h}\sum_{o \in ODS} x^{(2)}_{hkptmo} + \sum_{i \in CFSK_k}\sum_{t \in CTSC_i}\sum_{o \in ODS} y^{(2)}_{ikptmo} + \sum_{j \in DCSK_k} z^{(2)}_{jkpm}}{D_{kpm}} + d_{5kpm}^- - d_{5kpm}^+ = 1 \quad \forall m \in PDS,\; k \in MKS,\; p \in PRSK_k. \quad (10.19)$$
$$\sum_{l \in DCSD_j}\sum_{p \in PRSD_l} z^{(1)}_{jlpm}\,DF^{(3)}_{jlp} + \sum_{k \in MKSD_j}\sum_{p \in PRSK_k} z^{(2)}_{jkpm}\,DF^{(2)}_{jkp} \le \sum_{o \in ODS} SC_{jmo}\,\delta^{(3)}_{jo} \quad \forall m,\; j \in DCS. \quad (10.20)$$
$$\sum_{h \in MFSP_p}\sum_{i \in CFSC_i}\sum_{t \in CTSM_h}\sum_{o \in ODS} DF^{(1)}_{ip}\,x^{(1)}_{hiptmo} = \sum_{i \in CFSF_p}\sum_{j \in DCSC_i}\sum_{t \in CTSC_i}\sum_{o \in ODS} y^{(1)}_{ijptmo} + \sum_{i \in CFSP_p}\sum_{k \in MKSC_p}\sum_{t \in CTSC_j}\sum_{o \in ODS} y^{(2)}_{ikptmo}, \quad \forall m,\; p \in PRSS. \quad (10.21)$$

$$DF^{(1)}_{ip}\sum_{h \in MFSC_i}\sum_{t \in CTSM_h}\sum_{o \in ODS} x^{(1)}_{hiptmo} - \sum_{j \in DCSC_i}\sum_{t \in CTSC_i}\sum_{o \in ODS} y^{(1)}_{ijptmo} - \sum_{k \in MKSC_i}\sum_{t \in CTSC_i}\sum_{o \in ODS} y^{(2)}_{ikptmo} = 0 \quad (10.22)$$

$$\sum_{i \in CFSP_p}\sum_{j \in DCSC_i}\sum_{t \in CTSC_i}\sum_{o \in ODS} y^{(1)}_{ijptmo} = \sum_{j \in DCSP_p}\sum_{l \in DCSD_j} z^{(1)}_{jlpm}, \quad \forall m,\; p \notin PRSS. \quad (10.23)$$
Balance at distribution

$$\sum_{i \in CFSD_j}\sum_{t \in CTSC_i}\sum_{o \in ODS} y^{(1)}_{ijptmo} + \sum_{l \in DCSA_j} z^{(1)}_{ljpm} = \sum_{l \in DCSD_j} z^{(1)}_{jlpm} + \sum_{k \in MKSD_j} z^{(2)}_{jkpm}, \quad \forall m,\; j,\; p \in PRSD_j \quad (10.24)$$

$$\sum_{l \in DCSA_j} z^{(1)}_{ljpm} = \sum_{l \in DCSD_j} z^{(1)}_{jlpm} + \sum_{k \in MKSD_j} z^{(2)}_{jkpm}, \quad \forall m,\; j \in DCSTOMARKETS,\; p \in PRSD_j. \quad (10.25)$$
Manufacturing capacity

$$\sum_{p \in PRST_t} MPT_{hptmo}\sum_{i \in CFSM_h} x^{(1)}_{hiptmo} + \sum_{p \in PRST_t} MPT_{hptmo}\sum_{k \in MKSM_h} x^{(2)}_{hkptmo} \le \sum_{o \in ODS} MC_{htmo}\,\gamma^{(1)}_{hto} \quad \forall m,\; o,\; h \in MFS,\; t \in CTSM_h. \quad (10.26)$$

Conversion capacity

$$\sum_{p \in PRST_t} CPT_{iptmo}\sum_{j \in DCSC_i} y^{(1)}_{ijptmo} + \sum_{p \in PRST_t} CPT_{iptmo}\sum_{k \in MKSC_i} y^{(2)}_{ikptmo} \le \sum_{o \in ODS} CC_{itmo}\,\gamma^{(2)}_{ito} \quad \forall m,\; o,\; i \in CFS,\; t \in CTSC_i. \quad (10.27)$$

$$\gamma^{(1)}_{hto} \le \delta^{(1)}_{ho}, \quad \forall o,\; h \in MFS,\; t \in CTSM_h \quad (10.28)$$
Distribution binary
Similarly, Equation 10.30 states a shipping capacity constraint, in cubic meters, for the flows out of distribution facility j, considering whether it is open or closed. The capacity of a distribution facility may vary across time periods m. The formulation also allows evaluating the opening of a distribution center at different time periods, considering that its capacity may differ depending on the opening period o.
$$\sum_{l \in DCSD_j}\sum_{p \in PRSD_l} DF^{(3)}_{jlp}\,z^{(1)}_{jlpm} + \sum_{k \in MKSD_j}\sum_{p \in PRSK_k} DF^{(2)}_{jkp}\,z^{(2)}_{jkpm} - \sum_{o \in ODS} SC_{jmo}\,\delta^{(3)}_{jo} \le 0 \quad (10.30)$$

$$\gamma^{(1)}_{ht'o} \le \gamma^{(1)}_{hto}, \quad \forall (h, o, t, t') \in MFSTT_{htt'} \quad (10.31)$$
$$\sum_{t} \gamma^{(1)}_{hto} \ge \delta^{(1)}_{ho} \quad \forall h,\; o. \quad (10.34)$$

In a similar way, the above equations ensure that at least one production line t is active in order to open a manufacturing facility h or converting facility i, applying this restriction to every possible opening period o for a production line or facility.
Time period binary

$$\sum_{o} \delta^{(1)}_{ho} \le 1 \quad \forall h \quad (10.35)$$

$$\sum_{o} \delta^{(2)}_{io} \le 1 \quad \forall i \quad (10.36)$$

$$\sum_{o} \delta^{(3)}_{jo} \le 1 \quad \forall j \quad (10.37)$$

$$\sum_{o} \gamma^{(1)}_{hto} \le 1 \quad \forall h,\; t \quad (10.38)$$

$$\sum_{o} \gamma^{(2)}_{ito} \le 1 \quad \forall i,\; t. \quad (10.39)$$

The above equations constrain a given facility or production line that has different start-up options to open in at most one of the time periods (i.e., it can open only once).
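The trade-off these single-opening constraints feed into can be illustrated by enumerating the allowed start-up periods for one production line, with capacity ramping up along a learning curve. All figures below (costs, margin, demand, ramp-up profile) are hypothetical:

```python
# One production line over a 4-period horizon.  Opening in period o incurs a
# fixed cost from o onward; capacity ramps up along a learning curve.
# All figures are illustrative assumptions, not case-study data.
periods = [1, 2, 3, 4]
fixed_cost = 100.0          # per period once the line is open
margin = 3.0                # contribution per unit produced
demand = 60.0               # sellable units per period

def capacity(age):          # ramp-up: 30 -> 50 -> 60 units as the line matures
    return [30.0, 50.0, 60.0, 60.0][min(age, 3)]

def profit_if_opened(o):
    total = 0.0
    for m in periods:
        if m >= o:          # the line exists only from its opening period o
            total += margin * min(demand, capacity(m - o)) - fixed_cost
    return total

best_o = max(periods, key=profit_if_opened)   # earliest opening wins here
```

In the full MILP this enumeration is implicit: one binary per candidate opening period, with the constraints above forcing at most one of them to 1.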
All decision variables are continuous and nonnegative, except for $\delta^{(1)}_{ho}$, $\gamma^{(1)}_{hto}$, $\delta^{(2)}_{io}$, $\gamma^{(2)}_{ito}$, and $\delta^{(3)}_{jo}$, which are binary.
10.4.6 Objective Function Considering Currency Exchange Rates

$$\frac{1}{T_{\mu}^{(1)}}\Bigg[\sum_{c \in COS_k}\sum_{k \in MKS}\sum_{p \in PRSK_k}\sum_{j \in DCSK_k}\sum_{m \in PDS} z^{(2)}_{jkpm}\,P_{kpm}\,E_{cm} + \sum_{c \in COS_k}\sum_{k \in MKS}\sum_{p \in PRSK_k}\sum_{i \in CFSK_k}\sum_{t \in CTSC_i}\sum_{m \in PDS}\sum_{o \in ODS} y^{(2)}_{ikptmo}\,P_{kpm}\,E_{cm} \quad (10.40)$$

$$+ \sum_{c \in COS_k}\sum_{k \in MKS}\sum_{p \in PRSK_k}\sum_{h \in MFSK_k}\sum_{t \in CTSM_h}\sum_{m \in PDS}\sum_{o \in ODS} x^{(2)}_{hkptmo}\,P_{kpm}\,E_{cm} \quad (10.41)$$

$$- \sum_{c \in COS_h}\sum_{m \in PDS}\sum_{h \in MFS}\sum_{p \in PRSM_h}\sum_{o \in ODS} E_{cm}\,MRC_{hpm}\Bigg(\sum_{i \in CFSM_h}\sum_{t \in CTSM_h} x^{(1)}_{hiptmo} + \sum_{k \in MKSM_h}\sum_{t \in CTSM_h} x^{(2)}_{hkptmo}\Bigg) \quad (10.43)$$

$$- \sum_{c \in COS_h}\sum_{m \in PDS}\sum_{o \in ODS}\sum_{h \in MFS}\sum_{t \in CTSM_h}\sum_{p \in PRST_t} E_{cm}\,MMC_{tmo}\,MPT_{ptmo}\Bigg(\sum_{i \in CFSM_h} x^{(1)}_{hiptmo} + \sum_{k \in MKSM_h} x^{(2)}_{hkptmo}\Bigg) \quad (10.44)$$

$$- \sum_{c \in COS_i}\sum_{i \in CFS}\sum_{p \in PRSC_i}\sum_{m \in PDS} E_{cm}\,CRC_{ipm}\sum_{t \in CTSC_i}\sum_{o \in ODS}\Bigg(\sum_{j \in DCSC_i} y^{(1)}_{ijptmo} + \sum_{k \in MKSC_i} y^{(2)}_{ikptmo}\Bigg) \quad (10.45)$$

$$- \sum_{c \in COS_i}\sum_{m \in PDS}\sum_{o \in ODS}\sum_{i \in CFS}\sum_{t \in CTSC_i}\sum_{p \in PRST_t} E_{cm}\,CMC_{tmo}\,CPT_{ptmo}\Bigg(\sum_{j \in DCSC_i} y^{(1)}_{ijptmo} + \sum_{k \in MKSC_i} y^{(2)}_{ikptmo}\Bigg) \quad (10.46)$$

$$- \sum_{c \in COS_j}\sum_{m \in PDS}\sum_{j \in DCS} E_{cm}\,VOC_j\Bigg(\sum_{l \in DCSC_j}\sum_{p \in PRSD_l} z^{(1)}_{jlpm}\,DF^{(2)}_{jlp} + \sum_{k \in MKSD_j}\sum_{p \in PRSK_k} z^{(2)}_{jkpm}\,DF^{(2)}_{jkp}\Bigg) \quad (10.47)$$

$$- \sum_{c \in COS_h}\sum_{m \in PDS}\sum_{h \in MFS}\sum_{i \in CFSM_h}\sum_{p \in PRSC_i}\sum_{t \in CTSC_h}\sum_{o \in ODS} E_{cm}\,T^{(2)}_{hipm}\,x^{(1)}_{hiptmo} - \sum_{c \in COS_i}\sum_{m \in PDS}\sum_{i \in CFS}\sum_{j \in DCSC_i}\sum_{p \in PRSD_j}\sum_{t \in CTSC_i}\sum_{o \in ODS} E_{cm}\,T^{(4)}_{ijpm}\,y^{(1)}_{ijptmo} - \sum_{c \in COS_j}\sum_{m \in PDS}\sum_{j \in DCS}\sum_{l \in DCSD_j}\sum_{p \in PRSD_l} E_{cm}\,T^{(6)}_{jlpm}\,z^{(1)}_{jlpm} \quad (10.48)$$

$$- \sum_{c \in COS_h}\sum_{m \in PDS}\sum_{h \in MFS}\sum_{k \in MKSM_h}\sum_{p \in PRSK_k}\sum_{t \in CTSC_h}\sum_{o \in ODS} E_{cm}\,T^{(3)}_{hkpm}\,x^{(2)}_{hkptmo} - \sum_{c \in COS_i}\sum_{m \in PDS}\sum_{i \in CFS}\sum_{k \in MKSC_i}\sum_{p \in PRSK_k}\sum_{t \in CTSC_i}\sum_{o \in ODS} E_{cm}\,T^{(5)}_{ikpm}\,y^{(2)}_{ikptmo} - \sum_{c \in COS_j}\sum_{m \in PDS}\sum_{j \in DCS}\sum_{k \in MKSD_j}\sum_{p \in PRSK_k} E_{cm}\,T^{(7)}_{jkpm}\,z^{(2)}_{jkpm} \quad (10.49)$$

$$- \sum_{c \in COS_h}\sum_{m \in PDS}\sum_{h \in MFS}\sum_{p \in PRSM_h}\sum_{i \in CFSM_h}\sum_{t \in CTSM_h}\sum_{o \in ODS} E_{cm}\,MUC_{hpm}\,I^{(2)}_{hipm}\,x^{(1)}_{hiptmo} - \sum_{c \in COS_i}\sum_{m \in PDS}\sum_{i \in CFS}\sum_{p \in PRSC_i}\sum_{j \in DCSD_i}\sum_{t \in CTSC_i}\sum_{o \in ODS} E_{cm}\,CUC_{ipm}\,I^{(4)}_{ijpm}\,y^{(1)}_{ijptmo} \quad (10.50)$$

$$- \sum_{c \in COS_j}\sum_{m \in PDS}\sum_{j \in DCS}\sum_{l \in DCSD_j}\sum_{p \in PRSD_l} E_{cm}\,TI^{(6)}_{jlpm}\,I^{(6)}_{jlpm}\,z^{(1)}_{jlpm}\Bigg] \quad (10.52)$$
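A toy calculation shows why the exchange-rate parameter E_cm matters in this objective: the same local-currency cost stream is worth fewer dollars when the currency depreciates, which lowers the dollar cost of producing (and exporting) from that country. The depreciation paths below are illustrative assumptions, not forecasts from the case study:

```python
# Dollar value of a constant local-currency cost stream under two currency
# paths.  E_cm here plays the role of dollars per unit of local currency in
# period m; the yearly depreciation rates are illustrative assumptions.
local_cost = 1000.0                      # per year, in local currency
years = range(4)

def rates(annual_depreciation):
    return [1.0 / (1.0 + annual_depreciation) ** m for m in years]

steep = rates(0.04)    # roughly 17% cumulative depreciation over 4 years
mild = rates(0.005)    # roughly 2% cumulative depreciation over 4 years

dollar_cost_steep = sum(local_cost * e for e in steep)
dollar_cost_mild = sum(local_cost * e for e in mild)
# A depreciating host currency makes local production cheaper in dollars,
# favoring exports from that country.
```

In the multiperiod model this effect is captured term by term, since each revenue and cost component is multiplied by the exchange rate of its own currency and period.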
10.6 Case Study

In the last three years, the sales growth for this business has been 21% per year. Recent customer demand forecasts indicate that, with the currently operating capacity plus a recent installation of one machine in a regional plant outside the country, this business unit will need to import from outside the region approximately 17% of the total volume over the next 5 years. The incremental cost of importing ranges from $12 to $22 per standard unit versus producing in the regional plant. The purpose is to determine the most efficient supply chain network configuration and sourcing plans that best support this expected growth. The analysis compares the status quo with three new scenarios, since a preliminary analysis suggests that the status quo is not viable for the long-term sustainability of the business: importing 17% of demand over the next 5 years entails supply disruption risk, besides incurring higher product costs.
The first scenario consists of adding one more machine in the regional
plant leveraging centralization efficiencies. The regional plant has four
machines currently running and a fifth machine that has been installed and
will soon begin operating. The proposal is to add a sixth machine. The sec-
ond scenario proposes installing a new machine at a local DC in a neighbor-
ing country. With this, relevant savings could be achieved from reductions
in freights and duties; however, additional fixed operating costs may become
necessary and operational efficiency in the local facility could be lower than
in the regional plant. Results of these scenarios illustrate how installing a
new machine, regardless of its location, significantly reduces the volume
of imports from outside the region, having a positive impact on the cost of
goods and reducing the overall risk of supply chain disruptions. On the one hand, installing the new machine in the regional plant instead of a newly opened local plant increases overall production volume, owing to a better productivity learning curve in the regional plant, and reduces imports from outside the region. On the other hand, installing it in a new local plant would reduce imports within the region but slightly increase imports from outside the region. Note that increasing production capacity in the regional plant implies that the local market would need to continue being sourced with goods imported from the regional plant, which carries a higher risk of supply chain disruption than producing locally.
The third scenario suggests installing the new machine in the local facility
and transferring the recently installed machine from the regional plant. In
this case, it is expected that the incremental fixed operating costs would get
diluted in the higher throughput and that better operating efficiency could
be reached. For Scenario 3, it is assumed that the new machines would have a faster learning curve and better productivity levels in the regional plant, leveraging its know-how and product-mix efficiency, because the higher number of machines would reduce the number of changeovers required to produce different product types. Installing the two machines in the local distribution center would still cost more than installing them in the currently operating regional plant. This is because existing overheads in the regional
plant get diluted in much higher production volume even if new machines
are not installed, compared with additional headcount and infrastructure
required in a new production local facility. Moreover, it is important to deter-
mine the best time to install the new machine. This can happen immediately
or the following year, balancing between machine utilization rate and the
associated incremental fixed operating costs. Any of these options would
allow sourcing within the region for all of the market requirements, so the
increased costs from imports outside the region could be reduced and sup-
ply chain risk of disruptions minimized. The objective is to maximize gross
profit and minimize risk. Results from the analysis indicate that install-
ing two machines locally would significantly reduce the imports out of the
region and almost eliminate the imports within the region, giving a very
high autonomy to the local market and therefore strongly minimizing the
risk of supply disruption.
In conclusion, the three new proposals generate savings compared with the current supply chain network design, requiring only between 2% and 4% of imports from outside the region. The scenario of adding one new machine to the regional plant generates more than twice the savings obtained from the other alternatives. In addition, its invested-capital payback period is at most half that of installing one or two assets locally. These results are driven by the fact that, when installing assets locally, the reductions in transportation and duties do not offset the higher production costs. The unit cost of producing locally is higher because of lower productivity and increased operating costs. Therefore, neither opening a new production facility locally nor continuing to import from outside the region is beneficial considering profits and risk. The best solution to support the expected business growth locally is to continue with a centralized strategy in the regional plant, moving forward with the newly installed machine plus adding a new one.
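The savings-versus-payback reasoning in this conclusion amounts to simple ratio arithmetic, sketched below with invented figures (the actual case-study data are not reproduced here):

```python
# Back-of-the-envelope scenario comparison; all investment and savings
# figures are invented for illustration and are not the case-study data.
scenarios = {
    "sixth_machine_regional": {"investment": 2.0e6, "annual_savings": 1.0e6},
    "one_machine_local": {"investment": 2.5e6, "annual_savings": 0.45e6},
    "two_machines_local": {"investment": 4.0e6, "annual_savings": 0.5e6},
}

def payback_years(s):
    return s["investment"] / s["annual_savings"]

best = min(scenarios, key=lambda name: payback_years(scenarios[name]))
# With these numbers the centralized option pays back fastest, in line with
# the chapter's conclusion that adding the machine to the regional plant wins.
```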
As a complementary analysis indicated, delaying the installation of the new machine in the regional plant by 1 year reduces its production volume. Considering the high cost of importing goods from outside the region, besides the operational convenience of this solution, the results indicate that it is economically favorable to put the new machine into operation, and begin amortizing it, in the first year.
The analysis above was conducted in dollars. A second round of analysis was performed in local currency. Given the expected monetary depreciation trend in the regional plant's host country and a more stable currency in the local one, the results reinforced the previous recommendation. The currency exchange rate of the country where the regional plant is located presents a steep depreciation forecast, reaching almost 16% accumulated over 4 years, while the exchange rate of the country proposed as the location for the new plant presents only a 2% depreciation projection within the same timeframe. The impact of such a currency depreciation difference provides a competitive cost advantage for exports considering that a
References
Arntzen, B. C., G. G. Brown, T. P. Harrison, and L. L. Trafton, 1995, Global supply chain management at Digital Equipment Corporation, Interfaces, 25, 69–93.
Ashayeri, J. and J. M. J. Rongen, 1987, Central distribution in Europe: A multi-criteria
approach to location selection, The International Journal of Logistics Management,
9(1), 97–106.
Brown, G. G., G. W. Graves, and M. D. Honczarenko, 1987, Design and operation of a
multi-commodity production/distribution system using primal goal decompo-
sition, Management Science, 33(11), 1469–1480.
Camm, J. D., T. E. Chorman, and F. A. Dull, 1997, Blending OR/MS judgment and GIS:
Restructuring P&G’s supply chain, Interfaces, 27, 128–142.
Friedman, T., 2005, The World is Flat: A Brief History of the Twenty-First Century,
Farrar, Straus & Giroux, New York.
Gaur, S. and A. R. Ravindran, 2006, A Bi-Criteria model for the inventory aggregation
problem under risk pooling, Computers & Industrial Engineering, 51, 482–501.
Geoffrion, A. and G. Graves, 1974, Multicommodity distribution system design by Benders decomposition, Management Science, 29, 822–844.
Kambatla, K., G. Kollias, V. Kumar, and A. Grama, 2014, Trends in big data analytics,
Journal of Parallel and Distributed Computing, 74(7), 2561–2573.
Korpela, J., K. Kyläheiko, and A. Lehmusvaara, 2002, An analytic approach to pro-
duction capacity allocation and supply chain design, International Journal of
Production Economics, 78, 187–195.
Martin, C. H., D. C. Dent, and J. C. Eckhart, 1993, Integrated production, distribution,
and inventory planning at Libbey–Owens–Ford, Interfaces, 23, 68–78.
Masud, A. M. and A. R. Ravindran, 2008, Multiple criteria decision making. In
Operations Research and Management Science Handbook, A. Ravi Ravindran
(Editor). CRC Press, Boca Raton, FL, 2008.
Melachrinoudis, E. and H. Min, 1999, The dynamic relocation and phase-out of a hybrid, two-echelon plant/warehousing facility: A multiple objective approach, European Journal of Operational Research, 123, 1–15.
Melachrinoudis, E., H. Min, and A. Messac, 2000, The relocation of a manufacturing/
distribution facility from supply chain perspectives: A physical programming
approach, Multi-criteria Applications, 10, 15–39.
Pooley, J., 1994, Integrated production and distribution planning at Ault Foods, Interfaces, 24, 113–121.
Portillo, R. C., 2009, Resilient Global Supply Chain Network Design. PhD dissertation,
Pennsylvania State University, University Park, PA, 2009.
Portillo, R. C., 2016, Designing resilient global supply chain networks. In Multiple
Criteria Decision Making in Supply Chain Management, A. Ravi Ravindran (Editor),
CRC Press, Boca Raton, FL, 2016.
Rao, U., A. Scheller-Wolf, and S. Tayur, 2000, Development of a rapid-response supply chain at Caterpillar, Operations Research, 48(2), 189–204.
Ravindran, A. R., 2008, Operations Research and Management Science Handbook. CRC
Press, Taylor and Francis Group, Boca Raton, FL.
Ravindran, A. R., K. M. Ragsdell, and G. V. Reklaitis, 2006, Engineering Optimization:
Methods and Applications. Wiley, Hoboken, NJ.
Ravindran, A. R. and D. P. Warsing, Jr., 2013, Supply Chain Engineering: Models and
Applications. CRC Press, Boca Raton, FL.
Sousa, R., N. Shah, and L. G. Papageorgiou, 2008, Supply chain design and multi-
level planning—An industrial case, Computers and Chemical Engineering, 32,
2643–2663.
Tsiakis, P. and L. G. Papageorgiou, 2008, Optimal production allocation and distribu-
tion supply chain networks, International Journal of Production Economics, 111,
468–483.
Van Roy, T. J. and L. A. Wolsey, 1985, Valid inequalities and separation for uncapaci-
tated fixed charge networks, Operations Research, 4, 105–112.
Wadhwa, V. and A. R. Ravindran, 2007, Vendor selection in outsourcing, Computers & Operations Research, 34, 3725–3737.
11
MCDM-Based Modeling Framework for
Continuous Performance Evaluation of
Employees to Offer Reward and Recognition
CONTENTS
11.1 Introduction................................................................................................. 269
11.2 Identification of Multiple Variables/Multiple Criteria for
CPEE to Offer R&R..................................................................................... 271
11.3 Performance Evaluation of Employees Using MCDM Methods.......... 282
11.4 Development of an MCDM-Based Modeling Framework for
CPEE to Offer R&R..................................................................................... 288
11.5 Validation of the Proposed MCDM Modeling Framework for
CPEE to Offer R&R..................................................................................... 291
11.6 Managerial Implications............................................................................ 295
11.7 Conclusion................................................................................................... 298
References.............................................................................................................. 299
11.1 Introduction
Periodic performance evaluations of employees are required to measure each employee's contribution toward organizational objectives and to calibrate individual performance. Performance evaluations in organizations are customarily conducted via a formal performance appraisal system (PAS) (Gruenfeld and Weissenberg 1966, Rosen and Abraham 1966). In general, the PAS is administered annually to evaluate an employee's performance over the preceding year (Bassett and Meyer 1968). Such an annual PAS has drawn criticism from industry, its long evaluation cycle being among the most cited drawbacks (Henderson 1980, Ilgen et al. 1981). Some industries, such as Information Technology (IT), have tried to shorten this cycle by moving to a half-yearly PAS. Nevertheless, dissatisfaction with the existing PAS persists. Some of the most common issues with the existing annual or half-yearly PAS are: recency error,
(Bhatnagar 2006, Economic Times 2015). It can be presumed that if the proposed framework for CPEE to offer R&R is properly implemented, it could address the issue of voluntary attrition to some extent and better motivate employees to perform. This study focuses on developing a framework for a CPEE system to offer R&R for the IT industry, where the employees considered are software engineers (SEs) and project managers (PMs).
Based on an analysis of the literature and on observations in industry, the performance of employees is assessed on a combination of numerous variables (criteria). Accordingly, the CPEE to offer R&R should be based on multiple variables/multiple criteria. From the literature review on CPEE, particularly for SEs, it appears that no variables/criteria have been proposed specifically for CPEE to offer R&R. Therefore, the multiple variables/multiple criteria suggested in the literature for the traditional PAS (across all industries) are first collected to understand the types of criteria considered in PAS. Building on this understanding, suitable exploratory and descriptive research methods are then carried out to identify the required set of variables/criteria for CPEE to offer R&R, particularly for SEs. To utilize these identified variables/criteria effectively in evaluating the performance of employees, a suitable multicriteria decision-making (MCDM) method is necessary. Accordingly, two MCDM methods, the analytic hierarchy process (AHP) and the modified Pugh matrix method (MPMM), are considered in this study and implemented to develop the proposed framework for CPEE to offer R&R.
The rest of the chapter is organized as follows: the identification of a comprehensive list of variables/criteria from (a) the literature review, (b) exploratory, and (c) descriptive research methods, and the determination of the main criteria for CPEE to offer R&R, are detailed in Section 11.2. The literature on the analysis of employee performance using MCDM methods is reviewed in Section 11.3. The development of the MCDM-based modeling framework for CPEE to offer R&R is elaborated in Section 11.4. A numerical example demonstrating the workability of the proposed framework is developed in Section 11.5. Section 11.6 discusses the managerial implications of the proposed MCDM-based modeling framework. The study concludes by highlighting its contributions, limitations, and directions for further research in Section 11.7.
In the absence of literature specifying the variables for CPEE to offer R&R, the literature dealing with performance evaluation of employees using the traditional PAS is reviewed. Based on this review, a list of 44 variables is identified from the literature and shown in column 3 of Table 11.1. In addition, a set of seven variables is intuitively proposed by the researcher for the purpose of performance evaluation of employees (Sreejith 2016); these are listed in column 4 of Table 11.1.
The 51 variables listed in columns 3 and 4 of Table 11.1 were not identified exclusively for the IT industry, so it cannot be confidently assumed that all 51 are relevant for the performance evaluation of SEs in the IT industry. Moreover, the 44 variables in column 3 of Table 11.1, which pertain to the traditional PAS, cannot be blindly assumed to hold good for continuous evaluation of employees. Hence, it is necessary to conduct exploratory and/or descriptive research to identify the required variables/criteria for CPEE to offer R&R directly from the employees of the IT industry. Accordingly, a Caselet approach is carried out: seven SEs are interviewed with a suitably prepared Caselet schedule to identify a set of variables/criteria, and individual Caselets are developed. For brevity, the Caselets themselves are not presented in this chapter. From the analysis of the seven Caselets, 27 unique variables/criteria are identified for CPEE of SEs.
As the inferences obtained from the Caselet approach cannot be generalized, another phase of exploratory research based on semistructured interviews is conducted among 58 SEs, using an interview schedule developed from the Caselet findings. At the end of this phase, 35 variables/criteria are identified for CPEE to offer R&R based on the opinions of the SEs; they are presented in Table 11.1 (column 5).
The list of variables/criteria identified from the 58 SEs needs to be cross-verified from the administrative employees' (i.e., PMs') perspective for the purpose of offering R&R. To cross-check and confirm these variables, 31 PMs are interviewed using an interview schedule built on the 35 variables identified from the SEs' perspective. At the end of this stage, the PMs confirm 29 of those variables and add four new variables/criteria from their own perspective. Accordingly, a list of 33 variables (the 29 confirmed by the PMs plus the four included exclusively from the PMs' perspective) is arrived at and presented in Table 11.1 (column 6).
Comparing the lists of variables in columns 2 through 6 of Table 11.1, this study adopts the 33 variables accepted by the PMs for offering R&R. These 33 variables are listed with descriptions in Table 11.2. Accordingly, in this study, the CPEE to offer R&R is considered a function of all 33 variables listed
TABLE 11.1
A Consolidated List of Variables Identified for CPEE to Offer R&R

Columns: (L) considered from the literature; (R) researcher's perspective; (S) software engineers' perspective; (P) project managers' perspective; (F) final set of variables/criteria considered for CPEE to offer R&R.

No.  Name of the Variable/Criterion     L   R   S   P   F
1    Age                                ✓   –   ✓   ✓   ✓
2    Gender                             ✓   –   –   –   –
3    Marital status                     ✓   –   –   –   –
4    Education                          ✓   –   ✓   ✓   ✓
5    University/institution             ✓   –   ✓   ✓   ✓
6    Experience                         ✓   –   ✓   ✓   ✓
7    Tenure                             ✓   –   ✓   ✓   ✓
8    Personality                        ✓   –   –   –   –
9    Parent's education                 ✓   –   ✓   –   –
10   Child status                       ✓   –   –   –   –
11   Quantity of work                   ✓   –   –   –   –
12   Timeline adherence                 ✓   –   ✓   ✓   ✓
13   Customer interaction               ✓   –   ✓   ✓   ✓
14   Target achievement                 ✓   –   –   –   –
15   Timely reporting                   ✓   –   ✓   ✓   ✓
16   Documentation                      ✓   –   ✓   ✓   ✓
17   Reviewing                          ✓   –   ✓   ✓   ✓
18   Analytical ability                 ✓   –   –   ✓   ✓
19   Work planning                      ✓   –   –   –   –
20   Creativity                         ✓   –   ✓   ✓   ✓
21   Communication skills               ✓   –   ✓   ✓   ✓
22   Knowledge updation                 ✓   –   ✓   ✓   ✓
23   Initiative                         ✓   –   ✓   ✓   ✓
24   Understanding big picture          ✓   –   ✓   ✓   ✓
25   Additional responsibilities        ✓   –   ✓   ✓   ✓
26   Presentation skills                ✓   –   ✓   ✓   ✓
27   Negotiation skills                 ✓   –   –   ✓   ✓
28   Ideas/suggestions                  ✓   –   ✓   ✓   ✓
29   Innovation                         ✓   –   –   –   –
30   Patents/publications               ✓   –   –   –   –
31   Self-learning                      ✓   –   ✓   ✓   ✓
32   Leadership                         ✓   –   ✓   ✓   ✓
33   Team cooperation                   ✓   –   ✓   ✓   ✓
34   Punctuality                        ✓   –   ✓   ✓   ✓
35   Mentoring                          ✓   –   –   ✓   ✓
36   Perseverance                       ✓   –   –   –   –
37   Humor sense                        ✓   –   –   –   –
38   Critical thinking                  ✓   –   –   –   –
39   Passion                            ✓   –   –   –   –
40   Resilience                         ✓   –   –   –   –
41   Commitment                         ✓   –   ✓   ✓   ✓
42   Knowledge sharing                  ✓   –   ✓   ✓   ✓
43   Proactiveness                      ✓   –   –   –   –
44   Code of conduct                    ✓   –   ✓   ✓   ✓
45   Social volunteering                –   ✓   –   –   –
46   Agility                            –   ✓   ✓   –   –
47   Corporate social responsibility    –   ✓   –   –   –
48   Business domain knowledge          –   ✓   –   –   –
49   Multitasking                       –   ✓   ✓   –   –
50   Parent's occupation                –   ✓   ✓   –   –
51   Parent's domicile                  –   ✓   ✓   –   –
52   Improving morale                   –   –   ✓   ✓   ✓
53   Quality of the job                 –   –   ✓   ✓   ✓
54   Process adherence                  –   –   ✓   ✓   ✓
55   Cost saving                        –   –   ✓   ✓   ✓
56   Cocurricular activities            –   –   ✓   –   –
57   Best practice                      –   –   –   ✓   ✓
Total number of variables/criteria     44   7  35  33  33
TABLE 11.2
Criteria Considered Based on the Perspectives of Both SEs and PMs for CPEE to Offer R&R

No.  Variable/Criterion            Description
1    Age                           Age of the software engineer (SE)
2    Education                     Highest education level completed by the SE
3    University                    University/institution where the highest education was obtained
4    Tenure                        Number of years spent in the current organization
5    Experience                    Total years of relevant experience in a similar profile
6    Quality of the job            The output produced should conform to the requirements/expectations
7    Timeline adherence            The task should be completed on time or ahead of time
8    Process adherence             The standard process for job execution should be adhered to
9    Customer interaction          Ability to communicate with the client/customer to convey and elicit required information
10   Documentation                 Creating, updating, and maintaining all documents relating to the job
11   Reviewing                     Willingness and ability to review others' documents, code, etc.
12   Timely reporting              Reporting progress or defects on time, so that corrective action can be taken with minimal loss
13   Analytical ability            Ability to think in a logical and analytical manner
14   Best practice                 Displaying best practice in process and quality
15   Communication skills          Ability to convey ideas orally and in written form
16   Ideas and suggestions         Recommending valid and implementable improvement suggestions
17   Knowledge updation            Keeping oneself updated with knowledge in the field through certifications and other relevant qualifications
18   Negotiation skills            Ability to confer with another person/department in the team/organization in order to come to terms or reach an agreement
19   Cost saving                   Demonstrating measures to save cost for the project and organization
20   Presentation skills           Ability to present ideas to an audience
21   Understanding big picture     Ability to comprehend the big picture of the job assigned
22   Additional responsibilities   Willingness to take up additional administrative responsibilities that fall outside the normal scope of the job (such as interviewing, auditing, etc.)
23   Creativity                    Ability to come up with creative solutions/processes
24   Initiative                    Proactiveness; innovating processes that influence the project
(Continued)
in Table 11.2. Upon carefully scrutinizing the variables listed in Table 11.2, it can be observed that the first five (age, education, university, tenure, and experience) relate to the demographic characteristics of the SEs, collectively called DCSE in this study, while the remaining 28 relate to the performance of the SEs and are called PSE. Although the final list of 33 variables is identified by (a) analyzing the literature and (b) exploratory research covering the perspectives of both SEs and PMs, the importance of these variables for CPEE to offer R&R has not yet been ascertained from a large-scale sample of SEs. Accordingly, this is done by seeking the opinions of 443 SEs from 12 different IT organizations through descriptive research.
For the descriptive research, a questionnaire incorporating the 33 variables is designed, and the respondents are asked to rate the importance of each variable for CPEE to offer R&R on a 7-point Likert scale (7 being extremely important). Using the data obtained from the 443 respondents, a bivariate analysis is conducted using t-tests to identify the significance of the five demographic variables that may influence the CPEE to offer R&R. The details of this analysis are shown in Table 11.3. From Table 11.3, it is clear that only three demographic variables, viz. education (E), university (U), and experience (X),
TABLE 11.3
Bivariate Analysis and the Significant Demographic Variables in DCSE

Each cell gives the mean (SD) of the factor score for the bifurcated group; t statistics compare the two groups.

Demographic  Bifurcation      N    Proactive       Prompt          Resourceful     Responsible     Diagnostic      Dynamic
Variable     Characteristics
Age          ≤23 years        218  22.9 (3.44)     23.763 (3.054)  24.645 (2.79)   34.503 (4.77)   28.813 (3.45)   18.504 (1.84)
             >23 years        225  23.802 (2.8)    23.114 (2.89)   24.870 (2.66)   34.24 (4.68)    28.41 (3.38)    18.395 (1.72)
                                   t = 1.59        t = 2.23*       t = 0.87        t = 0.59        t = 1.24        t = 0.64
Education    UG               311  22.174 (3.401)  23.647 (2.69)   24.902 (2.97)   34.017 (4.23)   28.11 (3.74)    18.222 (1.9)
             PG               132  23.784 (2.78)   22.925 (3.07)   24.261 (2.4)    34.982 (4.95)   28.956 (3.56)   18.871 (1.82)
                                   t = 4.79**      t = 2.47*       t = 2.19*       t = 2.08*       t = 2.21*       t = 3.33**
University   IIX              27   23.781 (2.6)    24.48 (3.15)    25.63 (3.12)    35.288 (5.02)   30.04 (4.18)    19.104 (1.88)
             Non-IIX          416  23.047 (3.07)   23.072 (3.06)   24.477 (2.89)   33.47 (4.53)    28.205 (3.48)   18.377 (1.75)
                                   t = 2.76*       t = 2.31*       t = 1.95        t = 2.00*       t = 2.62**      t = 2.08*
Tenure       ≤3 years         262  23.115 (3.27)   23.440 (3.43)   24.512 (2.64)   34.63 (4.47)    28.422 (3.55)   18.51 (1.83)
             >3 years         181  22.94 (3.0)     23.911 (2.88)   24.808 (2.87)   34.895 (4.86)   28.83 (3.72)    18.79 (1.81)
                                   t = 0.57        t = 1.51        t = 1.12        t = 0.59        t = 1.16        t = 1.56
Experience   ≤3 years         280  23.101 (3.1)    23.911 (2.84)   24.205 (2.44)   34.21 (4.56)    28.3 (3.29)     18.314 (1.86)
             >3 years         163  23.718 (2.55)   23.27 (3.1)     24.73 (2.74)    35.589 (4.8)    29.41 (3.86)    18.86 (1.78)
                                   t = 2.15*       t = 2.21*       t = 2.09*       t = 3.03**      t = 3.21**      t = 3.03**

*Significant at p < 0.05; **significant at p < 0.01.
have a significant influence on all six main criteria. Accordingly, these three significant demographic variables (E, U, and X) are considered to constitute DCSE.
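The bivariate analysis rests on two-sample t statistics computed from the group summary statistics in Table 11.3. A minimal sketch of the pooled-variance form (the chapter does not state which t-test variant was used, and the function name is illustrative):

```python
import math

def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance (equal-variance form)."""
    # Pooled variance combines the two groups' spread, weighted by df.
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se

# The PG vs. UG "Proactive" row of Table 11.3 as an illustration:
t = pooled_t_statistic(23.784, 2.78, 132, 22.174, 3.401, 311)
# ≈ 4.80, close to the reported t = 4.79 (rounding in the published
# summary statistics accounts for the small difference).
```

A degrees-of-freedom lookup (df = n1 + n2 − 2) against a t table then yields the significance stars shown in the table.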
The importance of the 28 variables numbered 6 to 33 in Table 11.2, grouped under PSE, is subjected to statistical analysis using factor analysis and structural equation modeling (SEM). The factor analysis is performed to identify any latent structure among the 28 variables and to group related variables under the same factor. It yields six factors, named proactive, prompt, resourceful, responsible, diagnostic, and dynamic. The variables manifested by each of these six factors are shown in Table 11.4. These six factors are taken as the main criteria against which the SEs are evaluated to ascertain their performance.
Summarizing the t-test and factor analyses, the three demographic variables (i.e., DCSE) and the six factors/main criteria related to PSE (proactive, prompt, resourceful, responsible, diagnostic, and dynamic) are considered important for CPEE to offer R&R. Based on these, a framework for CPEE to offer R&R is proposed and shown in Figure 11.1. Furthermore, the following hypotheses are proposed based on the framework shown in Figure 11.1:
TABLE 11.4
Main Factor/Criterion-Wise List of Variables Grouped under PSE by Factor Analysis

Factor 1 (Proactive): Knowledge updation, Initiative, Self-learning, Leadership
Factor 2 (Prompt): Timeline adherence, Timely reporting, Process adherence, Punctuality
Factor 3 (Resourceful): Understanding big picture, Ideas and suggestions, Creativity, Cost saving
Factor 4 (Responsible): Additional responsibilities, Knowledge sharing, Commitment, Teamwork, Mentoring, Improving morale
Factor 5 (Diagnostic): Quality of the job, Documentation, Analytical ability, Reviewing, Presentation
Factor 6 (Dynamic): Customer interaction, Communication, Negotiation

Note: The variables "Best practice" and "Code of conduct" listed under PSE do not appear in this table because they were excluded from the factor analysis owing to low KMO values.
[Figure: block diagram in which the six PSE factors (proactive, prompt, resourceful, responsible, diagnostic, and dynamic) together with DCSE (E, U, X) lead to R&R.]
FIGURE 11.1
A proposed framework for CPEE to offer R&R.
[Figure: the framework of Figure 11.1 annotated with SEM path coefficients (values in parentheses); the coefficients recoverable from this extraction include 0.7947 (3.279**), 0.1737 (4.809**), 0.9035 (4.196**), and 0.162 (2.605**).]
FIGURE 11.2
Proposed framework for CPEE to offer R&R with the path coefficients.
[Figure: block diagram in which the six PSE factors (proactive, prompt, resourceful, responsible, diagnostic, and dynamic) influence R&R, with DCSE moderating the relationship.]
FIGURE 11.3
Final version of the framework for CPEE to offer R&R.
A study is conducted by Jati (2011) using the MCDM methods AHP and PROMETHEE II to evaluate three school teachers on 10 criteria. AHP is used to determine the criteria weights through pairwise comparison. PROMETHEE II is then used to calculate preference functions for all pairs of alternatives across the ten criteria. The net outranking flow is calculated, and based on it the performance of the three teachers is ranked.
Wu et al. (2012) use a fuzzy MCDM approach to evaluate the performance of aircraft maintenance staff. They combine FAHP and Vise Kriterijumska Optimizacija I Kompromisno Resenje (VIKOR) into a fuzzy MCDM method to evaluate 51 employees on four performance dimensions. They observe that the ranking obtained from the fuzzy MCDM method differs from that of the traditional evaluation model, and claim that their model brings better fairness to the evaluation process.
Using DEA, a study evaluates the performance efficiencies of banking professionals in Slovenia (Zbranek 2013), considering salary, working conditions, and benefits as input variables, and work motivation, job satisfaction, and organizational commitment as outputs. It finds that 12 of the 60 employees are fully efficient, and the remaining 48 are recommended for training. Islam et al. (2013) use FAHP and TOPSIS to evaluate the performance of banking professionals on five criteria. They use FAHP to calculate the weights of the criteria and subcriteria, and then evaluate the employees using TOPSIS by assigning triangular fuzzy numbers to the subcriteria. Based on the resulting scores, the employee closest to the ideal solution and farthest from the nonideal solution is identified as the most efficient, and the employees who need training are also identified.
Morte et al. (2013) conduct a performance evaluation of 31 road drivers in a Portuguese logistics company. As the number of employees (alternatives) is large, they avoid AHP so as to eliminate the pairwise comparison process. Instead, they use PROMETHEE and Methodologia Multicriterio para Apoio a Selecao de Sistemas de Informacao (MMASSI) with two groups of two and three decision makers, respectively, over 11 criteria and the 31 employees. These evaluations are then compared with the employees' self-evaluations to finally rank the employees according to their performance.
Gurbuz and Albayrak (2014) consider both the analytic network process (ANP) and CI to evaluate the employees of a pharmaceutical organization. They highlight that traditional methods for employee performance evaluation are highly subjective and do not provide comprehensive information about employee performance. They propose a framework based on ANP and CI with three major criteria: sales-related, customer-related, and relations-related performance criteria. In addition, they consider organizational climate and demographic variables as moderating factors. Finally, they stress the importance of understanding the interdependencies among the criteria and alternatives that influence the decision-making process.
    DS_i = Σ_j w_j × d_ij   for all i = 1, 2, …, m.   (11.1)

    NDS_i = DS_i / TDS   for all i = 1, 2, …, m,   (11.2)
Step 4: Evaluate the SEs based on the main criteria of PSE using MPMM
In the PMM, one alternative is selected as a baseline (B), and all other alternatives are compared against B with respect to each criterion (Pugh 1991). The comparison is denoted −1 for a worse score, 0 for an equal score, and +1 for a better score. This evaluation results in a column vector of scores for all alternatives other than the baseline (whose score is zero). The alternative with the highest positive score is chosen as the best alternative.
The PMM has been criticized for three major limitations (Mullur et al. 2003): (i) the criteria weights are not incorporated, (ii) the rating scale is too small, and (iii) the arbitrary choice of baseline alternative could introduce bias. These limitations are addressed in this chapter by (i) using AHP to calculate the criteria weights, (ii) widening the evaluation scale to resemble a 5-point Likert scale, and (iii) letting every alternative serve as the baseline in turn, with all other alternatives evaluated against it. These modifications to the original PMM are referred to as the modified PMM (MPMM) in this chapter.
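The weighted scoring at the heart of the MPMM can be sketched as follows (a minimal illustration; the function name, ratings, and weights are invented for the example):

```python
def mpmm_scores(ratings, weights):
    """Weighted Pugh-style scores.

    ratings[i][j]: rating of alternative i versus the current baseline on
    criterion j (-2 .. +2 in the MPMM; the baseline's own row is all zeros).
    weights[j]: AHP weight of criterion j.
    Returns the weighted score of each alternative against the baseline.
    """
    return [sum(w * g for w, g in zip(weights, row)) for row in ratings]

# Two criteria with illustrative AHP weights 0.7 and 0.3; three
# alternatives, the first being the baseline.  The classic PMM is the
# special case of equal weights and ratings restricted to {-1, 0, +1}.
scores = mpmm_scores([[0, 0], [1, -1], [-1, 1]], [0.7, 0.3])
# scores[1] = 0.7*1 + 0.3*(-1) = 0.4
```

Repeating the call once per choice of baseline yields the m columns of scores used later in the numerical example.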
The evaluation scale of the MPMM for comparing the performance of two SEs, A and B (where B is the baseline), provides a qualitative measure of performance evaluation (g_ij) for a given criterion j such that:
The g_ij values are calculated for the m SEs with respect to each of the main criteria considered for CPEE to offer R&R. Using the values of g_ij and v_j for the n criteria, the performance score (PS) of the SEs can be calculated as:
    PS_i = Σ_j v_j × g_ij   for all i = 1, 2, …, m; j = 1, 2, …, n.   (11.3)
    NPS_i = PS_i / TPS   for all i = 1, 2, …, m,   (11.4)
where a_ij = NPS_ij represents the NPS of the ith SE when the jth SE is the baseline.
After executing Step 4 with the MPMM, there are m NPS scores for every SE. The mean of these, denoted NPS̄_i, is then ascertained as

    NPS̄_i = ( Σ_{j=1}^{m} NPS_ij ) / m   for all i = 1, 2, …, m.
For each of the m SEs, the final performance score (FPS_i) can be obtained by incorporating the moderating effect of DCSE on the six main factors relating to PSE. In this study it is computed as:

    FPS_i = NPS̄_i / NDS_i   for all i = 1, 2, …, m.   (11.5)
At the end of Step 5, the FPS_i values of the m SEs are used to rank the given set of SEs. Accordingly, the SE with the highest FPS can be offered R&R; if more than one SE shares the highest score, all of them should be offered R&R. This completes one cycle (evaluation period) of CPEE of SEs.
The FPS_i values of the m SEs should be stored in an R&R database (RRDB), which can be accessed at a later point in time. Furthermore, at the end of every cycle (i.e., evaluation period) of CPEE, the FPS_i scores of the m SEs need to be reset to zero. After initializing the FPS_i scores, a fresh evaluation is carried out and the process is repeated from Step 4.
The proposed MCDM-based modeling framework for CPEE of SEs for R&R is validated with a numerical example in the following section.
TABLE 11.6
Data on Demographic Characteristics of Eight Software Engineers for the Numerical Example

No.  Software Engineer  Education (E)  University (U)      Experience (X) in Years
1    Anita              BTech          Cochin University   5
2    Benny              BE             Amrita University   2
3    Casper             BTech          NIT, Trichy         2
4    Deepak             MCA            Anna University     1
5    Esther             MTech          IIT Delhi           2
6    Faizel              BE             Manipal University  4
7    Gopi               ME             VTU                 6
8    Harsha             BE             Anna University     1
TABLE 11.7
Computed Weights for Demographic Variables of DCSE Using AHP

Demographic Variable  E    U  X    Relative Importance (Weight)
Education (E)         1    2  1/3  0.216
University (U)        1/2  1  1/7  0.103
Experience (X)        3    7  1    0.682
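The weights in Table 11.7 can be closely reproduced with the column-normalization approximation commonly used in AHP. This is a sketch under the assumption that some such averaging variant was used; the chapter may instead compute the exact principal eigenvector:

```python
def ahp_weights(matrix):
    """Approximate AHP priorities: normalize each column of the pairwise
    comparison matrix, then average across each row."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n
            for i in range(n)]

# Pairwise comparison matrix of Table 11.7 (order: E, U, X).
w = ahp_weights([[1,   2, 1/3],
                 [1/2, 1, 1/7],
                 [3,   7, 1]])
# w ≈ (0.216, 0.103, 0.681), within rounding of the table's
# (0.216, 0.103, 0.682).
```

The priorities always sum to 1, which is why they can be read directly as normalized weights.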
In this study, the relative importance (i.e., the normalized weights) obtained for the demographic variables E, U, and X remains unchanged for the PAS period, unless a new demographic variable becomes significant for measuring DCSE or one of the current variables loses its significance. Using these normalized weights, the team of eight SEs can be evaluated to ascertain their demographic scores (DS). This is detailed in the next step.
After ascertaining the weights of the demographic variables, each of the eight SEs is rated on an ordinal scale with respect to their demographic characteristics (Table 11.6). The ordinal ratings of the eight SEs are obtained according to the following scale:
Based on the scale defined for the demographic variables and the data given in Table 11.6, ordinal ratings are obtained for each of the eight SEs and given in Table 11.8. In addition, using Equations 11.1 and 11.2, the DS and NDS of each SE are calculated and presented in Table 11.8. The SE-wise NDS presented in Table 11.8 can be fixed for at most one year (or for the PAS period); hence, Step 2 need not be repeated unless there is a change in the team of SEs considered for performance evaluation.
TABLE 11.8
Software Engineer-Wise Ordinal Ratings and the Computed DS and NDS Scores

                   Ratings on the weighted demographic variables   Computed score
Software Engineer  E (0.216)  U (0.103)  X (0.682)                 DS     NDS
Anita              1          1          2                         1.682  0.154
Benny              1          1          1                         1.000  0.092
Casper             1          2          1                         1.103  0.101
Deepak             2          1          1                         1.216  0.112
Esther             2          2          1                         1.318  0.121
Faizel              1          1          2                         1.682  0.154
Gopi               2          1          2                         1.897  0.174
Harsha             1          1          1                         1.000  0.092
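The DS and NDS columns of Table 11.8 follow from Equations 11.1 and 11.2; the sketch below assumes TDS is the sum of the DS values over all eight SEs, which is consistent with the figures in the table:

```python
weights = {"E": 0.216, "U": 0.103, "X": 0.682}  # Table 11.7

# Ordinal ratings (E, U, X) of Table 11.8 per software engineer.
ratings = {
    "Anita":  (1, 1, 2), "Benny": (1, 1, 1), "Casper": (1, 2, 1),
    "Deepak": (2, 1, 1), "Esther": (2, 2, 1), "Faizel": (1, 1, 2),
    "Gopi":   (2, 1, 2), "Harsha": (1, 1, 1),
}

# Equation 11.1: DS_i = sum_j w_j * d_ij
ds = {name: weights["E"] * e + weights["U"] * u + weights["X"] * x
      for name, (e, u, x) in ratings.items()}

# Equation 11.2: NDS_i = DS_i / TDS, with TDS assumed to be sum of all DS_i
tds = sum(ds.values())
nds = {name: d / tds for name, d in ds.items()}
# ds["Anita"] ≈ 1.682 and nds["Anita"] ≈ 0.154, as in Table 11.8
# (tiny differences arise from using the rounded three-decimal weights).
```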
Step 3: Determine the importance of the six factors/main criteria of PSE using AHP
The six main criteria defined to represent PSE need to be weighted for their relative importance. This is done using AHP with Saaty's basic scale. The pairwise comparisons of the six main criteria on Saaty's scale and the weights (i.e., normalized weights) obtained from AHP are shown in Table 11.9.
The computed weights for the six main criteria, presented in Table 11.9, can be fixed for a relatively long duration (or for the PAS period), unless some criteria become irrelevant or new criteria need to be added to PSE. Using these weights, the performance of the SEs can be evaluated with the MPMM approach. This is detailed in the next step.
After computing the weights of the six main criteria of PSE, the team of eight SEs is compared on each main criterion to evaluate their performance using the MPMM. The qualitative measure of performance evaluation (g_ij) for a given criterion j, as defined in Section 11.4, is used to generate the data for each of the eight SEs against each criterion.
The pairwise comparisons and the qualitative measures are presented in Table 11.10. Using these data, each SE is considered in turn as the baseline employee, and the MPMM process is applied to obtain the m × m matrix (8 × 8 for this example) of NPS scores shown in Table 11.11 (columns 2–9).
TABLE 11.9
Computed Weights for the Six Factors/Main Criteria Representing PSE Using AHP

Main Criteria  Proactive  Prompt  Resourceful  Responsible  Diagnostic  Dynamic  Relative Importance
Proactive      1          1/7     1/3          1            3           1/3      0.088
Prompt         7          1       5            3            4           3        0.406
Resourceful    3          1/5     1            3            1/2         1        0.151
Responsible    1          1/3     1/3          1            4           2        0.141
Diagnostic     1/3        1/4     2            1/4          1           1/2      0.086
Dynamic        3          1/3     1            1/2          2           1        0.128
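A step the chapter does not report, but which is customary whenever AHP weights are derived from a pairwise matrix such as Table 11.9, is Saaty's consistency check (a consistency ratio below 0.10 is conventionally acceptable). An illustrative sketch using power iteration for the principal eigenvalue:

```python
def consistency_ratio(matrix, iters=100):
    """Saaty's CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    ri = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}[n]  # random indices
    v = [1.0] * n
    for _ in range(iters):  # power iteration for the principal eigenvector
        v = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        v = [x / s for x in v]
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(av[i] / v[i] for i in range(n)) / n  # estimate of lambda_max
    return (lam - n) / ((n - 1) * ri)

# The 6 x 6 pairwise comparison matrix of Table 11.9.
m = [[1,   1/7, 1/3, 1,   3,   1/3],
     [7,   1,   5,   3,   4,   3  ],
     [3,   1/5, 1,   3,   1/2, 1  ],
     [1,   1/3, 1/3, 1,   4,   2  ],
     [1/3, 1/4, 2,   1/4, 1,   1/2],
     [3,   1/3, 1,   1/2, 2,   1  ]]
cr = consistency_ratio(m)  # positive, since the matrix is not perfectly consistent
```

A perfectly consistent matrix (every a_ik = a_ij * a_jk) yields CR = 0.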
TABLE 11.10
Performance Evaluation of SEs Using MPMM

Main criteria representing PSE, with weights (i.e., importance) in parentheses:

Software Engineer  Proactive (0.088)  Prompt (0.406)  Resourceful (0.151)  Responsible (0.141)  Diagnostic (0.086)  Dynamic (0.128)
Anita              0                  0               0                    0                    0                   0
Benny              −1                 0               1                    2                    0                   −1
Casper             1                  1               0                    −1                   0                   1
Deepak             2                  0               1                    2                    −1                  −2
Esther             −2                 2               0                    1                    1                   −1
Faizel              1                  0               0                    −2                   2                   −1
Gopi               −1                 −1              1                    0                    2                   0
Harsha             −1                 1               1                    −1                   0                   1
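Applying Equations 11.3 and 11.4 to the ratings of Table 11.10 (with Anita as baseline) reproduces most entries of the Anita column of Table 11.11 to rounding, under the assumption that TPS is the sum of the PS values over all SEs:

```python
weights = [0.088, 0.406, 0.151, 0.141, 0.086, 0.128]  # Table 11.9

# g_ij ratings versus baseline Anita (rows of Table 11.10).
ratings = {
    "Anita":  [0, 0, 0, 0, 0, 0],
    "Benny":  [-1, 0, 1, 2, 0, -1],
    "Casper": [1, 1, 0, -1, 0, 1],
    "Deepak": [2, 0, 1, 2, -1, -2],
    "Esther": [-2, 2, 0, 1, 1, -1],
    "Faizel": [1, 0, 0, -2, 2, -1],
    "Gopi":   [-1, -1, 1, 0, 2, 0],
    "Harsha": [-1, 1, 1, -1, 0, 1],
}

# Equation 11.3: PS_i = sum_j v_j * g_ij
ps = {n: sum(v * g for v, g in zip(weights, row)) for n, row in ratings.items()}

# Equation 11.4: NPS_i = PS_i / TPS, with TPS assumed to be sum of all PS_i
tps = sum(ps.values())
nps = {n: p / tps for n, p in ps.items()}
# nps["Benny"] ≈ 0.118 and nps["Casper"] ≈ 0.263, matching Table 11.11;
# a few published entries differ slightly, presumably due to rounding.
```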
Finally, the SE-wise mean of the NPS values (i.e., NPS̄) is computed and presented in Table 11.11 (column 10). From the NPS̄ column of Table 11.11, it can be observed that Esther has the highest NPS̄ and ranks first. Accordingly, Esther could be offered R&R if the demographic profiles of the SEs were not considered. To be judicious and equitable in the evaluation, however, it is important to consider the moderating effect of NDS on NPS̄. This is explained in the next step.
Step 5: Calculate the FPS by incorporating the moderating effect of NDS on NPS̄
TABLE 11.11
The NPS and FPS for Each Software Engineer
NPS Obtained Using MPMM Considering Baseline Employee as
Software
Engineer Anita Benny Casper Deepak Esther Faizel Gopi Harsha NPS FPS
Anita 0.000 0.309 0.116 0.028 −0.075 0.406 0.278 −0.174 0.111 0.720
Benny 0.118 0.000 −0.129 0.340 −0.084 −0.024 0.000 0.205 0.053 0.582
Casper 0.263 0.317 0.000 −0.020 0.105 0.326 0.391 0.046 0.179 1.765
Deepak 0.146 0.328 0.014 0.000 0.411 −0.128 0.455 0.417 0.205 1.841
Esther 0.344 0.928 −0.046 0.283 0.000 0.234 0.014 0.000 0.220 1.816
Faizel −0.081 −0.509 0.372 −0.021 0.340 0.000 −0.279 0.000 −0.022 −0.144
Gopi −0.094 −0.165 0.055 0.188 −0.001 0.000 0.000 0.505 0.061 0.350
Harsha 0.249 −0.209 0.641 0.201 0.183 0.149 0.151 0.000 0.171 1.859
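The NPS column of Table 11.11 is the row-wise mean of the pairwise MPMM scores (columns 2–9). A sketch reproducing it from the table's data:

```python
# Row-wise mean NPS from Table 11.11 (columns Anita..Harsha).
nps = {
    "Anita":  [0.000, 0.309, 0.116, 0.028, -0.075, 0.406, 0.278, -0.174],
    "Benny":  [0.118, 0.000, -0.129, 0.340, -0.084, -0.024, 0.000, 0.205],
    "Casper": [0.263, 0.317, 0.000, -0.020, 0.105, 0.326, 0.391, 0.046],
    "Deepak": [0.146, 0.328, 0.014, 0.000, 0.411, -0.128, 0.455, 0.417],
    "Esther": [0.344, 0.928, -0.046, 0.283, 0.000, 0.234, 0.014, 0.000],
    "Faizel": [-0.081, -0.509, 0.372, -0.021, 0.340, 0.000, -0.279, 0.000],
    "Gopi":   [-0.094, -0.165, 0.055, 0.188, -0.001, 0.000, 0.000, 0.505],
    "Harsha": [0.249, -0.209, 0.641, 0.201, 0.183, 0.149, 0.151, 0.000],
}
mean_nps = {se: sum(row) / len(row) for se, row in nps.items()}
best = max(mean_nps, key=mean_nps.get)
print(best, round(mean_nps[best], 3))  # Esther tops the mean-NPS ranking
```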
employee to offer R&R. The proposed R&R system loses its main objective if
it is not offered on a timely and continuous basis. The frequency of the CPEE
(i.e., the evaluation frequency) may be recommended as weekly or fortnightly.
One can also make the FPS transparent to every employee of the team
involved in the evaluation system. When there is a continuous
and transparent output from the system (i.e., timely R&R based on proper
and transparent evaluation), the employee (i.e., SE) might feel energized and
motivated to exhibit better performance. In addition, the PM has to decide on
an appropriate R&R for the deserving employee.
The numerical example illustrated in the previous section enables the PM
to identify the best-performing SE(s) based on the relative maximum FPS
after incorporating their respective demographic and performance-related
factors. Instead of using relative FPS, the PM may specify a threshold FPS
beyond which all SEs can be offered R&R. Moreover, the proposed framework
can also be used to identify the relatively worst-performing SEs, based on
which the PM may recommend the specific SE(s) with the least FPS
for special training to improve performance.
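The threshold-based selection described above is a simple filter over the FPS column of Table 11.11; the threshold value 1.0 below is an arbitrary illustration, not a value from the chapter:

```python
# FPS values from Table 11.11 (last column).
fps = {"Anita": 0.720, "Benny": 0.582, "Casper": 1.765, "Deepak": 1.841,
       "Esther": 1.816, "Faizel": -0.144, "Gopi": 0.350, "Harsha": 1.859}

THRESHOLD = 1.0  # hypothetical cut-off chosen by the PM
rr_candidates = sorted((se for se, s in fps.items() if s >= THRESHOLD),
                       key=fps.get, reverse=True)
needs_training = min(fps, key=fps.get)  # least FPS -> special training
print(rr_candidates)
print(needs_training)
```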
The FPS obtained during every performance evaluation cycle serves
two purposes, as shown in Figure 11.4. The obvious purpose is to identify
the best-performing SE(s) to offer immediate R&R. The second purpose is
to cumulatively store the individual FPS of all SEs at every performance
evaluation cycle in RRDB. This cumulative score in RRDB can be linked to the
organization’s existing PAS and can be used appropriately during the
organization’s regular PAS. In addition to the cumulative FPS, these data indicate
the best- and worst-performing SE(s) over a longer period of time.
Given that the performance evaluation cycle is related to a shorter period
of time, the FPS of SEs during one performance evaluation cycle does not
really convey comprehensive information about the actual performance
of SEs. Hence, the CPEE system has to be a repetitive process with equal
frequency. It is also important that the PM needs to conduct the CPEE in
an independent manner, that is, the FPS obtained during one cycle (performance
evaluation cycle) should not influence the CPEE during the next cycle.
FIGURE 11.4
Purpose of the proposed MCDM-based modeling framework for CPEE to offer R&R.
To address this, after the top performing SE is offered R&R, at the end of one
evaluation period, the FPS of all SEs in the team shall be set to zero and a
fresh evaluation needs to be conducted for the next cycle. Such independent
evaluation process when repeated in a continuous manner would produce a
performance trend, which can easily be captured and interpreted using time
series analysis.
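Storing the cycle-wise FPS in the RRDB yields a short time series per engineer, and even a trailing moving average exposes the performance trend; the cycle data below are invented for illustration:

```python
# Hypothetical RRDB record: FPS of one SE over successive evaluation cycles.
# Scores are reset each cycle, so every entry is an independent evaluation.
fps_history = [0.72, 0.95, 1.10, 0.88, 1.30, 1.45]  # invented cycle data

def moving_average(series, window=3):
    """Trailing moving average over the last `window` cycles."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

trend = moving_average(fps_history)
print([round(t, 2) for t in trend])
```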
The success of the proposed MCDM framework requires a commitment
from the PM. The PM should tenaciously execute the CPEE process and iden-
tify the best-performing SE to offer R&R. Conducting the CPEE on a weekly
or fortnightly frequency generates a large amount of data. The cumulative
FPS data could serve as an objective input during the periodic appraisal pro-
cess. Further, if a new PM assumes charge during the middle of an appraisal
cycle, the data in the RRDB makes it easier for the new PM to understand the
performance distribution in the team.
11.7 Conclusion
A new research problem on CPEE to offer R&R in an organization is
attempted in this chapter, focusing particularly on IT organizations. A modeling
framework for the CPEE of SEs is proposed in order to offer R&R. To build
this modeling framework, appropriate exploratory and descriptive research
processes are carried out to identify suitable and adequate variables/criteria.
Accordingly, 33 variables/criteria are identified. These are grouped into
demographic variables (five), which explain the DCSE, and performance-related
variables (28), which explain the direct PSE.
Based on the statistical tests, of the five variables/criteria considered under
DCSE, only three, education (E), university (U), and experience (X), are
statistically significant in representing the DCSE. Furthermore, the 28
performance-related variables/criteria are grouped into six main criteria/
factors using factor analysis; in this study they are named proactive,
prompt, resourceful, responsible, diagnostic, and dynamic.
The AHP is used in this study to compute an SE-wise demographic score,
called the normalized demographic score (NDS), and performance scores, called
normalized performance scores (NPS). In addition, the proposed MPMM is
implemented with the weighted scores vj obtained from AHP to obtain the
mean NPS for each SE. To introduce the moderating effect of DCSE and rank
the SEs, the NPS and NDS data are used to obtain the FPS for each SE.
Finally, the SE with the highest FPS becomes the top performer for getting R&R.
Though the MCDM methods AHP and MPMM are successfully implemented
to demonstrate the workability of the framework for CPEE of
SEs to offer R&R, identifying other MCDM method(s) applicable to
CPEE for R&R, followed by a systematic process for
comparing the various possible MCDM methods, is an immediate research
direction in this area. The development of interactive Excel-based
software could be another possible extension of the research considered in
this study.
Ramakrishnan Ramanathan
CONTENTS
12.1 Introduction................................................................................................. 303
12.2 Literature Review.......................................................................................304
12.2.1 Data Envelopment Analysis..........................................................304
12.2.2 DEA as an MCDM Tool..................................................................306
12.2.3 Relationship between Environmental Performance
and Financial Performance............................................................ 307
12.3 Data and Analysis.......................................................................................308
12.3.1 Sample and Data Collection..........................................................308
12.3.1.1 Manufacturing Performance (Manufacturing
Efficiency Scores)..............................................................309
12.3.1.2 Environmental Expenditure........................................... 309
12.3.1.3 Control Variables.............................................................. 310
12.3.2 Regression Model............................................................................ 310
12.4 Summary and Conclusions....................................................................... 311
References.............................................................................................................. 312
12.1 Introduction
We presently live in an era where data are generated continuously
and in several forms. These data have been called the next big thing in innovation
(Gobble, 2013), and data analysts strive to make business sense of such data
by analyzing them using appropriate tools (Bose, 2009). It is important that
appropriate tools that can use such large data and generate useful
business insights are explored and made available to data scientists. In this
regard, this book and this chapter focus on the use of multicriteria decision-
making (MCDM) methods to help data scientists make sense of data. In this
chapter, we illustrate specifically how data envelopment analysis (DEA),
an MCDM tool, can be advantageously employed to help in economic and
\[
\max_{u,v}\ \frac{\sum_{j=1}^{J} v_{mj}\, y_{mj}}{\sum_{i=1}^{I} u_{mi}\, x_{mi}}
\]
such that
\[
0 \le \frac{\sum_{j=1}^{J} v_{mj}\, y_{nj}}{\sum_{i=1}^{I} u_{mi}\, x_{ni}} \le 1;\quad n = 1, 2, \ldots, N \tag{12.1}
\]
\[
v_{mj},\ u_{mi} \ge 0;\quad i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, J,
\]
DEA for Environmental–Manufacturing Link 305
where the subscript i stands for inputs, j stands for outputs, and n stands
for the DMUs. The variables vmj and u mi are the weights (also called
multipliers) to be determined by the above mathematical program, and
the subscript m indicates the base DMU. Soon after formulating Model
(12.1), its authors suggested that the nonnegativity restrictions should be
replaced by strict positivity constraints to ensure that all of the known
inputs and outputs have positive weight values (Charnes et al., 1979).
The optimal value of the objective function is the DEA efficiency score
assigned to the mth DMU. If the efficiency score is 1 (or 100%), the mth
DMU satisfies the necessary condition to be DEA efficient and is said to
be located on the efficiency frontier; otherwise, it is DEA inefficient. Note
that the efficiency is relative to the performance of the other DMUs under
consideration.
It is difficult to solve the above program because of its fractional objective
function. However, if either the denominator or the numerator of the ratio
is forced to be unity, the objective function becomes linear, and a
linear programming problem is obtained. For example, by setting the
denominator of the ratio equal to unity, one can obtain the following output
maximization linear programming problem. Note that by setting the numerator
equal to unity, it is equally possible to produce an input minimization linear
programming problem.
\[
\max\ \sum_{j=1}^{J} v_{mj}\, y_{mj}
\]
such that
\[
\sum_{i=1}^{I} u_{mi}\, x_{mi} = 1; \tag{12.2}
\]
\[
\sum_{j=1}^{J} v_{mj}\, y_{nj} - \sum_{i=1}^{I} u_{mi}\, x_{ni} \le 0;\quad n = 1, 2, \ldots, N
\]
Model (12.2) is called the output maximizing multiplier version in the DEA
literature. A complete DEA model involves solving N such programs
(Model 12.2), each for a base DMU (m = 1, 2, …, N), to get the efficiency
scores of all the DMUs. In each program, the objective function and
the first constraint are changed while the remaining constraints are the
same.
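For the special case of a single input and a single output, Model (12.2) has a closed-form solution: each DMU's efficiency is its own output/input ratio divided by the best observed ratio. A sketch with invented data (a general multi-input, multi-output model requires a linear programming solver):

```python
# With one input and one output, the CCR score of DMU m reduces to
# (y_m / x_m) / max_n (y_n / x_n): the DMU with the best output/input
# ratio defines the efficiency frontier. The data below are invented.
dmus = {               # name: (input x, output y)
    "Sector-A": (10.0, 8.0),
    "Sector-B": (5.0, 5.0),
    "Sector-C": (8.0, 4.0),
}
best_ratio = max(y / x for x, y in dmus.values())
scores = {name: (y / x) / best_ratio for name, (x, y) in dmus.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.3f}")
```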
Computation of efficiency score is usually done with the dual of Model
(12.2). The dual constructs a piecewise linear approximation to the true fron-
tier by minimizing the quantities of the different inputs to meet the stated
levels of the different outputs. The dual is given below:
\[
\min\ \theta_m
\]
such that
\[
\sum_{n=1}^{N} y_{nj}\, \lambda_n \ge y_{mj};\quad j = 1, 2, \ldots, J \tag{12.3}
\]
\[
\sum_{n=1}^{N} x_{ni}\, \lambda_n \le \theta_m\, x_{mi};\quad i = 1, 2, \ldots, I
\]
\[
\lambda_n \ge 0;\quad n = 1, 2, \ldots, N;\quad \theta_m \text{ free}.
\]
Stewart, 1999). In this study, we use this MCDM perspective of DEA to rank
manufacturing sectors in terms of their ability to produce maximum outputs
by consuming minimum inputs.
(Majumdar and Marcus, 2001; Ramanathan et al., 2010). Thus, based on
Porter’s hypothesis and previous literature, we propose the following
hypothesis:
Hypothesis 1. Environmental protection expenditure to meet environ-
mental regulations is significantly positively related to performance.
TABLE 12.1
Sectors Analyzed

SIC Code   Description
10–14      Mining and quarrying
15–16      Manufacture of food, beverages, and tobacco products
21–22      Manufacture of pulp, paper, and paper products; publishing and printing
23         Manufacture of coke, refined petroleum products, and nuclear fuel
24         Manufacture of chemicals, chemical products, and man-made fibers
25         Manufacture of rubber and plastic products
26         Manufacture of other nonmetallic mineral products
27–28      Manufacture of basic metals and fabricated metal products
29         Manufacture of machinery and equipment not elsewhere classified
30–33      Manufacture of electrical and optical equipment
34–35      Manufacture of transport equipment
40–41      Electricity, gas, and water supply
TABLE 12.2
DEA Inputs and Outputs for Measuring Manufacturing Efficiency and Descriptive
Statistics
Inputs or Outputs Mean Std. Dev. Min. Max.
Inputs
Compensation of employees (£ millions) 8704 4332 2048 14,870
Net capital stock (£ billions) 24.96 21.78 6.1 88.6
Intermediate consumption (£ millions) 25,035 12,110 6429 45,790
Outputs
Gross value added (£ millions) 14,596 6597 2377 32,202
Gross fixed capital formation (£ millions) 4816 7221 0 20,917
TABLE 12.3
Summary Statistics and Correlation Coefficients

Variable                                                          1         2        3       4        5       6        7
1. Manufacturing efficiency                                       1
2. Number of employees (hundred thousands)                        0.38***   1
3. Energy consumption (million tonnes of oil equivalent)         −0.55***  −0.09     1
4. Other pollution abatement expenditure (£ hundred millions)    −0.49***   0.02     0.00    1
5. Waste pollution abatement expenditure (£ hundred millions)    −0.13      0.40*** −0.03    0.18     1
6. Air pollution abatement expenditure (£ hundred millions)      −0.48***   0.02     0.26**  0.43***  0.05    1
7. Water pollution abatement expenditure (£ hundred millions)    −0.20      0.48***  0.18    0.41***  0.23*   0.37***  1
Mean                                                              77.73     6.51     2.85    0.45     0.77    0.38     0.83
Std. Dev.                                                         19.62     3.58     2.42    0.39     0.53    0.31     1.01
Min                                                               32        1.22     0.38    0.06     0.03    0.04     0.13
Max                                                               100       15.75    9.15    1.8      2.97    1.27     5.09
***p < 0.01; **p < 0.05; *p < 0.1.
TABLE 12.4
Results of the Simple Regression Model (Dependent
Variable: Manufacturing Efficiency)
Variables Regression Coefficient
Controls
Energy consumption −3.63***
Other pollution expenditure −17.08***
Number of employees 2.69***
Direct Effects
Waste expenditure −8.99***
Air expenditure −11.55**
Water expenditure −1.71
R2 0.73
F 23.45***
***p < 0.01; **p < 0.05.
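A regression of this kind can be sketched in its simplest form; for a single predictor, ordinary least squares has a closed-form slope and intercept. The data below are invented, not the chapter's, and the sign pattern merely mirrors the negative energy coefficient in Table 12.4:

```python
# Simple OLS for one predictor: slope = cov(x, y) / var(x).
# Invented data; the chapter's actual model has several predictors.
x = [0.5, 1.0, 2.0, 3.0, 4.0, 5.5]   # e.g., energy consumption
y = [95.0, 90.0, 82.0, 71.0, 60.0, 48.0]  # e.g., manufacturing efficiency

n = len(x)
mx = sum(x) / n
my = sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx
intercept = my - slope * mx
print(f"efficiency = {intercept:.2f} + ({slope:.2f}) * energy")
```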
Despite the fact that our hypothesis is not validated, we believe that this
chapter shows how MCDM models can be used to make business sense of
publicly available big data for policy analysis.
References
Banker, R.D., Charnes, A., and Cooper, W.W. 1984. Some models for estimating tech-
nical and scale efficiencies in data envelopment analysis. Management Science,
30(9), 1078–1092.
Belton, V. and Stewart, T.J. 1999. DEA and MCDA: Competing or complementary
approaches? In: Meskens, N. and Roubens, M. (Eds.). Advances in Decision
Analysis, Springer Science & Business Media, Dordrecht, Netherlands, 87–104.
Besley, T. and Burgess, R. 2004. Can labor regulation hinder economic performance?
Evidence from India. Quarterly Journal of Economics, 119, 91–134.
Bose, R. 2009. Advanced analytics: Opportunities and challenges. Industrial
Management & Data Systems, 109(2), 155–172.
Brammer, S., Brooks, C., and Pavelin, S. 2006. Corporate social performance and stock
returns: UK evidence from disaggregate measures. Financial Management, 35(3),
97–116.
Callan, S.J. and Thomas, J.M. 2009. Corporate financial performance and corporate
social performance: An update and reinvestigation. Corporate Social
Responsibility and Environmental Management, 16, 61–78.
Charnes, A., Cooper, W.W., Lewin, A.Y., and Seiford, L.M. 1994. Data Envelopment
Analysis: Theory, Methodology and Applications, Kluwer, Boston.
Charnes, A., Cooper, W.W., and Rhodes, E. 1979. Short communication: Measuring
the efficiency of decision making units. European Journal of Operational Research,
3, 339–339.
Christainsen, G.B. and Haveman, R.H. 1981. The contribution of environmental
regulations to the slowdown in productivity growth. Journal of Environmental
Economics and Management, 8(4), 381–390.
DEFRA. 2008. UK environmental protection by industry survey. Available at: https://
www.gov.uk/government/statistics/environmental-protection-expenditure-
survey, Last accessed: September 29, 2015.
Emrouznejad, A., Parker, B.R., and Tavares, G. 2008. Evaluation of research in effi-
ciency and productivity: A survey and analysis of the first 30 years of scholarly
literature in DEA. Socio-Economic Planning Sciences, 42(3), 151–157.
Frosch, R.A. 1982. Industrial ecology: A philosophical introduction. Proceedings of the
National Academy of Sciences, 89, 800–803.
Gobble, M.M. 2013. Big data: The next big thing in innovation. Research Technology
Management, 56(1), 64–66.
Golany, B. and Thore, S. 1997. The economic and social performance of nations:
Efficiency and returns to scale. Socio-Economic Planning Sciences, 31(3), 191–204.
Hart, S.L. and Ahuja, G. 1996. Does it pay to be green? An empirical examination of
the relationship between emission reduction and firm performance. Business
Strategy and the Environment, 5, 30–37.
Joro, T., Korhonen, P., and Wallenius, J. 1998. Structural comparison of data envelop-
ment analysis and multiple objective linear programming. Management Science,
40, 962–970.
Konar, S. and Cohen, M.A. 2001. Does the market value environmental performance?
The Review of Economics and Statistics, 83(2), 281–289.
Majumdar, S.K. and Marcus, A.A. 2001. Rules versus discretion: The productivity
consequences of flexible regulation. Academy of Management Journal, 44, 170–179.
Margolis, J., Elfenbein, H.A., and Walsh, J. 2007. Does It Pay To Be Good? A Meta-
Analysis and Redirection of Research on the Relationship between Corporate Social
and Financial Performance. Mimeo, Harvard Business School, Boston, MA.
Moore, G. 2001. Corporate social and financial performance: An investigation of the
UK supermarket industry. Journal of Business Ethics, 34, 299–315.
Peloza, J. 2009. The challenge of measuring financial impacts from investments in
corporate social performance. Journal of Management, 35(6), 1518–1541.
Porter, M.E. 1991. America’s green strategy. Scientific American, 264, 168.
Porter, M.E. and van der Linde, C. 1995. Toward a new conception of the environ-
ment-competitiveness relationship. Journal of Economic Perspectives, 9, 97–118.
Ramanathan, R. 2003. An Introduction to Data Envelopment Analysis. Sage, New Delhi.
Ramanathan, R., Black, A., Nath, P., and Muyldermans, L. 2010. Impact of environ-
mental regulations on innovation and performance in the UK industrial sector,
Special Topic Forum on Using Archival and Secondary Data Sources in Supply
Chain Management Research. Management Decision (Special issue on Daring to
Care: A Basis for Responsible Management), 48(10), 1493–1513.
Rugman, A.M. and Verbeke, A. 2000. Six cases of Corporate Strategic responses to
environmental regulation. European Management Journal, 18(4), 377–385.
Russo, M.V. and Fouts, P.A. 1997. A resource-based perspective on corporate environ-
mental performance and profitability. Academy of Management Journal, 40, 534–559.
Sarkis, J. and Cordeiro, J.J. 2001. An empirical evaluation of environmental efficien-
cies and firm performance: Pollution prevention technologies versus end-of-
pipe practice. European Journal of Operational Research, 135(1), 102–113.
Waddock, S.A. and Graves, S.B. 1997. The corporate social performance-financial per-
formance link. Strategic Management Journal, 18(4), 303–319.
13
An Integrated Multicriteria Decision-
Making Model for New Product
Portfolio Management
CONTENTS
13.1 Introduction................................................................................................. 316
13.1.1 Characteristics of New Product Process (NPP).......................... 316
13.1.2 New Product Management........................................................... 317
13.2 New Product Portfolio Management....................................................... 318
13.2.1 Evaluative Dimensions for NPPM................................................ 319
13.3 Methodologies/Models for NPPM........................................................... 322
13.3.1 Quantitative Models....................................................................... 323
13.3.1.1 Quantitative Models with Single Criterion.................. 324
13.3.1.2 Quantitative Models with Multiple Criteria................ 325
13.3.2 Qualitative Models......................................................................... 326
13.3.2.1 Strategic Approaches, Scoring Models,
and Checklists................................................................... 326
13.3.2.2 Balanced Scorecard.......................................................... 326
13.3.2.3 Analytical Hierarchy Approaches/Analytical
Network Process.............................................................. 327
13.4 Development of Integrated DEA–BSC Model for NPPM...................... 328
13.4.1 BSC for Achieving Strategic and Balanced NPP........................ 328
13.4.2 DEA Approach for Ranking and Prioritization
of Portfolios................................................................................329
13.4.2.1 Base CCR Model............................................................... 330
13.4.3 Proposed Integrated MCDM Model for NPPM......................... 331
13.4.3.1 Phase 1: Development of BSC Evaluation Index
System................................................................................ 332
13.4.3.2 Phase 2: Determination of Balance Constraints
for DEA–BSC Model........................................................ 336
13.4.3.3 Phase 3: Development of the Proposed Integrated
DEA–BSC Model.............................................................. 337
13.5 Validation of the Proposed Integrated DEA–BSC Model for NPPM.....338
13.6 Summary......................................................................................................342
Appendix A: An Analytical Hierarchical Process for BSC.............................343
Appendix B........................................................................................................... 347
References.............................................................................................................. 349
13.1 Introduction
A successful new product does more good for an organization than any-
thing else that can happen.
Crawford
if a competitor comes out with that particular project idea. In this scenario,
effective and accurate decision making and management of NPP are essential.
In the literature, the managing processes and the decision-making perspective of new
product development are referred to as new product management (NPM).
13.3. In Section 13.4, a base BSC system along with the proposed BSC index
system for NPPM is first discussed, and subsequently the proposed integrated
DEA–BSC model is presented. The workability of the integrated DEA–BSC
decision-making model, along with a numerical example, is presented in
Section 13.5.
Based on the reasons stated here regarding the complexity and significance of
NPPM, one needs to consider multiple evaluative dimensions to obtain
efficient NPPM. Practice and the literature reveal that product/project
managers use different evaluation dimensions/criteria for NPPM. In addition,
the ability of decision models to accurately evaluate the best set of products
or projects varies depending on the dimensions used and the weights applied to
these dimensions. The identification of evaluative dimensions is therefore essential.
In this study, we attempt to identify and study the significance of
different evaluative dimensions and their impact on the efficiency of NPPM through
an analysis of the literature. Accordingly, the evaluative dimensions identified
through the related research studies on NPPM, R&D portfolio management,
and PES for NPPM are discussed in detail in the following section.
TABLE 13.1
Summary of Closely Related Literature on Evaluative Dimensions of NPPM

Related Studies                Strategic Fit   Portfolio Balance   Risk–Uncertainty   Cost–Revenue   Resource Allocation
R&D Project Evaluation and Selection
Osawa and Murakami (2002) + +
Mohanty et al. (2005) + + + +
Eilat et al. (2008) + + + +
Table 13.1 clearly indicates that no study has considered all five
evaluative dimensions together for the PES decision in NPPM. In addition,
only one study, Oh et al. (2012), considers the PIB evaluative
dimension, though not from the perspective of this study, and it
considers only portfolio balance. Furthermore, no significant study
has analyzed the interrelationship between these evaluative dimensions
and their significance for NPPM. Finally, in addition to these research gaps,
the identified evaluative dimensions have not been modeled for implementation
in industry. In this study, we attempt to develop an explicit decision-making
model which fulfills these research gaps. In the next section, methodologies used
in different studies for the development of decision models for NPPM are
presented.
FIGURE 13.1
Classification of new product portfolio management models/methodology. (The figure organizes NPPM decision-making models into quantitative models, with single-criterion methods (linear programming, ILP, dynamic programming) and multiple-criteria methods (ANN, DEA, goal programming, MOEA), and qualitative models, with single-criterion methods (strategic buckets, product matrices, score techniques) and multiple-criteria methods (AHP, ANP, BSC, PROMETHEE).)
TABLE 13.2
Summary of Recent Related Literature on Models/Methodologies for NPPM

Approach       Criteria   Methodology/Model                                     Reference
Quantitative   Single     Financial indices and methods                         Patah and de Carvalho (2007), Ibbs et al. (2004)
                          Linear programming (a)                                Chien (2002)
                          Integer programming                                   Mavrotas et al. (2006), Melachrinoudis and Kozanidis (2002)
                          Dynamic programming (a)                               Kyparisis et al. (1996)
               Multiple   Goal programming                                      Wey and Wu (2007), Lee and Kim (2000)
                          Data envelopment analysis                             Chang et al. (2014), Kumar et al. (2007)
                          Multiobjective evolutionary algorithms                Metaxiotis et al. (2012), Gutjahr et al. (2010)
Qualitative    Single     Strategic buckets, score techniques                   Chan and Ip (2010)
               Multiple   Multiattribute utility theory (a)                     Duarte and Reis (2006)
                          PROMETHEE multicriteria (a)                           Halouani et al. (2009)
                          Balanced scorecard (BSC)                              Asosheh et al. (2010), Eilat et al. (2008)
                          Analytic hierarchy process/analytic network process   Ayağ and Özdemir (2007), Wang and Hwang (2007), Yurdakul (2003)
(a) Details of these methods/models are not provided in this study.
management processes, (b) clarify and translate their vision and strategy,
(c) communicate and link strategic objectives and measures, (d) plan and
align strategic initiatives, (e) enhance strategic feedback and learning, etc.
The significance and implementation of BSC are growing rapidly in the
present competitive scenario (Eilat et al. 2006).
FIGURE 13.2
Managing strategy through the balanced scorecard. (The figure links four management processes around the scorecard: translating the vision (articulating the vision); communicating and linking; business planning (setting targets, strategic alignment, resource allocation); and strategic feedback and learning (strategic review and feedback).)
\[
\max_{u,v}\ Z_0 = \frac{\sum_{n} v_n\, y_{n0}}{\sum_{i} u_i\, x_{i0}}, \tag{13.1}
\]
such that
\[
\frac{\sum_{n} v_n\, y_{nj}}{\sum_{i} u_i\, x_{ij}} \le 1,\quad \forall j\ (\text{where } j = 1, \ldots, k). \tag{13.2}
\]
\[
\max_{u,v}\ Z_0 = \sum_{n} v_n\, y_{n0}
\]
such that,
\[
\sum_{i} u_i\, x_{i0} = 1,
\]
\[
\sum_{n} v_n\, y_{nj} - \sum_{i} u_i\, x_{ij} \le 0\quad \forall j,
\]
\[
v_n \ge \varepsilon,\quad u_i \ge \varepsilon.
\]
Phase 1: Development of BSC index system: In this phase, we first relate the
identified evaluative dimensions to the perspectives of BSC. In this
study, we propose seven perspectives, and these have to be measured.
In addition, the evaluation indicators required to measure
these perspectives need to be determined. The details of these
seven perspectives are described in the next section.
Phase 2: Determination of balance constraints for DEA–BSC model:
In this phase, balance limits for the different perspectives are determined.
Subsequently, upper and lower bounds for each perspective are fixed,
and the corresponding constraints are drawn and implemented in the
base DEA model.
Phase 3: Development of integrated DEA–BSC model: In this phase, priorities
from each perspective down to the indicator level are determined. These
priorities are used as weights in the DEA model. The proposed balance
constraints are introduced, and the respective balance matrices
are developed and integrated into these constraints.
Additionally, resource constraints and other feasibility constraints
are introduced into the base DEA model. With these three phases, the
final proposed integrated DEA–BSC model is developed.
FIGURE 13.3
Proposed framework for integrated DEA–BSC model. (Phase 1 maps the evaluative dimensions of NPPM (strategic fit, portfolio-innovation balance, cost-revenue estimation, optimized resource allocation, and risk-uncertainty estimation) to the evaluation perspectives of BSC (customer, innovation, internal business, learning and growth, sustainability, value contribution, and risk), followed by estimation and validation of evaluation indicators for every perspective. Phase 3 formulates the base DEA model and the resource constraints, yielding the integrated DEA–BSC model.)
TABLE 13.3
Project Evaluation Index System of the Proposed BSC for NPP

O1 Customer perspective. Evaluation indicators: customer satisfaction; customer trust; degree of customer need met. Evaluation objective: ensure that the project meets and satisfies customer needs.
O2 Internal business perspective. Evaluation indicators: employee satisfaction; supplier's satisfaction; project quality planning and tracking; internal communication; congruence; priority level. Evaluation objective: ensure that the implementation of the project plan, control, and other aspects optimizes the organization's internal processes.
O3 Learning and growth perspective. Evaluation indicators: propriety position; platform for growth; technical and market durability; team incentive; knowledge accumulation; project management maturity. Evaluation objective: ensure that implementation of the projects cultivates the organization's core technologies and competitiveness.
O4 Innovation perspective. Evaluation indicators: technology newness; process newness; market newness. Evaluation objective: ensure that the project acquires the cutting edge to succeed.
O5 Sustainability perspective. Evaluation indicators: ecological; social; economic. Evaluation objective: ensure that project/product sustainability is achieved.
O6 Value contribution perspective. Evaluation indicators: project profitability; project speed-up cycle; product sales. Evaluation objective: ensure that the projects are completed in accordance with desired objectives and provide business value.
O7 Uncertainty perspective. Evaluation indicators: operational–technical; organizational; financial; marketing. Evaluation objective: ensure that projects are completed in accordance with time and specifications, without uncertainties and delay in development.
I1 Resources (input card). Evaluation indicators: investments; human resources; machinery and equipment.
there exist seven output cards (evaluation perspectives) and one input card.
For the BSC index system, the input and output balance constraints are as
follows:
\[
L_{I_s} \le \frac{\sum_{i \in I_s} u_i\, x_{ij}}{\sum_{i} u_i\, x_{ij}} \le U_{I_s}\quad \forall j, \tag{13.3}
\]
\[
L_{O_r} \le \frac{\sum_{n \in O_r} v_n\, y_{nj}}{\sum_{n} v_n\, y_{nj}} \le U_{O_r}\quad \forall j. \tag{13.4}
\]
max Z1 =
u, ν ∑v y n n1
n
such that,
∑ u x = 1,
i
i i1
∑ v y − ∑ u x ≤ 0,
n
n nj
i
i ij
L ∑ v y − ∑ v y ≤ 0,
Or n n1 n n1
n n ∈Or
∑ v y − U ∑ v y ≤ 0,
n ∈Or
n n1 Or
n
n n1
L ∑ u x − ∑ u x ≤ 0,
Is i i1 i i1
i i ∈I s
∑ u x − U ∑ u x ≤ 0.
i ∈I s
i i1 Is
i
i i1
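The linearized model above can be solved as a small linear program, one LP per project. The following is a minimal sketch under invented illustrative data (a single input, four outputs grouped into two output cards with balance bounds; the data and bounds are assumptions, not the chapter's 29-indicator dataset), using NumPy and SciPy:

```python
# Sketch of the linearized DEA-BSC LP for one DMU (illustrative data only).
import numpy as np
from scipy.optimize import linprog

x = np.array([80.0, 90.0, 75.0])            # input (e.g., investment) per project
y = np.array([[7, 8, 6, 0.8],               # outputs per project (rows = projects)
              [6, 7, 8, 0.7],
              [8, 6, 7, 0.9]])
cards = {0: [0, 1], 1: [2, 3]}              # output card -> output indices
L = {0: 0.2, 1: 0.2}                        # lower balance bounds per card
U = {0: 0.8, 1: 0.8}                        # upper balance bounds per card

def dea_bsc_score(k):
    """Efficiency score of project k; decision variables are [u, v1..v4]."""
    n_out = y.shape[1]
    c = np.concatenate([[0.0], -y[k]])      # maximize sum(v*y_k) -> minimize negative
    A_eq = [[x[k]] + [0.0] * n_out]         # u * x_k = 1 (normalization)
    A_ub, b_ub = [], []
    for j in range(len(x)):                 # sum(v*y_j) - u*x_j <= 0 for all j
        A_ub.append([-x[j]] + list(y[j]))
        b_ub.append(0.0)
    for r, idx in cards.items():            # balance bounds on card r for DMU k
        in_r = np.isin(np.arange(n_out), idx).astype(float)
        A_ub.append([0.0] + list((L[r] - in_r) * y[k]))   # L_r*S - S_r <= 0
        b_ub.append(0.0)
        A_ub.append([0.0] + list((in_r - U[r]) * y[k]))   # S_r - U_r*S <= 0
        b_ub.append(0.0)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(1e-6, None)] * (1 + n_out))
    return -res.fun

scores = [dea_bsc_score(k) for k in range(3)]
```

Each project's score is at most 1 by construction (the constraint for j = k caps the weighted output at the normalized input), and a score of 1 marks a relatively efficient project.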
1. The data required for the evaluation indicators of each of the BSC perspectives are suitably generated for 10 NPD projects. For generating values for each evaluation indicator (i.e., input and output values), we assume a range for every indicator; these ranges are presented in Table 13.4. Using the ranges defined in Table 13.4, randomly generated values of each evaluation indicator for the 10 NPD projects are given in Table 13.5.
2. In order to obtain weights for the evaluation perspectives and their respective evaluation indicators, we used AHP. For this, an appropriate questionnaire was prepared and project/product managers were approached for their responses (in this study, we collected
An Integrated Multicriteria Decision-Making Model 339
TABLE 13.4
Range Considered for Generating the Value of Each Evaluation Indicator

O1 Customer perspective: Customer satisfaction (O11), 5–10; Customers' trust (O12), 6–10; Degree of customer need met (O13), 6–10
O2 Internal business perspective: Employee satisfaction (O21), 6–10; Supplier's satisfaction (O22), 6–10; Project quality planning and tracking (O23), 7–10; Internal communication (O24), 5–10; Congruence (O25), 4–10; Priority level (O26), 5–10
O3 Learning and growth perspective: Propriety position (O31), 2–10; Platform for growth (O32), 2–10; Technical and market durability (O33), 4–10; Team incentive (O34), 4–10; Knowledge accumulation (O35), 6–10; Project management maturity (O36), 4–10
O4 Innovation perspective: Technology newness (O41), 4–7; Process newness (O42), 4–7; Market newness (O43), 4–7
O5 Sustainability perspective: Ecological (O51), 4–10; Social (O52), 4–10; Economic (O53), 7–10
O6 Value contribution perspective: Project profitability (millions) (O61), 7–12; Project speed-up cycle (years) (O62), 0.2–1; Product sales (thousand units) (O63), 20–50
O7 Uncertainty perspective (probability of success): Operational–technical (O71), 0.6–0.9; Organizational (O72), 0.5–0.9; Financial (O73), 0.7–0.9; Marketing (O74), 0.7–0.9
I1 Resources: Investments (millions) (I11), 70–100; Human resources (I12), –; Machinery and equipment (I13), –
TABLE 13.5
Project-Wise Values of the Evaluation Indicators
Project No.
Card Label 1 2 3 4 5 6 7 8 9 10
I11 73 82 96 87 75 78 96 89 83 91
O1 O11 6 5 7 7 8 6 8 7 9 8
O12 8 6 7 7 6 6 8 7 8 6
O13 8 6 7 8 9 8 7 8 9 7
O2 O21 7 7 6 6 8 7 8 6 8 7
O22 7 8 9 8 7 7 6 6 8 7
O23 7 8 9 9 8 8 7 7 7 9
O24 9 6 7 7 7 5 8 5 8 8
O25 4 6 8 10 5 6 7 4 4 8
O26 5 6 5 8 8 8 7 6 6 8
O3 O31 3 4 5 8 4 8 8 2 6 5
O32 3 5 5 8 9 9 10 5 2 6
O33 9 6 7 7 7 5 8 5 8 8
O34 4 6 8 10 5 6 7 4 4 8
O35 9 8 9 5 8 8 7 8 7 9
O36 7 4 5 8 4 9 8 4 6 5
O4 O41 4 6 7 6 5 6 5 5 5 6
O42 5 6 7 7 5 7 5 7 7 6
O43 5 6 7 5 5 6 7 6 5 7
O5 O51 7 6 5 9 5 8 8 7 8 8
O52 7 7 5 5 8 4 9 8 4 8
O53 7 7 8 7 8 8 8 7 7 8
O6 O61 8 9 11 10 12 11 10 12 8 9
O62 0.9 0.9 0.5 0.6 0.5 0.4 1 0.8 0.6 0.5
O63 25 24 29 38 45 35 40 42 35 44
O7 O71 0.7 0.6 0.9 0.8 0.8 0.6 0.7 0.7 0.9 0.9
O72 0.7 0.8 0.5 0.6 0.5 0.8 0.9 0.7 0.8 0.8
O73 0.7 0.8 0.8 0.7 0.9 0.7 0.8 0.9 0.7 0.8
O74 0.8 0.9 0.9 0.8 0.9 0.8 0.8 0.7 0.9 0.8
TABLE 13.6
Weights of the Evaluation Perspectives and Evaluation Indicators
(First-level weight in parentheses after each perspective; second-level weight after each indicator.)

O1 Customer perspective (0.121): Customer satisfaction, 0.0452; Customer trust, 0.0356; Degree of customers' need met, 0.0402
O2 Internal business perspective (0.101): Employee satisfaction, 0.00887; Supplier's satisfaction, 0.00956; Project quality planning and tracking, 0.04756; Internal communication, 0.01548; Congruence, 0.00157; Priority level, 0.00896
O3 Learning and growth perspective (0.096): Propriety position, 0.00945; Platform for growth, 0.00543; Technical and market durability, 0.04255; Team incentive, 0.00923; Knowledge accumulation, 0.02145; Project management maturity, 0.00789
O4 Innovation perspective (0.142): Technology newness, 0.0756; Process newness, 0.0521; Market newness, 0.0143
O5 Sustainability perspective (0.068): Ecological, 0.0149; Social, 0.0156; Economic, 0.0375
O6 Value contribution perspective (0.281): Project profitability, 0.1457; Project speed-up cycle, 0.0478; Product sales, 0.0875
O7 Uncertainty perspective (0.191): Operational–technical, 0.0145; Organizational, 0.0108; Financial, 0.0895; Marketing, 0.0762
I1 Resources (1): Investments, 0.645; Human resources, 0.143; Machinery and equipment, 0.212
To generate the proposed integrated DEA–BSC model for any given data, a LINGO set code has been developed and is presented in Appendix B. The data presented in Tables 13.5 through 13.7 can be given as input to the LINGO set code, and the proposed integrated DEA–BSC model can then be generated for every project and solved using LINGO. Finally, the efficiency scores of the projects, along with their relative ratings (rankings), are obtained; these are presented in Table 13.8. From Table 13.8, one can select the set of projects for the NPP based on the efficiency scores.
TABLE 13.7
Lower and Upper Balance Bounds of Evaluation
Perspectives of DEA–BSC Model
Card Label   Evaluation Perspective              Lower Bound   Upper Bound
O1 Customer perspective 0.1 0.7
O2 Internal business perspective 0.2 0.8
O3 Learning and growth perspective 0.2 0.8
O4 Innovation perspective 0.1 0.7
O5 Sustainability perspective 0.1 0.7
O6 Value contribution perspective 0.3 0.9
O7 Uncertainty perspective 0.12 0.72
TABLE 13.8
Project-Wise Efficiency Scores and Relative
Ratings Yielded by the Proposed DEA–BSC Model
Project Efficiency Score Rating
1 0.7432 7
2 0.9221 4
3 0.6043 8
4 1.0000 1
5 0.9910 2
6 1.0000 1
7 0.7496 6
8 0.9452 3
9 1.0000 1
10 0.8457 5
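The ratings in Table 13.8 follow a dense ranking of the efficiency scores: tied scores share a rank and the next distinct score takes the next rank. A short sketch reproducing the table's ratings:

```python
# Dense ranking of the efficiency scores from Table 13.8:
# equal scores share a rank; the next distinct score takes the next rank.
scores = {1: 0.7432, 2: 0.9221, 3: 0.6043, 4: 1.0000, 5: 0.9910,
          6: 1.0000, 7: 0.7496, 8: 0.9452, 9: 1.0000, 10: 0.8457}

distinct = sorted(set(scores.values()), reverse=True)      # unique scores, best first
rating = {p: distinct.index(s) + 1 for p, s in scores.items()}
# rating[4] == rating[6] == rating[9] == 1, rating[5] == 2, ..., rating[3] == 8
```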
13.6 Summary
As the significance of new product development rapidly increases, it becomes essential for an organization to have an effective and accurate decision-making process. In order to develop a successful new product, the decision maker/project manager needs to identify the right set of new projects/products and accordingly formulate a New Product Portfolio (NPP). Thus, the decisions taken at the PES phase play a significant role in NPPM performance. There exist limited studies concentrating on (a) identifying factors/dimensions that influence decision making, (b) developing decision-making models for PES, and (c) improving the performance of NPPM. It is observed from the literature that NPPM is the weakest research area and that identification of explicit evaluative factors/dimensions is lacking (Cooper et al. 2001; McNally et al. 2009). In this study, we address these gaps.
Accordingly, in this study, we have identified five evaluative dimensions (i.e., SF, PIB, Resource Allocation, CRE, and RUE) that are essential for PES in NPPM. It is evident from the literature that only a very limited number of studies concentrate on the development of MCDM models for PES in the case of NPP. We attempt to employ all the evaluative dimensions in the development of the MCDM model for NPPM. To the best of our knowledge, this is the first study that considers all five evaluative dimensions simultaneously in the development of an MCDM model in this area.
We further reviewed the different methodologies employed for MCDM model formulation in PES studies. From this discussion, certain limitations of the existing methodologies were identified. In the present NPPM scenario, it is essential to accommodate the subjective data of a decision maker in the developed methodology. This prompted us to develop a methodology for NPPM in which both qualitative and quantitative data can be considered. Thus, we proposed to develop an integrated DEA–BSC model.
The methodology is based on the relative evaluation of entities (projects or portfolios) and is inspired by the integrated DEA–BSC model first presented by Eilat et al. (2008). For this, the identified evaluative dimensions are reorganized into the seven evaluation perspectives of the BSC index system. Accordingly, each evaluation indicator is measured using different scales and metrics. The output of the BSC is considered as input for the DEA, along with certain other inputs such as the weights or priorities of the perspectives obtained from AHP. The proposed integrated DEA–BSC model then estimates the efficiency score of every project and ranks the projects accordingly. Thus, the model proposed in this study provides clarity and accuracy regarding the subjective data associated with PES decisions.
In future work, we intend to extend the model to hierarchical levels of the BSC. We also intend to introduce an accumulation function that accounts for interactions between resources, benefit functions, and output functions. Finally, we propose to extend this model by introducing dynamics into the problem.
EXHIBIT 13.1
Hierarchical Network Diagram of BSC
[Diagram; visible node labels include O11: Customers' satisfaction; O51: Ecological; and O7: Uncertainty perspective with its indicators O71: Operational–technical, O72: Organizational, O73: Financial, and O74: Marketing.]
EXHIBIT 13.2
Pairwise Comparison Matrix of Evaluation Perspectives
with Respect to NPPM Performance
O1 O2 O3 O4 O5 O6 O7
O1 1 1/4 6 4 1 1/3 1/5
O2 4 1 1/4 1/3 2 1/3 3
O3 1/6 4 1 2 5 1/3 1
O4 1/4 3 1/2 1 1 1/5 1/3
O5 1 1/2 1/5 1 1 1/5 1/3
O6 3 3 3 5 5 1 1/3
O7 5 1/3 1 3 3 3 1
EXHIBIT 13.3
Pairwise Comparison Matrix of Evaluation
Indicators with Respect to Uncertainty
Perspective
O61 O62 O63 O64
O61 1 2 1 1
O62 1/2 1 1/2 1/4
O63 1 2 1 2
O64 1 4 1/2 1
EXHIBIT 13.4
Pairwise Normalized Matrix of Evaluation Perspectives with Respect to
NPPM Performance
O1 O2 O3 O4 O5 O6 O7 Priorities
O1 0.069 0.021 0.502 0.245 0.056 0.062 0.032 0.141
O2 0.277 0.083 0.021 0.020 0.111 0.062 0.484 0.151
O3 0.012 0.331 0.084 0.122 0.278 0.062 0.161 0.150
O4 0.017 0.248 0.042 0.061 0.056 0.037 0.054 0.074
O5 0.069 0.041 0.017 0.061 0.056 0.037 0.054 0.048
O6 0.208 0.248 0.251 0.306 0.278 0.185 0.054 0.219
O7 0.347 0.028 0.084 0.184 0.167 0.556 0.161 0.218
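The priorities column of Exhibit 13.4 is obtained by normalizing each column of the pairwise comparison matrix in Exhibit 13.2 and averaging across each row, as in standard AHP; a sketch:

```python
# AHP priority vector from the Exhibit 13.2 pairwise comparison matrix:
# normalize each column to sum to 1, then average across each row.
import numpy as np

A = np.array([
    [1,   1/4, 6,   4,   1, 1/3, 1/5],
    [4,   1,   1/4, 1/3, 2, 1/3, 3  ],
    [1/6, 4,   1,   2,   5, 1/3, 1  ],
    [1/4, 3,   1/2, 1,   1, 1/5, 1/3],
    [1,   1/2, 1/5, 1,   1, 1/5, 1/3],
    [3,   3,   3,   5,   5, 1,   1/3],
    [5,   1/3, 1,   3,   3, 3,   1  ],
])

norm = A / A.sum(axis=0)          # column-normalized matrix (Exhibit 13.4 body)
priorities = norm.mean(axis=1)    # row averages -> priority vector
# priorities ~ [0.141, 0.151, 0.150, 0.074, 0.048, 0.219, 0.218]
```

To rounding, this reproduces both the normalized entries and the priorities reported in Exhibit 13.4.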
EXHIBIT 13.5
Pairwise Normalized Matrix of Evaluation
Indicators with Respect to Uncertainty Perspective
O61 O62 O63 O64 Priorities
O61 0.2 0.13 0.17 0.31 0.2
O62 0.2 0.13 0.17 0.08 0.14
O63 0.4 0.25 0.33 0.31 0.32
O64 0.2 0.5 0.33 0.31 0.34
EXHIBIT 13.6
Overall Priorities of Uncertainty Evaluation Indicators
with Respect to Uncertainty Perspective

      Perspective Priority   Indicator Priority   Overall Priority
O61   0.219                  0.2                  0.0436
O62                          0.14                 0.0305
O63                          0.32                 0.0698
O64                          0.34                 0.0741
The consistency of each pairwise comparison matrix is verified through the consistency index (CI) and the consistency ratio (CR):

$$\mathrm{CI} = \frac{\lambda_{\max} - n}{n - 1}, \qquad \mathrm{CR} = \frac{\mathrm{CI}}{\mathrm{RI}},$$

where $\lambda_{\max}$ is the principal eigenvalue of the matrix, $n$ is its order, and RI is the random index.
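A sketch of this consistency check for the Exhibit 13.2 matrix; taking λ_max as the principal eigenvalue and RI = 1.32 (a commonly tabulated random-index value for n = 7) are assumptions of this sketch:

```python
# AHP consistency check: CI = (lambda_max - n)/(n - 1), CR = CI/RI.
import numpy as np

A = np.array([
    [1,   1/4, 6,   4,   1, 1/3, 1/5],
    [4,   1,   1/4, 1/3, 2, 1/3, 3  ],
    [1/6, 4,   1,   2,   5, 1/3, 1  ],
    [1/4, 3,   1/2, 1,   1, 1/5, 1/3],
    [1,   1/2, 1/5, 1,   1, 1/5, 1/3],
    [3,   3,   3,   5,   5, 1,   1/3],
    [5,   1/3, 1,   3,   3, 3,   1  ],
])
n = A.shape[0]
lam_max = max(np.linalg.eigvals(A).real)  # principal eigenvalue of a positive matrix
CI = (lam_max - n) / (n - 1)
RI = 1.32                                 # assumed random index for n = 7
CR = CI / RI                              # CR <= 0.1 is the usual acceptance threshold
```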
Appendix B
A LINGO Set Code developed for generating the proposed integrated
DEA–BSC model:
MODEL:
! DEA-BSC MODEL FOR NPPM ;
! PLEASE NOTE: THIS CODE IS FOR ILLUSTRATION PURPOSES;
! Part of the sample data is presented in this code;
SETS:
DMU/P1 P2 P3 P4 P5 P6 P7 P8 P9 P10/: ! 10 NEW PRODUCT PROJECTS;
SCORE; ! Each decision making unit in this case new
product project has a score to be computed;
FACTOR/I1 O11 O12 O13 O21 O22 O23 O24 O25 O26 O31 O32 O33 O34
O35 O36 O41 O42 O43 O51 O52 O53 O61 O62 O63 O71 O72 O73 O74/;
! ALL THE EVALUATION INDICATORS ARE CONSIDERED AS OUTPUT AND
INVESTMENTS AS INPUT;
DXF( DMU, FACTOR): F; ! F( I, J) = Jth factor of DMU I;
ENDSETS
DATA:
NINPUTS = 1; ! The first NINPUTS factors are inputs;
! The inputs, the outputs;
F = 73 6 8 8 -------- 0.7 0.8
82 5 6 6 -------- 0.8 0.9
96 7 7 7 -------- 0.8 0.9
87 7 7 8 -------- 0.7 0.8
75 8 6 9 -------- 0.9 0.9
78 6 6 8 -------- 0.7 0.8
96 8 8 7 -------- 0.8 0.8
89 7 7 8 -------- 0.9 0.7
83 9 8 9 -------- 0.7 0.9
91 8 6 7 -------- 0.8 0.8;
ENDDATA
!----------------------------------------------------------;
SETS:
! Weights used to compute DMU I’s score;
DXFXD(DMU,FACTOR) : W;
ENDSETS
DATA:
W = 0.645
0.0452
0.0356
0.0402
0.00887
|
|
|
0.0108
0.0895
0.0762;
ENDDATA
!------------------------------------------;
SETS:
! Lower Bounds used to compute DMU I’s score;
DXFXA(DMU,FACTOR) : LB;
ENDSETS
DATA:
LB= 0
0.1
0.1
0.1
0.2
|
|
|
0.12
0.12
0.12
0.12;
ENDDATA
!-------------------------------------------;
SETS:
! Upper Bounds used to compute DMU I’s score;
DXFXB(DMU,FACTOR) : UB;
ENDSETS
DATA:
UB= 56
0.7
0.7
0.7
0.8
|
|
|
0.72
0.72
0.72
0.72;
ENDDATA
! The Model;
! Try to make everyone’s score as high as possible;
References
Abbassi, M., Ashrafi, M., and Tashnizi, E.S. 2014. Selecting balanced portfolios of
R&D projects with interdependencies: A cross-entropy based methodology.
Technovation 34 (1): 54–63.
Andrews, K.Z. 1996. Two kinds of performance measures. Harvard Business Review
74 (1): 8–9.
Asosheh, A., Nalchigar, S., and Jamporazmey, M. 2010. Information technology proj-
ect evaluation: An integrated data envelopment analysis and balanced score-
card approach. Expert Systems with Applications 37 (8): 5931–5938.
Ayağ, Z. and Özdemir, R.G. 2007. An analytic network process-based approach to
concept evaluation in a new product development environment. Journal of
Engineering Design 18 (3): 209–226.
Banker, R.D., Chang, H., and Pizzini, M.J. 2004. The balanced scorecard: Judgmental
effects of performance measures linked to strategy. Accounting Review 79 (1): 1–23.
Banker, R.D., Potter, G., and Srinivasan, D. 2000. An empirical investigation of an
incentive plan that includes non-financial performance measures. Accounting
Review 75 (1): 65–92.
Bhattacharyya, R., Kumar, P., and Kar, S. 2011. Fuzzy R&D portfolio selection of inter-
dependent projects. Computers & Mathematics with Applications 62 (10): 3857–3870.
Blundell, R., Griffith, R., and Van Reenen, J. 1999. Market share, market value and
innovation in a panel of British manufacturing firms. The Review of Economic
Studies 66 (3): 529–554.
Brown, S.L. and Eisenhardt, K.M. 1995. Product development: Past research, pres-
ent findings, and future directions. Academy of Management Review 20 (2):
343–378.
Carbonell-Foulquié, P., Munuera-Alemán, J.L., and Rodrıguez-Escudero, A.I. 2004.
Criteria employed for Go/no-Go decisions when developing successful highly
innovative products. Industrial Marketing Management 33 (4): 307–316.
Charnes, A., Cooper, W.W., and Rhodes, E. 1978. Measuring the efficiency of decision
making units. European Journal of Operational Research 2: 429–444.
Charnes, A., Cooper, W.W., and Rhodes, E. 1989. Cone ratio data envelopment analy-
sis and multi-objective programming. International Journal of Systems Science
20 (7): 1099–1118.
Chan, S. and Ip, W. 2010. A scorecard-Markov model for new product screening deci-
sions. Industrial Management & Data Systems 110 (7): 971–992.
Chang, Y.T., Park, H.S., Jeong, J.B., and Lee, J.W. 2014. Evaluating economic and envi-
ronmental efficiency of global airlines: A SBM-DEA approach. Transportation
Research Part D: Transport and Environment 27: 46–50.
Chao, R.O. and Kavadias, S. 2008. A theoretical framework for managing the new
product development portfolio: When and how to use strategic buckets.
Management Science 54 (5): 907–921.
Chesbrough, H.W. and Teece, D.J. 2002. Organizing for Innovation: When Is Virtual
Virtuous? Harvard Business School Publishing, MA.
Chiang, T.-A. and Che, Z.H. 2010. A fuzzy robust evaluation model for selecting and
ranking NPD projects using Bayesian belief network and weight-restricted
DEA. Expert Systems with Applications 37 (11): 7408–7418.
Chien, C.-F. 2002. A portfolio-evaluation framework for selecting R&D projects. R&D
Management 32: 359–368.
Chiu, C.-Y. and Park, C.S. 1994. Fuzzy cash flow analysis using present worth crite-
rion. The Engineering Economist 39 (2): 113–138.
Cook, W.D. and Seiford, L.M. 1978. R&D project selection in a multi-dimensional
environment: A practical approach. Journal of the Operational Research Society
21: 29–37.
Cooper, R.G. 1994. Perspective third-generation new product processes. Journal of
Product Innovation Management 11 (1): 3–14.
Cooper, R.G., Edgett, S.J., and Kleinschmidt, E.J. 1997. Portfolio management in
new product development: Lessons from the Leaders-II. Research-Technology
Management 40 (6): 43.
Cooper, R.G., Edgett, S., and Kleinschmidt, E. 2001. Portfolio management for new
product development: Results of an industry practices study. R&D Management
31 (4): 361–380.
Cooper, R.G., Edgett, S.J., and Kleinschmidt, E.J. 2004. Benchmarking best NPD
practices-1. Research-Technology Management 47 (1): 31–43.
Crawford, C.M. and Di Benedetto, C.A. 2008. New Products Management. Tata
McGraw-Hill Education, New York, NY.
Duarte, B.P. and Reis, A. 2006. Developing a projects evaluation system based
on multiple attribute value theory. Computers & Operations Research 33 (5):
1488–1504.
Eilat, H., Golany, B., and Shtub, A. 2006. Constructing and evaluating balanced port-
folios of R&D projects with interactions: A DEA based methodology. European
Journal of Operational Research 172 (3): 1018–1039.
Eilat, H., Golany, B., and Shtub, A. 2008. R&D project evaluation: An integrated DEA
and balanced scorecard approach. Omega 36 (5): 895–912.
Feyzioğlu, O. and Büyüközkan, G. 2006. Evaluation of new product development
projects using artificial intelligence and fuzzy logic. International Conference on
Knowledge Mining and Computer Science 11: 183–189.
Fitzsimmons, J.A., Kouvelis, P., and Mallick, D.N. 1991. Design strategy and its inter-
face with manufacturing and marketing: A conceptual framework. Journal of
Operations Management 10 (3): 398–415.
Gates, W. 1999. Business @ the Speed of Thought. Warner Books, New York, NY.
Ghasemzadeh, F. and Archer, N.P. 2000. Project portfolio selection through decision
support. Decision Support Systems 29 (1): 73–88.
Graves, S.B., Ringuest, J.L., and Case, R.H. 2000. Formulating optimal R&D
portfolios. Research-Technology Management 43 (3): 47–51.
Griffin, A. and Hauser, J.R. 1996. Integrating R&D and marketing: A review and anal-
ysis of the literature. Journal of Product Innovation Management 13 (3): 191–215.
Gutjahr, W.J., Katzensteiner, S., Reiter, P., Stummer, C., and Denk, M. 2010. Multi-
objective decision analysis for competence-oriented project portfolio selection.
European Journal of Operational Research 205 (3): 670–679.
Halouani, N., Chabchoub, H., and Martel, J.-M. 2009. Promethee-md-2t method for
project selection. European Journal of Operational Research 195 (3): 841–849.
Ibbs, C.W., Reginato, J., and Kwak, Y.H. 2004. Developing project management capa-
bility: Benchmarking, maturity, modeling, gap analyses and ROI studies.
In: The Wiley Guide to Managing Projects, John Wiley & Sons, Inc., Hoboken, NJ,
pp. 1214–1233.
Kahraman, C., Büyüközkan, G., and Ateş, N.Y. 2007. A two phase multi-attribute
decision-making approach for new product introduction. Information Sciences
177 (7): 1567–1582.
Kaplan, R.S. and Norton, D.P. 1992. The balanced scorecard—Measures that drive
performance. Harvard Business Review 70 (1): 71–79.
Kaplan, R.S. and Norton, D.P. 2001. Transforming the balanced scorecard from per-
formance measurement to strategic management: Part I. Accounting Horizons 15
(1): 87–104.
Kiranmayi, P. and Mathirajan, M. 2013. A theoretical framework for project evalu-
ation and selection in new product management. Proceedings of International
Conference on Sustainable Innovation and Successful Product Development for a
Turbulent Global Market, Indian Institute of Technology-Madras, Chennai, India.
Krishnan, V. and Ulrich, K.T. 2001. Product development decisions: A review of the
literature. Management Science 47 (1): 1–21.
Kumar, D.U., Saranga, H., Ramírez-Márquez, J.E., and Nowicki, D. 2007. Six sigma
project selection using data envelopment analysis. The TQM Magazine 19 (5):
419–441.
Kyparisis, G.J., Gupta, S. K., and Ip, C.-M. 1996. Project selection with discounted
returns and multiple constraints. European Journal of Operational Research
94 (1): 87–96.
Lee J.W. and Kim H.S. 2000. Using analytic network process and goal program-
ming for interdependent information system project selection. Computers &
Operations Research 27: 367–382.
Liao, S.-H. 2005. Expert system methodologies and applications—A decade review
from 1995 to 2004. Expert Systems with Applications 28 (1): 93–103.
Loch, C.H. and Kavadias, S. 2002. Dynamic portfolio selection of NPD programs
using marginal returns. Management Science 48 (10): 1227–1241.
Lockett, G. and Stratford, M. 1987. Ranking of research projects: Experiments with
two methods. Omega 15 (5): 395–400.
Mahmoodzadeh, S., Shahrabi, J., Pariazar, M., and Zaeri, M.S. 2007. Project selec-
tion by using fuzzy AHP and TOPSIS technique. World Academy of Science,
Engineering and Technology 30: 333–338.
Mavrotas, G., Diakoulaki, D., and Caloghirou, Y. 2006. Project prioritization under
policy restrictions: A combination of MCDA with 0–1 programming. European
Journal of Operational Research 171 (1): 296–308.
McNally, R.C., Durmusoglu, S.S., Calantone, R.J., and Harmancioglu, N. 2009.
Exploring new product portfolio management decisions: The role of managers’
dispositional traits. Industrial Marketing Management 38 (1): 127–143.
Melachrinoudis, E. and Kozanidis, G. 2002. A mixed integer knapsack model for
allocating funds to highway safety improvements. Transportation Research Part
A: Policy and Practice 36 (9): 789–803.
Metaxiotis, K. and Liagkouras, K. 2012. Multi objective evolutionary algorithms for
portfolio management: A comprehensive literature review. Expert Systems with
Applications 39 (14): 11685–11698.
Mohanty, R.P., Agarwal, R., Choudhury, A.K., and Tiwari, M.K. 2005. A fuzzy ANP-
based approach to R&D project selection: A case study. International Journal of
Production Research 43 (24): 5199–5216.
Oh, J., Yang, J., and Lee, S. 2012. Managing uncertainty to improve decision-making
in NPD portfolio management with a fuzzy expert system. Expert Systems with
Applications 39 (10): 9868–9885.
Osawa, Y. and Murakami, M. 2002. Development and application of a new methodol-
ogy of evaluating industrial R&D projects. R&D Management 32 (1): 79–85.
Ozer, M. 1999. A survey of new product evaluation models. Journal of Product
Innovation Management 16 (1): 77–94.
Ozer, M. 2005. Factors which influence decision making in new product evaluation.
European Journal of Operational Research 163 (3): 784–801.
Patah, L.A. and De Carvalho, M.M. 2007. Measuring the value of project manage-
ment. PICMET 2007 Proceedings, Portland, OR, pp. 2038–2042.
Remer, D.S., Stokdyk, S.B., and Van Driel, M. 1993. Survey of project evaluation
techniques currently used in industry. International Journal of Production
Economics 32: 103–115.
Roll, Y. and Golany, B. 1993. Alternate methods of treating factor weights in DEA.
Omega 21 (1): 99–109.
Ronkainen, I.A. 1985. Criteria changes across product development stages. Industrial
Marketing Management 14 (3): 171–178.
Saaty, T.L. 1977. A scaling method for priorities in hierarchical structures. Journal of
Mathematical Psychology 15(3): 234–281.
Schaffer, J.D. 1985. Some Experiments in Machine Learning Using Vector Evaluated Genetic
Algorithms. Vanderbilt Univ., Nashville, TN.
Seiford, L.M. 1996. Data envelopment analysis: The evolution of the state of the art
(1978–1995). Journal of Productivity Analysis 7: 99–137.
Thieme, R.J., Song, M., and Calantone, R.J. 2000. Artificial neural network decision
support systems for new product development project selection. Journal of
Marketing Research 37 (4): 499–507.
Ulrich, K.T. and Eppinger, S.D. 2004. Product Design and Development, 3rd edn.
McGraw Hill, NY.
Wang, J. and Hwang, W.-L. 2007. A fuzzy set approach for R&D portfolio selection
using a real options valuation model. Omega 35 (3): 247–257.
Wey, W.-M. and Wu, K.-Y. 2007. Using ANP priorities with goal programming in
resource allocation in transportation. Mathematical and Computer Modelling 46
(7): 985–1000.
Yahaya, S.-Y. and Abu-Bakar, N. 2007. New product development management issues
and decision-making approaches. Management Decision 45 (7): 1123–1142.
Yurdakul, M. 2003. Measuring long-term performance of a manufacturing firm using
analytic network process (ANP) approach. International Journal of Production
Research 41 (11): 2501–2529.
Index

Value
  contribution perspective, 336
  function, 35
  theory, 35
  trade-off, 51
Variable returns to scale (VRS), 306
Variables, 253–254
Variance, 137
Variety in big data, 75
Vector comparison, 51
Velocity in big data, 75
Veracity in big data, 75
Vertical scalability, 75
Very short-term forecasting, 157
Vise Kriterijumska Optimizacija I Kompromisno Resenje (VIKOR), 285
Volume in big data, 74–75
VRS, see Variable returns to scale

Warmth, 24
Weather data, 124–125
  sample data table for historical weather, 126
Weighted objective problem, 31
Work control document (WCD), 5, 6
Worker type requirement, 131
Workforce capacity, restriction on, 131
Worst-case model error, 139–140

XtremeFS, 79

Yet Another Resource Negotiator (YARN), 81