Predictive Analytics
Chapter 1
INTRODUCTION
In recent years, Big Data Analytics has emerged as an important area of
interest among practitioners and academicians. The exponential growth of digital devices
and the penetration of the internet, tablet computers and smart phones are generating
large volumes of data round the clock. Unlike traditional data, Big Data comes from a
variety of sources in different forms. The volume, variety and velocity of this data pose
unique challenges for those managing data centers. Nevertheless, computing, storage and
analysis capabilities have caught up to meet these challenges, and storing large datasets
has become easy and economical.
Along with traditional business data, firms are realizing value from social media
data obtained from sites such as Twitter or Facebook. These media have shown
potential for gathering the business intelligence required to design competitive strategies.
In this paper, we describe different ways in which firms can derive intelligence
that helps business managers make informed decisions, which can translate into
improved ROI for the business. This paper provides conceptual underpinnings of Big
Data and Predictive Analytics, applications of Big Data Analytics, challenges and
opportunities, and directions for further research. This field has great potential to address
future challenges for business and society. It provides certain unique advantages
compared to statistical sampling methods.
The paper is organized as follows. The next section (section 2) reviews the extant
literature on Big Data, sources of data and Predictive Analytics. The third section
presents concepts of Big Data Analytics and delves further into Predictive Analytics.
The fourth section discusses the opportunities and challenges in dealing with Big Data
and Predictive Analytics. The fifth section presents the conclusion, followed by future
research opportunities in this field.
1.1 RESEARCH OBJECTIVES
Driven by the need to further explore the role of big data and Predictive
Analytics, this paper acts to bridge the knowledge gap by achieving the following
objectives:
a) To explore the existing literature on the fundamental concepts of Big Data and
Predictive Analytics
b) To clarify the evolution and definitions of Big Data and Predictive Analytics
c) To explore the upcoming opportunities and challenges of Big Data and Predictive
Analytics
d) To identify gaps in existing research and identify further research directions on the role
of Big Data and Predictive Analytics
Figure 1: Year-wise classification of research papers on Big Data and Predictive Analytics
Chapter 2
LITERATURE REVIEW
In this section, we discuss the process of shortlisting research papers and
present a critical review of the literature on Big Data and Analytics. Our
findings from the literature review are presented below.
Over the last several decades, information systems and the internet have been major
enablers of globalization. From the initial use of information systems for scientific
applications and departmental information systems, we have reached an era of “Smart
Phones” and the “Internet of Things”. The prime purpose of early applications of
information systems was record keeping and efficient processing of business transactions.
Since then, several breakthroughs in computer science and engineering have led to an
information revolution in the last few decades. Chen et al. (2012) summarize this in 3
distinct phases, as shown in table 1 below.
Phase                 Description
I – Till year 1999    Database systems to collect, analyze and report structured data
                      in RDBMS systems
II – 2000-2010        Wide use of Internet, entry and growth of internet firms Yahoo,
                      Google, Amazon etc., web based business applications, ecommerce,
                      supply chains
III – 2010 onwards    Entry of smart phones, RFID, Sensor technologies, Internet of
                      Things
Mayank Gupta – 1CR14IS055 Dept. of ISE – Fab – May 2019 Page 4
BIG DATA AND PREDICTIVE ANALYTICS
The era of Big Data seems to have started around the year 2000. Several developments
and trends led to the evolution of Big Data, as depicted in figure 3 below. Big Data and
Analytics are the natural outcome of this evolution process, which includes
advancements in computing hardware, digital storage capabilities, high speed software
solutions, internet and mobile technologies.
Compared with traditional data, Big Data differs not only in size but also
in form. It is added to continuously, unlike the relatively static data in legacy or
ERP systems (Davenport, 2014).
All the diverse sources generate different forms of data which can be broadly
classified as Structured, Un-structured and Semi-structured data (Figure 4).
Structured Data: Sources of structured data are organizational information systems such
as point of sales data, batch processes, ERP systems, and extended enterprise systems such
as SCM and CRM systems. This data is organized into well-defined table structures in a
relational database. Traditional RDBMS systems use ETL tools and processes to extract,
transform and load the data into a data warehouse.
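The extract, transform and load steps mentioned above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the point-of-sale records, the column names, the `sales` table and all function names are hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical point-of-sale export; the columns are assumptions for illustration.
RAW_POS_CSV = """store,sku,qty,unit_price
S1,A100,2,9.50
S1,B200,1,20.00
S2,A100,5,9.50
"""


def extract(raw_text):
    """Extract: read the raw CSV rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: cast the text fields to numbers and derive the sale amount."""
    return [
        (r["store"], r["sku"], int(r["qty"]), int(r["qty"]) * float(r["unit_price"]))
        for r in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, sku TEXT, qty INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl():
    """Run the three steps against an in-memory database (a stand-in warehouse)."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_POS_CSV)))
    return conn


def revenue_by_store(conn):
    """Query the loaded table, as a reporting tool might."""
    query = "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    return dict(conn.execute(query).fetchall())
```

Calling `revenue_by_store(run_etl())` returns the total sales amount per store. Production ETL tools add scheduling, validation and error handling that this sketch leaves out.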
Unstructured Data: Unstructured data originates from a variety of sources such as social
media, text messages, emails, attachments, videos, images and sound files. In terms of
volume and velocity it is huge and accounts for over 80% of large datasets. Analysis
and mining of this data is more challenging than for structured data.
Semi-structured data: This originates from a variety of sources and is a mix of
structured and unstructured data. Various information systems are in use for speed,
efficiency and accuracy of information exchange with stakeholders. Firms use emails for
communication, RFID technology for faster processing in logistics (Deng et al., 2010) or
sensor devices for tracking objects. Thus there are several sources of semi-structured
data: emails, XML documents, server logs, communication logs from RFID tags, GPS
devices, etc. Some fields, such as tags with IP address, date and time stamp, and user
information, are structured. Others, such as error messages, SQL statements and event
logs, are in unstructured text formats. Hadoop, HDFS (Hadoop Distributed File System)
and MapReduce provide a technological framework to process large volumes of
unstructured data.
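To make the mix of structured and unstructured fields concrete, the sketch below parses a hypothetical server log and then counts events per user in the map and reduce style. The log layout, the sample lines and the function names are assumptions made for illustration. The code runs in a single process and only imitates the MapReduce programming model; it does not use Hadoop.

```python
import re
from collections import defaultdict

# Hypothetical log layout: "<ip> [<timestamp>] <user> <free-text message>".
LOG_PATTERN = re.compile(
    r"^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"(?P<user>\S+) "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into structured fields plus an unstructured message."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def map_phase(records):
    """Map: emit one (key, 1) pair per record, keyed by user."""
    for record in records:
        yield record["user"], 1


def reduce_phase(pairs):
    """Reduce: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


SAMPLE_LOG = [
    "10.0.0.1 [2019-02-01 10:00:01] alice ERROR timeout while running SELECT * FROM orders",
    "10.0.0.2 [2019-02-01 10:00:05] bob INFO login succeeded",
    "10.0.0.1 [2019-02-01 10:01:10] alice WARN retrying connection",
    "malformed line without the expected fields",
]

# Lines that do not fit the assumed layout are dropped before counting.
parsed = [rec for rec in map(parse_line, SAMPLE_LOG) if rec is not None]
events_per_user = reduce_phase(map_phase(parsed))
```

The IP address, timestamp and user come out as named fields, while the message stays free text that would need further text mining. In a real cluster the map and reduce steps run in parallel over data blocks stored in HDFS.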
Volume: The volume of Big Data is very large: terabytes or petabytes of data are collected
in the span of a few hours in business or social media databases. The amount of data
is doubling every 40 months (Davenport, 2014). The number of mobile devices is increasing
at unprecedented rates. John Chambers of CISCO predicts that there will be over 40
billion wireless devices connected to the internet in another 5 years (Embry, 2015). John
Sculley, a well-known business leader and ex-CEO of Apple, foresees four exponential
technologies converging at high speed to create the next generation of the digital age,
namely cloud computing, internet of things, Big Data and mobiles (Embry, 2015). Convergence
of these 4 key technologies will lead to ever higher volumes of data at exponential rates.
customers visiting every week, generating revenue of more than 1300 million dollars.
These sales transactions lead to huge data trail across their supply chain. Social media is
even faster in terms of data generation.
Variety: Big Data originates from a variety of sources: enterprise systems (such
as ERP), social media, as well as many other digital devices. The list includes text, video,
audio, location, date and time data, emails, sensors, RFID data, web applications, etc.
Data is in different structured or unstructured formats depending on the source.
Veracity: Veracity concerns the accuracy of data, that is, understanding what percentage
of the data is accurate.
Value: Finally, businesses need to learn how to design models that improve
business outcomes and derive “value” from Big Data. Obtaining value from large
heterogeneous data leads to the success of any industry (Weber et al., 2014).
A. Arrival of internet accelerated the process of globalization and growth of global firms
as communication anywhere in the world is quick, economical and easy. Business
C. With Big Data, data flow is continuous and comes from a variety of
sources. There is no fixed source or structure to the data. Facebook records billions of
posts and likes and millions of photo uploads every hour (Kitchin, 2014).
D. The volume, variety and velocity of data collection have far outstripped the capacity of
manual analysis. In some cases they have even exceeded the capacity of conventional
databases. Analyzing such huge volumes of data requires a specialized technological
framework such as Hadoop, which is used by technology leaders such as Microsoft,
IBM and Oracle for managing Big Data (Chen et al., 2012).
                          Traditional data                  Big Data
Data Volume               Based on business volumes and     Very high, in petabytes and even more
                          extent of digitization
Variety of Data Sources   Data source from database         Besides data from business information
                          systems                           systems, text (emails, documents),
                                                            weblogs, sensors, RFID, etc.
Analytics                 Provide historical view,          Real-time, direct feedback from the
                          status reports                    consumer, sentiment analysis, opinions
The Pareto chart derived from the classification of key literature on Big Data Analytics
(BDA) (Figure 2) indicates that both industry and academic scholars have conducted
studies to tap the potential of BDA. There is no dearth of general articles explaining the
relevance, significance, challenges and opportunities of BDA.
Recent studies have investigated ways in which supply chain managers can
mine and derive value from BDA on a mix of structured and unstructured data (Zhong et
al., 2015; Kitchin, 2014; Chae, 2015; Tan et al., 2015; Schoenherr and Speier‐Pero, 2015;
Hahn and Packowski, 2015; Sahay and Ranjan, 2008; Nair, 2012) or how social media
data can provide competitive intelligence or play a role in brand promotion strategy (Kim
et al., 2016; He and Xu, 2016; Coursaris et al., 2016; Borra and Rieder, 2014; Bell, 2012).
Besides, analytics studies have been conducted in the domains of HR (Lawler et al.,
2004), World Class Sustainable Manufacturing (Dubey et al., 2015), Process Analytics
(Vera-Baquero et al., 2015), Product Lifecycle Management (Li et al., 2015) and Cloud
Computing (Hashem et al., 2015). However, no study attempts to
understand the role of Big Data and Predictive Analytics and how it helps to add
value across different sectors. In this paper we aim to address this gap through the
literature and to discuss the associated challenges. Identifying this gap has also helped
us outline future directions in this field.
Chapter 3
Big Data Analytics has its roots in the earlier data analysis methodologies using
statistical techniques such as regression, factor analysis, etc. It includes data mining from
high speed data streams and sensor data to get real time analytics (Chen et al., 2012). It is
an interdisciplinary field which uses knowledge of computer science, data science,
statistics and mathematical models. It consists of a systematic process of capturing and
analyzing business data, developing a statistical model either to explain the phenomenon
(Descriptive Analytics), developing a model to predict future outcomes based on variable
inputs (Predictive Analytics) or developing a model to optimize or simulate outcomes
based on variations in inputs (Prescriptive Analytics). It leverages statistical techniques
such as regression, factor analysis, multivariate statistics and knowledge of mathematics
for developing equations (Dubey and Gunasekaran 2015).
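The distinction between the three kinds of analytics described above can be illustrated with a small regression example. This is a minimal sketch under stated assumptions: the advertising and sales figures are made up, the function names are hypothetical, and a one-variable least squares line stands in for the richer statistical models the text refers to.

```python
from statistics import mean

# Hypothetical monthly data: advertising spend (x) and units sold (y).
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
units_sold = [25.0, 45.0, 65.0, 85.0, 105.0]


def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(x, y):
    """Predictive analytics: least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    covariance = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    variance = sum((xi - x_bar) ** 2 for xi in x)
    slope = covariance / variance
    return y_bar - slope * x_bar, slope


def predict(model, x_new):
    """Use the fitted line to estimate the outcome for a new input."""
    intercept, slope = model
    return intercept + slope * x_new


def best_spend(model, candidates, unit_margin):
    """Prescriptive analytics: pick the input with the highest predicted profit."""
    return max(candidates, key=lambda s: unit_margin * predict(model, s) - s)


model = fit_line(ad_spend, units_sold)
forecast = predict(model, 60.0)
```

Here `describe` reports on past sales, `predict` estimates sales for a spend level not yet observed, and `best_spend` compares candidate decisions using the model's predictions. Because the toy model is linear, the prescriptive step simply picks the largest or smallest candidate depending on the margin; realistic prescriptive models use constraints and non-linear responses.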
Levalle et al. (2010) conducted an exploratory study on big data analytics and the
path from insights to value. They reported that, with improving technology, an
enormous amount of big data has been collected, and researchers are still looking for
better ways to analyze this data to reach valuable information. In the
present era, the main concern is no longer only what happened or why it
happened, commonly known as Descriptive Analytics. The concern is also
what is happening at present and what is likely to
happen in the future, commonly known as Predictive Analytics, and what actions should be
taken to obtain optimal results, known as Prescriptive Analytics. Therefore
business analytics can be classified into Descriptive, Predictive and Prescriptive Analytics
as explained in figure 6 below. We elaborate on Predictive Analytics in
the next section, considering its significance for various stakeholders in society and
business.
source and nature of different data, there are various analytics methods which support
data mining and statistical analysis techniques.
Text Analytics techniques derive real-time and meaningful information from
unstructured data sources such as documents, emails, web pages and social media. They
are being pursued in emerging areas such as sentiment analysis, opinion mining
and information extraction from text sources (Chen et al., 2012). In recent years,
sentiment analysis of social media data soon after a product launch has provided early
indicators of consumer feedback about the product.
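A very simple form of sentiment analysis scores each post against lists of positive and negative words. The sketch below is only illustrative: the word lists, the sample posts and the function names are assumptions, and practical systems rely on much larger curated lexicons or trained models.

```python
import re
from collections import Counter

# Tiny hypothetical lexicon, chosen only for this illustration.
POSITIVE = {"love", "great", "excellent", "good", "amazing"}
NEGATIVE = {"hate", "bad", "terrible", "poor", "broken"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count positive words minus negative words in one post."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(posts):
    """Aggregate labels over many posts, e.g. after a product launch."""
    return Counter(label(post) for post in posts)


SAMPLE_POSTS = [
    "I love the new phone, great camera!",
    "Terrible battery and a broken screen",
    "It arrived on Tuesday",
]
launch_summary = summarize(SAMPLE_POSTS)
```

A word-count approach like this ignores negation, sarcasm and context, which is one reason analysis of unstructured text is harder than analysis of structured data.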
As the data on social media is growing and contains valuable information for
business firms, governments and NGOs, it is being tapped to derive value. Because of
its large volume, continuous flow and variety, it requires a different process of data
collection and analysis to arrive at meaningful information. We discuss this
in detail in the next sub-section.
Business intelligence obtained from social media can enable business analysts and
decision makers to develop market insights into consumer behavior, discover new
marketing ideas, improve customer satisfaction and finally improve ROI (Kim et al.,
2016).
In the current competitive environment, firms use business data and external
information to support tactical and strategic decisions. The ability of a firm to make quick
and informed decisions differentiates it from competitors in highly competitive
markets (Bose, 2008). As described earlier, predictive analytics and
social media analytics provide an opportunity to get first hand market intelligence.
Consumers provide instant feedback about products, services or movies on social
media. This is a valuable source for firms to gather information about consumer
sentiments and opinions. Several organizations are tapping the value of Big Data
to improve customer satisfaction, track customer journeys to analyze customer
attrition or purchase decisions, identify supply chain risks, gather competitive intelligence
or make pricing decisions (Davenport, 2014). There are numerous case studies of
effective use of Big Data Analytics, as summarized below:
a) With the help of historical sales data from hurricane affected regions, Walmart’s CIO
could predict a higher level of demand for certain products just ahead of hurricane
Frances. MegaTelCo could predict customer churn and design strategies to minimize
it (Provost and Fawcett, 2013).
b) Film studios have seen staggering accuracy in the way tweets from the first showing of
new films predict the success of the films as well as of their DVDs.
c) When dealing with large volumes of data in millions or billions of records, Big Data
Analytics can help to discover patterns and problems such as new forms of customer churn,
business opportunities such as new customer segments and sales prospects, and to
understand customer behavior through clickstreams (Russom, 2011).
d) Big Data is already being used by businesses for developing market intelligence, by
governments for designing policies, by politicians for designing political campaigns and by
medical practitioners for smart health management. Some of the emerging research
areas in this field are Big Data Analytics, Text Analytics, Network Analytics and
Mobile Analytics (Chen et al., 2012).
Chapter 4
4.1 OPPORTUNITIES
A huge volume of data used to be a technological problem just a few years ago; now it
presents an opportunity (Russom, 2011). Big Data provides many opportunities and
competitive advantages. Early mover Amazon.com started collecting customer
information, preferences, purchase history, search history and book reviews. Based on
this data, it provides product recommendations which motivate customers to buy similar
or related products, improving the chances of additional purchases from the same customer.
Next generation retailers will be able to track the behavior of individual customers and
develop models for prediction or for identifying influencers. Walmart makes use of
“Social Genome”, which tracks connections between people, products, brands and other
related entities. Social Genome is used to make product recommendations to customers
when they are online or in store (Direction S., 2012). Big Data provides several advantages
over traditional methods of data collection such as drawing samples.
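Recommendations based on purchase history, as described above, can be sketched by counting which products are bought by the same customers. This is a minimal illustration with made-up baskets and hypothetical function names. It is not a description of how Amazon or Walmart actually compute recommendations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: one set of product IDs per customer.
BASKETS = [
    {"book_a", "book_b", "lamp"},
    {"book_a", "book_b"},
    {"book_a", "lamp"},
    {"book_b", "pen"},
]


def build_co_purchase_counts(baskets):
    """Count how often each ordered pair of products shares a customer."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(product, counts, top_n=2):
    """Rank other products by how often they were bought alongside `product`."""
    related = [(other, n) for (item, other), n in counts.items() if item == product]
    # Sort by descending count, breaking ties alphabetically for stable output.
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:top_n]]


co_counts = build_co_purchase_counts(BASKETS)
suggestions = recommend("book_a", co_counts)
```

With the full purchase history available, the counts cover every customer rather than a sample, which is the kind of advantage over sampling that the paragraph above points to. Production recommenders also weigh search history, ratings and recency.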
problems. Natural disasters have the potential to disrupt crucial supply chain links.
The impact is especially high for countries like India, as the logistics system in India
is fragmented, infrastructure is inadequate and it consists of many small to large players
(Rai et al., 2015; Bag and Anand, 2015). Data collected on geographical locations,
weather and developing natural disasters (storms, floods, earthquakes, etc.) has
direct application in real-time monitoring of supply chain risks, both proactively, to
prevent disruptions, and reactively, to investigate past events (Yin et al., 2016). The
Government of Singapore uses geolocation data from mobile phones to manage traffic
during rush hours. This information is used to predict real-time demand for transport
services during rush hours and to divert taxis to those areas of the city. Citizens get
real-time updates about traffic and weather conditions through social media and revise
their travel plans accordingly. Netizens provide real-time updates about a variety of
events through social media. Ecommerce companies such as Amazon and Flipkart make
product recommendations based on earlier purchases and search history, which leads to
sales of additional products.
large population size to gather detailed information and statistical analysis is conducted
on the sample. In case of big data, large datasets require sophisticated statistical and
computational methods for analysis (Fan et al., 2014). Hazen et al. (2014) acknowledge
the quality issues for Big Data in Supply Chain Management and suggest interdisciplinary
research to address data quality problems in the context of SCM and DPB.
4.2.3 Reliability
Another challenge is the reliability of data. Much unstructured data is
unreliable and prone to outages and losses. The data comes from different sources such
as social media, smart phones, emails or text messages (Boyd et al., 2012). In a
manufacturing environment, data comes from heterogeneous sources such as information
systems and a variety of sensors (Le and Pang, 2013). This makes the data mining process
quite intense, as it requires mining through a large volume of unrelated data to arrive at
a small piece of relevant and meaningful information. The process can be compared with
finding a needle in a haystack.
of data. Researchers must be able to account for the biases in their interpretation of data
(Behar and Gordon, 1996).
Chapter 5
Analytics strategy. Big Data poses several other challenges such as data life cycle
management, redundancy of data, analytical mechanisms, confidentiality of data, energy
management, cooperation and data representation (Chen et al., 2014). However, with
strong leadership and willingness these can be overcome. Further, Big Data and
Predictive Analytics provide several opportunities for study, investigation and research
in different fields.
REFERENCES
[1] 3PLs Investing Heavily in Big Data Capabilities to Ensure Seamless Supply Chain
Integration. 2014.
[3] Abbott, D. (2014). Applied Predictive Analytics: Principles and Techniques for the
Professional Data Analyst. John Wiley & Sons.
[7] Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A., & Buyya, R. (2015).
Big Data computing and clouds: Trends and future directions. Journal of Parallel and
Distributed Computing, 79, 3-15.
[8] Bag, S., & Anand, N. (2015). Modelling barriers of sustainable supply chain
network design using interpretive structural modelling: an insight from food processing
sector in India.
[10] Batra, S. (2014). Big Data Analytics and its Reflections on DIKW Hierarchy. Review
of Management, 4(1/2), 5.
[11] Berg, W. F., Carlin, J. D., Kalmbach, M. T., & Schroeder, M. D. (2015). U.S. Patent
No. 8,989,067. Washington, DC: U.S. Patent and Trademark Office.
[12] Bharathi, S. V., & Mandal, T. (2015). Prioritising and ranking critical factors for
sustainable cloud ERP adoption in SMEs. International Journal of Automation and
Logistics, 1(3), 294-316.
[13] Borra, E., & Rieder, B. (2014). Programmed method: developing a toolset for
capturing and analyzing tweets. Aslib Journal of Information Management, 66(3), 262-
278.
[15] Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a
cultural, technological, and scholarly phenomenon. Information, Communication &
Society, 15(5), 662-679.