
Deep Learning in Business Analytics: A Clash of Expectations and Reality

Marc Andreas Schmitt a,b,1

a Department of Computer Science, University of Oxford, UK
b Department of Computer & Information Sciences, University of Strathclyde, UK

Abstract

Our fast-paced digital economy shaped by global competition requires increased data-driven
decision-making based on artificial intelligence (AI) and machine learning (ML). The benefits
of deep learning (DL) are manifold, but it comes with limitations that have – so far – interfered
with widespread industry adoption. This paper explains why DL – despite its popularity – has
difficulties speeding up its adoption within business analytics. It is shown – by a mixture of
content analysis and empirical study – that the adoption of deep learning is not only affected
by computational complexity, lacking big data architecture, lack of transparency (black-box),
and skill shortage, but also by the fact that DL does not outperform traditional ML models in
the case of structured datasets with fixed-length feature vectors. Deep learning should be
regarded as a powerful addition to the existing body of ML models instead of a “one size fits
all” solution.

Keywords: Deep Learning; Machine Learning; Business Analytics; Artificial Intelligence; Digital Transformation; Digital Strategy

1 Introduction

1.1 Business Analytics

Companies operating in today’s world have to deal with global competition in an ultra-fast
marketplace (Davenport, 2018), and AI-enabled information management (Borges, Laurindo,
Spínola, Gonçalves, & Mattos, 2021; Collins, Dennehy, Conboy, & Mikalef, 2021; Duan,
Edwards, & Dwivedi, 2019; Verma, Sharma, Deb, & Maitra, 2021) is the key to navigating the
digital storm of the 21st century. The last decade was shaped by huge improvements in data
storage and analytics capabilities (Baesens, Bapna, Marsden, Vanthienen, & Zhao, 2016;
Henke et al., 2016). What started as the big-data revolution brought us the age of constant digital change, accelerating globalization, and the continuous move toward a digital world economy (Davenport, 2018; Warner & Wäger, 2019).

1 Corresponding author. E-mail address: [email protected]

We have entered the second wave of digital transformation and the deployment of advanced
analytics in the form of machine learning has become a necessity to survive and thrive in this
new environment where competitive advantage is mainly based on evidence-based or data-
driven decision making (Henke et al., 2016; Schmitt, 2020).

The function responsible for converting raw data into valuable business insights is called
business analytics. It is an interdisciplinary field drawing and combining expertise from
machine learning, statistics, information systems, operations research, and management
science (Sharda, Delen, & Turban, 2017). Business analytics spans a chain of analytics capabilities, comprising descriptive, predictive, and prescriptive analytics (Delen & Ram, 2018). ML operates mainly in the predictive sphere of business intelligence but has started to incorporate prescriptive analytics as well (Bertsimas & Kallus, 2019).

1.2 Deep Learning

Amidst all this, a new paradigm called deep learning emerged out of earlier research on brain-inspired neural networks (LeCun, Bengio, & Hinton, 2015). DL is part of ML and is one of the major technologies responsible for driving the current digital revolution (Agrawal, Gans, & Goldfarb, 2019; Bughin et al., 2017). DL is capable of learning complex hierarchical representations of data. It has outperformed traditional methods and shows predictive capabilities that come close to or surpass human-level performance in different areas. The main reasons for the breakthrough of DL stem from developments in three different areas (Goodfellow, Bengio, & Courville, 2016): (1) optimization algorithms that allow the training of deep neural networks (Hinton, Osindero, & Teh, 2006); (2) the era of "big data," which increased the amount of large structured as well as unstructured datasets that are now ripe for harvesting (Chen, Chiang, & Storey, 2012); and (3) hardware improvements, especially GPUs, which made it possible to train these computationally demanding models on such huge datasets. Stadelmann et al. (2018) give a good summary of the current applications of DL across different domains.
When it comes to image recognition (Krizhevsky, Sutskever, & Hinton, 2012; Szegedy et al.,
2015), NLP (Devlin, Chang, Lee, & Toutanova, 2018), and games (Silver et al., 2017; Vinyals
et al., 2019), DL is the go-to solution. Accurate performance on unstructured high-dimensional datasets only became possible due to the advances of DL, which significantly extends the reach of machine learning (Jordan & Mitchell, 2015) to tackle further use cases and take over tasks that were initially reserved for humans (Agrawal et al., 2019).
1.3 Adoption Speed of DL

Most analytics departments across the corporate value chain have traditionally been using
predictive statistics and machine learning models such as GLMs, CART, and ensemble
learning. Those models are vital tools to help with several analytics tasks that directly impact
the bottom line of firms and organizations (Siebel, 2019).

We have moved from fundamental progress to the application of deep learning in various
sciences, businesses, and governments (Lee, 2018; Stadelmann et al., 2018). Despite the
huge success of DL, a closer investigation of the current literature reveals that the adoption
rate for DL in business functions for analytic purposes is quite low.

Chui et al. (2018) analyzed 100 use cases to assess the current deployment of AI/DL-related models across industries and business functions compared to other models referred to as traditional analytics. The result is that while the adoption of DL is starting to increase, most units continue to work with the older, more established analytical models that proved successful years ago. McKinsey (Chui et al., 2018) also distinguishes departments that have traditionally been using analytics from departments that are foreign to quantitative decision enablers. McKinsey draws a clear picture: the only areas where DL has been utilized so far are traditional analytics arms that have the natural capabilities and skillsets in place to work with modern AI, while technology-foreign departments are reluctant to adopt DL models. But even in business units with traditionally strong links to analytics – like risk management and insurance – the utilization of DL remains relatively low and traditional models are still the go-to solution.

The latest paper on the topic confirmed this observation: "While deep learning is on the way to becoming the industry standard for predictive analytics within business analytics and operations research, our discipline is still in its infancy with regard to adopting this technology" (Kraus et al., 2019). The authors analyzed several papers across the major journals and concluded that DL is not as prevalent within business analytics functions as the current hype and job descriptions would suggest.

The main issues why it is not so easy to develop and deploy DL – especially for small to
medium-sized corporations – can be partially mapped to the three reasons why DL found its
breakthrough in recent years. The “content analysis” of the existing literature identified the
following bottlenecks when it comes to the adoption of DL in business analytics functions:

(1) Computational Complexity: The hardware necessary to train and validate DL models on large datasets is tremendous, which makes infrastructure investments quite expensive. These investments stand in stark contrast to the uncertainty of whether the development and implementation of those models will materialize into a future value increase (Bughin et al., 2017).
(2) Infrastructure: Companies need to be able to harvest a continuous flow of
unstructured data to capture the value from DL, which is difficult if the necessary “big
data” infrastructure is not in place (Bughin et al., 2017).
(3) Transparency: Another reason is the nature of DL itself. DL is mainly a black box, which means it can predict correctly, but we lack a causal explanation of why it arrives at a certain decision (Samek & Müller, 2019). This makes it problematic for industries that are subject to regulatory supervision.
(4) Skill Shortage: Talent is required to implement those models as well as subject matter
expertise to define use cases (Henke et al., 2016). The current supply and demand gap
for ML experts makes it difficult for small- and medium-sized corporations to utilize
advanced AI.

Nevertheless, what many studies about the adoption of DL in business analytics seem to ignore is its general value contribution, which should come in the form of improved prediction accuracy. DL must make a business case for itself to justify its adoption, but this is not always given. Several standalone studies comparing the predictive ability of deep learning against traditional machine learning methods on structured datasets have concluded that DL does not outperform tree-based ensembles (Addo, Guegan, & Hassani, 2018; Hamori, Kawai, Kume, Murakami, & Watanabe, 2018). This stands in contrast to the claim that DL offers performance improvements across the board as indicated by Kraus et al. (2019) and also to the general assumption that DL needs to be adopted in every business function (Chui et al., 2018). While the success of DL for unstructured data problems such as image recognition and NLP is beyond doubt, the reality of DL for structured data within companies' business analytics functions is less clear and is the main focus of this article. Structured data with fixed-length feature vectors are ubiquitous in relational databases and standard business use cases.

1.4 Contributions

Comments such as "DL can be a simple replacement of traditional models" are too general and not always true. For structured data, tree-based ensembles such as gradient boosting seem to be at least on par with DL across different domains. In support of this claim, an empirical test using three case studies based on real-world data is presented. Concretely, this paper contributes to the current body of literature in the following ways: (1) DL is compared to traditional machine learning models such as GLMs, Random Forest, and Gradient Boosting based on three real-world use cases within the context of business analytics to verify the assumption that DL does not outperform traditional methods on structured datasets. (2) The bottlenecks of DL identified during the "content analysis" are discussed in light of the findings of the empirical study, including managerial recommendations. (3) Finally, a roadmap for future research possibilities for deep learning and business analytics is presented.

This article is structured as follows: Section 2 introduces the machine learning models used in this study – logistic regression, random forest, gradient boosting, and deep learning – followed by the experimental design, which includes an explanation of the datasets, preprocessing steps, and the software setup. In section 3, the numerical results from the three case studies based on real-world data/business problems are presented. All three case studies show that in the case of structured (tabular) data, DL does not have a performance advantage over the tree-based ensembles random forest and gradient boosting machine. Section 4 discusses the technical implications of these results, highlights managerial implications, and offers suggestions for future research, while section 5 concludes with a summary.

2 Methods and materials

2.1 Machine learning

This part gives an overview of predictive analytics and the ML models used in the experiment. The ML models used and compared in this experiment are Logistic Regression (LR), Random Forest (RF), Gradient Boosting Machine (GBM), and Deep Learning (DL). For a comprehensive treatment of the underlying theory, the reader is referred to Hastie, Tibshirani, and Friedman (2009) and Murphy (2012) for ML and to Goodfellow et al. (2016) for DL.

2.1.1 Logistic regression

Logistic Regression (LR) belongs to the big family of generalized linear models (GLMs). GLMs take a linear combination of features as input and link it to the output via a link function, where the output follows a probability distribution from the exponential family, such as the normal or the binomial distribution (Murphy, 2012). LR is the standard method for binary classification and is widely used in academia and industry. A linear combination of inputs and weights is calculated, and the result $w^T x$ is fed into the logistic or sigmoid function

$$\mathrm{sigm}(w^T x) = \frac{1}{1 + e^{-w^T x}} = \frac{e^{w^T x}}{e^{w^T x} + 1}. \tag{1}$$

The sigmoid function restricts the range of the output to the interval [0, 1].
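As a minimal illustration of equation (1), the following R sketch (weights and feature values are hypothetical, chosen only for demonstration) computes the class probability for a single observation:

```r
# Sigmoid from equation (1): squashes the linear score w'x into [0, 1].
sigmoid <- function(z) 1 / (1 + exp(-z))

w <- c(0.8, -1.2, 0.3)    # illustrative weights
x <- c(1.5, 0.7, 2.0)     # one feature vector
p <- sigmoid(sum(w * x))  # predicted probability of the positive class
p                         # ~0.72; classify as positive if p > 0.5
```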

2.1.2 Random forest

The recursive partitioning algorithm Random Forest (RF) is part of the family of ensemble methods and operates similarly to decision trees with bagging. Bagging (Breiman, 1996) randomly draws M different subsets from the training data with replacement and averages the resulting estimates. The random forest builds a separate decision tree on each subset and averages the results in the end to reduce the variance of the prediction model (Murphy, 2012). It is one of the most potent ML algorithms for classification and regression tasks.
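A hedged sketch of how such a random forest could be trained with H2O in R follows; the simulated two-feature frame and all parameter values are illustrative assumptions, not the configuration used in this study:

```r
library(h2o)
h2o.init()

# Simulated stand-in for a structured binary-classification dataset.
set.seed(1)
df <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
df$y <- factor(ifelse(df$x1 + 0.5 * df$x2 + rnorm(1000) > 0, "bad", "good"))
hf <- as.h2o(df)

# Random forest: many decision trees on bootstrap samples, predictions averaged.
rf <- h2o.randomForest(x = c("x1", "x2"), y = "y",
                       training_frame = hf, ntrees = 200, seed = 1)
```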

2.1.3 Gradient Boosting

Boosting is like bagging but builds models in a sequential order instead of averaging different
results. The idea of boosting is to start with a weak learner that gradually improves by
correcting the error of the previous model at each step. This process improves the performance
of the weak learner and moves gradually towards higher accuracy. The most common model
used for boosting is a decision tree. There are several different Gradient Boosting (GM)
implementations out there. This paper uses the gradient boosting version implemented by
(Malohlava & Candel, 2019) which is based on (Hastie, Tibshirani, & Friedman, 2017). Gradient
boosting is one of the strongest prediction models for structured data currently available.
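Under the same assumptions as the random forest sketch above (reusing the illustrative frame hf), a GBM could be trained in H2O as follows; the hyperparameter values are placeholders, not this study's tuned settings:

```r
# Gradient boosting: trees are added sequentially, each one correcting
# the errors of the current ensemble.
gbm <- h2o.gbm(x = c("x1", "x2"), y = "y", training_frame = hf,
               ntrees = 100, max_depth = 5, learn_rate = 0.1, seed = 1)
```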

2.1.4 Deep Learning

Deep Learning comes with many architectures, such as feed-forward artificial neural networks (ANNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). The best architecture for transactional (tabular) data that is not sequential – as in this study – is a multi-layer feed-forward artificial neural network. Other, more complex architectures such as RNNs do not possess any advantage in those cases (Candel & LeDell, 2019). The architectural graph of a feed-forward neural network can be seen in figure 1. The first column represents the input features and is called the input layer. The last single neuron represents the output, to which the final activation function is applied. The two layers in the middle are called hidden layers. If the neural network has more than one hidden layer, it is called a deep neural network. A deep learning model can consist of several hidden layers and is trained with stochastic gradient descent and backpropagation (Goodfellow et al., 2016).
Figure 1. The deep learning model used in this experiment is called a feed-forward artificial neural network, as the signal flow through the network evolves only in a forward direction. It is the most appropriate choice for problems based on structured datasets as used in this study. It contains one input layer, one output layer, and several hidden layers. At each node, a linear combination of input variables and weights is fed into an activation function to calculate a new set of values for the next layer.

A standard neural network operation consists of multiplying the input features by a weight matrix and applying a non-linearity (activation function). Input variables $x = (x_1, x_2, \ldots, x_n)$ are fed into the neural network, each input $x_i$ is multiplied by its weight $w_i$, and the linear combination $\sum_i x_i w_i = w^T x$ is calculated. This linear combination plus the bias term (intercept) serves as input for the activation function, which calculates the output $y$; this output serves either as input for the next layer or represents the final prediction. A neural network is trained with stochastic gradient descent and backpropagation.
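The forward pass described above can be sketched in a few lines of base R; the layer sizes and random weights are arbitrary illustrations, not a trained model:

```r
relu    <- function(z) pmax(0, z)          # hidden-layer activation
sigmoid <- function(z) 1 / (1 + exp(-z))   # output activation

set.seed(1)
x  <- c(1.5, 0.7, 2.0)                     # input layer: 3 features
W1 <- matrix(rnorm(4 * 3), nrow = 4)       # weights, hidden layer of 4 units
b1 <- rnorm(4)                             # bias terms (intercepts)
w2 <- rnorm(4)                             # weights, output neuron
b2 <- rnorm(1)

h     <- relu(W1 %*% x + b1)               # hidden activations
y_hat <- sigmoid(sum(w2 * h) + b2)         # predicted probability
```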

Applying a non-linearity in the form of an activation function is essential for neural networks to be able to learn complex (non-linear) representations of the input data. The activation function applies a nonlinear transformation to the output at each node.

This study will build two different DL classifiers using the following activation functions for the
hidden layers:

• the rectified linear unit (ReLU): $g(z) = \max(0, z) \in [0, \infty)$;

• the Maxout function: $g(z) = \max_k(w_k z + b_k) \in (-\infty, \infty)$, $k \in \{1, \ldots, K\}$.

As the scope of the research is binary classification of structured data, the output activation function used is the sigmoid function $\mathrm{sigm}(z) = \frac{1}{1 + e^{-z}} = \frac{e^z}{e^z + 1} \in [0, 1]$, in line with the binary cross-entropy loss function.
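A sketch of how the two DL classifiers could be specified with H2O follows (H2O names the ReLU activation "Rectifier"; the hidden-layer sizes and the frame hf from the earlier sketches are illustrative assumptions, not the tuned settings of this study):

```r
dl_relu <- h2o.deeplearning(x = c("x1", "x2"), y = "y",
                            training_frame = hf,
                            activation = "Rectifier",  # ReLU hidden units
                            hidden = c(64, 64),        # two hidden layers
                            seed = 1)

dl_maxout <- h2o.deeplearning(x = c("x1", "x2"), y = "y",
                              training_frame = hf,
                              activation = "Maxout",   # Maxout hidden units
                              hidden = c(64, 64),
                              seed = 1)
```

Because the response is a binary factor, H2O treats this as a classification task and applies the cross-entropy loss by default.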

2.2 Experimental Design

2.2.1 Data and Preprocessing

This experiment is based on three datasets. To facilitate reproducibility and comparability the
chosen data sets are all publicly available and can either be downloaded from the UCI machine
learning repository or from the public machine learning competition site “Kaggle”, which
regularly offers access to high-quality datasets for experimentation. See table 1 for an overview
of the case studies/datasets used in this study.

Table 1. Description of Datasets

| Business Area | Total | y=0 | y=1 | Balanced* | Features | Description |
|---|---|---|---|---|---|---|
| Credit Risk | 30,000 | 23,364 | 6,636 | 6,636/6,636 | 23 | Prediction whether a customer is going to default on their loan payment |
| Insurance Claims | 595,212 | 573,518 | 21,694 | 21,694/21,694 | 57 | Prediction whether a policyholder will initiate an auto insurance claim in the next year |
| Marketing/Sales | 45,211 | 39,922 | 5,289 | 5,289/5,289 | 16 | Prediction whether a targeted customer will open a deposit account after a direct marketing/sales effort |

*For the purpose of this study, random under-sampling was used to bring the datasets into a balanced state.

Credit Risk: The first dataset represents payment information from Taiwanese credit card clients. It consists of 30,000 observations, of which 23,364 are good cases and 6,636 are bad cases (flagged as defaults). Each observation contains 23 features including a binary response column for the default information of the credit cardholder. The features within the dataset contain mainly historical payment information, but also demographic information such as gender, age, marital status, and education.2

Insurance Claims: The second dataset represents information about automotive insurance policyholders. It consists of 595,212 observations, of which 573,518 are non-filed and 21,694 are filed claims. Each observation contains 57 features including a binary response column that indicates whether or not a particular policyholder has filed a claim.3

Marketing and Sales: The third dataset stems from a retail bank and represents customer information for a direct marketing campaign. It consists of 45,211 observations, of which 39,922 were unsuccessful and 5,289 were successful (resulting in a sale). Each observation contains 16 features including a binary response column indicating whether or not the person ended up opening a deposit account with the bank following the direct marketing effort.4

2 The "Credit Risk" dataset can be accessed here: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
3 The "Insurance Claims" dataset can be accessed here: https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/data
4 The "Marketing/Sales" dataset can be accessed here: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing

The experiment required several adjustments. All three datasets are highly unbalanced. For this study, random under-sampling was used to bring the good as well as the bad cases into a state of equilibrium, as can be seen in table 1. For example, if a classifier is trained on a highly unbalanced dataset with a ratio of 90:10, it is very easy for it to reach an accuracy of 90% by simply predicting the majority class in all cases. To counter this naturally occurring gravitation towards the majority class, resampling is used to better gauge the predictive ability of the classifiers. One drawback of under-sampling is a possible loss of information, but this can be neglected here, as the main purpose of the datasets is to benchmark the introduced ML classifiers.
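A minimal base-R sketch of random under-sampling under these assumptions (a data frame dat with a 0/1 response column y, where 1 is the minority class; names illustrative):

```r
set.seed(1)
minority <- dat[dat$y == 1, ]                  # keep all minority cases
majority <- dat[dat$y == 0, ]
keep     <- sample(nrow(majority), nrow(minority))
balanced <- rbind(minority, majority[keep, ])  # 50:50 class ratio
table(balanced$y)                              # verify the balance
```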

Before model construction can take place, several other common preprocessing steps have to be performed. A required procedure in ML preprocessing is to transform categorical values into a numerical representation. Especially the "Case Study 3 – Marketing and Sales" dataset contains predominantly categorical strings. Where necessary, categorical features were transformed into numerical indicator variables with a method called one-hot encoding. H2O has a parameter setting called one_hot_explicit, which creates N+1 new columns for categorical features with N levels.
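In H2O's R interface this corresponds to the categorical_encoding argument; a hedged sketch (the model choice and the predictors/train objects are illustrative placeholders):

```r
# Explicit one-hot encoding: N+1 indicator columns per N-level factor.
gbm_ohe <- h2o.gbm(x = predictors, y = "y", training_frame = train,
                   categorical_encoding = "OneHotExplicit", seed = 1)
```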

For this experimental study, all three datasets are separated into a training set and a test set with a proportion of 80:20. To tune the model parameters, the training set is further divided into different training and validation sets via cross-validation during the construction of the classifiers. Cross-validation is used to increase the generalization ability of the classifiers to unknown data and to avoid overfitting. This study uses 5-fold cross-validation.

Model tuning in ML is a highly empirical and iterative process and is essentially based on trial and error. The methods commonly used to help automate the model tuning process are grid search and random search. Grid search automatically trains several models with different parameter settings over a predefined range of parameters. Overall, this does not change the basic necessity of trying out different combinations of parameters that allow the classifier to adjust adequately to the underlying dataset. This study used a random search, selective grid search, and manual adjustments to arrive at the final parameter settings.

The four performance evaluation measures (Flach, 2019) used in this study are AUC,
Accuracy, F-score, and LogLoss.
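All four measures can be read off an H2O performance object; a minimal sketch, assuming a trained model and the test frame from the sketches above:

```r
perf <- h2o.performance(gbm_cv, newdata = test)  # out-of-sample evaluation
h2o.auc(perf)       # AUC
h2o.logloss(perf)   # LogLoss
h2o.F1(perf)        # F-score across classification thresholds
h2o.accuracy(perf)  # Accuracy across classification thresholds
```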

2.2.2 Software

Data preparation and handling are managed in RStudio, the integrated development environment (IDE) for the statistical programming language R (R Core Team, 2019). R is one of the go-to languages for data science research as well as prototyping in practice. The machine learning models in this paper are developed with H2O, an open-source machine learning platform written in Java that supports a wide range of predictive models (LeDell & Gill, 2019). This makes experimentation and research easier. The high abstraction level allows the idea and the data to become the central part of the problem and helps to reduce the effort required to reach a solution. H2O also has the advantage of speed, as it allows us to move from a desktop- or notebook-based environment to a large-scale environment. This increases performance and makes it easier to handle large datasets. R is connected to H2O by means of a REST API (Aiello et al., 2016).

3 Numerical Results

In this section, three different case studies: Credit risk, insurance claims, and marketing and
sales are presented to demonstrate that deep learning while being promoted as a superior ML
solution has difficulties beating traditional machine learning methods in some cases. Concrete,
logistic regression, random forest, gradient boosting machine, and two different deep learning
classifiers were trained on each dataset. The first DL model was built with the ReLU activation
function whereas the second DL model was built with the Maxout activation function. The ReLU
activation function is widely used and has shown to be superior in terms of accuracy and
computational speed. The Maxout activation function has been developed to improve
classification accuracy in combination with dropout (Goodfellow, Warde-Farley, Mirza,
Courville, & Bengio, 2013; Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014)
and is hence the second choice for this experiment. Several hyper-parameters were adjusted
during the model training process to improve the performance measured by the evaluation
metrics AUC, Accuracy, F-score, and LogLoss.
3.1 Case Study 1: Credit Risk

Table 2 presents the numerical results for the credit risk business case, where the goal is to accurately predict the default category of an applicant. The performance of deep learning is compared to traditional machine learning classifiers via the four evaluation metrics AUC, Accuracy, F-score, and LogLoss. The best performance is highlighted in bold.

Table 2. Numerical results for Case Study 1 - Credit Risk (out-of-sample performance)

| Method | AUC | Accuracy | F-score | LogLoss |
|---|---|---|---|---|
| Logistic Regression | 0.712 | 0.671 | 0.653 | 0.623 |
| Random Forest | 0.773 | 0.711 | 0.688 | **0.572** |
| Gradient Boosting Machine | **0.774** | **0.712** | **0.691** | **0.572** |
| Deep Learning + ReLU | 0.760 | 0.700 | 0.646 | 0.592 |
| Deep Learning + Maxout | 0.762 | 0.703 | 0.687 | 0.599 |

Table 2 clearly shows that GBM has the best overall performance with the highest AUC, Accuracy, and F-score of 0.774, 0.712, and 0.691 respectively, including a LogLoss of 0.572. RF comes in as a close second with an AUC of 0.773 and the same LogLoss as GBM of 0.572. Both ensemble models achieve a better performance on the credit risk dataset than the two DL models, which reach AUCs of 0.760 and 0.762 respectively. The DL + Maxout model has a slightly higher AUC than DL + ReLU, whereas the ordering is reversed for LogLoss, which results in a similar overall performance for the two DL models.

A graphical presentation of the results of each model sorted by the evaluation measure can be
found in figure 2. The best performing model GBM is highlighted via a callout text field, which
shows the performance of each evaluation metric.
Figure 2. Graphical representation of the performance of each classifier for all 4 performance
evaluation metrics for case study 1 - credit risk. Gradient Boosting Machine (GBM) achieves the
highest accuracy according to those results.

3.2 Case Study 2: Insurance Claims

In table 3 the numerical results for the insurance case study are presented. The goal is to accurately predict whether a policyholder is going to file an insurance claim within the next year. The performance of deep learning is compared to traditional machine learning classifiers via the four evaluation metrics AUC, Accuracy, F-score, and LogLoss. The best performance is highlighted in bold.
Table 3. Numerical results for Case Study 2 - Insurance Claims (out-of-sample performance)

| Method | AUC | Accuracy | F-score | LogLoss |
|---|---|---|---|---|
| Logistic Regression | 0.629 | 0.594 | 0.586 | 0.667 |
| Random Forest | 0.636 | 0.598 | 0.584 | 0.667 |
| Gradient Boosting Machine | **0.640** | **0.602** | **0.588** | **0.664** |
| Deep Learning + ReLU | 0.628 | 0.597 | 0.540 | 0.670 |
| Deep Learning + Maxout | 0.633 | 0.597 | 0.534 | 0.669 |

The results of table 3 are similar to the first case study. GBM is the clear winner in terms of performance with the highest AUC, Accuracy, and F-score of 0.640, 0.602, and 0.588 respectively, including the lowest LogLoss of 0.664. RF takes second place with an AUC of 0.636 and a LogLoss of 0.667. Both ensemble models achieve a better performance in the insurance case study than the two DL models. The DL + Maxout model with an AUC of 0.633 has a slightly higher AUC than the DL + ReLU model with an AUC of 0.628.
A graphical presentation of the results of each model sorted by the evaluation measure can be
found in figure 3. The best performing model (Gradient Boosting) is highlighted via a callout
text field.

Figure 3. Graphical representation of the performance of each classifier on all 4 performance


measures for case study 2 - insurance claims. Also, in the second case study, Gradient Boosting
Machine (GBM) achieves the highest prediction accuracy.

3.3 Case Study 3: Marketing and Sales

Table 4 shows the numerical results for the marketing and sales case study, where the goal is to accurately predict successful conversions following a direct marketing effort. The performance of deep learning is compared to traditional machine learning classifiers via the four evaluation metrics AUC, Accuracy, F-score, and LogLoss. The best performance is highlighted in bold.
Table 4. Numerical results for Case Study 3 - Marketing and Sales (out-of-sample performance)

| Method | AUC | Accuracy | F-score | LogLoss |
|---|---|---|---|---|
| Logistic Regression | 0.918 | 0.839 | 0.845 | 0.377 |
| Random Forest | **0.940** | **0.879** | **0.888** | 0.320 |
| Gradient Boosting Machine | **0.940** | 0.878 | 0.886 | **0.299** |
| Deep Learning + ReLU | 0.930 | 0.861 | 0.877 | 0.328 |
| Deep Learning + Maxout | 0.930 | 0.857 | 0.865 | 0.336 |

Based on table 4, the results for the third case study are slightly different from case studies one and two. GBM shares the maximum AUC of 0.940 with RF. The RF classifier also has a slightly higher Accuracy of 0.879 and a higher F-score of 0.888, while GBM still has the lowest LogLoss, which indicates the highest prediction reliability across the models. In line with previous results, both ensemble models achieve a better performance than the two DL models, which both have an AUC of 0.930. LR underperforms all other classifiers by a significant margin.

A graphical presentation of the results of each model clustered by the evaluation measure can be found in figure 4. It can be seen that GBM and RF perform better than the two DL models across all performance measures, while logistic regression turns out to be the weakest classifier.

Figure 4. Graphical representation of the performance of each classifier on all 4 performance measures for case study 3 – marketing and sales. Gradient Boosting Machine (GBM) is again the winner, but the results are less clear-cut than before and Random Forest (RF) achieves a very similar performance.

4 DL in Business Analytics: A Reality Check

4.1 Discussion of Results

To better understand the utility of deep learning for business analytics, it was benchmarked against traditional ML models such as GLMs, Random Forest, and Gradient Boosting Machine, based on the four evaluation measures AUC, Accuracy, F-score, and LogLoss. The empirical results of all three case studies presented (Credit Risk, Insurance Claims, Marketing and Sales) suggest that DL does not have a performance advantage for classification problems based on structured data. Instead, the results are strongly in favor of tree-based ensembles such as Random Forest and Gradient Boosting. GBM turns out to be the model with the highest utility for the type of problems analyzed in this study.
Kraus et al. (2019) benchmarked several baseline models against their proposed embedded DNN model, which resulted in superior performance for DL. The authors recommend fostering the adoption of DL models within the field of business analytics and operations research. While the paper of Kraus et al. (2019) is an excellent and insightful overview of DL for business analytics, the analysis does not include GBM as a baseline model in the comparison, although GBM is widely used and known to deliver strong and robust predictions on structured datasets.

Case study two in this paper uses the same dataset as Kraus et al. (2019), and according to the empirical results, GBM is at least on par with the deep architecture proposed by Kraus et al. (2019). Other studies by Hamori et al. (2018) and Addo et al. (2018) included tree-based ensembles such as gradient boosting and came to the same conclusions as this study. As the findings of this study are in line with several papers comparing the performance of DL against other ML models, there is strong evidence that tree-based methods (GBM as well as Random Forest) do indeed outperform DL models (different configurations have been tested) on most problems containing structured data. In addition, DL has several weaknesses, such as computational complexity, huge data requirements, and transparency issues, and it needs highly skilled labor, which makes it often difficult to develop and deploy those models at scale. Especially the computational complexity results in significantly longer training and validation times compared to all other ML models.

The results strongly suggest that GBM can be seen as the go-to model for most business analytics problems. It is fast, not too complex, and delivers the best performance currently available for use cases based on structured data. The results are clear, and business analytics experts should carefully consider the type, characteristics, and volume of the data at hand to make a final decision about the correct model choice.

4.2 Managerial Implications & Digital Strategy

It has been shown that data-driven or evidence-based decisions are superior to purely intuitive business decisions, and a comprehensive analytics strategy has become necessary for businesses across all industries to capture value at the bottom line. One of the challenges associated with becoming a digital enterprise is how exactly to leverage digital technologies, especially advanced analytics and AI. Current discussions about AI and digital strategy are strongly focused on the applications of DL, but this is not the best way to approach digital transformation. This focus has resulted in the problematic assumption that DL adoption in business can by itself be regarded as a benchmark – thereby ignoring the question of utility that always needs to be asked before the deployment of any new method or technology.

The fact that DL has not found its way into the different business functions as expected is often explained by computational complexity, lacking big-data infrastructure, the non-transparent nature of DL (black-box), and a shortage of skills. But as was demonstrated in this paper, an additional explanation for the lack of adoption in certain business analytics functions is that DL does not have performance advantages over traditional analytics when it comes to structured data use cases.

For example, many departments that have been utilizing advanced analytics, such as risk management, are perfectly capable of developing and deploying a DL model, as the required skillset is identical. The necessary infrastructure to leverage DL in these departments should also be in place. The usually described problems are therefore not the only reasons. The problem is that DL does not offer any advantage over certain tree-based ensembles for the data present in those departments. Moreover, the disadvantages regarding speed and transparency are still present, which makes it, in fact, unreasonable to use DL instead of traditional analytics. DL should be viewed as a valuable addition to the current body of ML that offers the possibility to create new use cases based on its strengths instead of forcefully replacing models that are equally powerful and can easily coexist within advanced analytics.

This realization triggers the second argument, which is related to the nature of the underlying dataset. The kind of data present in business analytics problems can largely be divided into three groups (Chen et al., 2012): (1) structured data from relational database management systems (DBMS), (2) unstructured data, which stems mainly from web-based activities (social media analytics, etc.), and (3) sensor- and mobile-based content, which is largely untouched when it comes to research activities. Many problems in business analytics are indeed based on structured datasets, and given that most business functions utilize exactly those kinds of data, it should not come as a surprise that DL remains a rarely used ML algorithm in their decision support.

The era of big data has brought tremendous amounts of data within single datasets across several domains, which satisfies the data requirements of prediction based on deep learning. However, it is important to differentiate and use DL models mainly in line with their strength, which is the exploitation of vast unstructured datasets that posed significant problems for traditional analytics. ML overall has been recognized as a general-purpose technology (GPT) for decision making, which has just started to infuse our economy with the ability to replace mental tasks that were traditionally reserved for humans (Agrawal et al., 2019). It also has the potential to create completely new business models (Siebel, 2019). Finding use cases that are in line with the major strength of DL – unprecedented accuracy on unstructured datasets – would help to foster the adoption of DL in business analytics. Traditional ML models reach a performance plateau quite early, and further data do not help to increase accuracy. DL has an advantage here, as it gains predictive power with every additional data point (Ng, 2018). This makes DL extremely scalable and future-proof, especially since hardware power and the amount of available data will continue to increase over the years. Also, DL eliminates the need for much of the extensive feature engineering that is usually present in the preprocessing stage of data mining and predictive analytics tasks (LeCun et al., 2015). The time required for preparing datasets often amounts to 80% to 90% of overall task completion and is one of the major reasons why further advances in DL would indeed be welcome news for all analytics functions. Overall, management and practitioners responsible for digital strategy and transformation should avoid seeing DL as a simple replacement or enhancement of existing tools for predictive analytics tasks, and instead treat it as an opportunity to develop new application areas and use cases for business analytics based on the strength of DL – which is prediction based on vast amounts of unstructured data.

4.3 Future Research

Four key areas could be identified where further research is necessary to increase the utility and hence the adoption of DL in business analytics.

(1) Future research in business analytics could focus on identifying currently non-existing use cases that are in line with the strength of DL. Due to its ability to handle huge amounts of unstructured data, DL is, in terms of future possibilities and new use cases, more interesting than traditional analytics. DL possesses the ability to create completely new business models and ways of value generation. (2) Enhancing the prediction accuracy of DL for structured data would be a game-changing development for neural networks. DL has several advantages over traditional methods but, in its current capacity, has difficulties reaching the performance and accuracy levels of tree-based ensembles such as Random Forest and GBM for predictions on structured data. A simple replacement hence makes no sense unless further research in this area realizes performance improvements for DL on structured classification tasks. Developments such as dropout (Srivastava et al., 2014) and the Maxout activation function (Goodfellow et al., 2013), which were specifically developed to tackle classification problems, go in this direction, but as shown above, they are not enough to reach accuracy levels that justify the replacement of tree-based ensemble models such as RF or GBM. Further research could focus on enhancing the ability of DL models to consistently surpass traditional ML models. This would be a significant development, which could result in the extinction of all other ML models. (3) Another issue – especially in light of the skill shortage – is that hyperparameter tuning can be a quite complex undertaking requiring the right talent. A recent development is automated machine learning ("AutoML") solutions, which have started to gain traction and are an interesting field of research that can help to further democratize the use of DL models. (4) As this study was at its core only concerned with binary classification, it is important to extend it with tests on multiclass classification and regression. Especially regression is relevant for finance and insurance due to the presence of financial time series data in those fields. Several studies have shown that deep learning architectures such as recurrent neural networks (RNNs) and long short-term memory (LSTM) are strong candidates for time series data in finance and offer superior performance (Fischer & Krauss, 2018).

5 Conclusion

The progress and breakthroughs achieved by DL are undeniable, as can be witnessed by a vast array of new real-world applications all around us. Despite this fact, the adoption rate and hence diffusion across business analytics functions has been lagging behind. This study employed a mix of content analysis and empirical study to explain the current lack of adoption of DL in business analytics functions. The content analysis suggested that the lack of adoption across business functions is based on four bottlenecks: computational complexity, missing big-data architecture, lack of transparency (the black-box nature of DL), and skill shortage. The empirical study based on three real-world case studies revealed that DL does not offer – as widely assumed – any performance advantage when it comes to predictions based on structured datasets. This has to be taken into account when using deep learning for data-driven decisions within the context of business analytics and answers the question of why analytics departments do not deploy those models consistently. Overall, ML as a general-purpose technology for data-driven prediction will continue to find its way into business analytics and shape the field of information management. An important realization is that deep learning is a valuable additional tool for the ML ecosystem that enhances our ability to analyze data. But it is not yet able to replace the other models. Especially tree-based models such as random forest and gradient boosting are powerful classifiers when it comes to structured datasets. Practitioners should concentrate on creating new use cases that leverage the advantages of DL instead of forcing the replacement of traditional models.
6 References

Addo, P. M., Guegan, D., & Hassani, B. (2018). Credit risk analysis using machine and deep
learning models. Risks, 6(2), 1–20. https://doi.org/10.3390/risks6020038

Agrawal, A., Gans, J., & Goldfarb, A. (2019). The Economics of Artificial Intelligence: An
Agenda. (A. Agrawal, J. Gans, & A. Goldfarb, Eds.). London: National Bureau of
Economic Research.

Baesens, B., Bapna, R., Marsden, J. R., Vanthienen, J., & Zhao, J. L. (2016). Transformational Issues of Big Data and Analytics in Networked Business. MIS Quarterly, 40(4), 807–818.

Bertsimas, D., & Kallus, N. (2019). From Predictive to Prescriptive Analytics. Management
Science. https://doi.org/10.1287/mnsc.2018.3253

Borges, A. F. S., Laurindo, F. J. B., Spínola, M. M., Gonçalves, R. F., & Mattos, C. A. (2021).
The strategic use of artificial intelligence in the digital era: Systematic literature review
and future research directions. International Journal of Information Management,
57(December 2019), 102225. https://doi.org/10.1016/j.ijinfomgt.2020.102225

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1007/bf00058655

Bughin, J., Hazan, E., Ramaswamy, S., Chui, M., Allas, T., Dahlström, P., … Trench, M. (2017). Artificial intelligence: The next digital frontier? McKinsey Global Institute, McKinsey & Company.

Candel, A., & LeDell, E. (2019). Deep Learning with H2O (6th ed.). (A. Bartz, Ed.). H2O.ai. Retrieved from http://h2o.ai/resources/

Chen, H., Chiang, R. H. L., & Storey, V. C. (2012). Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly, 36(4), 1165–1188.

Chui, M., Manyika, J., Miremadi, M., Henke, N., Chung, R., Nel, P., & Malhotra, S. (2018). Notes from the AI frontier: Insights from hundreds of use cases. McKinsey Global Institute, McKinsey & Company.

Collins, C., Dennehy, D., Conboy, K., & Mikalef, P. (2021). Artificial intelligence in information
systems research: A systematic literature review and research agenda. International
Journal of Information Management, 60(July), 102383.
https://doi.org/10.1016/j.ijinfomgt.2021.102383

Davenport, T. H. (2018). From analytics to artificial intelligence. Journal of Business Analytics, 1(2), 73–80. https://doi.org/10.1080/2573234x.2018.1543535

Delen, D., & Ram, S. (2018). Research challenges and opportunities in business analytics.
Journal of Business Analytics, 1(1), 2–12.
https://doi.org/10.1080/2573234x.2018.1507324

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep
Bidirectional Transformers for Language Understanding. Retrieved from
http://arxiv.org/abs/1810.04805
Duan, Y., Edwards, J. S., & Dwivedi, Y. K. (2019). Artificial intelligence for decision making in
the era of Big Data – evolution, challenges and research agenda. International Journal
of Information Management, 48(February), 63–71.
https://doi.org/10.1016/j.ijinfomgt.2019.01.021

Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for
financial market predictions. European Journal of Operational Research, 270(2), 654–
669. https://doi.org/10.1016/j.ejor.2017.11.054

Flach, P. (2019). Performance Evaluation in Machine Learning: The Good, the Bad, the Ugly,
and the Way Forward. Proceedings of the AAAI Conference on Artificial Intelligence, 33,
9808–9814. https://doi.org/10.1609/aaai.v33i01.33019808

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge, MA: MIT Press.

Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout
networks. 30th International Conference on Machine Learning, ICML 2013, (PART 3),
2356–2364.

Hamori, S., Kawai, M., Kume, T., Murakami, Y., & Watanabe, C. (2018). Ensemble Learning
or Deep Learning? Application to Default Risk Analysis. Journal of Risk and Financial
Management, 11(1), 12. https://doi.org/10.3390/jrfm11010012

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd
ed.). Stanford, California: Springer. https://doi.org/10.1007/b94608

Hastie, T., Tibshirani, R., & Friedman, J. (2017). The Elements of Statistical Learning (2nd ed.). New York: Springer.

Henke, N., Bughin, J., Chui, M., Manyika, J., Saleh, T., Wiseman, B., & Sethupathy, G. (2016). The age of analytics: Competing in a data-driven world. McKinsey Global Institute. Retrieved from https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world

Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A Fast Learning Algorithm for Deep Belief
Nets. Neural Comp., 18(7), 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527

Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and
prospects. Science, 349(6245), 255–260. https://doi.org/10.1126/science.aaa8415

Kraus, M., Feuerriegel, S., & Oztekin, A. (2019). Deep learning in business analytics and
operations research: Models, applications and managerial implications. European
Journal of Operational Research, 1–14. https://doi.org/10.1016/j.ejor.2019.09.018

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25, 1097–1105.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
https://doi.org/10.1038/nature14539

LeDell, E., & Gill, N. (2019). H2O: R Interface for “H2O”. R Package. Retrieved November
22, 2019, from https://cran.r-project.org/web/packages/h2o/index.html

Lee, K.-F. (2018). AI Superpowers: China, Silicon Valley, and the New World Order. Boston: Houghton Mifflin Harcourt.


Malohlava, M., & Candel, A. (2019). Gradient Boosting Machine with H2O (7th ed.). H2O.ai. Retrieved from http://h2o.ai/resources/

Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. Cambridge, MA: The MIT Press.

Ng, A. (2018). Machine Learning Yearning. deeplearning.ai.

R Core Team. (2019). R: A language and environment for statistical computing. Retrieved
November 22, 2019, from https://www.r-project.org/

Samek, W., & Müller, K.-R. (2019). Towards Explainable Artificial Intelligence.
https://doi.org/10.1007/978-3-030-28954-6_1

Schmitt, M. (2020). Artificial Intelligence in Business Analytics: Capturing Value with Machine
Learning Applications in Financial Services. University of Strathclyde.
https://doi.org/10.48730/5s00-jd45

Sharda, R., Delen, D., & Turban, E. (2017). Business Intelligence, Analytics, and Data
Science: A Managerial Perspective. Pearson Education Limited. Retrieved from
www.pearsonglobaleditions.com

Siebel, T. M. (2019). Digital Transformation: Survive and Thrive in an Era of Mass Extinction
(1st ed.). New York: Rosetta Books.

Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., … Hassabis,
D. (2017). Mastering the game of Go without human knowledge. Nature.
https://doi.org/10.1038/nature24270

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout:
A simple way to prevent neural networks from overfitting. Journal of Machine Learning
Research.

Stadelmann, T., Amirian, M., Arabaci, I., Arnold, M., Duivesteijn, G. F., Elezi, I., … Tuggener, L. (2018). Deep learning in the wild. In Artificial Neural Networks in Pattern Recognition (LNAI 11081, pp. 17–38). https://doi.org/10.1007/978-3-319-99978-4_2

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A.
(2015). Going deeper with convolutions. Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 07-12-June, 1–9.
https://doi.org/10.1109/CVPR.2015.7298594

Verma, S., Sharma, R., Deb, S., & Maitra, D. (2021). Artificial intelligence in marketing:
Systematic review and future research direction. International Journal of Information
Management Data Insights, 1(1), 100002. https://doi.org/10.1016/j.jjimei.2020.100002

Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., … Silver,
D. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning.
Nature. https://doi.org/10.1038/s41586-019-1724-z

Warner, K. S. R., & Wäger, M. (2019). Building dynamic capabilities for digital transformation:
An ongoing process of strategic renewal. Long Range Planning, 52(3), 326–349.
https://doi.org/10.1016/j.lrp.2018.12.001
