
IJCSNS International Journal of Computer Science and Network Security, Vol. 20, No. 7, July 2020

Comparison of Reinforcement and Supervised Learning Algorithms on Startup Success Prediction
Yong Shi1,2,3, Eremina Ekaterina2,3, Wen Long1,2,3,*
1 School of Economics & Management, University of Chinese Academy of Sciences, Beijing, P.R. China
2 Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing, P.R. China
3 Key Laboratory of Big Data Mining & Knowledge Management, Chinese Academy of Sciences, Beijing, P.R. China

Abstract

There has been an exponential growth in startups over the past few years, and more than half of startups fail to gain funding. Predicting the success of a startup allows investors to find companies that have the potential for rapid growth, thereby allowing them to be one step ahead of the competition. This paper proposes a model to predict whether a startup will fail or succeed based on important factors such as the idea of the startup, the place where it was established, the domain vertical to which it belongs, and the type of funding. On the preprocessed data we used several classification techniques along with data mining optimizations and validations. We provide our analysis using techniques such as Random Forest, KNN, Bayesian networks, and so on. We evaluate the correctness of our models based on precision and recall. Our model can be used by startups to decide which factors they should focus on in order to succeed. This work also aims to compare the efficiency of supervised machine learning algorithms and reinforcement learning algorithms on a multi-labeled classification task. Adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms are also explored.

Key words:
CrunchBase, multi-class classification, contextual bandits, supervised machine learning

Declarations

Funding - This research was partly supported by National Natural Science Foundation of China (No. 71771204, 71932008).
Conflicts of interest/Competing interests - not applicable.
Availability of data and material - available.
Code availability - available.
Author's contributions - not applicable.

Manuscript received July 7, 2020; revised July 20, 2020.

1. Introduction

Startups play a huge role in the modern world economy, yet the fast-changing world leads many of them to failure. There can be several reasons, such as inefficient planning, an inefficient way of using the funds, lack of a good team, insufficient funds, etc., which lead to the failure of a startup. It is very important to increase the success rate of startups.

This work mainly aims to predict whether a startup which is currently operating will turn into a success or a failure. The success of a company is defined as the event that gives the company's founders a large sum of money through the process of M&A (Merger and Acquisition) or an IPO (Initial Public Offering). A company is considered failed if it had to be shut down.

This leads us to a multi-class classification problem that is usually solved with classical supervised machine learning techniques such as linear regression, k-nearest neighbors, naive Bayes or random forests. However, these techniques require a lot of time and machine resources when implemented for big datasets, and are therefore not efficient. In this work we instead redefine the classification problem in terms of reinforcement learning and adapt some successful strategies and baselines that have been proposed for the multi-armed bandits setting, and variations thereof, to the contextual bandits setting by using supervised learning algorithms as black-box oracles, together with exploration in the early phases in the absence of non-zero rewards.

Failures of startups have drawn massive attention, and most companies are working on designing various kinds of prediction models to successfully predict the fate of a new company. A few researchers have done interesting work trying to find the success/failure patterns of a startup. One of these works discusses the success and risk factors involved in the pre-startup phase [22]. The authors focus on estimating the relative importance of a variety of approaches and variables in explaining pre-startup success. They created a framework which suggests that startup efforts differ in terms of the characteristics of the individual(s) who start the venture, the organization that they create, the environment around the venture, and the process by which it was started.

The work done in paper [5] closely addresses our problem. Research on personality characteristics relates dispositions such as risk-taking, locus of control, and need for achievement to the emergence and the success of entrepreneurship (for an overview, see [20]). Greene et al. [16] have studied differences in motives as a success factor in nascent entrepreneurship.
They find that women who start for internally oriented reasons, and men who start for externally oriented reasons (like perceiving a need in the market), have greater chances of successfully completing the pre-startup phase. Another work, on crowdsourcing, gets a mention in [15]; the authors focus on how a successful organization can be created by crowdsourcing. The work of paper [21] focuses on developing a research program to investigate the major factors contributing to success in new technical ventures. Strategic alliances between companies are a good way to construct networks. Another work on new venture failure is done in paper [19]. In this paper, the authors demonstrate two ways to investigate new venture failure: first, by testing for moderating effects of new venture failure on the relationship between startup experience and startup expertise with a sample of 220 entrepreneurs, and second, by exploring the nature of these relationships.

Different research has been done trying to figure out several aspects of entrepreneurship and how some of them can lead to a successful company. Work done in paper [3] addresses similar issues. Another famous work is by R. Dickinson, who in his article [11] discusses the critical success factors of small businesses. Article [14] discusses a lot of problems faced by innovators. The market orientation for entrepreneurs is discussed in article [10], which also focuses on problems in terms of management. Research paper [18] discusses factors which can create successful companies.

Our work involves the data mining analysis of more than 24000 companies (8361 companies with IPO, 4236 still in operation and 5600 closed/acquired companies). We modeled our data for the top-10 countries (USA, Great Britain, Canada, China, India, France, Israel, Germany, Switzerland and Russia). We analyzed this data based on key factors such as when the company was founded, how much seed funding it raised, how many months it took to raise the seed funds, and factors which were affecting the growth of the company, both positive and negative.

Experiments with more than 30 classifiers were conducted to find that many meta classifiers used with decision trees can give impressive results, which can be further improved by combining the resulting prediction probabilities from several classifiers. For the first time both supervised machine learning (regression, random forest, kNN, Bayes, etc.) and reinforcement learning (adaptations of multi-armed bandits policies) were applied and compared for startup classification. Our results are represented in terms of Recall, Precision and F1-score values for the supervised methods and cumulative mean reward for the multi-armed bandits.

This work proposes adaptations of some successful strategies and baselines that have been proposed for the multi-armed bandits setting, and variations thereof, to the contextual bandits setting by using supervised learning algorithms as oracles, as well as exploration in the early phases in the absence of non-zero rewards, benchmarking them in an empirical evaluation using the Crunchbase dataset for multilabeled classification. In this work we use cumulative reward throughout the rounds instead of accumulated regret. While this has some chance of not being able to reflect asymptotic behaviors in an infinite-time scenario with all-fixed arms, it provides some insight on what happens during typical timelines of interest.

2. Related Work

A classification task has to predict the class for a given unlabeled item. The class must be selected among a finite set of predefined classes. Classification algorithms are used in many application domains where data associated with class labels are available. In all these cases, a classification algorithm can build a classifier, that is, a model M that calculates the class label c for a given input item x, that is, c = M(x), where c ∈ {c1, c2, ..., cn} and each ci is a class label. To build the model, the algorithm requires a set of available items together with their correct class labels; the set of classified items is called the training set. After generating the model M, the classifier can automatically predict the class for any new item that is given as input. Several classification models have been introduced, such as decision trees, k-nearest neighbors, neural networks and others.

Machine learning includes supervised, unsupervised and reinforcement learning. Supervised learning provides many different regression and classification techniques to implement a machine learning model based on labeled data. The existing solutions for this problem include the algorithms briefly explained below ([1]).

Simple logistic regression is built by plotting the graph of the dataset and then forming the boundaries separating the different classes. It is inefficient when the data is linearly inseparable and is very sensitive to underfitting and overfitting problems. It uses a stage-wise fitting process.
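As a concrete illustration of the classifier model c = M(x) described above, here is a minimal scikit-learn sketch using logistic regression; the feature matrix X and label vector y are synthetic placeholders rather than the paper's CrunchBase data.

# Minimal illustration of training a classifier M on a labeled training set and
# predicting class labels c = M(x). X and y are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))        # items described by 10 variables
y = rng.integers(0, 4, size=1000)      # four class labels, e.g. ipo/operating/acquired/closed

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
M = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # training set -> model M
c = M.predict(X_test)                  # predicted class labels c = M(x)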

The Random Forest classifier is a collection of multiple independent decision trees. The disconnected decision trees are formed by taking different starting nodes. The initial nodes are selected based on the GINI index and many other criteria, and the individual trees are built independently of the other trees. When an unknown data item is given to the model, the individual outputs of the decision trees are sent to an optimiser which finds the most favorable class label and gives it as the output. As the output of multiple trees is considered, the accuracy of the model is expected to be high and to resist underfitting and overfitting problems ([7]). This algorithm is very robust and handles highly imbalanced classes very effectively.

The Naive Bayes classifier assumes that the features of the data items are independent of each other, so hidden correlations between the features are not addressed effectively. The Naive Bayes classifier is trained in the supervised learning setting depending on the probability model. The accuracy of the output can be highly dependent on the supervised learning settings, and the classifier fails to find the patterns and dependencies of the features.

K-Nearest Neighbors (KNN) is a standard method that has been extended to large-scale data mining efforts. The idea is that one uses a large amount of training data, where each data point is characterized by a set of variables. Each point is plotted in a high-dimensional space, where each axis in the space corresponds to an individual variable. KNN has the advantage of being nonparametric; that is, the method can be used even when the variables are categorical.

Reinforcement learning can also be applied to the classification problem, depending on the model settings. Multi-armed bandits with covariates are known as contextual bandits. The main difference is that contextual bandits have side information at each iteration which can be used for arm selection, and rewards also depend on the covariates.

The problem is very similar to multi-class or multi-label classification (with the reward being whether the right label was chosen or not), but with the big difference that the right set of labels is not known for each observation, only whether the label that was chosen by the agent for each observation was correct or not.
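This reduction, in which each class becomes an arm and only the correctness of the chosen label is revealed, can be simulated along the following lines. This is an illustrative sketch; the choose/update interface of `policy` is an assumed convention, not code from the paper.

# Sketch of simulating a contextual bandit from a multi-label classification dataset:
# the agent observes only whether the label (arm) it chose was correct, never the full label set.
import numpy as np

def simulate(policy, X, Y):
    """X: (n, d) array of contexts; Y: (n, k) binary indicator matrix of true labels.
    `policy` is assumed to expose choose(x) -> arm index and update(x, arm, reward)."""
    rewards = []
    for x, true_labels in zip(X, Y):
        arm = policy.choose(x)              # the agent picks one label (arm)
        reward = float(true_labels[arm])    # 1 if that label was correct for this observation, else 0
        policy.update(x, arm, reward)       # only this partial feedback is revealed
        rewards.append(reward)
    return np.cumsum(rewards) / (np.arange(len(rewards)) + 1)   # cumulative mean reward per round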

The simpler multi-armed bandits scenario has been extensively studied, and many good solutions have been proposed which enjoy theoretical limits on their regret ([8]), as well as demonstrated performance in empirical tests ([23]). Upper confidence bounds is one of the best solutions with theoretical guarantees and good empirical performance (such a bound gets closer to the observed mean as more observations are accumulated, thereby balancing exploration and exploitation), as is Thompson sampling, which takes a Bayesian perspective and chooses an arm according to its probability of being the best arm. Epsilon-Greedy algorithms with variations are typical comparison baselines; the idea is to select the empirical best action, or a random action with some probability.

Contextual bandits have been studied in different variations - as bandits with "expert advice", with rewards assumed to be continuous (usually in the range [0,1]) and the reward-generating functions linear ([10], [17]). Approaches taking a supervised learning algorithm as an oracle for a setting similar to the one presented here, but with continuous rewards, have been studied before ([10], [4]); in these, the oracles are fit to the covariates and rewards from each arm separately, and the same strategies from multi-armed bandits have also resulted in good strategies in this setting. Other related problems, such as building an optimal oracle or policy with data collected from a past policy, have also been studied ([3], [9], [2]), but this work only focuses on online policies that start from scratch and continue ad infinitum.

All algorithms were benchmarked and compared to the simpler baselines by simulating contextual bandits scenarios using multi-label classification datasets, where the arms become the classes and the rewards are whether the chosen label for a given observation was true or not. Observations were fed in rounds and each algorithm made its own choice; the same context was presented to all at the same time, and whether the chosen label was correct or not was also revealed to each one.
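For reference, the Epsilon-Greedy baseline mentioned above can be sketched in a few lines for the plain, context-free multi-armed bandit case (illustrative only):

# Epsilon-Greedy for a plain multi-armed bandit: with probability epsilon pick a random
# arm (explore), otherwise pick the arm with the best empirical mean reward so far (exploit).
import numpy as np

class EpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.epsilon = epsilon
        self.counts = np.zeros(n_arms)
        self.means = np.zeros(n_arms)

    def choose(self):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.means)))   # random action
        return int(np.argmax(self.means))                    # empirical best action

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]   # running mean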

3. Algorithms
Most supervised learning algorithms used as oracles cannot be fit to data with only one label value (e.g. only observations which had no reward), and typical domains of interest involve scenarios in which the non-zero reward rate for any arm is rather small regardless of the covariates (e.g. clicks). In some settings this problem can be solved by incorporating a smoothing criterion, and it is possible to think of a similar application for the scenario proposed in this work if the classifier is able to output probabilities. Thus a natural adaptation of the upper confidence bound strategy is as follows:
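One plausible form of such an adaptation fits several classifiers per arm on bootstrap resamples of that arm's history and scores a context by an upper percentile of their predicted probabilities. The sketch below illustrates this idea under that assumption; it is not the paper's exact algorithm.

# Sketch of a UCB-style contextual policy: per arm, fit several classifiers on bootstrap
# resamples of that arm's history and score a context by an upper percentile of their
# predicted reward probabilities. An illustrative assumption, not the paper's exact algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

class BootstrappedUCB:
    def __init__(self, n_arms, n_resamples=10, percentile=80, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_arms, self.n_resamples, self.percentile = n_arms, n_resamples, percentile
        self.history = [([], []) for _ in range(n_arms)]     # (contexts, rewards) seen per arm
        self.oracles = [[] for _ in range(n_arms)]

    def refit(self):
        for a in range(self.n_arms):
            X, r = self.history[a]
            self.oracles[a] = []
            if len(r) == 0:
                continue
            X, r = np.asarray(X), np.asarray(r)
            for _ in range(self.n_resamples):
                idx = self.rng.integers(len(r), size=len(r))          # bootstrap resample
                if len(set(r[idx])) < 2:
                    continue                                          # oracle needs both reward values
                self.oracles[a].append(LogisticRegression(max_iter=1000).fit(X[idx], r[idx]))

    def choose(self, x):
        scores = np.ones(self.n_arms)        # optimistic default so unexplored arms get tried
        for a in range(self.n_arms):
            if self.oracles[a]:
                p = [m.predict_proba(x.reshape(1, -1))[0, 1] for m in self.oracles[a]]
                scores[a] = np.percentile(p, self.percentile)         # upper confidence estimate
        return int(np.argmax(scores))

    def update(self, x, arm, reward):
        self.history[arm][0].append(x)
        self.history[arm][1].append(reward)

In an online run, refit() would be called every so often; the paper refits its oracles every 50 rounds.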
Adaptive-Greedy ([17]) uses a random selection criterion and does not require multiple oracles per arm, and thus shows good performance:
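A minimal sketch of this idea, under the assumption that the policy exploits the best-estimated arm when its estimated reward clears a threshold z and otherwise picks an arm at random, with one oracle per arm; again an illustration rather than the paper's exact listing.

# Sketch of an Adaptive-Greedy contextual policy with one oracle per arm: exploit the
# best-estimated arm when its estimate clears the threshold z, otherwise explore at random.
import numpy as np
from sklearn.linear_model import LogisticRegression

class AdaptiveGreedy:
    def __init__(self, n_arms, z=0.2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.z = z                                         # exploitation threshold
        self.history = [([], []) for _ in range(n_arms)]
        self.oracles = [None] * n_arms

    def refit(self):
        for a, (X, r) in enumerate(self.history):
            if len(set(r)) >= 2:                           # oracle needs both reward values to fit
                self.oracles[a] = LogisticRegression(max_iter=1000).fit(np.asarray(X), np.asarray(r))

    def choose(self, x):
        est = np.zeros(len(self.oracles))
        for a, m in enumerate(self.oracles):
            if m is not None:
                est[a] = m.predict_proba(x.reshape(1, -1))[0, 1]
        best = int(np.argmax(est))
        if est[best] >= self.z:
            return best                                    # exploit the empirically best arm
        return int(self.rng.integers(len(self.oracles)))   # random exploration otherwise

    def update(self, x, arm, reward):
        self.history[arm][0].append(x)
        self.history[arm][1].append(reward)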

The choice of threshold z is problematic, though; it might be a better idea to keep a moving average window of the last m highest estimated rewards of the best arm (Algorithm 9). This moving window in turn might also be replaced with a non-moving window, i.e. compute the average for the first m observations, but don't update it until m more rounds, then at time 2m update only with the observations that were collected between m and 2m.

Instead of relying on choosing arms at random for exploration, active learning heuristics might be used for faster learning instead. Strategies such as Epsilon-Greedy are easy to convert into active learning - for example, assuming a differentiable and smooth model such as logistic regression or artificial neural networks (depending on the particular activation functions) - see Algorithm 10. It might also be a good idea to take the arm with the smallest/largest gradient for either label instead of a weighted average according to the estimated probabilities, but in practice a weighted average tends to give slightly better results. Contextual Adaptive-Greedy can also be enriched with this simple heuristic:
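A sketch of such a gradient-based heuristic for a logistic-regression oracle follows: during exploration steps, the arm whose oracle would receive the largest expected gradient from the current context is chosen, weighting the two hypothetical reward labels by the oracle's own estimated probability. This illustrates the heuristic described above and is not the referenced Algorithm 10 itself.

# Active-learning exploration heuristic for logistic-regression oracles: pick the arm whose
# oracle would receive the largest expected gradient from this context, weighting the two
# hypothetical reward labels (0 and 1) by the oracle's estimated probability.
import numpy as np

def expected_gradient_norm(model, x):
    """model: fitted binary sklearn LogisticRegression; x: 1-D context vector.
    The log-loss gradient w.r.t. the weights for a single example is (p - y) * x."""
    p = model.predict_proba(x.reshape(1, -1))[0, 1]
    grad_if_one = np.linalg.norm((p - 1.0) * x)    # gradient magnitude if the reward were 1
    grad_if_zero = np.linalg.norm(p * x)           # gradient magnitude if the reward were 0
    return p * grad_if_one + (1.0 - p) * grad_if_zero

def active_explore(oracles, x):
    """Choose the arm with the largest expected gradient; arms with no fitted oracle yet win by default."""
    scores = [expected_gradient_norm(m, x) if m is not None else np.inf for m in oracles]
    return int(np.argmax(scores))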

4. Empirical Evaluation

Each model will be evaluated by recall, precision and F1-score metrics. In the case of unbalanced data, classification accuracy is not enough to decide whether the model is good or not; to avoid this we also use precision and recall. Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. Recall is the ratio of correctly predicted positive observations to all observations in the actual class. The harmonic mean of precision and recall is known as the F1 score. Intuitively, F1 is not as easy to understand as accuracy, but it is much better in the case of uneven class distribution.

The algorithms above were benchmarked and compared to the simpler baselines by simulating contextual bandits scenarios using the multi-label classification dataset, where the arms become the classes and the reward is whether the chosen label is correct for a given observation or not. Each algorithm was fed observations in rounds and made its own choice; the same context was presented to all, and each one was told whether the label it chose in that round was correct or not.

Oracles were refit every 50 rounds. Experiments were performed until iterating over all observations in the dataset, then the data was shuffled and the experiments were run again; results were obtained after 10 runs. Both full-refit and mini-batch-update versions were evaluated. The classifier algorithm used was logistic regression, with the same regularization parameter for every arm. For all contextual bandits policies the cumulative mean reward over time was plotted, where time is the number of rounds/observations and the reward is whether they chose a correct label/arm for an observation/context.

5. Data And Preprocessing

The CrunchBase dataset is raw startup data which includes different attributes of different types. The repository contains 4 CSVs derived from the "CrunchBase 2013 Snapshot" as made available by CrunchBase under CC-BY. It encompasses roughly 208,000 organizations, 227,000 people, 400,000 relationships, and 53,000 fundraising events. The data was obtained as .csv files, which are presented on Figure 1. Working with such datasets demands rigorous data preprocessing. The consolidated raw data may include some outliers, values that are out of range, a few missing values, and error values - such errors will lead to wrong results. The quality of the data is directly proportional to the accuracy of the model, and while training the model, ambiguity arises due to redundant and unimportant data. This makes data preprocessing necessary before training the model. The process consists of a few steps (a rough sketch follows the list):
• Data cleaning: delete NaN values, check for outliers and resolve them if needed.
• Data transformation: normalize the attribute values and aggregate. In order to reduce the number of categories, a mapping into main domains was performed: entertainment, health, manufacturing, news, social, etc. The number of categories was reduced from 687 to 9.
• Data reduction: delete duplicates. In this work we focus on startups from the top-10 countries: USA, Great Britain, Canada, China, India, France, Israel, Germany, Switzerland and Russia - they hold most of the funding and companies. The distribution of startups can be seen on Figure 2.
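A rough pandas sketch of these steps is given below; the file name, the column names (country_code, category_code, status) and the DOMAIN_MAP entries are illustrative assumptions rather than the exact CrunchBase snapshot schema.

# Rough sketch of the preprocessing steps: cleaning, deduplication, filtering to the
# top-10 countries and mapping raw categories into a small set of domains.
# File and column names and DOMAIN_MAP entries are assumptions, not the exact schema.
import pandas as pd

TOP_COUNTRIES = ["USA", "GBR", "CAN", "CHN", "IND", "FRA", "ISR", "DEU", "CHE", "RUS"]
DOMAIN_MAP = {"games_video": "entertainment", "biotech": "health",
              "hardware": "manufacturing", "news": "news", "social": "social"}   # 687 -> 9 domains in the paper

df = pd.read_csv("objects.csv")                                      # one of the four snapshot CSVs (assumed name)
df = df.dropna(subset=["status", "country_code"])                    # data cleaning: delete NaN values
df = df.drop_duplicates()                                            # data reduction: delete duplicates
df = df[df["country_code"].isin(TOP_COUNTRIES)]                      # keep only the top-10 countries
df["domain"] = df["category_code"].map(DOMAIN_MAP).fillna("other")   # data transformation: map categories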

Fig. 1 Entity Relationship Diagram

Fig. 2 Startup distribution by countries (№ of startups): USA 18162, GBR 1889, CAN 923, CHN 917, ISR 555, IND 864, DEU 509, FRA 734, CHE 178, RUS 234

For each startup, we retrieved its status (operating, ipo, acquired, closed), country code, funding rounds, funding round type (venture, angel, seed, private equity), category (from mapping.csv) and the sum of total funding. Overall, our dataset consists of 24965 companies. As can be seen from Figure 3, the dataset is imbalanced; to deal with it we used an algorithm which combines under-sampling the positive class with over-sampling the negative one: SMOTEENN, which combines SMOTE (synthetic over-sampling) with Edited Nearest Neighbours, used to pare down and centralise the negative cases. The startup distribution by status after preprocessing can be seen on Figure 4.
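This re-balancing step is available off the shelf in imbalanced-learn; a sketch with a toy imbalanced dataset standing in for the preprocessed startup features and status labels:

# Sketch of re-balancing the status classes with SMOTEENN (SMOTE over-sampling combined
# with Edited Nearest Neighbours cleaning), as described above.
from collections import Counter
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification

# Toy imbalanced stand-in for the preprocessed startup features and status labels.
X, y = make_classification(n_samples=2000, n_classes=4, n_informative=6,
                           weights=[0.75, 0.15, 0.07, 0.03], random_state=0)

sampler = SMOTEENN(random_state=0)
X_resampled, y_resampled = sampler.fit_resample(X, y)
print(Counter(y), Counter(y_resampled))     # class counts before and after re-balancing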

Fig. 3 Startup distribution by status (IPO, Operating, Acquired, Closed) in the raw dataset

Fig. 4 Startup distribution by status (IPO, Operating, Acquired, Closed) in the final dataset

6. Results

This work explored the potential advantages of using multi-armed bandits for a classification task compared to classical supervised machine learning algorithms. An empirical evaluation of the supervised algorithms showed the higher efficiency of kNN and Random Forest, even compared to classical logistic regression. Precision and recall can be seen on Figures 5 and 6 respectively.

Fig. 5 Precision for supervised startup predictions (LR, RF, kNN, GBM, NB over the classes ipo, operating, acquired, closed)
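Per-class precision, recall and F1-score of the kind plotted in Figures 5-7 can be obtained directly from scikit-learn; a self-contained sketch with toy placeholder values in place of the real test labels and predictions:

# Per-class precision, recall and F1-score, as reported in Figures 5-7.
# y_true / y_pred are toy placeholders for the held-out statuses and a model's predictions.
from sklearn.metrics import classification_report

y_true = ["ipo", "operating", "acquired", "closed", "operating", "closed"]
y_pred = ["ipo", "operating", "closed", "closed", "operating", "acquired"]
print(classification_report(y_true, y_pred, zero_division=0))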



The variance in prediction precision indicates that we have an unbalanced dataset. While kNN gives 0.90 precision for «ipo», which is represented much less than the other classes, logistic regression is not able to identify this class at all.

Fig. 6 Recall for supervised startup predictions (LR, RF, kNN, GBM, NB over the classes ipo, operating, acquired, closed)

Figures 5-7 show the evaluation metrics, on average, across all the classification methods. It can be seen that gradient boosting and kNN are the best performers, while logistic regression and naive Bayes fail.

Fig. 8 Accuracy for supervised startup predictions

Figure 8 clearly shows that logistic regression and naive Bayes, which are very successful on binary classification tasks, fail to identify classes in the multi-labeled dataset. Gradient boosting is slightly worse compared to kNN, but shows higher speed performance.

In many cases the empirical evaluation of the adapted multi-armed bandits policies showed better results compared to the simpler baselines. A further comparison (Figure 9) with similar works meant for the regression setting was not feasible due to the lack of scalability of the other algorithms. Just like in MAB, the upper confidence bound approach proved to be a reasonably good strategy throughout all datasets despite the small number of resamples used, and it has fewer hyperparameters to tune. Enhancing it by incorporating active learning heuristics did not seem to have much of an effect, and it seems that setting a given initial threshold provides better results compared to setting the threshold as a moving percentile of the predictions.

Fig. 7 F1-score for supervised startup predictions (LR, RF, kNN, GBM, NB over the classes ipo, operating, acquired, closed)

Fig. 9 Comparison of Contextual Bandit Policies (Logistic Regression, Random Forest, GBM)

While theoretically sound, using stochastic optimization to update the classifiers with small batches of data resulted in severely degraded performance compared to full refits across all metaheuristics, even in the later rounds of larger datasets, with no policy managing to outperform choosing the best arm without context, at least with the hyperparameters experimented with for the MAB-first trick. This might in practice not be much of a problem, considering the short time it takes to fit models to the small number of observations seen in the earlier phases of a policy. It should be noted that all arms were treated as being independent from each other, which in reality might not be the case, and other models incorporating similarity information might result in improved performance.

CONCLUSION

In this paper a model was proposed and implemented to predict the future of a startup and suggest improvements for future progress. Based on information about a startup, such as location, industry and investment type, the models can predict the possible expected funding range. This model gave 86% accuracy using the kNN algorithm. The result may be altered by other external factors that affect funding, such as psychological and emotional factors of employees or candidates.

We have presented a case for an approach to RL that combines policy iteration and pure classification learning with rollouts. We believe that these initial successes will help to apply modern classification methods to other reinforcement learning domains.

Of course, there are still many questions to be solved. This work proposed adaptations for the MAB setting and variations of the contextual bandits setting by using supervised learning algorithms; benchmarking was performed using the Crunchbase dataset. Our empirical results suggest that more traditional methods such as GBM can be used successfully. However, contextual bandits have a strong point in the case of big datasets: the algorithm itself allows division into threads and calculation on parallel kernels, thus saving time and processing costs.

References
[1] A. Agrawal, P. D. Deshpande, A. Cecen, G. P. Basavarsu, A. N. Choudhary, and S. R. Kalidindi. Exploration of data science techniques used to predict the strength of steel. Integrating Materials and Manufacturing Innovation, 3(8):1–19, 2014.
[2] Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert Schapire. Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, pages 1638–1646, 2014.
[3] T. M. Begley and W.-L. Tan. The socio-cultural environment for entrepreneurship: A comparison between East Asian and Anglo-Saxon countries. Journal of International Business Studies, pages 537–553, 2001.
[4] Alina Beygelzimer and John Langford. The offset tree for learning with partial labels. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 129–138. ACM, 2009.
[5] J. Brüderl, P. Preisendörfer, and R. Ziegler. Survival chances of newly founded business organizations. American Sociological Review, pages 227–242, 1992.
[6] Alberto Bietti, Alekh Agarwal, and John Langford. A contextual bandit bake-off. 2018.
[7] L. Breiman. Random forests. Machine Learning, 45(1):5–32, Oct. 2001.
[8] Giuseppe Burtini, Jason Loeppky, and Ramon Lawrence. A survey of online experiment design with the stochastic multi-armed bandit. arXiv preprint arXiv:1510.00757, 2015.
[9] Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, and Eli Upfal. Mortal multi-armed bandits. In Advances in Neural Information Processing Systems, pages 273–280, 2009.
[10] Wei Chu, Lihong Li, Lev Reyzin, and Robert Schapire. Contextual bandits with linear payoff functions. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011.
[11] R. Dickinson. Business failure rate. American Journal of Small Business, 6(2):17–25, 1981.
[12] Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li, et al. Doubly robust policy evaluation and optimization. Statistical Science, 29(4):485–511, 2014.
[13] Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, and Robert E. Schapire. Practical contextual bandits with regression oracles. arXiv preprint arXiv:1803.01088, 2018.
[14] W. B. Gartner. "Who is an entrepreneur?" is the wrong question. American Journal of Small Business, 12(4):11–32, 1988.
[15] M. D. Greenberg, B. Pardo, K. Hariharan, and E. Gerber. Crowdfunding support tools: predicting success & failure. In CHI'13 Extended Abstracts on Human Factors in Computing Systems, pages 1815–1820. ACM, 2013.
[16] P. G. Greene, M. M. Hart, E. J. Gatewood, C. G. Brush, and N. M. Carter. Women entrepreneurs: Moving front and center: An overview of research and theory. Coleman White Paper Series, 3:1–47, 2003.
[17] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.
[18] D. C. McClelland. Characteristics of successful entrepreneurs. The Journal of Creative Behavior, 21(3):219–233, 1987.
[19] R. K. Mitchell, J. Mitchell, and J. B. Smith. Failing to succeed: new venture failure as a moderator of startup experience and startup expertise. Frontiers of Entrepreneurship Research, 2004.
[20] A. Rauch and M. Frese. Let's put the person back into entrepreneurship research: A meta-analysis on the relationship between business owners' personality traits, business creation, and success. European Journal of Work and Organizational Psychology, 16(4):353–385, 2007.
[21] R. Stuart and P. A. Abetti. Start-up ventures: Towards the prediction of initial success. Journal of Business Venturing, 2(3):215–230, 1987.
[22] M. Van Gelderen, R. Thurik, and N. Bosma. Success and risk factors in the pre-startup phase. Small Business Economics, 24(4):365–380, 2005.
[23] Joannes Vermorel and Mehryar Mohri. Multi-armed bandit algorithms and empirical evaluation. In European Conference on Machine Learning, pages 437–448. Springer, 2005.
