Comparison of Reinforcement and Supervised Learning Algorithms On Startup Success Prediction
that women who start for internally oriented reasons, and men who start for externally oriented reasons (like perceiving a need in the market), have greater chances of successfully completing the pre-startup phase. Another work, on crowdsourcing, gets a mention in [15]: the authors focus on how a successful organization can be created through crowdsourcing. The work in paper [21] focuses on developing a research program to investigate the major factors contributing to success in new technical ventures. Strategic alliances between companies are a good way to construct networks. Another work on new venture failure is paper [19], in which the authors investigate new venture failure in two ways: first, by testing for moderating effects of new venture failure on the relationship between startup experience and startup expertise with a sample of 220 entrepreneurs; secondly, by exploring the nature of these relationships.
Different research has been done trying to figure out several aspects of entrepreneurship and how some of them can lead to a successful company. Work done in paper [3] addresses similar issues. Another famous work is by R. Dickinson, who in his article [11] discusses the critical success factors of small businesses. Article [14] discusses many problems faced by innovators. Market orientation for entrepreneurs is discussed in article [10], which also focuses on problems in terms of management. Research paper [18] discusses factors which can create successful companies.
Our work involves the data mining analysis of more than 24000 companies (8361 companies with IPO, 4236 still in operation and 5600 closed/acquired companies). We modeled our data for the top-10 countries (USA, Great Britain, Canada, China, India, France, Israel, Germany, Switzerland and Russia). We analyzed this data based on key factors such as when the company was founded, how much seed funding it raised, how many months it took to raise the seed funds, and the factors, both positive and negative, which affected the growth of the company.
Experiments with more than 30 classifiers were conducted, finding that many meta classifiers used with decision trees can give impressive results, which can be further improved by combining the resulting prediction probabilities from several classifiers. For the first time, both supervised machine learning (regression, random forest, kNN, Bayes, etc.) and reinforcement learning (adaptations of multi-armed bandits policies) were applied and compared for startup classification. Our results are reported in terms of Recall, Precision and F1-score values for the supervised methods and cumulative mean reward for the multi-armed bandits.
This work proposes adaptations of some successful strategies and baselines that have been proposed for the multi-armed bandits setting and variations thereof to the contextual bandits setting, by using supervised learning algorithms as oracles and exploration in the early phases in the absence of non-zero rewards, benchmarking them in an empirical evaluation on the Crunchbase dataset for multilabeled classification. In this work we use cumulative reward throughout the rounds instead of accumulated regret. While this has some chance of not being able to reflect asymptotic behaviors in an infinite-time scenario with all-fixed arms, it provides some insight into what happens during typical timelines of interest.

2. Related Work

A classification task has to predict the class for a given unlabeled item. The class must be selected from a finite set of predefined classes. Classification algorithms are used in many application domains where data associated with class labels are available. In all these cases, a classification algorithm can build a classifier, that is, a model M that calculates the class label c for a given input item x, so that c = M(x), where c ∈ {c1, c2, ..., cn} and each ci is a class label. For building the model, the algorithm requires a set of available items together with their correct class labels; this set of classified items is called the training set. After generating the model M, the classifier can automatically predict the class for any new item given as input. Several classification models have been introduced, such as decision trees, k-nearest neighbors, neural networks and others.
Machine learning includes supervised, unsupervised and reinforcement learning. Supervised learning provides many different regression and classification techniques for building a machine learning model from labeled data. The existing solutions for this problem include the algorithms briefly explained below ([1]).
Simple logistic regression is built by plotting the dataset and forming boundaries separating the different classes. It is inefficient when the data is linearly inseparable, and it is very sensitive to underfitting and overfitting. It uses a stage-wise fitting process.
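As a small illustration of the evaluation metric used for the bandit policies, cumulative mean reward is simply the running average of the rewards collected up to each round (a sketch, not the paper's code; the reward stream is made up):

```python
# Cumulative mean reward: the running average of rewards obtained
# up to each round, used here instead of accumulated regret.
def cumulative_mean_reward(rewards):
    means, total = [], 0
    for t, r in enumerate(rewards, start=1):
        total += r
        means.append(total / t)
    return means

# Toy reward stream: 0/1 indicates whether the chosen label was correct.
print(cumulative_mean_reward([0, 1, 1, 0]))  # → [0.0, 0.5, 0.6666666666666666, 0.5]
```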
88 IJCSNS International Journal of Computer Science and Network Security, VOL.20 No.7, July 2020
Random Forest classifier is a collection of multiple independent decision trees. The separate decision trees are formed by taking different starting nodes; the initial nodes are selected based on the GINI index and other criteria, and the individual trees are built independently of each other. When an unknown data item is given to the model, the individual outputs of the decision trees are sent to an optimiser which finds the most favorable class label and gives it as the output. Because the output of multiple trees is considered, the accuracy of the model is expected to be high and to resist underfitting and overfitting problems ([7]). This algorithm is very robust and handles highly imbalanced classes very effectively.
Naive Bayes classifier assumes that the features of the data items are independent of each other, so hidden correlations between the features are not addressed effectively. A naive Bayes classifier is trained in the supervised learning setting depending on the probability model. The accuracy of the output can be highly dependent on the supervised learning settings, and the method fails to find patterns and dependencies among features.
K-Nearest Neighbors (KNN) is a standard method that has been extended to large-scale data mining efforts. The idea is that one uses a large amount of training data, where each data point is characterized by a set of variables. Each point is plotted in a high-dimensional space, where each axis corresponds to an individual variable. KNN has the advantage of being nonparametric; that is, the method can be used even when the variables are categorical.
Reinforcement learning can also be applied to classification problems, depending on the model settings. Multi-armed bandits with covariates are known as contextual bandits. The main difference is that contextual bandits have side information at each iteration which can be used for arm selection, and rewards also depend on the covariates.
The problem is very similar to multi-class or multi-label classification (with the reward being whether the right label was chosen or not), but with the big difference that the right set of labels is not known for each observation, only whether the label that was chosen by the agent for each observation was correct or not.
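The partial-feedback setting described above can be sketched as a simulation over a multi-label stream, where only the chosen arm's correctness is revealed. The stream, the number of arms and the baseline policy below are toy assumptions; in the paper the arms would be the startup statuses and the contexts Crunchbase-derived features.

```python
import random

random.seed(0)

N_ARMS = 4  # e.g. four statuses used as classes/arms

# Toy stream of (context, set-of-correct-labels) pairs.
stream = [([random.random() for _ in range(3)], {random.randrange(N_ARMS)})
          for _ in range(100)]

def run(policy):
    """Feed observations in rounds; reveal only the chosen arm's reward."""
    total = 0
    for context, true_labels in stream:
        arm = policy(context)
        total += 1 if arm in true_labels else 0  # partial feedback only
    return total

def random_policy(context):
    return random.randrange(N_ARMS)  # uniform-random baseline

print(run(random_policy))  # cumulative reward of the random baseline
```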
The simpler multi-armed bandits scenario has been extensively studied, and many good solutions have been proposed which enjoy theoretical limits on their regret ([8]), as well as demonstrated performance in empirical tests ([23]). Upper confidence bounds is one of the best solutions, with theoretical guarantees and good empirical performance (the bound gets closer to the observed mean as more observations are accumulated, thereby balancing exploration and exploitation), as is Thompson sampling, which takes a Bayesian perspective and chooses an arm according to its probability of being the best arm. Epsilon-Greedy algorithms and their variations are typical comparison baselines; the idea is to select the empirically best action, or a random action with some probability.
Contextual bandits have been studied in different variations - as bandits with "expert advice", with rewards assumed to be continuous (usually in the range [0,1]) and the reward-generating functions linear ([10], [17]). Approaches taking a supervised learning algorithm as an oracle, for a setting similar to the one presented here but with continuous rewards, have been studied before ([10], [4]); in these, the oracles are fit to the covariates and rewards of each arm separately, and the same strategies from multi-armed bandits have also resulted in good strategies in this setting. Other related problems, such as building an optimal oracle or policy with data collected from a past policy, have also been studied ([3], [9], [2]), but this work only focuses on online policies that start from scratch and continue ad infinitum.
All algorithms were benchmarked and compared to the simpler baselines by simulating contextual bandits scenarios using multi-label classification datasets, where the arms become the classes and the rewards are whether the chosen label for a given observation was true or not. Observations were fed in rounds, each algorithm made its own choice, the same context was presented to all at the same time, and whether the chosen label was correct or not was revealed to each one.
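The baseline policies recalled above, epsilon-greedy and upper confidence bounds, can be sketched on toy Bernoulli arms as follows; the arm probabilities and hyperparameters are illustrative assumptions, not values from the paper.

```python
import math
import random

random.seed(1)
TRUE_PROBS = [0.2, 0.5, 0.7]  # hidden Bernoulli reward rate per arm

def simulate(select, rounds=2000):
    """Run a selection rule for a number of rounds; return cumulative reward."""
    counts = [0] * len(TRUE_PROBS)   # pulls per arm
    sums = [0.0] * len(TRUE_PROBS)   # total reward per arm
    total = 0
    for t in range(1, rounds + 1):
        arm = select(t, counts, sums)
        reward = 1 if random.random() < TRUE_PROBS[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

def epsilon_greedy(t, counts, sums, eps=0.1):
    # Explore with probability eps, otherwise play the empirical best arm.
    if random.random() < eps or 0 in counts:
        return random.randrange(len(counts))
    means = [s / c for s, c in zip(sums, counts)]
    return means.index(max(means))

def ucb1(t, counts, sums):
    # Play each arm once, then pick the arm with the highest upper bound;
    # the bound shrinks toward the empirical mean as counts grow.
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    bounds = [s / c + math.sqrt(2 * math.log(t) / c)
              for s, c in zip(sums, counts)]
    return bounds.index(max(bounds))

print(simulate(epsilon_greedy), simulate(ucb1))
```

Both policies should concentrate on the 0.7 arm, so their cumulative reward stays well above the uniform-random expectation of about 0.47 per round.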
3. Algorithms
Most supervised learning algorithms used as oracles cannot fit to data with only one value or one label (e.g. only observations which had no reward), and typical domains of interest involve a scenario in which the non-zero reward rate for any arm is rather small regardless of the covariates (e.g. clicks). In some settings this problem can be solved by incorporating a smoothing criterion, and it is possible to think of a similar application for the scenario proposed in this work if the classifier is able to output probabilities.
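One plausible form of such a smoothing criterion (an assumption on our part, not necessarily the paper's exact formula) is to shrink the oracle's predicted probability toward a prior, so that arms with little or no observed data still receive usable, non-degenerate estimates:

```python
def smoothed_probability(p_hat, n_obs, a=1.0, b=2.0):
    """Blend the oracle's prediction p_hat with a prior mean a/b.

    With no observations the estimate is a/b; as n_obs grows the
    estimate converges to p_hat. a and b are assumed prior constants.
    """
    return (p_hat * n_obs + a) / (n_obs + b)

print(smoothed_probability(0.0, 0))     # no data -> prior mean 0.5
print(smoothed_probability(0.9, 1000))  # plenty of data -> close to 0.9
```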
Thus a natural adaptation of the upper confidence bound
strategy is as follows:
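Since the algorithm listing itself is not reproduced here, the following is a hedged sketch of one such adaptation: fit several copies of an oracle on bootstrap resamples of each arm's history and use a high percentile of their predictions as that arm's upper bound. The empirical-mean "oracle" below is a stand-in for a real probability-outputting classifier, and the defaults are our assumptions.

```python
import random

random.seed(2)

def bootstrap_upper_bound(rewards, n_resamples=10, percentile=0.8):
    """Upper bound on an arm's reward from bootstrap resamples of its
    history; an empty history gets an optimistic default to force
    early exploration of that arm."""
    if not rewards:
        return 1.0
    estimates = []
    for _ in range(n_resamples):
        sample = [random.choice(rewards) for _ in rewards]
        estimates.append(sum(sample) / len(sample))  # oracle stand-in
    estimates.sort()
    idx = min(int(percentile * n_resamples), n_resamples - 1)
    return estimates[idx]

def ucb_choose(histories):
    """Play the arm whose bootstrapped upper bound is highest."""
    bounds = [bootstrap_upper_bound(h) for h in histories]
    return bounds.index(max(bounds))

print(ucb_choose([[0, 0, 0], [0], []]))  # → 2 (unexplored arm is optimistic)
```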
Adaptive-Greedy ([17]) uses a random selection criterion; it doesn't require multiple oracles per arm and thus shows good performance:
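Again in place of the missing listing, here is a minimal sketch of an Adaptive-Greedy-style rule (details such as the decay rate are our assumptions): play the arm with the highest estimated reward when that estimate clears a threshold z, otherwise explore at random, and decay z so exploration fades over time.

```python
import random

random.seed(3)

def adaptive_greedy_choose(estimates, z, decay=0.997):
    """Return (chosen arm, updated threshold)."""
    best = max(range(len(estimates)), key=lambda a: estimates[a])
    if estimates[best] >= z:
        arm = best                              # exploit the best arm
    else:
        arm = random.randrange(len(estimates))  # random exploration
    return arm, z * decay                       # threshold decays each round

arm, z = adaptive_greedy_choose([0.1, 0.6, 0.3], z=0.5)
print(arm)  # → 1 (best estimate 0.6 clears the threshold 0.5)
```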
The choice of threshold z is problematic, though it might be a better idea to keep a moving average window of the last m highest estimated rewards of the best arm (Algorithm 9). This moving window in turn might also be replaced with a non-moving window, i.e. compute the average over the first m observations, but don't update it until m more rounds have passed; then at time 2m update only with the observations that were between m and 2m.
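The non-moving window variant described above can be sketched as follows (a toy illustration; m and the estimates are made up):

```python
class BlockWindowThreshold:
    """Average the first m estimated rewards to set the threshold, then
    refresh it only once every m rounds from the block collected in
    between (times m, 2m, 3m, ...)."""
    def __init__(self, m):
        self.m = m
        self.block = []
        self.threshold = None  # undefined until m observations arrive

    def update(self, estimated_reward):
        self.block.append(estimated_reward)
        if len(self.block) == self.m:          # end of the current block
            self.threshold = sum(self.block) / self.m
            self.block = []                    # start the next block
        return self.threshold

w = BlockWindowThreshold(m=3)
for r in [0.2, 0.4, 0.6, 0.9, 0.9, 0.9]:
    t = w.update(r)
# after 2m rounds the threshold reflects only the second block (about 0.9)
```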
4. Empirical Evaluation
Country | № of startups
USA     | 18162
GBR     | 1889
CAN     | 923
CHN     | 917
ISR     | 555
IND     | 864
DEU     | 509
FRA     | 734
CHE     | 178
RUS     | 234
For each startup, we retrieved its status (operating, ipo, acquired, closed), country code, funding rounds, funding round type (venture, angel, seed, private equity), category (from mapping.csv) and sum of total funding. Overall, our dataset consists of 24965 companies. As can be seen from Figure 3, the dataset is imbalanced; to deal with this we used an algorithm which combines over-sampling with under-sampling: SMOTEENN, which combines SMOTE (synthetic over-sampling of the minority classes) with Edited Nearest Neighbours, used to pare down and clean the majority cases. The startup distribution by status after preprocessing can be seen in Figure 4.
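The paper uses SMOTEENN from the imbalanced-learn library; as a self-contained illustration only, here is a toy SMOTE-style over-sampling step (interpolating between a minority point and one of its nearest minority neighbours). The ENN cleaning step and the real library's API are omitted, and the data is made up.

```python
import random

random.seed(4)

def smote_like(minority, n_new, k=2):
    """Generate n_new synthetic points by interpolating between a random
    minority point and one of its k nearest minority neighbours."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        p = random.choice(minority)
        neighbours = sorted((q for q in minority if q is not p),
                            key=lambda q: dist(p, q))[:k]
        q = random.choice(neighbours)
        gap = random.random()  # move a random fraction of the way to q
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(p, q)))
    return synthetic

ipo = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2)]  # toy minority-class points
print(smote_like(ipo, n_new=2))  # two synthetic points near the originals
```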
Fig. 3 Startup distribution by status before preprocessing (bar values: 18,706; 3,678; 1,946; 635 over IPO, Operating, Acquired, Closed)

Fig. 4 Startup distribution by status after preprocessing (bar values: 8,361; 4,784; 4,973; 4,236 over IPO, Operating, Acquired, Closed)
6. Results

The variance in prediction precision indicates that we have an unbalanced dataset. While kNN gives 0.90 precision for «ipo», a class that is represented much less than the others, logistic regression is not able to identify this class at all.

[Fig. 5 - per-class metric bars for LR, RF, kNN, GBM and NB over ipo, operating, acquired, closed]

Fig. 6 Recall for supervised startup predictions

From Figures 5-7 the evaluation metrics can be seen, on average, across all the classification methods. It can be seen that gradient boosting and kNN are the best performers, whereas logistic regression and naive Bayes fail.

[Fig. 7 - per-class metric bars for LR, RF, kNN, GBM and NB over ipo, operating, acquired, closed]

Fig. 8 Accuracy for supervised startup predictions

Figure 8 clearly shows that logistic regression and naive Bayes, which are very successful on binary classification tasks, fail to define classes in the multi-labeled dataset. Gradient boosting is slightly worse compared to kNN, but shows higher speed performance.
In many cases the empirical evaluation of adapted multi-armed bandits policies showed better results compared to the simpler baselines. A further comparison (Figure 9) with similar works meant for the regression setting was not feasible due to the lack of scalability of the other algorithms.
Just like in MAB, the upper confidence bound approach proved to be a reasonably good strategy throughout all datasets despite the small number of resamples used, having fewer hyperparameters to tune. Enhancing it by incorporating active learning heuristics did not seem to have much of an effect, and it seems that setting a given initial threshold provides better results compared to setting the threshold as a moving percentile of the predictions.
reinforcement learning domains. Of course, there are still many questions to be solved. This work proposed adaptations for the MAB setting and variations of the contextual bandits setting by using supervised learning algorithms; benchmarking was performed using the Crunchbase dataset. Our empirical results suggest that more traditional methods such as GBM can be used successfully. However, contextual bandits have a strong point in the case of big datasets: the algorithm itself allows division into threads and calculation on parallel kernels, thus reducing time and processing costs.

References
[1] A. Agrawal, P. D. Deshpande, A. Cecen, G. P. Basavarsu, A. N. Choudhary, and S. R. Kalidindi. Exploration of data science techniques used to predict the strength of steel. Integrating Materials and Manufacturing Innovation, 3(8):1–19, 2014.
[2] Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert Schapire. Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, pages 1638–1646, 2014.
[3] T. M. Begley and W.-L. Tan. The socio-cultural environment for entrepreneurship: A comparison between East Asian and Anglo-Saxon countries. Journal of International Business Studies, pages 537–553, 2001.
[4] Alina Beygelzimer and John Langford. The offset tree for learning with partial labels. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 129–138. ACM, 2009.
[5] J. Brüderl, P. Preisendörfer, and R. Ziegler. Survival chances of newly founded business organizations. American Sociological Review, pages 227–242, 1992.
[6] Alberto Bietti, Alekh Agarwal, and John Langford. A contextual bandit bake-off. 2018.
[7] L. Breiman. Random forests. Machine Learning, 45(1):5–32, Oct. 2001.
[8] Giuseppe Burtini, Jason Loeppky, and Ramon Lawrence. A survey of online experiment design with the stochastic multi-armed bandit. arXiv preprint arXiv:1510.00757, 2015.
[9] Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, and Eli Upfal. Mortal multi-armed bandits. In Advances in Neural Information Processing Systems, pages 273–280, 2009.
[10] Wei Chu, Lihong Li, Lev Reyzin, and Robert Schapire. Contextual bandits with linear payoff functions. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011.
[11] R. Dickinson. Business failure rate. American Journal of Small Business, 6(2):17–25, 1981.
[12] Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li, et al. Doubly robust policy evaluation and optimization. Statistical Science, 29(4):485–511, 2014.
[13] Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, and Robert E. Schapire. Practical contextual bandits with regression oracles. arXiv preprint arXiv:1803.01088, 2018.
[14] W. B. Gartner. "Who is an entrepreneur?" is the wrong question. American Journal of Small Business, 12(4):11–32, 1988.
[15] M. D. Greenberg, B. Pardo, K. Hariharan, and E. Gerber. Crowdfunding support tools: predicting success & failure. In CHI'13 Extended Abstracts on Human Factors in Computing Systems, pages 1815–1820. ACM, 2013.
[16] P. G. Greene, M. M. Hart, E. J. Gatewood, C. G. Brush, and N. M. Carter. Women entrepreneurs: Moving front and center: An overview of research and theory. Coleman White Paper Series, 3:1–47, 2003.
[17] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.
[18] D. C. McClelland. Characteristics of successful entrepreneurs. The Journal of Creative Behavior, 21(3):219–233, 1987.
[19] R. K. Mitchell, J. Mitchell, and J. B. Smith. Failing to succeed: new venture failure as a moderator of startup experience and startup expertise. Frontiers of Entrepreneurship Research, 2004.
[20] A. Rauch and M. Frese. Let's put the person back into entrepreneurship research: A meta-analysis on the relationship between business owners' personality traits, business creation, and success. European Journal of Work and Organizational Psychology, 16(4):353–385, 2007.
[21] R. Stuart and P. A. Abetti. Start-up ventures: Towards the prediction of initial success. Journal of Business Venturing, 2(3):215–230, 1987.
[22] M. Van Gelderen, R. Thurik, and N. Bosma. Success and risk factors in the pre-startup phase. Small Business Economics, 24(4):365–380, 2005.
[23] Joannes Vermorel and Mehryar Mohri. Multi-armed bandit algorithms and empirical evaluation. In European Conference on Machine Learning, pages 437–448. Springer, 2005.