Team Selection using Random Forest Algorithm
All content following this page was uploaded by Chellapilla Deep Prakash on 07 February 2018.
The use of analytical methods is very useful in cricket. Batting, bowling and fielding are the three main departments of the game. There is a huge demand for cricket related statistical studies because of the popularity of the game and the staggering amounts of money involved. These statistics give a clear picture of the performance of various players. Followers of the game, especially in India, are also keen followers of its statistics.

Some studies related to cricket reported in the literature are as follows. Optimal batting strategies using dynamic

The rest of the paper is organized as follows. In section 2, the statistics of previous IPL matches are examined to find the changing scenarios and trends. In section 3, the relative importance of the factors that define batting and bowling performances is determined using a machine learning based approach and a composite performance index is defined. The top batsmen and bowlers are identified according to these indices. In section 4, a heuristic that attempts to maximize the batting and bowling performance of the team for selecting the playing eleven is proposed. Some conclusions and insights from this analysis are presented in section 5.
International Journal of Computer Applications (0975 – 8887)
Volume 139 – No.12, April 2016
2. CHANGING SCENARIOS AND TRENDS IN IPL CRICKET
Tables 1 to 4 show the average number of runs scored and wickets fallen in head to head clashes of each IPL team against every other team. If the statistics of the seventh and eighth seasons are compared, it is evident that in the eighth season both the batsmen and the bowlers performed better than in the seventh season, as the average number of runs scored per innings has increased and so has the number of wickets that have fallen. This reflects that both batsmen and bowlers are evolving with more exposure to T20 cricket and are understanding the game better. The same trend is going to continue and more mature batting and bowling performances are expected to be seen in the ninth season.

This implies that the batting and bowling attributes and their relative importance would also change as the cricketers evolve. The same set of attributes with the same relative importance cannot continue to be used for performance evaluation of batsmen and bowlers in T20 cricket.

3. PERFORMANCE INDICES AND RANKING OF BATSMEN AND BOWLERS IN IPL 9
The first step is to identify the factors and their weightages for creating a Performance Index for ranking the batting and bowling performances. Deep Prakash et al. [16] develop a methodology called Deep Performance Index (DPI) in which five parameters for batsmen and five parameters for bowlers are identified for ranking the performances up to season VII. Deep Prakash et al. [17] present a category based Deep Performance Index for ranking players in different categories. This approach is extended to calculate the attributes for each category of players and the corresponding DPI for each player in this category for season IX. The details are as follows.

To assess batting performance five indices are identified as given in Table 5. Similarly, to assess bowling performance five indices are identified as given in Table 6. The performance data for all the cricketers in IPL 9 in the previous IPLs and their performance data in all T20 matches are collected. Their MVPI (Most Valuable Player Index) values are also computed. The Recursive Feature Elimination algorithm is then utilized to get the important features among the 10 features and their weights reflecting their relative importance.

Recursive Feature Elimination using the Random Forests Algorithm [18] works as follows. In addition to constructing each tree using a different bootstrap sample of data, random forests change how the classification or regression trees are constructed. In standard trees, each node is split using the best split among all variables. In a random forest, each node is split using the best among a subset of predictors randomly chosen at that node. This strategy has been shown to perform better than many other classifiers, including discriminant analysis, support vector machines and neural networks, and is robust against overfitting [18]. It is very user-friendly because it has only two parameters (the number of variables in the random subset at each node and the number of trees in the forest) and is usually not very sensitive to their values.

The algorithm performs Recursive Feature Elimination (RFE). In this approach, the algorithm fits the model to all predictors, which are the indices in the current work. Each predictor is ranked according to its importance to the model. At each iteration of feature selection, the N top ranked predictors are retained, the model is refit and performance is assessed. The value of N with the best performance is determined and the top N predictors are used to fit the final model. The predictor rankings are recomputed on the model on the reduced feature set. Resampling methods (e.g. cross-validation, the bootstrap) are used to reduce variability caused by feature selection when calculating performance. The steps in the algorithm are encapsulated inside an outer layer of 10-fold cross-validation to ensure better robustness of results and provide better estimates of performance. However, this makes the algorithm compute intensive. A consensus ranking is used to finally determine the best predictors to retain.

The features obtained in this manner and their corresponding weights are shown in Table 8 (batting) and Table 9 (bowling). The features and their weightages are different for different categories and are in accordance with the requirements of their respective roles. This supports the efficacy of the proposed approach.

A careful observation of the features of the various categories highlights the fact that consistent players are preferred no matter which category they belong to. A typical T20 approach can be seen in these features. For an opener, fast scoring and hard hitting capability are the prominent features, as the role of openers is to maintain the strike rate as well as to take full advantage of the batting power play. For middle order batsmen, running between the wickets and fast scoring are the prominent features. This is due to the fact that the team wants to save wickets during the middle overs and also has to maintain the strike rate. For finishers, hard hitting and fast scoring capabilities are prominent, in line with their desired role in the last few overs. For inexperienced batsmen, fast scoring and running are important, as most of the teams will only consider them in the lower middle order and not risk them at the top of the order.

Among bowlers, the most prominent feature that emerges is short performance capability. Captains have now understood how to utilize their key bowlers in short spells, so a bowler who takes one or two wickets in a spell is a very good asset for his captain. Most of the pacers are used during the initial power play overs or the late death overs, which is the reason why their economy and wicket taking ability come out as prominent. For spinners, consistency is very important as they have to maintain their economy and take a couple of wickets as well. Pace allrounders are usually preferred by most of the teams and this season there is a dominance of these players. Performance statistics show that, on their day, they have the capability to take 4 or 5 wickets, which is very important for the team. Spin all-rounders are usually preferred when the key bowlers are not striking and the opponent team is scoring at a very fast rate, so they have to act as partnership breakers. Inexperienced bowlers have all the capabilities, otherwise it is very unlikely that they will get a chance to play.

Using these weightages and the clustering, the category based DPI has been calculated for each player. The top batsmen and bowlers in each category are shown in Tables 10 and 11.

4. HEURISTIC FOR SELECTING PLAYING ELEVEN FOR A MATCH
When a team of around 20-28 players is bought, the next question is to choose the best 11 for a match who fit into the needs of the team. Every team will have some strategy in mind before selecting their playing eleven. Selecting playing eleven by using the Ranking Methodology (DPI) and player
clustering methods described in detail in the previous sections can be applied with some heuristics to obtain the best playing eleven for a team.

Some constraints need to be kept in mind when selecting the playing 11.

- 1 Captain should be there in the team (playing 11)
- 1 Wicket Keeper should be there in the team (playing 11)
- 2 Openers should be there in the team (Position 1 and 2)
- 3 Middle Order Players should be there in the team (Position 3, 4, 5)
- 2 Finishers should be there in the team (Position 6 and 7)
- At least 1 Spinner should be there in the team
- At least 2 Pacers should be there in the team
- The next best in ranking among available spinners and pacers would complete the playing 11.

Additional considerations are as follows.

- If the ranking of an inexperienced batsman is > 0.5 and the next available highest ranking experienced batsman is < 0.5, then the inexperienced batsman will be taken into the playing 11.
- If the ranking of an inexperienced bowler is > 0.5 and the next available highest ranking experienced bowler is < 0.5, then the inexperienced bowler will be taken into the playing 11.
- A team should not have more than 4 foreign players.
- When there are more than 4 foreign players in the team, the difference between the foreign player's rank and an Indian player's rank is compared across clusters: the foreign player in the cluster where the difference is large would be selected and the one where the difference is smaller would be dropped.

Using these constraints, the playing 11s of the teams are determined as given in Table 12 along with their batting and bowling DPIs. Overseas players are shown in bold.

The comparison of batting and bowling strengths of each team is shown in Figure 1.

5. CONCLUSIONS AND FUTURE WORK
The strategy for designing the performance indices for various categories of players and the heuristic for team selection have provided valuable insights into the team selection process, and the teams that have been formed appear to be the most reasonable given the choices available. Team DD could be the surprise package of this tournament without any big guns in their playing 11. GLR is a new team introduced in season IX and they still have to prove their worth. The team is led by Suresh Raina, who is one of the best Indian T20 batsmen in the middle order. KKR has almost the same team that played last year with only a couple of key changes which could prove decisive. Kings XI Punjab have formed a team which can prove to be the best. They have a well-balanced and potent bowling attack. Last season's winning team MI is promising this season too. However, with too many big guns in their team they have a problem of plenty and they have many options in choosing their playing 11. RCB is one of the most promising teams this season with the best top order batsmen in the world in their lineup. It is a team that can chase any score and also put big totals on the board, at least on paper. However, bowling appears to be the weak link in the side and could be the issue of concern. RPS is another new team introduced this season, led by MS Dhoni. The team composition looks like another CSK with exactly the same strategy and type of players. With the astute MS Dhoni leading from the front one can always presume that this team will be a handful once again. SRH is a team which has learnt from past mistakes and made a team which their fans hope can win the title. Overall there is no problem with either the batting or the bowling and the side is balanced.

In continuation of this work it is proposed to create a multi-objective optimization model using Genetic Algorithms for maximizing the batting and bowling strengths simultaneously within the constraints imposed by the team selection rules in IPL 9.

6. REFERENCES
[1] Indian Premier League, https://en.wikipedia.org/wiki/Indian_Premier_League
[2] Clarke, S R, “Dynamic programming in one day cricket - optimal scoring rates,” Journal of the Operational Research Society, 50, 1988, pp 536 – 545.
[3] Kimber, A C and Hansford, A R, “A Statistical Analysis of Batting in Cricket,” Journal of Royal Statistical Society, 156, 1993, pp 443 – 455.
[4] Damodaran, U, “Stochastic Dominance and Analysis of ODI Batting Performance: The Indian Cricket Team, 1989-2005,” Journal of Sports Science and Medicine, 5, 2006, pp 503 – 508.
[5] Barr, G. D. I., and Kantor, B.S., “A Criterion for Comparing and Selecting Batsmen in Limited Overs Cricket,” Journal of the Operational Research Society, 55, 2004, pp 1266-1274.
[6] Borooah, V. K., and Mangan, J E, “The 'Bradman Class': An Exploration of Some Issues in the Evaluation of Batsmen for Test Matches, 1877–2006,” Journal of Quantitative Analysis in Sports, 6 (3), Article 14, 2010.
[7] Norman, J and Clarke, S R, “Dynamic programming in cricket: Batting on sticky wicket,” Proceedings of the 7th Australasian Conference on Mathematics and Computers in Sport, 2004, pp 226 – 232.
[8] Ovens, M and Bukeit, B, “A mathematical modeling approach to one day cricket batting orders,” Journal of Sports Science and Medicine, 5, 2006, pp 495-502.
[9] Lewis, A., “Extending the Range of Player-Performance Measures in One-Day Cricket,” Journal of Operational Research Society, 59, 2008, pp 729-742.
[10] Van Staden, P., “Comparison of Cricketers' Bowling and Batting Performance using Graphical Displays,” Current Science, 96, 2009, pp 764-766.
[11] Lakkaraju, P., and Sethi, S., “Correlating the Analysis of Opinionated Texts Using SAS® Text Analytics with Application of Sabermetrics to Cricket Statistics,” Proceedings of SAS Global Forum 2012, 136-2012, pp 1-10.
[12] Lemmer, H., “A Measure for the Batting performance of Cricket Players,” South African Journal for Research in Sport, Physical Education and Recreation, 26, 2004, pp 55-64.
[13] Lemmer, H., “An Analysis of Players' Performances in the First Cricket Twenty20 World Cup Series,” South African Journal for Research in Sport, Physical Education and Recreation, 30, 2008, pp 71-77.
[14] Lemmer, H., “The Single Match Approach to Strike Rate Adjustments in Batting Performance Measures in Cricket,” Journal of Sports Science and Medicine, 10, 2012, pp 630-634.
[15] Saikia, Hemanta and Bhattacharjee Dibojyoti, “A Bayesian Classification Model for Predicting the Performance of All-Rounders in the Indian Premier League,” http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1622060.
[16] C. Deep Prakash, C. Patvardhan and Sushobhit Singh, “A new Machine Learning based Deep Performance Index for Ranking IPL T20 Cricketers,” International Journal of Computer Applications (0975 – 8887), Volume 137 – No.10, March 2016.
[17] C. Deep Prakash, C. Patvardhan and Sushobhit Singh, “A new Category based Deep Performance Index using Machine Learning for ranking IPL Cricketers,” International Journal of Electronics, Electrical and Computational System IJEECS, ISSN 2348-117X, Volume 5, Issue 2, February 2016.
[18] Leo Breiman, “Random forests,” Machine Learning, 45(1), 2001, pp 5–32.
7. APPENDIX
Table 1: Batting Averages of teams in season 7 against each other
Team   CSK     DD      KKR     KXIP    MI      RCB     RR      SRH
CSK    -       179     151     198     159.33  149     144.5   165
DD     131     -       163.5   139.5   142     157.5   145.5   161.5
KKR    135     163.5   -       155.5   152.5   172.5   156     153.5
KXIP   221     139.5   153.75  -       162     162.5   186     202
MI     157     149     131.5   164.5   -       151     186.5   158.5
RCB    148     166     156.5   145     142     -       130     159
RR     140.5   178.5   161     177     171     131     -       118.5
SRH    167     114     151     163     164.5   158     133.5   -
Short Performance Score = (Number of wickets taken – 4 × Number of times four wickets taken – 5 × Number of times five wickets taken) / (Number of innings played – Number of times four wickets or five wickets taken)
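The Short Performance Score above can be computed directly from a bowler's career counts. The following is a minimal sketch, assuming that "number of times four wickets or five wickets taken" means the sum of the two haul counts; the function name, argument names and the example bowler are hypothetical and are not taken from the paper.

```python
def short_performance_score(wickets, innings, four_hauls, five_hauls):
    """Average wickets per innings, excluding innings with 4- or 5-wicket hauls.

    Assumption: the denominator removes one innings for every four-wicket
    haul and one for every five-wicket haul.
    """
    remaining_innings = innings - (four_hauls + five_hauls)
    if remaining_innings <= 0:
        raise ValueError("no innings left after removing big-haul innings")
    remaining_wickets = wickets - 4 * four_hauls - 5 * five_hauls
    return remaining_wickets / remaining_innings


# Hypothetical bowler: 30 wickets in 20 innings, with one four-wicket
# haul and one five-wicket haul.
score = short_performance_score(wickets=30, innings=20, four_hauls=1, five_hauls=1)
print(round(score, 4))
```

The guard on the denominator is a design choice of this sketch: a bowler whose every innings was a big haul has no "short performance" innings left to average over.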
Table 8: Selected Batting Indices for various Categories and their calculated weightages for season 9
Sl.No  Category       Indices and Corresponding Weights
1      Opener         T20_Consistency (0.2638), IPL_Consistency (0.2027), T20_FScore (0.1865), IPL_FastScorer (0.1856), IPL_HHScore (0.1612)
2      Middle Order   T20_Consistency (0.3389), IPL_Consistency (0.2923), T20_RBWIndex (0.2040), IPL_FastScorer (0.1646)
3      Finisher       IPL_Consistency (0.6756), T20_Consistency (0.2204), IPL_HHScore (0.0752), IPL_FastScorer (0.0286)
4      Inexperienced  T20_Consistency (0.4627), T20_RBWIndex (0.3326), IPL_FastScorer (0.0929), T20_FScore (0.0540), IPL_RBWIndex (0.0234)
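The weights in Table 8 sum to approximately 1 within each category, which suggests that a category based DPI can be read as a weighted sum of index values. The sketch below illustrates this under that assumption; the section itself does not spell out the combination rule (it defers to [16] and [17]), and the function name and the example index values are hypothetical. Only the opener weights are taken from Table 8.

```python
# Opener weights as listed in Table 8.
OPENER_WEIGHTS = {
    "T20_Consistency": 0.2638,
    "IPL_Consistency": 0.2027,
    "T20_FScore": 0.1865,
    "IPL_FastScorer": 0.1856,
    "IPL_HHScore": 0.1612,
}


def category_dpi(index_values, weights):
    """Weighted sum of index values (assumed to be scaled to [0, 1])."""
    missing = set(weights) - set(index_values)
    if missing:
        raise KeyError(f"missing index values: {sorted(missing)}")
    return sum(weights[name] * index_values[name] for name in weights)


# Hypothetical opener with made-up, already normalised index values.
example_opener = {
    "T20_Consistency": 0.80,
    "IPL_Consistency": 0.75,
    "T20_FScore": 0.60,
    "IPL_FastScorer": 0.65,
    "IPL_HHScore": 0.90,
}
dpi = category_dpi(example_opener, OPENER_WEIGHTS)
print(round(dpi, 4))
```

Because the weights sum to about 1, a player who scores 1.0 on every index would receive a DPI of about 1, which matches the range of the DPI values reported in Tables 10 and 11.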
Table 9: Selected Bowling Indices for various Categories and their calculated weightages for season 9
Sl.No  Category         Indices and Corresponding Weights
1      Pacer            IPL_ShortPerformance (0.266), IPL_Economy (0.237), IPL_Consistency (0.2069), IPL_WicketTaker (0.1746), IPL_BigWicketTaking (0.1144)
2      Spinner          IPL_ShortPerformance (0.311), IPL_Consistency (0.211), IPL_WicketTaker (0.189), IPL_Economy, IPL_BigWicketTaking (0.126)
3      Pace Allrounder  T20_ShortPerformance (0.2938), IPL_BigWicketTaking (0.2544), T20_Consistency (0.2367), IPL_ShortPerformance (0.2153)
4      Spin Allrounder  IPL_ShortPerformance (0.4902), T20_ShortPerformance (0.2487), T20_Economy (0.2405), T20_Consistency (0.020)
5      Inexperienced    T20_WicketTaker (0.3413), T20_Consistency (0.2751), T20_ShortPerformance (0.2704), T20_Economy (0.0925)
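Section 3 attributes the selected indices and weights in Tables 8 and 9 to recursive feature elimination driven by random forest importances. The sketch below shows only the backward elimination loop, using the Python standard library. It makes several loudly stated assumptions: the importance measure is a stand-in (absolute Pearson correlation with the target) rather than a random forest, the number of features to keep is fixed in advance instead of being chosen by resampled performance, the outer 10-fold cross-validation and consensus ranking are omitted, and the final weights are simply the surviving importances rescaled to sum to 1. All names and the synthetic data are hypothetical.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def stand_in_importance(rows, target, features):
    """Stand-in for random forest importance (NOT the paper's measure)."""
    return {f: abs_correlation([row[f] for row in rows], target) for f in features}


def recursive_feature_elimination(rows, target, features, n_keep,
                                  importance_fn=stand_in_importance):
    """Drop the least important feature until n_keep features remain."""
    remaining = list(features)
    while len(remaining) > n_keep:
        # Re-score on the reduced set. With a random forest the scores
        # change after each refit; with this stand-in they do not.
        scores = importance_fn(rows, target, remaining)
        remaining.remove(min(remaining, key=scores.get))
    scores = importance_fn(rows, target, remaining)
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all remaining features have zero importance")
    # Assumption: weights are the surviving importances rescaled to sum to 1.
    return {f: scores[f] / total for f in remaining}


# Synthetic data: two informative indices and two pure-noise indices.
rng = random.Random(0)
FEATURES = ["consistency", "fast_scorer", "noise_a", "noise_b"]
rows = [{f: rng.random() for f in FEATURES} for _ in range(300)]
target = [0.7 * r["consistency"] + 0.3 * r["fast_scorer"] + rng.gauss(0, 0.02)
          for r in rows]

weights = recursive_feature_elimination(rows, target, FEATURES, n_keep=2)
print(weights)
```

To move closer to the procedure described in Section 3, `importance_fn` could be replaced by a function that fits a random forest on the remaining features and returns its importances, for example the `feature_importances_` attribute of scikit-learn's `RandomForestRegressor`; that substitution is not shown here.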
Table 10: Top ten Batsmen in each category in season 9 and their corresponding DPI
Sl.No  Category       Batsmen and Corresponding DPI
1      Opener         C.Gayle [0.984], S.Marsh [0.886], S.Watson [0.802], D.Warner [0.794], B.McCullum [0.76], Q.De Kock [0.732], L.Simmons [0.682], A.Finch [0.654], R.Uthappa [0.616], S.Dhawan [0.542]
2      Middle Order   D.Miller [0.947], MS Dhoni [0.896], K.Pollard [0.843], S.Raina [0.839], AB De Villiers [0.827], K.Peterson [0.818], JP.Duminy [0.806], R.Sharma [0.737], Y.Pathan [0.726], V.Kohli [0.7151]
3      Finisher       A.Russell [0.959], A.Morkel [0.923], J.Faulkner [0.807], I.Pathan [0.801], R.Jadeja [0.798], M.Marsh [0.763], S.Binny [0.675], T.Perera [0.606], A.Reddy [0.546], GS.Mann [0.534]
4      Inexperienced  N.Rana [0.888], K.Pandya [0.8], S.Khan [0.752], J.Sharma [0.728], B.Aparajit [0.642], D.Hooda [0.587], U.Sharma [0.582], D.Punia [0.570], A.Nath [0.505], P.Sahu [0.496]
Table 11: Top ten Bowlers in each category in season 9 and their corresponding DPI
Sl.No  Category         Bowler and Corresponding DPI
1      Pacer            L.Malinga [0.986], M.Starc [0.920], N.Coulter-Nile [0.911], S.Sharma [0.90], M.Sharma [0.849], M.McClenaghan [0.842], A.Nehra [0.838], B.Kumar [0.794], M.Johnson [0.793], RP.Singh [0.769]
2      Spinner          S.Narine [0.925], I.Tahir [0.904], B.Hogg [0.872], A.Mishra [0.841], Y.Chahal [0.809], R.Ashwin [0.790], A.Patel [0.740], I.Abdulla [0.741], S.Jakati [0.682], P.Tambe [0.640]
3      Pace Allrounder  C.Morris [0.908], J.Faulkner [0.905], D.Wiese [0.833], D.Bravo [0.830], R.Vinay Kumar [0.760], A.Reddy [0.734], T.Perera [0.634], I.Pathan [0.668], A.Morkel [0.633], K.Pollard [0.626]
4      Spin Allrounder  S.A.Hasan [0.985], H.Singh [0.931], P.Chawala [0.846], K.Sharma [0.811], P.Negi [0.698], Y.Singh [0.640], R.Jadeja [0.633], Y.Pathan [0.609], F.Du Plesis [0.582], JP.Duminy [0.499]
5      Inexperienced    M.Ashwin [0.936], M.Stoinis [0.885], KC Cariappa [0.859], J.Shah [0.853], M.Singh [0.839], B.Aparajit [0.704], S.Gopal [0.688], A.Nath [0.684], D.Punia [0.626], S.Lad [0.568]
Table 12: Playing eleven of each team according to the heuristics approach
Team                         Player [Batting, Bowling]
Delhi Daredevils             S.Iyer [0.5361,0], Q.De Kock [0.743,0], K.Nair [0.299,0], S.Samson [0.403,0], JP.Duminy [0.806,0.499], C.Morris [0.216,0.908], P.Negi [0,0.6984], N.Coulter Nile [0,0.911], A.Mishra [0,0.841], M.Shami [0,0.439], Z.Khan [0,0.748]
Gujarat Lions Rajkot         B.McCullum [0.760,0], A.Finch [0.505,0.684], S.Raina [0.839,0.454], D.Karthik [0.346,0], D.Bravo [0.561,0.830], J.Faulkner [0.807,0.905], R.Jadeja [0.798,0.635], D.Kulkarni [0,0.769], P.Sangwan [0,0.570], P.Tambe [0,0.64], S.Jakati [0,0.6825]
Kolkata Knight Riders        R.Uthappa [0.616,0], G.Gambhir [0.449,0], M.Pandey [0.388,0.507], C.Lynn [0.506,0.306], Y.Pathan [0.7262,0.6099], A.Russell [0.959,0.321], Suryakumar Yadav [0.367,0.488], P.Chawala [0.403,0.846], M.Morkel [0,0.767], S.Narine [0,0.925], U.Yadav [0,0.584]
Kings XI Punjab              S.Marsh [0.886,0], M.Singh [0.459,0.839], D.Miller [0.947,0], W.Saha [0.531,0], G.Maxwell [0.574,0.235], GS Mann [0.534,0.383], R.Dhawan [0.377,0.339], M.Johnson [0,0.793], A.Patel [0,0.743], M.Sharma [0,0.849], S.Sharma [0,0.90]
Mumbai Indians               Parthiv Patel [0.217,0], R.Sharma [0.737,0.24], A.Rayudu [0.444,0], C.Anderson [0.594,0.195], J.Buttler [0.407,0], K.Pollard [0.843,0.626], H.Pandya [0.432,0.221], H.Singh [0.517,0.986], J.Bumrah [0,0.405], M.McClenaghan [0,0.848], J.Suchith [0,0.583]
Royal Challengers Bangalore  C.Gayle [0.984,0.433], S.Watson [0.802,0.580], V.Kohli [0.715,0.148], AB De Villiers [0.827,0], K.Jadhav [0.422,0], S.Binny [0.675,0.2591], S.Khan [0.752,0], M.Starc [0,0.92], Y.Chahal [0,0.809], H.Patel [0,0.676], V.Aron [0,0.634]
Rising Pune Supergiants      A.Rahane [0.503,0], F.Du Plesis [0.533,0.582], K.Peterson [0.818,0], S.Tiwary [0.609,0], MS.Dhoni [0.896,0], M.Marsh [0.763,0.475], A.Morkel [0.923,0.633], R.Ashwin [0,0.79], RP Singh [0,0.769], M.Ashwin [0,0.936], A.Dinda [0,0.606]
Sunrisers Hyderabad          S.Dhawan [0.542,0], D.Warner [0.794,0], K.Williamson [0.264,0.384], Y.Singh [0.604,0.640], M.Henriques [0.558,0.553], A.Tare [0.419,0], A.Reddy [0.546,0.734], K.Sharma [0.529,0.811], B.Kumar [0,0.794], A.Nehra [0,0.838], T.Boult [0,0.739]
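Table 12 is the output of the Section 4 heuristic. A simplified greedy version of that heuristic can be sketched as follows. It covers only the role quotas (2 openers, 3 middle order, 2 finishers, at least 1 spinner, at least 2 pacers, then the next best bowlers) and the cap of 4 foreign players. The captain and wicket keeper requirements, the rule for inexperienced players and the rank-difference rule for dropping foreign players are deliberately left out, and the overseas cap is applied in role order, which is an assumption of this sketch rather than the paper's rule. The class, the function and the squad are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    role: str          # "opener", "middle", "finisher", "spinner" or "pacer"
    dpi: float         # category based DPI used for ranking
    overseas: bool = False


ROLE_QUOTAS = [("opener", 2), ("middle", 3), ("finisher", 2),
               ("spinner", 1), ("pacer", 2)]
MAX_OVERSEAS = 4
TEAM_SIZE = 11


def pick_playing_eleven(squad):
    """Greedy selection: best DPI per role, respecting the overseas cap."""
    chosen = []

    def take_best(candidates, count):
        for player in sorted(candidates, key=lambda p: p.dpi, reverse=True):
            if count == 0:
                break
            if player in chosen:
                continue
            overseas_so_far = sum(p.overseas for p in chosen)
            if player.overseas and overseas_so_far >= MAX_OVERSEAS:
                continue
            chosen.append(player)
            count -= 1
        if count:
            raise ValueError("squad cannot satisfy the quotas")

    for role, quota in ROLE_QUOTAS:
        take_best([p for p in squad if p.role == role], quota)
    # Remaining places go to the next best spinners and pacers.
    bowlers = [p for p in squad if p.role in ("spinner", "pacer")]
    take_best(bowlers, TEAM_SIZE - len(chosen))
    return chosen


# Hypothetical squad with made-up DPI values.
squad = [
    Player("O1", "opener", 0.90, True), Player("O2", "opener", 0.80, True),
    Player("O3", "opener", 0.60),
    Player("M1", "middle", 0.95, True), Player("M2", "middle", 0.85),
    Player("M3", "middle", 0.80), Player("M4", "middle", 0.70, True),
    Player("F1", "finisher", 0.90, True), Player("F2", "finisher", 0.80),
    Player("F3", "finisher", 0.70, True),
    Player("S1", "spinner", 0.90, True), Player("S2", "spinner", 0.80),
    Player("P1", "pacer", 0.95, True), Player("P2", "pacer", 0.85),
    Player("P3", "pacer", 0.80), Player("P4", "pacer", 0.60),
]
eleven = pick_playing_eleven(squad)
print([p.name for p in eleven])
```

In this example the four overseas places are used up by the batsmen, so the highest ranked overseas bowlers are skipped. The paper's rank-difference rule is meant to handle exactly this trade-off more carefully than a fixed role order does.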
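Figure 1: Comparison of batting and bowling strengths of each team (horizontal axis: teams 1 to 8). Bar labels recovered from the plot: 5.5359, 5.493, 5.3539, 5.081, 5.177, 5.044, 5.045, 4.791, 4.616, 4.459, 4.308, 4.191, 4.104, 4.256, 3.688, 3.003.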
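The text does not state how the team strengths in Figure 1 are computed. Summing the batting and bowling DPIs of the Delhi Daredevils eleven in Table 12 gives 3.0031 and 5.0444, which agree with the bar labels 3.003 and 5.044, so the sketch below assumes that a team's strength is the plain sum of its players' DPIs. The function name is hypothetical; the player values are copied from Table 12.

```python
# (batting DPI, bowling DPI) for the Delhi Daredevils eleven in Table 12.
DELHI_DAREDEVILS = {
    "S.Iyer": (0.5361, 0.0),
    "Q.De Kock": (0.743, 0.0),
    "K.Nair": (0.299, 0.0),
    "S.Samson": (0.403, 0.0),
    "JP.Duminy": (0.806, 0.499),
    "C.Morris": (0.216, 0.908),
    "P.Negi": (0.0, 0.6984),
    "N.Coulter Nile": (0.0, 0.911),
    "A.Mishra": (0.0, 0.841),
    "M.Shami": (0.0, 0.439),
    "Z.Khan": (0.0, 0.748),
}


def team_strengths(eleven):
    """Return (batting strength, bowling strength) as sums of player DPIs."""
    batting = sum(bat for bat, _ in eleven.values())
    bowling = sum(bowl for _, bowl in eleven.values())
    return batting, bowling


batting, bowling = team_strengths(DELHI_DAREDEVILS)
print(round(batting, 4), round(bowling, 4))
```

The same check can be repeated for the other teams in Table 12 to match the remaining bar labels to a team and a discipline.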