Performance Analysis of A Cricketer by Data Visualization
https://doi.org/10.22214/ijraset.2022.40176
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue I Jan 2022- Available at www.ijraset.com
Abstract: The Indian Premier League is a very competitive tournament in which team selection is a tricky and tedious procedure. Analysing sports data and predicting each player's performance helps in filtering out the best players. In this research paper, a novel method employing Data Analytics and Data Visualization techniques is used to extract individual player performance from large statistics and datasets. An application is created to bridge the gap between the selection team, coaches, and team management, and to give a better interpretation of player consistency, scoring, and other capabilities. The pandas library is used as the data analysis and manipulation tool, Microsoft Azure is used for performance prediction, and HTML, CSS, and Flask are used for the front-end application. Additionally, various machine learning algorithms are applied to the same data to find the best fit. The proposed application can benefit team managements and support decision making.
Keywords: Indian Premier League, Data analytics, Data Visualization, Prediction of player’s Performance, Microsoft Azure.
I. INTRODUCTION
Sports analytics and Data Visualization have given team managers a strong platform for player selection and for boosting on-field performance. Decision making and analysis is the process of applying different algorithms to data to gain insights that help predict the future. Here, the data is passed through several algorithms, tools, and visualization techniques to suggest players for building the team, and various machine learning techniques are applied to build predictive models.
The Indian Premier League (IPL) was established in 2008. The league follows a round-robin group stage and knockout format, with teams based in major Indian cities. Each team management bids for up to 25 players, with at most 8 foreign players in the squad and only 4 in the playing XI. Finding the best squad for upcoming seasons is therefore difficult. In this paper, an application is introduced to evaluate the performance of players. The tool provides a visualization of players' performance and helps in predicting scores. The developed model can help decision makers during IPL matches to evaluate the strength of one team against another.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 1800
Batsmen with Top Strike Rate, Top 10 Players with Maximum Runs. The data is refined through modification and consolidation. The authors in [6] discussed how to analyse the performance of cricket players, and the findings of their study show that batting strength dominates bowling. Studies also show that the performance of bowlers is one of the most important factors in changing the status quo of a match. The authors in [7] described a player valuation model for the IPL auction, which considered factors such as a player's previous bid price, player information, and strike rate. Prakash, Patvardhan and Lakshmi [8] described batting and bowling indices to measure player performance in their model for predicting the results of IPL matches. A mathematical modelling approach to choosing batting orders for ODI games is presented in [9]. In [10], the authors proposed a two-stage model using Linear Regression and a Naïve Bayes classifier. The first stage predicts the first-innings score based on the current run rate and other factors; the second predicts the outcome of the match when the batting team is chasing a given target. The authors in [11] predicted the performance of IPL batsmen in the fourth season using data from the first three seasons, with a multilayer perceptron (MLP) neural network trained on past performance. In [12], the outcome of a match is predicted by comparing the strength of the two teams, measured from the performance of each player; the authors used algorithms to estimate the performance of batsmen and bowlers from past and recent performance data. The Combined Bowling Rate, used to analyse bowlers in [13], combines three traditional bowling measures: bowling average, strike rate, and economy rate.
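As we understand Lemmer's definition in [13], the Combined Bowling Rate is the harmonic mean of bowling average (runs per wicket), economy rate (runs per over) and strike rate (balls per wicket), so a lower value indicates a better bowler. The following is a minimal Python sketch assuming that harmonic-mean form; the function name and sample figures are hypothetical and are not taken from this paper.

```python
def combined_bowling_rate(runs: int, wickets: int, balls: int) -> float:
    """Combined Bowling Rate, assumed to be the harmonic mean of
    bowling average (R/W), economy rate (R per over) and strike rate (B/W).

    3 / (1/average + 1/economy + 1/strike_rate) simplifies to
    3R / (W + B/6 + W*R/B), which also stays defined when runs or wickets are 0.
    """
    if balls <= 0:
        raise ValueError("balls must be positive")
    overs = balls / 6
    return 3 * runs / (wickets + overs + wickets * runs / balls)


# Hypothetical spell: 30 runs conceded, 2 wickets, 24 balls (4 overs).
# Average 15, economy 7.5, strike rate 12.
cbr = combined_bowling_rate(runs=30, wickets=2, balls=24)
print(round(cbr, 2))  # 10.59
```

Using the simplified closed form avoids dividing by zero for a wicketless or runless spell, where the three separate measures would be undefined.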
III. IMPLEMENTATION
A. Tools and Methodology
The Indian Premier League has millions of fans across the world and is one of the largest cricket leagues played worldwide. Around 816 matches were played from 2008 to 2020, and a large amount of data covering the statistics of every match is available on the internet. Jupyter Notebook, an open-source application, and the Python language are used for data exploration, data extraction, and feature selection. Packages such as pandas and NumPy are used as data analysis and manipulation tools. The analysed data is visualized using amCharts. Player performance prediction is done using Microsoft Azure, and the front end is developed using Flask, a Python web framework, and designed using HTML and CSS.
B. Data Collection
This section describes the datasets selected for the project. The datasets were collected from www.kaggle.com and provide information on all the matches played by the teams from 2008 to 2020. Two datasets are used, namely Matches.csv and Ball-by-Ball.csv. The Matches.csv dataset lists information such as the match ID, the city in which the match was played, date, venue, player of the match, the two teams that took part, the toss winner and toss decision, the winner of the match, results, and the names of the umpires. The Ball-by-Ball dataset provides details such as the match ID, innings, over, the bowler who bowled each delivery, the batsmen on strike and at the non-striker's end, runs scored by the batsman, total runs scored, wickets taken, and the names of the batting and bowling teams.
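Because the two files share the match ID, per-delivery analyses that need match context (such as the venue) can join them. The following is a minimal pandas sketch, not the authors' code: the column names (`id`, `inning`, `over`, `batsman`, `batsman_runs`, `venue`, and so on) are assumptions about the Kaggle files, and the tiny inline CSV text stands in for the real data. With the actual files, `pd.read_csv("Matches.csv")` and `pd.read_csv("Ball-by-Ball.csv")` would replace the `io.StringIO` inputs.

```python
import io

import pandas as pd

# Tiny inline stand-ins for Matches.csv and Ball-by-Ball.csv.
# Column names are assumptions, not confirmed against the Kaggle files.
MATCHES_CSV = """id,city,date,venue,team1,team2,toss_winner,toss_decision,winner
1,City X,2008-04-18,Stadium A,Team A,Team B,Team B,field,Team A
2,City Y,2008-04-19,Stadium B,Team C,Team A,Team C,bat,Team C
"""
BALLS_CSV = """id,inning,over,ball,batsman,non_striker,bowler,batsman_runs,total_runs,is_wicket,batting_team,bowling_team
1,1,1,1,Player X,Player Y,Bowler P,4,4,0,Team A,Team B
1,1,1,2,Player X,Player Y,Bowler P,0,1,0,Team A,Team B
2,2,1,1,Player X,Player Y,Bowler Q,6,6,0,Team A,Team C
"""

matches = pd.read_csv(io.StringIO(MATCHES_CSV))
balls = pd.read_csv(io.StringIO(BALLS_CSV))

# Attach match-level context (city, venue, winner) to every delivery.
# validate="many_to_one" raises an error if a match id is duplicated in Matches.csv.
deliveries = balls.merge(
    matches[["id", "city", "venue", "winner"]],
    on="id",
    how="left",
    validate="many_to_one",
)
print(deliveries[["id", "batsman", "batsman_runs", "venue"]])
```

A left join keeps every delivery even if its match row were missing, and the `validate` check catches duplicated match records before they silently multiply deliveries.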
C. Pre-Processing of Data
Data pre-processing is the most essential part of a data science project, and it consumes a major share of the time dedicated to the project. Pre-processing includes getting rid of erroneous and inconsistent data, formatting the data, and filling in missing values. Unwanted data, including duplicate observations, is removed. The process mainly deals with the correction, standardization, and transformation of data, and is done to make sure the outcomes are reliable.
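The cleaning steps above (correction, standardization, transformation, filling missing values, and removing duplicates) can be sketched in pandas as follows. This is an illustrative policy under stated assumptions, not the authors' notebook: the column names, the `TEAM_ALIASES` table, and the choice to fill a missing city with `"Unknown"` are all hypothetical.

```python
import pandas as pd

# Hypothetical alias table for teams whose names changed between seasons.
TEAM_ALIASES = {"Team A Old": "Team A"}
TEXT_COLUMNS = ["city", "venue", "team1", "team2", "winner"]


def clean_matches(raw: pd.DataFrame) -> pd.DataFrame:
    """Correct, standardise and de-duplicate a Matches-style table (illustrative policy)."""
    df = raw.copy()
    # Standardise text: trim stray whitespace so equal values compare equal.
    for col in TEXT_COLUMNS:
        df[col] = df[col].str.strip()
    # Standardise team names that differ only because of renaming.
    for col in ["team1", "team2", "winner"]:
        df[col] = df[col].replace(TEAM_ALIASES)
    # Transform date strings into datetimes; unparseable values become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    # Fill missing values in a descriptive column.
    df["city"] = df["city"].fillna("Unknown")
    # Remove duplicate observations last, so near-duplicates collapse too.
    return df.drop_duplicates().reset_index(drop=True)


# Hypothetical raw rows: rows 0 and 1 differ only by whitespace and an old team name.
raw = pd.DataFrame({
    "id": [1, 1, 2],
    "city": ["City X ", "City X", None],
    "date": ["2008-04-18", "2008-04-18", "2008-04-19"],
    "venue": ["Stadium A", " Stadium A", "Stadium B"],
    "team1": ["Team A Old", "Team A", "Team B"],
    "team2": ["Team B", "Team B", "Team A"],
    "winner": ["Team A", "Team A", "Team B"],
})
cleaned = clean_matches(raw)
print(cleaned)
```

Dropping duplicates only after standardization is the key design choice here: rows that differ only by stray spaces or an old team name would otherwise survive as separate observations.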
Fig. 1 depicts the dataset and its cleaning in Jupyter Notebook using Python, and Fig. 2 depicts the first five entries in the dataset after cleaning.
2) Analysing Powerplay vs Death Overs: Figure 2 shows a player's performance during the powerplay and death overs. It gives a brief study of how a batsman scores in these two phases, which gives an insight into which batsmen should be selected for such overs.
3) Analysing Player vs Venues: Figure 3 shows the analysis of a player at different venues. It presents each player's performance at all the stadiums he has played in, which helps in selecting an individual depending on where the match is being conducted.
4) Analysing Player vs Innings: The pie chart in Figure 4 depicts the analysis of a player across innings. It shows how well a player scores depending on the innings, which can help a captain choose whether to bat or bowl after winning the toss.
5) Analysing Player vs Bowlers: Figure 5 depicts the analysis of a player against different bowlers. It shows the runs scored by a single player against each bowling type, such as right-arm medium pace, left-arm medium pace, right-arm leg off, left-arm leg off, leg spin, etc.
6) Analysing Bowler against Venues: Figure 6 shows the analysis of a bowler across venues. It describes how many wickets a bowler takes at different venues, irrespective of the batsman and the opposing team.
7) Analysing Bowler against Team: Figure 7 depicts the analysis of a bowler against different teams. It shows how many wickets a bowler takes against each team, irrespective of the venue.
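The analyses in items 2) to 7) are essentially filtered group-by aggregations over the ball-by-ball data. The following is a minimal pandas sketch, not the authors' code. The column names, the 1-indexed `over` column, the death-over range 16 to 20, the `BOWLER_TYPES` lookup (the two CSV files described above do not contain bowling types) and the toy rows are all assumptions for illustration. Wides are excluded from balls faced, and only dismissals credited to the bowler (not run outs) count as wickets.

```python
import pandas as pd

# Toy ball-by-ball rows; column names and 1-indexed overs are assumptions.
COLUMNS = ["id", "inning", "over", "batsman", "bowler", "batsman_runs", "extras_type",
           "is_wicket", "dismissal_kind", "venue", "batting_team", "bowling_team"]
ROWS = [
    (1, 1, 1, "Player X", "Bowler P", 4, None, 0, None, "Stadium A", "Team A", "Team B"),
    (1, 1, 2, "Player X", "Bowler P", 6, None, 0, None, "Stadium A", "Team A", "Team B"),
    (1, 1, 17, "Player X", "Bowler P", 2, None, 0, None, "Stadium A", "Team A", "Team B"),
    (1, 1, 18, "Player X", "Bowler P", 0, None, 1, "caught", "Stadium A", "Team A", "Team B"),
    (2, 2, 3, "Player X", "Bowler Q", 1, None, 0, None, "Stadium B", "Team A", "Team C"),
    (2, 2, 19, "Player X", "Bowler Q", 0, "wides", 0, None, "Stadium B", "Team A", "Team C"),
    (2, 2, 20, "Player Y", "Bowler Q", 0, None, 1, "run out", "Stadium B", "Team A", "Team C"),
    (2, 2, 20, "Player Y", "Bowler Q", 0, None, 1, "bowled", "Stadium B", "Team A", "Team C"),
]
deliveries = pd.DataFrame(ROWS, columns=COLUMNS)

# Dismissal kinds credited to the bowler (run outs, for example, are not).
BOWLER_CREDITED = {"bowled", "caught", "caught and bowled", "lbw", "stumped", "hit wicket"}
# Hypothetical bowler-to-bowling-type lookup.
BOWLER_TYPES = {"Bowler P": "Right-arm medium", "Bowler Q": "Leg spin"}
POWERPLAY, DEATH = (1, 6), (16, 20)  # the death-over range is an assumed convention


def phase_summary(df, batsman, overs):
    """Item 2: runs, balls faced and strike rate within an inclusive over range."""
    d = df[(df["batsman"] == batsman) & df["over"].between(*overs)]
    runs = int(d["batsman_runs"].sum())
    balls = int((d["extras_type"] != "wides").sum())  # wides are not balls faced
    strike_rate = round(100 * runs / balls, 2) if balls else 0.0
    return {"runs": runs, "balls": balls, "strike_rate": strike_rate}


def runs_by(df, batsman, key):
    """Items 3 and 4: runs per venue (key='venue') or per innings (key='inning')."""
    return df[df["batsman"] == batsman].groupby(key)["batsman_runs"].sum()


def runs_vs_bowler_type(df, batsman, bowler_types):
    """Item 5: runs per bowling type; bowlers missing from the lookup are dropped."""
    d = df[df["batsman"] == batsman]
    return d.groupby(d["bowler"].map(bowler_types))["batsman_runs"].sum()


def wickets_by(df, bowler, key):
    """Items 6 and 7: bowler-credited wickets per venue or per opposing batting team."""
    d = df[(df["bowler"] == bowler)
           & (df["is_wicket"] == 1)
           & df["dismissal_kind"].isin(BOWLER_CREDITED)]
    return d.groupby(key).size()


print(phase_summary(deliveries, "Player X", POWERPLAY))
print(phase_summary(deliveries, "Player X", DEATH))
print(runs_by(deliveries, "Player X", "venue"))
print(runs_vs_bowler_type(deliveries, "Player X", BOWLER_TYPES))
print(wickets_by(deliveries, "Bowler Q", "venue"))
```

Each returned Series maps directly onto one chart (bar or pie), so the same functions could feed the visualizations for any player or bowler chosen in the application.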
The algorithm analysis depicted in the figure reports the coefficient of determination of each model on the test data. In a regression model, the coefficient of determination is a statistical measure of the proportion of variance in the dependent variable that is explained by the independent variables. Based on this analysis, the Boosted Decision Tree was chosen as the prediction model, as it gives the highest accuracy, that is, the highest coefficient of determination (closest to one) among the models tested. It was built in Azure Machine Learning Studio, as shown in Figure 8. The dataset is divided into training data and testing data, which are fed into the model as depicted in the figure.
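The selection rule described above computes R² = 1 − SS_res / SS_tot for each model on the held-out data and keeps the model with the largest value (1 is a perfect fit; 0 is no better than always predicting the mean). The paper performs this inside Azure Machine Learning Studio. The following dependency-free Python sketch only illustrates the metric and the rule, using hypothetical held-out runs and predictions from two unnamed models (`model_a`, `model_b`); none of these numbers are results from this study.

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)  # total variance around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variance
    return 1 - ss_res / ss_tot


# Hypothetical held-out batsman runs and two models' predictions.
y_test = [30, 45, 12, 60]
predictions = {
    "model_a": [32, 43, 15, 58],
    "model_b": [25, 50, 20, 50],
}

scores = {name: r_squared(y_test, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)  # highest R² wins
print(scores, best)
```

Because R² is normalized by the variance of the test targets, it allows models to be compared on the same test split without depending on the scale of the scores.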
The web application also provides an option for predicting a player's performance through a button on the analysis page. On clicking this button, the name of the player is taken implicitly, and the bowling team and venue are chosen from the lists provided; these three parameters are the inputs for prediction. Once the inputs are submitted, a result page appears showing the predicted score of that batsman. The prediction form and result are shown in separate windows, as depicted in Figures 11 and 12.
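The prediction flow described above (player taken from the analysis page, bowling team and venue chosen from lists, result page with the predicted score) could be wired up in Flask roughly as follows. This is a minimal sketch, not the authors' application: the route `/predict/<player>`, the option lists `TEAMS` and `VENUES`, and `predict_runs` are hypothetical, and `predict_runs` returns a fixed placeholder value where the real app would call the trained model (for example, a deployed Azure endpoint).

```python
from flask import Flask, abort, request
from markupsafe import escape

app = Flask(__name__)

# Hypothetical option lists; the real app would fill these from the dataset.
TEAMS = ["Team A", "Team B", "Team C"]
VENUES = ["Stadium A", "Stadium B"]


def options(values: list[str]) -> str:
    """Render <option> tags for a dropdown, escaping each value."""
    return "".join(f'<option value="{escape(v)}">{escape(v)}</option>' for v in values)


def predict_runs(player: str, bowling_team: str, venue: str) -> float:
    """Hypothetical stand-in for the trained model; returns a fixed placeholder."""
    return 25.0


@app.route("/predict/<player>", methods=["GET", "POST"])
def predict(player: str) -> str:
    if request.method == "GET":
        # Prediction form: the player is taken implicitly from the URL.
        return (
            f"<h2>Predict runs for {escape(player)}</h2>"
            '<form method="post">'
            f'<select name="bowling_team">{options(TEAMS)}</select>'
            f'<select name="venue">{options(VENUES)}</select>'
            '<button type="submit">Predict</button></form>'
        )
    bowling_team = request.form.get("bowling_team", "")
    venue = request.form.get("venue", "")
    if bowling_team not in TEAMS or venue not in VENUES:
        abort(400)  # reject values that are not in the provided lists
    score = predict_runs(player, bowling_team, venue)
    # Result page: the predicted score for this batsman.
    return (
        f"<p>Predicted runs for {escape(player)} against "
        f"{escape(bowling_team)} at {escape(venue)}: {score:.0f}</p>"
    )
```

If saved as `app.py`, it can be started with `flask --app app run` and opened at `/predict/` followed by a player name. Escaping every user-supplied value keeps the result page safe from HTML injection.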
V. CONCLUSIONS
In this work, the performance of IPL cricketers from the 2008 to 2020 seasons has been analysed and visualized. The project highlights player performance with respect to venue, innings, death overs, powerplay overs, and type of bowler. An accurate prediction of a batsman's runs against a particular team and at a particular venue, made before the match begins, can help the team management select the best players for each match. Based on the statistics and characteristics of the players, we have modelled the batting and bowling datasets. The best-fit algorithm is identified for the dataset, and player performance is predicted using Microsoft Azure.
REFERENCES
[1] Vidit Kanungo and Tulasi B., "Data visualization and toss related analysis of IPL teams and batsmen performances," International Journal of Electrical and Computer Engineering, vol. 9, no. 5, October 2019.
[2] Shubhra Singh and Parmeet Kaur, "IPL Visualization and Prediction Using HBase," Information Technology and Quantitative Management, 2017.
[3] S. Sharuka and R. Vani, "Insights on IPL Team Performance using Visual Analytics," International Journal of Engineering Sciences & Research Technology, November 2019.
[4] Kasukruti Raviteja, Ganesh Kumar Macha and G. R. Anantharaman, "Predicting and Analyzing the Performance of the IPL Cricket Using Regression Models," Complexity International Journal, vol. 23, no. 3, December 2019.
[5] Kalpdrum Passi and Niravkumar Pandey, "Predicting players performance in one day international cricket matches using Machine Learning," 8th International Conference on Computer Science, Engineering and Applications, February 2018.
[6] Sricharan Shah et al., "A Study on Performance of Cricket Players using Factor Analysis Approach," International Journal of Advanced Research in Computer Science, vol. 8, no. 3, 2017.
[7] D. Parker, P. Burns and H. Natarajan, "Player valuations in the Indian Premier League," Frontier Economics, vol. 116, October 2008.
[8] C. D. Prakash, C. Patvardhan and C. V. Lakshmi, "Data Analytics based Deep Mayo Predictor for IPL-9," International Journal of Computer Applications, vol. 152, no. 6, pp. 6-10, October 2016.
[9] M. Ovens and B. Bukiet, "A Mathematical Modelling Approach to One-Day Cricket Batting Orders," Journal of Sports Science and Medicine, vol. 5, pp. 495-502, 15 December 2006.
[10] Tijender Singh et al., "Score and Winning Prediction in Cricket through Data Mining," International Conference on Soft Computing Techniques and Implementations (ICSCTI), October 2015.
[11] Hemanta Saikia and Dibyojyoti Bhattacharjee, "An application of multilayer perceptron neural network to predict the performance of batsmen in Indian premier league," International Journal of Research in Science and Technology, vol. 1, no. 1, 2014.
[12] M. G. Jhanwar and V. Pudi, "Predicting the Outcome of ODI Cricket Matches: A Team Composition Based Approach," European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2016.
[13] H. H. Lemmer, "The combined bowling rate as a measure of bowling performance in cricket," South African Journal for Research in Sport, Physical Education and Recreation, vol. 24, no. 2, pp. 37-44, January 2002.