Cricket Analytics and Predictor

Mr. Suyash Mahajan, Ms. Gunjan Kandhari, Ms. Salma Shaikh, Ms. Rutuja Pawar, Mr. Jash Vora
Students, Department of Information Technology,
Walchand Institute of Technology, Solapur, India

Ms. A. R. Deshpande
Assistant Professor, Department of Information Technology,
Walchand Institute of Technology, Solapur, India

Abstract—Cricket is one of the most watched sports today. Winning in cricket depends on
various factors such as home ground advantage, performance in past matches, the experience
of the players, performance at the specific venue, performance against the specific
opponent, and the current form of the team and its players. In the recent past, a lot of
research has measured player performance and predicted winning percentages. This article
outlines the factors on which the game depends and discusses prior research that predicted
the winning team following the advent of statistical modeling in sports. Cricket is one of
the most popular team games in the world. In this article, we set out to predict the
outcome of Indian Premier League (IPL) cricket matches using a supervised learning approach
from a team composition perspective. Our work suggests that the relative team strength of
the competing teams forms a distinctive feature for predicting the winner. Modeling team
strength boils down to modeling each individual player's batting and bowling performances,
which forms the basis of our approach. We use a player's statistics and recent performance
to model him. Player-independent factors are also considered in order to predict the
outcome of a match. Machine learning is used to predict the outcome of a cricket match
before and during the match.

INTRODUCTION
Sports analytics plays a major role in various problems associated with sport. Some of
these problems are the ranking of individual players and their specialized skills, the
composition of teams with an optimal balance of specialized skills, the ranking of teams,
the negotiation of contracts, their potential revenue streams, the planning of both
physical and mental training, the development of strategies for winning games and
tournaments, assessing the effectiveness of coaches and referees, the medical aspects of
sports injuries (health and insurance), the analysis and improvement of rules, the quality
of equipment and technology, the determination of awards, historical records, and the
generation of odds for gambling activities.
In all of these areas, a coherent statistical presentation of both the raw data and the
inferences drawn from it helps decision makers plan and implement successfully.
Furthermore, the media and the public have a great appetite for well-visualized statistics.
New opportunities for sports analytics have arisen due to the advent and availability of
detailed, high-quality data. For example, in Major League Baseball (MLB), tracking systems
have provided comprehensive data on pitching and fielding. These systems record every play
while also tracking the exact movements of all players on the field. Using these data
sources, we can make useful predictions and derive various statistics for improvement
purposes.
Sports analytics has evolved to its present level because both the technology that provides
the data and the statistical methodologies that provide the tools for analyzing it have
improved very rapidly. Although sports analytics has been developing rapidly, this has not
been the case with cricket. For historical reasons, cricket was perceived as a leisurely
gentleman's game played without remuneration to players (until recently), so it was not
subject to large financial transactions.
This has changed in the last few years with the introduction of shorter formats of the
game. The shortest and newest format, known as T20, generates intense interest and vast
sums of money, especially in the Indian subcontinent. The demand for cricket analytics has
increased accordingly, and the main website for cricket information and data is
cricinfo.com. Cricket is a sport that originated in England in the 16th century and later
spread to her colonies. The first international game, however, did not feature England but
was played between Canada and the United States in 1844 at the grounds of the St George's
Cricket Club in New York. In time, in both of these countries, cricket took a back seat to
other, faster sports like ice hockey, basketball, and baseball. International cricket is
played today by a number of British Commonwealth countries, the main ones being Australia,
Bangladesh, England, India, New Zealand, Pakistan, South Africa, Sri Lanka, West Indies,
and Zimbabwe. These teams are members of the International Cricket Council (ICC). A second
rung of international teams, called Associates, includes numerous countries including
Canada. Statistical modeling has been used in sports for decades and has contributed
significantly to success on the field. Various natural factors affecting the game,
enormous media coverage, and a huge betting market have given strong incentives to model
the game from various perspectives. However, the complex rules governing the game, the
ability of players and their performance on a given day, and various other natural
parameters play an integral role in the final outcome of a cricket match. This presents
significant challenges in accurately predicting the result of a game. The game of cricket
is played in three formats: Test matches, ODIs, and T20s. We focus our research on the IPL.
To predict the outcome of IPL cricket matches, we propose an approach where we first
estimate the batting and bowling potentials of the 22 players using their career statistics
and active participation in recent games.

I. RELATED WORK

[1] In this paper, a methodology for identifying promising batting orders in one-day
cricket was presented. In particular, the authors suggested some batting orders that have
never been tried by the Indian team and that contradict prevailing wisdom. As a byproduct
of the investigation, a simulation procedure was developed for generating first innings
runs against an average opponent. The simulation procedure was based on estimates from a
Bayesian log-linear model. Finally, methods were developed with the intention of finding
optimal or nearly optimal batting orders at the start of a team's innings.

[2] In this paper, two technologies are used: a MySQL database for storing data and Java
for the GUI. A clustering algorithm is used for prediction. The steps followed are:
1. Begin with a decision on the value of k, the number of clusters.
2. Choose any initial partition that classifies the data into k clusters.
3. Take each sample in sequence and compute its distance from the centroid of each of the
clusters. If the sample is not currently in the cluster with the closest centroid, switch
the sample to that cluster and update the centroids of the cluster gaining the sample and
the cluster losing it.
4. Repeat the above step until convergence is achieved, that is, until a pass through the
training samples causes no new assignments.

[3] This paper surveys the options available for big data analysis in several categories
(programming languages, statistical solutions, and visualization tools) and tries to
identify which of them are preferable to the others. The authors found that R is a common
programming language among data scientists, that SPSS works well as a statistical tool for
users who are not analysts, and that Tableau Public is the most suitable visualization tool
for presenting and analyzing data graphically, while D3 is the best choice for web
visualization.

[4] This paper uses the Duckworth-Lewis method to determine the resources remaining. At the
end of each completed over, the projected run total of the batting team can be updated to
give a more accurate prediction of the match result. Winning probabilities are thereby
assigned to the competing teams in ODI matches. With the D-L approach, the procedure can
readily be adapted to deliver 'in the run' forecasts.

[5] This paper introduces a model with three components, each based on different
considerations arising from a deeper analysis of T20 cricket. The models are built using
data analytics methods from the machine learning domain. The work considers 5 features of
the IPL career and 5 features of the international T20 career for both batsmen and bowlers;
more features can be constructed and considered in future work.

[6] This paper explains the mechanics of the various methods used for resetting target
scores in interrupted one-day cricket matches. Each of these methods yields a fair target
in some situations, but none has proved satisfactory in deriving a fair target under all
circumstances. The authors introduce a method that gives a fair revised target score under
all circumstances.
A two-factor relationship is derived which gives the average number of runs that may be
scored from any combination of the two resources (overs remaining and wickets in hand),
and from it a table of proportions of an innings for any such combination. This allows the
proportion of the innings' resources of which the batting team is deprived when overs are
lost to a stoppage in play to be calculated simply, and hence a fair correction to the
target score to be made. The parameters of the relationship may change, for example with
changes in the rules or possibly changes in team selection and playing strategy.

[7] This paper discusses the Duckworth-Lewis method for target prediction in the game of
cricket and explains pitfalls in the method. For the analysis, the authors used a
correlation-based subset evaluation technique. Contrary to the assumption of the
Duckworth/Lewis method, the venue of the game and strategic overs affect the prediction.
In spite of the sparse nature of the dataset, regression algorithms and the nearest
neighbor algorithm gave approximately correct results. A hybrid approach using a quadratic
regression model with KNN as a smoothing function was therefore used as a predictor, and in
addition the Duckworth/Lewis method with 1/1000 of the data was considered. Finally, the
idea of prediction with the momentum of the game as a feature was introduced.

[8] This paper focuses on player performance: how many runs each batsman will score and how
many wickets each bowler will take, for both teams. Both problems are treated as
classification problems in which the number of runs and the number of wickets are grouped
into ranges. Naïve Bayes, random forest, multiclass SVM, and decision tree classifiers were
used to build prediction models for both problems, and the four multiclass classification
algorithms were compared. Random forest turned out to be the most accurate classifier for
both datasets, with an accuracy of 90.74% for predicting runs scored by a batsman and
92.25% for predicting wickets taken by a bowler. The results of SVM were surprising, as it
achieved an accuracy of only 51.45% for predicting runs and 68.78% for predicting wickets.

[9] The paper presents a data visualization and prediction tool in which an open-source,
distributed, non-relational database, H-Base, is used to store the data related to IPL
(Indian Premier League) cricket matches and players. This data is then used to visualize
the past performance of players. Moreover, the data is used to predict the outcome of a
match through various machine learning approaches. The proposed tool can help team
managements in the player auctions to select the right team. Finally, it was concluded
that the novelty of the proposed approach lies in addressing the problem as a dynamic one
and using a suitable non-relational database, H-Base, for scalability of the application.
Out of all the machine learning algorithms used, KNN was observed to be the most accurate.

[10] This paper specifies the various factors that affect the game. Winning in cricket
depends on factors such as home ground advantage, past performances, experience in the
match, performance at the particular venue, performance against the particular team, and
the current form of the team and the player. Over the past few years, a lot of work and
research papers have been published that measure player performance and predict winners.
The article outlines the factors on which the game depends and reviews a few other research
papers that predicted cricket wins.

[11] With statistical modeling in sports, predicting the outcome of a game has been
established as a fundamental problem. Cricket is one of the most popular team games on the
planet. The paper observes that the relative team strength of the competing teams forms a
distinctive feature for predicting the winner. The use of career statistics as well as the
recent performances of a player is demonstrated. Player-independent factors are also
considered in order to predict the outcome of a match. The K Nearest Neighbor (KNN)
algorithm was shown to yield better results than the other classifiers. The paper
addresses the problem of predicting the outcome of an ODI cricket match using the
statistics of 366 matches. The novelty of the approach lies in addressing the problem as a
dynamic one and using the participating players as the key feature in predicting the
winner of the match.

[12] This paper focuses on predicting the most suitable team to field for a particular
match. The authors propose a statistical modeling approach to select the ideal players for
the match to be played. The work suggests that the relative team strength of the competing
teams forms a distinctive feature for predicting the winner. Modeling the team strength
comes down to modeling individual players' batting and bowling performances, which forms
the basis of the approach. Career statistics as well as the recent performances of a
player are used for the modeling. Player-independent factors are also considered in order
to predict the outcome of a match. Experimental analysis was performed using Hadoop and
Hive for Indian players. Results show up to 91% accuracy when compared with the actual
results available on the web. Finally, the batting order or the bowling order can be
arranged using these scores.

[13] This research paper aims to identify the factors that play a key role in predicting
the outcome of an ODI cricket match and to determine the accuracy of predictions made
using data mining. The statistical significance of various factors that could explain the
outcome of an ODI cricket match is investigated. Home field advantage, winning the toss,
game plan (batting first or fielding first), match type (day or day-and-night), competing
team, venue familiarity, and the season in which the match is played are the key features
considered. For model building, three algorithms are adopted: Logistic Regression, Support
Vector Machine, and Naïve Bayes. Logistic regression is applied to data from previously
played matches to identify which features, individually or in combination with other
features, play a part in the prediction. SVM and Naïve Bayes classifiers are used for
model training and predictive analysis. Graphical representation and confusion matrices
are used to evaluate the models. A bidding scenario is also considered to explain the
decisions that can be taken after the model has been built, and the effect of these
decisions on the cost and outcome of the model is also examined.

[14] The paper addresses the problem of predicting the result of an ODI cricket match using
the statistics of 5000 matches. The novelty of the approach lies in addressing the problem
as a dynamic one and using the results of past matches as the key feature in predicting
the winner of the match. It was observed that simple features can yield very encouraging
results. Predicting the winner of matches using different supervised algorithms has been
achieved, so upcoming matches can now be predicted. Algorithms developed in the future may
give better results than those used in this paper.
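
The clustering steps summarized under [2] correspond to a sequential k-means style procedure. As an illustration only, the following Python sketch applies those four steps to one-dimensional samples. The function names, the use of absolute distance, the 1-D data, and the rule that a cluster is never emptied are assumptions made for this sketch and are not taken from [2]. Every cluster in the initial partition is assumed to be non-empty.

```python
from typing import List


def _centroid(samples: List[float], assignment: List[int], cluster: int) -> float:
    """Mean of the samples currently assigned to `cluster` (assumed non-empty)."""
    members = [s for s, a in zip(samples, assignment) if a == cluster]
    return sum(members) / len(members)


def kmeans_1d(samples: List[float], assignment: List[int], k: int) -> List[int]:
    """Sequential k-means on 1-D samples (hypothetical helper, not from [2]).

    `assignment[i]` is the initial cluster index (0..k-1) of `samples[i]`,
    i.e. steps 1 and 2: k is chosen and an initial partition is supplied.
    """
    assignment = list(assignment)
    centroids = [_centroid(samples, assignment, c) for c in range(k)]

    changed = True
    while changed:  # step 4: repeat until a full pass causes no new assignment
        changed = False
        for i, s in enumerate(samples):  # step 3: visit every sample in sequence
            old = assignment[i]
            nearest = min(range(k), key=lambda c: abs(s - centroids[c]))
            # Switch only if another centroid is strictly closer.
            if abs(s - centroids[nearest]) >= abs(s - centroids[old]):
                continue
            # Assumption of this sketch: never leave a cluster empty.
            if assignment.count(old) == 1:
                continue
            assignment[i] = nearest
            # Update the centroid of the cluster losing the sample
            # and of the cluster gaining it.
            centroids[old] = _centroid(samples, assignment, old)
            centroids[nearest] = _centroid(samples, assignment, nearest)
            changed = True
    return assignment
```

Calling `kmeans_1d(scores, initial_assignment, k)` returns the final cluster index of each sample; in a score prediction setting the samples could be past innings totals, although [2] does not specify the features it clusters.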

II. METHODOLOGY

The work of our project focuses on two models.


The two models are:
1. Descriptive model
2. Predictive model
DESCRIPTIVE MODEL:
The descriptive model focuses mainly on two aspects:
1. It describes the data and statistics of past performances, i.e. batting, bowling or
all-rounder.
2. It gives the past information of the matches played by the IPL teams.
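
As an illustration of the kind of player statistics a descriptive model can report, the sketch below computes two standard cricket measures from raw counts: batting strike rate (runs per 100 balls faced) and bowling average (runs conceded per wicket taken). The paper does not give its formulas or data schema, so the `PlayerRecord` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerRecord:
    """Hypothetical per-player totals; field names are assumptions."""
    name: str
    runs_scored: int = 0
    balls_faced: int = 0
    runs_conceded: int = 0
    wickets_taken: int = 0


def strike_rate(p: PlayerRecord) -> Optional[float]:
    """Batting strike rate: runs scored per 100 balls faced."""
    if p.balls_faced == 0:
        return None  # undefined for a player who has not batted
    return 100.0 * p.runs_scored / p.balls_faced


def bowling_average(p: PlayerRecord) -> Optional[float]:
    """Bowling average: runs conceded per wicket taken."""
    if p.wickets_taken == 0:
        return None  # undefined for a player with no wickets
    return p.runs_conceded / p.wickets_taken
```

Returning `None` for undefined values is a design choice of this sketch; it keeps players without batting or bowling data from being ranked on a misleading zero.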

PREDICTIVE MODEL:
The predictive model focuses on predicting the
winning percentage of the team. The ranking of the
players is displayed as well.
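
The player ranking mentioned here can be sketched as follows, using the sorting criteria this section gives for batsmen (strike rate), bowlers (bowling average) and all-rounders (both). The sort directions (higher strike rate first, lower bowling average first) follow the usual cricket convention and are an assumption, as is the rank-sum rule used to combine the two measures for all-rounders; the paper does not state how they are combined. The `Player` class and `rank_players` function are hypothetical names, and player names are assumed to be unique.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Player:
    name: str
    role: str  # "batsman", "bowler" or "all-rounder"
    strike_rate: float = 0.0
    bowling_average: float = 0.0


def rank_players(players: List[Player], role: str) -> List[str]:
    """Return the names of the players with the given role, best first."""
    group = [p for p in players if p.role == role]
    if role == "batsman":
        ordered = sorted(group, key=lambda p: p.strike_rate, reverse=True)
    elif role == "bowler":
        ordered = sorted(group, key=lambda p: p.bowling_average)
    elif role == "all-rounder":
        # Assumption: add each player's position in the two orderings;
        # the smallest sum ranks first.
        by_sr = sorted(group, key=lambda p: p.strike_rate, reverse=True)
        by_avg = sorted(group, key=lambda p: p.bowling_average)
        score = {p.name: by_sr.index(p) + by_avg.index(p) for p in group}
        ordered = sorted(group, key=lambda p: score[p.name])
    else:
        raise ValueError(f"unknown role: {role}")
    return [p.name for p in ordered]
```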
The user can choose the two teams playing against each other. The selection of the teams
works on the following criteria:
1. If the players are batsmen, sorting is done according to the strike rate of the
batsmen.
2. If the players are bowlers, sorting is done according to the bowling average of the
bowler.
3. If the players are all-rounders, sorting is done considering both strike rate and
bowling average.
The algorithm used for this model is the Decision Tree Classifier. A decision tree is
built using a top-down approach. In this algorithm the root node, i.e. the first factor
considered, is the 'city' where the match is being played. The tree is built according to
the prominent factors (city, venue, teams, toss decision) considered in the match.
Decision Tree Classifier Diagram: (figure not available)
Algorithm:
1. Start
2. Select the root node as 'city'.
3. Choose one of the cities from 'city1', 'city2', 'city3', etc.
4. Select one of the venues ('venue1', 'venue2', 'venue3') present in the city.
5. A team is selected and compared against the other teams.
6. The toss decision is made and the result is predicted based on the win/loss criteria.
7. End
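
The steps above can be read as walking a tree whose levels are fixed in the order city, venue, opposing team, toss decision. The sketch below is one possible interpretation, not the authors' implementation: it stores win and match counts of past results at every node and reports the win percentage at the deepest node that matches the query. Unlike a learned decision tree classifier (for example CART or ID3), it does not choose splits by an impurity measure. The match dictionary keys (`city`, `venue`, `team1`, `team2`, `toss_decision`, `winner`) are an assumed schema.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class Node:
    """One tree node holding the selected team's record for the path so far."""

    def __init__(self) -> None:
        self.wins = 0
        self.matches = 0
        self.children: Dict[str, "Node"] = defaultdict(Node)


def build_tree(team: str, matches: List[dict]) -> Node:
    """Build the fixed-order tree (city, venue, opponent, toss decision)."""
    root = Node()
    for m in matches:
        if team not in (m["team1"], m["team2"]):
            continue  # the selected team did not play this match
        opponent = m["team2"] if m["team1"] == team else m["team1"]
        won = int(m["winner"] == team)
        node = root
        node.matches += 1
        node.wins += won
        for value in (m["city"], m["venue"], opponent, m["toss_decision"]):
            node = node.children[value]
            node.matches += 1
            node.wins += won
    return root


def win_percentage(root: Node, city: str, venue: str,
                   opponent: str, toss_decision: str) -> Optional[float]:
    """Win percentage at the deepest node matching the query, else None."""
    node = root
    for value in (city, venue, opponent, toss_decision):
        if value not in node.children:
            break  # no past match this specific; fall back to the parent
        node = node.children[value]
    if node.matches == 0:
        return None
    return 100.0 * node.wins / node.matches
```

A typical call would be `win_percentage(build_tree("teamA", past_matches), "city1", "venue1", "teamB", "bat")`, where `past_matches` is a list of dictionaries in the assumed schema. Falling back to the parent node when a combination was never observed is a choice made for this sketch.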

III. RESULT

The user has the option to sign up or log in. After a successful sign up or login, the
user can access two models: the Descriptive model, which shows the statistics of the
players, and the Predictive model, which predicts the winning percentage of the team that
the user has selected.
The user can also read the latest tweets and news on the website.

IV. CONCLUSION

The website developed is an authorized website. It is beneficial for the coach, who can
rank the players by priority based on previous data; for the team owner, who can get the
details of the IPL matches played; and for users, who can predict the winning percentage
of a team and get the statistics of the players.

V. REFERENCES

[1] T. B. Swartz, P. S. Gill, D. Beaudoin, and B. M. Desilva, “Optimal batting orders in
one-day cricket,” Computers & Operations Research, vol. 33, no. 7, pp. 1939–1950, 2006.
[2] Preeti Satao, “Cricket Score Prediction System (CSPS) Using Clustering Algorithm,”
Technical Research Organization India, vol. 3, no. 4, 2016, pp. 2394–0697,
troindia.in/journal/ijcesr/vol3iss4/43-46.pdf.
[3] Tamanna Siddiqui, Mohammad Alkadri, and Najeeb Ahmad Khan, “Review of Programming
Languages and Tools for Big Data Analytics,” International Journal of Advanced Research in
Computer Science, vol. 8, no. 5, May-June 2017.
[4] “Predicting The Match Outcome in One Day - jssm.org.” [Online]. Available:
http://www.jssm.org/volume05/iss4/cap/jssm-05-480.pdf&p=DevEx.LB.1,5063.1.
[Accessed: 11-Aug-2018].
[5] C. Deep, C. Patvardhan, and C. Vasantha, “Data Analytics based Deep Mayo Predictor for
IPL-9,” International Journal of Computer Applications, vol. 152, no. 6, pp. 6–11, 2016.
[6] F. C. Duckworth and A. J. Lewis, “A Fair Method for Resetting the Target in
Interrupted One-Day Cricket Matches,” Operational Research Applied to Sports,
pp. 128–143, 2015.
[7] Vijay Ramakrishnan, Sethuraman K, and Parameswaran R, “Target Score Prediction in the
game of Cricket.” [Online]. Available:
https://people.ucsc.edu/~praman1/static/pub/ML_Project_CS7641_report.pdf.
[Accessed: 11-Aug-2018].
[8] K. Passi and N. Pandey, “Increased Prediction Accuracy in the Game of Cricket Using
Machine Learning,” International Journal of Data Mining & Knowledge Management Process,
vol. 8, no. 2, pp. 19–36, 2018.
[9] S. Singh and P. Kaur, “IPL Visualization and Prediction Using H-Base,” Procedia
Computer Science, vol. 122, pp. 910–915, 2017.
[10] “Analysis on Attributes Deciding Cricket Winning,” Scribd. [Online]. Available:
https://www.scribd.com/document/357690109/Analysis-on-Attributes-Deciding-CricketWinning.
[Accessed: 11-Aug-2018].
[11] Madan Gopal Jhawar and Vikram Pudi, “European Conference on Machine Learning and
Principles and Practice of Knowledge Discovery in Databases.”
[12] S. Agarwal, L. Yadav, and S. Mehta, “Cricket Team Prediction with Hadoop: Statistical
Modeling Approach,” Procedia Computer Science, vol. 122, pp. 525–532, 2017.
[13] Mehvish Khan and Riddhi Shah, “Role of External Factors on Outcome of a One Day
International Cricket (ODI) Match and Predictive Analysis,” International Journal of
Advanced Research in Computer and Communication Engineering, vol. 4, no. 6, pp. 192–197,
Jun. 2015.
[14] Geddam Jaishankar Harshit and Rajkumar S, “A Review Paper on Cricket Predictions
Using Various Machine Learning Algorithms and Comparisons Among Them,” International
Journal for Research in Applied Science & Engineering Technology (IJRASET), vol. 45,
no. 98, pp. 27–32.