
Cricket Player Data Analysis Using Clustering Technique

The document discusses using clustering techniques to analyze cricket player performance data. It provides context on cricket data analysis and reviews several related studies that use machine learning methods. The objective is to apply clustering algorithms to group players based on statistical metrics and create visualizations of the resulting clusters.

Cricket Player Data Analysis Using Clustering Techniques

Prof. Basavaraj Neelagund1, Mohammad Ameen S2, Shaik Mohammed Shafi3, Shaikh Mohammed Yaasar4, Zakir Mohammed5
1 Assistant Professor, Department of Computer Science and Engineering
2, 3, 4, 5 Students, 8th Semester, Department of Computer Science and Engineering
Yenepoya Institute of Technology, Moodbidri, Karnataka, India.
---------------------------------------------------------------------------****------------------------------------------------------------------------
Abstract - Cricket score data analysis uses statistical methods to analyze and understand the performance of cricket players with simple statistical tools and graphics. Most often, average scores, strike rates, the coefficient of variation, and graphical measures are used to measure performance and to compare players. The objective is to gain insight into the strengths and weaknesses of the players and to identify trends and patterns in player performance. This information can be used by coaches, managers, and analysts to improve player performance, make tactical decisions, and gain a competitive advantage. The main focus is to apply statistical quality control charts to the analysis of player performance using batting scores.

Index Terms - Machine Learning, Kmeans Algorithm, Hierarchical Clustering.

1. INTRODUCTION

Cricket Score Data Analysis is a field that combines the sport of cricket and the use of data analysis techniques to understand and improve performance. Cricket is a highly competitive sport that has millions of fans worldwide, and as a result, there is a vast amount of data generated from every match. This data provides valuable information about the performance of teams and players that can be used to make informed decisions. The use of data analysis in cricket has been growing in recent years. From player performance analysis to team strategy and tactical decision-making, the insights generated from data analysis can greatly improve the performance of teams and players.

This project aims to analyze the performance of cricket teams and players using statistical methods and provide insights into their strengths and weaknesses. The data used for this analysis will be collected from various sources and will include information such as runs scored and other relevant performance metrics. The results of this analysis will be presented in a meaningful and concise manner to provide valuable insights for coaches, managers and analysts, enabling them to make data-driven decisions to enhance player performance and team strategies.

Overall, this project will demonstrate the importance of data analysis in the sport of cricket and how it can be used to gain a competitive advantage. By leveraging statistical methods and machine learning algorithms, this research aims to uncover hidden patterns in cricket score data, offering valuable insights that can revolutionize the way teams approach training, tactics, and player development. Through a comprehensive analysis of player performance and team dynamics, this project seeks to contribute to the growing body of knowledge in sports analytics and pave the way for future advancements in cricket score data analysis.

2. LITERATURE REVIEW

[1] Increased Prediction Accuracy in the Game of Cricket Using Machine Learning. Authors: Kalpdrum Passi and Niravkumar Pandey:

The literature review reveals limited but diverse approaches to predicting cricket players' performance. Techniques vary from neural networks to hierarchical linear models, introducing new performance measures, and using Social Network Analysis. Some studies focus on specific elements such as player valuation for IPL auctions or optimal batting orders, while others apply mathematical approaches or machine learning techniques. However, most of these studies are specific to certain teams or players. This project aims to overcome these limitations by developing a generalized approach for predicting any player's performance in a given match using machine learning algorithms.

[2] Predicting Outcome of Indian Premier League (IPL) Matches Using Machine Learning. Authors: Rabindra Lamsal and Ayesha Choudhary:

This paper explores the use of machine learning in predicting the outcome of Indian Premier League (IPL) cricket matches. It highlights the growing trend of sports analytics, with machine learning playing a significant role due to the availability of historical and live data. Examples include predicting player performance, defining offensive tactics, and aiding decision-making processes such as player selection. The paper references the use of machine learning techniques in other sports like baseball and football, citing specific examples of data modelling techniques being used to optimize player performance and strategic decision-making. It also mentions real-world tools implemented in cricket, such as the Winning and Score Predictor (WASP) and the Hawk-Eye system, which have revolutionized the way cricket matches are analyzed and predicted.

[3] Analysing Long Short Term Memory Models for Cricket Match Outcome Prediction. Authors: Rahul Chakwate, Madhan R A:

This paper examines the application of Long Short Term Memory (LSTM) models for predicting cricket match outcomes. It delves into the capability of sports analytics and machine learning to analyze complex match data, with the aim of providing insights and predictions that could benefit teams and players. The authors propose a novel Recurrent Neural Network model that can predict match outcomes based on ball-by-ball statistics, offering real-time win probability assessments throughout a match. The paper also reviews existing machine learning techniques, such as K-Nearest Neighbors, and their application in cricket analytics. The potential of neural networks and recurrent neural networks in modeling complex functions and sequences, respectively, is also explored.
[4] Modelling and simulation for one-day cricket. Authors: Tim B. Swartz, Paramjit S. Gill and Saman Muthukumarana:

Work on cricket simulation and statistical analysis encompasses various aspects such as developing simulators for one-day matches, fair resetting of targets, constructing plausible test cricket simulations, analyzing cricket scores and geometrical progressions, and emphasizing the statistical analysis of batting as a fundamental aspect of the sport. These studies contribute significantly to enhancing the understanding and improvement of cricket through statistical and simulation methodologies, providing valuable insights for the game's strategic and statistical aspects. Additionally, the use of simulation models in cricket allows for the investigation of complex questions and phenomena, offering practical and powerful tools for analyzing the sport. The popularity of one-day cricket, with its more aggressive gameplay and colorful uniforms, has led to the development of numerous cricket games for personal computers and gaming stations, although many lack realism and transparency in their simulation procedures.

[5] Data Analytics based Deep Mayo Predictor for IPL-9. Authors: Deep Prakash, Patvardhan, Vasantha Lakshmi:

Cricket analytics in the Indian Premier League (IPL) covers various studies related to player valuations, team performance measurement, all-rounder classification, player wage determination, and player pricing and valuation. The importance of predictive modeling in cricket, especially in T20 matches, was highlighted due to the significant investments made by franchises. The literature also discussed the challenges of incorporating current form alongside career statistics for accurate predictions. Machine learning techniques were emphasized for performance ranking and outcome prediction in IPL matches. Overall, the literature review underscores the significance of data analytics and machine learning in enhancing decision-making processes in cricket.

[6] Cricket Players Performance Prediction and Evaluation Using Machine Learning Algorithms. Authors: Sumathi, Prabu and Rajkamal:

This paper presents a system that leverages machine learning algorithms including K-means clustering and hierarchical models to predict and evaluate cricket players' performance, ultimately aiming to enhance player ranking and increase match-winning probabilities. By analyzing individual player performance, clustering similar players based on attributes, and selecting the top performers for team formation, the proposed system demonstrates effectiveness in predicting player performance and optimizing team selection. The experimental findings highlight the system's capability in accurately predicting player performance and selecting the best players for a cricket team, showcasing the potential of machine learning in revolutionizing player evaluation and team formation in cricket.

3. PROPOSED WORK
•To analyze and understand the performance of cricket players using statistical tools and graphics.
•To generate insights into the strengths and weaknesses of players, identify trends, and recognize patterns of player performance.
•Gather cricket score data from various sources, including official databases, team statistics, and other relevant repositories.
•Address missing values, outliers, and inconsistencies in the data. Transform data if needed, such as converting categorical variables into numerical representations.
•Explore the creation of new features that enhance the clustering process, for example calculating performance indices or ratios for a more holistic view.
•Apply a selected clustering algorithm to group players based on their performance attributes.
•Create visualizations that represent the clusters and allow for the assessment of their quality and interpretability.
•Consider implementing dynamic clustering if the dataset spans multiple seasons or time periods to capture changes in player performance over time.
•Develop an interactive visualization tool (web-based dashboard or standalone application) for users to explore and interact with the clusters.

4. METHODOLOGY

Data Collection:

The data used in this work comprises a large set of cricket score records collected from the Kaggle repository. Historical records from various cricket archives were used to gather information on scores and statistics from past matches. Surveys were conducted with cricket fans and experts to gather their opinions and perspectives on the game.

Pre-Processing:

Data pre-processing is the most essential part of a data science project. It consumes a major share of the time dedicated to the project. Pre-processing includes removing erroneous and inconsistent data, formatting the data, and filling in missing values. Unwanted data, including duplicate observations, is removed. It mainly deals with correction, standardization, and transformation of data. This is done to make sure outcomes are reliable.

Feature Extraction:

Feature selection is an essential phase in which the parameters used to analyze a cricketer's performance are decided. For a batsman, the parameters considered are player, span, matches, innings, not outs, runs, highest score, average, strike rate, and the numbers of 100s, 50s, and 0s scored.

Classifiers:

Kmeans Algorithm:

In the cricket score data analysis project, the K-means algorithm is utilized as a clustering technique to group players based on their performance metrics. The algorithm is applied in the following manner:
1. Initialization: The algorithm begins by randomly initializing K centroids, where K represents the number of clusters to be formed. These centroids serve as the initial cluster centers.
2. Assignment: Each data point, in this case each player's performance metrics such as runs scored, average, strike rate, and other relevant statistics, is assigned to the closest centroid based on a distance metric, typically the Euclidean distance. This step involves calculating the distance between each data point and the centroids and assigning the data point to the cluster with the nearest centroid.
3. Update: After assigning all data points to clusters, the algorithm recalculates the centroid of each cluster by taking the mean of all the data points assigned to that cluster. This step involves updating the cluster centers based on the data points assigned to each cluster.
4. Iteration: Steps 2 and 3 are repeated iteratively until
convergence is reached. Convergence occurs when the centroids
no longer change significantly between iterations or when a
maximum number of iterations is reached.
5. Final Clusters: The final clusters and centroids obtained after
convergence represent the solution provided by the K-means
algorithm. Each cluster contains players with similar performance
characteristics based on the selected metrics.
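
The five steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in this work: the function name `kmeans`, the parameters `max_iter`, `tol` and `seed`, and the toy (average, strike rate) pairs are all assumptions made for the example. In practice a library routine such as scikit-learn's `KMeans` would normally be preferred.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1 (initialization): pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(max_iter):
        # Step 2 (assignment): each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]

        # Step 3 (update): each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])

        # Step 4 (iteration): stop once no centroid moves by more than tol.
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break

    # Step 5 (final clusters): one label per point plus the final centroids.
    return labels, centroids


# Made-up (batting average, strike rate) pairs, purely for illustration.
players = [
    (25.0, 70.0), (27.0, 72.0), (24.0, 68.0),
    (50.0, 95.0), (52.0, 98.0), (48.0, 92.0),
]
labels, centroids = kmeans(players, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are random, the cluster numbers themselves carry no meaning; only the grouping does. Fixing `seed` makes a run repeatable.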
By applying the K-means algorithm to the cricket score data,
players can be grouped into clusters based on their performance,
allowing for the identification of patterns and insights into player
performance. This clustering approach enables the analysis of
player similarities and differences, aiding in strategic decision-
making for coaches, managers, and analysts in the cricket domain.
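
Because the assignment step compares players by Euclidean distance, a feature measured on a large scale (total runs, for instance) can dominate one measured on a small scale (batting average). One common remedy, consistent with the standardization mentioned under Pre-Processing, is to z-score each feature before clustering. The sketch below is an assumption about how that could be done; the source does not state which scaling, if any, was applied, and the helper name `standardize` and the sample rows are hypothetical.

```python
import statistics


def standardize(rows):
    """Z-score each column of a list of equal-length numeric tuples."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # pstdev is the population standard deviation; a constant column would
    # give 0, so fall back to 1.0 to keep the division defined.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Made-up (total runs, batting average) rows, purely for illustration.
raw = [(1000.0, 30.0), (3000.0, 40.0), (5000.0, 50.0)]
scaled = standardize(raw)
print(scaled)
```

After scaling, every feature has mean 0 and unit spread, so runs and averages contribute comparably to the distance.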
Hierarchical Clustering:
Hierarchical clustering is utilized in the cricket score data analysis project to group players based on their performance metrics. The algorithm is applied as follows:
1. Initialization: The algorithm starts with each player as a separate cluster and then iteratively merges clusters based on their similarities until a final set of clusters is obtained.
2. Distance Calculation: A distance metric, such as Euclidean distance or Manhattan distance, is used to calculate the similarities between player performance metrics, such as runs scored, average, and strike rate.
3. Cluster Merging: The algorithm merges clusters that are closest to each other based on the calculated distances, creating a hierarchy of clusters.
4. Dendrogram Creation: A dendrogram is constructed to visualize the hierarchical relationships between players and clusters, showing how they are grouped at different levels of similarity.
5. Cluster Interpretation: The final clusters obtained from hierarchical clustering represent groups of players with similar performance characteristics. These clusters can provide insights into player similarities, strengths, and weaknesses, aiding in strategic decision-making for coaches and analysts in the cricket domain.

By applying hierarchical clustering in the cricket score data analysis project, players can be grouped hierarchically based on their performance metrics, allowing for a deeper understanding of player relationships and performance patterns.

Fig-4.2: Data Flow Diagram

5. DATA DEFINITION
Cricket score data analysis refers to the process of collecting, cleaning, transforming and visualizing cricket score data in order to draw insights, identify patterns and make meaningful conclusions about the game of cricket. The aim is to understand various aspects of the game such as player performance, team strategies, and match outcomes by using statistical and machine learning techniques. This helps to make better decisions, improve player performance and enhance the overall spectator experience.

6. MODELING AND ANALYSIS

Cricket score analysis can be used to determine player and team performance over a period of time, identify strengths and weaknesses, and track progress. Analysis can be used to understand match patterns and tendencies, and to create effective playing strategies. By analyzing data from past matches and player statistics, cricket score analysis can also identify emerging trends in the game.
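
The agglomerative procedure described under Hierarchical Clustering in Section 4 can be sketched as follows. The text does not say how the distance between two clusters is measured, so average linkage is assumed here; the function names and the toy (average, strike rate) pairs are likewise hypothetical. The recorded merge list is the information a dendrogram is drawn from; for real work, SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` provide this directly.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, ca, cb):
    """Mean pairwise distance between two clusters given as index lists."""
    total = sum(euclidean(points[i], points[j]) for i in ca for j in cb)
    return total / (len(ca) * len(cb))


def agglomerative(points, n_clusters):
    # Step 1: every player starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, distance), in merge order

    while len(clusters) > n_clusters:
        # Steps 2-3: find the closest pair of clusters and merge it.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(points, clusters[ab[0]], clusters[ab[1]]),
        )
        dist = average_linkage(points, clusters[a], clusters[b])
        # Step 4: the merge history is what a dendrogram visualizes.
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)

    # Step 5: the remaining clusters are the groups to interpret.
    return clusters, merges


# Made-up (batting average, strike rate) pairs, purely for illustration.
players = [
    (25.0, 70.0), (27.0, 72.0), (24.0, 68.0),
    (50.0, 95.0), (52.0, 98.0), (48.0, 92.0),
]
clusters, merges = agglomerative(players, n_clusters=2)
print(clusters)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

Passing `n_clusters=1` runs the merging to completion and yields the full hierarchy. This naive version recomputes every pairwise linkage on each pass, so it is only suitable for small player lists.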

Fig-4.1: Working Architecture

7. CHALLENGES
1. Data Collection: The first challenge in cricket score analysis is collecting accurate and comprehensive data. This includes scoring information, player statistics, team statistics, match conditions, and more.
2. Data Quality: The quality of the data collected is critical in determining the accuracy of the analysis. Factors such as errors in data entry, missing information, and inconsistent data formatting can all impact the quality of the data.
3. Data Integration: Integrating data from different sources, such as from different seasons or from different cricket leagues, can be a challenging task. This is due to the differences in data structure, formatting, and naming conventions used by different sources.
4. Data visualization: Visualizing cricket score data can be challenging, as it is often complex and multidimensional. Data analysts need to be able to present the data in a way that is easy to understand and interpret.
5. Statistical Analysis: Score analysis involves complex statistical analysis, which can be challenging for analysts without a strong statistical background. This includes understanding the underlying distribution of the data and selecting appropriate statistical models.
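
The data quality and data integration issues listed above (duplicate entries, missing values, inconsistent name formatting) can be illustrated with a small cleaning routine. This is a minimal sketch under stated assumptions: the column names `Player`, `Runs`, `Ave` and `SR`, the use of `-` as a missing-value placeholder, and the choice of the column median as the fill value are all hypothetical, since the source does not specify the dataset layout or the imputation rule.

```python
import statistics

NUMERIC_COLUMNS = ("Runs", "Ave", "SR")  # hypothetical column names


def to_number(text):
    """Return a float, or None for blanks, placeholders such as '-', or None."""
    try:
        return float(text.strip())
    except (ValueError, AttributeError):
        return None


def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize naming: collapse repeated whitespace, compare case-insensitively.
        name = " ".join(row.get("Player", "").split())
        key = name.casefold()
        if not name or key in seen:
            continue  # drop unnamed rows and duplicate observations
        seen.add(key)
        record = {"Player": name}
        for col in NUMERIC_COLUMNS:
            record[col] = to_number(row.get(col))
        cleaned.append(record)

    # Fill each missing value with the median of the values that are present.
    for col in NUMERIC_COLUMNS:
        present = [r[col] for r in cleaned if r[col] is not None]
        fill = statistics.median(present) if present else 0.0
        for r in cleaned:
            if r[col] is None:
                r[col] = fill
    return cleaned


# Made-up rows, as they might come out of csv.DictReader.
raw_rows = [
    {"Player": "Player  A", "Runs": "1200", "Ave": "40.0", "SR": "85.5"},
    {"Player": "player a", "Runs": "1200", "Ave": "40.0", "SR": "85.5"},
    {"Player": "Player B", "Runs": "800", "Ave": "-", "SR": "70.0"},
    {"Player": "Player C", "Runs": "", "Ave": "30.0", "SR": "90.0"},
]
cleaned = clean(raw_rows)
for record in cleaned:
    print(record)
```

Deduplicating on the normalized name alone is a simplification; two different players who share a name would need an additional key such as team or career span.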

8. RESULTS
In this graph, we can see the analyzed data of matches played and the players who have played cricket for the longest time. This result helps us understand how many matches each player has played.
Final list of players using Kmeans Clustering Algorithm

Final list of players using Hierarchical Clustering algorithm

In this graph, we can see the analyzed data of matches played and the players with higher average runs. This result helps us understand how many matches each player has played and which players have a good average.

In this graph, we can see the analyzed data of Not Out, Average and strike rate of players. It is produced by the clustering technique, which lets analysts compare players.

9. CONCLUSION AND FUTURE SCOPE

In conclusion, the analysis of cricket score data has provided valuable insights into the performance of players and teams. The data has shown that there are clear patterns in the performance of players, with players in international cricket teams consistently performing better than others. This information can be used by cricket analysts, coaches, and team management to make informed decisions and develop strategies that will help improve the performance of teams and players. Additionally, the analysis can be used by fans to gain a deeper understanding of the game and to better appreciate the skills and strategies involved in cricket. Overall, the analysis of cricket score data has proven to be a valuable tool for improving the understanding and appreciation of the game of cricket.
10. ACKNOWLEDGEMENT
We extend our sincere gratitude to Prof. Basavaraj Neelagund, Assistant Professor at Yenepoya Institute of Technology, for his invaluable insights and contributions to this work. His expertise and feedback have played a significant role in shaping the development and refinement of our cricket player analysis framework.

11. REFERENCES
[1]. Tim Swartz, Paramjit Gill, and Saman Muthukumarana. “Modelling and simulation for one-day cricket”. In: Canadian Journal of Statistics 37 (June 2009), pp. 143–160. doi: 10.1002/cjs.10017.

[2]. Chellapilla Deep Prakash, C. Patvardhan, and C. Vasantha Lakshmi. “Data Analytics based Deep Mayo Predictor for IPL-9”. In: International Journal of Computer Applications 152 (Oct. 2016), pp. 6–11. doi: 10.5120/ijca2016911875.

[3]. Kalpdrum Passi and Niravkumar Pandey. “Increased Prediction Accuracy in the Game of Cricket Using Machine Learning”. In: International Journal of Data Mining & Knowledge Management Process 8 (Mar. 2018), pp. 19–36. doi: 10.5121/ijdkp.2018.8203.

[4]. Rabindra Lamsal and Ayesha Choudhary. Predicting Outcome of Indian Premier League (IPL) Matches Using Machine Learning. 2020. arXiv: 1809.09813 [stat.AP].

[5]. Rahul Chakwate and Madhan R. A. Analysing Long Short Term Memory Models for Cricket Match Outcome Prediction. Nov. 2020.

[6]. P. O'Donoghue, Research Methods for Sports Performance Analysis. Evanston, IL, USA: Routledge, 2009.

[7]. Passi, K. and Pandey, N. (2018). Increased prediction accuracy in the game of cricket using machine learning, International Journal of Data Mining Knowledge Management Process 8: 19–36.

[8]. Parag Shah and Mitesh Shah. “Predicting ODI Cricket Result,” Journal of Tourism, Hospitality and Sports, Vol 5, 2015.

[9]. S. Murdeshwar, “Data Mining on Cricket Data Set for predicting the results”, report in December 2016.
