Predicting BPL Match Winners: An Empirical Study Using Machine Learning Approach
All content following this page was uploaded by Bornita Adhikari on 07 December 2023.
Abstract— With the evolution of computer science, every company is implementing the newest technologies to survive in the market with better decision-making capabilities, better communication, and customer satisfaction. The only means of fulfilling all these criteria is to perform data analysis that is more accurate and pure. In cricket, where no one can guess which team will win until the last ball of the last over, machine learning can help by predicting the results of the games. Match outcome prediction models have a lot of financial incentive because cricket is a multi-billion-dollar industry. The goal of this study is to identify the machine learning model that most accurately predicts the winner given data from the Bangladesh Premier League (BPL). For this analysis, five ML models (XGBoost, Gradient Boosting, KNN, Decision Tree, and Random Forest) have been tested; our proposed model is XGBoost. The BPL dataset was obtained by web scraping; it contains 15 columns and 3239 values, with 8 teams available in each season from 2018 to 2023. We use cutting-edge machine learning techniques based on the use of numerous models, feature selection, and data separation techniques. Finally, by structuring every line of action, the forecast accuracy is attained.

Keywords— BPL, Cricket, Prediction, XGBoost, Visualization, Classification

I. INTRODUCTION

Cricket is a well-liked sport that is played and enjoyed by millions of people around the globe. It is especially popular in countries such as India, Pakistan, Australia, England, South Africa, Sri Lanka, Bangladesh, and the West Indies, where it is considered a national pastime. Cricket is popular because it offers excitement and drama. The sport is known for its high-scoring matches, close finishes, and the individual brilliance of its players. The fast-paced and dynamic nature of Twenty20 cricket has made it particularly popular in recent years, as it offers a shorter and more action-packed version of the game. Cricket now benefits from the growth of technology and the media, which has made it more accessible to fans. Live streaming of matches, social media, and mobile apps have made it easier for fans to stay up to date with the latest news and scores.

The Bangladesh Premier League (BPL) is a professional Twenty20 cricket league that was launched in 2012 and operates on a franchise-based business model. The BPL has become a major Twenty20 event with large amounts of money invested. There have been five different title-winning teams in BPL history, with the Dhaka Dynamites winning the most titles (3). Teams select their players through a draft system in which players are selected based on their performances in the previous season.

Every team wants to give its best, and for this purpose ML prediction can play a significant role by handling uncertainty and predicting the winner of matches from the available data in several ways. ML models can analyze past data and predict how weather conditions and a particular pitch or ground may affect the match's outcome. By analyzing factors such as team performance, team ranking, head-to-head records, the recent form of players, the probability of winning at a particular venue, and the effect on the score of fielding or batting first after winning the toss, machine learning models can predict which team is likely to win.

In our analysis, the performance of each model and future directions are discussed with the goal of predicting the outcome of BPL matches. With the growth of T20 leagues and technological advancements, the demand for cricket winner prediction models is anticipated to expand in the upcoming years, as more fans and teams recognize the value of data-driven insights and the potential competitive advantages they can provide. Our experimental conclusions can help teams optimize their strategies and increase their chances of winning. The prediction was carried out using five machine learning classifiers: XGBoost, Gradient Boosting (GB), k-nearest neighbors (KNN), Decision Tree (DT), and Random Forest (RF). All the models in our research have shown outstanding accuracy. According to the experimental results, XGBoost gives the best prediction accuracy, 93%. We have described the benefits of cricket outcome prediction modeling with a brief introduction to the game. Section II presents related work on cricket outcome prediction. Section III presents the workflow of this analysis, Section IV gives the results, and Section V concludes the work.

II. LITERATURE REVIEW

Cricket has gained a lot of attention among sports commentators as it has progressed. Cricket has been the subject of an increasing amount of research, but because this dataset is brand-new and private, it has not yet been utilized in any research articles. Vistro et al. [1] conducted a study that aimed to predict the winning team in cricket matches using machine learning and data analytics methods. The study incorporated a range of features related to team and player performance, venue, and other match-specific variables to train its models. The study's findings showed that the suggested method was able to predict the winner of cricket matches with an accuracy of more than 70%. The study highlighted the potential of machine learning and data analytics in predicting the winner of cricket matches. In the study of Awan et al. [2], the team scores were predicted using
| City | date | team1 | Team2 | toss-winner | Toss-decision | result | winner | Win-by-runs | Win-by-wickets | value |
|---|---|---|---|---|---|---|---|---|---|---|
| Chattogram | 20/12/2019 | Chattogram Challengers | Comilla Warriors | Comilla Warriors | Field | Normal | Chattogram Challengers | 16 | 0 | Zahur Ahmed Chowdhury Stadium |
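| City | date | team1 | Team2 | toss-winner | Toss-decision | result | winner | Win-by-runs | Win-by-wickets | value |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41 | 0 | 2 | 1 | 1 | 1 | 0 | 16 | 0 | 4 |
| 2 | 8 | 2 | 5 | 4 | 1 | 1 | 6 | 0 | 8 | 1 |
| 2 | 79 | 2 | 5 | 4 | 1 | 1 | 2 | 0 | 4 | 2 |
| 2 | 75 | 0 | 3 | 0 | 0 | 1 | 2 | 0 | 7 | 2 |
| 2 | 76 | 0 | 2 | 0 | 0 | 1 | 1 | 0 | 6 | 2 |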
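The integer-coded rows above are consistent with mapping each categorical value to an integer label. The paper does not state which encoder was used, so the following is only a minimal sketch that assumes a simple sorted-label mapping; the `label_encode` helper and the sample column values are hypothetical.

```python
def label_encode(values):
    """Map each distinct value to an integer code (codes follow sorted order)."""
    mapping = {value: code for code, value in enumerate(sorted(set(values)))}
    return [mapping[value] for value in values], mapping


# Hypothetical sample of a toss-decision column (not taken from the dataset).
toss_decision = ["Field", "Bat", "Field", "Field"]
codes, mapping = label_encode(toss_decision)
print(codes)    # integer codes, one per row
print(mapping)  # the value-to-code lookup that was applied
```

Keeping the returned mapping makes it possible to translate a predicted winner code back into a team name.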
C. Visualize dataset
Scatter plot matrix

In machine learning research, data visualization plays a crucial role in understanding the connections between variables, recognizing patterns, and gaining insights into model performance. One such visual representation is the scatter plot matrix of the features in xtrain, which displays the relationships between pairs of features (or variables) in the training dataset. To predict the winner of BPL matches, this research selected five columns (city, date, team1, team2, and toss winner) to plot on the X and Y axes in a grid of scatter plots. Each plot within the grid shows the relationship between two features, while the diagonal of the grid shows the distribution of each feature. The correlations observed in the dataset are shown in Figure 2.

Correlation matrix heatmap

The correlation matrix heatmap visualizes a table of the correlation coefficients between pairs of variables in a dataset. These coefficients are statistical measures that indicate the degree of correlation between two variables. Correlation coefficient values range from -1 to 1, where -1 denotes a fully negative correlation, 1 denotes a fully positive correlation, and 0 denotes no association. The resulting plot is a color-coded matrix, with red hues representing positive correlations and blue tones representing negative correlations. This display is shown in Figure 3.
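As a rough illustration of the numbers behind such a heatmap, the sketch below computes Pearson correlation coefficients for the five plotted columns, using only the five encoded sample rows shown earlier. It assumes the Pearson coefficient is the one intended (the paper does not name it), and the helper names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict with the coefficient for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# The five encoded sample rows from the dataset excerpt, column by column.
columns = {
    "city": [0, 2, 2, 2, 2],
    "date": [41, 8, 79, 75, 76],
    "team1": [0, 2, 2, 0, 0],
    "team2": [2, 5, 5, 3, 2],
    "toss_winner": [1, 4, 4, 0, 0],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

Each printed row corresponds to one row of the heatmap; a plotting library would then map values near 1 to red and values near -1 to blue.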
KNN 79%

Random Forest is an ensemble learning method that uses numerous decision trees to predict outcomes.
In order to make a prediction using this algorithm, the input
data is processed through each decision tree from the root
node to a leaf node. Once the data has reached the leaf node
of each tree, the algorithm produces a prediction based on
either the majority class or the average prediction value of all
the decision trees within the Random Forest.
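The aggregation step described above can be sketched as follows. This shows only how per-tree outputs are combined, not how the trees are trained; the per-tree predictions are hypothetical values chosen for illustration.

```python
from collections import Counter
from statistics import mean


def majority_vote(tree_predictions):
    """Classification: return the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average_prediction(tree_predictions):
    """Regression: return the mean of the per-tree predictions."""
    return mean(tree_predictions)


# Hypothetical outputs of five trees for a single match.
votes = [
    "Chattogram Challengers",
    "Comilla Warriors",
    "Chattogram Challengers",
    "Chattogram Challengers",
    "Comilla Warriors",
]
print(majority_vote(votes))
print(average_prediction([0.2, 0.4, 0.6]))
```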
Gradient Boosting
Gradient Boosting is a technique that involves combining
several weak learners to form a powerful one. To make
predictions, the input data is first fed into the weak learner,
and the errors in the initial prediction are determined. To
improve the accuracy of the prediction, a new weak learner is
then trained to correct the previous learner. The final
prediction is obtained by adding up the predictions from all
the weak learners.
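The residual-fitting loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the configuration used in the study: it uses squared-error regression on a single feature, one-split stumps as weak learners, and a learning rate that scales each learner's contribution. All names and the toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump (a weak learner) to the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_rounds=20, learning_rate=0.5):
    """Train stumps one after another, each on the errors left by the ensemble so far."""
    base = sum(y) / len(y)  # initial prediction: the mean target
    stumps = []

    def predict(xi):
        # Final prediction: initial value plus the scaled sum of all weak learners.
        return base + learning_rate * sum(stump(xi) for stump in stumps)

    for _ in range(n_rounds):
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return predict


def mse(predict, x, y):
    """Mean squared error of a prediction function on (x, y)."""
    return sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical toy data with a jump between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
baseline = gradient_boost(x, y, n_rounds=0)
boosted = gradient_boost(x, y, n_rounds=20)
print(mse(baseline, x, y), mse(boosted, x, y))
```

With zero rounds the model predicts the mean target for every input; each added stump reduces the training error by fitting what the earlier learners got wrong.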
XGBoost
XGBoost is a well-known machine learning model that is
widely used for regression, classification, and ranking tasks.
It is an ensemble model that consists of multiple decision trees
and leverages the errors of previous trees to enhance its
predictions. During the prediction process, the input data is
processed through various decision trees, and the scores
generated by each tree are merged to produce a final
prediction. Finally, a non-linear function like the sigmoid or
softmax function is applied to convert the output into a probability.

Fig. 4. Accuracy comparison
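The last step of the XGBoost description, merging per-tree scores and passing the result through a sigmoid or softmax, can be illustrated with a small sketch. It assumes the per-tree scores are simply summed before the non-linear function is applied; the score values are hypothetical and no XGBoost API is used here.

```python
import math


def sigmoid(score):
    """Map a raw score to a value in (0, 1) for binary classification."""
    return 1.0 / (1.0 + math.exp(-score))


def softmax(scores):
    """Map one raw score per class to values that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores produced by three trees for one match (binary case).
tree_scores = [0.4, -0.1, 0.7]
probability = sigmoid(sum(tree_scores))
print(probability)

# Hypothetical summed scores for three candidate winners (multi-class case).
class_probabilities = softmax([2.0, 0.5, -1.0])
print(class_probabilities)
```

In the multi-class case the predicted winner is the class with the largest softmax value.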
... the training data. Predictions were then made on the test data
using the predict method. The evaluation of our model's
performance was done using accuracy as the primary metric.
In machine learning, performance metrics are
used to assess the effectiveness of different models. Accuracy
is a useful metric that measures the proportion of correctly
predicted instances in the dataset. Our XGBoost Classifier
achieved an accuracy of 94.25%, indicating a satisfactory
performance of our model.
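Accuracy as defined above is simply the share of predictions that match the true labels. A minimal sketch, using hypothetical labels rather than the study's test set:

```python
def accuracy(y_true, y_pred):
    """Proportion of instances whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical encoded winners for eight test matches.
y_true = [0, 2, 2, 1, 6, 0, 2, 1]
y_pred = [0, 2, 2, 1, 6, 0, 1, 1]
print(accuracy(y_true, y_pred))
```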
A. Confusion Matrix:
The counts of true positives, true negatives, false positives,
and false negatives for each class in the dataset are shown in
the confusion matrix. In case the model is producing a lot of
false positives, we can attempt to enhance its performance by
modifying the decision threshold or experimenting with
different features. Figure 5 illustrates the confusion matrix of
the models that were tested.

| Team | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Khulna Tigers | 0.91 | 1.00 | 0.95 | 20 |
| Comilla Warriors | 1.00 | 1.00 | 1.00 | 7 |
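The four confusion-matrix counts, and the per-class scores derived from them, can be computed as in the sketch below. The labels are hypothetical: they are one set of counts (20 correct Khulna predictions plus 2 false positives) that would reproduce the Khulna figures reported above if those are read as precision, recall, F1-score, and support. The actual counts are not given in the text.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, and TN, treating `positive` as the class of interest."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


def precision_recall_f1(counts):
    """Derive precision, recall, and F1 from the confusion counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 20 Khulna matches, all found, plus 2 false positives.
y_true = ["Khulna"] * 20 + ["Other"] * 5
y_pred = ["Khulna"] * 22 + ["Other"] * 3
counts = confusion_counts(y_true, y_pred, positive="Khulna")
precision, recall, f1 = precision_recall_f1(counts)
print(dict(counts))
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

A high false-positive count lowers precision while leaving recall unchanged, which is why the text suggests adjusting the decision threshold in that situation.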
[1] Vistro, Daniel Mago, Faizan Rasheed, and Leo Gertrude David. "The
cricket winner prediction with application of machine learning and data
analytics." International Journal of Scientific & Technology Research
8, no. 09 (2019).
[2] Awan, Mazhar Javed, Syed Arbaz Haider Gilani, Hamza
Ramzan, Haitham Nobanee, Awais Yasin, Azlan Mohd Zain, and
Rabia Javed. "Cricket match analytics using the big data approach."
Electronics 10, no. 19 (2021): 2350.
[3] Tekade, Pallavi, Kunal Markad, Aniket Amage, and Bhagwat
Natekar. "Cricket match outcome prediction using machine learning."
International journal 5, no. 7 (2020).
[4] Sankaranarayanan, Vignesh Veppur, Junaed Sattar, and Laks VS
Lakshmanan. "Auto-play: A data mining approach to ODI cricket
simulation and prediction." In Proceedings of the 2014 SIAM
international conference on data mining, pp. 1064-1072. Society for
Industrial and Applied Mathematics, 2014.
[5] Mittal, Harsh, Deepak Rikhari, Jitendra Kumar, and Ashutosh
Kumar Singh. "A study on machine learning approaches for player
performance and match results prediction." arXiv preprint
arXiv:2108.10125 (2021).
[6] Mustafa, Raza Ul, M. Saqib Nawaz, M. Ikram Ullah Lali,
Tehseen Zia, and Waqar Mehmood. "Predicting the cricket match
outcome using crowd opinions on social networks: A comparative
study of machine learning methods." Malaysian Journal of Computer
Science 30, no. 1 (2017): 63-76.
[7] Passi, Kalpdrum, and Niravkumar Pandey. "Increased prediction
accuracy in the game of cricket using machine learning." arXiv preprint
arXiv:1804.04226 (2018).
[8] Kamble, R. R. "Cricket score prediction using machine learning."
Turkish Journal of Computer and Mathematics Education
(TURCOMAT) 12, no. 1S (2021): 23-28.
[9] Passi, Kalpdrum, and Niravkumar Pandey. "Predicting players’
performance in one day international cricket matches using machine