Project Report BDA
ON
Anurag Prajapati
Jainam Parmar
Vishal Pandey
October - 2024
Declaration
We declare that the work embodied in this mini project titled “Selection of best-11
players of T20 Cricket Worldcup” is our own contribution, carried out under the
guidance of Prof. Bhavesh Panchal and Dr Shikha Gupta at the Rajiv Gandhi Institute of
Technology.
We declare that this written submission represents our ideas in our own words and that, where
others' ideas or words have been included, we have adequately cited and referenced the original
sources. We also declare that we have adhered to all principles of academic honesty and integrity
and have not misrepresented, fabricated, or falsified any idea, data, fact, or source in our
submission. We understand that any violation of the above will be cause for disciplinary action by
the Institute and can also evoke penal action from the sources which have not been properly cited
or from whom proper permission has not been taken when needed.
Keywords: Power BI, Data Analytics, Web Scraping, Python, Data Preprocessing, Sports
Analytics, Business Intelligence.
Contents
1 Introduction
2 Problem Statement
  Aim & Objectives
3 Proposed System
  System Architecture
4 Feature Selection
5 Implementation & Results
6 Future Scope
7 Conclusion
8 References
LIST OF FIGURES
Figure No.  Name
1           Selection of openers, wicketkeeper, bowlers
2           Combined Performance of Whole Team
3           Profile Summary
4           Average, Strike Rates
5           Finalised 11 Players
Chapter 1
Introduction
The project proceeds in several steps, starting with data collection in Python, where web
scraping was used to gather current player statistics. The collected data then went through a
preprocessing phase: cleaning and transformation ensured that the dataset was accurate,
consistent, and ready for analysis, and normalization brought all metrics onto a comparable scale.
The cleaned data was exported to CSV and imported into Power BI, where further preprocessing,
such as creating calculated columns and measures, made the dataset easier to work with.
The final deliverable is a Power BI dashboard in which the performance of different players can
be analyzed interactively. Users can hover over individual players to see their performance in
specific matches. The system also includes an alert mechanism that notifies users if they select
more than 11 players for the final team. The project shows how data analytics can support
selecting the best possible cricket team from current performance data.
Chapter 2
By fulfilling these objectives, the project ensures a seamless integration of data analytics and sports
management, offering a robust solution for team selection.
Chapter 3
Proposed System
The proposed system is designed to automate the selection of cricket players for a T20 team by
combining web scraping, data preprocessing, and business intelligence tools. It starts by
scraping the latest player statistics from ESPNcricinfo, so that decisions rest on the most
current data. Python is used to extract and preprocess this data: cleaning it, transforming raw
data into structured formats, and normalizing player metrics so that all performance indicators
are on a uniform scale.
Once the data was preprocessed in Python, it was exported to CSV files, which were then
transferred to Power BI for further processing. In Power BI, additional calculated columns and
measures were created to extract essential player metrics such as batting average, bowling strike
rate, and overall contribution to the team. These metrics are critical for evaluating player
performance across different roles in the game.
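The two metrics named above follow standard cricket definitions: batting average is runs scored per dismissal, and bowling strike rate is balls bowled per wicket taken. In the project these are computed as Power BI calculated columns and measures; the Python sketch below only illustrates the arithmetic, and the function names and sample figures are assumptions made for the example.

```python
from typing import Optional


def batting_average(runs: int, dismissals: int) -> Optional[float]:
    """Runs scored per dismissal; undefined (None) if the batter was never out."""
    if dismissals == 0:
        return None
    return runs / dismissals


def bowling_strike_rate(balls_bowled: int, wickets: int) -> Optional[float]:
    """Balls bowled per wicket; undefined (None) if no wicket was taken."""
    if wickets == 0:
        return None
    return balls_bowled / wickets


# Illustrative figures only, not taken from the project's dataset.
print(batting_average(296, 8))        # 37.0
print(bowling_strike_rate(144, 12))   # 12.0
```

Returning None for the undefined cases keeps a division by zero from appearing as a misleading number in later comparisons.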
The system's primary interface is the Power BI dashboard, which enables users to interact with the
data. They can hover over individual players to view match-wise performance data, making it
easier to compare players across different matches. The dashboard categorizes players into five
roles: openers, middle-order batsmen, finishers, all-rounders, and fast bowlers. A key feature of
the system is the alert mechanism that notifies users if they attempt to select more than 11 players,
ensuring the team composition adheres to the rules of the game. The system's overall goal is to
provide a comprehensive, interactive, and data-driven approach to team selection.
System Architecture
The system architecture follows a multi-step process, integrating Python for data collection and
preprocessing with Power BI for data visualization and decision-making. The first step in the
architecture is Data Collection, which is achieved through web scraping techniques using Python.
ESPNcricinfo serves as the primary data source, offering a vast repository of player statistics,
including performance data across multiple matches and tournaments.
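The report does not list the scraping code, so the following is only a minimal sketch of the parsing half of that step, using Python's standard html.parser module. The HTML snippet, column names, and player names are invented for illustration and do not reflect the actual ESPNcricinfo page structure; fetching the page over the network is left out.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_scorecard(html):
    """Turn the first row into column names and the rest into records."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical markup standing in for a downloaded statistics page.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Runs</th><th>Balls</th></tr>
  <tr><td>Player A</td><td>54</td><td>36</td></tr>
  <tr><td>Player B</td><td>12</td><td>10</td></tr>
</table>
"""

records = parse_scorecard(SAMPLE_HTML)
print(records)
```

All values come out as strings at this stage; converting them to numbers belongs to the preprocessing step.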
Once the raw data is collected, it undergoes Data Preprocessing. Using Python libraries such as
Pandas and NumPy, the data is cleaned by handling missing values, correcting inconsistencies,
and transforming the raw data into a structured format. Additionally, data normalization is applied
to bring different performance metrics, such as runs, strike rate, wickets, and economy, onto a
comparable scale. After preprocessing, the cleaned data is exported as CSV files, which are ready
to be loaded into Power BI.
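The preprocessing steps described above can be sketched with Pandas as follows. The report does not state which normalization was used, so min-max scaling is an assumption here, as are the column names, the sample rows, and the choice to drop rows with missing metrics instead of imputing them.

```python
import pandas as pd

# Invented sample rows: one duplicate record and one missing value.
raw = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player B"],
    "runs": ["120", "85", None, "85"],
    "strike_rate": [150.0, 125.0, 110.0, 125.0],
})


def preprocess(df, metric_cols):
    """Deduplicate, coerce metrics to numbers, drop incomplete rows, min-max scale."""
    out = df.drop_duplicates().copy()
    for col in metric_cols:
        # Scraped values arrive as text; unparseable entries become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.dropna(subset=metric_cols).reset_index(drop=True)
    for col in metric_cols:
        lo, hi = out[col].min(), out[col].max()
        # Min-max scaling to [0, 1]; a constant column maps to 0.0.
        out[col + "_norm"] = 0.0 if hi == lo else (out[col] - lo) / (hi - lo)
    return out


clean = preprocess(raw, ["runs", "strike_rate"])
csv_text = clean.to_csv(index=False)  # pass a file path instead to write a CSV for Power BI
print(csv_text)
```

Keeping the original columns next to the normalized ones lets the dashboard show real figures while ranking on the scaled values.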
In Power BI Integration, the CSV data is imported, and additional preprocessing is performed.
Power BI’s calculated columns and measures are used to derive key metrics like batting averages,
bowling strike rates, and total contributions, offering deeper insights into each player’s
performance. These calculated metrics allow for the precise evaluation of players across different
roles, making the data more actionable.
The final stage involves the Dashboard Creation in Power BI. The dashboard serves as a visual
interface for users, providing an intuitive platform to explore and analyze player data. It includes
filters for selecting player roles and a hover function that shows detailed performance data for each
player. The system also incorporates a Selection Mechanism that limits users to selecting a
maximum of 11 players, with alerts generated if this rule is violated. This system architecture
ensures a streamlined, efficient, and data-driven player selection process.
Chapter 4
Feature Selection
Feature selection is a critical component of this system, as it directly influences the effectiveness
of the player selection process. For this project, we focused on selecting features that provide a
comprehensive view of each player's abilities, enabling a holistic evaluation. The features are
divided into performance metrics and player roles.
One of the primary features is the Player Role categorization, which groups players into five
distinct categories: openers, middle-order batsmen, finishers, all-rounders, and fast bowlers. This
role-based classification ensures that the selected team is balanced and includes players who excel
in different aspects of the game. For example, openers are selected based on their ability to perform
under powerplay conditions, while all-rounders are judged on both their batting and bowling
contributions.
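One way to picture role-based selection is to rank players within each role and fill a fixed number of slots per role. The sketch below is illustrative only: the report does not specify how many players are taken per role or how players are scored, so the slot counts, the "score" field, and the player names are all hypothetical.

```python
from collections import defaultdict


def pick_by_role(players, slots):
    """Return the top-scoring players for each role, up to that role's slot count."""
    by_role = defaultdict(list)
    for player in players:
        by_role[player["role"]].append(player)
    team = []
    for role, count in slots.items():
        ranked = sorted(by_role[role], key=lambda p: p["score"], reverse=True)
        team.extend(ranked[:count])
    return team


# Hypothetical candidates with a precomputed composite score.
candidates = [
    {"name": "Opener 1", "role": "opener", "score": 0.9},
    {"name": "Opener 2", "role": "opener", "score": 0.7},
    {"name": "Bowler 1", "role": "fast_bowler", "score": 0.4},
    {"name": "Bowler 2", "role": "fast_bowler", "score": 0.8},
]

# Hypothetical slot counts; a full team would cover all five roles and sum to 11.
team = pick_by_role(candidates, {"opener": 1, "fast_bowler": 1})
print([p["name"] for p in team])  # ['Opener 1', 'Bowler 2']
```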
In terms of Performance Metrics, we considered a variety of features. For batsmen, critical metrics
include runs scored, strike rate, and batting average. For bowlers, we focused on wickets taken,
bowling average, and economy rate. These features provide a well-rounded assessment of both
batting and bowling performances, making it easier to compare players. Fielding metrics were also
considered, though they were not a primary focus for this iteration.
To enable more granular analysis, the dashboard includes a Match-by-Match Performance view,
which allows users to hover over a player’s name and see their performance in each individual
match. This feature is particularly useful for identifying consistency in player performance.
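Consistency can also be expressed numerically from the same match-by-match data. The report relies on visual inspection in the dashboard; the sketch below is an assumed alternative that summarizes a player's innings by mean and population standard deviation, where a smaller spread suggests steadier performance. The run figures are invented.

```python
from statistics import mean, pstdev


def consistency(runs_by_match):
    """Return (average runs, spread of runs) across a player's matches."""
    return mean(runs_by_match), pstdev(runs_by_match)


steady = consistency([40, 40, 40])   # same average, no variation
erratic = consistency([0, 120, 0])   # same average, large variation
print(steady, erratic)
```

Both players average 40 runs, but the second has a much larger spread, which is the kind of difference the match-by-match view is meant to reveal.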
Another key feature is the Selection Alert System, which ensures that users cannot select more
than 11 players, adhering to team composition rules. If a user attempts to add more than 11 players,
a notification is triggered, alerting them to make adjustments. These features collectively enhance
the system’s functionality and ensure that team selection is both efficient and balanced.
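The logic behind the alert is a simple count check. In the project it lives inside the Power BI dashboard; the Python function below is only a sketch of the rule, with a hypothetical function name and message text.

```python
MAX_PLAYERS = 11


def check_selection(selected):
    """Return an alert message if more than 11 distinct players are selected, else None."""
    unique = list(dict.fromkeys(selected))  # drop repeats, keep selection order
    if len(unique) > MAX_PLAYERS:
        extra = len(unique) - MAX_PLAYERS
        return f"Too many players selected: remove {extra} to get back to {MAX_PLAYERS}."
    return None


print(check_selection([f"Player {i}" for i in range(11)]))  # None
print(check_selection([f"Player {i}" for i in range(12)]))
```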
Chapter 5
Chapter 7
Conclusion
This project showcases the application of data analytics and business intelligence in the realm of
cricket team selection. By integrating web scraping techniques, Python-based preprocessing, and
Power BI’s visualization capabilities, the system provides a comprehensive platform for selecting
the best 11 players from the T20 World Cup data. The Power BI dashboard not only simplifies the
data analysis process but also makes it interactive, allowing users to make data-driven decisions
with ease.
The system’s features, such as the role-based categorization of players, performance
metrics, and hover-enabled match analysis, ensure that users have all the necessary information at
their fingertips. The alert mechanism that prevents the selection of more than 11 players ensures
compliance with cricket team composition rules, further enhancing the system’s utility.
The project demonstrates how advanced analytics tools can streamline and optimize traditionally
manual processes like player selection. By automating data collection and processing, it reduces
the time and effort required to make strategic decisions, allowing for more accurate and objective
outcomes. This system serves as a model for how data analytics can be applied not only in cricket
but across various sports, offering future potential for scalability and adaptation to different
formats and disciplines.
Chapter 8
References
1. H. J., & F. K. (2020). "Python in Sports Analytics: An Overview." Journal of Sports
Science and Technology, 8(1), 23-35.
5. McKinney, W. (2018). "Python for Data Analysis: Data Wrangling with Pandas, NumPy,
and IPython." O'Reilly Media.