Project ReportBDA

Uploaded by

Kshitij Parad

A MINI-PROJECT REPORT

ON

Big Data Analysis

“Selection of best-11 players of T20 Cricket World Cup”


BY

Anurag Prajapati
Jainam Parmar
Vishal Pandey

Under the guidance of


Prof. Bhavesh Panchal & Dr. Shikha Gupta

Juhu-Versova Link Road Versova, Andheri(W), Mumbai-53

Department of Computer Engineering


University of Mumbai

October - 2024
Declaration

We wish to state that the work embodied in this mini project titled “Selection of best-11
players of T20 Cricket World Cup” forms our own contribution to the work carried out under the
guidance of “Prof. Bhavesh Panchal & Dr. Shikha Gupta” at the Rajiv Gandhi Institute of
Technology.
We declare that this written submission represents our ideas in our own words and where
others' ideas or words have been included, we have adequately cited and referenced the original
sources. We also declare that we have adhered to all principles of academic honesty and integrity
and have not misrepresented or fabricated or falsified any idea/data/fact/source in our submission.
We understand that any violation of the above will be cause for disciplinary action by the Institute
and can also evoke penal action from the sources which have thus not been properly cited or from
whom proper permission has not been taken when needed.

Anurag Prajapati (B724)

Jainam Parmar (B713)

Vishal Pandey (B710)


Abstract
The objective of this project is to create a dynamic and interactive Power BI dashboard for
selecting the best 11 players from the T20 Cricket World Cup data. The selection process is
divided into four key roles: opener, middle order, finisher/all-rounders, and fast bowlers, ensuring
a balanced and competitive team composition. The data used in this project is sourced through
web scraping from ESPNcricinfo, capturing recent and relevant player
statistics and performance metrics. This raw data is then subjected to extensive preprocessing in
Python, where we perform data cleaning, transformation, and normalization to ensure accuracy
and consistency. By leveraging Python's powerful data manipulation libraries, we prepare the data
for insightful analysis and visualization. The Power BI dashboard serves as the final deliverable,
providing an intuitive interface for users to explore and analyze the data, enabling them to make
informed decisions in selecting the ideal T20 cricket team. This project not only highlights the
practical application of data analytics in sports but also demonstrates the seamless integration of
web scraping, data preprocessing, and business intelligence tools to derive meaningful insights
and drive strategic decision-making in sports management.

Keywords: Power BI, Data Analytics, Web Scraping, Python, Data Preprocessing, Sports
Analytics, Business Intelligence.
Contents

1 Introduction
2 Problem Statement
  Aim & Objectives
3 Proposed System
  System Architecture
4 Feature Selection
5 Implementation & Results
6 Future Scope
7 Conclusion
8 References


LIST OF FIGURES

Figure No.   Name
1            Fig1: Selection of openers, Wk, bowlers
2            Fig2: Combined Performance of Whole team
3            Fig3: Profile Summary
4            Fig4: Average, Strike Rates
5            Fig5: Finalised 11 players

CHAPTER 1

INTRODUCTION

In today’s competitive sports environment, data-driven decision-making plays a critical role in
analyzing and predicting player performance. This project focuses on building an interactive
dashboard using Power BI for selecting the best 11 players from the T20 Cricket World Cup data.
Cricket team composition requires strategic decision-making, balancing various player roles such
as openers, middle-order batsmen, finishers, all-rounders, and fast bowlers. Given the abundance
of data available from multiple sources, making an informed decision about which players to
select can be a daunting task. This project leverages web scraping techniques to extract relevant
data from ESPNcricinfo, a comprehensive database that provides up-to-date player statistics.

The project involves several essential steps, starting with data collection using Python, where
web scraping techniques were applied to gather current player data. Following data
collection, the data underwent an extensive preprocessing phase. Data cleaning and
transformation were critical to ensure that the dataset was accurate, consistent, and ready for
analysis. The data was then normalized to bring all metrics to a comparable scale. The cleaned
data was exported into CSV format and subsequently imported into Power BI, where further
preprocessing steps such as creating calculated columns and measures were performed to enhance
the usability of the dataset.

The Power BI dashboard created as the final deliverable serves as a user-friendly platform where
the performance of different players can be analyzed interactively. The users can hover over
individual players to gain insights into their performance in specific matches. The system also
incorporates an alert mechanism to notify users if they select more than 11 players for the final
team. This project highlights how data analytics can provide meaningful insights that assist in
selecting the best possible cricket team based on current performance data.
Chapter 2

2.1 Problem Statement


Selecting the best players for a cricket team, particularly in the fast-paced T20 format, requires
analyzing multiple data points such as batting averages, strike rates, bowling economy, and
fielding performance. The traditional approach of relying on intuition or past experience is not
only time-consuming but can lead to suboptimal decisions. The primary challenge lies in
evaluating players based on diverse metrics and roles, such as openers, middle-order batsmen,
finishers, all-rounders, and bowlers. The proposed solution automates the process of player
selection by gathering performance data, visualizing it, and providing alerts to prevent errors in
team composition.

2.2 Aim & Objectives


The primary aim of this project is to develop a comprehensive, data-driven system that automates
the process of selecting the best 11 players for a cricket team from the T20 World Cup data. This
system leverages the power of web scraping, data preprocessing, and business intelligence tools
like Power BI to provide an interactive platform for optimal team selection based on performance
metrics. By doing so, the project addresses the complexities and challenges associated with manual
team selection, offering a more objective, data-informed approach.

The specific objectives of the project are:

1. Data Acquisition: To collect player statistics from ESPNcricinfo using Python-based web
scraping techniques. This ensures that the most up-to-date and relevant data is used for team
selection.
2. Data Preprocessing: To clean, transform, and normalize the raw data, making it suitable for
analysis. This step is critical for ensuring the accuracy and consistency of the data.
3. Power BI Integration: To import the preprocessed data into Power BI and create calculated
columns and measures for analyzing player performance.
4. Dashboard Creation: To design an intuitive Power BI dashboard that enables users to interact
with the data, analyze player performance, and make informed decisions.
5. Team Selection Mechanism: To implement a selection system that allows users to choose a
maximum of 11 players, with alerts generated if the limit is exceeded.
6. Performance Visualization: To enable users to hover over individual players to view detailed
match-by-match performance, aiding in the decision-making process.

By fulfilling these objectives, the project ensures a seamless integration of data analytics and sports
management, offering a robust solution for team selection.
Chapter 3

Proposed System
The proposed system is designed to automate the selection process of cricket players for a T20
team by leveraging advanced web scraping, data preprocessing, and business intelligence tools.
The system starts with web scraping to gather the latest player statistics from ESPNcricinfo,
ensuring that the most current data is used for decision-making. Python was utilized to extract and
preprocess this data, performing operations such as data cleaning, transforming raw data into
structured formats, and normalizing player metrics to ensure uniformity across all performance
indicators.
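
The scraping step described above can be illustrated with a minimal sketch. It does not reproduce
the project's actual scraper: the HTML below is a hypothetical stand-in for a downloaded statistics
page (it is not the real ESPNcricinfo markup), the names SAMPLE_HTML, TableParser and
parse_stats_table are invented for illustration, and only Python's standard library is used so the
example stays self-contained.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded stats page.
# This is NOT the real ESPNcricinfo page structure.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Runs</th><th>SR</th></tr>
  <tr><td>Player A</td><td>212</td><td>141.33</td></tr>
  <tr><td>Player B</td><td>-</td><td>-</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # completed rows, each a list of cell strings
        self._row = None        # row currently being read, or None
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_stats_table(html):
    """Return the table body as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = parse_stats_table(SAMPLE_HTML)
for record in records:
    print(record)
```

Every value comes back as a string, including placeholders such as "-", which is why a separate
cleaning step is needed before any metric can be computed.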

Once the data was preprocessed in Python, it was exported to CSV files, which were then
transferred to Power BI for further processing. In Power BI, additional calculated columns and
measures were created to extract essential player metrics such as batting average, bowling strike
rate, and overall contribution to the team. These metrics are critical for evaluating player
performance across different roles in the game.
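
For reference, the two named metrics follow the standard cricket definitions: batting average is
runs scored per dismissal, and bowling strike rate is balls bowled per wicket. The report computes
them as Power BI calculated columns and measures; the Python sketch below only illustrates the
arithmetic, and the function names are assumptions. "Overall contribution" is not defined in the
report, so it is left out.

```python
def batting_average(runs, dismissals):
    """Runs scored per dismissal; None when the batter was never dismissed."""
    return runs / dismissals if dismissals else None


def bowling_strike_rate(balls_bowled, wickets):
    """Balls bowled per wicket taken; None when no wicket was taken."""
    return balls_bowled / wickets if wickets else None


# Illustrative numbers only, not taken from the project's dataset.
print(batting_average(240, 6))       # 240 runs, dismissed 6 times
print(bowling_strike_rate(96, 8))    # 96 balls bowled, 8 wickets
print(batting_average(55, 0))        # never dismissed: average is undefined
```

Returning None for the undefined cases keeps a division by zero from silently becoming a
misleading number in the dashboard.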

The system's primary interface is the Power BI dashboard, which enables users to interact with the
data. They can hover over individual players to view match-wise performance data, making it
easier to compare players across different matches. The dashboard categorizes players into five
roles: openers, middle-order batsmen, finishers, all-rounders, and fast bowlers. A key feature of
the system is the alert mechanism that notifies users if they attempt to select more than 11 players,
ensuring the team composition adheres to the rules of the game. The system's overall goal is to
provide a comprehensive, interactive, and data-driven approach to team selection.
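
The 11-player rule itself is simple to state in code. The sketch below is a Python illustration of
the rule, not the Power BI implementation used in the dashboard (which the report does not show);
the names MAX_PLAYERS and check_selection and the wording of the message are assumptions.

```python
MAX_PLAYERS = 11  # size of a cricket playing eleven


def check_selection(selected):
    """Return an alert message if too many players are selected, else None."""
    # Remove duplicate names while preserving the order of selection.
    unique = list(dict.fromkeys(selected))
    if len(unique) > MAX_PLAYERS:
        extra = len(unique) - MAX_PLAYERS
        return (f"Alert: {len(unique)} players selected; "
                f"remove {extra} to reach {MAX_PLAYERS}.")
    return None


eleven = [f"Player {i}" for i in range(1, 12)]
print(check_selection(eleven))                    # no alert
print(check_selection(eleven + ["Player 12"]))    # alert for one extra player
```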

System Architecture
The system architecture follows a multi-step process, integrating Python for data collection and
preprocessing with Power BI for data visualization and decision-making. The first step in the
architecture is Data Collection, which is achieved through web scraping techniques using Python.
ESPNcricinfo serves as the primary data source, offering a vast repository of player statistics,
including performance data across multiple matches and tournaments.

Once the raw data is collected, it undergoes Data Preprocessing. Using Python’s powerful libraries
like Pandas and NumPy, the data is cleaned by handling missing values, correcting inconsistencies,
and transforming the raw data into a structured format. Additionally, data normalization is applied
to bring different performance metrics, such as runs, strike rate, wickets, and economy, onto a
comparable scale. After preprocessing, the cleaned data is exported as CSV files, which are ready
to be loaded into Power BI.
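
A minimal sketch of this preprocessing stage is given below. The report names Pandas and NumPy but
does not say which normalization was applied, so min-max scaling is an assumption here, and the
sketch uses only the standard library to stay self-contained. The helper names, the sample records
and the column names are all hypothetical.

```python
import csv
import io


def to_float(value):
    """Convert a scraped string to float; treat '', '-' and None as missing."""
    if value is None or value.strip() in ("", "-"):
        return None
    return float(value)


def min_max_normalize(values):
    """Scale non-missing values to the range [0, 1]; missing values stay None."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    # If every value is identical there is no spread, so map them all to 0.0.
    return [None if v is None else ((v - lo) / span if span else 0.0)
            for v in values]


# Hypothetical scraped records; every field is still a string at this point.
raw = [
    {"Player": "Player A", "Runs": "212", "SR": "141.33"},
    {"Player": "Player B", "Runs": "98", "SR": "-"},
    {"Player": "Player C", "Runs": "155", "SR": "120.50"},
]

runs = [to_float(r["Runs"]) for r in raw]
runs_norm = min_max_normalize(runs)

# Write the cleaned table as CSV text, the format later imported into Power BI.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Player", "Runs", "RunsNorm"],
                        lineterminator="\n")
writer.writeheader()
for record, r, n in zip(raw, runs, runs_norm):
    writer.writerow({"Player": record["Player"], "Runs": r, "RunsNorm": n})
csv_text = buffer.getvalue()
print(csv_text)
```

Min-max scaling puts runs, strike rate, wickets and economy on the same 0 to 1 range, but for
economy rate a lower value is better, so that column would need to be inverted before it is
combined with the others.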

In the Power BI Integration stage, the CSV data is imported, and additional preprocessing is performed.
Power BI’s calculated columns and measures are used to derive key metrics like batting averages,
bowling strike rates, and total contributions, offering deeper insights into each player’s
performance. These calculated metrics allow for the precise evaluation of players across different
roles, making the data more actionable.

The final stage involves the Dashboard Creation in Power BI. The dashboard serves as a visual
interface for users, providing an intuitive platform to explore and analyze player data. It includes
filters for selecting player roles and a hover function that shows detailed performance data for each
player. The system also incorporates a Selection Mechanism that limits users to selecting a
maximum of 11 players, with alerts generated if this rule is violated. This system architecture
ensures a streamlined, efficient, and data-driven player selection process.
Chapter 4

Feature Selection

Feature selection is a critical component of this system, as it directly influences the effectiveness
of the player selection process. For this project, we focused on selecting features that provide a
comprehensive view of each player's abilities, enabling a holistic evaluation. The features are
divided into performance metrics and player roles.

One of the primary features is the Player Role categorization, which groups players into five
distinct categories: openers, middle-order batsmen, finishers, all-rounders, and fast bowlers. This
role-based classification ensures that the selected team is balanced and includes players who excel
in different aspects of the game. For example, openers are selected based on their ability to perform
under powerplay conditions, while all-rounders are judged on both their batting and bowling
contributions.
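
One way to picture role-balanced selection is to rank players within each role and take a fixed
number from each. In the report the final choice is made by the user on the dashboard, so the
sketch below is only an illustration: the function pick_by_role, the per-role quota and the single
"score" field are all assumptions.

```python
from collections import defaultdict


def pick_by_role(players, quota, score_key):
    """Pick the top-scoring players for each role, up to that role's quota."""
    by_role = defaultdict(list)
    for player in players:
        by_role[player["role"]].append(player)

    team = []
    for role, count in quota.items():
        ranked = sorted(by_role[role], key=lambda p: p[score_key], reverse=True)
        team.extend(ranked[:count])
    return team


# Hypothetical players and scores, not taken from the project's dataset.
players = [
    {"name": "Player A", "role": "opener", "score": 0.9},
    {"name": "Player B", "role": "opener", "score": 0.7},
    {"name": "Player C", "role": "opener", "score": 0.8},
    {"name": "Player D", "role": "fast bowler", "score": 0.6},
    {"name": "Player E", "role": "fast bowler", "score": 0.95},
]
quota = {"opener": 2, "fast bowler": 1}  # hypothetical quota per role

team = pick_by_role(players, quota, "score")
names = [p["name"] for p in team]
print(names)
```

A full quota would cover all five roles and sum to 11. The right split between roles is a
selection decision that the report leaves to the user.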

In terms of Performance Metrics, we considered a variety of features. For batsmen, critical metrics
include runs scored, strike rate, and batting average. For bowlers, we focused on wickets taken,
bowling average, and economy rate. These features provide a well-rounded assessment of both
batting and bowling performances, making it easier to compare players. Fielding metrics were also
considered, though they were not a primary focus for this iteration.
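
The remaining metrics also have standard definitions: batting strike rate is runs per 100 balls
faced, bowling average is runs conceded per wicket, and economy rate is runs conceded per over.
The sketch below illustrates them in Python with assumed function names. It also handles scorecard
overs notation, where "3.4" means 3 overs and 4 balls rather than 3.4 decimal overs.

```python
def overs_to_balls(overs):
    """Convert scorecard notation such as '3.4' (3 overs, 4 balls) to balls."""
    whole, _, part = str(overs).partition(".")
    return int(whole) * 6 + int(part or 0)


def batting_strike_rate(runs, balls_faced):
    """Runs scored per 100 balls faced; None when no ball was faced."""
    return 100 * runs / balls_faced if balls_faced else None


def bowling_average(runs_conceded, wickets):
    """Runs conceded per wicket taken; None when no wicket was taken."""
    return runs_conceded / wickets if wickets else None


def economy_rate(runs_conceded, overs):
    """Runs conceded per over, with overs given in scorecard notation."""
    balls = overs_to_balls(overs)
    return runs_conceded * 6 / balls if balls else None


# Illustrative numbers only.
print(overs_to_balls("3.4"))          # 3 * 6 + 4 balls
print(batting_strike_rate(45, 30))
print(bowling_average(120, 8))
print(economy_rate(33, "3.4"))
```

Treating "3.4" as a decimal would understate the overs bowled (3.4 instead of 3.67) and inflate
the economy rate, so the conversion to balls is done first.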

To enable more granular analysis, the dashboard includes a Match-by-Match Performance view,
which allows users to hover over a player’s name and see their performance in each individual
match. This feature is particularly useful for identifying consistency in player performance.
Another key feature is the Selection Alert System, which ensures that users cannot select more
than 11 players, adhering to team composition rules. If a user attempts to add more than 11 players,
a notification is triggered, alerting them to make adjustments. These features collectively enhance
the system’s functionality and ensure that team selection is both efficient and balanced.
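
The report does not define how consistency is measured, so the sketch below makes an assumption:
it summarizes a player's match-by-match runs by their mean and standard deviation, where a smaller
spread relative to the mean suggests a steadier player. The function name and sample scores are
hypothetical.

```python
from statistics import mean, pstdev


def consistency_summary(match_runs):
    """Summarize per-match runs by mean, spread, and spread relative to mean."""
    avg = mean(match_runs)
    spread = pstdev(match_runs)  # population standard deviation
    return {
        "mean": avg,
        "stdev": spread,
        # Coefficient of variation; None when the mean is zero.
        "cv": spread / avg if avg else None,
    }


# Two hypothetical players with the same mean but different consistency.
steady = consistency_summary([38, 42, 40, 41, 39])
streaky = consistency_summary([5, 90, 12, 80, 13])
print(steady)
print(streaky)
```

Both players average 40 runs here, but the second has a much larger spread, which is the kind of
difference the match-by-match view is meant to expose.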
Chapter 5

Implementation & Results

Fig1: Selection of openers, Wk, bowlers

Fig2: Combined Performance of Whole team

Fig3: Profile Summary

Fig4: Average, Strike Rate

Fig5: Finalised 11 players
Chapter 6
Future Scope
While this project successfully integrates data scraping, preprocessing, and Power BI to provide
an efficient platform for player selection, there are several areas where it could be expanded in the
future. One potential enhancement is the inclusion of Fitness Data and Injury History to offer a
more comprehensive view of a player’s availability and long-term performance prospects. By
analyzing fitness metrics, teams can make more informed decisions regarding player workload
management and rotation policies, thereby reducing the risk of injuries during crucial games.

Another possible future development is the use of Machine Learning Algorithms to predict
future player performance. Machine learning models could be trained on historical data to forecast
player performance in upcoming matches or tournaments.

A third enhancement is Live Data Integration. Instead of manually scraping data from
ESPNcricinfo, the dashboard could be linked to live data feeds, ensuring that the player statistics
and performance metrics are always current. This would be particularly useful during tournaments,
where teams need to make decisions based on real-time information.

Lastly, the system could be adapted for other cricket formats, such as ODI or Test cricket, or
even for other sports, where player selection involves analyzing multiple performance metrics.
Because data collection, preprocessing, and visualization are separate stages, the system
architecture could be modified to suit different sports analytics needs.

Chapter 7
Conclusion
This project showcases the application of data analytics and business intelligence in the realm of
cricket team selection. By integrating web scraping techniques, Python-based preprocessing, and
Power BI’s visualization capabilities, the system provides a comprehensive platform for selecting
the best 11 players from the T20 World Cup data. The Power BI dashboard not only simplifies the
data analysis process but also makes it interactive, allowing users to make data-driven decisions
with ease.

The system’s features, such as the role-based categorization of players, performance
metrics, and hover-enabled match analysis, ensure that users have all the necessary information at
their fingertips. The alert mechanism that prevents the selection of more than 11 players ensures
compliance with cricket team composition rules, further enhancing the system’s utility.

The project demonstrates how advanced analytics tools can streamline and optimize traditionally
manual processes like player selection. By automating data collection and processing, it reduces
the time and effort required to make strategic decisions, allowing for more accurate and objective
outcomes. This system serves as a model for how data analytics can be applied not only in cricket
but across various sports, offering future potential for scalability and adaptation to different
formats and disciplines.
Chapter 8

References
1. H. J., & F. K. (2020). "Python in Sports Analytics: An Overview." Journal of Sports
Science and Technology, 8(1), 23-35.

2. Khosrow-Pour, M. (Ed.). (2020). "Business Intelligence and Analytics: Systems for
Decision Support." IGI Global.

3. D. D. (2019). "A study on web scraping: Methods and applications." International
Journal of Computer Applications, 182(21), 8-13.

4. W. L. (2019). "The role of sports analytics in performance improvement." Journal of
Sports Analytics, 5(1), 1-16.

5. McKinney, W. (2018). "Python for Data Analysis: Data Wrangling with Pandas, NumPy,
and IPython." O'Reilly Media.
