BCA 8th Project Report (Linear Regression)

This project report presents the development of an IPL Prediction System using machine learning techniques to analyze past match data and predict outcomes. The report details the methodologies employed, including data collection, preprocessing, and various machine learning algorithms used for prediction, such as Logistic Regression and Decision Trees. It aims to provide insights into player performance and match outcomes, contributing to strategic planning for teams in the Indian Premier League.

Uploaded by

noelty
Copyright
© All Rights Reserved

TRIBHUVAN UNIVERSITY

Faculty of Humanities & Social Science

A project report on

“Predicting Results of IPL Matches”

Submitted to

Xavier International College


Department of Computer Application
Bouddha, Kathmandu

In partial fulfillment of the requirements for the Bachelor’s in Computer Application


Submitted By
Sameer Ansari ()
Sonam Dorje Lama ()

Under the guidance of


Mr. Amit Chaudhary
TRIBHUVAN UNIVERSITY
Faculty of Humanities & Social Science

Xavier International College


Department of Computer Application

Supervisor Recommendation

I hereby recommend that this project report, prepared under my supervision by Sameer Ansari and Sonam Dorje
Lama and entitled “IPL Prediction System”, in partial fulfillment of the requirements for a Bachelor's Degree
in Computer Application of Tribhuvan University, be processed for evaluation.

………………….
Mr. Amit Chaudhary

Project Supervisor

Xavier International College

Bouddha, Kathmandu
TRIBHUVAN UNIVERSITY
Faculty of Humanities & Social Science

Xavier International College


Department of Computer Application

LETTER OF APPROVAL

This is to certify that this project prepared by Sameer Ansari and Sonam Dorje Lama
entitled “IPL Prediction System” in partial fulfillment of the requirements for the degree of
Bachelor in Computer Application has been evaluated. In our opinion, it is satisfactory in the
scope and quality of a project for the required degree.

Amit Chaudhary, Supervisor
Xavier International College

Tika Thapa, Coordinator
Xavier International College

Internal Examiner
Xavier International College

External Examiner
Tribhuvan University
ACKNOWLEDGMENT

We would like to express our deepest appreciation to all those who made it possible for us to complete this report. Special gratitude goes to our final project supervisor, Mr. Amit Chaudhary, whose stimulating suggestions and encouragement helped us carry out our project, especially in writing this report.

Furthermore, we would like to acknowledge with much appreciation the crucial role of the coordinator, who gave us permission to use all the required equipment and materials to complete our project. Special thanks go to our Academic Manager, Mr. Tyson Lama, who gave us valuable suggestions regarding the project. Last but not least, many thanks go to our teachers, friends and guardians who directly or indirectly helped us achieve this goal. We are also grateful for all the guidance, comments and advice that improved our presentation skills.

ABSTRACT

Today, data analysis is essential for examining datasets, extracting useful information from them and drawing conclusions from that information. Commercial industries make heavy use of data analytics techniques and algorithms because they enable precise business decisions. Analysts and experts also use them to verify or refute experimental designs, assumptions and conclusions. In recent years, analytics has also been used in sports to make predictions and draw various insights. Because of the money involved, team spirit, city loyalty and a massive fan following, the outcome of a match matters greatly to all stakeholders. In this report, seven years of past IPL data, containing player details, match venue details, teams and ball-by-ball details, are analyzed to draw conclusions that can help improve a player's performance. We also examine other factors, such as how the venue and the toss decision have influenced the outcome of matches over the last seven years. The machine learning and data extraction models considered for prediction include Linear Regression, Decision Tree, K-means and Logistic Regression. The cross-validation score and the accuracy of the various machine learning algorithms are also calculated. Before prediction, the data is explored and visualized, because data exploration and visualization are an important stage of predictive modeling.

TABLE OF CONTENTS

Acknowledgment
Abstract

Chapter 1
INTRODUCTION
1.1 Introduction
1.2 Plan of Implementation
1.3 Problem Statement
1.4 Objective of the Project

Chapter 2
BACKGROUND STUDY & LITERATURE SURVEY

Chapter 3
APPROACH AND DESIGN
3.1 Data Collection
3.2 Data Preprocessing
3.3 Data Visualization
3.4 Model Development and Evaluation

Chapter 4
SYSTEM REQUIREMENT SPECIFICATION
4.1 Functional Requirements
4.2 Non-Functional Requirements
4.3 System Configuration
4.4 Hardware Requirements
4.5 Software Requirements

Chapter 5
SYSTEM DESIGN
5.1 System Development Methodology

Chapter 6
IMPLEMENTATION & TESTING

Chapter 7
RESULTS

Chapter 8
FUTURE SCOPE AND CONCLUSION

REFERENCES

LIST OF FIGURES

Fig-1 Process of predicting IPL Team
Fig-2 Relation between toss winning and match winning
Fig-3 System Architecture


Chapter 1
INTRODUCTION

1.1 Introduction
Machine Learning is a branch of Artificial Intelligence that aims to solve real-life engineering problems. Instead of being explicitly programmed with rules, a machine learning system learns from existing data and makes predictions based on what it has learned. Machine learning methods include decision trees, heuristic learning, knowledge acquisition and mathematical models, and they are valued for their controllability, observability, stability and effectiveness.

Cricket is played in many countries around the world, and many domestic and international cricket tournaments are held in these countries. The game has several formats, such as Test matches, One Day Internationals and Twenty20 Internationals. The IPL is one of the most popular tournaments. It is a Twenty20 cricket league held in India to encourage young and talented players. The league is held annually in March, April or May and has a huge fan base in India. There are eight teams representing eight cities, formed through an auction, and these teams compete against each other for the trophy. The result of a match depends on the team's luck, the players' performance and many other parameters that must be taken into consideration. The result of the previous match can also change the prediction. Stakeholders benefit greatly from the league's huge popularity and the large crowds at the venues. The accuracy of a prediction depends on the size of the dataset used for analysis and on the records used to predict the outcome.

Cricket is a game played between two teams of 11 players each. The result is a win, a loss or a tie. However, a game is sometimes washed out due to bad weather, as cricket cannot be played in the rain. The game is also extremely unpredictable, because the momentum can shift from one team to the other at every stage. Many matches are close and are decided on the last ball. Because of these unpredictable scenarios, spectators are very interested in making predictions either at the start of the game or during it. Many spectators also bet on matches to win money.

1.2 Plan of Implementation
The project can be broken down into seven main steps:

1. Understand the dataset.
2. Clean the data.
3. Analyse the candidate columns to select features.
4. Process the features as required by the model/algorithm.
5. Train the model/algorithm on the training data.
6. Test the model/algorithm on the testing data.
7. Tune the model/algorithm for higher accuracy.

1.3 Problem Statement
The aim is to predict the results of IPL matches using machine learning algorithms such as Logistic Regression, Gaussian Naive Bayes, K-Nearest Neighbours, SVM, Gradient Boosting, Decision Tree and Random Forest.

We have used the following 17 features: season, city, date, team1, team2, toss_winner, toss_decision, result, dl_applied, winner, win_by_runs, win_by_wickets, player_of_match, venue, umpire1, umpire2 and umpire3.
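Several of these columns (team1, team2, toss_winner, winner, venue, toss_decision) hold text, while algorithms such as Logistic Regression, KNN and SVM expect numbers. The report does not say how these columns were encoded, so the following is only a minimal sketch, assuming simple label encoding in plain Python. It uses one shared code table for all team-valued columns, so a team gets the same number whether it appears as team1, team2, toss_winner or winner. The sample rows and the names `build_encoder` and `encode_matches` are hypothetical.

```python
# Columns whose values are team names; they share one code table (assumption).
TEAM_COLUMNS = ["team1", "team2", "toss_winner", "winner"]


def build_encoder(values):
    """Map each distinct category to an integer, in alphabetical order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


def encode_matches(rows):
    """Return label-encoded copies of the match rows and the team code table."""
    team_codes = build_encoder(row[col] for row in rows for col in TEAM_COLUMNS)
    venue_codes = build_encoder(row["venue"] for row in rows)
    decision_codes = build_encoder(row["toss_decision"] for row in rows)
    encoded = []
    for row in rows:
        new_row = {col: team_codes[row[col]] for col in TEAM_COLUMNS}
        new_row["venue"] = venue_codes[row["venue"]]
        new_row["toss_decision"] = decision_codes[row["toss_decision"]]
        encoded.append(new_row)
    return encoded, team_codes


# Hypothetical sample rows using a subset of the 17 columns.
rows = [
    {"team1": "Mumbai Indians", "team2": "Chennai Super Kings",
     "toss_winner": "Chennai Super Kings", "toss_decision": "field",
     "winner": "Mumbai Indians", "venue": "Wankhede Stadium"},
    {"team1": "Kolkata Knight Riders", "team2": "Mumbai Indians",
     "toss_winner": "Kolkata Knight Riders", "toss_decision": "bat",
     "winner": "Kolkata Knight Riders", "venue": "Eden Gardens"},
]
encoded, team_codes = encode_matches(rows)
print(team_codes)
# → {'Chennai Super Kings': 0, 'Kolkata Knight Riders': 1, 'Mumbai Indians': 2}
```

Integer codes impose an artificial order on the teams, which matters for distance-based models such as KNN; one-hot encoding is a common alternative when that order is undesirable.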

1.4 Objective of the Project

The main objective of this project is to give team players information about how each venue affects the game, to give feedback on how players can improve their own performance in each game, and to help the whole team plan better how the match should be played overall, regardless of the toss decision.

Chapter 2
BACKGROUND STUDY & LITERATURE SURVEY
To gain the required knowledge about the various concepts related to this application, the existing literature was studied. Some important conclusions drawn from it are listed below.

1. Kalpdrum Passi and Niravkumar Pandey discussed prediction accuracy in terms of the runs scored by batsmen and the number of wickets taken by bowlers in each team [1].

2. P. Wickramasinghe proposed a methodology to predict batsmen's performance over the previous five years using a hierarchical linear model [2].

3. R. P. Schumaker et al. discussed the different statistical simulations used in predictive modeling for different sports [3].

4. John McCullagh applied neural networks and data mining techniques to identify talent and to select players based on that talent in the Australian Football League [4].

5. Bunker et al. proposed a novel sports prediction framework to address specific challenges in predicting sports results [5].

6. Ramon Diaz-Uriarte et al. investigated the use of random forests for classifying microarray data and proposed a new gene selection method for classification problems based on random forests [6].

7. Rabindra Lamsal and Ayesha Choudhary proposed a way to calculate the weightage of a team based on its players' past IPL performance using linear regression [7].

8. Akhil Nimmagadda et al. proposed a model using Multiple Variable Linear Regression and Logistic Regression to predict the score in different innings, and used the Random Forest algorithm to predict the winner of the match [8].

9. Ujwal U J et al. predicted the outcome of a given cricket match by analyzing previous cricket matches using the Google Prediction API [9].

10. Rameshwari Lokhande and P. M. Chawan developed live cricket score prediction using linear regression and a Naïve Bayes classifier [10].

11. Abhishek Naik et al. proposed a new model that uses a matrix factorization technique to analyze and predict the winner of ODI cricket matches [11].

12. Esha Goel and Er. Abhilasha discussed improvements to the Random Forest algorithm and described its use in various fields such as agriculture, astronomy and medicine [12].

13. Amit Dhurandhar and Alin Dobra proposed a new methodology for analyzing the error of classifiers and model selection measures, and used it to analyze the decision tree algorithm [13].

14. H. Yusuff et al. performed logistic regression on mammograms to measure accuracy on valid samples [14].
Chapter 3
APPROACH AND DESIGN

Fig 1 (Process of predicting IPL Team) shows the approach we took to build the predictive model using machine learning algorithms.

3.1 Data Collection
Data collection is the process of gathering and measuring information from many different sources. The collected data is then used to develop practical machine learning solutions.

Collecting data captures a record of past events, so that data analysis can be used to find recurring patterns. From those patterns, we build predictive models using machine learning algorithms that look for trends and predict future changes.

The official website of the Indian Premier League is the principal source of data for this project. The data was web-scraped from the website using the Python library Beautiful Soup and stored in an appropriate format. The dataset has columns for the match number, the IPL season year, the city and stadium where the match was held, the match winner, the participating teams, the winning margin, the umpires, the player of the match, etc. Because the Indian Premier League was only 11 years old, only 577 matches were available after preprocessing. Some columns may contain null values, and some attributes may not be required for predicting the match winner; both issues are discussed in the Data Preprocessing section.
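The report states that the data was scraped with Beautiful Soup but does not show the code or the layout of the pages scraped. As a dependency-free illustration only, the sketch below swaps in Python's standard-library `html.parser.HTMLParser` to turn an HTML results table into one dictionary per match. The table markup, its column names and the names `TableRowParser` and `parse_results_table` are hypothetical; the real website's structure is not described in the report.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_results_table(html):
    """Use the first row as the header and return one dict per following row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical markup standing in for a scraped results page.
SAMPLE_HTML = """
<table>
  <tr><th>season</th><th>team1</th><th>team2</th><th>winner</th></tr>
  <tr><td>2017</td><td>Sunrisers Hyderabad</td>
      <td>Royal Challengers Bangalore</td><td>Sunrisers Hyderabad</td></tr>
</table>
"""
records = parse_results_table(SAMPLE_HTML)
print(records)
```

With Beautiful Soup, the same idea is usually written by iterating over `soup.find_all("tr")` and reading each row's `td`/`th` cells.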

3.2 Data Preprocessing

Data cleaning

Some columns of the dataset, such as winner, city and venue, contain null values. Because of these null values, the classification cannot be done accurately, so we replaced the null values in these columns with dummy values.
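The report does not list the dummy values it used. The following is a minimal sketch, assuming placeholder strings of our own choosing ("Unknown" for city and venue, "No Result" for winner) and treating both `None` and empty strings as missing. `DUMMY_VALUES` and `fill_missing` are hypothetical names.

```python
# Hypothetical placeholder values; the report does not state which dummies were used.
DUMMY_VALUES = {"city": "Unknown", "venue": "Unknown", "winner": "No Result"}


def fill_missing(rows, defaults):
    """Return copies of rows where missing (None or "") values are replaced by defaults.

    A column that is absent from a row is also treated as missing and gets the default.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are left unchanged
        for col, dummy in defaults.items():
            if new_row.get(col) in (None, ""):
                new_row[col] = dummy
        cleaned.append(new_row)
    return cleaned


rows = [
    {"city": None, "venue": "Eden Gardens", "winner": "Kolkata Knight Riders"},
    {"city": "Bangalore", "venue": "", "winner": None},  # e.g. a washed-out match
]
cleaned = fill_missing(rows, DUMMY_VALUES)
print(cleaned)
```

In pandas, a comparable step would be `df.fillna({"city": "Unknown", "venue": "Unknown", "winner": "No Result"})`, although `fillna` replaces only missing values (NaN/None), not empty strings.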

Choosing Required Attributes

In this step, we eliminate the columns of the dataset that are not useful for predicting the match-winning team. Their usefulness is estimated using feature importance. The considered attributes have the following feature importance.
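The report does not say how its feature-importance scores were computed. Tree-based models such as scikit-learn's `RandomForestClassifier` expose a `feature_importances_` attribute for this purpose. As a self-contained stand-in, the sketch below ranks categorical columns by information gain with respect to the winner, the entropy-reduction criterion used by ID3-style decision trees. This is a swapped-in technique, not the authors' method; the toy rows and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target="winner"):
    """Reduction in target entropy obtained by grouping rows on `feature`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_features(rows, features, target="winner"):
    """Return (feature, gain) pairs, most informative first."""
    scores = {f: information_gain(rows, f, target) for f in features}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy rows: here the toss winner always wins and the venue tells us nothing.
rows = [
    {"toss_winner": "MI", "venue": "Wankhede", "winner": "MI"},
    {"toss_winner": "CSK", "venue": "Wankhede", "winner": "CSK"},
    {"toss_winner": "MI", "venue": "Chepauk", "winner": "MI"},
    {"toss_winner": "CSK", "venue": "Chepauk", "winner": "CSK"},
]
ranking = rank_features(rows, ["venue", "toss_winner"])
print(ranking)
```

Columns that score near zero are candidates for removal, which is the purpose of this preprocessing step.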

3.3 Data Visualization
• The collected data is visualized to better understand the information it contains.
• The Matplotlib library is used to plot the graphs.
• Data visualization helps us understand the solution better. The graphs were drawn from the data of previous IPL seasons.

Fig 2 - Relation between toss winning and match winning
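Fig 2 compares toss winners with match winners. The statistic behind such a chart can be computed before plotting. The sketch below (hypothetical function names and toy rows) counts how often the toss winner also won the match, overall and per toss decision, skipping matches that have no winner. The resulting numbers could then be passed to Matplotlib, for example to `matplotlib.pyplot.bar`, to draw a chart like Fig 2.

```python
from collections import Counter


def toss_win_rate(rows):
    """Share of decided matches in which the toss winner also won the match."""
    decided = [row for row in rows if row.get("winner")]
    if not decided:
        return 0.0
    both = sum(1 for row in decided if row["toss_winner"] == row["winner"])
    return both / len(decided)


def toss_win_rate_by_decision(rows):
    """Same share, split by toss_decision (e.g. "bat" or "field")."""
    totals, wins = Counter(), Counter()
    for row in rows:
        if not row.get("winner"):
            continue  # no result, e.g. washed out
        totals[row["toss_decision"]] += 1
        if row["toss_winner"] == row["winner"]:
            wins[row["toss_decision"]] += 1
    return {decision: wins[decision] / totals[decision] for decision in totals}


# Hypothetical toy matches.
rows = [
    {"toss_winner": "MI", "toss_decision": "field", "winner": "MI"},
    {"toss_winner": "CSK", "toss_decision": "bat", "winner": "MI"},
    {"toss_winner": "KKR", "toss_decision": "field", "winner": "KKR"},
    {"toss_winner": "RCB", "toss_decision": "bat", "winner": None},
]
overall = toss_win_rate(rows)
by_decision = toss_win_rate_by_decision(rows)
print(overall, by_decision)
```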

3.4 Model Development and Evaluation

We developed a generic model and applied several classification methods to it. The data is split into training and testing sets. The model is trained on the training set using selected features and then used to predict the testing set, after which the system's performance is calculated. The classification models used are Logistic Regression, Gaussian Naïve Bayes Classifier, KNN (K-Nearest Neighbours), Support Vector Machine, Gradient Boost Algorithm, Decision Tree and Random Forest Classifier. Among these methods, Random Forest and Decision Tree gave good results.
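The split-train-evaluate loop described above, together with the cross-validation score mentioned in the abstract, can be sketched without any libraries. The following minimal sketch uses K-Nearest Neighbours (one of the seven listed classifiers) implemented from scratch, a reproducible random train/test split, accuracy, and a simple k-fold cross-validation with interleaved folds. The two numeric features (each team's recent win rate), the toy data and all function names are hypothetical, and the toy accuracy says nothing about the results reported in Chapter 7. In practice these steps usually come from a library; scikit-learn, for example, provides `train_test_split`, `cross_val_score` and `KNeighborsClassifier`.

```python
import random
from collections import Counter
from math import dist


def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle row indices reproducibly and hold out test_fraction of them."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_accuracy(X, y, predict, folds=5):
    """Mean accuracy over `folds` folds; row i is tested in fold i % folds."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        preds = [predict(X_tr, y_tr, X[i]) for i in test_idx]
        scores.append(accuracy([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


# Hypothetical toy data: [team1 recent win rate, team2 recent win rate];
# label 1 means team1 won, 0 means team2 won.
X = [[0.80 + 0.01 * i, 0.20] for i in range(10)] + [[0.20, 0.80 + 0.01 * i] for i in range(10)]
y = [1] * 10 + [0] * 10

X_train, X_test, y_train, y_test = train_test_split(X, y)
y_pred = [knn_predict(X_train, y_train, x) for x in X_test]
test_acc = accuracy(y_test, y_pred)
cv_acc = cross_val_accuracy(X, y, knn_predict)
print("test accuracy:", test_acc)
print("5-fold CV accuracy:", cv_acc)
```

Comparing another classifier only requires a function with the same `predict(X_train, y_train, x)` signature; the split, accuracy and cross-validation helpers stay unchanged.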

Chapter 4

SYSTEM REQUIREMENT SPECIFICATION

A System Requirement Specification (SRS) is an organization's understanding of a customer's or potential client's system requirements and dependencies at a particular point before any actual design or development work begins. The information gathered during analysis is translated into a document that defines a set of requirements. It briefly describes the services that the system should provide and the constraints under which the system should operate. In general, an SRS completely describes what the proposed software should do without describing how it will do it. It is a two-way insurance policy that assures that both the client and the organization understand each other's requirements at a given point in time.

The SRS document states, in precise and explicit language, the functions and capabilities that a software system (e.g. a software application or an e-commerce website) must provide, as well as any constraints that the system must abide by. An SRS also functions as a blueprint for completing a project with as little cost growth as possible. It is often referred to as the “parent” document because all subsequent project management documents, such as design specifications, statements of work, software architecture specifications, testing and validation plans, and documentation plans, are related to it.

A requirement is a condition or capability to which the system must conform. Requirements management is a systematic approach to eliciting, organizing and documenting the requirements of the system clearly, along with the applicable attributes. The elusive difficulties of requirements are not always obvious and can come from any number of sources.

4.1 Functional Requirements
A functional requirement defines a function of a software system and how the system must behave when presented with specific inputs or conditions. Functional requirements may include calculations, data manipulation and processing, and other specific functionality.

The functional requirements of the system are as follows:

1. The whole process can be handled with minimal human interaction.
2. The application automatically receives the collected IPL match data.
3. The user can request a match-winner prediction and view the data visualizations on demand.
4. The system gives a warning message.

4.2 Non-Functional Requirements

Non-functional requirements are not directly concerned with the specific functions delivered by the system. They specify criteria that can be used to judge the operation of a system rather than specific behaviours, and they may relate to emergent system properties such as reliability, response time and storage occupancy. Non-functional requirements arise from user needs, budget constraints, organizational policies, the need for interoperability with other software and hardware systems, or external factors. They include:
• Performance Requirements
• Design Requirements
• Security Constraints
• Basic Operational Requirements

Product Requirements

• Platform independency: A progressive web app will be developed and deployed so that users with a smartphone or a computer can access the prediction system.
• Ease of use: The progressive web app provides an interface that is easy to use.
• Modularity: The complete product is broken up into modules, and well-defined interfaces are developed to take advantage of the product's flexibility.
• Robustness: The software is developed so that its overall performance is optimized and the user can expect results within a limited time, with the utmost relevance and correctness.

4.3 System Configuration

H/W System Configuration:

• Processor - Pentium IV
• Speed - 1.1 GHz
• RAM - 256 MB (min)
• Hard Disk - 20 GB

S/W System Configuration:

• Operating System - Windows XP/7/8/8.1/10
• Coding Language - Python

4.4 Hardware Requirements
• Processor - Pentium IV
• Speed - 3.00 GHz
• RAM - 2 GB
• Storage - 20 GB

4.5 Software Requirements
• Operating System - Windows 10 Professional
• IDE - Visual Studio Code

Chapter 5

SYSTEM DESIGN
Design is a meaningful engineering representation of something that is to be built. It is the most crucial phase in the development of a system. Software design is the process through which requirements are translated into a representation of the software, and it is the place where quality is fostered in software engineering. Based on the user requirements and a detailed analysis of the existing system, the new system must be designed; this is the system design phase. Design is the means by which a customer's requirements are accurately translated into the finished software product. It creates a representation or model and provides details about the software's data structures, architecture, interfaces and components that are necessary to implement the system. The logical system design arrived at through systems analysis is converted into a physical system design.

5.1 System Development Methodology

A system development methodology is the process through which a product is completed and freed of problems. The software development process is described as a number of phases, procedures and steps that together produce the complete software, following a series of steps that move the product forward. The development methodology followed in this project is the waterfall model.

Model phases

The waterfall model is a sequential software development process in which progress is seen as flowing steadily downwards (like a waterfall) through the phases of requirements gathering, analysis, design, implementation, testing and maintenance.

Requirement Analysis: This phase is concerned with gathering the requirements of the system. The process involves producing a requirements document and reviewing the requirements.

System Design: Keeping the requirements in mind, the system specifications are translated into a software representation. In this phase the designer focuses on algorithms, data structures, software architecture and so on.

Coding: In this phase the programmer starts coding to give a full sketch of the product. In other words, the system specifications are converted into machine-readable code.

Implementation: The implementation phase involves the actual coding or programming of the software. The output of this phase is typically the library, executables, user manuals and additional software documentation.

Testing: In this phase all programs (modules) are integrated and tested to ensure that the complete system meets the software requirements. Testing is concerned with verification and validation.

Maintenance: The maintenance phase is the longest phase, in which the software is updated to fulfil changing customer needs, adapt to changes in the external environment, correct errors and oversights not detected during testing, and improve the efficiency of the software.

System Architecture

Fig 3 - System Architecture

Chapter 6
IMPLEMENTATION

Chapter 7
RESULTS

MODEL                             ACCURACY
Logistic Regression               30.329%
Gaussian Naïve Bayes Algorithm    20.264%
KNN Algorithm                     62.565%
Support Vector Machine            89.081%
Gradient Boost Algorithm          89.601%
Decision Tree Algorithm           89.601%
Random Forest Classifier          89.601%

Chapter 8

FUTURE SCOPE AND CONCLUSION


Selecting the best team for a cricket match plays a significant role in the team's victory. The main goal of this project is to analyse IPL cricket data and predict the players' performance. Here, seven classification algorithms are used and compared to find the most accurate one. The implementation tools used are Anaconda Navigator and Jupyter. Random Forest, Decision Tree and Gradient Boost are observed to be the most accurate classifiers, each reaching an accuracy of 89.601%. This knowledge can be used in future to predict the winning teams of the next IPL series, and the best team can then be formed using these predictions. This project opens scope for future work in cricket, such as predicting the best team of players, the best venue, the best city and the best fielding decision to win a match.

REFERENCES
T. A. Severini, Analytic Methods in Sports: Using Mathematics and Statistics to Understand Data from Baseball, Football, Basketball, and Other Sports. Chapman and Hall/CRC, 2014.
H. Ghasemzadeh and R. Jafari, “Coordination analysis of human movements with body sensor networks: A signal processing model to evaluate baseball swings,” IEEE Sensors Journal, vol. 11, no. 3, pp. 603–610, 2010.
R. Rein and D. Memmert, “Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science,” SpringerPlus, vol. 5, no. 1, p. 1410, 2016.
V. Veppur Sankaranarayanan, J. Sattar and L. V. S. Lakshmanan, “Auto-play: A Data Mining Approach to ODI Cricket Simulation and Prediction,” SIAM Conference on Data Mining, 2014.
K. A. A. D. Raj and P. Padma, “Application of Association Rule Mining: A case study on team India,” 2013 International Conference on Computer Communication and Informatics, 2013.
T. B. Swartz, P. S. Gill and S. Muthukumarana, “Modelling and simulation for one-day cricket,” Canadian Journal of Statistics, vol. 37, no. 2, pp. 143–160, 2009.
