Final Project
Final Submission
This document describes the Exploratory Data Analysis (EDA) of the
given cricket data set, which contains data on the matches played by
the Indian team. Using this historical data, an appropriate model is
built and strategies are recommended.
Windows User
6/3/2022
Table of Contents
List of Figures
1. Introduction
2. EDA and Business Implication
a) Visual inspection of data (rows, columns, descriptive details)
Exploratory data analysis
a) Univariate analysis (distribution and spread for every continuous attribute, distribution of data in categories for categorical ones)
b) Bivariate analysis (relationship between different variables, correlations)
Multi-Variate Analysis
Pair Plot
3. Data Cleaning and Pre-processing
a) Missing Value treatment (if applicable)
Outlier Treatment (if applicable)
b) Understanding of attributes (variable info, renaming if required)
c) Variables removed or added and why (if any)
4. Clear on why was a particular model(s) chosen
a) Build various models (You can choose to build models for either or all of descriptive, predictive or prescriptive purposes)
a) Logistic Regression Model using Sklearn
b) Test your predictive model against the test set using various appropriate performance metrics
c) Interpretation of the model(s)
Effort to improve model performance
a) and b) Ensemble modeling, wherever applicable
i) Decision Tree Model
ii) Random Forest Model
iii) ANN Model
5. Model validation - Interpretation of the most optimum model and its implication on the business
6. Final interpretation / recommendation
1 Test match with England in England. All matches are day matches. In England, it will be the rainy season at the time of the match.
Output of the Model
2 T20 matches with Australia in India. All matches are Day and Night matches. In India, it will be the winter season at the time of the matches.
Output of the Model
2 ODI matches with Sri Lanka in India. All matches are Day and Night matches. In India, it will be the winter season at the time of the matches.
Output of the Model
List of Figures:
1. Introduction
Business Problem:
BCCI wants to increase Team India's probability of winning. To this end, BCCI
has tied up with a Data Analytics Consultant. The major objective of this tie-up is to extract actionable insights from the
historical match data and make strategic changes that help India win. The primary objective is to create a Machine Learning
model that correctly predicts a win for the Indian Cricket Team. Once the model is developed, actionable insights and
recommendations are to be extracted from it.
Constraints:
• The data set provided is imbalanced: 83% of the records relate to matches that India won and the remaining
17% relate to matches that India lost. This may bias the model output towards predicting a win.
• Data pertaining to some formats is not available. For example, the problem statement requires a winning
strategy for India against Australia in the T20 format, but the dataset has no record of India playing a T20
match against Australia. This constraint prevents splitting the data by format for analysis, so three separate
format-specific models cannot be built.
The scope of the business problem is to draw actionable insights and recommendations from the data set and
to build models that predict the result for Team India.
The main objective is to provide a winning strategy for the upcoming matches India will play against its
opponents. The strategy should vary even when a match is played against the same opponent under the same parameters.
The dataset is provided in Excel format and is loaded into a Jupyter notebook for further analysis. The
dataset has 23 columns containing match-related factors that affect India's chances of winning against its
opponents. The ‘Result’ column of the data set is the target variable used for model building.
a) Visual inspection of data (rows, columns, descriptive details):
The dataset has 2930 rows and 23 columns.
As per the info command, the columns have the following data types: float (9), int (4) and object (10).
A descriptive analysis is performed on the data set. Below are the insights.
• The average age of the team is 30 and ranges from 12 to 70, which suggests that there are outliers in the age column.
• Bowlers_in_team has a mean of 3 and ranges from 1 to 5. Most of the time the team preferred 3 bowlers, and the
winning rate is high in those matches, so 3 bowlers appears to be a good number for winning a match.
• Wicket_Keeper_team is always 1, so this variable can be excluded from the analysis as it has no impact
on the target variable.
• All_rounder_in_team has a mean of 3 and ranges between 1 and 4.
• Max_run_scored_1over has a mean of 16 and ranges between 11 and 25.
• Max_wicket_taken_1over has a mean of 3 and ranges between 1 and 4.
• Extra_bowls_bowled has a mean of 11 and ranges between 7 and 40.
• Min_run_given_1over has a mean of 2 and ranges between 0 and 4.
• Min_run_scored_1over has a mean of 2 and ranges between 1 and 4.
• Max_run_given_1over has a mean of 5 and ranges between 6 and 40.
• Extra_bowls_opponent has a mean of 4 and ranges between 0 and 18.
• Player_highest_run has a mean of 65 and ranges between 30 and 100.
Based on the data set, most of the matches are won by India against their opponents. The class balance of the
data therefore needs to be checked before model prediction.
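The balance check mentioned above can be sketched with pandas as follows. The class labels "Win" and "Loss" and the toy series are assumptions for illustration; in the notebook the check would be run on the actual ‘Result’ column.

```python
import pandas as pd

# Hypothetical stand-in for the target column; the labels "Win"/"Loss" are assumed.
result = pd.Series(["Win"] * 83 + ["Loss"] * 17, name="Result")

# Share of each class as a percentage of all rows.
class_share = result.value_counts(normalize=True).mul(100).round(1)
print(class_share)
```

A strongly uneven split such as this one is a signal to look at precision and recall in addition to accuracy.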
Figure 6 – Univariate Analysis for Float and Integer data types
Figure 7 – Univariate Analysis for Categorical Variables
A bar plot is drawn between the Opponent and Bowlers_in_team variables, with Result as the hue.
Multi-Variate Analysis:
Also, a correlation plot is drawn for all the variables. The observations from the plot are as follows:
Figure 9 – Heat Map
Pair Plot:
A pair plot is also drawn to view the data distributions, which were already assessed through the skewness
calculation. For the complete picture, refer to the Jupyter notebook.
3. Data Cleaning and Pre-processing
The total percentage of missing values in the dataset is 1.17%, which is negligible. However, the Avg_team_age
column alone has 97 missing values, so deleting the rows with missing values is not a good choice. The null
values are therefore imputed.
All categorical variables are imputed with their mode, and non-categorical variables are imputed with their
median.
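The imputation rule described above (mode for categorical columns, median for the others) can be sketched as below. The helper name impute_missing and the toy values are hypothetical; only the rule itself comes from the report.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with the median and all other columns with the mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out

# Toy frame: column names follow the report, values are made up.
toy = pd.DataFrame({
    "Avg_team_Age": [30.0, np.nan, 28.0, 32.0],
    "Season": ["Winter", "Rainy", None, "Winter"],
})

# Overall percentage of missing cells, the same kind of figure as the 1.17% quoted above.
missing_pct = toy.isna().sum().sum() / toy.size * 100
clean = impute_missing(toy)
print(missing_pct, clean, sep="\n")
```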
Outlier Treatment (if applicable):
A box plot is a convenient way to check for outliers, so a box plot is drawn for each numeric variable.
Figure 13 – Box plot for ‘Avg_team_Age’ column after outlier treatment
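The report does not state which outlier treatment was applied. One common option is to cap values at the box-plot whiskers (1.5 times the interquartile range beyond the quartiles). The sketch below shows that approach as an assumption, with a hypothetical helper cap_outliers_iqr and made-up ages.

```python
import pandas as pd

def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values lying more than k * IQR outside the quartiles (whisker capping)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Made-up ages with one low and one high extreme value.
ages = pd.Series([12, 28, 29, 30, 30, 31, 32, 70], name="Avg_team_Age")
capped = cap_outliers_iqr(ages)
print(capped.tolist())
```

Capping keeps every row in the dataset, which suits a case like this one where dropping rows was already ruled out for missing values.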
Observations:
• Match format type T20 is entered in two ways: T20 and 20-20
• First_Selection has two entries for batting: Batting and Bat
• Player_scored_Zero has two entries for the value three: 3 and Three
• Player_Highest_Wicket has two entries for the value three: 3 and Three
These inconsistent entries therefore need to be replaced in the data with a single consistent value.
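The replacement of the duplicate spellings listed above can be sketched with a per-column mapping. The raw column names, the choice of canonical label (for example "T20" rather than "20-20") and the toy rows are assumptions.

```python
import pandas as pd

# Alternate spelling -> canonical label, per column (canonical choices are assumed).
CANONICAL = {
    "Match_format": {"20-20": "T20"},
    "First_selection": {"Bat": "Batting"},
    "Players_scored_zero": {"Three": "3"},
    "player_highest_wicket": {"Three": "3"},
}

def standardise_entries(df: pd.DataFrame) -> pd.DataFrame:
    """Replace known alternate spellings so each category has a single label."""
    out = df.copy()
    for col, mapping in CANONICAL.items():
        if col in out.columns:
            out[col] = out[col].replace(mapping)
    return out

# Toy rows containing each inconsistency noted in the observations.
toy = pd.DataFrame({
    "Match_format": ["T20", "20-20", "ODI"],
    "First_selection": ["Bat", "Batting", "Bowling"],
    "Players_scored_zero": ["3", "Three", "2"],
    "player_highest_wicket": ["Three", "3", "4"],
})
cleaned = standardise_entries(toy)
print(cleaned)
```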
a) Build various models (You can choose to build models for either or all of descriptive,
predictive or prescriptive purposes):
a) We have built four models (Decision Tree, Random Forest, ANN and Logistic Regression, the last using both sklearn
and statsmodels) and will select the best model based on the model metrics.
b) All the ‘Object’ variables are encoded using one-hot encoding, and the target variable is encoded using the
‘Label Encoder’ method.
c) For model building, the data is split into train and test sets in the ratio 70:30.
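Steps b) and c) can be sketched as below. The toy rows, the use of drop_first=True (which would be consistent with the encoded column names shown later in the report) and the random_state are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy data: column names follow the report, values are made up.
toy = pd.DataFrame({
    "Opponent": ["England", "Kenya", "England", "Srilanka", "Kenya",
                 "England", "Srilanka", "Kenya", "England", "Srilanka"],
    "Avg_team_Age": [30, 29, 31, 30, 28, 32, 30, 29, 31, 30],
    "Result": ["Win", "Win", "Loss", "Win", "Win",
               "Win", "Loss", "Win", "Win", "Win"],
})

# One-hot encode the object columns; drop_first=True is an assumption.
X = pd.get_dummies(toy.drop(columns="Result"), drop_first=True)

# Label-encode the target; classes are sorted, so Loss -> 0 and Win -> 1.
y = LabelEncoder().fit_transform(toy["Result"])

# 70:30 train/test split; the random_state value is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)
print(list(X.columns), len(X_train), len(X_test))
```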
Imp Note: I have built the model on the data without splitting the dataset by match format type.
This is because one of the problem statements asks for the winning strategy of Team India
against Australia in T20, but the source data set has no records of India playing Australia in
that format, so splitting the data set by format would not give accurate results. The model is
therefore built without splitting the data.
The best model is selected using the best parameters found during tuning: an L2 penalty, the saga solver and a
tolerance of 1e-05. Predictions are made using this model.
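The parameters named above can be passed to scikit-learn as in the following minimal sketch. Only the L2 penalty, the saga solver and the tolerance come from the report; the synthetic data, the scaling step, max_iter and random_state are assumptions added so that the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the match data, with roughly 83% positive (win) labels.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.17, 0.83], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

# L2 is scikit-learn's default penalty, so it is not passed explicitly.
# Scaling is an assumption: the saga solver converges faster on scaled features.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="saga", tol=1e-5, max_iter=10000),
)
model.fit(X_train, y_train)

# Predicted probability of the positive class (a win) for each test row.
win_probability = model.predict_proba(X_test)[:, 1]
print(win_probability[:5])
```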
Classification Report:
The model has an accuracy of 87% on the train data. Correspondingly, precision = 0.88, recall = 0.98, f1 = 0.9
AUC and ROC Curve:
The AUC of the model on the train data is 84.36%. Below is the ROC curve of the train data.
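As an illustration of how such figures are obtained, the sketch below computes accuracy, precision, recall, F1 and AUC with scikit-learn on ten made-up labels and probabilities. The values are illustrative only and are not the report's results; the 0.5 threshold is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up true labels (1 = win) and predicted win probabilities.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.95, 0.7, 0.6, 0.85, 0.65, 0.2, 0.4, 0.3]

# Class predictions at a 0.5 threshold.
y_pred = [int(p >= 0.5) for p in y_prob]

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)  # AUC uses the probabilities, not the classes

print(classification_report(y_true, y_pred))
print(f"AUC: {auc:.4f}")
```

Here 6 of the 7 predicted wins are real wins and 6 of the 7 real wins are found, so precision, recall and F1 all equal 6/7.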
b) Test your predictive model against the test set using various appropriate
performance metrics
Classification Report:
The model has an accuracy of 87% on the test data. Correspondingly, precision = 0.89, recall = 0.97, f1 = 0.93
The AUC of the model on the test data is 84.32%. Below is the ROC curve of the test data.
Figure 24 – Logistic Model – ROC Curve of Test Data
'min_samples_split': [150,300,450],
Figure 25 – Decision Tree Model – “GridSearchCV” Method
After multiple iterations, the best parameters for building the model are identified. They are shown below.
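A grid search of this kind can be sketched as follows. Only the min_samples_split values are taken from the report; the max_depth values, the 3-fold cross-validation, the accuracy scoring and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the match data, with roughly 83% positive (win) labels.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.17, 0.83], random_state=1
)

param_grid = {
    "min_samples_split": [150, 300, 450],  # values quoted in the report
    "max_depth": [3, 5, 7],                # hypothetical values
}

# Try every parameter combination with 3-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

best_tree = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```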
Classification Report:
The model has an accuracy of 85% on the train data. Correspondingly, precision = 0.87, recall = 0.97, f1 = 0.91
The AUC of the model on the train data is 78.70%. Below is the ROC curve of the train data.
Figure 29 – Decision Tree Model – ROC Curve of Train Data
Classification Report:
The model has an accuracy of 84% on the test data. Correspondingly, precision = 0.86, recall = 0.97, f1 = 0.91
The AUC of the model on the test data is 75.40%. Below is the ROC curve of the test data.
Figure 32 – Decision Tree Model – ROC Curve of Test Data
Classification Report:
The model has an accuracy of 84% on the train data. Correspondingly, precision = 0.84, recall = 1, f1 = 0.91
Figure 35 – Random Forest Model – Classification Report of Train Data
The AUC of the model on the train data is 84.98%. Below is the ROC curve of the train data.
Classification Report:
The model has an accuracy of 83% on the test data. Correspondingly, precision = 0.83, recall = 1, f1 = 0.91
Figure 38 – Random Forest Model – Classification Report of Test Data
The AUC of the model on the test data is 83.11%. Below is the ROC curve of the test data.
Figure 41 – ANN Model – Confusion Matrix of Train Data
Classification Report:
The model has an accuracy of 87% on the train data. Correspondingly, precision = 0.88, recall = 0.98, f1 = 0.93
The AUC of the model on the train data is 84.30%. Below is the ROC curve of the train data.
Figure 44 – ANN Model – Confusion Matrix of Test Data
Classification Report:
The model has an accuracy of 84% on the test data. Correspondingly, precision = 0.89, recall = 0.97, f1 = 0.93
The AUC of the model on the test data is 84.07%. Below is the ROC curve of the test data.
5. Model validation - Interpretation of the most optimum model and its
implication on the business
As mentioned above, all the models are built and evaluated on the train and test datasets. The metrics of each
model are compared based on the accuracy, precision and recall values. The figure below compares the metrics
of the models.
On comparison, it is observed that accuracy is high in the ANN and Logistic Regression models. Precision is also
higher in these two models than in the Decision Tree and Random Forest models. Since the target is a binary
classification variable, I have opted for Logistic Regression to build the strategy.
The Logistic model has the highest accuracy of 87% and a precision of 0.89, and its performance is stable
across the train and test data.
Apart from accuracy, precision and recall must be considered for this business case in particular, because they
measure how many predicted wins are actual wins and how many actual wins are detected. Since both precision and
recall are high, the sklearn Logistic model is chosen.
Also, I have not considered the statsmodels model, as I was forced to remove important variables owing to their
high p-values, which indirectly gives a wrong indication about the model.
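The comparison can also be laid out as a small table. The sketch below uses the second set of figures reported for each model in the sections above (taken here to be the test-set figures); ranking by accuracy and then precision is an assumption about how the comparison is read.

```python
import pandas as pd

# Second reported accuracy, precision and recall for each model, from the sections above.
reported = pd.DataFrame(
    {
        "accuracy": [0.87, 0.84, 0.83, 0.84],
        "precision": [0.89, 0.86, 0.83, 0.89],
        "recall": [0.97, 0.97, 1.00, 0.97],
    },
    index=["Logistic Regression", "Decision Tree", "Random Forest", "ANN"],
)

# Rank by accuracy first and precision second, highest values at the top.
ranked = reported.sort_values(["accuracy", "precision"], ascending=False)
print(ranked)
```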
1 Test match with England in England. All matches are day matches. In England, it
will be the rainy season at the time of the match.
To build the strategy against England, an Excel sheet is developed as the actual test data.
In the Excel sheet, since one-hot encoding is applied to the object variables, the parameters mentioned in the
problem statement are marked as ‘1’. The rest are marked with 0s and 1s as per the strategy plan.
Variable               Value     Encoded value
Opponent_England       England   1
Match_Format_Test      Test      1
Match_light_Type_Day   Day       1
Offshore_Yes           England   1
Season_Rainy           Rainy     1
The variables used for model building are shown in the figure below.
With the problem variables fixed, the remaining variables are varied to build a sufficient set of strategies, and
a CSV test file is built to predict the output using the Logistic Regression model.
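The report builds the scenario file by hand in Excel. An alternative sketch, shown here as an assumption, is to encode the raw scenario rows with pandas and align them to the training columns, so that any dummy column missing from the scenario is filled with 0. The short column list and the scenario values are illustrative.

```python
import pandas as pd

# Illustrative subset of the columns the model was trained on.
train_columns = [
    "Avg_team_Age", "Opponent_England", "Opponent_Kenya",
    "Match_format_Test", "Season_Winter", "Offshore_Yes",
]

# Raw scenario rows: the fixed problem variables plus one varied strategy variable.
scenario = pd.DataFrame({
    "Avg_team_Age": [30, 50],
    "Opponent": ["England", "England"],
    "Match_format": ["Test", "Test"],
    "Offshore": ["Yes", "Yes"],
})

# One-hot encode, then force the same columns and order as the training data.
encoded = pd.get_dummies(scenario, dtype=int)
aligned = encoded.reindex(columns=train_columns, fill_value=0)
print(aligned)
```

The aligned frame can then be passed to the fitted model's predict method in the same way as a hand-built CSV file.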
Columns, in order: Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
30 11 24 3 2 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 1 0
50 18 22 2 3 12 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0
50 20 27 2 2 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0
50 15 10 2 2 10 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0
50 19 10 6 4 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 1 0 0
50 13 8 1 3 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0
50 20 6 0 3 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0
50 15 10 3 2 10 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 1 0 0 0
50 20 17 2 3 17 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0
50 20 27 2 2 6 15 10 1 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0
Figure 49 – Actual Test Data to predict the Winning Strategy against England
Columns of the model output, in order: Results_Pred, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Extra_bowls_bowled, Avg_team_Age, Max_run_scored_1over, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 24 30 11 3 2 6 15 10 0 0 1 0 0 1 0 0 1 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 22 50 18 2 3 12 15 10 0 0 1 0 1 0 0 0 1 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 27 50 20 2 2 6 15 10 0 0 1 1 0 0 0 0 1 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 10 50 15 2 2 10 15 10 0 1 0 1 0 0 1 0 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 10 50 19 6 4 6 15 10 0 0 1 1 0 0 0 1 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 8 50 13 1 3 6 15 10 0 0 0 1 0 0 1 0 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 6 50 20 0 3 6 15 10 0 1 0 0 0 1 1 0 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 10 50 15 3 2 10 15 10 0 0 1 0 1 0 1 0 0 0
0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 17 50 20 2 3 17 15 10 0 0 0 1 0 0 0 1 0 0
1 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 27 50 20 2 2 6 15 10 0 0 1 1 0 0 0 0 1 0
2 T20 matches with Australia in India. All matches are Day and Night matches. In
India, it will be the winter season at the time of the matches.
To build the strategy against Australia, an Excel sheet is developed as the actual test data.
In the Excel sheet, since one-hot encoding is applied to the object variables, the parameters mentioned in the
problem statement are marked as ‘1’. The rest are marked with 0s and 1s as per the strategy plan.
Variable               Value           Encoded value
Opponent_England       Australia       1
Match_Format_T20       T20             1
Match_light_Type_Day   Day and Night   1
Offshore_Yes           India           0
Season_Winter          Winter          1
With the problem variables fixed, the remaining variables are varied to build a sufficient set of strategies, and
a CSV test file is built to predict the output using the Logistic Regression model.
Columns, in order: Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter
30 24 31 0 2 29 10 83 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 12 6 3 2 6 4 48 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1
30 17 20 6 3 6 0 60 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
30 16 5 1 3 6 3 62 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
30 13 6 2 3 6 2 93 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
30 22 21 3 3 6 16 55 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
30 13 10 0 1 6 3 80 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 12 6 2 3 6 3 42 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1
30 12 12 2 3 6 0 66 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
30 11 1 5 3 6 0 32 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 14 9 2 2 7 7 87 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
30 12 35 2 2 9 8 39 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 23 28 0 3 26 15 69 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
30 11 5 3 2 6 2 95 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1
30 12 4 2 2 6 2 83 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
Figure 51 – Actual Test Data to predict the Winning Strategy against Australia
Columns of the model output, in order: Result, Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
1 30 24 31 0 2 29 10 83 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1
1 30 12 6 3 2 6 4 48 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0
1 30 17 20 6 3 6 0 60 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0
1 30 16 5 1 3 6 3 62 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0
1 30 13 6 2 3 6 2 93 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
1 30 22 21 3 3 6 16 55 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1
1 30 13 10 0 1 6 3 80 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0
1 30 12 6 2 3 6 3 42 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
1 30 12 12 2 3 6 0 66 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0
1 30 11 1 5 3 6 0 32 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0
1 30 14 9 2 2 7 7 87 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0
1 30 12 35 2 2 9 8 39 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0
1 30 23 28 0 3 26 15 69 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0
1 30 11 5 3 2 6 2 95 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0
2 ODI matches with Sri Lanka in India. All matches are Day and Night matches. In
India, it will be the winter season at the time of the matches.
To build the strategy against Sri Lanka, an Excel sheet is developed as the actual test data.
In the Excel sheet, since one-hot encoding is applied to the object variables, the parameters mentioned in the
problem statement are marked as ‘1’. The rest are marked with 0s and 1s as per the strategy plan.
Variable                     Value           Encoded value
Opponent_Srilanka            Srilanka        1
Match_Format_ODI             ODI             1
Match_light_Type_Day_Night   Day and Night   1
Offshore_Yes                 India           0
Season_Winter                Winter          1
With the problem variables fixed, the remaining variables are varied to build a sufficient set of strategies, and
a CSV test file is built to predict the output using the Logistic Regression model.
Columns, in order: Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
30 11 9 3 3 9 7 82 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0
30 20 3 4 2 6 2 45 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
30 14 7 4 3 7 7 60 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
30 12 1 3 3 6 0 75 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
30 15 11 3 3 10 8 76 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0
30 13 6 2 4 6 2 91 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
30 14 8 5 3 6 3 62 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
30 14 2 3 3 6 1 57 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0
30 11 11 2 4 8 7 81 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0
30 14 5 2 3 6 2 71 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
30 15 1 2 3 6 0 45 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
30 12 1 3 3 6 0 75 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
30 12 1 2 2 6 1 59 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
30 14 7 4 3 7 7 60 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
30 14 7 1 3 6 2 96 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
30 12 10 2 3 10 0 79 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0
30 12 4 2 3 6 0 61 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
30 20 5 2 3 6 3 94 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
30 11 10 2 3 10 8 89 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0
30 11 11 2 4 8 7 81 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0
30 13 6 2 4 6 2 91 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
Figure 53 – Actual Test Data to predict the Winning Strategy against Srilanka
Columns of the model output, in order: Results_Pred, Avg_team_Age, Max_run_scored_1over, Extra_bowls_bowled, Min_run_given_1over, Min_run_scored_1over, Max_run_given_1over, extra_bowls_opponent, player_highest_run, Match_light_type_Day and Night, Match_light_type_Night, Match_format_T20, Match_format_Test, Bowlers_in_team_2.0, Bowlers_in_team_3.0, Bowlers_in_team_4.0, Bowlers_in_team_5.0, All_rounder_in_team_2.0, All_rounder_in_team_3.0, All_rounder_in_team_4.0, First_selection_Bowling, Opponent_Bangladesh, Opponent_England, Opponent_Kenya, Opponent_Pakistan, Opponent_South Africa, Opponent_Srilanka, Opponent_West Indies, Opponent_Zimbabwe, Season_Summer, Season_Winter, Offshore_Yes, Max_wicket_taken_1over_2, Max_wicket_taken_1over_3, Max_wicket_taken_1over_4, Players_scored_zero_2, Players_scored_zero_3, Players_scored_zero_4, player_highest_wicket_2, player_highest_wicket_3, player_highest_wicket_4, player_highest_wicket_5
1 30 11 9 3 3 9 7 82 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0
1 30 20 3 4 2 6 2 45 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
1 30 14 7 4 3 7 7 60 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
1 30 12 1 3 3 6 0 75 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
1 30 15 11 3 3 10 8 76 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0
1 30 13 6 2 4 6 2 91 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
1 30 14 8 5 3 6 3 62 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
1 30 14 2 3 3 6 1 57 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0
1 30 11 11 2 4 8 7 81 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0
0 30 14 5 2 3 6 2 71 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 30 15 1 2 3 6 0 45 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
1 30 12 1 3 3 6 0 75 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
1 30 12 1 2 2 6 1 59 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
1 30 14 7 4 3 7 7 60 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
1 30 14 7 1 3 6 2 96 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0
1 30 12 10 2 3 10 0 79 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0
1 30 12 4 2 3 6 0 61 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
1 30 20 5 2 3 6 3 94 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
1 30 11 10 2 3 10 8 89 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0
1 30 11 11 2 4 8 7 81 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0
1 30 13 6 2 4 6 2 91 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0