Big Ipl

The document discusses regression models to analyze factors that influence the price at which players are sold in the Indian Premier League cricket tournament. A simple linear regression shows that strike rate alone explains only 2.6% of the variation in price. The ability to hit "sixers" (home runs) explains considerably more, about 19.7% of the variation. A multiple regression on strike rate and sixers together explains about 19% of the variation. Player age explains only 2.6% of the variation in price. Players of Indian origin have higher average sold prices than international players. The best model for franchises to predict sold price explains 31.57% of the variation (adjusted R²), based on factors like strike rate, sixers, and player origin.

Uploaded by

Nishant Kumar
Copyright © All Rights Reserved

PRICING OF PLAYERS IN THE INDIAN PREMIER LEAGUE

CASE QUESTIONS
1. Develop a simple linear regression model between the sold price and batting strike rate. Is there a statistically significant relationship between sold price and batting strike rate?
Answer:
Equation of the estimated regression line:

Y = β0 + β1 (X)

Sold Price = β0 + β1 (Strike rate)

β0 = 289510.4

β1 = 2086.5

So,

Sold Price = 289510.4 + 2086.5* (Strike rate)

We get R² = 0.02641
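The Python sketch below shows where β0, β1 and R² come from, using the closed-form least-squares formulas. The case dataset is not reproduced in this document, so the strike rates and prices are hypothetical toy values and will not reproduce the coefficients above. The function name fit_simple_ols is our own.

```python
from statistics import fmean

def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # slope
    b0 = y_bar - b1 * x_bar              # intercept: the line passes through the means
    sst = sum((yi - y_bar) ** 2 for yi in y)                          # total sum of squares
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # residual sum of squares
    return b0, b1, 1 - sse / sst

# Hypothetical toy data (NOT the case dataset): strike rate vs. sold price
strike_rate = [90.0, 110.0, 125.0, 140.0, 160.0]
sold_price = [300000.0, 700000.0, 350000.0, 650000.0, 500000.0]
b0, b1, r2 = fit_simple_ols(strike_rate, sold_price)
print(f"Sold Price = {b0:.1f} + {b1:.1f} * (Strike rate), R^2 = {r2:.4f}")
```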

Variation in strike rate explains only 2.6% of the variation in sold price, so strike rate does not account for most of the variation in sold price.

In other words, R² = 0.02641 means that the linear relationship between strike rate and sold price is very weak, and the two are not closely related. As a rough rule of thumb, variables with an R² of about 70% or above are considered closely related. With only 2.6% explained, other factors clearly affect sold price more than strike rate does. Note that R² measures the strength of the relationship, not its statistical significance. Significance is judged from the p-value of the slope (equivalently, of the F-statistic), which is not reported here.
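Because the question asks about statistical significance, the Python sketch below runs the overall F-test for a one-predictor model, computed from R² and the number of observations n. Two assumptions are labeled in the code. First, n = 100 is a placeholder, because the sample size is not given in this section. Second, the p-value uses a large-sample normal approximation to the t distribution rather than the exact test that a statistics package would report.

```python
from statistics import NormalDist

def slope_test_from_r2(r_squared, n):
    """Overall F-test for a simple regression (one predictor), from R^2 and n.

    F = R^2 * (n - 2) / (1 - R^2), with (1, n - 2) degrees of freedom.
    With one predictor F = t^2, so |t| = sqrt(F). The p-value below is a
    large-sample normal approximation, not the exact t/F p-value.
    """
    if n <= 2:
        raise ValueError("need more than 2 observations")
    f_stat = r_squared * (n - 2) / (1 - r_squared)
    t_abs = f_stat ** 0.5
    p_approx = 2 * (1 - NormalDist().cdf(t_abs))
    return f_stat, p_approx

# ASSUMPTION: n = 100 is a placeholder; substitute the number of players in the data.
f_stat, p_approx = slope_test_from_r2(0.02641, n=100)
print(f"F = {f_stat:.3f}, approx. p-value = {p_approx:.3f}, "
      f"significant at 5%: {p_approx < 0.05}")
```

With the actual n, use the exact p-value from the regression output in place of this approximation.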
2. What is the impact of the ability to score “SIXERS” on the player’s price?

Answer:

Equation of the estimated regression line:

Y = β0 + β1 (X)

Sold Price = β0 + β1 (Sixers)

β0 = 385115

β1 = 7693

So,

Sold Price = 385115 + 7693 * (Sixers)

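We get R² = 0.1968

Variation in the number of sixers explains about 19.7% of the variation in sold price. This is much more than strike rate alone explains (2.6%), although most of the variation is still unexplained. The slope β1 = 7693 means that each additional sixer is associated with an increase of about 7,693 in the fitted sold price.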
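A little arithmetic makes the slope concrete. This Python sketch uses only the coefficients reported above. The sixer counts are illustrative, and the fitted line describes an association, not a causal effect.

```python
# Coefficients reported above for the model Sold Price = b0 + b1 * (Sixers)
B0_SIXERS = 385115
B1_SIXERS = 7693

def predicted_price_from_sixers(sixers):
    """Fitted Sold Price for a player with the given number of sixers."""
    return B0_SIXERS + B1_SIXERS * sixers

for sixers in (0, 10, 20):
    print(f"{sixers:>2} sixers -> fitted price {predicted_price_from_sixers(sixers)}")

# The slope is the change in fitted price for one additional sixer
print(predicted_price_from_sixers(11) - predicted_price_from_sixers(10))
```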
3. Develop a multiple linear regression model of sold price on batting strike rate and sixers. What do you conclude from this model?

Answer:

Equation of the estimated regression line:

Y = β0 + β1 (X1) + β2(X2)

Sold Price = β0 + β1 (strike rate) + β2 (Sixers)

β0 = 395327.0

β1 = -102.4

β2 = 7758.7

So,

Sold Price = 395327.0 + (-102.4) * (Strike rate) + 7758.7 * (sixers)


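We get R² = 0.1906

Strike rate and sixers together explain only about 19% of the variation in sold price. That is no more than sixers alone explain (R² = 0.1968). Once sixers are in the model, strike rate adds essentially nothing: its coefficient is small and negative (-102.4), while the sixers coefficient (7758.7) is almost unchanged from the simple model (7693). Ordinary R² cannot decrease when a predictor is added, so the lower value here presumably means that at least one of the two figures is an adjusted R². The conclusion is that the ability to hit sixers, not strike rate, accounts for the small part of the price that this model explains.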
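This stdlib-only Python sketch shows how the three coefficients are obtained. It solves the least-squares normal equations (X'X)b = X'y by Gaussian elimination. It is an illustration only and not necessarily how the coefficients above were computed, since statistical packages usually use a QR decomposition instead. The data in the usage example are hypothetical.

```python
def fit_two_predictor_ols(x1, x2, y):
    """OLS fit of y = b0 + b1*x1 + b2*x2 by solving the normal equations
    (X'X) b = X'y with Gaussian elimination. Returns ([b0, b1, b2], r_squared)."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]          # design matrix X
    # Build the 3x3 system X'X and the right-hand side X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    # Augmented matrix [X'X | X'y]; forward elimination with partial pivoting
    m = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    # R^2 = 1 - SSE / SST
    y_bar = sum(y) / len(y)
    fitted = [b[0] + b[1] * u + b[2] * v for u, v in zip(x1, x2)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b, 1 - sse / sst

# Hypothetical toy data (not the case dataset)
strike_rate = [95.0, 110.0, 120.0, 135.0, 150.0, 160.0]
sixers = [4, 20, 9, 30, 12, 25]
price = [350000.0, 520000.0, 400000.0, 650000.0, 430000.0, 600000.0]
coef, r2 = fit_two_predictor_ols(strike_rate, sixers, price)
print("b0, b1, b2 =", [round(c, 1) for c in coef], "R^2 =", round(r2, 4))
```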

4. Cricket in the T20 format is considered a young man’s sport. Is there evidence that the player’s price is influenced by age?

Answer:

Age is coded as a dummy variable: 1 for Category 1 (age < 25) and 0 for all other categories.

Equation of the estimated regression line:

Y = β0 + β1 (X)

Sold Price = β0 + β1 (Age)


β0 = 493290

β1 = 226961

So,

Sold Price = 493290 + 226961*(Age)

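We get R² = 0.02631

The age dummy explains only 2.6% of the variation in sold price, so the sold price depends very little on the player's age. The fitted means show that players under 25 sold for about 226,961 more on average (493290 + 226961 = 720251, versus 493290). However, this difference accounts for only a small share of the overall variation in prices.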
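The Python sketch below shows the dummy coding and the meaning of the two coefficients, using the coefficients reported above. It assumes the data store age as category codes, as described above (1 = under 25). With one 0/1 predictor, β0 is the fitted mean for the baseline group and β0 + β1 is the fitted mean for players under 25.

```python
# Coefficients reported above for Sold Price = b0 + b1 * (Age dummy)
B0_AGE = 493290   # fitted mean price for the baseline group (dummy = 0)
B1_AGE = 226961   # extra fitted price for Category 1 (age < 25, dummy = 1)

def age_dummy(age_category):
    """1 for Category 1 (age < 25), 0 for every other age category."""
    return 1 if age_category == 1 else 0

def fitted_price_by_age(age_category):
    return B0_AGE + B1_AGE * age_dummy(age_category)

for category in (1, 2, 3):
    print(category, fitted_price_by_age(category))
```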

5. Are players of Indian origin paid more than players from other countries?

Answer:

Calculating the mean value for each category of country of origin

In the given data, a column was added in which the countries the cricketers belong to were coded into two categories:
Player of Indian origin – represented by A

Others – represented by B (all other countries clubbed into one category)

The mean Sold Price was calculated separately for each category: Rs. 6,52,339.6 for Category A and Rs. 4,30,974 for Category B.

Regression Model between Sold Price and Country of Origin

Result

The regression model of Sold Price on the single indicator variable A is:

Mean Sold Price = β0 + β1 *(A)

Solving the equation, we get β0 = 430974 and β1 = 221366.

Interpreting the indicator variable: A = 1 if the player is of Indian origin, and A = 0 if the player is from another country.

(For comparison, the same indicator-variable approach applied to the three age categories gives Mean Sold Price = 720250 - 235715*(X2) - 200071*(X3). Here X2 = 1 if the player's age lies in Category 2 and X3 = 1 if it lies in Category 3, each 0 otherwise. The mean selling price when the player's age lies in Category 2 is therefore 720250 - 235715*(1) - 200071*(0) = Rs. 4,84,535.)

For players of Indian origin (A = 1):

Mean Sold Price (A) = 430974 + 221366*(1) = 652340


Therefore, the mean selling price when the Player is of Indian Origin is Rs.6,52,340.

Category B (players not of Indian origin) is the reference or baseline category, because it corresponds to A = 0. The mean selling price for players not of Indian origin is therefore obtained by setting A = 0 in the above equation:

Mean Sold Price (B) = 430974 + 221366*(0) = 430974


Both values match the category means calculated directly at the start of this answer (Rs. 6,52,339.6 for A and Rs. 4,30,974 for B).
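The fitted values reproduce the category means for a general reason. For a regression on a single 0/1 indicator, least squares gives β0 = the mean of the baseline group (A = 0) and β0 + β1 = the mean of the group with A = 1. The Python sketch below checks this numerically using hypothetical toy prices, not the case data.

```python
from statistics import fmean

def fit_dummy_regression(dummy, y):
    """OLS fit of y = b0 + b1*d for a 0/1 indicator d; returns (b0, b1)."""
    d_bar, y_bar = fmean(dummy), fmean(y)
    sdd = sum((d - d_bar) ** 2 for d in dummy)
    sdy = sum((d - d_bar) * (v - y_bar) for d, v in zip(dummy, y))
    b1 = sdy / sdd
    b0 = y_bar - b1 * d_bar
    return b0, b1

# Hypothetical toy prices: A = 1 for Indian-origin players, 0 otherwise
a = [1, 1, 0, 0, 0]
price = [600.0, 700.0, 400.0, 450.0, 500.0]
b0, b1 = fit_dummy_regression(a, price)
mean_a = fmean(p for p, d in zip(price, a) if d == 1)
mean_b = fmean(p for p, d in zip(price, a) if d == 0)
print(b0, mean_b)        # intercept equals the mean of baseline group B
print(b0 + b1, mean_a)   # intercept + slope equals the mean of group A
```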

Analysis of Regression Model:

In the given result, the p-value of the F-statistic is 0.002015, well below 0.05, so the model is highly significant. There is a statistically significant relationship between Sold Price and the country the cricketer belongs to.

In other words, a player's country of origin (Indian or not) is significantly associated with differences in Sold Price.

Model accuracy assessment

R-squared:

In the given result, the R-squared value is 0.06481, i.e., 6.481%, which is very low. This means that the country the cricketer belongs to explains only 6.481% of the variation in Sold Price. Thus, the predictor variable and the outcome variable are not closely related. As a rough rule of thumb, variables with an R² of 70% or above are considered closely related. Other factors also affect the Sold Price.

6. Develop a model that can be used by franchises to predict the sold price.

To develop the best model which can be used by Franchises to predict the Sold Price, four models
have been created – modelOpt1, modelOpt2, modelOpt3 and modelOpt4.
Model      p-value (F-statistic)   Adjusted R²
Option 1   0.5516                  -0.78%
Option 2   1.548e-05               14.68%
Option 3   3.742e-07               23.17%
Option 4   4.91e-07                31.57%

Interpretation:
Option 1: The p-value is above 0.05, so the model is not significant. There is no significant relationship between the predictor variables and the outcome variable. The adjusted R² is negative, which happens when the residual sum of squares approaches the total sum of squares, i.e. the model explains very little or none of the response. A negative adjusted R² therefore indicates that the explanatory variables are not useful.
Option 2: The p-value is far below 0.05, so the model is highly significant: at least one of the predictor variables is significantly related to the outcome variable. An adjusted R² of 14.68% is very low, so the predictor variables and the outcome variable are not closely related.
Option 3: The p-value is far below 0.05, so the model is highly significant: at least one of the predictor variables is significantly related to the outcome variable. An adjusted R² of 23.17% is low, so the predictor variables and the outcome variable are not closely related.
Option 4: The p-value is far below 0.05, so the model is highly significant: at least one of the predictor variables is significantly related to the outcome variable. An adjusted R² of 31.57% is low, so the predictor variables and the outcome variable are not closely related.

The p-value of the F-statistic in Option 1 is 0.5516, well above 0.05. This shows that there is no significant relationship between the predictor (independent) variables and the outcome variable. The adjusted R² is also negative, which confirms that the model explains very little or none of the response. Option 1 can therefore be removed from the four options.

The p-values of Options 2, 3 and 4 are all far below 0.05, so each represents a highly significant relationship: at least one predictor variable in each model is significantly related to the outcome variable.
Option 3 has the smallest p-value. However, the p-value of the F-statistic only tests whether the model explains anything at all. Once all three models are clearly significant, it does not measure which model fits better. Adjusted R² is the more appropriate basis for comparison, and it is highest for Option 4 (31.57%), which makes Option 4 the preferred model. The p-values of Options 3 and 4 are of the same order of magnitude, so they do not strongly contradict this choice. Any remaining ambiguity may be due to one or more insignificant variables in the models. Removing such variables would give a clearer picture of the relationship between the dependent and independent variables.
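The Python sketch below expresses this selection rule over the p-values and adjusted R² values reported for the four options: discard models whose F-test is not significant, then prefer the highest adjusted R². This is a common heuristic rather than a complete procedure, because it does not check individual coefficients or residuals.

```python
# (p-value of the F-statistic, adjusted R^2) reported for each option
models = {
    "Option 1": (0.5516, -0.0078),
    "Option 2": (1.548e-05, 0.1468),
    "Option 3": (3.742e-07, 0.2317),
    "Option 4": (4.91e-07, 0.3157),
}

ALPHA = 0.05
# Keep only models whose overall F-test is significant at ALPHA ...
significant = {name: adj for name, (p, adj) in models.items() if p < ALPHA}
# ... then prefer the one with the highest adjusted R^2
best = max(significant, key=significant.get)
print(best, significant[best])
```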
Model accuracy assessment

The overall quality of the model can be assessed by examining the R-squared (R²) and the Residual Standard Error (RSE).
R-squared:
In multiple linear regression, R (the multiple correlation coefficient) is the correlation between the observed values of the outcome variable (y) and the fitted (i.e., predicted) values of y, and R² is its square. For this reason, R² is never negative and ranges from zero to one.
R² represents the proportion of the variance in the outcome variable y that can be predicted from the values of the x variables. An R² value close to 1 indicates that the model explains a large portion of the variance in the outcome variable.
A problem with R² is that it never decreases when more variables are added to the model, even if those variables are only weakly associated with the response. A solution is to adjust R² for the number of predictor variables.
The “Adjusted R Square” value in the summary output applies this correction for the number of x variables included in the prediction model.
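The usual adjustment formula is Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), with n observations and k predictors. The minimal Python sketch below uses hypothetical inputs. It shows how the formula penalizes extra predictors and why a weak model, such as Option 1 above, can end up with a negative value.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    for n observations and k predictors (excluding the intercept)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical inputs: a decent one-predictor model, then a weak
# three-predictor model whose adjusted R^2 turns negative.
print(adjusted_r_squared(0.50, n=11, k=1))
print(adjusted_r_squared(0.01, n=20, k=3))
```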

For the model with batting strike rate and sixers (Question 3), the R-squared value is 0.1906, i.e., 19.06%, which is low. This means that strike rate and sixers together explain only 19.06% of the variation in Sold Price. Thus, the predictor variables and the outcome variable are not closely related. As a rough rule of thumb, variables with an R² of 70% or above are considered closely related. Other factors also affect the Sold Price.
