Final Project Powerpoint

Uploaded by

sujithreddy765
Predicting Plays in the National Football League
By: Drake Hath, Ritesh Patil, Harsha Suddamalla, Benjamin Ilacqua
Date: 7th Dec 2023
Purpose of our Study
• The aim of this analysis is to explore and model NFL (National Football League) data to derive predictive insights into pass attempt outcomes.
• Predicting the likelihood of a pass attempt during an NFL game based on various game-related attributes.
• Understanding the factors that influence teams to opt for a pass play in different game situations.
• Enhancing strategic insights for coaches, analysts, and teams to make informed decisions during gameplay.
Why Is It Important?
• Crucial for strategizing offensive and defensive plays based on the likelihood of a pass attempt.
• Enables strategic adjustments that affect game momentum and overall game strategy.
• Provides insights into team tendencies in different game scenarios.
• Supports data-driven decision making.
The Data Source

• Kaggle dataset with NFL play-by-play data from 2009-2018.
• 255 variables.
• 447,382 observations.
• The dataset comprises NFL data, encompassing player statistics, game and play outcomes, and contextual information.
• The dataset is considered reliable because of its large number of unbiased observations.

Reference:
Horowitz, Max. “Detailed NFL Play-by-Play Data 2009-2018.” Kaggle, 22 Dec. 2018, www.kaggle.com/datasets/maxhorowitz/nflplaybyplay2009to2016?select=NFL%2BPlay%2Bby%2BPlay%2B2009-2018%2B%28v5%29.csv.
Problems with the
Data
• Missing Values: All missing values in the dataset were marked with the text N/A. As a result, interval variables containing missing values were read as nominal variables.
• Outliers: Extreme values that do not align with expected ranges for certain attributes.
• Data Inconsistencies: Inconsistent or conflicting data within the dataset.
• Data Entry Errors: Typos, incorrect formatting, or erroneous values.
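The missing-value problem can be illustrated with a small Python sketch. It is not the authors' code: the set of markers and the sample column are assumptions made for illustration. It shows how treating N/A as missing lets an interval variable stay numeric.

```python
from typing import Optional

# Assumed set of missing-value markers; the slides mention only "N/A".
MISSING_MARKERS = {"N/A", "NA", ""}


def parse_interval(raw: str) -> Optional[float]:
    """Return the value as a float, or None if it is a missing-value marker."""
    text = raw.strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)


# Hypothetical raw column as it might appear in the CSV file.
raw_ydstogo = ["10", "N/A", "3", "7"]
parsed = [parse_interval(value) for value in raw_ydstogo]
print(parsed)  # [10.0, None, 3.0, 7.0]
```

With the markers mapped to None, the remaining values are numbers, so the variable no longer has to be handled as nominal.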
Methodology
STEPS INVOLVED:
• Data Preparation
• Variable Selection and Transformation
• Model Building
• Model Evaluation
• Deployment and Monitoring
• Iterative Improvement
• Documentation and Reporting

• The football analytics process included thorough data preparation, variable refinement, and model construction. We used three model-building techniques: decision tree, logistic regression, and neural network. The models were evaluated using ROC analysis and misclassification rate to assess the accuracy and reliability of their predictions.
Data Cleaning and Wrangling
1. Selection of Relevant Year:
• Removed all data except the 2018 data, reducing the observations from 447,382 to 42,037.
• Reduced the observations from 42,037 to 33,071 by removing missing values.
2. Variable Selection:
• Eliminated redundant variables (e.g., time left in minutes and seconds).
• Excluded play-specific variables (e.g., yards gained) that are not available pre-play.
3. Handling Missing Values:
• Identified missing values labeled as N/A, which had caused interval variables to be read as nominal.
• Used Python to remove observations with N/A values.
4. Exclusion of Invalid Observations:
• Removed observations where the play type is a kickoff.
• Ensured removal of impossible scenarios.
5. Final Dataset Size:
• After preprocessing, narrowed down observations to
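
The cleaning steps above could be sketched in Python roughly as follows. This is a minimal illustration, not the authors' actual script. The column names game_date and play_type, the filter on calendar year, and the sample rows are all assumptions.

```python
from typing import Dict, List

Row = Dict[str, str]

# Assumed missing-value markers; the slides mention only "N/A".
MISSING = {"N/A", "NA", ""}


def clean_plays(rows: List[Row], year: str = "2018") -> List[Row]:
    """Keep plays from one year, drop rows with missing values, drop kickoffs."""
    cleaned = []
    for row in rows:
        # Assumed column "game_date" in YYYY-MM-DD form.
        if not row["game_date"].startswith(year):
            continue
        # Drop any observation that contains a missing-value marker.
        if any(value.strip() in MISSING for value in row.values()):
            continue
        # Assumed column "play_type"; kickoffs are excluded.
        if row["play_type"] == "kickoff":
            continue
        cleaned.append(row)
    return cleaned


# Hypothetical rows, for illustration only.
sample = [
    {"game_date": "2018-09-09", "play_type": "pass", "down": "1", "ydstogo": "10"},
    {"game_date": "2018-09-09", "play_type": "kickoff", "down": "1", "ydstogo": "10"},
    {"game_date": "2018-09-09", "play_type": "run", "down": "N/A", "ydstogo": "5"},
    {"game_date": "2017-12-31", "play_type": "run", "down": "2", "ydstogo": "5"},
]
kept = clean_plays(sample)
print(len(kept))  # 1
```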

Variable Description
• half_seconds_remaining: The remaining time in seconds within the current half.
• game_seconds_remaining: The remaining time in seconds in the entire game.
• down: The down number indicating the current play sequence in the set of downs.
• ydstogo: Yards remaining to reach a first down.
• wp: Win probability from the current situation.
• pass_attempt: Boolean value indicating if the play involves a pass attempt.
• score_differential: The difference in score between the two teams.
• no_score_prob: Probability of no score from the current situation.
• safety_prob: Probability of a safety occurring in the play.
• ep: Expected points from the current situation.
• yardline_100: The distance in yards (0 to 100) from the ball to the opponent's end zone.
• posteam: The team possessing the ball during a play in the game.
Variable Descriptive Statistics
Describe the main features of each variable, such as its central tendency, variability, and distribution.
[Figure: Interval Variables]
[Figure: Class Variables]
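
As a rough illustration of these descriptive statistics, the sketch below summarizes one interval variable and one class variable using Python's standard library. The values are made up for the example and are not taken from the dataset.

```python
import statistics
from collections import Counter

# Hypothetical ydstogo values, for illustration only.
ydstogo = [10, 7, 3, 10, 1, 15, 10, 4]

# Interval variable: central tendency, variability, and range.
summary = {
    "mean": statistics.mean(ydstogo),
    "median": statistics.median(ydstogo),
    "std_dev": statistics.stdev(ydstogo),
    "min": min(ydstogo),
    "max": max(ydstogo),
}

# Hypothetical posteam values, for illustration only.
posteam = ["NE", "KC", "NE", "DAL"]

# Class variable: frequency of each level.
team_counts = Counter(posteam)

print(summary["mean"], summary["median"])  # 7.5 8.5
print(team_counts.most_common(1))  # [('NE', 2)]
```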
The Model Diagram
Model Analysis
Input Variable Summary:
• Target Variable: 1
• Interval Variables: 10
• Nominal Variables: 5

Data Partition:
• Train Data: 60%
• Validation Data: 20%
• Test Data: 20%
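
A 60/20/20 partition like the one above can be sketched as a simple random split. This is an illustrative assumption: the slides do not say whether the partition was simple random or stratified, and the seed is arbitrary.

```python
import random
from typing import List, Sequence, Tuple


def partition(rows: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle the rows and split them 60% / 20% / 20%."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = len(shuffled) * 60 // 100
    n_valid = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```

Any rows left over from rounding fall into the test set.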
Variable Selection Node

• The Variable Selection node identifies and retains the most relevant variables for modeling. This approach enhances model efficiency and interpretability, and has the potential to result in more accurate predictions.

• We used the Chi-squared selection criterion.
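
To give a sense of what a Chi-squared criterion measures, the sketch below computes the plain Pearson chi-squared statistic between one input and the binary target. It is not a description of what the Variable Selection node does internally, and the sample values are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def chi_squared(feature: Sequence[Hashable], target: Sequence[int]) -> float:
    """Pearson chi-squared statistic for a feature/target contingency table."""
    n = len(feature)
    feature_counts = Counter(feature)
    target_counts = Counter(target)
    observed = Counter(zip(feature, target))
    statistic = 0.0
    for f_value, f_count in feature_counts.items():
        for t_value, t_count in target_counts.items():
            # Expected cell count if feature and target were independent.
            expected = f_count * t_count / n
            statistic += (observed[(f_value, t_value)] - expected) ** 2 / expected
    return statistic


# Hypothetical values: down number and whether the play was a pass attempt.
down = [1, 1, 1, 1, 3, 3, 3, 3]
pass_attempt = [0, 0, 0, 1, 1, 1, 1, 0]
print(chi_squared(down, pass_attempt))  # 2.0
```

A larger statistic suggests a stronger association between the input and the target, which is why it can be used to rank candidate variables.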
Decision Tree (Gini Splitting Rule) Model
• Decision tree models are useful for both classification and regression tasks.
• They are well-suited for problems with complex decision boundaries or interactions between variables.
• A Decision Tree model fits the goal of our study.

Advantages:
• Easy to understand and interpret.
• Handles both numerical and categorical data.
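
The Gini splitting rule can be sketched for a single interval input as follows. This is a simplified illustration with hypothetical values, not the implementation used in the study: it searches for the threshold that minimizes the weighted Gini impurity of the two child nodes.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) for the best 'value <= threshold' split."""
    best = (float("nan"), gini(labels))
    # The largest value is skipped because it would leave the right child empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical values: yards to go and whether the play was a pass attempt.
ydstogo = [1, 2, 3, 8, 10, 12]
pass_attempt = [0, 0, 0, 1, 1, 1]
print(best_split(ydstogo, pass_attempt))  # (3, 0.0)
```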
Regression (Stepwise, SBC) Model
• Useful when you want to understand the relationship between input variables and the target variable.

• Provides a clear understanding of the impact of individual predictors on the target.
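
For context on the SBC criterion named in the slide title, the sketch below uses the common definition SBC = -2 ln(L) + k ln(n), where L is the likelihood, k the number of parameters, and n the number of observations. The log-likelihood values and model sizes are hypothetical and only show how the penalty can favor a smaller model.

```python
import math


def sbc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz Bayesian Criterion: lower values indicate a preferred model."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical candidate models fitted to the same 1,000 observations.
small_model = sbc(log_likelihood=-650.0, n_params=3, n_obs=1000)
large_model = sbc(log_likelihood=-648.0, n_params=6, n_obs=1000)

# The larger model fits slightly better but pays a bigger complexity penalty.
print(small_model < large_model)  # True
```

In stepwise selection, a criterion of this kind is used to decide which of the candidate models along the selection path to keep.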
Neural Network Model
• Neural networks are powerful for complex tasks, especially when dealing with non-linear relationships and high-dimensional data.
• Suitable for both classification and regression problems.

Advantages:
• Can capture intricate patterns and relationships in the data.
• Effective for tasks involving image
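
How a neural network turns inputs into a pass-attempt probability can be sketched with a tiny forward pass. The architecture (one hidden layer with tanh units and a sigmoid output) and all weights are assumptions for illustration. The slides do not state the network configuration that was used.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(
    x: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],
    hidden_biases: Sequence[float],
    output_weights: Sequence[float],
    output_bias: float,
) -> float:
    """One hidden layer with tanh units, followed by a sigmoid output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    z = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(z)


# Hypothetical scaled inputs (e.g., down and ydstogo) and made-up weights.
probability = forward(
    x=[0.5, -1.0],
    hidden_weights=[[0.8, -0.4], [-0.3, 0.9]],
    hidden_biases=[0.1, -0.2],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(0.0 < probability < 1.0)  # True
```

The non-linear hidden units are what allow the model to capture interactions between inputs that a linear model would miss.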
Model Results (Statistics Table)
The best model is selected based on the following parameters:
• A higher ROC score indicates better performance.
• A lower misclassification rate indicates better performance.
• A lower average squared error indicates better performance.

Based on the Test Data, “Neural Network” is the Best Model when comparing ROC, Misclassification Rate and Average Squared Error.

Data        Metric                  Neural Network  Decision Tree (Gini)  Regression (Stepwise, SBC)
Test        ROC                     0.77            0.76                  0.74
Test        Misclassification Rate  0.31            0.31                  0.33
Test        Average Squared Error   0.19            0.20                  0.21
Validation  ROC                     0.75            0.74                  0.72
Validation  Misclassification Rate  0.32            0.33                  0.35
Validation  Average Squared Error   0.20            0.20                  0.21
Training    ROC                     0.75            0.75                  0.72
Training    Misclassification Rate  0.32            0.32                  0.35
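
The three comparison metrics can be computed from predicted probabilities as sketched below. This is an illustrative implementation with made-up predictions. The 0.5 cutoff is an assumption, and the ROC score is taken here to mean the area under the ROC curve.

```python
from typing import Sequence


def misclassification_rate(
    y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.5
) -> float:
    """Share of observations whose predicted class differs from the actual class."""
    wrong = sum((p >= cutoff) != bool(y) for y, p in zip(y_true, y_prob))
    return wrong / len(y_true)


def average_squared_error(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between the actual outcome and the predicted probability."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_index(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted pass-attempt probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.4, 0.3, 0.2]
print(misclassification_rate(y_true, y_prob))  # 0.25
print(roc_index(y_true, y_prob))  # 0.75
```

The pairwise ROC calculation is quadratic in the number of observations, which is fine for a small example but would be slow on the full dataset.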
Conclusion:
• The analysis sheds light on critical factors influencing the probability of a pass attempt in NFL games.
• The Neural Network was the best model for predicting pass attempts; however, the other models also performed well.
• Reveals the different effects that pre-play variables have on the decision to attempt a pass.
• Advocates for model refinement to adapt to dynamically changing strategy in the NFL.
• Recommendations include further exploring real-time player tracking data for more accurate predictions and refining models to adapt to rule changes and
