Final Project Powerpoint
in the National
Football League
By:
Drake Hath, Ritesh Patil,
Harsha Suddamalla, Benjamin Ilacqua
Date: 7th Dec 2023
Purpose of our Study
The aim of this analysis is to explore and model
NFL (National Football League) data to derive
predictive insights into pass attempt outcomes.
Predicting the likelihood of a pass attempt
during an NFL game based on various game-
related attributes.
Understanding the factors that influence teams
to opt for a pass play in different game
situations.
Enhancing strategic insights for coaches,
analysts, and teams to make informed decisions
during gameplay.
Why it’s Important?
Reference:
Horowitz, Max. “Detailed NFL Play-by-Play Data 2009-2018.” Kaggle, 22 Dec. 2018,
www.kaggle.com/datasets/maxhorowitz/nflplaybyplay2009to2016?select=NFL%2BPlay%2Bby%2BPlay%2B2009-2018%2B%28v5%29.csv.
Problems with the
Data
Missing Values: All missing values in the
dataset were marked with N/A. Because N/A is
text, interval variables with missing values
were read in as nominal variables.
Outliers: Extreme values that do not align with
expected ranges for certain attributes.
Data Inconsistencies: Inconsistent or conflicting
data within the dataset.
Data Entry Errors: Typos, incorrect formatting, or
erroneous values.
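The missing-value problem above can be illustrated with a minimal Python sketch. It assumes the raw file is CSV text in which missing entries appear as the literal string N/A. The column names (down, ydstogo) and the helper names are hypothetical, chosen only for illustration, and Python is not necessarily the tool used in the study. Converting the marker to None before parsing keeps the column numeric instead of letting it fall back to text.

```python
import csv
import io

# Assumption: these strings mark a missing value in the raw file.
MISSING_TOKENS = {"N/A", "NA", ""}


def to_float_or_none(raw):
    """Parse one cell as a number, mapping missing markers to None."""
    text = raw.strip()
    if text in MISSING_TOKENS:
        return None
    return float(text)


def load_interval_column(csv_text, column):
    """Read one column from CSV text as a list of floats and Nones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [to_float_or_none(row[column]) for row in reader]


# Tiny made-up sample, not taken from the real dataset.
sample = "down,ydstogo\n1,10\nN/A,7\n3,N/A\n"
down = load_interval_column(sample, "down")
ydstogo = load_interval_column(sample, "ydstogo")
print(down, ydstogo)
```

With the markers mapped to None, both columns stay numeric and the missing entries can be imputed or dropped in a later step.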
Methodology
STEPS INVOLVED:
Data Preparation
Variable Selection and Transformation
Model Building
Model Evaluation
Deployment and Monitoring
Iterative Improvement
Documentation and Reporting
Class
Variables
The Model Diagram
Model Analysis
Input Variable Summary:
Target Variable: 1
Interval Variables : 10
Nominal Variables: 5
Data Partition:
Train Data : 60%
Validation Data: 20%
Test Data: 20%
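The 60/20/20 partition above can be sketched in Python as follows. This is a simple random split under stated assumptions: the function name, the fixed seed, and the use of an unstratified shuffle are illustrative choices, since the slides do not say how the partition was drawn.

```python
import random


def partition(rows, train=0.6, valid=0.2, seed=42):
    """Shuffle rows and split them into train, validation, and test lists.

    Whatever is left after the train and validation shares goes to test.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    train_rows = shuffled[:n_train]
    valid_rows = shuffled[n_train:n_train + n_valid]
    test_rows = shuffled[n_train + n_valid:]
    return train_rows, valid_rows, test_rows


# Stand-in for 100 plays; real rows would be play-by-play records.
train_rows, valid_rows, test_rows = partition(range(100))
print(len(train_rows), len(valid_rows), len(test_rows))
```

Every row lands in exactly one of the three sets, so the test set stays unseen during model fitting and tuning.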
Variable Selection Node
The Variable Selection node identifies and
retains the most relevant variables for modeling.
This approach enhances model efficiency and
interpretability, and has the potential to result
in more accurate predictions.
We used the Chi-Squared selection criterion.
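To make the Chi-Squared criterion concrete, here is a minimal Python sketch of the plain Pearson chi-squared statistic between one nominal input and the binary pass/no-pass target. It is not the exact procedure of the Variable Selection node, whose internal settings are not given in the slides. The variable names and the toy data are hypothetical.

```python
from collections import Counter


def chi_square(inputs, targets):
    """Pearson chi-squared statistic for two categorical sequences."""
    n = len(inputs)
    observed = Counter(zip(inputs, targets))
    input_totals = Counter(inputs)
    target_totals = Counter(targets)
    stat = 0.0
    for level, level_count in input_totals.items():
        for outcome, outcome_count in target_totals.items():
            # Count expected in this cell if input and target were independent.
            expected = level_count * outcome_count / n
            stat += (observed[(level, outcome)] - expected) ** 2 / expected
    return stat


# Toy data: 1 = pass attempt, 0 = no pass attempt.
is_pass = [0, 0, 0, 1, 1, 1, 1, 0]
down = [1, 1, 1, 1, 3, 3, 3, 3]                    # related to the target
noise = ["a", "b", "a", "b", "a", "b", "a", "b"]   # unrelated to the target

print(chi_square(down, is_pass), chi_square(noise, is_pass))
```

A larger statistic means the input is more strongly associated with the target, so inputs scoring near zero are candidates for rejection.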
Decision Tree (Gini Splitting Rule) Model
Decision tree models are useful for both
classification and regression tasks. They are
well-suited for problems with complex decision
boundaries or interactions between variables.
A Decision Tree model fits the goal of our study.
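The Gini splitting rule named in the slide title can be sketched as follows. This minimal Python example, with hypothetical function names and made-up data, computes Gini impurity and picks the single threshold on one interval input that gives the lowest weighted impurity. A full decision tree repeats this search recursively over all inputs, which is not shown here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Return the candidate threshold with the lowest weighted impurity."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split
    return min(candidates, key=lambda t: split_gini(values, labels, t))


# Toy data: yards to go and whether the play was a pass attempt.
ydstogo = [1, 2, 3, 8, 10, 12]
is_pass = [0, 0, 0, 1, 1, 1]

threshold = best_threshold(ydstogo, is_pass)
print(gini(is_pass), threshold, split_gini(ydstogo, is_pass, threshold))
```

In this toy case the split separates the two classes completely, so the impurity drops from its starting value to zero.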
Advantages:
Easy to understand and interpret.
Handles both numerical and
Regression (Stepwise, SBC) Model
Useful when you want to understand the
relationship between input variables and
the target variable
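The SBC in the slide title is the Schwarz Bayesian Criterion, which stepwise selection can use to decide whether adding an input is worth the extra parameter. The Python sketch below assumes a logistic regression with a binary target. The fitted probabilities are made up for illustration rather than produced by an actual fit, and the function names are hypothetical.

```python
import math


def log_likelihood(probs, labels):
    """Binary log-likelihood of labels given predicted pass probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probs, labels)
    )


def sbc(log_lik, n_params, n_obs):
    """Schwarz Bayesian Criterion: -2 * logL + k * ln(n). Lower is better."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


labels = [1, 0, 1, 1]

# Intercept-only model: predicts the overall pass rate for every play.
intercept_only = [0.75, 0.75, 0.75, 0.75]
# Hypothetical probabilities after adding one input (2 parameters in total).
with_one_input = [0.8, 0.3, 0.7, 0.9]

sbc_small = sbc(log_likelihood(intercept_only, labels), 1, len(labels))
sbc_large = sbc(log_likelihood(with_one_input, labels), 2, len(labels))
print(round(sbc_small, 3), round(sbc_large, 3))
```

Here the larger model has the lower SBC, so a stepwise procedure guided by SBC would keep the added input. The penalty term grows with the number of parameters, which discourages inputs that improve the fit only slightly.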
Neural Network Model
Advantages:
Can capture intricate patterns and relationships
in the data.
Effective for tasks involving image
Model Results (Statistics Table)
Based on the parameters below, we can identify
the best model:
A higher ROC score indicates better performance.
A lower misclassification rate indicates better
performance.
A lower average squared error indicates better
performance.
Based on the Test Data, “Neural Network” is the
best model when comparing ROC, Misclassification
Rate and Average Squared Error.
Data Set    Statistic               Neural Network  Decision Tree (Gini)  Regression (Stepwise, SBC)
Test        ROC                     0.77            0.76                  0.74
Test        Misclassification Rate  0.31            0.31                  0.33
Test        Average Squared Error   0.19            0.20                  0.21
Validation  ROC                     0.75            0.74                  0.72
Validation  Misclassification Rate  0.32            0.33                  0.35
Validation  Average Squared Error   0.20            0.20                  0.21
Training    ROC                     0.75            0.75                  0.72
Training    Misclassification Rate  0.32            0.32                  0.35
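The three comparison statistics reported above can be computed as in the following Python sketch. It assumes each model outputs a predicted probability of a pass attempt and that a 0.5 cutoff is used for classification. The cutoff, the function names, and the toy data are assumptions, and the numbers are not taken from the study.

```python
def misclassification_rate(probs, labels, cutoff=0.5):
    """Share of plays whose predicted class differs from the actual class."""
    wrong = sum((p >= cutoff) != (y == 1) for p, y in zip(probs, labels))
    return wrong / len(labels)


def average_squared_error(probs, labels):
    """Mean squared difference between the label and the predicted probability."""
    return sum((y - p) ** 2 for p, y in zip(probs, labels)) / len(labels)


def roc_index(probs, labels):
    """Area under the ROC curve via pairwise ranking; ties count as half."""
    positives = [p for p, y in zip(probs, labels) if y == 1]
    negatives = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = pass attempt, 0 = no pass attempt.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]

mis = misclassification_rate(probs, labels)
ase = average_squared_error(probs, labels)
roc = roc_index(probs, labels)
print(mis, ase, roc)
```

The ROC index rewards a model for ranking pass plays above non-pass plays regardless of the cutoff, while the misclassification rate depends on the cutoff chosen, which is why the slides compare all three statistics together.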
Conclusion:
• In conclusion, the analysis sheds light on
critical factors influencing the probability of a
pass attempt in NFL games.
• The Neural Network was the best model for
predicting pass attempts; however, the other
models also performed well.
• Reveals the different effects that pre-play
variables have on the attempt of a pass.
• Advocates for model refinement to adapt
to dynamically changing strategy in the NFL.
• Recommendations include further exploring
real-time player tracking data
for more accurate predictions and refining
models to adapt to rule changes and