Strength Standards
Satyar Foroughi
December 1, 2024
1 Introduction
In this study we aim to develop, optimize, and compare different statistical methods to accurately predict the
maximum vertical jump height of an individual. The main objective of the study is to identify the strength
metrics most important for vertical jumping performance.
Our data were collected through an Instagram survey of people who have tested their maximum vertical
and have experience with weight room exercises. Research shows that general strength movements such as
squatting and hinging enhance vertical jump performance [1], while the power clean can stimulate explosive
strength even more than movements kinematically similar to jumping, such as the jump squat [2]. Thus,
single-repetition maximums in four barbell exercises (back squat, front squat, deadlift, and power clean)
were collected as our weight room metrics.
Other related athletic metrics were also collected, such as the individual’s tested standing vertical, whether
they can dunk a basketball, and the age at which they first dunked, along with general metrics: height,
standing reach, weight, age, and whether they have ever trained with the THP coaching service (one of the
most popular vertical-jump-specific programs).
Due to the nature of strength training, the ability to lift heavy in one exercise generally goes along with
the ability to lift heavy in other exercises. This poses a multicollinearity issue, which we address in a later
section. To combat this, instead of using the absolute numbers of the lifts, we created new variables that
express each lift relative to the individual’s bodyweight. This improved our final results and also allowed
us to drop the "weight" variable. Since height and standing reach are also directly related, we created an
interaction variable as the product of the two.
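The derived variables described above can be sketched as follows. This is an illustrative Python version, not the author's SAS data step; the column names are taken from the SAS listing in the appendix, and the example respondent is made up.

```python
# Map each raw lift column to the name of its bodyweight-relative ratio.
LIFT_TO_RATIO = {
    "Back_Squat": "bs_bw",
    "Front_Squat": "fs_bw",
    "Power_Clean": "pc_bw",
    "Deadlift": "dl_bw",
}


def add_relative_strength(row):
    """Return a copy of one survey response with the derived variables added."""
    out = dict(row)
    for lift, ratio_name in LIFT_TO_RATIO.items():
        # Lift expressed relative to bodyweight (lbs/lbs).
        out[ratio_name] = row[lift] / row["Weight"]
    # Interaction of height and standing reach (inches squared).
    out["HxSR"] = row["Height"] * row["Standing_Reach"]
    return out


# Hypothetical respondent (made-up numbers, not survey data).
example = {
    "Height": 72, "Standing_Reach": 96, "Weight": 180,
    "Back_Squat": 315, "Front_Squat": 225, "Power_Clean": 225, "Deadlift": 405,
}
features = add_relative_strength(example)
```

Once the ratios exist, the raw lifts and bodyweight can be dropped from the modeling table, which is what lets the "weight" variable be removed.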
2 Data
2.1 Dataset Description and Pre-Processing
We collected 746 survey responses, focusing on individuals aged 16 or older with a minimum 28-inch vertical
jump and a 135-pound back squat to exclude beginners and reduce bias. Unrealistic responses were removed,
and only participants who answered at least 11 of 13 questions were included. For 21 entries with remaining
missing values, averages from similar strength profiles were used to fill the gaps.
Our final dataset has 315 observations (rows) and 11 variables (columns) listed below.
HxSR: Height multiplied by standing reach (inches squared)
Age: Age of the individual
Dunk: Whether the individual can dunk a basketball
Age_First_Dunk: Categorical variable
– 0: Never dunked
– 1: First dunk before age 17
– 2: First dunk between ages 17 and 19
– 3: First dunk between ages 20 and 25
– 4: First dunk after age 25
Max_Vert: Target variable; maximum vertical jump height with a full approach (inches)
Standing_Vert: Maximum vertical jump height from a standing position with no approach (inches)
THP: Categorical variable
– 0: Never trained on THP (Translating Human Performance) jump program
– 1: Has trained on THP jump program
bs_bw: Maximum barbell back squat to bodyweight ratio (lbs/lbs)
fs_bw: Maximum barbell front squat to bodyweight ratio (lbs/lbs)
pc_bw: Maximum power clean to bodyweight ratio (lbs/lbs)
dl_bw: Maximum barbell deadlift to bodyweight ratio (lbs/lbs)
Linear Regression model to test our hypothesis. With a t-statistic of 101.28 and a p-value of 0.0010, at the
α = 0.1 significance level there is sufficient evidence that training with THP is significantly related to a
higher maximum vertical.
4 Correlation and Multicollinearity
As mentioned before, due to the nature of the study, many of our variables are highly correlated. However,
with our approach and modifications, none of the variance inflation factors (VIFs) in the final model
exceeded 5. The heatmap below shows that the strength exercises are strongly correlated with one another,
as are standing vertical and max vertical. We will later discuss whether keeping standing vertical in our final
model aligns with our study’s objectives. The full correlation matrix and VIF values are included in the
outputs section of the appendix.
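For reference, the standard definition of the variance inflation factor for predictor j is given below, where R_j^2 is the coefficient of determination from regressing predictor j on all the other predictors. Under this definition, a cutoff of 5 corresponds to the other predictors explaining less than 80% of the variance in predictor j.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j < 5 \iff R_j^2 < 0.8
```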
randomly scattered around zero with no obvious patterns, and the Q-Q plot shows a roughly straight line with
slight deviations at the tails. These suggest that the linearity, normality, and constant-variance assumptions
are met. The main concern is outliers, which we address in the next section.
5.1.1 Outliers
Our survey respondents included many high-level athletes who specialize in vertical jumping, including
high jumpers, professional dunkers, and even world record holders, so some outliers and influential points
were inevitable. With total parameters p = 11 and sample size n = 315, studentized deleted residuals
($|t_i| > B$, where $B$ is the Bonferroni-corrected critical value) were used to detect outliers in the target
variable, leverage values $h_{ii} > 2p/n = 0.07$ were used to detect outlying predictor observations, and
$|\mathrm{DFFITS}| > 0.37$ and Cook's distance $> 2\sqrt{p/n} = 0.37$ were used to detect influential points.
Full details can be found in the appendix.
On closer inspection, the responses behind the influential points did not seem unrealistic, and most came
from high-level athletes, which for our study seemed more informative than harmful. Therefore, no outliers
were dropped.
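As a quick arithmetic check, the cutoffs of 0.07 and 0.37 quoted above follow from 2p/n and 2√(p/n) with p = 11 and n = 315. The short Python sketch below reproduces them; the function name is hypothetical, and the Bonferroni critical value is left out because it requires a t-distribution quantile.

```python
import math


def influence_cutoffs(p, n):
    """Rule-of-thumb cutoffs for p parameters and n observations."""
    return {
        "leverage": 2 * p / n,           # flag h_ii above this value
        "dffits": 2 * math.sqrt(p / n),  # flag |DFFITS| above this value
    }


cutoffs = influence_cutoffs(p=11, n=315)
rounded = {name: round(value, 2) for name, value in cutoffs.items()}
```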
6 Variable Selection
Our model on the full set of variables resulted in a root mean square error (RMSE) of 2.23 and R² = 0.78,
meaning the variables in this model explain 78% of the variability in maximum vertical. The following
variables were significant at α = 0.1 when all other variables were included in the model. However, standing
vertical is so closely related to maximum vertical that including it makes the model too specific and not
general enough. Since the main goal of this study is to analyze strength metrics for vertical jumping, we
decided not to include standing vertical in our final model.
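For readers less familiar with the two fit statistics quoted above, here is a minimal Python sketch of how RMSE and R² are computed from observed and predicted values. It divides the squared error by n, whereas regression software usually divides by the error degrees of freedom when reporting root MSE, so it will not reproduce 2.23 exactly. The numbers are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE (n denominator) and R-squared for paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # error sum of squares
    sst = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return {"RMSE": math.sqrt(sse / n), "R2": 1 - sse / sst}


# Hypothetical jump heights in inches (not survey data).
metrics = regression_metrics(actual=[30, 34, 38, 42], predicted=[31, 33, 39, 41])
```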
For variable selection, we tried three methods shown below, which all gave similar results.
$$\widehat{\text{Max\_Vert}} = 30.11 - 1.52\,\text{HxSR} + 8.88\,\text{Dunk} - 2.16\,\text{Age\_First\_Dunk} + 0.41\,\text{fs\_bw} + 1.19\,\text{pc\_bw} + 0.80\,\text{dl\_bw}$$
Figure 5: Final Model
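The fitted equation can be evaluated directly, as in the Python sketch below. The coefficients are copied from the equation. The function name is hypothetical, and the inputs are assumed to be on the same scale as the data used to fit the model (the appendix titles mention standardized and scaled data), so raw survey values should not be plugged in without that transformation.

```python
INTERCEPT = 30.11
COEFFICIENTS = {
    "HxSR": -1.52,
    "Dunk": 8.88,
    "Age_First_Dunk": -2.16,
    "fs_bw": 0.41,
    "pc_bw": 1.19,
    "dl_bw": 0.80,
}


def predict_max_vert(x):
    """Predicted maximum vertical (inches) for predictors on the fitted scale."""
    return INTERCEPT + sum(coef * x[name] for name, coef in COEFFICIENTS.items())


# All predictors at zero returns the intercept.
baseline = dict.fromkeys(COEFFICIENTS, 0.0)
at_baseline = predict_max_vert(baseline)

# Raising pc_bw by one unit, with everything else fixed, adds its coefficient.
one_unit_pc = predict_max_vert({**baseline, "pc_bw": 1.0})
```

Of the three strength ratios in the model, pc_bw has the largest coefficient, so a one-unit change in it moves the prediction more than the same change in fs_bw or dl_bw.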
7.1 Results
Although the model cannot predict maximum vertical as well without standing vertical, it still provides
meaningful insight with some accuracy. With an F-value of 42.53 and a p-value less than 0.0001, there is
sufficient evidence to indicate that this model is useful in predicting maximum vertical.
8 Conclusion
Although outside the scope of this class, six more advanced statistical models were also trained and tested.
All six had power clean to bodyweight ratio among their most important features, with both Random
Forest and Gradient Boosting ranking it as their top feature. Full details can be found in the first section of
the appendix.
Our analysis shows that, knowing an individual’s standing vertical, whether they can dunk and the age at
which they first did, along with some strength metrics, we can predict their maximum vertical with good
accuracy.
Among the four barbell exercises (back squat, front squat, power clean, and deadlift), the power clean to
bodyweight ratio appears most useful in predicting an individual’s maximum vertical jump height and is
likely the most important strength metric of the four.
References
[1] Daniel Baker. “Improving Vertical Jump Performance Through General, Special, and Specific Strength
Training: A Brief Review”. In: Journal of Strength and Conditioning Research 10.2 (May 1996), pp. 131–
136.
[2] Sasho James MacKenzie, Robert J. Lavers, and Brendan B. Wallace. “A biomechanical comparison of
the vertical jump, power clean, and jump squat”. In: Journal of Sports Sciences 32.16 (2014), pp. 1576–1585.
PMID: 24738710. doi: 10.1080/02640414.2014.908320. url: https://doi.org/10.1080/02640414.2014.908320.
9 Appendix
10 Advanced Methods
Six methods other than basic linear regression were trained and tested with 5-fold cross-validation; all
produced similar metrics.
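The 5-fold procedure mentioned above can be sketched in plain Python as follows. This is a generic illustration of k-fold splitting, not the code used for the reported models; the function name and seed are hypothetical, and n = 315 (the full dataset size) is used only as an example, since the data may have been split into training and test sets first.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k near-equal folds
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(315, k=5))
```

Each observation is held out exactly once, so averaging a metric over the five test folds uses every row for evaluation without ever scoring a row the model was trained on.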
10.1 Additional Plots and Tables
Figure 10: Deeper Look at Influential Points
10.2 Full Outputs
10.3 SAS Code
SAS Code:
/* Reading the data */
PROC IMPORT DATAFILE="/home/u63983223/STA 6236/final_project/final_data3.csv"
    OUT=df
    DBMS=CSV
    REPLACE;
    GETNAMES=YES;
RUN;

data df1(drop=Height Standing_Reach Weight Back_Squat Front_Squat Power_Clean Deadlift);
    set df;

/* ... */

        output;
    end;
    keep Var1 Var2 Correlation;
run;

/* Heat Map */
proc sgplot data=corr_data noautolegend;
    title "Correlation Heat Map";
    xaxis label="Variable" discreteorder=data;
    yaxis label="Variable" discreteorder=data reverse;
    heatmapparm x=Var1 y=Var2 colorresponse=Correlation / colormodel=(blue white red) outline;
run;

title1 "Fitting Linear Regression on Full Data (Standardized and Scaled)";
title2 "Keeping Standing Vertical in the model";
title3 "Variance Inflation Factors";
/* VIFs */

/* ... */

/* VIFs Scaled */
ods output outputstatistics=outstats;
proc reg data=df_scaled;
    model Max_Vert = HxSR Age Dunk Age_First_Dunk Standing_Vert THP bs_bw fs_bw pc_bw dl_bw / vif influence r;
run;
quit;
ods output close;

/* ... */

proc print data=y_outliers noobs; run;
title "X outliers";
proc print data=x_outliers noobs; run;
title "Influential Points";
proc print data=influential noobs; run;

/* ... */

title "Final Model with 6 Variables";

/* ... */

title "Performance on Test Data (Ability of Model to Generalize)";
/* Evaluate the model's performance on test data */
proc sql;
    select
        mean(residual**2) as MSE,
        sqrt(mean(residual**2)) as RMSE,
        mean(abs(residual)) as MAE,
        sum((Max_Vert - Predicted)**2) as SSE,
        sum((Max_Vert - &mean_Max_Vert.)**2) as SST,
        1 - (calculated SSE / calculated SST) as Rsquared
    from predicted_test;
quit;