
Strength Metrics for Enhancing Vertical Jump Height

Satyar Foroughi

University of Central Florida


STA 6236: Dr. Jongik Chung
[email protected]

December 1, 2024

1 Introduction
In this study we aim to develop, optimize, and compare different statistical methods to accurately predict the
maximum vertical jump height of an individual. The main objective of the study is to identify the strength
metrics most important for vertical jumping performance.

Our data were collected through an Instagram survey of people who have tested their maximum vertical and have experience with weight room exercises. Research shows that general strength movements like squatting and hinging enhance vertical jump performance [1], while the power clean can stimulate explosive strength even more than movements kinematically similar to jumping, such as the jump squat [2]. Thus, single-repetition maximums in four barbell exercises (back squat, front squat, deadlift, and power clean) were collected as our weight room metrics.

Other related athletic metrics were also collected, such as the individual's tested standing vertical, whether they can dunk a basketball and, if so, the age at which they first dunked, along with general metrics: height, standing reach, weight, age, and whether they have ever trained with the THP coaching service (one of the most popular vertical-jump-specific programs).

Due to the nature of strength training, the ability to lift heavy in one exercise generally goes hand in hand with the ability to lift heavy in other exercises. This poses a multicollinearity issue, which we discuss in Section 4. To address it, instead of using the absolute numbers of the lifts, we created new variables that express each lift relative to the individual's bodyweight. This improved our final results and also allowed us to drop the "weight" variable. Since height and standing reach are also directly related, we combined them into a single interaction (product) variable.
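As a minimal sketch of the two transformations just described, the function below turns one raw survey response into the derived predictors. The dictionary keys (`height`, `standing_reach`, `weight`, `back_squat`, `front_squat`, `power_clean`, `deadlift`) are hypothetical field names, not the survey's actual columns; lengths are assumed to be in inches and all weights in pounds.

```python
from typing import Dict

# Hypothetical raw field names for the four one-rep maxes (all in lbs);
# the survey's real column names are not given in the paper.
LIFT_FIELDS = {
    "bs_bw": "back_squat",
    "fs_bw": "front_squat",
    "pc_bw": "power_clean",
    "dl_bw": "deadlift",
}


def engineer_features(resp: Dict[str, float]) -> Dict[str, float]:
    """Derive HxSR and the four lift-to-bodyweight ratios for one response.

    Assumes hypothetical keys "height", "standing_reach" (inches),
    "weight" and the lift fields above (lbs).
    """
    weight = resp["weight"]
    if weight <= 0:
        raise ValueError("bodyweight must be positive")
    # Product of height and standing reach, in inches squared
    features = {"HxSR": resp["height"] * resp["standing_reach"]}
    # Each one-rep max divided by bodyweight (lbs/lbs), i.e. strength
    # expressed relative to bodyweight as described above
    for ratio_name, lift_field in LIFT_FIELDS.items():
        features[ratio_name] = resp[lift_field] / weight
    return features


# Illustrative response, not a real respondent
example = {"height": 72, "standing_reach": 92, "weight": 180,
           "back_squat": 315, "front_squat": 270, "power_clean": 225, "deadlift": 405}
print(engineer_features(example))
# → {'HxSR': 6624, 'bs_bw': 1.75, 'fs_bw': 1.5, 'pc_bw': 1.25, 'dl_bw': 2.25}
```

Once the ratios exist, the raw weight and lift columns can be dropped, which is what the `drop=` option in the appendix's first DATA step does.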

2 Data
2.1 Dataset Description and Pre-Processing
We collected 746 survey responses, focusing on individuals aged 16 or older with a minimum 28-inch vertical
jump and a 135-pound back squat to exclude beginners and reduce bias. Unrealistic responses were removed,
and only participants who answered at least 11 of 13 questions were included. For 21 entries with remaining
missing values, averages from similar strength profiles were used to fill the gaps.
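The inclusion rules and the fill-in step above can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: the key names are hypothetical, the 28-inch rule is assumed to refer to the approach (max) vertical, and because the paper does not define a "similar strength profile", the grouping is left to a caller-supplied `profile` function.

```python
from statistics import mean
from typing import Callable, Dict, Hashable, List, Optional

Response = Dict[str, Optional[float]]


def is_eligible(resp: Response, min_answered: int = 11) -> bool:
    """Inclusion rules from the text, applied to one response.

    Hypothetical keys: "age" (years), "max_vert" (inches) and "back_squat" (lbs);
    the 28-inch rule is assumed to refer to the approach (max) vertical.
    Skipped questions are stored as None.
    """
    answered = sum(value is not None for value in resp.values())
    age, vert, squat = resp.get("age"), resp.get("max_vert"), resp.get("back_squat")
    if age is None or vert is None or squat is None:
        return False
    return answered >= min_answered and age >= 16 and vert >= 28 and squat >= 135


def impute_group_mean(rows: List[Response], column: str,
                      profile: Callable[[Response], Hashable]) -> List[Response]:
    """Fill missing values in `column` with the mean of rows sharing a profile.

    `profile` is a caller-supplied grouping function (hypothetical): the paper
    does not say how "similar strength profiles" were defined.
    Rows whose profile has no observed value are left as None.
    """
    # First pass: collect observed values per profile, before anything is filled
    observed: Dict[Hashable, List[float]] = {}
    for row in rows:
        if row[column] is not None:
            observed.setdefault(profile(row), []).append(row[column])
    # Second pass: fill gaps from the matching profile's mean
    for row in rows:
        key = profile(row)
        if row[column] is None and key in observed:
            row[column] = mean(observed[key])
    return rows
```

Computing all group means before filling anything keeps imputed values from feeding back into later averages.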
Our final dataset has 315 observations (rows) and 11 variables (columns) listed below.
• HxSR: Height multiplied by standing reach (inches squared)
• Age: Individual's age at time of response (years)
• Dunk: Categorical variable
  – 0: Can't dunk a basketball
  – 1: Can dunk a basketball
• Age First Dunk: Ordinal categorical variable
  – 0: Never dunked
  – 1: First dunk before age 17
  – 2: First dunk between ages 17 and 19
  – 3: First dunk between ages 20 and 25
  – 4: First dunk after age 25
• Max Vert: Target variable; maximum vertical jump height with a full approach (inches)
• Standing Vert: Maximum vertical jump height from a standing position with no approach (inches)
• THP: Categorical variable
  – 0: Never trained on the THP (Translating Human Performance) jump program
  – 1: Has trained on the THP jump program
• bs_bw: Maximum barbell back squat to bodyweight ratio (lbs/lbs)
• fs_bw: Maximum barbell front squat to bodyweight ratio (lbs/lbs)
• pc_bw: Maximum power clean to bodyweight ratio (lbs/lbs)
• dl_bw: Maximum barbell deadlift to bodyweight ratio (lbs/lbs)

Figure 1: Simple Data Statistics

3 Importance of Training with THP


Beyond the strength metrics, we wanted to see whether training with THP is significantly related to maximum vertical. In our dataset, 186 observations (59.05%) had never trained with THP, and 129 (40.95%) had. Figure 2 shows that those who did generally have higher verticals. We fit a simple linear regression model to test this hypothesis. With a t-statistic of 101.28 and a p-value of 0.0010, at the α = 0.1 significance level there is sufficient evidence to indicate that training with THP is significantly related to a higher maximum vertical.

Figure 2: Max Vert vs THP=0,1
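Because THP is a 0/1 indicator, the simple linear regression reduces to a comparison of group means: the intercept is the mean Max Vert of the THP = 0 group, the slope is the difference between the two group means, and the slope's t-statistic equals the pooled two-sample t-statistic. A minimal standard-library sketch of that equivalence (the toy data are illustrative, not survey values):

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def binary_slr(x: Sequence[int], y: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of y = b0 + b1*x for a 0/1 indicator x; returns (b0, b1, t for b1)."""
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    n0, n1 = len(y0), len(y1)
    if n0 < 1 or n1 < 1 or n0 + n1 < 3:
        raise ValueError("need both groups and at least 3 observations")
    b0 = mean(y0)             # intercept: mean of the x = 0 group
    b1 = mean(y1) - b0        # slope: difference in group means
    # Fitted values are the group means, so SSE is the within-group sum of squares
    sse = sum((v - b0) ** 2 for v in y0) + sum((v - mean(y1)) ** 2 for v in y1)
    mse = sse / (n0 + n1 - 2)                 # error df = n - 2
    se_b1 = math.sqrt(mse * (1 / n0 + 1 / n1))  # since 1/Sxx = 1/n0 + 1/n1
    return b0, b1, b1 / se_b1


# Toy example: three non-THP and three THP verticals (inches)
print(binary_slr([0, 0, 0, 1, 1, 1], [30, 32, 34, 35, 37, 39]))
```

This should reproduce the slope row that `model Max_Vert = THP;` prints in PROC REG, but the function is only an illustration of the arithmetic.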

4 Correlation and Multicollinearity
As mentioned before, due to the nature of the study, many of our variables are highly correlated. However,
with our approach and modifications, none of the Variance Inflation Factors in the final model were greater
than 5. In the heatmap below, we can see that the strength exercises are very much correlated, as well
as standing vertical and max vertical. We will later discuss whether keeping standing vertical in our final
model aligns with our study’s objectives. Full correlation matrix and VIF values are included in the outputs
section of the appendix.

Figure 3: Correlation Heat Map
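The variance inflation factor of predictor j is VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors, so the "VIF above 5" threshold corresponds to R_j² > 0.8. The SAS code obtains VIFs with PROC REG's `vif` option; the dependency-free sketch below only illustrates the definition. It solves the normal equations directly, which is fine for a handful of columns but not numerically robust for nearly singular data, and a perfectly collinear column would cause a division by zero.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= factor * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y: Sequence[float], predictors: List[Sequence[float]]) -> float:
    """R^2 from an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = _solve(xtx, xty)  # normal equations (X'X) beta = X'y
    y_bar = sum(y) / n
    sse = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def vifs(columns: List[Sequence[float]]) -> List[float]:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    out = []
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        out.append(1 / (1 - r_squared(col, others)))
    return out


# Two toy predictors with correlation 0.8: each VIF is 1 / (1 - 0.64), about 2.78
print(vifs([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]]))
```

With only two predictors the VIF collapses to 1/(1 − r²), which makes the toy case easy to check by hand.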

5 Analysis on Full Set of Data


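We started by fitting a linear regression model on the full set of variables, with the numeric predictors standardized (centered to mean 0 and scaled to standard deviation 1); the binary Dunk and THP indicators and the response were left unscaled.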
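The appendix performs this step with `proc standard ... mean=0 std=1`. The helper below is a minimal Python sketch of the same z-scoring; it uses the sample standard deviation (n − 1 divisor), which we believe matches PROC STANDARD's default, but that correspondence is an assumption worth checking. The toy columns are illustrative only.

```python
from statistics import mean, stdev
from typing import Dict, Iterable, List


def standardize(values: List[float]) -> List[float]:
    """Center to mean 0 and scale to standard deviation 1 (sample SD, n - 1 divisor)."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - m) / s for v in values]


def standardize_columns(data: Dict[str, List[float]],
                        keep_raw: Iterable[str]) -> Dict[str, List[float]]:
    """Standardize every column except those listed in `keep_raw`."""
    keep = set(keep_raw)
    return {name: list(col) if name in keep else standardize(col)
            for name, col in data.items()}


# Toy columns; Dunk and Max_Vert stay on their original scales, as in the paper
toy = {"pc_bw": [1.0, 1.25, 1.5], "Dunk": [0, 1, 1], "Max_Vert": [30.0, 34.0, 38.0]}
scaled = standardize_columns(toy, keep_raw=["Dunk", "THP", "Max_Vert"])
print(scaled["pc_bw"])  # → [-1.0, 0.0, 1.0]
```

Keeping the response unscaled is why the fitted intercept and coefficients in Section 7 are in inches.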

5.1 Fit Diagnostics


To assess whether the linear regression assumptions are met, we examined the diagnostic plots in Figure 4. The residuals are randomly scattered around zero with no obvious pattern, and the points in the Q-Q plot fall close to a straight line, with slight deviations at the tails. These suggest that the linearity, normality, and constant-variance assumptions are met. The main concern is outliers, which we address in the next section.

Figure 4: Diagnostic Plots

5.1.1 Outliers
Our survey respondents included many high-level athletes who specialize in vertical jumping, such as high jumpers, professional dunkers, and even world record holders, so some outliers and influential points were inevitable. With p = 11 total parameters and sample size n = 315, we used the following criteria (full details are in the appendix):
• Outliers in the target variable: studentized deleted residuals with $|t_i| > B$, where $B = t(1 - \alpha/(2n);\, n - p - 1)$ is the Bonferroni-corrected critical value ($\approx 3.64$ for $\alpha = 0.1$).
• Outlying predictor observations: leverage values $h_{ii} > 2p/n \approx 0.07$.
• Influential points: $|\text{DFFITS}_i| > 0.37$ or Cook's distance $D_i > 2\sqrt{p/n} \approx 0.37$.

On closer inspection, the influential responses did not appear unrealistic; most came from high-level athletes, whose data seemed more informative than harmful for our study. Therefore, no outliers were dropped.
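The cutoffs above and the per-case statistics behind them follow standard closed-form expressions in the residual e_i, the leverage h_ii, SSE, n and p. The sketch below computes them for one case so that the thresholds can be checked by hand; it is an illustration, not the SAS output. The Bonferroni value B needs a t quantile, which Python's standard library does not provide, so it is left to `tinv` in the appendix code.

```python
import math
from typing import Dict


def rule_of_thumb_cutoffs(n: int, p: int) -> Dict[str, float]:
    """Leverage and DFFITS cutoffs for n observations and p parameters."""
    return {
        "leverage": 2 * p / n,            # flag h_ii > 2p/n
        "dffits": 2 * math.sqrt(p / n),   # flag |DFFITS_i| > 2*sqrt(p/n)
    }


def case_diagnostics(e: float, h: float, sse: float, n: int, p: int) -> Dict[str, float]:
    """Studentized deleted residual, DFFITS and Cook's distance for one case.

    e is the ordinary residual, h the leverage h_ii, sse the model's error
    sum of squares; n observations, p parameters (intercept included).
    """
    mse = sse / (n - p)
    # t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_ii) - e_i^2))
    t = e * math.sqrt((n - p - 1) / (sse * (1 - h) - e ** 2))
    return {
        "rstudent": t,
        "dffits": t * math.sqrt(h / (1 - h)),
        "cooks_d": (e ** 2 / (p * mse)) * (h / (1 - h) ** 2),
    }


print({k: round(v, 2) for k, v in rule_of_thumb_cutoffs(n=315, p=11).items()})
# → {'leverage': 0.07, 'dffits': 0.37}
```

The printed values match the 0.07 and 0.37 cutoffs used in the text and in the appendix code.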

6 Variable Selection
The model on the full set of variables achieved a root mean square error (RMSE) of 2.23 and R² = 0.78, meaning the variables in this model explain 78% of the variability in maximum vertical. The variables in Table 1 were significant at α = 0.1 with all other variables included in the model. However, standing vertical is so closely related to maximum vertical that including it makes the model too specific and not general enough. Since the main goal of this study is to analyze strength metrics for vertical jumping, we decided not to include standing vertical in our final model.

Table 1: Significant Variables

Variable          P-value
Standing Vert     <0.0001
Dunk              <0.0001
Age First Dunk    <0.0001
HxSR              0.0001
pc_bw             0.0160
bs_bw             0.0767
For variable selection, we tried the three methods in Table 2, which all gave similar results: both best-subset criteria selected the same six variables, and stepwise regression selected the same set except fs_bw.

Table 2: Variable Selection Methods

Method                Selection Criterion             Variables Selected
Best Subset           Adjusted R²                     HxSR, Dunk, Age First Dunk, fs_bw, pc_bw, dl_bw
Best Subset           Mallows' Cp                     HxSR, Dunk, Age First Dunk, fs_bw, pc_bw, dl_bw
Stepwise Regression   Schwarz's Bayesian Criterion    HxSR, Dunk, Age First Dunk, pc_bw, dl_bw
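For reference, the three criteria in Table 2 can be computed from each candidate model's SSE with the textbook formulas below, where p counts parameters including the intercept. Higher adjusted R² is better, good subsets have Mallows' Cp close to p, and lower SBC is better. This sketch only states the formulas; PROC REG and PROC GLMSELECT do the actual search over subsets, and the inputs in the example are made-up numbers.

```python
import math


def adjusted_r2(sse: float, sst: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p parameters (intercept included)."""
    return 1 - (sse / (n - p)) / (sst / (n - 1))


def mallows_cp(sse_p: float, mse_full: float, n: int, p: int) -> float:
    """Mallows' C_p = SSE_p / MSE(full model) - (n - 2p); good subsets have C_p near p."""
    return sse_p / mse_full - (n - 2 * p)


def sbc(sse: float, n: int, p: int) -> float:
    """Schwarz's Bayesian criterion: n ln(SSE/n) + p ln(n); lower is better."""
    return n * math.log(sse / n) + p * math.log(n)


# Made-up inputs, only to show the calling convention
print(adjusted_r2(sse=20.0, sst=100.0, n=11, p=2))
print(mallows_cp(sse_p=130.0, mse_full=5.0, n=30, p=4))  # → 4.0
print(sbc(sse=10.0, n=10, p=3))
```

Because SBC charges ln(n) per parameter instead of the constant 2 implied by Cp, it tends to pick smaller models, which fits the stepwise result dropping fs_bw.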

7 Final Multiple Linear Regression Model


Based on these results, the six variables selected by both best-subset criteria (HxSR, Dunk, Age First Dunk, fs_bw, pc_bw, dl_bw) were included in our final linear regression model. To assess the model's ability to generalize, the data were split into 70% for training and 30% for testing, with the training and test sets each standardized separately. The fitted model is

$$\widehat{\text{Max Vert}} = 30.11 - 1.52\,\text{HxSR} + 8.88\,\text{Dunk} - 2.16\,\text{Age First Dunk} + 0.41\,\text{fs\_bw} + 1.19\,\text{pc\_bw} + 0.80\,\text{dl\_bw}$$

Figure 5: Final Model
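The fitted equation can be applied directly, provided the inputs are on the scale the model was trained on: HxSR, Age First Dunk and the three ratios as z-scores from the training set, and Dunk as 0/1. The minimal sketch below encodes the reported coefficients; the input dictionaries are illustrative, not real respondents.

```python
from typing import Dict

# Fitted coefficients reported above. HxSR, Age_First_Dunk and the three
# ratios are z-scores (standardized on the training set); Dunk stays 0/1.
INTERCEPT = 30.11
COEFFICIENTS = {
    "HxSR": -1.52,
    "Dunk": 8.88,
    "Age_First_Dunk": -2.16,
    "fs_bw": 0.41,
    "pc_bw": 1.19,
    "dl_bw": 0.80,
}


def predict_max_vert(x: Dict[str, float]) -> float:
    """Predicted maximum vertical (inches) for one already-standardized observation."""
    missing = COEFFICIENTS.keys() - x.keys()
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(coef * x[name] for name, coef in COEFFICIENTS.items())


# Every standardized predictor at its training mean, for a non-dunker and a dunker
average = {name: 0.0 for name in COEFFICIENTS}
print(round(predict_max_vert(average), 2))                    # → 30.11
print(round(predict_max_vert(dict(average, Dunk=1.0)), 2))    # → 38.99
```

Read this way, a one-standard-deviation increase in pc_bw, holding the other predictors fixed, is associated with about 1.19 more inches of predicted maximum vertical.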

7.1 Results
Without standing vertical, the model predicts maximum vertical less accurately, but it still provides meaningful insight with some predictive accuracy. With an F-value of 42.53 and a p-value less than 0.0001, there is sufficient evidence to indicate that this model is useful for predicting maximum vertical.

Table 3: Performance on Train Data

MSE       RMSE     SSE        SST        R²
11.615    3.408    2485.599   5449.828   0.544
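The F-value reported in Section 7.1 can be recovered from Table 3 with F = [(SST − SSE)/k] / [SSE/(n − k − 1)] for k = 6 predictors. The training-set size is not reported; n = 221 is an assumption, chosen because 2485.599 / 11.615 ≈ 214 error degrees of freedom, which with 7 parameters implies 221 rows (consistent with 70% of 315, rounded up).

```python
def overall_f(sse: float, sst: float, n: int, k: int) -> float:
    """Overall F statistic for a linear model with k predictors plus an intercept."""
    msr = (sst - sse) / k        # regression mean square on k df
    mse = sse / (n - k - 1)      # error mean square on n - k - 1 df
    return msr / mse


# Table 3 values; n = 221 training rows is an assumption inferred from Table 3's MSE
print(round(overall_f(sse=2485.599, sst=5449.828, n=221, k=6), 2))  # → 42.53
```

The match with the reported 42.53 also suggests that the training MSE in Table 3 is PROC REG's SSE/(n − p) rather than SSE/n.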

Table 4: Performance on Test Data

MSE       RMSE     SSE        SST        R²
10.495    3.240    986.561    1424.436   0.307
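The test-set metrics come from the PROC SQL query in the appendix. A minimal Python equivalent is sketched below. Like that query, it defines MSE as the mean squared residual (SSE/n); judging from the numbers, 986.561 / 10.495 ≈ 94 test rows, so Table 4 appears to use that definition. The toy inputs are illustrative only.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def regression_metrics(y: Sequence[float], y_hat: Sequence[float]) -> Dict[str, float]:
    """MSE, RMSE, MAE, SSE, SST and R^2, mirroring the appendix's PROC SQL query."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and of equal length")
    residuals = [a - b for a, b in zip(y, y_hat)]
    sse = sum(r ** 2 for r in residuals)
    y_bar = mean(y)  # mean of the evaluated set itself, as in the SQL query
    sst = sum((a - y_bar) ** 2 for a in y)
    mse = sse / len(y)  # mean squared residual (SSE / n)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mean(abs(r) for r in residuals),
        "SSE": sse,
        "SST": sst,
        "Rsquared": 1 - sse / sst,
    }


# Toy observed and predicted verticals (inches)
print(regression_metrics([30, 32, 34, 36], [31, 32, 33, 37]))
```

Since SST is taken around the test-set mean, the test R² measures improvement over predicting that mean, which is why it can be much lower than the training R².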

7.2 Most Important Variables


All variables except the front squat ratio (fs_bw) were significant in the final model. Note that although the back squat ratio was significant in the full model, it was not selected by any method once standing vertical was dropped. Among the strength metrics, the power clean to bodyweight ratio had the smallest p-value.

8 Conclusion
Although outside the scope of this class, six more advanced statistical models were also trained and tested. All six ranked the power clean to bodyweight ratio among their most important features, and both Random Forest and Gradient Boosting ranked it as their top feature. Full details can be found in the first section of the appendix.

Our analysis shows that, given an individual's standing vertical, whether they can dunk and the age at which they first did, along with a few strength metrics, we can predict their maximum vertical with good accuracy (R² = 0.78 for the full model).

Among the four barbell exercises (back squat, front squat, power clean, and deadlift), the power clean to bodyweight ratio appears to be the most useful for predicting an individual's maximum vertical jump height and is likely the most important strength metric of the four.

References
[1] Daniel Baker. "Improving Vertical Jump Performance Through General, Special, and Specific Strength Training: A Brief Review". In: Journal of Strength and Conditioning Research 10.2 (May 1996), pp. 131–136.
[2] Sasho James MacKenzie, Robert J. Lavers, and Brendan B. Wallace. "A biomechanical comparison of the vertical jump, power clean, and jump squat". In: Journal of Sports Sciences 32.16 (2014), pp. 1576–1585. PMID: 24738710. doi: 10.1080/02640414.2014.908320.

9 Appendix
10 Advanced Methods
Six methods other than basic linear regression were trained and tested using 5-fold cross-validation, and all produced similar performance metrics.

Figure 6: Other Methods Feature Importance

Figure 7: Performance Metrics
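The appendix does not say which software or fold assignment was used for these models. Purely as an illustration of 5-fold cross-validation, the sketch below shuffles row indices once, splits them into five disjoint folds, and averages the held-out RMSE for any model supplied through hypothetical `fit`/`predict` callables. The seed 12345 simply reuses the SAS split seed and is otherwise arbitrary.

```python
import math
import random
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 12345) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) row-index lists; every row is in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def cross_val_rmse(X: Sequence[Any], y: Sequence[float],
                   fit: Callable[[List[Any], List[float]], Any],
                   predict: Callable[[Any, Any], float],
                   k: int = 5, seed: int = 12345) -> float:
    """Mean held-out RMSE over k folds for a caller-supplied fit/predict pair."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq_errors = [(y[i] - predict(model, X[i])) ** 2 for i in test]
        scores.append(math.sqrt(sum(sq_errors) / len(sq_errors)))
    return sum(scores) / len(scores)


# Hypothetical mean-only baseline, standing in for a real model
def fit_mean(X: List[Any], y: List[float]) -> float:
    return sum(y) / len(y)


def predict_mean(model: float, x: Any) -> float:
    return model
```

Any model that exposes a fit step and a per-row prediction can be plugged in unchanged, which keeps the fold assignment identical across the models being compared.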

10.1 Additional Plots and Tables

Figure 8: Heatmap of pc bw vs Max Vert

Figure 9: Outliers and influential points

Figure 10: Deeper Look at Influential Points

10.2 Full Outputs

10.3 SAS Code
SAS Code:
/* Reading the data */
PROC IMPORT DATAFILE="/home/u63983223/STA 6236/final_project/final_data3.csv"
    OUT=df
    DBMS=CSV
    REPLACE;
    GETNAMES=YES;
RUN;

/* Build HxSR and drop the raw columns. bs_bw, fs_bw, pc_bw and dl_bw are
   not computed in this step, so they must already be columns of the CSV. */
data df1(drop=Height Standing_Reach Weight Back_Squat Front_Squat Power_Clean Deadlift);
    set df;
    HxSR = Height * Standing_Reach;
run;

title "Importance of Training With THP";
proc freq data=df1;
    tables THP / nocum;
run;

proc sgplot data=df1;
    vbox Max_Vert / category=THP;
    title "Comparison of Max_Vert for THP=1 vs THP=0";
run;
quit;

proc sgplot data=df1;
    scatter x=THP y=Max_Vert / jitter;
    title "Scatter Plot of Max_Vert by THP";
    xaxis label="THP (0 or 1)";
    yaxis label="Max_Vert";
run;
quit;

/* Simple linear regression of Max_Vert on the THP indicator */
proc reg data=df1 plots=NONE;
    model Max_Vert = THP;
run;
quit;

title "Correlation and Multicollinearity";
/* Compute the correlation matrix */
proc corr data=df1 outp=correlation_matrix;
    var HxSR Age Dunk Age_First_Dunk Max_Vert Standing_Vert THP bs_bw fs_bw pc_bw dl_bw;
run;

/* Reshape the correlation matrix into long form: one row per (Var1, Var2) pair */
data corr_data;
    set correlation_matrix;
    where _TYPE_ = 'CORR';
    array vars[*] HxSR Age Dunk Age_First_Dunk Max_Vert Standing_Vert THP bs_bw fs_bw pc_bw dl_bw;
    Var1 = _NAME_;
    do i = 1 to dim(vars);
        Var2 = vname(vars[i]);
        Correlation = vars[i];
        output;
    end;
    keep Var1 Var2 Correlation;
run;

/* Heat map */
proc sgplot data=corr_data noautolegend;
    title "Correlation Heat Map";
    xaxis label="Variable" discreteorder=data;
    yaxis label="Variable" discreteorder=data reverse;
    heatmapparm x=Var1 y=Var2 colorresponse=Correlation / colormodel=(blue white red) outline;
run;

title1 "Fitting Linear Regression on Full Data (Standardized and Scaled)";
title2 "Keeping Standing Vertical in the model";
title3 "Variance Inflation Factors";

/* Standardize the numeric predictors to mean 0, std 1
   (Dunk, THP and the response Max_Vert are left unscaled) */
proc standard data=df1 mean=0 std=1 out=df_scaled;
    var HxSR Age Age_First_Dunk Standing_Vert bs_bw fs_bw pc_bw dl_bw;
run;

/* Full model on the scaled data: VIFs plus influence statistics,
   saved to OUTSTATS for the outlier analysis */
ods output OutputStatistics=outstats;
proc reg data=df_scaled;
    model Max_Vert = HxSR Age Dunk Age_First_Dunk Standing_Vert THP bs_bw fs_bw pc_bw dl_bw
        / vif influence r;
run;
quit;
ods output close;

title "Bonferroni Coefficient for Outlier Analysis";
data bon;
    alpha = 0.1;               * Significance level;
    n = 315;                   * Sample size;
    p = 11;                    * Number of parameters;
    prob = 1 - (alpha / (2 * n));
    df = n - p - 1;
    Bcoef = tinv(prob, df);    * Bonferroni critical value for abs(RStudent);
run;
proc print data=bon;
run;

/* Y outliers: |studentized deleted residual| above the Bonferroni value */
data y_outliers(keep=Observation RStudent);
    set outstats(where=(abs(RStudent) > 3.64));
run;

/* X outliers: leverage above 2p/n */
data x_outliers(keep=Observation HatDiagonal);
    set outstats(where=(HatDiagonal > 0.07));
run;

/* Influential points: large DFFITS or Cook's distance */
data influential(keep=Observation RStudent HatDiagonal DFFITS CooksD);
    set outstats(where=(abs(DFFITS) > 0.37 or CooksD > 0.07));
run;

title "Y outliers";
proc print data=y_outliers noobs; run;
title "X outliers";
proc print data=x_outliers noobs; run;
title "Influential Points";
proc print data=influential noobs; run;

title "Deeper Look at Influential Points";
data selected_obs;
    set df1;
    Orig_Obs = _N_;
    if _N_ in (1, 79, 128, 144, 172, 246, 274, 275, 276, 278, 287, 300, 311);
run;

proc print data=selected_obs;
run;

title "Removing Standing Vert";

/* Variable selection (Standing_Vert removed from the candidate set) */
title "Variable Selection";

title "Best Subset Selection (Adjusted R-Square)";
proc reg data=df_scaled plots=NONE;
    model Max_Vert = HxSR Age Dunk Age_First_Dunk THP bs_bw fs_bw pc_bw dl_bw
        / selection=adjrsq best=5;
run;
quit;

title "Best Subset Selection (Mallows' Cp)";
proc reg data=df_scaled plots=NONE;
    model Max_Vert = HxSR Age Dunk Age_First_Dunk THP bs_bw fs_bw pc_bw dl_bw
        / selection=cp best=5;
run;
quit;

title "Stepwise Selection (Schwarz's Bayesian Criterion)";
proc glmselect data=df_scaled;
    model Max_Vert = HxSR Age Dunk Age_First_Dunk THP bs_bw fs_bw pc_bw dl_bw
        / selection=stepwise(choose=sbc select=sbc);
run;

/* Train/test split: OUTALL keeps every row and flags the 70% sample in Selected */
proc surveyselect data=df1 out=sampled_data seed=12345 samprate=0.7 outall;
run;

data train_raw test_raw;
    set sampled_data;
    if Selected = 1 then output train_raw;
    else output test_raw;
run;

/* Scale the data: the training and test sets are standardized separately */
proc standard data=train_raw mean=0 std=1 out=train;
    var HxSR Age Age_First_Dunk Standing_Vert bs_bw fs_bw pc_bw dl_bw; /* Exclude Dunk and THP */
run;
proc standard data=test_raw mean=0 std=1 out=test;
    var HxSR Age Age_First_Dunk Standing_Vert bs_bw fs_bw pc_bw dl_bw; /* Exclude Dunk and THP */
run;

title "Final Model with 6 Variables";

/* Train the model and save it for scoring */
proc reg data=train;
    model Max_Vert = HxSR Dunk Age_First_Dunk fs_bw pc_bw dl_bw;
    store out=reg_model;
run;
quit;

/* Predict on the test data */
proc plm restore=reg_model;
    score data=test out=predicted_test predicted residual;
run;

/* Evaluate the model */
/* Compute the mean of Max_Vert and store it in a macro variable */
proc sql noprint;
    select mean(Max_Vert) into :mean_Max_Vert from predicted_test;
quit;

title "Performance on Test Data (Ability of Model to Generalize)";
/* Evaluate the model's performance on the test data */
proc sql;
    select
        mean(residual**2) as MSE,
        sqrt(mean(residual**2)) as RMSE,
        mean(abs(residual)) as MAE,
        sum((Max_Vert - Predicted)**2) as SSE,
        sum((Max_Vert - &mean_Max_Vert.)**2) as SST,
        1 - (calculated SSE / calculated SST) as Rsquared
    from predicted_test;
quit;

title "pc_bw vs Max_Vert";
proc sgplot data=df1;
    heatmap x=pc_bw y=Max_Vert / colormodel=(blue green yellow red);
    title "Heatmap of pc_bw vs Max_Vert";
    xaxis label="Power Clean to Bodyweight Ratio";
    yaxis label="Max Vertical";
run;