https://doi.org/10.1007/s42978-019-00042-4
REVIEW ARTICLE
Abstract
Athlete monitoring utilizing strength and conditioning as well as other sport performance data is increasing in practice and
in research. While the usage of this data for purposes of creating more informed training programs and producing potential
performance prediction models may be promising, there are some statistical considerations that should be addressed by those
who hope to use this data. The purpose of this review is to discuss many of the statistical issues faced by practitioners as
well as provide best practices recommendations. Single-subject designs (SSD) appear to be more appropriate for monitoring
and statistically evaluating athletic performance than traditional group statistical methods. This paper discusses several SSD
options available that produce measures of both statistical and practical significance. Additionally, this paper discusses issues
related to heteroscedasticity, reliability, and validity, and provides recommendations for each. Finally, if data are incorporated into
the decision-making process, it should be returned and utilized quickly. Data visualizations are often incorporated into this
process and this review discusses issues and recommendations related to their clarity, simplicity, and distortion. Awareness
of these issues and utilization of some best practice methods will likely result in an enhanced and more efficient decision-making process with more informed athlete development programs.
Correspondence: Chris Bailey, [email protected], Department of Kinesiology, Health Promotion, and Recreation, University of North Texas, 1900 Chestnut St. 209, Denton, TX 76201, USA

Athlete Monitoring

A desirable result of any strength and conditioning program is an improved level of preparedness and an improved ability to perform [28, 31, 54, 68]. Typically this can be evaluated through some form of testing, but continual maximal effort performance testing may not be practical for athletes, especially those in season. As such, regular testing of physical performance at submaximal levels or during regular practice or competition may be a better approach, allowing for more frequent measurement such as in an athlete monitoring program [32, 50]. Regardless of the type of testing or the variable of interest, measurement must be completed if one is to evaluate an athlete's level of preparedness [28, 38, 50].

Along with aiding practitioners in the evaluation of athlete readiness, athlete monitoring also helps in the evaluation of strength and conditioning programs [28]. Data from monitoring programs may substantiate or contradict a strength and conditioning program. This goes beyond simply evaluating a program based on a win/loss record for a season and provides objective data on the success of a particular program for a team or an individual athlete. Direct objective feedback on an athlete's progression can be given to coaches and other decision makers [48]. This data can also be used to help coaches and practitioners make data-driven decisions for program improvement at the team or individual level [28, 50, 54].

Understanding the demands of a particular sport is an ongoing venture in sport performance. The relationship between competition performance data and collected monitoring data will likely help answer some questions and potentially bring about new questions [28, 38]. Data from athlete monitoring may also aid in talent identification or the identification of variables that contribute to optimal performance of a particular sport or task [60].

Data from athlete monitoring programs may also help clarify some of the ambiguity of the training process from a dose-response perspective at the individual athlete level [28, 38, 50, 54]. Not all athletes will respond to training the same way, and there should be a focus on individual responses to training. While standard pre/post testing may explain the success of a program with a sufficient sample size, that model does not work with individual athletes. Fortunately, athlete monitoring utilizes more frequent data collection, providing many data points for each athlete [50, 54].

Monitoring for the purpose of understanding training can be broken down into two primary areas: (1) dosage or input and (2) response or output. All of the training sessions, practice sessions, competitions, and anything else that results in a reaction of the athlete can be considered dosage [54]. Much of this can be quantified, but there may be issues with differing units of intensity across the different types of dosage [17]. The response or output is often more difficult to quantify, but a change in performance, whether an improvement or a decrement, is a good signal of a response. It is important to note that a performance decrease may not always be visible via a single marker such as the amount of weight lifted, and measures in different areas may be required to show the response [18–20]. It is also important to note that some short-term performance decrement due to fatigue should be expected during training, but that should be logically planned as part of a functional overreach [24].

While testing and monitoring variables of potentially enhanced performance is important, monitoring recovery is just as important for athletes. Each session, whether it be a training session, practice, or competition, can be considered a stimulus for adaptation. Recovery is necessary if adaptation or some form of supercompensation is to occur [31, 54, 68]. Even if adaptation has not occurred, at least returning to normal homeostatic levels of preparedness is desirable for athletes in season [30]. Recovery occurs when the amount of training stressors and stimuli is reduced. Unfortunately, practitioners may often forget or not be able to quantify all stressors and stimuli [31]. An athlete may have an issue in their social life that is resulting in a reduction in sleep, or a student athlete may be sacrificing sleep to study for an exam. Each of these examples reduces the amount of sleep an athlete might get, and there are many other factors that might alter optimal recovery that sport performance coaches may not be aware of [30]. There are consequences for athletes who have a less than adequate amount of recovery for a given amount of stress/stimuli, and these are likely to occur if the imbalance continues. Short-term underrecovery may result in fatigue and decreased motivation. Long-term underrecovery may result in performance decreases, overtraining, and athlete burnout [40].

There are many areas and variables that can potentially be monitored. Selection of what measures to monitor should be done with caution, and they should be relevant. Practitioners should also remember that monitoring data serves as a representation of performance, but it is not the actual performance itself [50]. Goodhart's law states that "when a measure becomes a target, it ceases to be a good measure" [55]. This law should also be applied to athlete monitoring. Once an individual monitoring variable becomes the objective, it can no longer be considered an adequate monitoring variable. The objective of monitoring should be to predict some future performance, not to serve as the target itself. This adage is borrowed from economics, but it seems useful for athlete monitoring as well.

While athlete monitoring is increasing in practice and research, it appears that much of the research has focused on specific areas to monitor and not as much on the statistical procedures involved [67]. This may lead to an accumulation of data, but no plan as to what to do with it [50]. A handful of papers have discussed some statistical techniques, but they do not focus much attention on data preparation and evaluation of the quality of data [12, 32, 39, 50]. The assumptions of normality, homogeneity of variance, and reliability of data collection methods can prove problematic if they are not evaluated. If violated, they can often render other statistical significance tests useless, so data screening prior to other analysis is necessary [64]. Even fewer articles discuss the return of the monitoring data to coaches and the visual display of the information [38, 39, 50]. The purpose of this review is to discuss many of the statistical issues and provide some best practices information. If literature is lacking in our field, best practices information from other fields will be borrowed and adapted.

Statistical Concerns and Best Practices

Regular Testing

Sands et al. [50] state that monitoring utilizes assessments as "stand-ins" for competitions to simulate the competition as if it were today. As such, it is necessary to have regular testing if this question is to be answered [49]. Furthermore, mistakes in athletic development can prove costly, further justifying frequent monitoring if it has the possibility to help reduce some of these mistakes [25, 26, 30, 43]. Testing pre and post season is important, but doing only that may result in missing data that could prove helpful and may even prevent injury or overtraining. Figure 1 may help illustrate the difference between pre/post testing and regular monitoring. Both sides of the figure depict the same athlete data of countermovement jump peak power (in W) over time, but the left side only shows what would be visible with only pre/post season testing. From this data, we might interpret that our strength and conditioning program was effective. On the right side, all of the test data are visible. This data tells a different story. It seems there was one outlier data point and most of the other data are similar. From this data, we likely would reach a different conclusion about the effectiveness of the program.
Single-Subject Designs
There is no issue if a practitioner is designing a program for an athlete near the middle of the data distribution. But, when considering developing or elite athletes, practitioners may find themselves working with athletes on the tails of data distributions rather than near the mean. It is also important to consider that athlete training response is individualistic and idiosyncratic [50]. As such, the mean of athletes' training responses may have little actual value.

Statistical and Practical Significance

Single-subject designs seem to be more appropriate for athlete monitoring as they focus on each athlete individually. A common statistical concern of single-subject designs is the sample size. A sample size of one is likely disastrous for group designs in terms of achieving statistical significance. Single-subject designs overcome this by utilizing repeated measures of the same subject [32, 50]. Statistical significance is similar between group statistics and single-subject designs in that both are attempting to quantify the probability that some treatment will reliably produce the same result [41, 58].

Even though single-subject designs have a small sample (n = 1), tests of statistical and practical significance still exist, but repeated measures are necessary to establish statistical significance. Instead of using a large sample of measures from different athletes, a single-subject design will likely use numerous data points over a period of time from the same subject. Individual phases may then be identified, and measures of those phases can be compared. The extended celeration line (ECL), improvement rate difference (IRD), percentage of all nonoverlapping data (PAND), nonoverlap of all pairs (NAP), and Tau-U are examples of these procedures [36, 44, 62]. While all are different techniques, each is essentially evaluating the amount of data in a phase that overlaps with the data in the other phase or phases. The selection of a particular technique will likely come down to the specific situation encountered. For example, during sport performance assessments, practitioners may be concerned with a potential learning effect. Increases in performance could be due to increases in physical preparedness or due to getting better at the test. The presence of a learning effect in the data could lead to a misinterpretation. Furthermore, many of the phase comparison techniques assume that the baseline phase data are not trended. Fortunately, the ECL can control for linear trends and the Tau-U can control for nonlinear trends [44]. If practitioners suspect a learning effect might be present, one of the aforementioned options may be sought. Single-subject design phase comparison techniques are analogous to the group means comparison techniques used with larger samples. Practitioners looking for a single-subject design alternative to a regression or prediction technique should consider the Theil–Sen slope. To the current author's knowledge, the Theil–Sen slope has not been used in sport performance research, but it has been used in student academic progress monitoring in a similar manner [61]. It is worthwhile to note that these procedures may not be widely available in statistical software, as single case research does not make up a large portion of the market share. That being said, these procedures are easy to do by hand, and free web calculators using these methods have been created that will generate P values, confidence intervals, and effect sizes [62].
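To make two of these options concrete, the short Python sketch below computes NAP between a baseline and an intervention phase and fits a Theil–Sen slope across the full series. The weekly peak power values and the phase split are hypothetical, and SciPy is assumed to be available for the slope estimate.

```python
import numpy as np
from scipy.stats import theilslopes

# Hypothetical weekly peak power values (W) for one athlete:
# 6 baseline weeks followed by 8 intervention weeks.
baseline = np.array([3320, 3290, 3350, 3310, 3340, 3305], dtype=float)
intervention = np.array([3360, 3400, 3385, 3440, 3470, 3455, 3510, 3530], dtype=float)

def nap(phase_a, phase_b):
    """Nonoverlap of All Pairs: share of (A, B) pairs where B > A (ties count 0.5)."""
    pairs = [(a, b) for a in phase_a for b in phase_b]
    score = sum(1.0 if b > a else 0.5 if b == a else 0.0 for a, b in pairs)
    return score / len(pairs)

print(f"NAP = {nap(baseline, intervention):.2f}")  # 1.0 indicates complete nonoverlap

# Theil-Sen slope across the full monitoring period: the median of all
# pairwise slopes, which is robust to the occasional outlier session.
y = np.concatenate([baseline, intervention])
weeks = np.arange(1, len(y) + 1)
slope, intercept, lo, hi = theilslopes(y, weeks)
print(f"Theil-Sen slope = {slope:.1f} W/week (95% CI {lo:.1f} to {hi:.1f})")
```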
It should be noted that statistical and practical significance are not the same. Statistical significance is generally expressed as a P value and provides information about the reliability of a finding, or the probability that the finding is by chance alone [41, 58]. Practical significance, often referred to as meaningfulness, is generally reported via some type of effect size estimate [14]. Much of scientific writing and publication depends heavily on P values, but there is a movement to rely less on them [1]. The justification for this is that findings are generally accepted as "significant" if a P value of less than 0.05 is achieved, but P values do not indicate the size of the effect. They are also heavily influenced by sample size. So much so, that a small effect with a large enough sample may produce a statistically significant P value. This may lead to practitioners making misinformed decisions. This has become such an issue in larger fields of science that over 800 researchers are now promoting the complete abandonment of P values in a recent publication in Nature [1].

Specifically concerning athlete development, measures of practical significance may be of primary concern. In order to enhance performance, coaches, athletes, and scientists alike are mainly concerned with meaningful change [14, 32]. Furthermore, even if a group statistical method is chosen, sample sizes in sport performance are generally dictated by the size of the team one is working with. Small samples more often than not result in a low likelihood of achieving statistical significance even if meaningful change has occurred [14]. That being said, this does not necessarily mean that the reliability of a finding should be ignored and only the magnitude of difference or relatedness should be considered. Whenever possible, both P values and effect size estimates should be reported [14].

Concerning practical significance, there may be many occasions where a data visualization depicts the whole story and measures of probability and effect size are simply icing on the cake [32, 50]. Consider Fig. 3, which depicts changes in an athlete's jumping peak power across a training macrocycle. When only considering the raw data, it may be observed that the athlete's peak power is increasing. This notion is further justified by the smoothed trendline in blue along with the shaded 95% confidence interval. Individual training phases are indicated by the dashed vertical lines. The first 8 weeks of training were part of a hypertrophy phase that
This is especially true with elite sports where ample data is available publicly [29, 34, 52]. This data can be explored and manipulated to evaluate relationships and produce predictive models. This information can then be used to make more informed decisions about player development. The data collected in strength and conditioning and sport science programs can often be used in the same way, as an abundance of data is collected through monitoring programs [3, 9, 11, 56]. Unfortunately, all data and all data collection instruments are not always useful. As such, there are some concerns that should be addressed.

Reliability

Previous authors have justified the usefulness of the coefficient of variation (CV), standard error of measurement (SEM), and limits of agreement (LOA) for the exercise or sport scientist, as they provide implications for measurement precision and the improved ability to infer results to other samples [2, 65].

CV, SEM, and LOA are not equal measures of absolute reliability, and distinction of the appropriate measurement is critical. SEM and LOA both assume the data are homoscedastic, meaning that every data point has the same chance of variance regardless of magnitude. CV, on the other hand, assumes the data are heteroscedastic, where there is an unequal chance of variance based on measure magnitude. Thus, if heteroscedasticity is present, a CV may be more useful. Heteroscedasticity is commonly present in measurement of sport science data, but one should conduct a test of hetero/homoscedasticity prior to application of a measure of reliability to ensure they are using the appropriate measure [2]. Therefore, the use of the intraclass correlation coefficient (ICC) is still recommended when appropriate, and the proper usage of a measure of absolute reliability should also be considered [2, 65].
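As a minimal sketch of this screening step in Python, the example below correlates the absolute inter-trial differences with the pair means as a simple heteroscedasticity check (in the spirit of the approach described by Atkinson and Nevill [2]) and then computes a typical error, CV, and Bland–Altman limits of agreement [6]. The test–retest values are hypothetical, and the specific conventions used (typical error as the SD of difference scores divided by √2) are one common choice rather than the only option.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical test-retest countermovement jump peak power (W) for 10 athletes.
trial1 = np.array([3050, 3420, 3890, 4210, 4550, 3110, 3760, 4020, 4380, 4700], dtype=float)
trial2 = np.array([3010, 3465, 3840, 4290, 4620, 3150, 3705, 4080, 4310, 4805], dtype=float)

diff = trial2 - trial1
means = (trial1 + trial2) / 2

# Heteroscedasticity screen: do larger scores show larger absolute differences?
r, p = pearsonr(means, np.abs(diff))
print(f"|difference| vs. mean: r = {r:.2f}, P = {p:.3f}")

# Absolute reliability statistics.
typical_error = diff.std(ddof=1) / np.sqrt(2)     # SEM-style typical error
cv_percent = 100 * typical_error / means.mean()   # CV, more useful if heteroscedastic
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)                     # Bland-Altman limits of agreement
print(f"Typical error = {typical_error:.0f} W, CV = {cv_percent:.1f}%")
print(f"Limits of agreement = {bias:.0f} ± {loa:.0f} W")
```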
Validity

Validity is the similarity between the measured value and the actual value. A measure must first be reliable before it can be considered valid. In fact, validity is dependent on reliability along with relevance [41, 58]. As such, it is possible for a measure to be reliable but not valid if the measure is not relevant to its objectives. For example, a measure can be incorrect, but consistently incorrect, leading to an acceptable level of reliability.

There are many different types of validity. Logical, ecological, and criterion validity are likely the ones most relevant to athlete monitoring. Logical or face validity refers to the way a test looks on the surface and whether it logically measures what it claims to. This is quite important for coaches and athletes alike, as they may not fully participate in or support a test that does not show immediate perceived value [38, 58]. Ecological validity is concerned with the application of the findings to actual competition scenarios. Ecological validity is very important in athlete monitoring as the application of the findings is desired in a very short period of time [38]. Criterion validity utilizes scores on some criterion measure to establish either concurrent or predictive validity. Concerning the data collection and instrumentation part of athlete monitoring, concurrent validity can be established by examining the measures obtained via a specific method along with those simultaneously measured by a previously validated "gold standard" device [58]. For example, the force plate might be considered the criterion measure for analyzing jump performance, as many variables can be attained from force–time data collected at high sampling frequencies. But, depending on software, force–time curve analysis may take a significant amount of time and there may be difficulties with portability. A switch mat may be a more practical way to measure jump performance, but it should be validated against a force plate, as has been done in research [10]. Predictive validity is concerned with the predictive value of the data obtained with a test. Concerning athlete monitoring, this refers to the ability to predict future sport performance. Sport scientists may be concerned with the predictive validity of a single measure or a combination of measures in a model [38].

Evaluating validity is generally more difficult than evaluating reliability. Evaluating reliability requires multiple trials, but evaluating validity also requires a criterion measure or actual competition data. Practitioners may not have access to "gold standard" equipment, so this may not be as practical as performing their own reliability analysis. Assuming access to criterion measurement equipment is available, validation of concurrent validity is often completed via Pearson's Product-Moment (PPM) correlations [38, 41, 58]. The results are then interpreted via the r value. This can prove problematic if this is the only method of measuring validity. Consider the two data sets of theoretical jumping peak power values in Table 1. Running a PPM correlation yields a perfect r value of 1, but these data are not the same. A paired samples t test reveals a P value of less than 0.001 and a Cohen's d effect size estimate reveals a value of 1.43, indicating that these are both statistically and practically different. While these circumstances may be unlikely, it is possible that two measurement devices can be highly correlated but statistically and practically different, as has been seen previously in research [4, 5]. As a result, statistical validation should include multiple methods, such as a PPM correlation and some form of means comparison (ANOVA, t test, limits of agreement, etc.) [6, 41].

Table 1 Two data sets of countermovement jump peak power (PP) represented in watts (W)

PP1        PP2
2917.97    4376.95
3000.35    4500.52
3397.00    5095.50
3349.39    5024.09
3755.57    5633.36
3530.48    5295.72
3227.84    4916.76
4077.68    6116.52
3206.73    4810.10
3774.15    5661.23
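The multi-method check described above can be reproduced in a few lines of Python using the Table 1 values. Note that Cohen's d is calculated here with a pooled standard deviation, so the exact effect size will depend on the formula chosen; the point is that the correlation alone masks a large systematic difference.

```python
import numpy as np
from scipy.stats import pearsonr, ttest_rel

# Table 1 countermovement jump peak power values (W).
pp1 = np.array([2917.97, 3000.35, 3397.00, 3349.39, 3755.57,
                3530.48, 3227.84, 4077.68, 3206.73, 3774.15])
pp2 = np.array([4376.95, 4500.52, 5095.50, 5024.09, 5633.36,
                5295.72, 4916.76, 6116.52, 4810.10, 5661.23])

# Relationship alone: r rounds to 1.00 even though the columns differ substantially.
r, _ = pearsonr(pp1, pp2)

# Means comparison: paired t test plus a pooled-SD Cohen's d effect size.
t, p = ttest_rel(pp1, pp2)
pooled_sd = np.sqrt((pp1.std(ddof=1) ** 2 + pp2.std(ddof=1) ** 2) / 2)
d = (pp2.mean() - pp1.mean()) / pooled_sd

print(f"r = {r:.2f}, paired t = {t:.2f}, P = {p:.4f}, Cohen's d = {d:.2f}")
```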
Heteroscedasticity and Measurement Error

It is important to remember that all measurement will contain some error. Even the most valid methods will have some amount of error. Theoretically, we can consider the observed value as the sum of the true value and the error value (observed value = true value + error value). The true value is what one should strive to measure, but it is not actually possible to attain [41, 58]. There are many sources of potential error, and some, such as methodology and instrumentation based error, have been mentioned already. One area of potential error that is often neglected is the magnitude of the measure itself. If athletes who produce extreme values (very high or very low) have a greater amount of measurement error, the data are heteroscedastic.
Regardless of the method chosen, the data given back to coaches or other decision makers likely comes in the form of a collection of charts, plots, or dashboards for each athlete. There are many factors that should be considered when returning data in the form of a visualization. In Edward Tufte's landmark work on the subject, he promotes several tenets of graphical excellence and best practices. The main ones discussed in this paper are that practitioners should represent as much data as possible in as little space as possible, that graphics should not distort what the data are saying, and that graphics should be fairly clear in their purpose [59].

Efficient Data Display

Concerning the presentation of large quantities of data in a small space, Tufte presents the concept of "data-ink," which is a ratio of how much ink is used to represent actual data or change in data to the ink required to produce the graphic [59]. Essentially, it is the ratio of ink dedicated to the necessary display of information over that which is redundant or unnecessary. Certain types of charts have inherently bad data-ink ratios, such as the pie chart, as it can generally be represented by a small table. Examples of a high data-ink ratio are the time series (seen in Fig. 3) and the radar plot [16, 39, 50]. Radar plots (Fig. 5) allow practitioners to display multiple variables in a single graphic along with changes over time or performance comparisons between athletes [16, 38]. Fundamentally, a radar plot is just a line graph with multiple data series that have been formed into a round shape [16]. Some potential concerns with the radar plot are that if one is using different measurement scales [e.g. peak force in Newtons (4982 N) and jump height in meters (0.51 m)], data will have to be normalized; otherwise the smaller numbers will not be visible on the shared axis when plotted. The most common way to do this is with the z score or t score [38, 41]. Most statistical software has formulas built in for each, but they are easy to calculate by hand if not. The decision about which standardized score to use is generally based upon sample size. Both formulas use the standard deviation, but the z score is supposed to use the standard deviation of the population that it is representing, not necessarily the standard deviation of the sample being tested. Thus, the general recommendation is to use z scores with sample sizes greater than or equal to 30 and t scores with smaller samples [41]. That being said, one could argue that their team of 22 athletes is the population as well as the sample, so a z score may still be appropriate. A second concern is the desired magnitude direction. For example, Fig. 5 displays baseball monitoring data. For several of the variables it is desirable that the data points be further away from the center [jump height (JH), rate of force development (RFD), peak power (PP)]. There may be other variables where a smaller value is desired, such as the time it takes to reach first base. This may lead to some confusion if not explained well. The final concern is that once data are converted into standardized scores, the units are no longer necessary and magnitudes may be difficult to interpret. All of these concerns should be considered and addressed, but if the graphic causes too much confusion, then it may be time to simplify [16, 59].

Fig. 5 A radar plot comparing athlete performance data of three athletes (JH jump height, Mass body mass, TimetoFirst home-to-first time, RFD rate of force development, PP peak power)
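As a sketch of the normalization step, the Python snippet below converts three hypothetical squad variables measured on very different scales into z scores so they can share one radar-plot axis, and flips the sign of a "lower is better" variable to address the magnitude-direction concern; the values are invented for illustration.

```python
import numpy as np

# Hypothetical squad data on three different scales: peak force (N),
# jump height (m), and home-to-first time (s, lower is better).
squad = {
    "peak_force_N": np.array([4210.0, 4982.0, 4555.0, 4760.0, 4380.0]),
    "jump_height_m": np.array([0.42, 0.51, 0.46, 0.49, 0.44]),
    "time_to_first_s": np.array([4.55, 4.32, 4.47, 4.40, 4.62]),
}

def z_scores(values):
    """Standardize to mean 0, SD 1 so different units can share one axis."""
    return (values - values.mean()) / values.std(ddof=1)

standardized = {name: z_scores(vals) for name, vals in squad.items()}

# Flip the sign of 'lower is better' variables so outward always means better,
# addressing the magnitude-direction concern described above.
standardized["time_to_first_s"] *= -1

for name, z in standardized.items():
    print(name, np.round(z, 2))
```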
Misrepresenting the Data

Unfortunately, data visualizations can be misleading. For the purposes of athlete monitoring, misleading the viewer is likely an accident, but it can lead to making an incorrect decision. It is important to follow some best practices or guidelines when producing visualizations so this can be avoided.

One potential way for this to happen in athlete monitoring is by not displaying all the data or by not collecting enough data. If only preseason and postseason data are displayed, one might be misled about what happened along the way, or some effect might seem magnified, as was the case in Fig. 1. Assuming regular testing is occurring, enough data should be available to produce time-series plots that are easily readable for viewers [50].

Misrepresenting the y axis may be the most common issue in data visualizations. For example, in Fig. 6 the same athlete's countermovement jump data is used to create both plots. The plot on the left looks highly variable and seems to show a dramatic increase after the first two measurements for those not paying attention to the y axis tick marks. The magnitude of difference is misleading here. Standardizing the y axis will help us avoid this mistake. Fixing the y axis at zero illustrates the difference in Fig. 6, and it is good form for all plots to start y axes at zero as a result [16, 53, 59].
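A minimal matplotlib sketch of this recommendation is shown below: a countermovement jump time series, with hypothetical data, plotted with the y axis anchored at zero so that small session-to-session fluctuations are not visually exaggerated.

```python
import matplotlib.pyplot as plt

# Hypothetical weekly countermovement jump height (cm) for one athlete.
weeks = list(range(1, 11))
jump_height = [50.1, 50.4, 52.6, 52.2, 52.9, 53.1, 52.7, 53.4, 53.8, 54.0]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(weeks, jump_height, marker="o")
ax.set_ylim(bottom=0)          # anchor the y axis at zero to avoid distortion
ax.set_xlabel("Week")
ax.set_ylabel("Jump height (cm)")
ax.set_title("Countermovement jump monitoring")
plt.tight_layout()
plt.show()
```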
Misrepresenting data happens frequently, and the magnitude of misrepresentation can be quantified via Tufte's Lie Factor [59]:

Tufte's Lie Factor = size of effect shown in graphic / size of effect in data

If the change between points 2 and 3 in Fig. 6 is measured at 50 mm in the plot on the left and only 5 mm in the plot on the right, then the Lie Factor of the first plot is 10:

Tufte's Lie Factor = 50 mm / 5 mm = 10

Data visualizations can distort effects in both directions, so a Lie Factor can be above or below 1.0. According to Tufte, anything outside of the range of 0.95–1.05 is substantial distortion. Thus, the example shown in Fig. 6 is substantial distortion.

Sometimes misrepresenting data may not entirely be the fault of the data visualization creator. Consider pie charts, which have understandably gone out of favor with many data scientists [15, 51]. While pie charts are familiar to many, they force viewers to make comparisons between data based on chart area, and our visual perception is limited in its ability to do this task. If a pie chart is rotated so that no side of any slice is directly in line with either the x or y axis, perception is weakened further. Looking at the pie chart in Fig. 7, it may be relatively easy to determine that catchers (C) represent 25% of the data because both sides of its slice are on the x and y axes. Turning your head or the image slightly increases the difficulty of determining its value [51]. Determining the value of any of the other positions is likely much more difficult. Bar charts are easier to interpret and can illustrate the same information, leading to the recommendation to replace pie charts with bar charts or a simple table whenever possible [15]. Speedometer or gauge plots are popular in performance-based dashboards, but they are fundamentally just a different version of a pie chart as they represent fractional components. These are often more complicated to produce and are extremely poor performers in terms of Tufte's data-ink ratio as they only represent one value (e.g. 85% of total) [16, 59]. While some plots may appear elegant or visually appealing, if they offer little information relative to the amount of ink required to create the graphic, space and time are not being used efficiently. Finally, some attention should be paid to the choice of color palette used. The 'viridis' color palette (available in R and Python, used in Fig. 7) is accessible to those with different types of colorblindness, so it will be clear to most who view graphics with it [21].

Fig. 7 A comparison of the same theoretical baseball positional data represented as a pie chart, bar chart, and table (C catcher, P pitcher, IF infielder, OF outfielder)
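As a sketch of the recommendations above, the snippet below presents a hypothetical positional breakdown like the one in Fig. 7 as a bar chart colored with the viridis palette instead of a pie chart; the roster counts are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical roster breakdown (C catcher, P pitcher, IF infielder, OF outfielder).
positions = ["C", "P", "IF", "OF"]
counts = [5, 7, 5, 3]

# Bar chart with viridis colors: easier to decode than a pie chart and
# accessible to most forms of colorblindness.
colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(positions)))
fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(positions, counts, color=colors)
ax.set_ylabel("Number of athletes")
ax.set_title("Roster by position")
plt.tight_layout()
plt.show()
```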
Conclusion

Data collection during strength and conditioning and sport performance is on the rise, and its use in athlete monitoring is also increasing. While the usage of this data for purposes of creating more informed training programs and potential performance prediction is promising, there are some statistical concerns that should be addressed by those who use this data. At minimum, analyses of reliability and the assumption of homoscedasticity should be evaluated. This should be done by all practitioners with their own data, not by relying on published findings from other samples. If possible, concurrent validity of devices should also be evaluated. Following any analysis, the data return process should not be overlooked. Data should be visualized in a simple and clear manner that does not result in distortion. This will likely result in an efficient decision-making process and more informed athlete development programs.

References

1. Amrhein V, Greenland S, McShane B. Retire statistical significance. Nature. 2019;567(7748):305–7.
2. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998;26(4):217–38.
3. Bailey C, McInnis T, Batcher J. Bat swing mechanical analysis with an inertial measurement unit: reliability and implications for athlete monitoring. J Trainol. 2016;5(2):43–5.
4. Bampouras T, Relph N, Orne D, Esformes J. Validity and reliability of the Myotest Pro wireless accelerometer. Br J Sports Med. 2010;44(14):i20.
5. Batcher J, Nilson K, North T, Brown D, Raszeja N, Bailey C. Validity of jump performance measures assessed with field-based devices and implications for athlete monitoring. J Strength Cond Res. 2017;31(Suppl):S82–162.
6. Bland J, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10.
7. Bosco C, Colli R, Bonomi R, von Duvillard SP, Viru A. Monitoring strength training: neuromuscular and hormonal profile. Med Sci Sports Exerc. 2000;32(1):202–8.
8. Breusch T, Pagan A. A simple test for heteroscedasticity and random coefficient variation. Econometrica. 1979;47(5):1287–94.
9. Bricker J, Bailey C, Driggers A, McInnis T, Alami A. A new method for the evaluation and prediction of base stealing performance. J Strength Cond Res. 2016;30(11):3044–50.
10. Buckthorpe M, Morris J, Folland J. Validity of vertical jump measurement devices. J Sports Sci. 2012;30(1):63–9.
11. Camp C, Tubbs T, Fleisig G, Dines J, Dines D, Altchek D, Dowling B. The relationship of throwing arm mechanics and elbow varus torque: within-subject variation for professional baseball pitchers across 82,000 throws. Am J Sports Med. 2017;45(13):3030–5.
12. Clubb J, McGuigan M. Developing cost-effective, evidence-based load monitoring systems in strength and conditioning practice. Strength Cond J. 2018;40(6):7–14.
13. Driggers A, Bingham G, Bailey C. The relationship of throwing arm mechanics and elbow varus torque: letter to the editor. Am J Sports Med. 2018;47(1):1–5.
14. Ellis P. The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results. 1st ed. Cambridge: Cambridge University Press; 2010.
15. Few S. Save the pies for dessert. Visual Business Intelligence Newsletter; 2007.
16. Few S. Information dashboard design: displaying data for at-a-glance monitoring. 2nd ed. Burlingame: Analytics Press; 2013.
17. Foster C. Monitoring training in athletes with reference to overtraining syndrome. Med Sci Sports Exerc. 1998;30(7):1164–8.
18. Fry A, Kraemer W, Borselen F, Lynch J, Marsit J, Roy E, Knuttgen H. Performance decrements with high-intensity resistance exercise overtraining. Med Sci Sports Exerc. 1994;26(9):1165–73.
19. Fry A, Kraemer W, Lynch J, Triplett NT, Koziris L. Does short-term near maximal intensity machine resistance exercise induce overtraining? J Strength Cond Res. 1994;8(3):75–81.
20. Fry A, Webber J, Weiss L, Fry M, Li Y. Impaired performance with excessive high-intensity free-weight training. J Strength Cond Res. 2000;14(1):54–61.
21. Garnier S. viridis: default color maps from 'matplotlib'. 2018. https://CRAN.R-project.org/package=viridis. Accessed 9 Jul 2019.
22. Gonzalez-Badillo J, Gorostiaga E, Arellana R, Izquierdo M. Moderate resistance training volume produces more favorable strength gains than high or low volumes during a short-term training cycle. J Strength Cond Res. 2005;19(3):689–97.
23. Haff G, Carlock J, Hartman M, Kilgore J, Kawamori N, Jackson J, Stone M. Force–time curve characteristics of dynamic and isometric muscle actions of elite women Olympic weightlifters. J Strength Cond Res. 2005;19(4):741–8.
24. Halson S, Jeukendrup A. Does overtraining exist? An analysis of overreaching and overtraining research. Sports Med. 2004;34(14):967–81.
25. Hickey J, Shield A, Williams M, Opar D. The financial cost of hamstring strain injuries in the Australian Football League. Br J Sports Med. 2014;48(8):729–30.
26. Hoffman J, Kaminsky M. Use of performance testing for monitoring overtraining in youth basketball players. Strength Cond J. 2000;22(6):54–62.
27. Hopkins W. Measures of reliability in sports medicine and science. Sports Med. 2000;30(1):1–15.
28. Joyce D, Lewindon D. High-performance training for sports. 1st ed. Champaign: Human Kinetics; 2014.
29. Kagan D. The anatomy of a pitch: doing physics with PITCHf/x data. Phys Teach. 2009;47(7):412.
30. Kellmann M. Enhancing recovery: preventing underperformance in athletes. 1st ed. Champaign: Human Kinetics; 2002.
31. Kellmann M, Beckmann J. Sport, recovery, and performance: interdisciplinary insights. 1st ed. New York: Routledge; 2018.
32. Kinugasa T, Cerin E, Hooper S. Single-subject research designs and data analyses for assessing elite athletes' conditioning. Sports Med. 2004;34(15):1035–50.
33. Krustrup P, Mohr M, Nybo L, Jensen J, Nielsen N, Bangsbo J. The Yo-Yo IR2 test: physiological response, reliability, and application to elite soccer. Med Sci Sports Exerc. 2006;38(9):1666–73.
34. Lage M, Ono J, Cervone D, Chiang J, Dietrich C, Silva C. StatCast dashboard: exploration of spatiotemporal baseball data. IEEE Comput Graph Appl. 2016;36(5):28–37.
35. Lani J. Heteroscedasticity. 2019. https://www.statisticssolutions.com/heteroscedasticity/. Accessed 9 Jul 2019.
36. Lee J, Cherney L. Tau-U: a quantitative approach for analysis of single-case experimental data in aphasia. Am J Speech Lang Pathol. 2018;27(1S):495–503.
37. Levene H. Robust tests for equality of variances. In: Olkin I, editor. Contributions to probability and statistics: essays in honor of Harold Hotelling. Palo Alto: Stanford University Press; 1960. p. 278–92.
38. McGuigan M. Monitoring training and performance in athletes. 1st ed. Champaign: Human Kinetics; 2017.
39. McGuigan M, Cormack S, Gill N. Strength and power profiling of athletes: selecting tests and how to use information for program design. Strength Cond J. 2013;35(6):7–14.
40. Meeusen R, Duclos M, Foster C, Fry A, Gleeson M, Nieman D, Urhausen A. Prevention, diagnosis and treatment of the overtraining syndrome: joint consensus statement of the European College of Sport Science (ECSS) and the American College of Sports Medicine (ACSM). Med Sci Sports Exerc. 2013;45(1):186–205.
41. Morrow J, Mood D, Disch J, Kang M. Measurement and evaluation in human performance. 5th ed. Champaign: Human Kinetics; 2016.
42. Nuzzo J, Anning J, Scharfenberg J. The reliability of three devices used for measuring vertical jump height. J Strength Cond Res. 2011;25(9):2580–90.
43. Ozturk S, Kilic D. What is the economic burden of sports injuries? Jt Dis Relat Surg. 2013;24(2):108–11.
44. Parker R, Vannest K, Davis J. Effect size in single case research: a review of nine nonoverlap techniques. Behav Modif. 2011;35(4):303–22.
45. Perez-Castilla A, Piepoli A, Delgado-Garcia G, Garrido-Blanca G, Garcia-Ramos A. Reliability and concurrent validity of seven commercially available devices for the assessment of movement velocity at different intensities during the bench press. J Strength Cond Res. 2019;33(5):1258–65.
46. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2017. https://www.R-project.org/. Accessed 9 Jul 2019.
47. Rose T. The end of average. 1st ed. New York: HarperOne; 2016.
48. Sands W. Monitoring the elite female gymnast. Natl Strength Cond Assoc J. 1991;13(4):66–72.
49. Sands W, Stone M. Monitoring the elite athlete. Olymp Coach. 2005;17(3):4–12.
50. Sands W, Kavanaugh A, Murray S, McNeal J, Jemni M. Modern techniques and technologies applied to training and performance monitoring. Int J Sports Physiol Perform. 2017;12(Suppl 2):S263–72.
51. Schwabish J. An economist's guide to visualizing data. J Econ Perspect. 2014;28(1):209–34.
52. Sikka R, Baer M, Raja A, Stuart M, Tompkins M. Analytics in sports medicine: implications and responsibilities that accompany the era of big data. J Bone Jt Surg. 2019;101(3):276–83.
53. Smith M. Conversations with data #31: bad charts. 2019. https://datajournalism.com/read/newsletters/bad-charts. Accessed 21 Jul 2019.
54. Stone M, Stone M, Sands W. Science and practice of resistance training. 1st ed. Champaign: Human Kinetics; 2007.
55. Strathern M. 'Improving ratings': audit in the British university system. Eur Rev. 1997;5(3):305–21.
56. Suchomel T, Bailey C. Monitoring and managing fatigue in baseball players. Strength Cond J. 2014;36(6):39–45.
57. Tabachnick B, Fidell L. Using multivariate statistics. 5th ed. Boston: Pearson; 2015.
58. Thomas J, Nelson J, Silverman S. Research methods in physical activity. 7th ed. Champaign: Human Kinetics; 2015.
59. Tufte ER. The visual display of quantitative information. Cheshire: Graphics Press; 2001.
60. Vaeyens R, Lenoir M, Williams A, Philippaerts R. Talent identification and development programmes in sport: current models and future directions. Sports Med. 2008;38(9):703–14.
61. Vannest K, Parker R, Davis J, Soares D, Smith S. The Theil–Sen slope for high-stakes decisions from progress monitoring. Behav Disord. 2012;37(4):271–80.
62. Vannest K, Parker R, Gonen O, Adiguzel T. Single case research: web based calculators for SCR analysis. 2016. http://www.singlecaseresearch.org/. Accessed 5 Jul 2019.
63. Van Rossum G. Python tutorial. Technical Report CS-R9526. Amsterdam: Centrum voor Wiskunde en Informatica (CWI); 1995. http://www.python.org/. Accessed 9 Jul 2019.
64. Vincent W, Weir J. Statistics in kinesiology. 5th ed. Champaign: Human Kinetics; 2012.
65. Weir J. Quantifying test–retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19(1):231–40.
66. Wickham H, Grolemund G. R for data science. 1st ed. Sebastopol: O'Reilly Media; 2017.
67. Wing C. Monitoring athlete load: data collection methods and practical recommendations. Strength Cond J. 2018;40(4):26–39.
68. Zatsiorsky V, Kraemer W. Science and practice of strength training. 2nd ed. Champaign: Human Kinetics; 1995.