{ggstatsplot}: A Biography
Or How I Learned to Stop Worrying about Data Visualization and Statistical Reporting
Indrajeet Patil
Genesis
Why a new software?
Life in the trenches (c. 2017, Harvard)
External Stimulus Internal Response
Reporting errors:
Interpretation errors:
A visualization with statistical summary
Action Plan
{ggstatsplot} was born!
(open-sourced on GitHub in 2017; still actively developed)
Example function
E.g., for hypothesis about differences between groups
library(ggstatsplot)
ggbetweenstats(data = iris, x = Species, y = Sepal.Length)
Important
Information-rich defaults
parametric
non-parametric
robust
Bayesian
And there is more!
Show, don’t tell
Without {ggstatsplot} With {ggstatsplot}
Pearson’s correlation test revealed that, across 142
participants, variable x was negatively correlated
with variable y: t(140) = −0.76, p = 0.446. The
effect size (r = −0.06, 95%CI [−0.23, 0.10])
was small, as per Cohen’s (1988) conventions. The
Bayes Factor for the same analysis revealed that
the data were 5.81 times more probable under the
null hypothesis as compared to the alternative
hypothesis. This can be considered moderate
evidence (Jeffreys, 1961) in favor of the null
hypothesis (absence of any correlation between x
and y).
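The numbers in a written report like this are tightly linked: for Pearson's correlation, the t statistic follows from r and the degrees of freedom (n − 2, which is why 142 participants give t(140)). The following is a minimal base R sketch of that relationship. The variables behind the slide's report are not available here, so it assumes two columns of the built-in mtcars data as a stand-in.

```r
# Stand-in data: two columns of the built-in `mtcars` dataset
# (an assumption for illustration; not the data from the slide).
res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

r  <- unname(res$estimate)   # Pearson's r (the effect size)
df <- unname(res$parameter)  # degrees of freedom, n - 2

# The t statistic can be recovered from r and the degrees of freedom.
t_from_r <- r * sqrt(df) / sqrt(1 - r^2)

# Everything a written report needs: t(df), p, r, and the 95% CI for r.
report <- c(
  t       = unname(res$statistic),
  df      = df,
  p       = res$p.value,
  r       = r,
  ci_low  = res$conf.int[1],
  ci_high = res$conf.int[2]
)
print(round(report, 3))
```

Every value in `report` has to be copied into the text by hand, which is where reporting errors tend to creep in.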
User Love
Total downloads > 500K (97th percentile)
Total citations > 1000
Pleasant Side Effects
Maybe the real treasure was the skills we acquired along the way!
Software Architecture
Breaking down the monolith: 20K (2017) → 1K (2024) lines of code
[Diagram: ggstatsplot; statsExpressions (backend engine); easystats (bayestestR, effectsize, insight, parameters, performance); other dependencies]
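The split between a statistical backend and a plotting frontend can be sketched as follows. This is an illustration rather than the package's internal code: it assumes that statsExpressions::oneway_anova() returns a data frame whose `expression` column holds a ready-to-display expression, and it uses a plain ggplot2 boxplot as a stand-in for the frontend.

```r
library(ggplot2)
library(statsExpressions)

# Backend step: run the test and get a tidy data frame back.
# The `expression` column is assumed to hold a plotmath expression.
res <- oneway_anova(data = iris, x = Species, y = Sepal.Length)

# Frontend step: attach the expression to any ggplot as a subtitle.
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(subtitle = res$expression[[1]])
p
```

Keeping the statistics in a separate package means the plotting code only has to consume a data frame, which is one way a codebase can shrink as much as the slide describes.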
Collaborative Solutions
While re-architecting {ggstatsplot}, I started contributing upstream.
Making it a habit
Communication
Training material on best practices in software/package development to support
community contributions, keeping in mind the diverse backgrounds of contributors.
Biography (2017-)
(Or how developing {ggstatsplot} continues to help me grow as a software developer)
Code Quality
Technical Debt
ggstatsplot
Collaboration
Communication
Conclusion
{ggstatsplot} offers an intuitive interface for creating
detailed statistical visualizations, enabling users to adopt
rigorous, reliable, and robust workflows for data exploration
and reporting across various academic and industrial
disciplines. It is a well-maintained tool with high-quality
infrastructure and widespread adoption.
Thank You 😊
Source code for these slides can be found on GitHub.
For more
If you are interested in good programming and software
development practices, check out my other slide decks.
Find me at…
Twitter
LinkedIn
GitHub
Website
E-mail
Session information
sessioninfo::session_info(include_base = TRUE)
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       Ubuntu 22.04.5 LTS
 system   x86_64, linux-gnu
 hostname fv-az564-242
 ui       X11
 language (EN)
 collate  C.UTF-8
 ctype    C.UTF-8
 tz       UTC
 date     2024-12-08
 pandoc   3.5 @ /opt/hostedtoolcache/pandoc/3.5/x64/ (via rmarkdown)
 quarto   1.7.2 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 base        * 4.4.2      2024-10-31 [3] local
 BayesFactor   0.9.12-4.7 2024-01-24 [1] RSPM
 bayestestR    0.15.0     2024-10-17 [1] RSPM
Appendix
Examples of other functions
ggwithinstats()
Hypothesis about group differences: repeated measures design
ggwithinstats(
  data = WRS2::WineTasting,
  x = Wine,
  y = Taste
)
Important
✏️ Defaults
raw data + distributions
descriptive statistics
inferential statistics
effect size + uncertainty
pairwise comparisons
Bayesian hypothesis-testing
Bayesian estimation
parametric
non-parametric
robust
Bayesian
gghistostats()
Distribution of a numeric variable
gghistostats(
  data = movies_long,
  x = budget,
  test.value = 30
)
Important
✏️ Defaults
counts + proportion for bins
descriptive statistics
inferential statistics
effect size + uncertainty
pairwise comparisons
Bayesian hypothesis-testing
Bayesian estimation
parametric
non-parametric
robust
Bayesian
ggdotplotstats()
Labeled numeric variable
ggdotplotstats(
  data = movies_long,
  x = budget,
  y = genre,
  test.value = 30
)
Important
✏️ Defaults
descriptive statistics
inferential statistics
effect size + uncertainty
pairwise comparisons
Bayesian hypothesis-testing
Bayesian estimation
parametric
non-parametric
robust
Bayesian
ggscatterstats()
Hypothesis about correlation: Two numeric variables
ggscatterstats(
  data = movies_long,
  x = budget,
  y = rating
)
Important
✏️ Defaults
joint distribution
marginal distribution
effect size + uncertainty
pairwise comparisons
Bayesian hypothesis-testing
Bayesian estimation
parametric
non-parametric
robust
Bayesian
ggcorrmat()
Hypothesis about correlation: Multiple numeric variables
ggcorrmat(dplyr::starwars)
Important
✏️ Defaults
inferential statistics
effect size + uncertainty
careful handling of NAs
partial correlations
parametric
non-parametric
robust
Bayesian
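Why the handling of NAs matters for a correlation matrix can be seen with base R alone. The sketch below does not show what ggcorrmat() does internally (the slides do not say); it only illustrates how the default, listwise, and pairwise strategies differ, assuming the built-in airquality data as an example with missing values.

```r
# `airquality` ships with R and has missing values in Ozone and Solar.R.
vars <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Default: any NA in a column propagates into the correlation.
cor_default <- cor(vars)

# Listwise deletion: drop every row that has at least one NA.
cor_listwise <- cor(vars, use = "complete.obs")

# Pairwise deletion: each pair uses all rows where both values are present.
cor_pairwise <- cor(vars, use = "pairwise.complete.obs")

# Number of observations behind each pairwise correlation.
present <- !is.na(as.matrix(vars))
storage.mode(present) <- "integer"
n_pairwise <- crossprod(present)

print(round(cor_pairwise, 2))
print(n_pairwise)
```

With pairwise deletion, different cells of the matrix rest on different sample sizes, so the counts in `n_pairwise` are worth reporting next to the correlations.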
ggpiestats()
Hypothesis about composition of categorical variables
ggpiestats(
  data = mtcars,
  x = am,
  y = cyl
)
Important
✏️ Defaults
descriptive statistics
inferential statistics
effect size + uncertainty
goodness-of-fit tests
Bayesian hypothesis-testing
Bayesian estimation
33
ggbarstats()
Hypothesis about composition of categorical variables
ggbarstats(
  data = mtcars,
  x = am,
  y = cyl
)
Important
✏️ Defaults
descriptive statistics
inferential statistics
effect size + uncertainty
goodness-of-fit tests
Bayesian hypothesis-testing
Bayesian estimation
34
ggcoefstats()
Hypothesis about regression coefficients
mod <- lm(
  formula = rating ~ mpaa,
  data = movies_long
)

ggcoefstats(mod)
Important
✏️ Defaults
estimate + uncertainty
inferential statistics (t, z, F, χ²)
model fit indices (AIC + BIC)
grouped_ variants
Iterating over a grouping variable
36
grouped_ functions
grouped_ggpiestats(
  data = mtcars,
  x = cyl,
  grouping.var = am
)
Available grouped_ variants:
grouped_ggbetweenstats()
grouped_ggwithinstats()
grouped_gghistostats()
grouped_ggdotplotstats()
grouped_ggscatterstats()
grouped_ggcorrmat()
grouped_ggpiestats()
grouped_ggbarstats()
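Conceptually, a grouped_ variant repeats the same plot for each level of the grouping variable and arranges the results in one figure. The sketch below spells out that idea by hand for the grouped_ggpiestats() example above. It is an illustration of the pattern, not the package's actual implementation, and it assumes that combine_plots() accepts a list of plots plus a `plotgrid.args` list for the layout.

```r
library(ggstatsplot)

# Split the data by the grouping variable (`am` has two levels: 0 and 1).
by_group <- split(mtcars, mtcars$am)

# Create one plot per subset, titled with the level of the grouping variable.
plots <- lapply(names(by_group), function(level) {
  ggpiestats(
    data = by_group[[level]],
    x = cyl,
    title = paste("am:", level)
  )
})

# Arrange the individual plots side by side.
combine_plots(plots, plotgrid.args = list(nrow = 1))
```

The grouped_ functions save this boilerplate and keep the per-group plots consistent with each other.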
37
Customizability
“What if I don’t like the default plots?” 🤔
38
Modify the look
By changing theme and palette
🎨
ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
  ggtheme = ggthemes::theme_economist(),
  palette = "Darjeeling2",
  package = "wesanderson"
)
Too much information
Get only plots:
🙈
ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  # turn off statistical analysis
  centrality.plotting = FALSE,
  results.subtitle = FALSE,
  bf.message = FALSE,
  # turn off pairwise comparisons
  pairwise.display = "none"
)
{ggstatsplot}: Details about statistical reporting
Supports different statistical approaches
Note
Functions                         Description                                         Parametric  Non-parametric  Robust  Bayesian
ggbetweenstats()                  Between group comparisons                           ✅          ✅              ✅      ✅
ggwithinstats()                   Within group comparisons                            ✅          ✅              ✅      ✅
gghistostats(), ggdotplotstats()  Distribution of a numeric variable                  ✅          ✅              ✅      ✅
ggcorrmat()                       Correlation matrix                                  ✅          ✅              ✅      ✅
ggscatterstats()                  Correlation between two variables                   ✅          ✅              ✅      ✅
ggpiestats(), ggbarstats()        Association between categorical variables           ✅          NA              NA      ✅
ggpiestats(), ggbarstats()        Equal proportions for categorical variable levels   ✅          NA              NA      ✅
ggcoefstats()                     Regression modeling                                 ✅          ✅              ✅      ✅
ggcoefstats()                     Random-effects meta-analysis                        ✅          NA              ✅      ✅
Toggling statistical approaches
🔀
Parametric

# anova
ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "p"
)

# correlation analysis
ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "p"
)

# t-test
gghistostats(
  data = mtcars,
  x = wt,
  test.value = 2,
  type = "p"
)

Non-parametric

# anova (non-parametric counterpart: Kruskal-Wallis test)
ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "np"
)

# correlation analysis (non-parametric counterpart: Spearman's correlation)
ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "np"
)

# t-test (non-parametric counterpart: one-sample Wilcoxon test)
gghistostats(
  data = mtcars,
  x = wt,
  test.value = 2,
  type = "np"
)
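The same `type` argument also selects the robust and Bayesian approaches listed in the table of supported approaches. A short sketch that produces one plot per approach for the ANOVA example follows. The names given to the vector elements are labels chosen for this example, and the robust and Bayesian variants assume their supporting packages are installed.

```r
library(ggstatsplot)

# Short codes accepted by `type`; the element names are only labels here.
approaches <- c(parametric = "p", nonparametric = "np", robust = "r", bayes = "bf")

# Same data and variables each time; only the statistical approach changes.
plots <- lapply(approaches, function(approach) {
  ggbetweenstats(data = mtcars, x = cyl, y = wt, type = approach)
})

# Inspect any one of them by name.
plots$robust
```

Because the function call stays the same apart from `type`, switching approaches does not require learning a new interface.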
Alternative: Pure Pain
Hunting for packages Inconsistent APIs
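The inconsistency is visible even within base R. The sketch below runs commonly used base R counterparts of the three analyses from the previous slide. The choice of functions is an assumption made for illustration, not a statement about what {ggstatsplot} calls internally.

```r
# One-way comparison: formula interface; Welch's version via `var.equal`.
welch_anova <- oneway.test(wt ~ cyl, data = mtcars, var.equal = FALSE)
kruskal     <- kruskal.test(wt ~ cyl, data = mtcars)

# Correlation: two vectors; the approach is chosen through `method`.
pearson  <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")
spearman <- cor.test(mtcars$wt, mtcars$mpg, method = "spearman", exact = FALSE)

# One-sample test: a different function for each approach.
t_test   <- t.test(mtcars$wt, mu = 2)
wilcoxon <- wilcox.test(mtcars$wt, mu = 2, exact = FALSE)

tests <- list(welch_anova, kruskal, pearson, spearman, t_test, wilcoxon)

# Only some of the returned objects carry an estimate or a confidence
# interval; robust and Bayesian variants need additional packages.
vapply(tests, function(x) x$method, character(1))
```

Three analyses already involve three calling conventions (formula plus data, two vectors, one vector plus `mu`), before effect sizes or other packages enter the picture.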
Benefits in details
{ggstatsplot} combines data visualization and statistical analysis in a single step.
It…
extremely beginner-friendly
Simplified data analysis workflow
A grain of salt
The “Golem of Prague” problem
❌ Promotes mindless application of statistical tests.
Footnotes
1. (Nuijten et al., Behavior Research Methods, 2016)