statss-2
Non-Parametric Tests | Mann-Whitney U Test | Alternative to the independent t-test for non-normal distributions or ordinal data | Ordinal, non-normal continuous
Mean | The arithmetic average of a set of values. | Continuous (Interval/Ratio) | To find a central value for normally distributed (or roughly symmetric) data. | Single numeric value.
Median | The middle value in a sorted list of values. | Continuous or Ordinal | When the data is skewed or has outliers; a robust measure of central tendency. | Single numeric value.
Mode | The most frequently occurring value in a data set. | Any (Continuous, Ordinal, Nominal) | When identifying the most common category or value is important; also relevant for categorical data. | Single numeric value (or category).
Variance | The average of the squared differences from the mean. | Continuous (Interval/Ratio) | Foundation for Standard Deviation; used in inferential tests (ANOVA, t-test, etc.). | Single numeric value (square of SD).
Interquartile Range (IQR) | The difference between the 75th (Q3) and 25th (Q1) percentiles of the data. | Continuous or Ordinal | To measure the middle 50% spread; less affected by outliers than range or SD. | Single numeric value (Q3 - Q1).
Kurtosis | A measure of the "tailedness" (peak and tail thickness) of a distribution. | Continuous (Interval/Ratio) | To understand the shape of a distribution, particularly how heavy or light the tails are. | Single numeric value (positive, negative, or zero).
Skewness | A measure of the asymmetry of a distribution (positive/right or negative/left skew). | Continuous (Interval/Ratio) | When checking whether the distribution is symmetrical or skewed. | Single numeric value (positive, negative, or zero).
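As a rough illustration of the measures in the rows above, the following sketch computes them with Python's standard `statistics` module. The `describe` helper and the sample values are hypothetical, chosen only for illustration. Variance here is the population form (the plain average of squared differences, as defined above), and skewness and kurtosis use simple moment-based formulas, with kurtosis reported as excess kurtosis; other tools may apply different corrections.

```python
import statistics


def describe(values):
    """Return the descriptive measures from the table for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    # Population variance: the average of the squared differences from the mean.
    variance = statistics.pvariance(values)
    # Quartiles; statistics.quantiles uses the "exclusive" method by default,
    # so other tools may report slightly different Q1/Q3 values.
    q1, _, q3 = statistics.quantiles(values, n=4)
    # Moment-based (population) skewness and excess kurtosis.
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": variance,
        "iqr": q3 - q1,
        "skewness": m3 / variance ** 1.5,
        "excess_kurtosis": m4 / variance ** 2 - 3,
    }


# Hypothetical sample with one large outlier (25).
sample = [2, 4, 4, 5, 7, 9, 25]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample the single outlier pulls the mean above the median and produces positive skewness, which is the situation where the table recommends the median and IQR over the mean and SD.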
2. Distributions
Concept | Definition / Purpose | Type of Data | When to Use | Typical Result
Parameter | A value that describes a characteristic of a population (e.g., μ, σ). | Population-level | When describing the entire population's true measure. | Single numeric value (e.g., μ).
Statistic | A value that describes a characteristic of a sample (e.g., x̄, s). | Sample-level | When describing a sample’s measure, used to estimate the population parameter. | Single numeric value (e.g., x̄).
Standard Error (SE) | The standard deviation of a sampling distribution (often the distribution of sample means). | Continuous (for means) | When estimating how much a sample mean varies from one sample to another. | Single numeric value (spread of sample means).
Confidence Interval | A range of values likely to contain the true population parameter at a chosen confidence level (e.g., 95%). | Continuous (for means) or proportions | When wanting an interval estimate (uncertainty range) of a parameter (mean, proportion, etc.). | Interval: [Lower bound, Upper bound].
Null Hypothesis (H₀) | Statement that there is no effect or difference; the default assumption in hypothesis testing. | Any (depends on test) | When performing hypothesis testing; the assumption to be tested (and possibly rejected). | Usually a “no difference” or “no relationship” statement.
Alternative Hypothesis (H₁ or Hₐ) | Statement that there is an effect or difference, contradicting the null hypothesis. | Any (depends on test) | The claim you usually want to find evidence for; tested indirectly by attempting to reject H₀. | Usually “there is a difference” or “there is a relationship”.
Hypothesis Testing | Statistical analysis to decide whether to reject the null hypothesis. | Any (depends on test) | Whenever making inferences about population parameters or relationships from sample data. | A p-value and a decision to reject or fail to reject H₀.
Type I Error (α) | Rejecting H₀ when H₀ is actually true (False Positive). | Any (depends on test) | Controlling this error rate is crucial in hypothesis testing (often set α = 0.05). | Probability of a false alarm.
Type II Error (β) | Failing to reject H₀ when H₀ is actually false (False Negative). | Any (depends on test) | Want to keep β low to avoid missing a true effect; related to power = 1 − β. | Probability of missing a true effect.
Statistical Power | Probability of correctly rejecting H₀ when H₀ is false. | Any (depends on test) | Important when designing studies to ensure enough sample size and effect detectability. | Numeric value between 0 and 1 (commonly 0.8 or higher is desired).
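To make the Standard Error and Confidence Interval rows concrete, here is a minimal sketch that estimates both for a sample mean. It assumes a normal approximation (a z critical value from `statistics.NormalDist`); for small samples a t-based interval would be somewhat wider, and the standard library does not provide the t distribution. The function name `mean_confidence_interval` and the data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, standard_error, (lower, upper)) using a normal approximation."""
    n = len(values)
    mean = statistics.mean(values)
    # Sample standard deviation s (n - 1 in the denominator).
    s = statistics.stdev(values)
    # Standard error of the mean: s / sqrt(n).
    se = s / math.sqrt(n)
    # Two-sided critical value; about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical sample of 10 measurements.
scores = [72, 75, 78, 80, 81, 83, 85, 88, 90, 94]
x_bar, se, (lower, upper) = mean_confidence_interval(scores)
print(f"mean = {x_bar:.2f}, SE = {se:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Raising `confidence` (for example to 0.99) widens the interval, and a larger sample shrinks the standard error and therefore the interval.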
Test | Parametric / Non-Parametric | When to Use | Variables & Data Type | Key Assumptions | Result / Output
t-test (general) | Parametric | Compare means of two groups | DV: Continuous (Interval/Ratio), IV: 2 groups (nominal) | 1) Normal distribution in each group; 2) Similar variances (for Student’s t-test); 3) Independent observations | t-statistic, p-value
Independent Samples t-test | Parametric | Compare means of two independent groups | DV: Continuous, IV: Categorical with 2 levels | Same as above + groups are independent | t-statistic, p-value
Paired Samples t-test | Parametric | Compare means of the same group measured twice | DV: Continuous, IV: Time/Condition (within-subject) | Differences are normally distributed, measurements paired | t-statistic, p-value
Welch’s t-test | Parametric | Compare means of two independent groups with unequal variances | DV: Continuous, IV: Categorical with 2 levels | Normal distribution in each group, but does NOT assume equal variances | t-statistic, p-value
Mann-Whitney U test | Non-Parametric | Compare two independent groups (median comparison) when data isn’t normal | DV: Ordinal or Continuous (skewed), IV: Categorical with 2 levels | Independent observations, distributions have similar shape | U-statistic, p-value
Wilcoxon Signed-Rank Test (paired samples) | Non-Parametric | Compare two related groups (matched or repeated measures) when data isn’t normal | DV: Ordinal or skewed Continuous, IV: repeated measure or matched pairs | Data are paired; the paired differences are ranked and tested | W-statistic, p-value
ANOVA (one-way) | Parametric | Compare means of 2+ groups on one factor | DV: Continuous, IV: Categorical with ≥2 levels | Normal distribution, homogeneity of variance, independent groups | F-statistic, p-value
Repeated Measures ANOVA | Parametric | Compare means of 2+ repeated conditions on the same participants | DV: Continuous, IV: repeated measure factor with ≥2 levels | Sphericity (or corrected with Greenhouse-Geisser), normal distribution of differences | F-statistic, p-value
Factorial ANOVA | Parametric | Compare means of 2+ groups across multiple categorical factors | DV: Continuous, multiple IVs (each Categorical) | Normal distribution, homogeneity of variance, independent groups | F-statistics (main effects & interactions), p-values
Friedman Repeated Measures ANOVA | Non-Parametric | Compare 3+ repeated measures when data isn’t normal | DV: Ordinal or skewed Continuous, repeated measures on same subjects | Dependent/paired data, rank-based test | Chi-square statistic, p-value
Chi-Square Test | Non-Parametric | Test relationship between 2+ categorical variables | DV & IV both Categorical | Expected frequencies >5 in each cell (approx), independence of observations | Chi-square statistic, p-value
Chi-Square Goodness of Fit | Non-Parametric | Compare categorical data distribution to a theoretical expected distribution | DV: Categorical | Expected frequencies >5 in each category, independence | Chi-square statistic, p-value
Pearson’s Correlation (r) | Parametric | Assess linear relationship between 2 continuous variables | Both variables: Continuous (Interval/Ratio) | Bivariate normality (roughly), linear relationship assumption | r (from -1 to +1), p-value
Multiple Regression | Parametric | Predict a continuous DV from 2+ predictors (continuous or categorical) | DV: Continuous, IVs: Continuous or Dummy-coded Categorical | Linearity, normality of residuals, homoscedasticity, independence | Regression coefficients (Betas), p-value, R²
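As one worked instance from the table above, this sketch computes the Welch’s t-test statistic and its approximate degrees of freedom by hand, using only the standard library. The function name `welch_t` and the two samples are hypothetical. It stops short of a p-value, since that needs the t distribution's CDF; a statistics package would normally supply it (for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's t-test on two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    # Per-group sample variance divided by group size (no pooling of variances).
    v_a = statistics.variance(group_a) / n_a
    v_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(v_a + v_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical measurements for two independent groups.
treatment = [5.1, 4.9, 6.2, 5.8, 6.0]
control = [4.1, 3.9, 4.5, 4.8, 4.0]
t_stat, df = welch_t(treatment, control)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

Because the variances are not pooled, the degrees of freedom are generally not a whole number and fall between the smaller group's n − 1 and the pooled n₁ + n₂ − 2.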
Compare 2 related groups (pre-post) on a continuous DV (normal) | Continuous (Interval/Ratio) | Same subjects measured twice (within-subject) | Paired Samples t-test