Statistical Tools - Summary
Mean
Definition: The average of a set of numbers, calculated by
dividing the sum of all values by the number of values.
Uses: Used in statistical analysis to determine the central
tendency of a dataset.
Result Analysis: The mean indicates where the data are centered.
Because it uses every value, it is sensitive to extreme values
(outliers), which can skew the result.
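The sketch below applies the definition with Python's standard `statistics` module. The scores are made-up illustration data. It shows how a single outlier pulls the mean.

```python
from statistics import mean

# Hypothetical scores; the second list adds one extreme value (an outlier).
scores = [10, 12, 11, 13, 12]
scores_with_outlier = scores + [100]

# Mean = sum of all values / number of values
manual_mean = sum(scores) / len(scores)

print(manual_mean)                # 11.6
print(mean(scores))               # 11.6 (same result via the library)
print(mean(scores_with_outlier))  # about 26.33: one outlier more than doubles the mean
```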
Median
Definition: The middle value of a data set when the values are
arranged in order. If there is an even number of values, the
median is the average of the two middle values.
Uses: Useful in measuring central tendency, especially for
skewed data or data with outliers.
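Result Analysis: Comparing the median with the mean can indicate
skewness: a mean well above the median suggests right (positive)
skew, and a mean well below it suggests left (negative) skew. The
median is largely unaffected by outliers.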
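Here the median definition (sort, then take the middle value or average the two middle values) is written out directly. The result is checked against Python's `statistics.median`. The helper name `middle_value` and all data are hypothetical.

```python
from statistics import mean, median

def middle_value(values):
    """Median by definition: sort, then take the middle value
    (or the average of the two middle values when the count is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

odd = [3, 1, 7, 5, 9]            # sorted: 1, 3, 5, 7, 9
even = [4, 1, 3, 2]              # sorted: 1, 2, 3, 4
skewed = [1, 2, 2, 3, 3, 4, 50]  # hypothetical right-skewed data

print(middle_value(odd), median(odd))    # 5 5
print(middle_value(even), median(even))  # 2.5 2.5
print(mean(skewed), median(skewed))      # mean (about 9.29) is well above the median (3)
```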
Mode
Definition: The value that appears most frequently in a data set.
Uses: Used to understand the most common or repeated value in
a dataset.
Result Analysis: A dataset with one mode is unimodal; one with two
or more modes is bimodal or multimodal. Of the mean, median, and
mode, only the mode applies to categorical (nominal) data.
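Python's standard library provides both `statistics.mode` and `statistics.multimode`. The latter returns every value tied for the highest count, which makes unimodal and bimodal data easy to tell apart. The data are made-up examples.

```python
from statistics import mode, multimode

colors = ["red", "blue", "red", "green", "red"]  # hypothetical categorical data
ratings = [1, 2, 2, 3, 4, 4, 5]                  # two values tie for most frequent

print(mode(colors))        # red    -> a single mode (unimodal)
print(multimode(ratings))  # [2, 4] -> two modes (bimodal)
```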
Standard Deviation
Definition: A measure of the dispersion or spread of a set of
values from the mean. It shows how much the values deviate
from the average.
Uses: Used to understand the variability or consistency of a
dataset.
Result Analysis: A high standard deviation indicates wide
dispersion, while a low standard deviation shows that data points
are closer to the mean.
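The sketch below uses two hypothetical datasets with the same mean but different spread. Python's `statistics.stdev` divides by n - 1 (sample standard deviation) and `statistics.pstdev` divides by n (population standard deviation). The helper `sample_sd` is an illustrative name that spells out the n - 1 formula.

```python
from statistics import mean, pstdev, stdev

def sample_sd(values):
    """Square root of the summed squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return (sum((x - m) ** 2 for x in values) / (len(values) - 1)) ** 0.5

# Two hypothetical datasets, both with mean 50
consistent = [48, 50, 50, 52]
variable = [20, 40, 60, 80]

print(sample_sd(consistent), stdev(consistent))  # about 1.63 each: values near the mean
print(sample_sd(variable), stdev(variable))      # about 25.82 each: wide dispersion
print(pstdev(consistent))                        # population form (divides by n): about 1.41
```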
Correlation Analysis
Definition: A statistical method used to measure the strength and
direction of the relationship between two variables.
Uses: Helps to understand if and how two variables are related.
Result Analysis: Correlation coefficients range from -1 to +1. A
value close to +1 indicates a strong positive correlation, a value
close to -1 a strong negative correlation, and a value close to 0
little or no linear correlation.
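Pearson's r is the most common correlation coefficient and measures linear association. The minimal sketch below computes it from its definition (covariance divided by the product of the spreads). The third case shows a perfectly predictable U-shaped relationship that still gives r = 0, because it is not linear. The function name and data are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2 * v + 1 for v in x]), 3))     # 1.0  (perfect positive)
print(round(pearson_r(x, [-v for v in x]), 3))            # -1.0 (perfect negative)
print(round(pearson_r(x, [(v - 3) ** 2 for v in x]), 3))  # 0.0  (related, but not linearly)
```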
Regression Analysis
Definition: A technique for modeling the relationship between a
dependent variable and one or more independent variables.
Uses: Used for prediction and to understand the relationship
between variables.
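Result Analysis: The coefficient of determination (R²) shows the
proportion of variance in the dependent variable explained by the
independent variables. Each regression coefficient gives the
expected change in the dependent variable for a one-unit change in
that independent variable; its sign gives the direction of the
relationship.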
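For the simplest case, one independent variable fitted by ordinary least squares, the coefficients and R² can be computed directly. This is a minimal sketch assuming that single-predictor case; multiple regression would normally use matrix methods or a statistics library. The function name and data are hypothetical.

```python
def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, R squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx      # slope: expected change in y per one-unit change in x
    a = my - b * mx    # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)                        # total variation
    return a, b, 1 - ss_res / ss_tot

# Hypothetical data: x = hours of practice, y = test score (arbitrary units)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
a, b, r2 = simple_linear_regression(hours, score)
print(round(a, 3), round(b, 3), round(r2, 3))  # 2.2 0.6 0.6
```

Here R² = 0.6 means the fitted line accounts for 60% of the variation in the scores.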
F-test
Definition: A statistical test based on the ratio of two variances.
It can compare the variances of two groups directly or, in ANOVA,
compare between-group variance with within-group variance to test
whether the group means differ.
Uses: Often used in ANOVA and regression analysis.
Result Analysis: In ANOVA, a significant F-value suggests that at
least one of the group means is different from the others. When
two variances are compared directly, a significant F-value
suggests that the population variances differ.
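In one-way ANOVA, the F statistic is the ratio of the between-group mean square to the within-group mean square. The minimal sketch below computes it by hand for three hypothetical groups. Turning F into a p-value needs the F distribution, which Python's standard library does not provide, so the comment compares F with a tabulated critical value instead.

```python
def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, and of values around their group mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Three hypothetical groups (e.g., scores under three teaching methods)
f, df1, df2 = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f, df1, df2)  # 12.0 2 6
# The 5% critical value of F with (2, 6) degrees of freedom is about 5.14,
# so F = 12.0 would be significant: at least one group mean likely differs.
```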
T-test
Definition: A statistical test used to compare the means of two
groups and determine if they are significantly different from each
other.
Uses: Commonly used in hypothesis testing.
Result Analysis: The p-value from the t-test indicates whether the
difference between the means is statistically significant. A low
p-value (e.g., < 0.05) suggests a significant difference.
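The sketch below computes Student's two-sample t statistic. It assumes equal population variances; Welch's t-test is the usual alternative when that assumption is doubtful. The groups are hypothetical. Python's standard library has no t distribution, so instead of an exact p-value the comment compares |t| with a tabulated critical value.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic (equal-variance assumption) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

group_a = [5, 6, 7, 8, 9]     # hypothetical
group_b = [8, 9, 10, 11, 12]
t, df = pooled_t(group_a, group_b)
print(round(t, 3), df)  # -3.0 8
# The two-sided 5% critical value for t with 8 df is about 2.306; |t| = 3.0 exceeds it,
# which corresponds to p < 0.05.
```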
Chi-Square Test (χ²)
Definition: A test that compares observed frequencies with the
frequencies expected under a hypothesis, for example, that two
categorical variables are independent.
Uses: Often applied in tests of independence and goodness-of-fit
tests.
Result Analysis: The chi-square statistic is compared to a critical
value from the chi-square distribution (or converted to a p-value).
In a test of independence, a significant result suggests an
association between the variables.
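For a test of independence, each expected count is (row total × column total) / grand total. The statistic sums (observed - expected)² / expected over all cells, with (rows - 1) × (columns - 1) degrees of freedom. This minimal sketch uses a hypothetical 2×2 table and applies no continuity correction.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = group A / group B, columns = prefers X / prefers Y
observed = [[30, 10],
            [20, 40]]
chi2, df = chi_square_independence(observed)
print(round(chi2, 3), df)  # 16.667 1
# The 5% critical value for 1 df is about 3.841, so this result would be significant.
```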
Parametric Statistics
Definition: Statistical methods that assume the data come from a
distribution of a specific form, usually the normal distribution
(e.g., t-tests, ANOVA).
Uses: Used when data meets certain assumptions, such as
normality and homogeneity of variances.
Result Analysis: Results are valid when assumptions are met.
Otherwise, results may be misleading.
Non-Parametric Statistics
Definition: Statistical methods that do not assume a specific
distribution, such as the normal distribution (e.g., Mann-Whitney U
test, Kruskal-Wallis test).
Uses: Used for data that doesn't meet parametric assumptions or
is ordinal.
Result Analysis: Useful for analyzing data that is not normally
distributed or has outliers. Interpretation focuses on ranks rather
than actual data values.
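To show what working with ranks means, the sketch below computes the Mann-Whitney U statistic from the rank sum of one sample, with tied values given their average rank. Replacing a value with an extreme outlier that keeps the same ordering leaves U unchanged. The helper names and data are hypothetical. A p-value would still need U tables or a normal approximation, which are not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U_a, U_b) computed from the rank sum of sample a."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1

a = [3, 5, 8]                 # hypothetical
b = [6, 9, 10, 12]
b_outlier = [6, 9, 10, 1200]  # same ordering, one extreme value

print(mann_whitney_u(a, b))          # (1.0, 11.0)
print(mann_whitney_u(a, b_outlier))  # (1.0, 11.0): ranks ignore how extreme 1200 is
```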
Hypothesis Testing
Definition: A method of making decisions using data, whether
from a controlled experiment or observational study.
Uses: To test an assumption about a population parameter.
Result Analysis: If the p-value is below the chosen significance
level (α), reject the null hypothesis; otherwise, fail to reject
it. Common significance levels are 0.05 and 0.01.
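As a small end-to-end example of the decision rule, the sketch below runs a two-sided one-sample z-test. A z-test is used here only because Python's `statistics.NormalDist` gives exact normal p-values; it assumes the population standard deviation is known. All numbers are hypothetical. The same p-value leads to rejection at 0.05 but not at 0.01.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: population mean == mu0, with sigma assumed known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical: H0 says the mean is 100 (sigma = 15); a sample of 36 has mean 106
z, p = one_sample_z_test(106, 100, 15, 36)
print(round(z, 2), round(p, 4))  # 2.4 0.0164
for alpha in (0.05, 0.01):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(alpha, decision)  # 0.05 -> reject H0; 0.01 -> fail to reject H0
```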
Factor Analysis
Definition: A technique used to reduce data dimensionality by
identifying underlying factors that explain the correlations among
variables.
Uses: Often used in survey research and psychometrics.
Result Analysis: Helps in understanding the structure of a dataset
by identifying latent variables. Factor loadings indicate the
strength of the relationships between variables and factors.
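This is a rough sketch rather than a full factor analysis. It extracts one factor with the principal-component method, taking loadings as sqrt(largest eigenvalue) × eigenvector of the correlation matrix, found here by power iteration. That is one of several extraction methods; principal axis factoring and maximum likelihood are common alternatives. The sketch skips choosing the number of factors and rotation. The survey items and function names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def first_factor_loadings(columns, iterations=200):
    """Loadings on one factor: sqrt(largest eigenvalue) * its eigenvector of the correlation matrix."""
    k = len(columns)
    R = [[pearson_r(columns[i], columns[j]) for j in range(k)] for i in range(k)]

    # Power iteration: repeatedly multiply by R and normalize to unit length
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient v^T R v gives the eigenvalue for the unit vector v
    eigenvalue = sum(v[i] * R[i][j] * v[j] for i in range(k) for j in range(k))
    return [sqrt(eigenvalue) * x for x in v]

# Hypothetical responses of six people to three related survey items
item_a = [2, 4, 5, 7, 8, 10]
item_b = [1, 3, 6, 6, 9, 10]
item_c = [3, 3, 5, 8, 8, 11]

loadings = first_factor_loadings([item_a, item_b, item_c])
print([round(x, 2) for x in loadings])  # all high: one latent factor explains most shared variation
```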