Statistical Techniques - Notes
Two main types of statistical methods are used in data analysis: descriptive statistics, which
summarizes data using indices such as the mean and median, and inferential statistics, which
draws conclusions from data using statistical tests such as Student's t-test.
The choice of an appropriate statistical method depends on three things:
Aim and objective of the study,
Type and distribution of the data used, and
Nature of the observations (paired/unpaired).
Descriptive statistics
Definition
The term descriptive statistics covers statistical methods for describing data using statistical
characteristics, charts, graphics or tables.
Importantly, only the properties of the sample at hand are described and evaluated; no
conclusions are drawn about other points in time or about the population. Drawing such
conclusions is the task of inferential (concluding) statistics.
Which key figures, tables and graphics are used for the evaluation depends on the research
question and on the available measurement scale. The best known sub-areas of descriptive
statistics are:
Location parameters: Mean value, median, mode, sum
Dispersion parameters: Standard deviation, variance, range
Tables: Absolute, relative and cumulative frequencies
Charts: Histograms, bar charts, box plots, scatter plots, matrix plots
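The three kinds of frequency table listed above can be computed in a few lines. This is a minimal sketch using only the Python standard library; the function name frequency_table and the sample grades are hypothetical, chosen for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, absolute, relative, cumulative relative) frequency."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running_total = 0
    # Cumulative frequencies need an order, so this only makes sense for ordinal/metric data
    for value in sorted(counts):
        absolute = counts[value]
        running_total += absolute
        rows.append((value, absolute, absolute / n, running_total / n))
    return rows


# Hypothetical grades on an ordinal 1-5 scale
grades = [2, 3, 3, 1, 4, 3, 2, 5, 4, 3]
for value, absolute, relative, cumulative in frequency_table(grades):
    print(f"{value}: {absolute:2d}  {relative:.2f}  {cumulative:.2f}")
```

The cumulative column reaches 1.00 at the highest grade, because by then every observation has been counted.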
The first group of descriptive statistics consists of location parameters such as the mean,
median and mode. They express the central tendency of a data set, i.e. where the center of a
sample lies or where a large part of the sample is concentrated.
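The location parameters named above are all available in Python's standard statistics module. The sketch below assumes a small hypothetical sample of reaction times that contains one unusually slow value.

```python
import statistics

# Hypothetical sample: reaction times in milliseconds (400 is an unusually slow response)
sample = [310, 295, 280, 295, 330, 305, 295, 400]

print("mean:  ", statistics.mean(sample))    # arithmetic average
print("median:", statistics.median(sample))  # middle of the sorted data (average of the two middle values here)
print("mode:  ", statistics.mode(sample))    # most frequent value
print("sum:   ", sum(sample))
```

Here the single slow response pulls the mean (313.75) above the median (300), which is one reason the median is often preferred for skewed data.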
The second group consists of measures of dispersion. They indicate how much the values of a
variable in a sample differ from one another, i.e. how strongly the values deviate from the
mean: are the values close together and therefore similar, or far apart and therefore very
different? A classic example is the standard deviation.
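As a minimal sketch, the dispersion parameters can be computed with the standard library. The sample is hypothetical; note that statistics.variance and statistics.stdev divide by n - 1 (sample versions), while statistics.pstdev divides by n (population version).

```python
import statistics

# Hypothetical sample with mean 6
sample = [4, 8, 6, 5, 3, 7, 9, 6]

sample_var = statistics.variance(sample)  # sample variance, divides by n - 1
std_dev = statistics.stdev(sample)        # square root of the sample variance
value_range = max(sample) - min(sample)   # largest minus smallest value
pop_std_dev = statistics.pstdev(sample)   # population standard deviation, divides by n

print(sample_var, std_dev, value_range, pop_std_dev)
```

For this sample the squared deviations from the mean sum to 28, so the sample variance is 28 / 7 = 4 and the standard deviation is 2.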
Which measures of location or dispersion are suitable for describing the data depends on the
scale of measurement of the variable. A distinction is made between metric, ordinal and
nominal scales of measurement.
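A common rule of thumb maps each scale to the measures that are meaningful for it: the mode for nominal data, the median, mode and lowest/highest category for ordinal data, and the mean and standard deviation for metric data. The sketch below encodes that rule; the function describe_by_scale is hypothetical, and it assumes ordinal values are stored as numeric rank codes.

```python
import statistics


def describe_by_scale(values, scale):
    """Return location/dispersion measures that suit the given measurement scale."""
    if scale == "nominal":
        # Categories have no order: only the most frequent category is meaningful
        return {"mode": statistics.mode(values)}
    if scale == "ordinal":
        # Order is meaningful but distances are not; median_low always returns
        # an observed category instead of averaging two of them
        return {
            "median": statistics.median_low(values),
            "mode": statistics.mode(values),
            "min": min(values),
            "max": max(values),
        }
    if scale == "metric":
        # Distances are meaningful, so means and standard deviations can be used
        return {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),
        }
    raise ValueError(f"unknown scale: {scale!r}")


print(describe_by_scale(["bus", "car", "bus", "bike"], "nominal"))
print(describe_by_scale([1, 2, 2, 3, 5, 4], "ordinal"))  # hypothetical 1-5 ratings
print(describe_by_scale([2.0, 4.0, 6.0], "metric"))
```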
Finally, diagrams such as bar charts, pie charts and histograms form a large area of
descriptive statistics.
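A histogram is built by splitting the value range into intervals (bins) and counting how many observations fall into each. The sketch below does that counting and prints a crude text histogram; the function name, the equal-width bins and the age data are assumptions for illustration, and in practice a plotting library would draw the chart.

```python
def histogram_counts(values, bins, low, high):
    """Count values into `bins` equal-width intervals covering [low, high].

    Each bin is half-open [left, right), except the last, which also includes `high`.
    Values outside [low, high] are ignored.
    """
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts


# Hypothetical ages of 11 survey respondents
ages = [23, 25, 31, 35, 38, 41, 42, 47, 52, 58, 61]
counts = histogram_counts(ages, bins=5, low=20, high=70)
for i, count in enumerate(counts):
    left = 20 + i * 10
    print(f"{left}-{left + 10}: {'#' * count}")
```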
Inferential statistics
Definition
Inferential statistics is a branch of statistics that uses various analytical tools to draw
conclusions about a population from sample data. For a given hypothesis about the
population, inferential statistics uses a sample and indicates, based on the collected data,
how plausible the hypothesis is.
Parametric Tests
Definition:
Parametric tests are statistical tests that make assumptions about the parameters of the
population distribution from which the sample is drawn. These tests often assume that the
data are normally distributed.
t-Test:
Conditions: Assumes normally distributed data. The variability of the data in each group
should be similar (homogeneity of variances).
For example, imagine comparing the exam scores of students from three different schools. If
the spread of scores in one school is much wider (higher variance) than in the others, it
could affect the validity of the comparison. Homogeneity of variances ensures that the groups
are comparable.
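For two groups, Student's t-test with the equal-variance assumption uses a pooled variance. The sketch below computes the t statistic and its degrees of freedom with the standard library only; pooled_t_statistic and the exam scores are hypothetical. It does not compute a p-value, which requires the t distribution; if SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=True) should report the same statistic together with a p-value. The ratio of the larger to the smaller sample variance is printed only as a rough check of homogeneity of variances; formal alternatives such as Levene's test exist.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, n1 + n2 - 2


# Hypothetical exam scores from two schools
school_a = [70, 72, 75, 78, 80]
school_b = [66, 69, 71, 73, 76]

t, df = pooled_t_statistic(school_a, school_b)
variances = [statistics.variance(school_a), statistics.variance(school_b)]
variance_ratio = max(variances) / min(variances)  # values near 1 suggest similar spread
print(f"t = {t:.3f}, df = {df}, variance ratio = {variance_ratio:.2f}")
```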
Regression Analysis:
Use: When precision in parameter estimation is crucial, i.e. when obtaining accurate and
narrow estimates of population parameters is of utmost importance.
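To make the idea of precision concrete, the sketch below fits a simple linear regression y = intercept + slope * x by ordinary least squares and reports the standard error of the slope; a smaller standard error means a narrower, more precise estimate. It is a minimal standard-library sketch; the function name and the measurements are hypothetical, and real analyses would usually rely on a statistics package.

```python
import math


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, standard_error_of_slope).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance uses n - 2 degrees of freedom (two parameters were estimated)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_var = sum(r ** 2 for r in residuals) / (n - 2)
    se_slope = math.sqrt(residual_var / sxx)
    return slope, intercept, se_slope


# Hypothetical measurements
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, se_slope = simple_linear_regression(x, y)
print(f"slope = {slope:.3f} +/- {se_slope:.3f}, intercept = {intercept:.3f}")
```

With more data points or less scatter around the line, the residual variance shrinks or the spread of x grows, and the standard error of the slope becomes smaller.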
Nonparametric Tests
Definition:
Nonparametric tests are statistical tests that make fewer assumptions about the population
distribution. They are more flexible and applicable to a broader range of data types.
Mann-Whitney U Test:
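Use: Compares two independent groups. It is often described as comparing their medians, which
is strictly valid only when both distributions have a similar shape.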
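The U statistic behind the test can be computed from ranks: pool both groups, rank all values (tied values share the average of their ranks), and subtract the smallest possible rank sum from the rank sum of the first group. The result counts the pairs in which a value from the first group exceeds one from the second, with ties counting one half. This standard-library sketch returns U for both groups; the helper names and the data are hypothetical, and it does not compute a p-value (SciPy's scipy.stats.mannwhitneyu can provide one).

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b); U_a counts pairs where a value from `a` beats one from `b` (ties count 1/2)."""
    n_a, n_b = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical pain scores (ordinal) for two independent treatment groups
group_1 = [3, 4, 2, 6, 5]
group_2 = [7, 8, 6, 9, 7]
print(mann_whitney_u(group_1, group_2))
```

Only the tie between the two 6s contributes to U for the first group, which signals that its values are almost always lower.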
Kruskal-Wallis Test: