Lecture Methods 3
BY
SANGWANI CHAVULA
BRANCHES OF STATISTICAL ANALYSES
• The two main branches of statistics are:
1. descriptive statistics
2. inferential statistics
1. Descriptive statistics
– Descriptive statistics generally precedes inferential
statistics and at times gives insight into the
inferential techniques to be used.
– It is also referred to as exploratory data
analysis (EDA).
Examples:
• Organization of data
• Graphical presentation of data
• Evaluation of appropriate summary statistics e.g.
means, variance, standard deviation
2. Inferential statistics
– Generalises sample findings to the broader
population.
– It allows the question being investigated to be
assessed for statistical validity against the
experimental objectives, so that inferences can
be drawn.
It covers:
– Hypothesis testing
– Significance tests & Confidence intervals
DESCRIPTIVE STATISTICS
• Large data sets need to be summarised before
any useful information can be obtained from
them.
• Techniques used to summarise information
include a variety of graphs and tables.
• They summarise data in a concise, unbiased
and logical manner and enhance
understanding of the information at hand.
• Graphs are intended to show an overview
while tables present more detail.
• Graphs and tables should be self-explanatory.
Label the graphs and tables clearly, and
identify the units of measurement.
• Numerical descriptive measures are numbers
which enable us to fully describe the profile of
a random variable.
The main descriptors are:
1. Measures of central location
2. Measures of dispersion (spread)
3. Measures of skewness
1. Measures of central tendency
• Measures of central tendency tell us where most
of the data are concentrated.
• These include the arithmetic mean, mode, median.
a. MEAN
– The arithmetic mean is the most commonly used statistic
in statistical calculations.
– Other means include the geometric mean (used when the
data are not normally distributed or measure a rate of
change), the harmonic mean (disproportionate data) and
the weighted mean (e.g. stratified random sampling in
surveys).
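The different means listed above can be compared on a small data set. The sketch below is only an illustration: the data values, the stratum means and the stratum sizes are made up for the example, and the weighted mean is computed by hand because it is a one-line formula.

```python
import statistics

# Hypothetical data values, chosen only for illustration.
values = [2.0, 4.0, 8.0]

arithmetic = statistics.mean(values)            # sum of values / number of values
geometric = statistics.geometric_mean(values)   # n-th root of the product of the values
harmonic = statistics.harmonic_mean(values)     # n / sum of the reciprocals

# Weighted mean, as used when combining strata in a stratified sample.
# Hypothetical stratum means and stratum sizes (the sizes act as weights).
stratum_means = [10.0, 20.0]
stratum_sizes = [300, 100]
weighted = sum(w * m for w, m in zip(stratum_sizes, stratum_means)) / sum(stratum_sizes)

print("arithmetic:", arithmetic)
print("geometric: ", geometric)
print("harmonic:  ", harmonic)
print("weighted:  ", weighted)
```

For positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean.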
b. MODE
– The most frequently occurring value in a
sample of n data values.
• If the data have only one mode, the
distribution is described as unimodal; if there
are two modes, it is bimodal, and so on,
depending on the number of modes in the data.
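A minimal sketch of finding the mode(s) and naming the modality, using `multimode` from Python's standard `statistics` module. The helper name `describe_modality` and the sample data are hypothetical, introduced only for this illustration.

```python
from statistics import multimode


def describe_modality(data):
    """Return the list of modes and a label for how many modes there are."""
    # multimode returns every value tied for the highest frequency,
    # in the order in which the values first appear in the data.
    modes = multimode(data)
    labels = {1: "unimodal", 2: "bimodal"}
    # Note: if every value occurs exactly once, all values are returned as modes,
    # so the label is then not very informative.
    return modes, labels.get(len(modes), "multimodal")


print(describe_modality([1, 2, 2, 3]))
print(describe_modality([1, 1, 2, 2, 3]))
```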
c. MEDIAN
• It is the middle value of an ordered set of
data, that is, when the measurements are
arranged from lowest to highest.
• For an odd sample size n, the median is the
value at position (n+1)/2.
• For an even sample size n, the median is the
average of the values at the two middle-most
positions, n/2 and (n/2)+1.
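The position rule for the median can be sketched as follows. The function name `median_by_position` is hypothetical; positions are counted from 1 as on the slide, so 1 is subtracted when indexing the Python list.

```python
def median_by_position(data):
    """Median found by sorting the data and picking the middle position(s)."""
    ordered = sorted(data)          # arrange from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the value at 1-based position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: the average of the values at 1-based positions n/2 and n/2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([7, 1, 5]))       # odd sample size
print(median_by_position([4, 1, 3, 2]))    # even sample size
```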
2. MEASURES OF SPREAD
• Dispersion (or spread) refers to the extent to
which data values scatter about their central
location value; it is an indication of the
variability of the data.
• This influences the confidence that an analyst
can have in the representativeness and
reliability (stability) of central location
measures.
• Widely dispersed data values indicate low
reliability and less confidence in the central
measure as a representative measure of a
sample of data.
• A high concentration of data values about
their central location indicates high reliability
and hence greater confidence in the
representativeness of the central location
value.
Types of measures of dispersion
– Range
– Variance
– Standard deviation
I. RANGE
– The difference between the maximum and
minimum data values in the data set.
– R = x_max - x_min
II. VARIANCE
Symbols:
– Population variance σ² (a parameter)
– Sample variance s² (a statistic)
• Write down the formulas for the population
variance and the sample variance.
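For reference, the usual textbook definitions are sketched here. The symbols N (population size), μ (population mean), n (sample size) and x̄ (sample mean) are assumed notation that does not appear on the slide. The standard deviation is the positive square root of the corresponding variance.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
\sigma = \sqrt{\sigma^2},
\quad
s = \sqrt{s^2}
```

The sample variance divides by n - 1 rather than n, so that it does not systematically underestimate the population variance.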
• The coefficient of variation (CV) expresses the
standard deviation as a percentage of the mean:
CV = (s / x̄) × 100%.
– The smaller the CV, the more concentrated the data
values are about the mean; conversely, a large CV
implies that the data values are more widely
dispersed about the mean value.
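The measures of dispersion above can be computed together for one sample. This is a minimal sketch: the function name `dispersion_summary` and the data are hypothetical, and the sample (n - 1) versions of the variance and standard deviation are assumed.

```python
import statistics


def dispersion_summary(sample):
    """Range, sample variance, sample standard deviation and CV of a sample."""
    mean = statistics.mean(sample)
    std_dev = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
    return {
        "range": max(sample) - min(sample),
        "variance": statistics.variance(sample),  # sample variance (divides by n - 1)
        "std_dev": std_dev,
        # CV as a percentage of the mean; only meaningful when the mean is not zero.
        "cv_percent": std_dev / mean * 100,
    }


# Hypothetical sample, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
for name, value in dispersion_summary(sample).items():
    print(f"{name}: {value:.4f}")
```

Two samples with the same mean can then be compared directly: the one with the smaller CV has its values more tightly concentrated about the mean.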
3. MEASURES OF SKEWNESS
• Skewness describes the shape of a unimodal
distribution of numeric data, specifically how
far it departs from symmetry.
• Assuming a unimodal distribution, three
common shapes are :
1. Symmetrical shapes
2. Positively skewed shapes (skewed to the right)
3. Negatively skewed shapes (skewed to the left)
GROUP EXERCISE
• In groups, describe each of the following
shapes and draw its diagram:
1. Symmetrical shapes
2. Positively skewed shapes (skewed to the right)
3. Negatively skewed shapes (skewed to the left)
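One way to tell the three shapes apart numerically is a skewness coefficient. The slides do not name a specific coefficient, so the sketch below assumes the moment-based one (third central moment divided by the second central moment raised to the power 1.5). The function names, the tolerance and the data sets are hypothetical.

```python
import statistics


def moment_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (assumed definition)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


def describe_shape(data, tolerance=1e-9):
    """Label the shape from the sign of the skewness coefficient."""
    g = moment_skewness(data)
    if abs(g) < tolerance:        # the tolerance is an arbitrary choice
        return "symmetrical"
    return "positively skewed" if g > 0 else "negatively skewed"


print(describe_shape([1, 2, 3, 4, 5]))
print(describe_shape([1, 1, 2, 2, 3, 10]))    # long tail to the right
print(describe_shape([1, 8, 9, 9, 10, 10]))   # long tail to the left
```

A coefficient near zero is consistent with a symmetrical shape, a positive value with a longer right tail and a negative value with a longer left tail.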
Probability distributions
• Probability
– We may think of a probability as a number
between 0 and 1 that is assigned to an event to
indicate the likelihood of occurrence of the event.
– It is assigned to an inferential statement about a
population under investigation to indicate the
degree of reliability of the inference.
• Events which cannot occur are assigned a
probability of 0, those that will certainly
occur are assigned a probability of 1, and all
other events are assigned a probability
strictly between 0 and 1.
How do we assign probabilities?
• If you flip a coin, you get either a “head” or a
“tail”. The flip itself is called a trial.
• The result is called an outcome of the trial.
• An event is a collection of one or more
outcomes of a trial that are of interest to us.
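One simple way to assign a probability to an event is to count outcomes. The sketch below assumes that all outcomes of the trial are equally likely, which the slides do not state. The function name `classical_probability` and the die example are added only for illustration.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(event) = favourable outcomes / all outcomes (equally likely outcomes assumed)."""
    outcomes = set(sample_space)
    favourable = set(event) & outcomes   # ignore anything that is not a possible outcome
    return Fraction(len(favourable), len(outcomes))


coin = {"head", "tail"}          # outcomes of one coin flip (the trial)
die = {1, 2, 3, 4, 5, 6}         # hypothetical second trial: one roll of a die

p_head = classical_probability({"head"}, coin)
p_even = classical_probability({2, 4, 6}, die)
p_impossible = classical_probability({7}, die)   # cannot occur
p_certain = classical_probability(die, die)      # certain to occur

print(p_head, p_even, p_impossible, p_certain)
```

The impossible event receives probability 0 and the certain event receives probability 1, matching the rule stated earlier.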