
Mean, Median, Mode, Standard Deviation (Descriptive Statistics)

This document provides an overview of descriptive statistics. It defines descriptive statistics as involving tabulating, depicting, and describing data. It discusses measures of central tendency like the mean, median, and mode for ungrouped data. It also covers measures of variability such as range, interquartile range, and standard deviation. Key measures like percentiles, quartiles, and how to calculate and interpret them are explained. The document provides examples and steps for computing various descriptive statistics.

Uploaded by

jh342703
Copyright © All Rights Reserved

Descriptive Statistics

• PRESENTATION BY: DR T CHINGWARU


Defining Statistics

• Statistics: the theory and method of analyzing quantitative data from samples of observations … to help make decisions about hypothesized relations
• There are two types of statistics:
– Descriptive statistics involve tabulating, depicting, and describing data
– Inferential statistics predict or estimate characteristics of a population from knowledge of the characteristics of only a sample of the population
• Inferential tests may be non-parametric or parametric:
– Non-parametric tests do not require the data to be normally distributed, but they are generally less sensitive (less powerful) than parametric tests. A common test for nominal data is the chi-square test
– Parametric tests, which assume approximately normally distributed data, include the t-test and ANOVA for interval and ratio data
Measures of Central Tendency:
Ungrouped Data
• Ungrouped data are raw numbers that have not been summarized by statistical techniques (for example, into a frequency distribution)
• Measures of central tendency reveal information about the values at the center, or middle part, of a group of numbers (or ordered array)
• Common measures of central tendency are:
– Mean
– Median
– Mode
– Percentiles
– Quartiles
The Arithmetic Mean

• The arithmetic mean is commonly called ‘the mean’
• It is the average of a group of numbers
• It is a concept applicable for interval and ratio data
• It is not applicable for nominal or ordinal data
• The mean is computed by summing all values in the data set
and dividing the sum by the number of values in the data set
• Thus, its value is affected by each value in the data set,
including extreme values
Application of Arithmetic
Mean in Statistics

• As a summary statistic of central tendency in data produced by business and economic processes
• When used in these settings, it is important to distinguish between
– the population mean, µ, and
– the sample mean, x̄
• The population mean is based on all of the values within the
population
• The sample mean only uses some of the values within a
population
Computing Population Mean

Suppose a company has five departments with 24, 13, 19, 26, and 11 workers, respectively. The population mean number of workers per department is 18.6. The computations follow:

$$\mu = \frac{\sum X}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$
Computing Sample Mean
The calculation of a sample mean uses the same algorithm as for a population mean and will produce the same answer if computed on the same data. However, separate symbols are used for the population mean (µ) and the sample mean (x̄).

Given the following set of numbers: 57, 86, 42, 38, 90, and 66.
The sample mean is 63.167. The computations follow:

$$\bar{x} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$$
Impact of Extreme Values on the Mean

• The mean is the most commonly used measure of central tendency because of its mathematical properties and because it uses all the data points in the data set
• However, the mean is affected by extremely large or extremely
small numbers
• Note that for the sample mean example, if the largest number
90 is replaced by the number 1,000 the mean becomes
214.833 as opposed to 63.167
• If the smallest number 38 is replaced by the number 5 the
mean becomes 57.667 as opposed to 63.167
• Extreme values can significantly distort the mean.
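The following minimal Python sketch reproduces the sample-mean calculation and the two extreme-value substitutions above. It uses only the standard library. The function name `arithmetic_mean` is illustrative and does not come from the slides.

```python
from statistics import mean

# Sample data from the sample-mean example above
data = [57, 86, 42, 38, 90, 66]

def arithmetic_mean(values):
    """Sum all values and divide by the number of values."""
    return sum(values) / len(values)

print(round(arithmetic_mean(data), 3))            # 63.167

# Replace the largest value (90) with an extreme value (1,000)
high_outlier = [57, 86, 42, 38, 1000, 66]
print(round(arithmetic_mean(high_outlier), 3))    # 214.833

# Replace the smallest value (38) with 5
low_outlier = [57, 86, 42, 5, 90, 66]
print(round(arithmetic_mean(low_outlier), 3))     # 57.667

# Cross-check against the standard library implementation
assert abs(arithmetic_mean(data) - mean(data)) < 1e-9
```

Changing one value out of six moves the mean from 63.167 to 214.833, which shows how strongly a single extreme value can pull the mean.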
The Median

• The median is the middle value in an ordered array of numbers

• The median applies for ordinal, interval, and ratio data

• Advantage of the median – it is unaffected by extremely large and extremely small values in the data set

• A disadvantage of the median is that not all the information from the numbers is used
Computing the Median

• First Step
– Arrange the observations in an ordered array
• Second Step
– For an array with an odd number of terms, the median is the middle
number.
• Third Step
– For an array with an even number of terms, the median is the
average of the two middle numbers.

• Locating the Median


– The median’s location in an ordered array is found by (n+1)/2
Median Example
with an Odd Number of Data

• Let X be an ordered array such that X has the following values:
3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22
– There are 17 values in the ordered array
– Position of median = (n+1)/2 = (17+1)/2 = 9th position
– Counting from left to right to the 9th position, the median is 15
• Advantage - extreme values do not distort the median
– Note that if 22 (maximum value) is replaced by 100, the median is still
15
– If 3 (minimum value) is replaced by -103, the median is still 15
Median Example
with an Even Number of Data
• Let X be an ordered array such that X assumes the following
values:
3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21
– There are 16 values in the ordered array
– Position of median = (n+1)/2 = (16+1)/2 = 8.5th position
– The median is a value between the 8th and 9th observations in the
ordered array. The median is 14 + 0.5(15-14) = 14.5 or simply,
(14+15)/2 =14.5
• Advantage - extreme values do not distort the median
– If 21 (maximum value) is replaced by 100, the median is still 14.5
– If 3 (minimum value) is replaced by -88, the median is still 14.5
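The steps for computing the median can be sketched in Python as follows. The sketch assumes numeric data and uses 0-based list indices, so the 1-based position (n+1)/2 for odd n corresponds to index n // 2. The function name is illustrative.

```python
def median(values):
    """Median of a list of numbers, following the steps above."""
    ordered = sorted(values)        # Step 1: arrange as an ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd n: the middle number, position (n+1)/2
        return ordered[mid]
    # even n: average of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even = odd[:-1]                     # drop 22, leaving 16 values

print(median(odd))                  # 15
print(median(even))                 # 14.5

# Extreme values do not move the median
print(median(odd[:-1] + [100]))     # 15   (22 replaced by 100)
print(median([-103] + odd[1:]))     # 15   (3 replaced by -103)
print(median([-88] + even[1:]))     # 14.5 (3 replaced by -88)
```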
The Mode

• The mode is the value that occurs most frequently in an array of data
• The mode applies to all levels of data measurement: nominal,
ordinal, interval, and ratio
• Unimodal: describes data sets with a single mode
• Bimodal: describes data sets that have two modes
• Multimodal: describes data sets that contain more than two
modes
Example of the Mode

• Organizing the data into an ordered array helps to locate the mode
• The arrangement of the numbers represents an ordered array
• 44 is the value that occurs most frequently (occurs 5 times).
• The mode is 44
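Here is a short sketch using Python's `statistics.multimode` (available in Python 3.8 and later). It returns every value that shares the highest frequency, so it handles unimodal, bimodal and multimodal data. The first list reuses the median example. The colour list is hypothetical and is included only to show that the mode also works for nominal data.

```python
from statistics import multimode

# Data from the median examples: 16 and 19 each occur twice,
# so this data set is bimodal
odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
print(multimode(odd))        # [16, 19]

# Hypothetical nominal data (illustrative only): the mode still applies
colours = ["red", "blue", "red", "green", "red", "blue"]
print(multimode(colours))    # ['red']
```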
Percentiles
• Percentiles are measures of central tendency that divide a group
of data into 100 parts

• The nth percentile is the value such that at least n percent of the
data are below that value and at most (100 - n) percent are
above that value

• For example: if a plant operator takes a safety examination and 87.6% of the safety exam scores are below that person’s score, he or she still scores at only the 87th percentile, even though more than 87% of the scores are lower.

• The median is the 50th percentile; the two always have the same value
Percentiles

• Percentiles are stair-step values: for example, the 87th and 88th percentiles have no values between them

• Percentile methods are applicable for ordinal, interval, and ratio data and are not applicable for nominal data

• In general, percentiles are not influenced by extreme values in the data set
Steps in Determining the
Location of the Percentile
1. Organize the data into ascending order

2. Calculate the percentile location (i) using:

$$i = \frac{P}{100}(n)$$

where P = percentile, i = percentile location, and n = number of values in the data set

3. Determine the location:
– If i is a whole number, the Pth percentile is the average of the value at the ith location and the value at the (i + 1)th location.
– If i is not a whole number, the Pth percentile value is located at the whole-number part of i + 1.
Calculating Percentiles: An Example

• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
• Problem: Find the 30th percentile
• Number of observations: n = 8
• Location of the 30th percentile:

$$i = \frac{30}{100}(8) = 2.4$$

• The location index, i, is not a whole number.
• Therefore, the location is the whole-number portion of (i + 1) = 2.4 + 1 = 3.4.
• The whole-number portion is 3, so the 30th percentile is at the 3rd location of the array: 30th percentile = 13
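The three steps for locating a percentile can be written as a small Python function. This is a sketch of the location rule given above, restricted to 0 < P < 100 because the rule is undefined outside that range. Spreadsheets and statistical libraries often use other interpolation conventions and can return different values. The function name is illustrative.

```python
def percentile(values, p):
    """Pth percentile (0 < P < 100) using the location rule i = (P/100)(n).

    Follows the three steps above; other textbooks and software use
    different interpolation conventions and can give different answers.
    """
    if not 0 < p < 100:
        raise ValueError("this location rule is only defined for 0 < P < 100")
    ordered = sorted(values)            # Step 1: ascending order
    n = len(ordered)
    i = p * n / 100                     # Step 2: percentile location
    if i.is_integer():
        # Step 3a: average the values at the ith and (i + 1)th locations
        # (locations are 1-based, Python indices are 0-based)
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    # Step 3b: use the whole-number part of i + 1, i.e. location int(i) + 1,
    # which is Python index int(i)
    return ordered[int(i)]

raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw, 30))   # 13
```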
Quartiles

• Quartiles are measures of central tendency that divide a group of data into four subgroups or parts
• Quartile values are not necessarily members of the data set

• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile
• Relationship between quartiles and percentiles:
– Q1 is equal to the 25th percentile
– Q2 is located at the 50th percentile and equals the median
– Q3 is equal to the 75th percentile
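Because Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles, a minimal sketch can compute them with the percentile location rule. The sketch below assumes that rule and applies it to the ordered array from the percentile example. None of the three resulting quartiles is a member of the data set, and Q2 equals the median.

```python
def percentile(values, p):
    """Pth percentile (0 < P < 100) using the location rule i = (P/100)(n)."""
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100
    if i.is_integer():
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2   # average of ith and (i+1)th values
    return ordered[int(i)]                          # whole-number part of i + 1 (1-based)

def quartiles(values):
    """Q1, Q2 and Q3 as the 25th, 50th and 75th percentiles."""
    return percentile(values, 25), percentile(values, 50), percentile(values, 75)

data = [5, 12, 13, 14, 17, 19, 23, 28]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)   # 12.5 15.5 21.0
```

Other quartile conventions, such as NumPy's default linear interpolation, can give slightly different values for the same data.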
Measures of Variability:
Ungrouped Data
• Measures of variability are used to describe the spread or
dispersion of data
• Using measures of variability together with measures of central tendency gives a more complete description of the data
• Measures of variability for ungrouped data include:
– range,
– interquartile range,
– mean absolute deviation,
– variance,
– standard deviation,
– z scores and
– coefficient of variation
Measures of Variability:
Ungrouped Data

• Measures of variability describe the dispersion (spread) of a set of data or, conversely, its convergence (how closely the values cluster together)
• Dispersion describes how far the data are spread apart from one another and from the mean
• Convergence describes how closely the data cluster around the mean
• Variability is most frequently expressed in terms of deviation from the norm or mean. The images in the next slides express this visually
Variability
[Figure: "No Variability in Cash Flow (same amounts)" and "Variability in Cash Flow (different amounts)", each plotted against the mean]
Range

• The range is the difference between the largest and smallest values in the data set
• Usefulness:
– Advantage - simple to compute
– Disadvantages:
• Ignores all data points except the two extremes
• Influenced by extreme values
• Has no reference point
• Has limited use by itself
• Example: if the largest value in the data set is 48 and the smallest is 35,
Range = 48 - 35 = 13
Interquartile Range

Interquartile Range = Q3 – Q1

• The interquartile range contains all values in the interval between the first and third quartiles

• The interquartile range accounts for the middle 50% of values in the ordered data set

• The interquartile range is especially useful in situations where data users are more interested in values toward the middle and less interested in extremes

• The interquartile range is less influenced by extremes than the range
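To compare the range and the interquartile range (IQR) on the same data, the sketch below computes both. It reuses the percentile location rule for Q1 and Q3, which is an assumption, since other quartile conventions exist. The data are the ordered array from the percentile example. Replacing the largest value with a hypothetical outlier changes the range a great deal but leaves the IQR unchanged.

```python
def percentile(values, p):
    """Pth percentile (0 < P < 100) using the location rule i = (P/100)(n)."""
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100
    if i.is_integer():
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[int(i)]

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    return percentile(values, 75) - percentile(values, 25)

data = [5, 12, 13, 14, 17, 19, 23, 28]
print(data_range(data), iqr(data))                   # 23 8.5

# Hypothetical outlier: replace the largest value (28) with 280
with_outlier = [5, 12, 13, 14, 17, 19, 23, 280]
print(data_range(with_outlier), iqr(with_outlier))   # 275 8.5
```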


Population Variance
• Variance measures the extent to which individual scores in a distribution differ from one another, expressed through their deviations from the mean

• Population variance is the sum of the squared deviations from the mean divided by the number of observations, N:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

• Variance is expressed in squared units of measurement

• Squared units are hard to interpret, so the variance is typically used as a step toward obtaining the standard deviation of a data set
Example of Population Variance
• Given the following x values, the population variance is 26.0 units squared
Population Standard Deviation

• Square root of the population variance

• Easier to interpret in practice than the variance

• Measures the dispersion of the population data from the mean
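The population variance and population standard deviation formulas translate directly into Python, as the sketch below shows. Because the original population-variance data are not reproduced here, the example treats the five department head-counts from the population-mean example as the population. The helper names are illustrative. The results are cross-checked against the standard library's `pvariance` and `pstdev`.

```python
import math
from statistics import pvariance, pstdev

def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_sd(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

# Workers per department from the population-mean example (a full population)
workers = [24, 13, 19, 26, 11]
print(round(population_variance(workers), 2))   # 34.64 (workers squared)
print(round(population_sd(workers), 3))         # 5.886 (workers)

assert math.isclose(population_variance(workers), pvariance(workers))
assert math.isclose(population_sd(workers), pstdev(workers))
```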
Example of Sample Variance

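• Sample variances are also expressed in squared units.

• The formula is the same as for the population variance, except that for a sample the sum of squared deviations is divided by n - 1 instead of N:

$$s^2 = \frac{\sum (X - \bar{x})^2}{n - 1}$$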
Example of Sample Standard Deviation

• The sample standard deviation is the square root of the sample variance

• Easier to interpret in practice than squared units

• The sample standard deviation is commonly used as an estimator of the population standard deviation
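The sketch below implements the n - 1 version on the data from the sample-mean example. The only change from the population version is the divisor. The helper names are illustrative, and the results are checked against `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
from statistics import variance, stdev

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Data from the sample-mean example
sample = [57, 86, 42, 38, 90, 66]
print(round(sample_variance(sample), 3))   # 473.767
print(round(sample_sd(sample), 3))         # 21.766

assert math.isclose(sample_variance(sample), variance(sample))
assert math.isclose(sample_sd(sample), stdev(sample))
```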
Standard Deviation

• Standard deviation is the square root of the variance

• Most widely used measure to describe the dispersion among a set of observations in a distribution

• The standard deviation of a population is denoted by σ

• The standard deviation of a sample is denoted by s

Standard Deviation

• For repeated measurements, the smaller the standard deviation, the higher the quality of the measuring instrument and your technique
• It also indicates that the data points are fairly close together, with a small range
• It indicates good precision in your measurements
A high or large standard deviation
• Indicates that the values or measurements are not similar
• Indicates a large range
• Indicates a low level of precision (your measurements were not close to one another)

• The standard deviation will be 0 if all the values or measurements are the same.
Uses of Standard Deviation

• Indicator of financial risk

• Quality Control
– construction of quality control charts
– process capability studies

• Comparing two or more populations


– household incomes in two cities
– employee absenteeism at two plants
– the coefficient of variation (CV): the standard deviation expressed as a percentage of the mean
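Below is a minimal sketch of the coefficient of variation, assuming sample data, so it uses the sample standard deviation s and the sample mean x̄. It applies the sample from the sample-mean example. Rescaling every value, for instance converting dollars to cents, leaves the CV unchanged. That scale independence is what makes the CV useful for comparing populations measured on different scales.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100

sample = [57, 86, 42, 38, 90, 66]
print(round(coefficient_of_variation(sample), 1))     # 34.5 (%)

# CV is unit-free: rescaling every value leaves it unchanged
in_cents = [100 * x for x in sample]
print(round(coefficient_of_variation(in_cents), 1))   # 34.5 (%)
```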
Standard Deviation as an
Indicator of Financial Risk
Symmetric and Asymmetric Distributions

• Data are either symmetric or non-symmetric with respect to some measure of central tendency
• Statisticians have observed that distributions describing many types of business and economic data tend to be symmetric or have a normal (bell) shape
• For symmetric, bell-shaped data, the empirical rule describes how the data concentrate around the mean: approximately 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean
• Non-symmetric distributions, in practice and theory, still obey minimum rules on data concentration: by Chebyshev's theorem, at least 1 - 1/k² of the values lie within k standard deviations of the mean, for any k > 1
Z Scores

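• The z score represents the number of standard deviations a value (x) is above or below the mean

• z scores can be computed for any data set, but interpreting them with normal-distribution probabilities assumes the data are approximately normally distributed

• A z score translates a value into units of standard deviations
• Z score formula:

$$z = \frac{x - \mu}{\sigma}$$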
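The following sketch applies z = (x - µ)/σ to every value, treating the department head-counts from the population-mean example as a population, so σ is the population standard deviation. Standardized values always have mean 0 and standard deviation 1, which makes that property a convenient check. The function name is illustrative.

```python
from statistics import mean, pstdev

def z_scores(values):
    """z = (x - mu) / sigma for every value, treating the data as a population."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

workers = [24, 13, 19, 26, 11]
for x, z in zip(workers, z_scores(workers)):
    print(x, round(z, 2))   # e.g. 26 workers is about 1.26 SDs above the mean
```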
Descriptions and Measures of Shape

• Skewness
– Absence of symmetry
– Presence of extreme values in one or other side of a distribution
• Box and Whisker Plots
– Graphic display of a distribution using five summary statistics (minimum, Q1, median, Q3, maximum)
– Reveals skewness and data location or clustering
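A box-and-whisker plot is drawn from the five-number summary. The sketch below computes that summary for the raw data from the percentile example, assuming the percentile location rule for the quartiles. Plotting libraries may use a different quartile convention, so their boxes can differ slightly.

```python
def percentile(values, p):
    """Pth percentile (0 < P < 100) using the location rule i = (P/100)(n)."""
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100
    if i.is_integer():
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[int(i)]

def five_number_summary(values):
    """Minimum, Q1, median (Q2), Q3 and maximum for a box-and-whisker plot."""
    return (min(values), percentile(values, 25), percentile(values, 50),
            percentile(values, 75), max(values))

raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(five_number_summary(raw))   # (5, 12.5, 15.5, 21.0, 28)
```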
Probability Distributions
Showing Symmetry and Skewness
[Figure: three probability distributions, labelled "Symmetrical", "Right or Positively Skewed" and "Left or Negatively Skewed"]
Symmetrical Shape Frequency Histogram Showing
Relationship of Mean, Median and Mode
Exercises

• Work out the mean, median, mode, variance, and sample standard deviation of the following data:
• 2,5,7,9,12,15,19,0,18,16,32,8,2
• 25,37,78,98,13,18,49,201
• 11,2,0,19,16,33,51,69,79,16
• 18,17,0,1,9,34,29,19,18,20
• 28,201,209,278,150,260,12,370,245,401,333
Exercises

• Find the 30th, 40th, 60th and 80th percentiles of the above data
• Find the first, second and third quartiles (25th, 50th and 75th percentiles) of the above data
• Find the interquartile range of the above data

• END
• THANK YOU
