Descriptive Statistics
Prepared by:
REBECCA K. CAJUCOM
rkcajucom 1
STATISTICS
Uses of Statistics
Business Research
Market Research
Economics Research
Product Control – quality control, price control,
and volume of production
Life Insurance
Employee and Employer relationship
Etc.
Branches of Statistics
STATISTICS
PROBLEM OF ESTIMATION
POINT ESTIMATION
INTERVAL ESTIMATION
TESTS OF HYPOTHESES
PARAMETRIC TESTS
NON-PARAMETRIC TESTS
2 Divisions of Statistics
Descriptive Statistics – division wherein one
describes a given set of data by a single
measure called statistical description.
Inferential Statistics – division wherein one
estimates a population parameter based on
samples and one either accepts or rejects
specific assertions about populations using
samples.
a. Problems of Estimation
b. Tests of Hypotheses
Population vs. Sample
Population - a set of data consisting of all
conceivably possible (or hypothetically
possible) observations of a given
phenomenon.
Sample - a set of data consisting of only a
part of the observations of a population.
Terms needed in the discussion
Sample Population
Symbols Used
Measure                               Sample Statistic   Population Parameter
Mean                                  $\bar{x}$          $\mu$ (mu)
Standard Deviation                    s                  $\sigma$ (small Greek letter sigma)
Variance                              $s^2$              $\sigma^2$
Proportion, Probability, Percentage   p                  $\pi$ (pi)
Size                                  n                  N
Variables and Their Classifications
A variable is a characteristic that can vary
in value among subjects in a sample or
population.
It can be classified into:
a. Qualitative vs. Quantitative
b. Discrete vs. Continuous
c. Dependent vs. Independent
d. Experimental vs. Observational
Qualitative vs. Quantitative
Qualitative variable – when scale for
measurement is a set of unordered
categories.
Quantitative variable- when the possible
values do differ in magnitude. Each possible
value is greater than or less than any other
possible value.
Discrete vs. Continuous
Discrete variable – variable that can take on only a finite (or
countable) number of values.
Example: Number of children, number of murders
Continuous variable- variable that can take an
infinite continuum of possible real number values.
Example: Measurements such as height, weight,
age, amount of time it takes to read a passage of a
book
Dependent vs. Independent
Dependent (response) variable – outcome
variable about which comparisons are made. It
refers to the goal of investigating the degree to
which the response on that variable depends on the
group to which the subject belongs.
Independent (explanatory) variable- variable that
defines the groups.
Experimental vs. Observational
Experimental data –data resulting from planned
experiment. The major purpose of many experiments
is to compare responses of subjects on some
outcome measure, under different conditions, called
treatments. To obtain these data, the researcher
needs a plan (called experimental design) for
assigning subjects to the different conditions being
compared
Observational data- data obtained from surveys.
The researcher measures subjects’ responses on the
variables of interest, but has no experimental control
over the subjects.
Four(4) Basic Levels of Measurement:
1. Nominal data
2. Ordinal data
3. Interval data
4. Ratio data
Levels of Measurement
Nominal data – when the measure assigned to
an item is a label used to identify the item.
Example:
a. Numbers of the baseball uniforms – labels
used to identify players .
b. Democrat, Republican or Independent –
labels used to identify political category.
Levels of Measurement
Ordinal data – when the measures assigned
permit the items to be ordered with respect to
some criterion.
Example:
a. Size of automobile - compact, intermediate
or full size.
b. Class rank to each student based upon
grade point average.
Levels of Measurement
Interval data – when there is a fixed numerical
unit of measurement and each measure
assigned is expressed as a quantity of those
units.
Example:
Measurement of temperature – unit of
measurement is degree.
Levels of Measurement
Ratio data – when there is a fixed unit of
measure and the zero point is inherently defined
by the scale of measurement.
Example:
Physical distance such as length, width since a
value of zero denotes the absence of any
distance.
Kinds of Distributions
Qualitative Distribution
Quantitative Distribution
a. Frequency Distribution
b. Probability Distribution
c. Sampling Distribution
Three Ways of Collecting Data
One may ask people questions
One may observe the behavior of
persons, groups or outcomes
One may utilize existing records of data
other than one’s own research
Three Ways of Presenting Data
Tabular Form
Textual (or Paragraph) Form
Graphical Form
line diagram or line curve
pictograph
pie chart
bar chart
statistical map
• dot map
• flow map
• cross hatched map
Tabular Form
Table II
Percentage Distribution of the Students Included in the Sample
(According to Batch Year)
Batch Year   Number of Students   Percentage
1993 – 94    20                   30.303%
1994 – 95    14                   21.212%
1995 – 96    10                   15.152%
1996 – 97    22                   33.333%
Total        66                   100.000%
Textual (or Paragraph) Form
Table II shows that of the 61.682% of the
sample size,
– 30.303% came from Batch year 1993 – 94
– 21.212% from Batch year 1994 – 95
– 15.152% from Batch year 1995 – 96
– 33.333% were from Batch year 1996 – 97.
Graphical Form
FIGURE 2. LINE DIAGRAM: number of students (vertical axis, 0 to 10) plotted against combined grades in ECO 4 & ECO 5 (classes 80 - 99 through 180 - 199), with one line per batch year (1993-94, 1994-95, 1995-96, 1996-97).
Graphical Form
FIGURE 3. BAR CHART (horizontal bars): classes of combined grades in ECO 4 & ECO 5 (80 - 99 through 180 - 199) on the vertical axis, number of students (0 to 10) on the horizontal axis, one bar per batch year (1993-94 to 1996-97).
Graphical Form
FIGURE 3. BAR CHART (vertical bars): number of students (0 to 10) against combined grades in ECO 4 & ECO 5 (80 - 99 through 180 - 199), one bar per batch year (1993-94 to 1996-97).
Graphical Form
FIGURE 3. BAR CHART (vertical axis 0 to 25): number of students against combined grades in ECO 4 & ECO 5 (80 - 99 through 180 - 199), by batch year (1993-94 to 1996-97).
Graphical Form
FIGURE 3. BAR CHART (vertical axis 0 to 10): number of students against combined grades in ECO 4 & ECO 5 (80 - 99 through 180 - 199), by batch year (1993-94 to 1996-97).
Graphical Form
FIGURE 1. PIE CHART: percentage of students in the sample by batch year: 1993-1994 (30.303%), 1994-1995 (21.212%), 1995-1996 (15.152%), 1996-1997 (33.333%).
Sample Design
A sample design is a definite plan,
determined completely before any data are
actually collected, for obtaining a sample
from a given population.
Kinds of Sampling
Probability Sampling
-sampling wherein each element is given
an equal chance of being chosen.
Non- probability Sampling
- sampling wherein each element is not
given an equal chance of being chosen.
Kinds of Probability Sampling
Simple random sampling
Systematic sampling with a random start
Stratified sampling
Multi- stage sampling
Cluster sampling
Sampling Frame
A sampling frame is a list of all N items in the
population, so that we can assign each item one of
the numbers from 1 to N.
It makes it easy to draw random samples with the
aid of computers or random number table (a table
containing a sequence of numbers that is computer
generated according to a scheme whereby each
digit is equally likely to be any of the integers 0, 1,
2, .. ., 9).
Sources of Variability among Samples
Sampling error of a statistic - the error that occurs when a statistic based
on a sample estimates or predicts the value of a population parameter. For
samples of size 1000, the sampling error for estimating percentages is
usually no greater than 3% or 4%.
Other factors:
(a) Undercoverage of the sampling frame. It may lack representation from
some groups in the population of interest.
(b) Problem of non-response of the subjects. Subjects supposed to be
included may refuse to participate, or it may not be possible to reach them.
(c) Response bias created by the interviewer or other factors affecting the
response. Respondents might lie if they think their response to a question
is socially unacceptable.
(d) How the variables are measured has a large impact on the types of
results observed.
(e) The order of the questions can influence the results dramatically.
Missing data - Some subjects do not provide responses for some of the
variables measured.
Simple Random Sampling
The population is listed, and the plan and sample
size are fixed. Each possible sample has the same
probability of being selected. A table of random
numbers is used and sampling is done without
replacement. Apart from this, the sampling units
are drawn independently.
Randomization, a mechanism for ensuring that the
sample is representative enough for inferential
methods, is used.
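A minimal sketch of simple random sampling with a computer in place of a random number table. It assumes the sampling frame numbers the items 1 to N; the function name, the sizes, and the seed are illustrative only.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw item numbers from the frame 1..N without replacement."""
    rng = random.Random(seed)                 # seed only makes the draw repeatable
    frame = range(1, population_size + 1)     # sampling frame: items numbered 1..N
    return sorted(rng.sample(frame, sample_size))

# Hypothetical sizes: N = 2000 items, sample of n = 10
sample = simple_random_sample(population_size=2000, sample_size=10, seed=1)
print(sample)
```

Because `random.sample` draws without replacement, no item number can appear twice in the sample.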
Systematic Sampling with a Random Start
Stratified Sampling
The Stratified sampling is a procedure that
consists of stratifying ( or dividing ) the
population into a number of non-overlapping
sub-populations, or strata, and then taking a
sample from each stratum.
Sampling is called proportional if the
proportions of the sample chosen in the various
strata are the same as those existing in the
entire population
Sample Sizes for Proportional Allocation
Formula:
$n_i = \frac{N_i}{N} \cdot n$ for $i = 1, 2, \ldots, k$
where $N_i$ is the size of stratum i, N is the size of the population, and
$n = n_1 + n_2 + \cdots + n_k$ is the total size of the sample.
When necessary, use integers closest to the values
given by this formula.
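A short sketch of the allocation formula. As an assumption for illustration, the strata are the four year levels of Example 1 in the Frequency Distribution part (1000, 500, 300 and 200 students) and the total sample size is n = 100; neither choice comes from this slide.

```python
def proportional_allocation(stratum_sizes, n):
    """Return n_i = (N_i / N) * n for each stratum, rounded to the closest integer."""
    N = sum(stratum_sizes)
    return [round(N_i * n / N) for N_i in stratum_sizes]

strata = [1000, 500, 300, 200]            # N_i for year levels I to IV (N = 2000)
allocation = proportional_allocation(strata, n=100)
print(allocation)                         # [50, 25, 15, 10]
```

Rounding each stratum separately can make the sizes add up to slightly more or less than n, so the total is worth checking when the ratios are not exact.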
Proportional Allocation Example
Cluster Sampling
Cluster sampling technique is useful when a
complete listing of the population is not available.
The total population is divided into a number of
relatively small subdivisions, and some of these
subdivisions, or clusters, are randomly selected for
inclusion in the overall sample.
A cluster sample is one for which the sampling units
are the subjects in a random sample of the clusters.
If the clusters are geographic subdivisions, this kind
of sampling is called area sampling.
Multistage Sampling
Multistage sampling methods use
combinations of various sampling
techniques. They are common in social
science research for they are simpler to
implement than simple random sampling
but provide a broader sampling of the
population than a single method, such as
cluster sampling, provides.
**Cross Stratification
In a system-wide survey designed to determine the
attitude of its students, say, toward a new tuition
plan, a state college system with 17 colleges
might stratify its sample not only with respect to
colleges, but also with respect to students’ class
standing, sex and major. This stratification will
increase the precision ( reliability ) of
estimates and other generalizations, and is widely
used, particularly in opinion sampling and
market research.
**Quota Sampling
Quota Sampling is a convenient, relatively inexpensive, and
sometimes necessary procedure, but as it is often executed,
the resulting samples do not have the essential features of
random samples.
For instance, in determining voters’ attitudes toward
increased medical coverage for elderly persons, an
interviewer working on a certain area might be told to
interview 6 male self-employed homeowners under 30 years
of age, 10 female wage-earners in the 45 -60 bracket who
live in apartments, 3 retired males over 60 who live in
trailers, and so on, with the actual selection of the
individuals being left to the interviewer’s discretion.
FREQUENCY
DISTRIBUTION
Prepared by:
REBECCA K. CAJUCOM
Frequency Distribution
Example 1:
Population Distribution of
College students in a certain
School
Year Level Number of Students
I 1000
II 500
III 300
IV 200
Total 2000
Frequency Distribution
Example 2:
Classes Frequency
2 - 3 2
4 - 5 3
6 - 7 9
8 - 9 11
10 - 11 4
12 - 13 1
n = 30
Terms defined
Classes or class intervals ( CI ) – the symbol defined by the end numbers 2 - 3, 4 - 5, 6 - 7, etc.
Class limits ( CL ) – the end numbers of a CI.
  a. Lower class limits
  b. Upper class limits
Class boundaries ( CB ) – the true limits of a CI.
  a. Lower class boundaries
  b. Upper class boundaries
Class frequencies ( $f_i$, where the subscript i refers to the number of the class interval ) – the number of observations falling under a CI.
Class size ( c ) – the width of a CI, which is the absolute difference of 2 successive lower class limits or 2 successive upper class limits OR the difference of the CB of a CI.
Class mark ( $X_i$ ) – the midpoint or the mid-value of a CI, which is obtained by getting ½ the sum of the class limits of a CI.
Range ( R ) – the absolute difference between the lowest value ( LV ) and the highest value ( HV ).
Steps in the Frequency Distribution
1. Construction of the Frequency Distribution
Steps:
a. Determine the range ( R ).
b. Decide on the number of CI desired. Choose any number from 6 to 16.
c. Compute the class size to be used:
   $c = \frac{R}{\text{No. of class intervals desired}}$
d. Enumerate the class limits by simply starting with the lowest value as the first lower class limit. Then add the class size c from thereon. Do the same thing with the upper class limits. Be sure that the highest value is included in the last CI.
e. Then make a tally sheet of the number of observations under each CI to determine the frequency falling under each class.
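The steps above can be sketched as follows for integer data. Two assumptions are not stated in the slides: the class size is rounded up to a whole number, and an extra class is added whenever the highest value would otherwise be left out. The data are hypothetical, chosen so that the tally reproduces the frequencies of Example 2.

```python
import math

def frequency_distribution(data, num_classes):
    """Tally integer data into class intervals following steps a to e."""
    low, high = min(data), max(data)
    data_range = high - low                                # a. range R
    size = max(1, math.ceil(data_range / num_classes))     # c. class size, rounded up (assumption)
    classes = []
    lower = low                                            # d. first lower limit = lowest value
    while not classes or classes[-1][1] < high:            # keep going until the highest value is covered
        classes.append((lower, lower + size - 1))
        lower += size
    # e. tally the observations falling under each class interval
    return [(lo, hi, sum(lo <= x <= hi for x in data)) for lo, hi in classes]

# Hypothetical raw data (n = 30)
data = [2, 3, 4, 5, 5] + [6] * 4 + [7] * 5 + [8] * 5 + [9] * 6 + [10, 10, 11, 11, 13]
table = frequency_distribution(data, num_classes=6)
for lo, hi, f in table:
    print(f"{lo:>2} - {hi:<2}  {f}")
```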
Stem and Leaf Plots
A stem and leaf plot is a graphical representation
in which each observation is represented by its leading
digit(s) (the stem) and its final digit (the leaf). Each
stem is a number to the left of the vertical bar and
a leaf is a number to the right of it.
It conveys much of the same information as a
histogram.
It is useful for quick portrayals of small sets of
data.
It can provide simple visual comparisons of two
relatively small samples on a quantitative variable.
Stem and Leaf Plot (example)
Stem Leaf
1 6 7
2 0 3 9
3 0 1 4 4 4 6 8 9 9 9
4 4 6
5 0 2 3 8
6 0 3 4 6 8 9
7 5
8 0 3 4 6 9
9 0 8
10 2 2 3 4
11 3 3 4 4 6 9
12 7
13 1 3 5
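A minimal sketch of how such a plot can be built, assuming non-negative whole-number observations whose last digit is the leaf. The data are a few of the values readable from the example above (16, 17, 20, ...), not the full set.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem and leaf plot for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)       # stem = leading digit(s), leaf = final digit
    return [f"{stem:>2} | {' '.join(map(str, leaves[stem]))}" for stem in sorted(leaves)]

data = [16, 17, 20, 23, 29, 44, 46, 75, 102, 102, 103, 104]
plot = stem_and_leaf(data)
for line in plot:
    print(line)
```

Stems with no observations are simply omitted in this sketch; a fuller version might print them with an empty leaf list to keep the shape of the distribution visible.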
STATISTICAL DESCRIPTIONS
By:
REBECCA K. CAJUCOM
Statistical Descriptions
MEASURES
of
CENTRAL TENDENCY
Measures of Central Tendency/
Central Location
Arithmetic Mean ( $\bar{x}$ )
Median ( Md )
Mode ( Mo )
Harmonic Mean ( HM )
Geometric Mean ( GM )
Arithmetic Mean ( $\bar{x}$ )
The Arithmetic mean is the most popularly
known ‘‘ average .’’
It is unique and it always exists.
Its serious weakness lies in that it is strongly
influenced by extreme values called ‘‘outliers.”
( An observation is an outlier if it falls more than
1.5 IQR above the upper quartile or more than
1.5 IQR below the lower quartile.)
Other Characteristics of the Mean
Kinds of Arithmetic Mean
Simple Arithmetic Mean ( S.A.M. )
$\bar{x} = \frac{\sum x}{n}$
Weighted Arithmetic Mean ( W.A.M. )
$\bar{x}_w = \frac{\sum wx}{\sum w}$
S.A.M Example
Find the average grade of a student in Statistics having the following
grades in 5 quizzes: 86, 89, 94, 78 & 80.
Solution:
$\bar{x} = \frac{86 + 89 + 94 + 78 + 80}{5} = \frac{427}{5} = 85.4$
W.A.M. Example
Find the average price per kilo of rice if a
mixture is formed out of the following
varieties of rice:
10k @ P25.70
18k @ P18.75
24k @ P14.65
Solution:
$\bar{x}_w = \frac{10(25.70) + 18(18.75) + 24(14.65)}{10 + 18 + 24} = \frac{946.1}{52} \approx 18.194$
or about P 18.194 / k
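The same computation as a small sketch, with the prices as the values x and the number of kilos as the weights w. The function name is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

prices = [25.70, 18.75, 14.65]    # pesos per kilo
kilos = [10, 18, 24]              # weights
average_price = weighted_mean(prices, kilos)
print(round(average_price, 3))    # 18.194
```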
Computation of the Mean
For Ungrouped Data:
$\bar{x} = \frac{\sum x}{n}$
where
x = individual item
n = the number of items
For Grouped Data:
$\bar{x} = x_o + c\left(\frac{\sum f_i d_i}{n}\right)$
where
$x_o$ = assumed mean which is one of the class marks
c = class size
$f_i$ = class frequency
$d_i$ = class deviation = $\frac{x_i - x_o}{c}$
n = total frequencies
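A sketch of the grouped-data formula, applied for illustration to the frequency distribution of Example 2 (classes 2 - 3 through 12 - 13). The class marks and the choice of 8.5 as assumed mean are worked out here and are not given in the slides.

```python
def grouped_mean(class_marks, freqs, assumed_mean, class_size):
    """Mean of grouped data: x_o + c * sum(f_i * d_i) / n, with d_i = (x_i - x_o) / c."""
    n = sum(freqs)
    deviations = [(x - assumed_mean) / class_size for x in class_marks]
    return assumed_mean + class_size * sum(f * d for f, d in zip(freqs, deviations)) / n

marks = [2.5, 4.5, 6.5, 8.5, 10.5, 12.5]   # class marks of 2-3, 4-5, ..., 12-13
freqs = [2, 3, 9, 11, 4, 1]                # n = 30
mean = grouped_mean(marks, freqs, assumed_mean=8.5, class_size=2)
print(mean)                                # 7.5
```

Any class mark can serve as the assumed mean; the result is the same as the direct computation sum(f_i * x_i) / n, only the arithmetic by hand gets easier.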
How to determine the assumed Mean Xo
Median ( Md)
The Median is a positional measure which
divides the distribution into two equal parts
(50% on each side of the Md).
Interpretation:
Lower 50% of the distribution lies below
the Md or Upper 50% of the distribution
lies above the Md.
Characteristics of the Median
Like the mean, it always exists and is unique for
any set of data.
Unlike the mean, it is not affected by extreme
values.
It is utilized as an average when open-ended
intervals are contained in the distribution.
Since it is a positional measure, in its computation it
does not make use of the values of the individual
items.
Like the mean, it is meaningful only when the
distribution is fairly normal.
Computation of the Median
For grouped data:
Formula:
$M_d = L_m + c\left(\frac{\frac{n}{2} - F_{m-1}}{f_m}\right)$
where
$L_m$ = lower boundary of the Md class
c = class size
n = total frequencies
$F_{m-1}$ = cumulative frequency preceding the
cumulative frequency of the Md class
$f_m$ = class frequency of the Md class
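A sketch of the grouped median formula, again using the frequencies of Example 2 as illustrative data. The lower class boundaries (1.5, 3.5, ...) are derived here from the class limits and are an assumption of the sketch.

```python
def grouped_median(lower_boundaries, freqs, class_size):
    """Median of grouped data: L_m + c * (n/2 - F_(m-1)) / f_m."""
    n = sum(freqs)
    cumulative = 0                                  # F_(m-1): cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= n / 2:       # first class whose cumulative frequency reaches n/2
            return lower + class_size * (n / 2 - cumulative) / f
        cumulative += f

bounds = [1.5, 3.5, 5.5, 7.5, 9.5, 11.5]    # lower boundaries of 2-3, 4-5, ..., 12-13
freqs = [2, 3, 9, 11, 4, 1]                 # n = 30, so n/2 = 15
median = grouped_median(bounds, freqs, class_size=2)
print(round(median, 3))
```

Here the Md class is 8 - 9, since the cumulative frequencies are 2, 5, 14, 25, and 25 is the first to reach 15.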
Steps in the computation of the Median
Computation of the Mode
For grouped data:
Formula:
$M_o = L_{mo} + c\left(\frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2}\right)$
where
$L_{mo}$ = lower boundary of the modal class
c = class size
$f_{mo}$ = class frequency of the modal class
$f_1$ = class frequency preceding the class frequency of the
modal class
$f_2$ = class frequency following the class frequency of the
modal class
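A sketch of the grouped mode formula on the Example 2 frequencies. Treating a missing neighbouring class as having frequency 0 is an assumption of the sketch, and so is taking the first class with the highest frequency as the modal class.

```python
def grouped_mode(lower_boundaries, freqs, class_size):
    """Mode of grouped data: L_mo + c * (f_mo - f_1) / (2 * f_mo - f_1 - f_2)."""
    m = max(range(len(freqs)), key=freqs.__getitem__)    # index of the modal class
    f_mo = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                    # frequency of the preceding class
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0       # frequency of the following class
    return lower_boundaries[m] + class_size * (f_mo - f1) / (2 * f_mo - f1 - f2)

bounds = [1.5, 3.5, 5.5, 7.5, 9.5, 11.5]
freqs = [2, 3, 9, 11, 4, 1]
mode = grouped_mode(bounds, freqs, class_size=2)
print(round(mode, 3))
```

The modal class is 8 - 9 with $f_{mo} = 11$, $f_1 = 9$ and $f_2 = 4$.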
Harmonic Mean (HM)
The Harmonic Mean is the reciprocal of
the arithmetic mean of the reciprocals of a
given set of values.
Characteristics:
a. Like the mean, median, and the geometric mean, it is
rigidly defined.
b. It makes use of all the individual values in the set.
c. It is difficult to understand, and its computation involves
laborious and cumbersome operations.
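A sketch of the definition, checked against Python's standard library. The two speeds are hypothetical data (the same distance covered at 40 and at 60 units per hour), a typical case where the harmonic mean is the appropriate average.

```python
import statistics

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / x for x in values)

speeds = [40, 60]                             # hypothetical data
hm = harmonic_mean(speeds)
print(round(hm, 3))                           # 48.0
print(statistics.harmonic_mean(speeds))       # library version for comparison
```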
Geometric Mean (GM)
The Geometric Mean is the nth root of the
product of the n values.
Characteristics:
a. It is less affected by extremely large or small values than the
arithmetic mean; therefore, it is often used in place of the arithmetic
mean.
b. It is used in averaging ratios, in estimating the average
rate of change, and in computing the average for a series
of values in geometric progression.
c. For positive values it is always less than the Arithmetic Mean, unless
the numbers are all the same, in which case it is equal to the Arithmetic Mean.
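A sketch of the geometric mean for positive values, computed through logarithms to avoid very large products. The data (a geometric progression 2, 4, 8) are hypothetical.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

series = [2, 4, 8]                          # hypothetical values in geometric progression
gm = geometric_mean(series)
am = sum(series) / len(series)
print(round(gm, 6), round(am, 6))           # GM is about 4, AM is about 4.666667
```

For equal values such as 5, 5, 5 both averages coincide, which matches characteristic c.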
GM Example
MEASURES of DISPERSION
Measures of Variation/Dispersion
a. Absolute Measures
1. Range (R)
2. Interquartile Range (IR)
3. Average Deviation (AD)
4. Standard Deviation (s) and Variance ($s^2$)
b. Relative Measures
1. Coefficient of Variation (CV)
2. Coefficient of Quartile Deviation (CQD)
Absolute Measures of Dispersion
Range ( R )
Interquartile Range ( IR )
Average Deviation ( AD )
Standard Deviation ( s ) and Variance ( $s^2$ )
Absolute Measures of Dispersion
Interquartile Range
IR = Q3 - Q1
Semi-interquartile Range or
Quartile deviation:
QD = 1/2 IR
Quantiles/ Fractiles
The Quartiles
Figure: the distribution is divided into four parts of 25% each by Q1, Q2 and Q3.
Interpretation of Q1:
Lower 25% of the data lies below Q1 or
Upper 75% of the data lies above Q1.
Interpretation of Q3:
Lower 75% of the data lies below Q3 or
Upper 25% of the data lies above Q3.
The Deciles
Figure:
10% or 1/10
D1 D2 D3 D4 D5 D6 D7 D8 D9
Interpretation of D4:
Lower 40% of the data lies below D4 or
Upper 60% of the data lies above D4
The Percentiles
Figure:
1%or 1/100
Interpretation of P43:
Lower 43% of the data lies below P43 or
Upper 57% of the data lies above P 43
Computation of the Quantiles
For ungrouped data:
1. Determine the position first using the
formula:
D3’s position = 3n/ 10 + 1/2
P45’s position = 45n/ 100 + 1/2
2. Then find the value of the quantile.
Computation of the Quartiles:
Grouped data Formula:
$Q_1 = L_{Q_1} + c\left(\frac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}}\right)$
$Q_3 = L_{Q_3} + c\left(\frac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}}\right)$
where
$L_{Q_1}$ = lower boundary of the Q1 class
$L_{Q_3}$ = lower boundary of the Q3 class
c = class size
n = total frequencies
$f_{Q_1}$ = frequency of the Q1 class
$f_{Q_3}$ = frequency of the Q3 class
$F_{Q_1 - 1}$ = cumulative frequency of the class preceding the
cumulative frequency of the Q1 class
$F_{Q_3 - 1}$ = cumulative frequency of the class preceding the
cumulative frequency of the Q3 class
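Since the quartile formulas differ only in the fraction of n used, one function can cover them. This is a sketch under that reading, with the Example 2 frequencies as illustrative data; the function name and the `fraction` parameter are not from the slides.

```python
def grouped_quantile(lower_boundaries, freqs, class_size, fraction):
    """Quantile of grouped data: L + c * (fraction * n - F_before) / f."""
    n = sum(freqs)
    target = fraction * n                              # n/4 for Q1, 3n/4 for Q3
    cumulative = 0                                     # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= target:         # class containing the quantile
            return lower + class_size * (target - cumulative) / f
        cumulative += f

bounds = [1.5, 3.5, 5.5, 7.5, 9.5, 11.5]
freqs = [2, 3, 9, 11, 4, 1]
q1 = grouped_quantile(bounds, freqs, 2, 0.25)
q3 = grouped_quantile(bounds, freqs, 2, 0.75)
print(round(q1, 3), round(q3, 3), round(q3 - q1, 3))   # Q1, Q3 and the interquartile range
```

With fraction 0.5 the same function gives the median, which agrees with the note that Md = Q2.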
Computation of the Deciles :
Computation of the Percentiles:
$P_{43} = L_{P_{43}} + c\left(\frac{\frac{43n}{100} - F_{P_{43} - 1}}{f_{P_{43}}}\right)$
where
$L_{P_{43}}$ = lower boundary of the P43 class
$F_{P_{43} - 1}$ = cumulative frequency of the class preceding
the cumulative frequency of the P43 class
$f_{P_{43}}$ = class frequency of the P43 class
n = total frequencies
c = class size
Note the ff.:
Md = Q2 = D5 = P50
Q1 = P25, Q3 = P75
D1 = P10
D2 = P20
...
D9 = P90
The quantiles are computed similarly as the Md is computed.
# of quantiles = # of equal parts - 1.
Four (4) Cases of Problems
Below what value lies the lower __% of
the distribution?
Above what value lies the upper ___% of
the distribution?
Between what values lies the middle ___%
of the distribution?
What is the middle ___% range?
Chebyshev’s Theorem
For any set of data (population or sample)
and any constant k greater than 1, at least
$1 - \frac{1}{k^2}$ of the data must lie within k
standard deviations on either side of the
mean.
Figure: a proportion $P \ge 1 - \frac{1}{k^2}$ of the data lies between $\bar{x} - ks$ and $\bar{x} + ks$.
CT Example 1
The mean amount of time for chemical
workers to vacate a chemical factory during
a fire drill is 7 minutes with a standard
deviation of 0.5 minutes. Using CT,
determine at least what percentage of the
time the chemical factory can be vacated
during a fire drill between 6 and 8 minutes.
CT Example 2
For a certain library, the mean daily number
of books which are returned overdue is 45
and the standard deviation is 6 books. Use
CT to determine between what 2 numbers
must lie at least 15/16 of the daily number
of books which are returned overdue.
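One possible way to work both CT examples, as a sketch. The helper names are illustrative; the numbers come from the two problems above.

```python
import math

def chebyshev_bound(k):
    """Minimum fraction of the data within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2

def k_for_fraction(fraction):
    """Value of k for which the bound 1 - 1/k^2 equals the given fraction."""
    return 1 / math.sqrt(1 - fraction)

# CT Example 1: mean 7 minutes, s = 0.5 minutes, interval from 6 to 8 minutes
k1 = (8 - 7) / 0.5                      # the interval is k = 2 standard deviations wide on each side
bound1 = chebyshev_bound(k1)
print(bound1)                           # 0.75, i.e. at least 75% of the time

# CT Example 2: mean 45 books, s = 6 books, at least 15/16 of the daily numbers
k2 = k_for_fraction(15 / 16)            # k = 4
limits = (45 - k2 * 6, 45 + k2 * 6)
print(limits)                           # (21.0, 69.0)
```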
Average Deviation (AD)
The Average Deviation is the arithmetic mean of the absolute
deviations of the individual observations from the median or from the
arithmetic mean of all the observations.
Formula:
$AD = \frac{\sum |x_i - M_d|}{n}$
Note the ff.:
a. AD is always less than the standard deviation.
b. AD is 4/5 as large as the standard deviation when the distribution is
approximately bell-shaped or normal.
Standard Deviation & Variance
Standard deviation (s) is the root mean square of the deviations from the mean.
Variance ( $s^2$ ) is the square of the standard deviation.
How to compute:
1. For ungrouped data:
$s = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n - 1)}}$
2. For grouped data:
$s = c\sqrt{\frac{n\sum f_i d_i^2 - (\sum f_i d_i)^2}{n(n - 1)}}$
Interpretation of s:
On the average, the
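A sketch of the ungrouped computing formula with the n(n - 1) divisor, checked against the standard library's sample standard deviation. The data are the five quiz grades from the S.A.M. example, reused here for illustration.

```python
import math
import statistics

def sample_sd(values):
    """s = sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x**2) / (n * (n - 1)))

grades = [86, 89, 94, 78, 80]
s = sample_sd(grades)
print(round(s, 4), round(s**2, 4))          # standard deviation and variance
print(round(statistics.stdev(grades), 4))   # library value for comparison
```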
Box Plots
The Box Plot is a graphical summary of both the
central tendency and variation of a set of data. It
portrays the range and the quartiles of the data, and
possibly some outliers.
The box contains the central 50% of the distribution,
from the lower quartile to the upper quartile. The Md is
marked by a line drawn within the box. The lines
extending from the box are called whiskers. These
extend to the maximum and minimum values unless
there are outliers.
The box plot is particularly useful for comparing
distributions side by side (or back-to-back
comparisons of two groups), and it identifies outliers
separately.
Box Plot Example
Box Plots for U.S. & Canadian Murder Rates
Figure: side-by-side box plots of the murder rate (vertical axis, 0 to 20) for the US and Canada.
For US: The upper whisker and upper half of the central box are longer than the lower ones. This indicates that the right tail of the distribution, which corresponds to the relatively large values, is longer than the left tail. The plot reflects the skewness to the right of the distribution.
These side-by-side plots reveal that the murder rates in the US tend to be much higher and have much greater variability.
Relative Measures of Dispersion
Coefficient of Variation ( C.V. )
$C.V. = \frac{s}{\bar{x}} \cdot 100\%$
Coefficient of Quartile Deviation ( C.Q.D. )
$C.Q.D. = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
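A brief sketch of both relative measures. The grades are the quiz data from the S.A.M. example; the quartiles 6 and 9 are hypothetical values chosen only to show the arithmetic.

```python
import statistics

def coefficient_of_variation(values):
    """C.V. = (s / mean) * 100, in percent, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

def coefficient_of_quartile_deviation(q1, q3):
    """C.Q.D. = (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)

grades = [86, 89, 94, 78, 80]
cv = coefficient_of_variation(grades)
cqd = coefficient_of_quartile_deviation(q1=6, q3=9)    # hypothetical quartiles
print(round(cv, 2), cqd)
```

Both measures are unit-free, which is what allows data sets measured in different units to be compared.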
Other Measures
Measures of Kurtosis ( m̂ 4)
Measure of Skewness
The Measure of Skewness describes the asymmetry of the
distribution.
Types of skewness
1. Positively skewed (Sk > 0)
2. Negatively skewed (Sk < 0)
Formula:
$Sk = \frac{3(\bar{x} - M_d)}{s}$
Note : a. If Sk = 0 the distribution is symmetric, as the normal distribution is.
b. Median typically lies in between mean and mode.
Positively Skewed
Characteristics:
1. The distribution tapers more to the right
than to the left.
2. Mean is always greater than the Median.
3. The value of skewness is always greater
than the zero or always positive.
Figure: $\hat{m}_3 > 0$ ($\hat{m}_3$ = +); along the horizontal axis the order is Mo, Md, $\bar{x}$.
Negatively skewed
Characteristics:
1. The distribution tapers more to the left than
to the right.
2. Mean is always less than the Median.
3. The value of skewness is always less than
zero or always negative.
Figure: $\hat{m}_3 < 0$ ($\hat{m}_3$ = -); along the horizontal axis the order is $\bar{x}$, Md, Mo.
Computation of Skewness (general case)
$m_3 = \frac{\sum (x_i - \bar{x})^3}{n}$
where
n = total no. of observations
$x_i$ = individual observation
$\bar{x}$ = arithmetic mean
$m_3 = \frac{\sum f_i (x_i - \bar{x})^3}{n}$
where
$f_i$ = class frequency
n = total no. of observations
$x_i$ = individual observation
$\bar{x}$ = arithmetic mean
Measure of Kurtosis
Leptokurtic
Characteristics:
1. It is a pointed curve more peaked than the
normal curve.
2. The value of kurtosis is greater than 3.
Figure:
$\hat{m}_4 > 3$
Platykurtic
Characteristics:
1. It is a flat topped curve.
2. The value of kurtosis is less than 3.
Figure:
$\hat{m}_4 < 3$
Mesokurtic
Characteristics:
1. It is the normal curve.
2. The value of kurtosis is equal to 3.
Figure:
$\hat{m}_4 = 3$
Computation of Kurtosis
For ungrouped data:
$m_4 = \frac{\sum (x_i - \bar{x})^4}{n}$
For grouped data:
$m_4 = \frac{\sum f_i (x_i - \bar{x})^4}{n}$
where
$x_i$ = individual observation
$f_i$ = class frequency
$\bar{x}$ = arithmetic mean
n = total frequencies
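A sketch of the ungrouped moment formulas for $m_3$ and $m_4$ on hypothetical data. The comparison value of 3 for kurtosis is usually applied to the standardized ratio $m_4 / m_2^2$; the sketch computes that ratio too, as an assumption about what the slides intend.

```python
def central_moment(values, order):
    """Moment about the mean: sum((x_i - mean)^order) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n

data = [1, 2, 3, 4, 10]                     # hypothetical observations with a long right tail
m2 = central_moment(data, 2)
m3 = central_moment(data, 3)                # positive, so the data are positively skewed
m4 = central_moment(data, 4)
standardized_kurtosis = m4 / m2**2          # value compared with 3 (assumption, see lead-in)
print(m3, round(m4, 1), round(standardized_kurtosis, 3))
```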
Prepared by:
REBECCA K. CAJUCOM
Probabilities
In order to make intelligent decisions, two questions must
be answered:
1. What is possible?
   a. Problems of listing down everything possible
      1. Tabular method
      2. Tree diagram
   b. Problem of determining the number of possible ways without listing
      down everything - use counting techniques
      1. Permutation
      2. Combination
2. What is probable?
   a. To assign probabilities
      1. Classical or “a priori” probability
      2. Relative frequency approach
      3. Subjective probability
   b. To specify the odds at which it is fair to bet that events will
      likely occur.
Problem of listing down
everything possible (example 1)
Toss 2 coins. Give the sample space of the
given experiment using:
a. tabular form
b. tree diagram
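A small sketch that lists the sample space by machine, with H for heads and T for tails (labels chosen here). Each element of the product corresponds to one path through the tree diagram.

```python
from itertools import product

# Every ordered pair (first coin, second coin)
sample_space = ["".join(outcome) for outcome in product("HT", repeat=2)]
print(sample_space)          # ['HH', 'HT', 'TH', 'TT']

# Tabular form: rows = first coin, columns = second coin
for first in "HT":
    print(first, [first + second for second in "HT"])
```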
Problem of listing down
everything possible (example 2)
Fundamental Principle
(Multiplication of Choices)
If an event can happen in any one of
$n_1$ ways, and if, when this has occurred,
another event can happen in any one of
$n_2$ ways, then the number of ways in
which both events can happen
in the specified order is:
$N = n_1 \times n_2$
Similarly, if there are more than 2 events:
$N = n_1 \times n_2 \times n_3 \cdots$
The Counting Techniques
Permutations
Combinations
Permutation
The Permutation of n distinct objects is the
arrangement of the objects with attention given to
the order of arrangement. The number of
permutations of n objects taken r at a time is
denoted by ${}_nP_r$ and is defined by:
${}_nP_r = \frac{n!}{(n - r)!}$ where $n \ge r$.
Examples:
1. 6P4 , 6P6
2. In how many ways can the 3 letters (a, b, c) be
arranged if taken two at a time?
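The two examples can be checked with Python's standard library, shown here as a sketch: `math.perm` evaluates n!/(n - r)! and `itertools.permutations` lists the arrangements themselves.

```python
from itertools import permutations
from math import perm

p_6_4 = perm(6, 4)        # 6P4 = 6!/2!
p_6_6 = perm(6, 6)        # 6P6 = 6!
print(p_6_4, p_6_6)       # 360 720

arrangements = ["".join(p) for p in permutations("abc", 2)]
print(arrangements)       # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
print(len(arrangements))  # 6, which equals 3P2
```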
Permutation
The number of permutation of n objects taken
all (or n = r) at a time is
nPr = nPn = n!
Example:
In how many ways can the 3 letters (a, b,
c) be arranged if taken all at a time?
Permutation
The number of permutations of n objects consisting of
groups of which $n_1$ are alike, $n_2$ are alike, . . . is:
$P = \frac{n!}{n_1!\,n_2!\,n_3! \cdots}$ where $n = n_1 + n_2 + n_3 + \cdots$
Examples:
1. In how many ways can the letters a, a, b, b, b, c, c, c be
arranged?
2. How many words can be formed out of the letters of the
word “STATISTICS”?
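A sketch of the formula for objects that are partly alike, applied to the two examples. The function name is illustrative; "words" is taken to mean any arrangement of all ten letters.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(items):
    """n! / (n1! * n2! * ...) for a sequence that contains repeated items."""
    total = factorial(len(items))
    for count in Counter(items).values():    # n1, n2, ... = how often each item occurs
        total //= factorial(count)
    return total

first = distinct_arrangements("aabbbccc")        # 8! / (2! 3! 3!)
second = distinct_arrangements("STATISTICS")     # 10! / (3! 3! 2! 1! 1!)
print(first, second)                             # 560 50400
```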
Permutation
The number of permutation of n objects
arranged in a circle is :
P=(n-1)!
Example:
In how many ways can 5 people be seated
at a round table?
Combination
The Combination of n objects is the
selection of the objects with no attention
given to the order of arrangement. The
number of combinations of n objects taken r
at a time is denoted by ${}_nC_r$ or $\binom{n}{r}$ and is
defined by:
${}_nC_r = \binom{n}{r} = \frac{n!}{(n - r)!\,r!}$ where $n \ge r$.
Example:
In how many ways can the 3 letters (a, b, c)
be grouped if taken 2 at a time?
Combination
nCr = nCn-r
Example
5C2 = 5C5 -2 = 5C3
nCn = 1
Example
5C5 = 1
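A sketch that checks the combination examples with the standard library: `math.comb` evaluates n!/((n - r)! r!) and `itertools.combinations` lists the groups.

```python
from itertools import combinations
from math import comb

groups = ["".join(c) for c in combinations("abc", 2)]
print(groups)                       # ['ab', 'ac', 'bc'], so 3C2 = 3

print(comb(5, 2), comb(5, 3))       # 10 10, illustrating nCr = nCn-r
print(comb(5, 5))                   # 1, illustrating nCn = 1
```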
Kinds of Probabilities
Classical Probability
The probability of an event A is the ratio of the
number of sample points corresponding to event A
over the total number of sample points in the
sample space. In symbols,
$P(A) = \frac{a}{a + b}$
where
a = number of successes (the number of sample
points that correspond to event A)
b = number of failures (the number of sample
points that do not correspond to event A)
Classical Probability(example)
Toss 2 coins. What’s the probability of
getting:
a. exactly two heads?
b. at least one head?
c. at most one head?
d. at least two heads?
e. exactly three heads?
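One way to work parts a to e, as a sketch: list the four equally likely sample points and count those that correspond to each event. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

space = list(product("HT", repeat=2))        # HH, HT, TH, TT

def probability(event):
    """Classical probability: favorable sample points over all sample points."""
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))

def heads(outcome):
    return outcome.count("H")

answers = {
    "a": probability(lambda o: heads(o) == 2),   # exactly two heads
    "b": probability(lambda o: heads(o) >= 1),   # at least one head
    "c": probability(lambda o: heads(o) <= 1),   # at most one head
    "d": probability(lambda o: heads(o) >= 2),   # at least two heads
    "e": probability(lambda o: heads(o) == 3),   # exactly three heads (impossible with 2 coins)
}
for part, p in answers.items():
    print(part, p)
```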
Relative Frequency Approach
The probability of an event ( happening or outcome ) is the
proportion of the time that events of the same kind will occur in the
long run.
Example:
Toss a coin 100 times. What’s the probability of getting heads in the
given experiment?
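A simulation sketch of the relative frequency approach. It assumes a fair coin and uses Python's pseudo-random generator in place of real tosses; the seed only makes the run repeatable.

```python
import random

def relative_frequency_of_heads(tosses, seed=None):
    """Proportion of heads observed in a number of simulated fair-coin tosses."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses

for tosses in (100, 10_000, 100_000):
    print(tosses, relative_frequency_of_heads(tosses, seed=1))
```

With only 100 tosses the proportion can be noticeably off 0.5; it tends to settle closer to 0.5 as the number of tosses grows, which is the long-run idea in the definition.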
The Law of Large Numbers ( in relation to
relative frequency approach )
Subjective Probability
Subjective probability is sometimes called
personal probability. It reflects one’s belief with
regard to the uncertainties that are involved, and it
applies especially when there is little or no direct
evidence, so that there really is no choice but to
consider collateral or indirect information,
educated guesses, and perhaps intuition and other
subjective factors.
Example:
95% of the time it will rain today.
Rules on Probabilities
Addition Rules (or = $\cup$)
1. For mutually exclusive events (or disjoint sets) - events which cannot occur together (the occurrence of one event automatically precludes the occurrence of the other event/s).
Formula:
$P(A \cup B) = P(A) + P(B)$
2. For not mutually exclusive events (or joint sets)
Formula:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
where $P(A \cap B)$ = joint probability of A and B
Multiplication Rules (and = $\cap$, or implied and)
1. For independent events (with replacement)
Formula:
$P(A \cap B) = P(A) \cdot P(B)$
2. For dependent events (without replacement)
Formula:
$P(A \cap B) = P(A) \cdot P(B \mid A)$
where $P(B \mid A)$ = conditional probability of B given A
Addition Rules (example)
What’ s the probability of drawing an ace or
a king in a deck of cards?
What’s the probability of drawing an ace or
a heart in a deck of cards?
What’s the probability of drawing an ace,
king , queen, or a heart in a deck of cards?
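A sketch that works the three questions by counting cards, assuming a standard 52-card deck and one card drawn at random. The rank and suit labels are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))           # 52 (rank, suit) pairs

def probability(event):
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

ace_or_king = probability(lambda c: c[0] in ("A", "K"))                         # mutually exclusive
ace_or_heart = probability(lambda c: c[0] == "A" or c[1] == "hearts")           # not mutually exclusive
akq_or_heart = probability(lambda c: c[0] in ("A", "K", "Q") or c[1] == "hearts")
print(ace_or_king, ace_or_heart, akq_or_heart)   # 2/13 4/13 11/26

# Addition rule for the second question: P(ace) + P(heart) - P(ace of hearts)
by_rule = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)
print(by_rule)                                   # 4/13
```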
Multiplication Rules (example)
Further Rules on Probability
1. $P(A) \le 1$ for any event A
2. $P(\emptyset) = 0$
3. P (A) + P (A’) = 1
or
P (A’) = 1 - P(A)
where A’ = event A not happening
Venn Diagram
V.D. (in relation to operations on sets )
1. Union ( $\cup$ )
A. Joint sets: $A \cup B$
B. Disjoint sets: $A \cup B$
C. Subsets: if $A \subset B$, $A \cup B = B$; if $B \subset A$, $A \cup B = A$
V.D. (in relation to operations on sets )
2. Intersection ( $\cap$ )
A. Joint sets: $A \cap B$
B. Disjoint sets: $A \cap B = \emptyset$
V. D. (in relation to operations on sets)
3. Complementation ( A ’)
A’
V. D. Example
One of the 240 members of a tennis club is
to be named Player of the year . If 145 of
the members are women, 85 use a two-
handed backhand, and 50 are women who
use a two-handed backhand, how many of
the outcomes correspond to the choice of a
man who does not use a two-handed
backhand?
V. D. Example
Among 60 houses advertised for sale there are 8 with
swimming pools, three or more bedrooms, and wall-to-
wall carpeting; 5 with swimming pools, three or more
bedrooms, but no wall-to-wall carpeting; 3 with swimming
pools, wall-to-wall carpeting, but fewer than 3 bedrooms; 8
with swimming pools but neither wall-to-wall carpeting
nor 3 or more bedrooms; 24 with 3 or more bedrooms, but
neither a swimming pool nor wall-to-wall carpeting; 2 with
3 or more bedrooms, wall-to-wall carpeting, but no
swimming pool; 3 with wall-to-wall carpeting but neither a
swimming pool nor 3 or more bedrooms; and 7 without
any of these features. If one of these houses is to be chosen
for a television commercial, how many outcomes
correspond to the choice of: (a) a house with a swimming
pool; (b) a house with wall-to-wall carpeting?
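Since the problem lists all eight regions of the three-circle Venn diagram, both counts can be read off by summing the regions that fall inside the circle of interest. The sketch below is illustrative only; the tuple encoding (pool, three or more bedrooms, carpeting) is an assumption made for this example.

```python
# Each key is (has_pool, has_3plus_bedrooms, has_carpeting); each value is a house count.
regions = {
    (True, True, True): 8,
    (True, True, False): 5,
    (True, False, True): 3,
    (True, False, False): 8,
    (False, True, False): 24,
    (False, True, True): 2,
    (False, False, True): 3,
    (False, False, False): 7,
}

total = sum(regions.values())  # should match the 60 advertised houses
with_pool = sum(n for (pool, _, _), n in regions.items() if pool)
with_carpeting = sum(n for (_, _, carpet), n in regions.items() if carpet)

print(total, with_pool, with_carpeting)  # 60 24 16
```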
2-Circle Venn Diagram Example
The probability that a person stopping at a service
station will ask to have his oil checked is 0.28, the
probability that he will ask to have his tire
pressures checked is 0.11, and the probability that
he will ask to have both checked is 0.04. What are
the probabilities that a person stopping at this
service station will ask to have:
a. his oil, his tire pressures, or both checked?
b. neither his oil nor his tire pressures checked?
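Both parts follow from the general addition rule and the complement rule. A minimal sketch, with variable names chosen for this example and exact fractions used to avoid rounding noise:

```python
from fractions import Fraction

p_oil = Fraction(28, 100)
p_tire = Fraction(11, 100)
p_both = Fraction(4, 100)

# a. General addition rule: P(O or T) = P(O) + P(T) - P(O and T)
p_either = p_oil + p_tire - p_both
# b. Complement rule: P(neither) = 1 - P(O or T)
p_neither = 1 - p_either

print(float(p_either), float(p_neither))  # 0.35 0.65
```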
3-Circle Venn Diagram example
In a marketing study made by an SMC researcher, the following data
were obtained from a sample of 100 male beer drinkers:
42 drink Gold Eagle, 68 drink Pale Pilsen, 54 drink Lagerlite, 22
drink both Lagerlite and Gold Eagle, 25 drink both Pale Pilsen and
Gold Eagle, 7 drink Lagerlite and neither Gold Eagle nor Pale Pilsen, 10
drink all three kinds of beer, and 8 do not drink any of the three
beers.
A. Construct a three-circle Venn diagram showing the number of
drinkers in each of the 3 sets.
B. Suppose a drinker is selected at random. Find the probability that:
1. he drinks Pale Pilsen only;
2. he drinks Pale Pilsen and Lagerlite but not
Gold Eagle;
3. he drinks all 3 beers, given that he drinks Pale Pilsen.
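The one overlap the problem does not state, Pale Pilsen with Lagerlite, can be recovered by inclusion-exclusion, after which every region of the diagram follows. The sketch below is one possible way to work it out; the variable names are illustrative and not from the slides.

```python
from fractions import Fraction

total, none = 100, 8
G, P, L = 42, 68, 54       # Gold Eagle, Pale Pilsen, Lagerlite
LG, PG, GPL = 22, 25, 10   # Lagerlite & Gold Eagle, Pale Pilsen & Gold Eagle, all three
L_only_given = 7

# |G or P or L| = G + P + L - LG - PG - PL + GPL, and the union has total - none drinkers.
PL = G + P + L - LG - PG + GPL - (total - none)

P_only = P - PG - PL + GPL
PL_not_G = PL - GPL
L_only = L - LG - PL + GPL  # should reproduce the stated 7

print(PL, P_only, PL_not_G, L_only)   # 35 18 25 7
print(Fraction(P_only, total))        # B1: 9/50
print(Fraction(PL_not_G, total))      # B2: 1/4
print(Fraction(GPL, P))               # B3: 5/34, conditional on drinking Pale Pilsen
```

The check that L_only comes back as 7 confirms that the figures given in the problem are mutually consistent.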
Conditional Probabilities
Formula:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, if $P(B) \ne 0$
or
$P(B \mid A) = \frac{P(B \cap A)}{P(A)}$, if $P(A) \ne 0$
Conditional Probabilities (example1)
A consumer research organization has studied the
service under warranty provided by the 200 tire
dealers in a large city, and its findings are
summarized in the following table:

Dealer        Good Service      Bad Service
              under Warranty    under Warranty
Name Brand    84                36
Off Brand     38                42

A. Construct a joint probability table from the given table.
B. Find the probability of choosing:
1. a N/B dealer who provides good service under warranty;
2. a dealer who provides good service under warranty, given
that he is a N/B dealer;
3. an O/B dealer who gives bad service; and
4. a dealer giving bad service, given that he is an O/B dealer.
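A joint probability table is obtained by dividing each cell by the grand total, and each conditional probability is a joint probability divided by the matching row total. The following sketch assumes the cell layout shown in the table (rows are brands, columns are service quality); the helper names are hypothetical.

```python
from fractions import Fraction

counts = {
    ("N/B", "good"): 84, ("N/B", "bad"): 36,
    ("O/B", "good"): 38, ("O/B", "bad"): 42,
}
total = sum(counts.values())  # 200 dealers

# Joint probability table: each cell divided by the grand total.
joint = {cell: Fraction(n, total) for cell, n in counts.items()}


def marginal_brand(brand):
    """P(brand), the row total of the joint table."""
    return sum(p for (b, _), p in joint.items() if b == brand)


def conditional(service, brand):
    """P(service | brand) = P(brand and service) / P(brand)."""
    return joint[(brand, service)] / marginal_brand(brand)


print(joint[("N/B", "good")])      # B1: 21/50
print(conditional("good", "N/B"))  # B2: 7/10
print(joint[("O/B", "bad")])       # B3: 21/100
print(conditional("bad", "O/B"))   # B4: 21/40
```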
Conditional Probabilities (example 2)
Among a company’s replacement parts for a given
assembly, 20% are defective and the rest are good,
60% were bought from external sources and the rest
were made by the company itself, and of those bought
from external sources 80% are good and the rest are
defective. What’s the probability that a replacement
part, randomly selected from this stock, is:
A. company made and good?
B. either defective or bought?
C. neither company made nor good?
D. bought, given that it is defective?
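One way to organize this problem is to fill in the four joint probabilities first, starting from the multiplication rule for dependent events. The sketch below is a possible worked solution under the figures stated above; the variable names are illustrative.

```python
from fractions import Fraction

p_defective = Fraction(20, 100)
p_bought = Fraction(60, 100)
p_good_given_bought = Fraction(80, 100)

# Multiplication rule: P(bought and good) = P(bought) * P(good | bought)
p_bought_good = p_bought * p_good_given_bought
p_bought_defective = p_bought - p_bought_good
p_made_good = (1 - p_defective) - p_bought_good

a = p_made_good                                  # company made and good
b = p_defective + p_bought - p_bought_defective  # either defective or bought
c = p_bought_defective                           # neither company made nor good
d = p_bought_defective / p_defective             # bought, given defective

print(a, b, c, d)  # 8/25 17/25 3/25 3/5
```

Part C uses the fact that "neither company made nor good" is the same event as "bought and defective".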
Conditional Probabilities (example 3)
Of the many dwellings in a large district of a major
city, 70 percent are single-family and the rest multiple-
family dwellings, 60% were built prior to 1939 and
the rest since then, and of the pre-1939 dwellings 3/4
are single and the rest multiple units. If one dwelling
is selected at random from all the dwellings in this
district, what are the probabilities that this dwelling is:
A. a pre-1939 single one;
B. either a multiple or a pre-1939 one;
C. neither a multiple nor a pre-1939 one; and
D. a multiple one given that it is not pre-1939.
Mathematical Expectation
A labor union wage negotiator feels that the odds
are 3 to 1 that the union members will get a raise of
80 cents in their hourly wage, the odds are 17 to 1
against their getting a raise of 40 cents in their
hourly wage, and the odds are 9 to 1 against their
getting no raise at all.
A. Find the corresponding probabilities that they
will get an 80-cent raise, a 40-cent raise, or no raise at all in their
hourly wage.
B. What is the expected raise in their hourly wage?
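Odds of a to b in favor of an event correspond to a probability of a/(a + b), and odds of a to b against it correspond to b/(a + b). The sketch below applies that conversion to the odds exactly as stated in the problem; the function names are hypothetical. Note that, as stated, the three subjective probabilities add up to 163/180 rather than 1, so the expected value printed here is only illustrative of the method.

```python
from fractions import Fraction


def prob_from_odds_for(a, b):
    """Odds of a to b in favor of an event give probability a / (a + b)."""
    return Fraction(a, a + b)


def prob_from_odds_against(a, b):
    """Odds of a to b against an event give probability b / (a + b)."""
    return Fraction(b, a + b)


# Raise (in currency units per hour) -> probability from the stated odds.
raises = {
    Fraction(80, 100): prob_from_odds_for(3, 1),       # 3 to 1 in favor
    Fraction(40, 100): prob_from_odds_against(17, 1),  # 17 to 1 against
    Fraction(0): prob_from_odds_against(9, 1),         # 9 to 1 against
}

total_probability = sum(raises.values())
expected_raise = sum(amount * p for amount, p in raises.items())

print(list(raises.values()))   # probabilities 3/4, 1/18, 1/10
print(total_probability)       # 163/180
print(float(expected_raise))   # about 0.62
```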
PROBABILITY DISTRIBUTIONS
By:
REBECCA K. CAJUCOM
Random Variables
--quantities which can take on different values
depending on chance.
Ex. 1. Number of tickets issued each day in a
movie house
2. Annual Production of rice in the
Philippines
3. Number of students passing a course
4. Number of mistakes a student makes in a
test
Note: In our study of random variables, we are usually
interested mainly in the probabilities with which they
take on the various values within their range.
Kinds of random variables
Discrete random variables – values expressed as
integers or whole numbers only or observed values at
isolated points along a scale of values.
Example: Number of persons per household, units of
an item in an inventory.
Continuous random variables – variables which can
assume a value at any fractional point along a specified
interval of values.
Example: Weight of each shipment, average number of
persons per household in a large community.
Probability Distributions
(or probability functions)
Mean of a Probability Distribution
$\mu = E(x) = \sum x \cdot p(x)$
Kinds of Probability Distributions
Discrete probability distributions:
a. Binomial Distribution
b. Hypergeometric Distribution
c. Poisson Distribution
d. Negative Binomial Distribution
e. Geometric Distribution
f. Multinomial Distribution
Probability Distribution (example 1)
Probability Distribution (example 2)
Toss a die. (a.) Construct the probability distribution of
the different outcomes of the given experiment. (b.)
Compute the mean and standard deviation of the
probability distribution obtained in (a).
Solution:
Number of points rolled with a die (x)    Probability P(x)
1                                         1/6
2                                         1/6
3                                         1/6
4                                         1/6
5                                         1/6
6                                         1/6
Probability Distribution(example 3)
Toss two coins. a. Construct the
probability distribution of the number
of tails.
b. Also, compute the mean and
variance of the probability distribution
obtained in (a).
Probability Distribution (example 4)
The following table gives the probabilities that a computer
will malfunction 0, 1, 2, 3, 4, 5, or 6 times on any given
day:
No. of malfunctions    Probability
0                      0.15
1                      0.22
2                      0.31
3                      0.18
4                      0.09
5                      0.04
6                      0.01
Calculate the mean and the standard deviation of this
probability distribution.
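The mean of a discrete probability distribution is the probability-weighted sum of its values, and the variance is the probability-weighted sum of squared deviations from that mean. A minimal sketch for the malfunction table above, with illustrative variable names:

```python
from math import isclose, sqrt

# Number of malfunctions -> probability, copied from the table above.
distribution = {0: 0.15, 1: 0.22, 2: 0.31, 3: 0.18, 4: 0.09, 5: 0.04, 6: 0.01}
assert isclose(sum(distribution.values()), 1.0)  # a valid distribution sums to 1

mean = sum(x * p for x, p in distribution.items())
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
std_dev = sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))  # 2.0 1.88 1.3711
```

The same three lines apply unchanged to the die and two-coin examples once their tables are entered as dictionaries.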
DISCRETE PROBABILITY
DISTRIBUTIONS
Binomial Distribution
The Binomial Distribution is a …
Characteristics:
a. There are 2 mutually …
Binomial Distribution
Formulas:
$P(x) = {}_nC_x \, p^x q^{n-x}$
where:
${}_nC_x = \binom{n}{x} = \frac{n!}{(n - x)! \, x!}$ = no. of combinations (2 < n < 15)
(Table of Binomial Coefficient
with p = .05, .1, .2, … .9, .95)
p = probability of success in each trial
n = no. of trials or observations
x = designated no. of successes
q = 1 – p = probability of failure
Mean of a BD: $\mu = np$
Variance of a BD: $\sigma^2 = npq$
B D Example
The probability that a randomly chosen sales
prospect will make a purchase is 0.2. If a
salesman calls on six prospects, what’s the
probability that he will make:
a. exactly 4 sales
b. 4 or more sales
c. at most 2 sales
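With n = 6 prospects and p = 0.2, each part is a sum of binomial probabilities. The sketch below uses Python's math.comb for the number of combinations; binomial_pmf is a helper written for this example.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 6, 0.2
exactly_4 = binomial_pmf(4, n, p)
at_least_4 = sum(binomial_pmf(x, n, p) for x in range(4, n + 1))
at_most_2 = sum(binomial_pmf(x, n, p) for x in range(0, 3))

print(round(exactly_4, 5), round(at_least_4, 5), round(at_most_2, 5))
# 0.01536 0.01696 0.90112
```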
BD as approximation to HD
If, among 16 delivery trucks, five (5) have
worn brakes and ten (10) are chosen at
random for inspection, what is the
probability that at least three (3) trucks with
worn brakes are chosen?
B D Note
In actual practice the BD is often used to
approximate the HD. It is agreed that this
approximation is “safe” as long as n (sample
size) is less than 5% of N (population
size), that is, n < .05 N or n < .05 (a + b).
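To see how much the 5% condition matters, the delivery-truck question can be computed both ways. The sketch below is an illustration only; it uses the hypergeometric probability (a successes, b failures, sample of n without replacement) as the exact answer and a binomial with p = a/(a + b) as the approximation. Here n = 10 is well above 5% of N = 16, so a visible gap between the two results is expected.

```python
from math import comb


def hypergeom_pmf(x, a, b, n):
    """P(x successes) when n items are drawn without replacement from a successes and b failures."""
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)


def binomial_pmf(x, n, p):
    """P(x successes) in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


a, b, n = 5, 11, 10  # 5 trucks with worn brakes, 11 without, 10 inspected

exact = sum(hypergeom_pmf(x, a, b, n) for x in range(3, min(a, n) + 1))
approx = sum(binomial_pmf(x, n, a / (a + b)) for x in range(3, n + 1))
rule_satisfied = n < 0.05 * (a + b)

print(round(exact, 4), round(approx, 4), rule_satisfied)
```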
Hypergeometric Distribution
When sampling is done without
replacement of each sampled item taken
from a finite population of items, the
Bernoulli process does not apply because
there is a systematic change in the
probability of success as items are removed
from the population. Therefore, the
Hypergeometric distribution is the
appropriate discrete probability distribution.
Hypergeometric Distribution
Formula:
$P(x) = \frac{{}_aC_x \cdot {}_bC_{n-x}}{{}_{a+b}C_n}$ for x = 0, 1, 2, …, n
where
n = size of sample (total of designated
successes and failures)
a = number of successes in the population
b = number of failures in the population
x = designated number of successes
H D Example
Of nine employees, three have been with
the company five or more years. If 4
employees are chosen randomly from the
group of 9, what’s the probability that:
a. exactly 2
b. less than 3 employees will have 5 or
more years seniority?
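Here a = 3 employees have five or more years of seniority, b = 6 do not, and n = 4 are chosen without replacement. A short sketch with exact fractions, using a helper named for this example:

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, a, b, n):
    """Exact hypergeometric probability of x successes in a sample of n."""
    return Fraction(comb(a, x) * comb(b, n - x), comb(a + b, n))


a, b, n = 3, 6, 4
exactly_2 = hypergeom_pmf(2, a, b, n)
less_than_3 = sum(hypergeom_pmf(x, a, b, n) for x in range(0, 3))

print(exactly_2, less_than_3)  # 5/14 20/21
```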
Poisson Distribution
The Poisson distribution can be used to determine the
probability of a designated number of successes when
the events occur in a continuum of time or space. Such
a process is called a Poisson process. It is similar to a
Bernoulli process except that the events occur over a
continuum rather than occurring on fixed trials or
observations.
Examples:
Number of complaints received by a telephone
operator, Number of accidents happening in an
intersection, Number of patients entering an ER, etc.
Poisson Distribution
Formulas:
$P(x) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{(np)^x e^{-np}}{x!}$
where
n = number of trials
x = designated number of successes in a
Poisson process
e = constant = 2.71828…
p = probability of success
$\lambda = np = \mu$ = number of expected successes
(or average number of successes)
Mean of a PD:
$E(x) = \lambda = \mu = np$
Variance of a PD:
$\mathrm{Var}(x) = \sigma^2 = \lambda$
P D Example 1
An average of 5 calls for service/ hour are
received by a machine repair department.
What’s the probability that:
a. exactly three calls
b. fewer than 3 calls
for service will be received in a
randomly selected hour?
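With an average of 5 calls per hour, both parts are direct evaluations of the Poisson probability with mean 5. A minimal sketch, with poisson_pmf as an illustrative helper:

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(exactly x events) when the expected number of events is lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 5  # average calls per hour
exactly_3 = poisson_pmf(3, lam)
fewer_than_3 = sum(poisson_pmf(x, lam) for x in range(3))  # x = 0, 1, 2

print(round(exactly_3, 4), round(fewer_than_3, 4))  # 0.1404 0.1247
```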
P D Example 2
On the average, 12 people/hr. ask questions
of a decorating consultant in a fabric store.
What is the probability that three (3) or more people
approach the consultant with questions:
a. during a 10 minute interval?
b. during a 15 minute period?
c. within 2 hours?
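Because the rate is given per hour, the Poisson mean must first be rescaled to each interval (12 per hour is 2 per 10 minutes, 3 per 15 minutes, and 24 per 2 hours). The sketch below then uses the complement, P(X ≥ 3) = 1 − P(X ≤ 2); the function name is hypothetical.

```python
from math import exp, factorial


def prob_at_least(k, lam):
    """P(X >= k) for a Poisson variable with mean lam, via the complement."""
    below_k = sum(lam**x * exp(-lam) / factorial(x) for x in range(k))
    return 1 - below_k


rate_per_hour = 12
results = {}
for minutes in (10, 15, 120):
    lam = rate_per_hour * minutes / 60  # expected arrivals in the interval
    results[minutes] = prob_at_least(3, lam)
    print(minutes, lam, round(results[minutes], 4))
```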
P D Note
When the number of observations or trials n in
a Bernoulli process is large, computations are
quite tedious. Further, tabled probabilities for
very small values of p are not generally
available. Fortunately, the PD is suitable as an
approximation of Binomial Probabilities when n is
large ($n \ge 30$) and p or (1 - p) is very small.
(np < 5 or n (1 - p) < 5)
P D Approximation to B D
Negative Binomial Distribution
If repeated independent trials can result in a
success with probability p and a failure with
probability q = 1 - p, then the probability
distribution of the random variable X, the number of
the trial on which the kth success occurs, is a Negative
Binomial distribution given by:
$b^*(x; k, p) = \binom{x-1}{k-1} p^k q^{x-k}$, for x = k, k + 1, k + 2, …
NBD Example
Geometric Distribution
If repeated independent trials can result in a
success with probability p and a failure with
probability q = 1 - p, then the probability
distribution of the random variable X, the number of
the trial on which the first success occurs, is a
Geometric distribution given by:
$g(x; p) = p \, q^{x-1}$, for x = 1, 2, 3, …
GD Example
Find the probability that a person flipping
a balanced coin requires 4 tosses to get a
head.
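Reading "requires 4 tosses" as "the first head appears on the fourth toss", this is the geometric probability with p = 1/2 and x = 4. A minimal sketch, with geometric_pmf as an illustrative helper:

```python
from fractions import Fraction


def geometric_pmf(x, p):
    """P(first success occurs on trial x) = p * q**(x - 1), with q = 1 - p."""
    return p * (1 - p) ** (x - 1)


print(geometric_pmf(4, Fraction(1, 2)))  # 1/16
```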
Multinomial Distribution
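If a given trial can result in k outcomes E1, E2,
…, Ek, with probabilities p1, p2, …, pk,
then the probability distribution of the
random variables x1, x2, …, xk, representing the
number of occurrences of E1, E2, …, Ek in n
independent trials, is a Multinomial distribution.
Formula:
$P(x_1, x_2, \ldots, x_k; p_1, p_2, \ldots, p_k) = \binom{n}{x_1, x_2, \ldots, x_k} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$
with $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p_i = 1$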
MD Example
CONTINUOUS PROBABILITY
DISTRIBUTIONS
Normal Distribution
The Normal distribution is a continuous probability
distribution which is both symmetric and mesokurtic
and is given by the function:
$y = f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$
where
$\mu$ = population mean
$\sigma$ = population standard deviation
$\sigma^2$ = population variance
e = natural logarithmic constant = 2.71828…
x = random variable
Characteristics of the
Normal Distribution
1. It looks like a Mexican sombrero or it has a bell-shaped curve.
2. Its mean, median, and mode are all equal.
3. It is symmetrical about its center.
4. Figure: [bell curve centered at $\mu$, extending from $-\infty$ to $+\infty$]
5. Although its tails are prolonged indefinitely on both sides, they
will never touch the horizontal axis.
6. The probability (or area) under the curve bounded by the
horizontal axis is always equal to one. In symbols,
$P(-\infty < z < +\infty) = 1$
7. To convert the random variable into its standard units, use
$Z = \frac{X - \mu}{\sigma}$
ND Note
The mean $\mu$ of a standard normal curve
is zero and its variance $\sigma^2$ is equal to one.
Figure of the standard normal curve:
[figure omitted: bell curve with $\mu = 0$ and $\sigma^2 = 1$]
Importance of the Normal Distribution in
Statistical Inference
ND Example 2
Two students were informed that they
received standard scores of 0.80 and
-0.40, respectively, on a multiple-choice
examination in English. If their grades
were 88 and 64, respectively, find the
mean and standard deviation of the
examination grades.
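Each student's standard score gives one equation of the form x = μ + zσ, so the two students together determine μ and σ. The following sketch solves the pair by elimination; the variable names are chosen for this example.

```python
# z = (x - mu) / sigma for each student gives two linear equations:
#   88 = mu + 0.80 * sigma
#   64 = mu - 0.40 * sigma
x1, z1 = 88, 0.80
x2, z2 = 64, -0.40

sigma = (x1 - x2) / (z1 - z2)  # subtracting the equations eliminates mu
mu = x1 - z1 * sigma           # back-substitute into the first equation

print(round(mu, 6), round(sigma, 6))  # mean about 72, standard deviation about 20
```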
AREAS under the Normal Curve
Steps in finding areas under the
Normal Curve
1. Draw the correct figure.
2. Write the area or probability notation.
3. Solve for or find the area using the table of
the areas under the normal curve.
ND Example 3
Find the areas under the normal curve in each of
the cases below:
a. between 0 and 1.23
b. between -0.68 and 0
c. between -0.46 and 2.21
d. between 0.81 and 1.94
e. to the left of -0.68
f. to the right of -2.05 and to the left of -1.44
g. to the left of 1.28
h. to the right of -0.78
i. to the right of 2.18
j. between -2.05 and -1.44
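As a cross-check on values read from the printed table, the areas can also be computed from the standard normal cumulative distribution. The sketch below uses statistics.NormalDist from the Python standard library (Python 3.8 or later); the helper names are illustrative, and results may differ from table values in the fourth decimal place because the table is rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_between(lo, hi):
    return Z.cdf(hi) - Z.cdf(lo)


def area_left(z):
    return Z.cdf(z)


def area_right(z):
    return 1 - Z.cdf(z)


areas = {
    "a": area_between(0, 1.23),
    "b": area_between(-0.68, 0),
    "c": area_between(-0.46, 2.21),
    "d": area_between(0.81, 1.94),
    "e": area_left(-0.68),
    "f": area_between(-2.05, -1.44),  # right of -2.05 and left of -1.44
    "g": area_left(1.28),
    "h": area_right(-0.78),
    "i": area_right(2.18),
    "j": area_between(-2.05, -1.44),
}
for case, area in areas.items():
    print(case, round(area, 4))
```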
ND Example 4
Determine the value/s of z in each of the
cases where area refers to that under the
normal curve:
a. area between 0 and z is 0.3770
b. area to the left of z is 0.8621
c. area between –1.50 and z is 0.0217.
ND Example 5
The grapefruits grown in a large orchard have a mean
weight of 19.3 ounces with a standard deviation of 2.2
ounces. Assuming that the distribution of the weight of these
grapefruits has roughly the shape of a normal distribution,
find:
a. what percentage of the grapefruits weigh:
1. less than 18.0 oz.
2. at least 20.0 oz.
3. between 18.5 and 20.5 oz.
b. the weight below which lies the lightest 15 percent of
the grapefruits,
c. the weight above which lies the heaviest 25 percent of
the grapefruits.
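Parts (a) are areas under a normal curve with mean 19.3 and standard deviation 2.2, while parts (b) and (c) run the lookup in reverse, from an area to a weight. The sketch below is one possible check using statistics.NormalDist; it treats weight as continuous, so "at least 20.0" and "more than 20.0" have the same probability.

```python
from statistics import NormalDist

weights = NormalDist(mu=19.3, sigma=2.2)

# a. Percentages (as proportions) of grapefruits in each weight range.
less_than_18 = weights.cdf(18.0)
at_least_20 = 1 - weights.cdf(20.0)
between_18_5_and_20_5 = weights.cdf(20.5) - weights.cdf(18.5)

# b. Weight below which the lightest 15 percent lie (15th percentile).
lightest_15_cutoff = weights.inv_cdf(0.15)
# c. Weight above which the heaviest 25 percent lie (75th percentile).
heaviest_25_cutoff = weights.inv_cdf(0.75)

print(round(less_than_18, 4), round(at_least_20, 4), round(between_18_5_and_20_5, 4))
print(round(lightest_15_cutoff, 2), round(heaviest_25_cutoff, 2))
```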
Normal Approximations
Rules on correction for Continuity:
In general, when use of the correction for continuity is appropriate, 0.50
is either added or subtracted according to the form of the probability
value required. To convert discrete data into continuous data:
a. Subtract 0.50 from $x_i$ when $P(x \ge x_i)$ is required. => at least $x_i$, or $x_i$ or more
b. Subtract 0.50 from $x_i$ when $P(x < x_i)$ is required. => less than $x_i$
c. Add 0.50 to $x_i$ when $P(x > x_i)$ is required. => more than $x_i$
d. Add 0.50 to $x_i$ when $P(x \le x_i)$ is required. => at most $x_i$, or $x_i$ or less
Normal Approximation to BD
When the no. of observations or trials n is relatively
large, the normal probability distribution can be
used to approximate BD. This is acceptable
whenever $n \ge 30$ and both $np \ge 5$ and $n(1 - p) \ge 5$.
Example:
For a large group of sales prospects, it is known
that 20% of those contacted personally by a
sales representative will make a purchase. If a
sales representative contacts 30 prospects,
determine the probability that 10 or more will
make a purchase.
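For this example n = 30 and p = 0.2, so np = 6 and n(1 − p) = 24. The sketch below is illustrative: it applies the normal approximation with mean np and standard deviation √(npq), subtracting 0.50 from 10 because an "at least" probability is required, and prints the exact binomial value next to it for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.2
mu = n * p                     # mean of the binomial, np
sigma = sqrt(n * p * (1 - p))  # standard deviation, sqrt(npq)

# Continuity correction for P(X >= 10): evaluate the normal curve from 9.5 upward.
approx = 1 - NormalDist(mu, sigma).cdf(10 - 0.5)

# Exact binomial probability of 10 or more purchases.
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(10, n + 1))

print(round(approx, 4), round(exact, 4))
```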
Normal Approximation to PD
When the mean of a PD is relatively large, the
normal probability distribution can be used to
approximate PD. A convenient rule of thumb is
that such approximation is acceptable when $\lambda \ge 10$.
Example:
where
T= variable designated as time.
ED Example 1
An average of 5 calls per hour is received
by a machine repair department. Beginning
at a random point in time, what’s the
probability that the first call for service
will arrive within a half hour?
ED Example 2
Find the probability that a random
variable having an exponential distribution
with $\mu$ = 10 will take on a value
a. between 0 and 4;
b. greater than 6;
c. between 8 and 12.
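Both ED examples can be worked with the standard exponential cumulative probability, P(T ≤ t) = 1 − e^(−λt), where λ is the average number of events per unit of time and the mean waiting time is 1/λ. That formula is an assumption brought in for this sketch, since the formula slide is not part of this section; the function name is hypothetical.

```python
from math import exp


def expon_cdf(t, rate):
    """P(T <= t) for an exponential distribution with the given rate (mean = 1 / rate)."""
    return 1 - exp(-rate * t)


# ED Example 1: 5 calls per hour, first call within half an hour.
first_call_within_half_hour = expon_cdf(0.5, rate=5)

# ED Example 2: mean of 10, so the rate is 1 / 10.
rate = 1 / 10
between_0_and_4 = expon_cdf(4, rate)
greater_than_6 = 1 - expon_cdf(6, rate)
between_8_and_12 = expon_cdf(12, rate) - expon_cdf(8, rate)

print(round(first_call_within_half_hour, 4))
print(round(between_0_and_4, 4), round(greater_than_6, 4), round(between_8_and_12, 4))
```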