Applied Statistics
1. Introduction
For a layman, ‘Statistics’ means numerical information expressed in quantitative terms. This
information may relate to objects, subjects, activities, phenomena, or regions of space. As a matter of
fact, data have no limits as to their reference, coverage, and scope.
Applied Statistics: Descriptive Statistics
Dr. Mahmoud Abd El-Raouf
standards of accuracy, collected in a systematic manner for a pre-determined purpose, and placed in
relation to each other.
3. Types of Data
Statistical data are the basic raw material of statistics. Data may relate to an activity of our
interest, a phenomenon, or a problem situation under study. They arise from the process of
measuring, counting and/or observing. Statistical data, therefore, refer to those aspects of a problem
situation that can be measured, quantified, counted, or classified. In statistics, data are classified into
two broad categories: quantitative data and qualitative data. This classification is based on the kind
of characteristics that are measured.
Quantitative data are those that can be quantified in definite units of measurement. These refer to
characteristics whose successive measurements yield quantifiable observations. Depending on the
nature of the variable observed for measurement, quantitative data can be further categorized as
continuous and discrete data.
Obviously, a variable may be a continuous variable or a discrete variable.
Continuous data represent the numerical values of a continuous variable. A continuous variable
is the one that can assume any value between any two points on a line segment, thus representing
an interval of values. The values are quite precise and close to each other, yet distinguishably
different. Characteristics such as weight, length, height, thickness, velocity, temperature, and
tensile strength are examples of continuous data.
Discrete data are the values assumed by a discrete variable. A discrete variable is the one whose
outcomes are measured in fixed numbers. Such data are essentially count data. These are derived
from a process of counting, such as the number of items possessing or not possessing a certain
characteristic. The number of customers visiting a departmental store every day, the incoming
flights at an airport, and the defective items in a consignment received for sale, are all examples
of discrete data.
Descriptive statistics deals with collecting, summarizing, and simplifying data, which are
otherwise quite unwieldy and voluminous. It seeks to achieve this in such a manner that meaningful
conclusions can be readily drawn from the data. Descriptive statistics may thus be seen as
comprising methods of bringing out and highlighting the latent characteristics present in a set of
numerical data. It not only facilitates an understanding of the data and their systematic reporting,
but also makes them amenable to further discussion, analysis, and interpretation.
A well thought-out and sharp data classification facilitates easy description of the data by means of a
variety of summary measures. These include measures of central tendency, dispersion, skewness,
and kurtosis, which constitute the essential scope of descriptive statistics.
Inferential statistics goes beyond describing a given problem situation by means of collecting,
summarizing, and meaningfully presenting the related data. Instead, it consists of methods that are
used for drawing inferences, or making broad generalizations, about a totality of observations on the
basis of knowledge about a part of that totality. Thus, obtaining a particular value from the sample
information and using it for drawing an inference about the entire population underlies the subject
matter of inferential statistics.
Notes:
(1) The totality of observations about which an inference may be drawn, or a generalization made,
is called a population.
(2) The part of totality, which is observed for data collection and analysis to gain knowledge about
the population, is called a sample.
Inferential statistics helps to evaluate the risks involved in reaching inferences or generalizations
about an unknown population on the basis of sample information. For example, an inspection of a
sample of five battery cells drawn from a given lot may reveal that all five cells are in perfectly
good condition. This information may be used to decide whether or not the entire lot is good
enough to buy.
(i) The planning of operations: This may relate to either special projects or to the recurring
(ii) The setting up of standards: This may relate to the size of employment, volume of sales,
fixation of quality norms for the manufactured product, norms for the daily output, and so
forth.
(iii) The function of control: This involves comparison of actual production achieved against the
norm or target set earlier. In case the production has fallen short of the target, it gives remedial
Statistical Measures
The description of statistical data may be quite elaborate or quite brief depending on two factors: the
nature of data and the purpose for which the same data have been collected.
1) Measures of Central Tendency:
The measures of central tendency enable us to compare two or more distributions pertaining to
the same time period, or the same distribution over time. For example, the
consumption of tea in two different territories for the same period, or in one territory for two years, say,
2003 and 2004, can be compared by means of an average.
MEAN
Adding all the observations and dividing the sum by the number of observations gives the
mean. Symbolically, the mean is
$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$$
It may be noted that the Greek letter μ is used to denote the mean of the population and N to denote
the total number of observations in a population.
Example 1: Calculate the average of the following workers' wages:
15, 18, 28, 39, 56, 66
Solution:
$$\bar{X} = \frac{\sum X}{n} = \frac{15 + 18 + 28 + 39 + 56 + 66}{6} = \frac{222}{6} = 37$$
Example 2: Calculate the mean of the following items:
29, 21, 18, 27, 25, 30, 16
Solution:
$$\bar{X} = \frac{\sum X}{n} = \frac{29 + 21 + 18 + 27 + 25 + 30 + 16}{7} = \frac{166}{7} \approx 23.7143$$
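The mean calculations in the examples above can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean` and the variable names are illustrative choices, not part of the notes.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


wages = [15, 18, 28, 39, 56, 66]        # data of Example 1
items = [29, 21, 18, 27, 25, 30, 16]    # data of Example 2

print(mean(wages))              # 37.0
print(round(mean(items), 4))    # 23.7143
```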
MEDIAN
Median is defined as the value of the middle item (or the mean of the values of the two
middle items) when the data are arranged in ascending or descending order of magnitude. If the
n values are arranged in ascending or descending order of magnitude and n is even, the median is
the mean of the two middle values, the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2} + 1\right)$th.
Suppose we have the following series:
15, 19, 21, 7, 10, 33, 25, 18, 5
We have to first arrange it in either ascending or descending order. These figures are arranged in an
ascending order as follows:
5, 7, 10, 15, 18, 19, 21, 25, 33
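Now, as n = 9 is an odd number, the median is the value of the middle item, which is the
$\left(\frac{n+1}{2}\right)$th item. Here that is the $\left(\frac{9+1}{2}\right)$th = 5th item, so
median = 18
Example 3: Calculate the median of the following items:
15, 18, 28, 39, 56, 66
Solution: The data are in ascending order and n = 6 is even, so the median is the mean of the
3rd and 4th values:
$$\text{median} = \frac{28 + 39}{2} = 33.5$$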
Example 4: Calculate the median of the following items:
16, 30, 25, 27, 18, 21, 29
Solution: Ordered data: 16, 18, 21, 25, 27, 29, 30
As n = 7 is odd, the median is the 4th value:
median = 25
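The odd and even cases of the median rule can be sketched in Python as follows. The function name `median` is an illustrative choice; the only assumption is that the data are sorted first, as the definition requires.

```python
def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th item, counting from 1
        return ordered[mid]
    # n even: mean of the (n / 2)th and (n / 2 + 1)th items, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([15, 19, 21, 7, 10, 33, 25, 18, 5]))   # 18
print(median([15, 18, 28, 39, 56, 66]))             # 33.5
print(median([16, 30, 25, 27, 18, 21, 29]))         # 25
```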
MODE
The mode is another measure of central tendency. It is the value around which the
items are most heavily concentrated, that is, the most frequent value.
Example 5: Calculate the mode of each of the following sets of items:
29, 25, 18, 27, 25, 30, 16
mode = 25 (25 occurs twice; every other value occurs once)
15, 18, 18, 39, 56, 15
mode = {15, 18} (15 and 18 each occur twice)
15, 18, 18, 39, 39, 15
No mode (every value occurs the same number of times).
The mode is the only one of these measures that can take more than one value, and it is possible
for a data set to have no mode.
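The three cases of Example 5 (one mode, several modes, no mode) can be reproduced with a short Python sketch. The function name `modes` is hypothetical, and the rule that a data set has no mode when every distinct value occurs equally often is the convention these notes use, not a universal one.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s) as a sorted list; [] means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Convention assumed from the notes: if all distinct values are equally
    # frequent (and there is more than one of them), there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([29, 25, 18, 27, 25, 30, 16]))   # [25]
print(modes([15, 18, 18, 39, 56, 15]))       # [15, 18]
print(modes([15, 18, 18, 39, 39, 15]))       # []
```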
2) Measures of Dispersion:
It may be noted that the measures of central tendency do not indicate the extent of dispersion or
variability in a distribution. Measuring dispersion takes us one step further in understanding
the pattern of the data. Further, a high degree of uniformity (i.e. a low degree of dispersion) is a
desirable quality.
Averages alone are not sufficient to give a complete description of the data, because they do not
show how homogeneous or how different the values are from each other. For example, consider
the following two sets of data:
A   30   40   55   60   65   80   90
B   55   57   59   60   61   63   65
The mean and the median of each set are 60, yet the two sets differ greatly. The values in
set B are close to each other and not far from the mean or median, whereas the values in set A are
much more dispersed. Accordingly, an accurate description of a data set needs a measure of
dispersion in addition to an average. Three commonly used measures are the range, the variance,
and the standard deviation.
RANGE
The simplest measure of dispersion is the range, which is the difference between the maximum
value and the minimum value of data.
Range = max − min
A: 90 − 30 = 60
B: 65 − 55 = 10
It is clear that group B is less dispersed than group A. In other words, the elements of group B
are more homogeneous with each other than the elements of group A.
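The range comparison of the two groups can be verified with a very small Python sketch. The name `value_range` is an illustrative choice (picked to avoid shadowing Python's built-in `range`).

```python
def value_range(values):
    """Range: the difference between the maximum and the minimum value."""
    return max(values) - min(values)


group_a = [30, 40, 55, 60, 65, 80, 90]
group_b = [55, 57, 59, 60, 61, 63, 65]

print(value_range(group_a))   # 60
print(value_range(group_b))   # 10
```

Because the range uses only the two extreme values, it is quick to compute but ignores how the remaining values are spread, which is why the variance and standard deviation are also used.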
VARIANCE
Variance is the mean squared difference between the elements of a group and the mean of this
group. For a sample of n values, the sum of the squared differences is divided by n − 1:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
          Group A                          Group B
Xi     X − X̄    (X − X̄)²         Xi     X − X̄    (X − X̄)²
30     −30      900              55     −5       25
40     −20      400              57     −3       9
55     −5       25               59     −1       1
60     0        0                60     0        0
65     5        25               61     1        1
80     20       400              63     3        9
90     30       900              65     5        25
∑ 420  0        2650             420    0        70
Substituting the column totals into the variance formula with n = 7 gives
$$S_A^2 = \frac{2650}{7 - 1} \approx 441.67, \qquad S_B^2 = \frac{70}{7 - 1} \approx 11.67$$
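The variance calculation for the two groups can be traced step by step in Python. This is a minimal sketch of the n − 1 formula used here; the function name `sample_variance` is an illustrative choice.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


group_a = [30, 40, 55, 60, 65, 80, 90]
group_b = [55, 57, 59, 60, 61, 63, 65]

print(round(sample_variance(group_a), 2))   # 441.67
print(round(sample_variance(group_b), 2))   # 11.67
```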
STANDARD DEVIATION
The standard deviation is the positive square root of the variance. It measures the typical distance
between the elements of a group and the mean of this group, in the same units as the data.
$$S = \sqrt{S^2}$$
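As an illustration, the standard deviation of the two groups A and B from the dispersion example can be computed by taking the square root of the sample variance. The function name `sample_std` is hypothetical; the cross-check uses `statistics.stdev` from the Python standard library, which also divides by n − 1.

```python
import math
import statistics


def sample_std(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


group_a = [30, 40, 55, 60, 65, 80, 90]
group_b = [55, 57, 59, 60, 61, 63, 65]

print(round(sample_std(group_a), 2))   # 21.02
print(round(sample_std(group_b), 2))   # 3.42

# Cross-check against the standard library implementation.
print(math.isclose(sample_std(group_a), statistics.stdev(group_a)))   # True
```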
Solved Problems
1) Calculate the mean, median and mode of the following data:
i. 18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11
Solution: Ordered data: 10, 11, 12, 13, 15, 15, 15, 16, 17, 18, 18
Mode = 15, Median = 15 and
$$\text{Mean} = \frac{10 + 11 + 12 + 13 + 15 \times 3 + 16 + 17 + 18 \times 2}{11} = \frac{160}{11} \approx 14.55$$
ii. Find the Average, Median, Mode, Range, Variance, and Standard Deviation for the following
data: 4, 7, 9, 12, 15, 20.
Solution: Ordered data: 4, 7, 9, 12, 15, 20
Mode = No mode, Range = 20 − 4 = 16,
$$\text{Median} = \frac{9 + 12}{2} = 10.5, \qquad \bar{X} = \frac{4 + 7 + 9 + 12 + 15 + 20}{6} = \frac{67}{6} \approx 11.17$$
x      x − X̄      (x − X̄)²
4      −43/6      1849/36
7      −25/6      625/36
9      −13/6      169/36
12     5/6        25/36
15     23/6       529/36
20     53/6       2809/36
Σ      0          6006/36 ≈ 166.83
$$v(x) = \frac{166.83}{5} \approx 33.37 \quad \text{and} \quad SD = \sqrt{33.37} \approx 5.78$$
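A worked problem of this kind can be checked with Python's standard `statistics` module. The sketch below assumes the sample (n − 1) definitions of variance and standard deviation used in these notes; the helper name `describe` is hypothetical, and the mode is left out because the data have no repeated value.

```python
import statistics


def describe(values):
    """Collect the summary measures asked for in the problem."""
    ordered = sorted(values)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "range": ordered[-1] - ordered[0],
        "variance": statistics.variance(ordered),   # divisor n - 1
        "sd": statistics.stdev(ordered),            # square root of the variance
    }


summary = describe([4, 7, 9, 12, 15, 20])
for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")
# mean: 11.17
# median: 10.5
# range: 16
# variance: 33.37
# sd: 5.78
```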
$$\bar{X}_A = \frac{1182}{7} \approx 168.86, \qquad \bar{X}_B = \frac{94}{7} \approx 13.43$$
$$v_A(x) = \frac{23978.86}{6} \approx 3996.476, \qquad v_B(x) = \frac{115.71}{6} = 19.285$$
$$SD_A \approx 63.22, \qquad SD_B \approx 4.39$$
Therefore, Group B is more homogeneous than Group A because it has the smaller SD.
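The same decision rule (the group with the smaller standard deviation is the more homogeneous one) can be sketched in Python. The function name `more_homogeneous` is hypothetical, and the two sets A and B from the dispersion section are used here purely as illustrative input.

```python
import statistics


def more_homogeneous(groups):
    """Return the label of the group with the smallest sample standard deviation."""
    return min(groups, key=lambda label: statistics.stdev(groups[label]))


groups = {
    "A": [30, 40, 55, 60, 65, 80, 90],
    "B": [55, 57, 59, 60, 61, 63, 65],
}

for label, values in groups.items():
    print(label, round(statistics.stdev(values), 2))
# A 21.02
# B 3.42

print(more_homogeneous(groups))   # B
```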
Self-Assessment Questions
1. Calculate the Mean, Median and Mode of the following data:
i. 2, 0, 5, 4, 6, 4, 2, 0, 4, 8, 0, 6.
ii. 51, 52, 47, 50, 48, 41, 59, 56, 89.
iii. 1, 2, 4, 5, 1, 2, 5, 7, 0, -1.
iv. 4, 8, 6, 2, 1, 0, -1, 7.
v. 740, 712, 742, 7, 712, 751, 714, 742
vi. 1, 2, 5, 4, 2, 4, 1, 1, 5, 4, 2, 5.
vii. 2, 3, 5, 6, 3, 4, 8, 2, 9, 3, 5, 5, 5, 2, 7.
viii. -1, -2, -3, -9, 0, 4, 9, 7, 5, 6, 4, -1, 0, 2.
ix. -1, 0, 2, -1, 0, 0, 3, 8, 0, 5, -1.
x. 4, 5, 8, 8, 7, 4, 5, 7, 2.
2. Find the Average, Median, Mode, Range, Variance, and Standard Deviation for the
following data:
i. 1, 1, 2, 3, 4, 1, 6, 3, 2, 4, 1, 2
ii. 3, 1, 10, 10, 42, 1, 3, 2, 2, 1, 3, 5, 2, 1.
iii. 3, 1, 2, 3, 4, 1, 2, 3, 5, 7, 6, 2
iv. -1, -4, -3, 1, -4, -4
3. In the following data, which group is more homogeneous? Why?
i. Group A 22 25 29 28 27 22 20
Group B 7 2 8 9 11 15 19
ii. Group A 24 31 35 39 41 24 36
Group B 19 14 13 16 15 15 19