ME 1 ES Descriptive Analysis Updated
What is Statistics?
(1) Collection and Organization of Data:
(i) Graphically: through the use of charts and graphs
(ii) Numerically: through the use of tables of data
(2) Analysis of Data:
Once the data is organized, we can compute various quantities (called statistics or parameters) associated with the data.
(3) Interpretation of Data:
Once we have performed the analysis, we can use the information to make decisions about the aggregate / population.
Statistical techniques are used in production, construction, marketing etc.
Branches of Statistics
(1) Descriptive Statistics: Descriptive statistics includes statistical procedures that we use to describe the population.
The data could be collected from either a sample or a population, but the results help us organize and describe data.
Descriptive statistics can only be used to describe the group that is being studied. Frequency distributions,
measures of central tendency (mean, median, and mode), and graphs like pie charts and bar charts that describe the
data are all examples of descriptive statistics.
(2) Inferential Statistics: Inferential statistics is concerned with making predictions or inferences about a population
from observations and analysis of a sample. Regression analysis, tests of hypotheses, tests of significance, and analysis of variance
are examples of inferential statistics.
Example (1)
Data have been obtained on the lives of batteries of a particular type in an industrial application. The following table shows
the lives of 36 batteries recorded to the nearest tenth of a year.
4.1 5.1 4.4 2.3 4.3 4.8 1.8 3.3 3.0
5.2 4.0 5.6 4.5 3.9 3.7 5.1 5.8 4.3
2.8 4.1 6.1 4.9 3.2 4.6 4.2 4.4 4.7
4.9 3.5 3.7 5.6 5.0 5.5 6.3 4.8 5.1
(i) Construct a frequency distribution with class width 0.5
(ii) Construct a bar chart, frequency histogram, frequency polygon and frequency curve
(iii) Construct a cumulative histogram, cumulative frequency polygon and cumulative frequency curve (ogive)
(DeCoursey-EXCEL-Statistics-and-Probability-for-Engineering-Applications, Example 4.1)
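Part (i) can be sketched in Python as follows. This is only an illustration, not the textbook's worked solution: the first lower class limit of 1.5 years is an assumption (the example fixes only the class width), and the names `START_TENTHS`, `WIDTH_TENTHS` and `frequency_distribution` are hypothetical.

```python
from collections import Counter

# Battery lives in years from Example (1), recorded to the nearest tenth.
lives = [
    4.1, 5.1, 4.4, 2.3, 4.3, 4.8, 1.8, 3.3, 3.0,
    5.2, 4.0, 5.6, 4.5, 3.9, 3.7, 5.1, 5.8, 4.3,
    2.8, 4.1, 6.1, 4.9, 3.2, 4.6, 4.2, 4.4, 4.7,
    4.9, 3.5, 3.7, 5.6, 5.0, 5.5, 6.3, 4.8, 5.1,
]

START_TENTHS = 15  # assumed first lower class limit: 1.5 years
WIDTH_TENTHS = 5   # class width 0.5 years


def frequency_distribution(data):
    """Return {(lower limit, upper limit): frequency} for classes of width 0.5."""
    # Work in integer tenths so float rounding cannot push a value
    # across a class limit.
    tenths = [round(x * 10) for x in data]
    counts = Counter((t - START_TENTHS) // WIDTH_TENTHS for t in tenths)
    table = {}
    for k in range(max(counts) + 1):
        lower = (START_TENTHS + k * WIDTH_TENTHS) / 10
        upper = (START_TENTHS + (k + 1) * WIDTH_TENTHS - 1) / 10
        table[(lower, upper)] = counts.get(k, 0)
    return table


table = frequency_distribution(lives)
cumulative = 0
print("class      f   cum f")
for (lower, upper), freq in table.items():
    cumulative += freq  # running total gives the cumulative frequency (ogive)
    print(f"{lower:.1f}-{upper:.1f}  {freq:3d}  {cumulative:4d}")
```

The frequency column feeds the histogram and frequency polygon in part (ii), and the running total feeds the ogive in part (iii).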
Exercise 1.
The data shown here represent the number of miles per gallon (mpg) that 30 selected four-wheel-drive sports utility
vehicles obtained in city driving. Construct a frequency distribution, and analyze the distribution.
12 17 12 14 16 18 16 18 12 16 17 15 15 16 12
15 16 16 12 14 15 12 15 15 19 13 16 18 16 14
Source: Model Year Fuel Economy Guide. United States Environmental Protection Agency.
The complete ungrouped frequency distribution is
Exercise 2.
The Bureau of Labor Statistics has sampled 30 communities nationwide and compiled prices in each community at the
beginning and end of August in order to find out approximately how the Consumer Price Index has changed during August. The
percentage changes in prices for the 30 communities are as follows:
0.7 0.4 0.3 0.2 0.1 0.1 0.3 0.7 0.0 0.4
0.1 0.5 0.2 0.3 1.0 0.3 0.0 0.2 0.5 0.1
0.5 0.3 0.1 0.5 0.4 0.0 0.2 0.3 0.5 0.4
Using the following four equal sized classes, starting from the minimum value as lower class limit..
(Ref. Ex. 2.19, “Statistics for Management”, 7th Ed., by Levin and Rubin)
Exercise 3.
In a group of 500 wage-earners, the weekly wages of 4% were under Rs. 60 and those of 15% were under Rs. 62.50. 15% of
the workers earned Rs. 95 and over, and 5% of them got Rs. 100 and over. The median and quartile wages were Rs. 82.25, Rs. 72.75
and Rs. 90.50; the 4th and 6th decile wages were Rs. 78.75 and Rs. 85.25 respectively. Put the above information in the form of a
frequency distribution and estimate the mean wage of the 500 wage-earners therefrom.
Averages
The following average measures are also called the central measures
(1) Arithmetic Mean
(2) Geometric Mean
(3) Harmonic Mean
(4) Weighted Mean
Arithmetic Mean
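The arithmetic mean, or simply the mean, is the most familiar average. It is defined as the sum of the observations divided by their number:
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$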
Exercise
Find the arithmetic mean, geometric mean and harmonic mean of the series
(i) 1, 2, 4, 8, 16, …, $2^n$
(ii) 1, 3, 9, 27, 81, …, $3^n$. (Sher)
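A small numerical sketch can help check closed-form answers to this exercise for a particular n. It assumes the standard definitions (geometric mean as the n-th root of the product, harmonic mean as the reciprocal of the mean reciprocal); the function names and the choice n = 4 are hypothetical.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # exp of the mean logarithm equals the n-th root of the product
    # and avoids overflow when many terms are multiplied.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # reciprocal of the arithmetic mean of the reciprocals
    return len(xs) / sum(1 / x for x in xs)


n = 4  # hypothetical choice of n
series = [2 ** k for k in range(n + 1)]  # 1, 2, 4, ..., 2^n
print(arithmetic_mean(series), geometric_mean(series), harmonic_mean(series))
```

For positive values that are not all equal, the three results should come out in the order H.M < G.M < A.M, which is a quick sanity check on any hand derivation.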
The Weighted Mean
The weighted mean enables us to calculate an average that takes into account the importance of each value to the overall total.
The weighted mean of a variable X is obtained by multiplying each value by its corresponding weight and dividing the sum of the
products by the sum of the weights.
The formula for calculating the weighted average is
$$ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{x_1 w_1 + x_2 w_2 + \dots + x_n w_n}{w_1 + w_2 + \dots + w_n} $$
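The weighted-mean formula can be sketched directly in Python. The function name `weighted_mean` and the sample values and weights are hypothetical, chosen only to illustrate the calculation.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical data: three values with weights 1, 2 and 3.
values = [10, 20, 30]
weights = [1, 2, 3]
print(weighted_mean(values, weights))
```

With equal weights the function reduces to the ordinary arithmetic mean, and percentage weights (such as 20%, 30%, 50%) can be passed as given because the formula divides by their sum.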
Exercises
(1) The following are the monthly salaries in rupees of 30 employees of a firm:
139 126 114 100 88 62 77 99 103 108
144 129 148 63 69 148 132 118 142 116
123 104 95 80 85 106 123 140 134 133
The firm gave bonuses of Rs. 10, 15, 20, 25, 30 and 35 to individuals in the respective salary groups: exceeding 60 but not
exceeding 75, exceeding 75 but not exceeding 90, and so on up to exceeding 135 but not exceeding 150. Find the average
bonus paid per employee.
Example (2)
Dave’s Giveaway Store advertises, “If our average prices are not equal or lower than everyone else’s, you get it free”. One
of Dave’s customers came into the store one day and threw on the counter bills of sale for six items she bought from a competitor
for an average price less than Dave’s. (“Statistics for Management”, 7th Ed., by Richard Levin and David Rubin, Chap. 3)
Exercise ( Bluman )
1. Find the weighted mean price of three models of automobiles sold. The number and price of each
model sold are shown in this list.
2. Using the weighted mean, find the average number of grams of fat per ounce of meat or fish that a person
would consume over a 5-day period if he ate these:
Meat or Fish Fat (g/oz)
3 oz fried shrimp 3.33
3 oz veal cutlet (broiled) 3.00
2 oz roast beef (lean) 2.50
2.5 oz fried chicken drumstick 4.40
4 oz tuna (canned in oil) 1.75
Source:- The World Almanac and Book of Facts
3. A recent survey of a new diet cola reported the following percentages of people who liked the taste. Find the
weighted mean of the percentages.
4. The costs of three models of helicopters are shown below. Find the weighted mean of the costs of the models
5. An instructor grades exams, 20%; term paper, 30%; final exam, 50%. A student had grades of 83, 72, and 90,
respectively, for exams, term paper, and final exam. Find the student’s final average. Use the weighted mean.
6. Another instructor gives four 1-hour exams and one final exam, which counts as two 1-hour exams. Find
student’s grade if she received 62, 83, 97, and 90 on the 1-hour exams and 82 on the final exam.
Harmonic Mean
Harmonic mean is used to calculate the average value when the values are expressed as value/unit. Since the speed is
expressed as km/hour, harmonic mean is used for the calculation of average speed.
Harmonic Mean is defined as the reciprocal of the arithmetic mean of reciprocals of the observations.
Harmonic Mean for Ungrouped data;
Let x1, x2, ..., xn be the n observations then the harmonic mean is defined as
$$ H.M = \text{Reciprocal of } \left( \frac{\sum (1/x_i)}{n} \right) = \frac{n}{\sum (1/x_i)} $$
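The link between the harmonic mean and average speed can be checked numerically. The sketch below assumes equal distances covered at each speed, which is the situation in which the harmonic mean gives the average speed; the leg lengths and speeds are hypothetical.

```python
def harmonic_mean(xs):
    """n divided by the sum of reciprocals; all values must be positive."""
    if any(x <= 0 for x in xs):
        raise ValueError("harmonic mean needs positive values")
    return len(xs) / sum(1 / x for x in xs)


# Hypothetical trip: two legs of 60 km each, driven at 30 and 60 km/hr.
legs_km = [60, 60]
speeds = [30, 60]

total_time = sum(d / v for d, v in zip(legs_km, speeds))  # hours on the road
average_speed = sum(legs_km) / total_time                 # total distance / total time

print(average_speed, harmonic_mean(speeds))
```

Both printed numbers agree, whereas the arithmetic mean of the two speeds would overstate the average because more time is spent on the slower leg.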
Example
A man travels from Lahore to Islamabad by a car and takes 4 hours to cover the whole distance. In the first hour he
travels at a speed of 50 km/hr, in the second hour his speed is 64 km/hr, in third hour his speed is 80 km/hr and in the fourth hour
he travels at the speed of 55 km/hr. Find the average speed of the motorist.
Harmonic Mean for grouped data;
Let x1, x2, ..., xn be the n observations with corresponding frequencies f1, f2, …, fn , then the harmonic mean is defined as
$$ H.M = \text{Reciprocal of } \left( \frac{\sum f_i (1/x_i)}{\sum f_i} \right) = \frac{\sum f_i}{\sum f_i (1/x_i)} $$
1. Descriptive Statistics (5)
Example
The following data is obtained from the survey. Compute H.M
Speed of the Car 130 135 140 145 150
No. of Cars 2 4 8 9 2
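A minimal sketch of the grouped harmonic mean for the speed table above. The function name `grouped_harmonic_mean` is hypothetical, and the code simply applies the grouped formula (sum of frequencies divided by the sum of frequency over value).

```python
def grouped_harmonic_mean(values, frequencies):
    """Sum of f divided by the sum of f / x."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    return sum(frequencies) / sum(f / x for x, f in zip(values, frequencies))


speeds = [130, 135, 140, 145, 150]  # speed of the car
cars = [2, 4, 8, 9, 2]              # number of cars at each speed

print(round(grouped_harmonic_mean(speeds, cars), 2))
```

Setting every frequency to 1 recovers the ungrouped harmonic mean, which is a convenient check on the implementation.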
(i) motion in case of a person who rides the first mile at the rate of 10 miles an hour, the next mile at the rate
of 8 miles per hour and the third mile at the rate of 6 miles per hour.
(ii) Increase in the population, which in the first decade has increased 20%, in the next 25% and in the third
44%. (Problem 4-108 “Elementary Statistics” by Bluman, chapter 3, page 122 )
(3) A salesperson drives 300 miles round trip at 30 miles per hour going to Chicago and 45 miles per hour returning home.
Find the average miles per hour.
(4) A bus driver drives 50 miles to West Chester at 40 miles per hour and returned driving 25 miles per hour. Find the average
miles per hour.
(5) A carpenter buys $500 worth of nails at $50 per pound and $500 worth of nails at $10 per pound. Find the average cost of
1 pound of nails.
Measures of Location (median, mode, quartiles, deciles and percentiles)
Median
Arrange the n observations in ascending order.
If n is odd, then the median is the middle value.
If n is even, the median is the average of the two middle values.
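The odd and even cases of the median rule can be sketched as below. The function name `median` and the two short data lists are hypothetical.

```python
def median(data):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # n odd: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average of the two middle values


print(median([7, 1, 5]))     # three values, odd case
print(median([4, 1, 3, 2]))  # four values, even case
```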
Mode
The mode is the value that is repeated most often in the data set.
e.g. the ages in years of the cars worked on by the Village Auto Haus last week:
5 6 3 6 11 7 9 10 2 4 10 6 2 1 5. The mode in this case is 6.
Example (3)
A computing student received the following grades in subjects of his first semester 2007:
Y = [6; 7; 6; 8; 5; 7; 6; 9; 10; 6]: the mode is 6 (unimodal)
1, 2, 3, 4, 5, 6, 6, 7, 7: the modes are 6 and 7 (bimodal)
2, 3, 4, 2, 3, 4, 7, 8: the modes are 2, 3 and 4 (multimodal)
2, 3, 4, 5, 6, 7, 8: no mode
2, 2, 3, 3, 4, 4, 5, 5: no mode
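The five cases listed in Example (3) can be reproduced with a short sketch. It assumes the convention used in those cases, namely that data in which every value occurs equally often has no mode; the function names `modes` and `describe` are hypothetical.

```python
from collections import Counter


def modes(data):
    """Most frequent values, or [] when all values occur equally often."""
    if not data:
        raise ValueError("mode of empty data is undefined")
    counts = Counter(data)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every value equally frequent: no mode
    return sorted(v for v, c in counts.items() if c == top)


def describe(data):
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    found = modes(data)
    return found, labels.get(len(found), "multimodal")


print(describe([6, 7, 6, 8, 5, 7, 6, 9, 10, 6]))
print(describe([1, 2, 3, 4, 5, 6, 6, 7, 7]))
print(describe([2, 3, 4, 2, 3, 4, 7, 8]))
print(describe([2, 2, 3, 3, 4, 4, 5, 5]))
```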
Exercise 5.
A semi-commercial test plant produced the following daily outputs in tonnes/day:
1.3 2.5 1.8 1.4 3.2 1.9 1.3 2.8 1.1 1.7
1.4 3.0 1.6 1.2 2.3 2.9 1.1 1.7 2.0 1.4
Find the mode.
(Ref. DeCoursey, Chap. 4)
Other Measures of Location
The other measures of location we will discuss here are quartiles, deciles and percentiles.
Quartiles
Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3. Note that Q1 is the same as the 25th percentile;
Q2 is the same as the 50th percentile, or the median; Q3 corresponds to the 75th percentile, as shown:
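For Q1 we check whether $\frac{n}{4}$ is an integer or a non-integer (likewise $\frac{2n}{4}$ for Q2 and $\frac{3n}{4}$ for Q3).
If $\frac{n}{4}$ is not an integer, then Q1 = the $\left( \left[ \frac{n}{4} \right] + 1 \right)$th item in the data, where $[\,\cdot\,]$ denotes the integer part, let us say [7.25] = 7.
If $\frac{n}{4}$ is an integer, then Q1 = average of the $\frac{n}{4}$th and $\left( \frac{n}{4} + 1 \right)$th items.
Similarly, for Q2 and Q3 we check whether $\frac{2n}{4}$ and $\frac{3n}{4}$ respectively are integers or non-integers, then we find the values of Q2 and
Q3 the same way as we did in the case of Q1.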
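The position rule for quartiles can be sketched as a single function. Because the same integer / non-integer check is used for deciles and percentiles, the sketch takes the number of parts as a parameter (4, 10 or 100). The function name `position_measure` and the sample data are hypothetical, and the data is sorted first on the assumption that "item" refers to position in the ordered data.

```python
def position_measure(data, k, parts):
    """k-th dividing point when the ordered data is split into `parts` groups.

    parts=4 gives quartiles, parts=10 deciles, parts=100 percentiles.
    """
    if not 0 < k < parts:
        raise ValueError("k must satisfy 0 < k < parts")
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("data must not be empty")
    whole, remainder = divmod(k * n, parts)
    if remainder:
        # k*n/parts is not an integer: take item number [k*n/parts] + 1
        # (1-based), which is index `whole` in 0-based Python indexing.
        return ordered[whole]
    # k*n/parts is an integer: average that item and the next one.
    return (ordered[whole - 1] + ordered[whole]) / 2


sample = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data with n = 8
print([position_measure(sample, k, 4) for k in (1, 2, 3)])  # Q1, Q2, Q3
```

Calling `position_measure(sample, 7, 10)` or `position_measure(sample, 27, 100)` applies the same check for a decile or a percentile.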
Deciles
Deciles divide the distribution into 10 groups, as shown. They are denoted by D1, D2, etc.
For D7 we check whether $\frac{7n}{10}$ is an integer or a non-integer.
If $\frac{7n}{10}$ is not an integer, then D7 = the $\left( \left[ \frac{7n}{10} \right] + 1 \right)$th item in the data.
If $\frac{7n}{10}$ is an integer, then D7 = average of the $\frac{7n}{10}$th and $\left( \frac{7n}{10} + 1 \right)$th items.
Similarly, for D2 and D3 we check whether $\frac{2n}{10}$ and $\frac{3n}{10}$ respectively are integers or non-integers, then we find the values of D2 and
D3 the same way as we did in the case of D7.
Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group.
Percentiles divide the data set into 100 equal groups and are symbolized by
P1, P2, P3, . . . , P99.
For instance, for P27 we check whether $\frac{27n}{100}$ is an integer or a non-integer.
If $\frac{27n}{100}$ is not an integer, then P27 = the $\left( \left[ \frac{27n}{100} \right] + 1 \right)$th item in the data.
If $\frac{27n}{100}$ is an integer, then P27 = average of the $\frac{27n}{100}$th and $\left( \frac{27n}{100} + 1 \right)$th items.
Similarly, for P25 and P30 we check whether $\frac{25n}{100}$ and $\frac{30n}{100}$ respectively are integers or non-integers, then we find the
values of P25 and P30 the same way as we did in the case of P27.
Note that
Dispersion Measures
Properties of Variance
( 1 ) $Var(a) = 0$, where a is a constant
( 2 ) $Var(X + a) = Var(X) = \sigma^2$
( 3 ) $Var(aX) = a^2 \, Var(X)$
( 4 ) $Var(X \pm Y) = Var(X) + Var(Y)$, when X and Y are independent
( 5 ) Let $\bar{x}_1$ and $s_1^2$ be the mean and variance of $n_1$ observations and $\bar{x}_2$ and $s_2^2$ be the mean and variance of $n_2$ observations
($n_1$ and $n_2$ are sufficiently large). Then the variance $S^2$ of the combined $n_1 + n_2$ observations, with combined mean $\bar{x}$, is
$$ S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 (\bar{x}_1 - \bar{x})^2}{n_1 + n_2} + \frac{n_2 (\bar{x}_2 - \bar{x})^2}{n_1 + n_2} $$
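Properties (1) to (3) can be checked numerically with a short sketch. It uses the population variance (divisor n) from Python's `statistics` module; the observations and the constant `a` are hypothetical.

```python
from statistics import pvariance

x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations
a = 3.0                             # hypothetical constant

constant = [a] * len(x)        # a data set in which every value equals a
shifted = [v + a for v in x]   # X + a
scaled = [a * v for v in x]    # aX

print(pvariance(constant))                      # property (1)
print(pvariance(shifted), pvariance(x))         # property (2): both values agree
print(pvariance(scaled), a ** 2 * pvariance(x)) # property (3): both values agree
```

Adding a constant moves every observation and the mean by the same amount, so the deviations from the mean are unchanged; multiplying by a constant scales every deviation by a, hence the squared deviations by a squared.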
Example (4)
The breaking strength of test pieces of a certain alloy is given as under
X: 95 103 97 130 96 73 78 95 89 68
82 79 69 67 83 108 94 87 93 117
Calculate the average breaking strength of the alloy and the standard deviation.
  X       X^2     |X - X̄|    (X - X̄)^2
  67      4489     23.15      535.92
  68      4624     22.15      490.62
  69      4761     21.15      447.32
  73      5329     17.15      294.12
  78      6084     12.15      147.62
  79      6241     11.15      124.32
  82      6724      8.15      66.423
  83      6889      7.15      51.123
  87      7569      3.15      9.9225
  89      7921      1.15      1.3225
  93      8649      2.85      8.1225
  94      8836      3.85      14.823
  95      9025      4.85      23.522
  95      9025      4.85      23.522
  96      9216      5.85      34.222
  97      9409      6.85      46.922
 103     10609     12.85      165.12
 108     11664     17.85      318.62
 117     13689     26.85      720.92
 130     16900     39.85      1588
Total:
1803    167653    253         5112.6

$$ \text{Mean} = \bar{X} = \frac{\sum X}{n} = \frac{1803}{20} = 90.15 \qquad \left( \text{remember } \left( \sum X \right)^2 \neq \sum X^2 \right) $$
$$ \sigma = \sqrt{\frac{\sum X^2}{n} - \left( \frac{\sum X}{n} \right)^2} = \sqrt{\frac{167653}{20} - \left( \frac{1803}{20} \right)^2} = \sqrt{8382.65 - 8127.0225} = \sqrt{255.6275} = 15.99 $$
$$ \text{Mean Deviation} = M.D = \frac{\sum |x_i - \bar{x}|}{n} = \frac{253}{20} = \; ? \qquad \text{remember } \frac{\sum (x_i - \bar{x})}{n} = 0 $$
Because the sum of the deviations from the mean value is always zero.
$$ \text{Variance} = \sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{5112.6}{20} = \; ? $$
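The quantities in Example (4) can be recomputed with a few lines of Python as a check on the hand calculation. This is a sketch using the divisor n, as in the example; the variable names are hypothetical.

```python
import math

# Breaking strengths of the 20 test pieces from Example (4).
strengths = [95, 103, 97, 130, 96, 73, 78, 95, 89, 68,
             82, 79, 69, 67, 83, 108, 94, 87, 93, 117]

n = len(strengths)
mean = sum(strengths) / n

# Shortcut form: mean of the squares minus the square of the mean.
variance_shortcut = sum(x * x for x in strengths) / n - mean ** 2
# Definition form: mean squared deviation from the mean.
variance_direct = sum((x - mean) ** 2 for x in strengths) / n

std_dev = math.sqrt(variance_direct)
mean_deviation = sum(abs(x - mean) for x in strengths) / n
sum_of_deviations = sum(x - mean for x in strengths)  # zero up to rounding

print(mean, variance_direct, std_dev, mean_deviation, sum_of_deviations)
```

The two variance forms agree up to floating-point rounding, which confirms that the shortcut formula and the definition describe the same quantity.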
Example (5)
The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the
commissions is $5225, and the standard deviation is $773. Compare the variations of the two.
Solution
The coefficients of variation are
$$ C.V = \frac{s}{\bar{x}} \times 100 = \frac{5}{87} \times 100 = 5.7\% \quad \text{(sales)} $$
$$ C.V = \frac{s}{\bar{x}} \times 100 = \frac{773}{5225} \times 100 = 14.8\% \quad \text{(commissions)} $$
Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.
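The comparison in Example (5) can be sketched as a small function. The name `coefficient_of_variation` is hypothetical; the inputs are the standard deviations and means given in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100


sales_cv = coefficient_of_variation(5, 87)
commissions_cv = coefficient_of_variation(773, 5225)

print(f"sales: {sales_cv:.1f}%  commissions: {commissions_cv:.1f}%")
```

Because the result is unit-free, it allows a count of cars to be compared with a dollar amount, which the raw standard deviations do not.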
Example (6)
Heights of 18-year-old males have a bell-shaped distribution with mean 69.6 inches and standard deviation 1.4 inches.
(a) About what proportion of all such men are between 68.2 and 71 inches tall?
(b) What interval centered on the mean should contain about 95% of all such men?
Solution (a)
$\bar{x} - ks = 68.2 \Rightarrow 69.6 - k(1.4) = 68.2 \Rightarrow k = 1$
$\bar{x} + ks = 71 \Rightarrow 69.6 + k(1.4) = 71 \Rightarrow k = 1$
The interval therefore extends one standard deviation on either side of the mean $\bar{x}$, so by the Empirical Rule it contains approximately 68% of the data.
Solution (b)
By the Empirical Rule, the shortest interval containing about 95% of the data is $\bar{x} \pm 2s$, which runs from
$\bar{x} - 2s = 69.6 - 2(1.4) = 66.8$
to
$\bar{x} + 2s = 69.6 + 2(1.4) = 72.4$
So this interval, (66.8, 72.4), contains approximately 95% of the data values.
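The two parts of Example (6) can be sketched with a small helper. The function name `empirical_interval` is hypothetical, and the 99.7% figure for three standard deviations is the usual third line of the Empirical Rule, added here as an assumption since the example uses only 68% and 95%.

```python
# Approximate coverage for bell-shaped data within k standard deviations.
EMPIRICAL_COVERAGE = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_interval(mean, std_dev, k):
    """Return (lower, upper, approximate coverage) for mean +/- k * std_dev."""
    if k not in EMPIRICAL_COVERAGE:
        raise ValueError("the Empirical Rule is stated for k = 1, 2 or 3")
    return mean - k * std_dev, mean + k * std_dev, EMPIRICAL_COVERAGE[k]


for k in (1, 2):
    lower, upper, coverage = empirical_interval(69.6, 1.4, k)
    print(f"k={k}: ({lower:.1f}, {upper:.1f}) holds about {coverage:.0%}")
```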
Example (7)
A sample of size n = 50 has mean x = 28 and standard deviation s = 3. Without knowing anything else about the sample,
what can be said about the number of observations that lie in the interval (22, 34)? What can be said about the number of
observations that lie outside that interval?
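Here we are given $\bar{x} = 28$ and $s = 3$, so
$\bar{x} - ks = 22 \Rightarrow k = 2$
$\bar{x} + ks = 34 \Rightarrow k = 2$
Then, by Chebyshev's theorem, at least $(1 - 1/k^2) \times 100\% = (1 - 1/4) \times 100\% = 75\%$ of the 50 values, that is at least 37.5 and hence at least 38 values, are contained in the given interval.
Therefore at most 12 observations fall outside the interval.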
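The reasoning in Example (7) can be sketched in code. The function name `chebyshev_fraction` is hypothetical, and the sketch assumes the interval is symmetric about the mean with k greater than 1, as Chebyshev's theorem requires.

```python
import math


def chebyshev_fraction(mean, std_dev, low, high):
    """Minimum fraction of values inside (low, high), an interval centred on the mean."""
    if not math.isclose(mean - low, high - mean):
        raise ValueError("interval must be symmetric about the mean")
    k = (high - mean) / std_dev
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2


n = 50
fraction = chebyshev_fraction(28, 3, 22, 34)
at_least_inside = math.ceil(fraction * n)  # a count must be a whole number
at_most_outside = n - at_least_inside

print(fraction, at_least_inside, at_most_outside)
```

Note that the theorem gives a lower bound inside the interval and therefore an upper bound outside it; the true counts depend on the sample.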
Exercise
The mean of a distribution is 20 and the standard deviation is 2. Use Chebyshev’s theorem.
a. At least what percentage of the values will fall between 10 and 30?
b. At least what percentage of the values will fall between 12 and 28? (Bluman ch. 3)
Exercise
The Energy Information Administration reported that the mean retail price per gallon of regular grade gasoline was $2.30
(Energy Information Administration, February 27, 2006). Suppose that the standard deviation was $.10 and that the retail price per
gallon has a bell shaped distribution.
a. What percentage of regular grade gasoline sold for between $2.20 and $2.40 per gallon?
b. What percentage of regular grade gasoline sold for between $2.20 and $2.50 per gallon?
c. What percentage of regular grade gasoline sold for more than $2.50 per gallon?
(Prob. 3.30, Sweeney, Chap. 3)
If the frequency curve of a distribution has a longer tail to the right of the central maximum
than to the left, the distribution is said to be skewed to the right, or to have positive
skewness. In a positively skewed distribution, the mean exceeds the mode.
If the frequency curve of a distribution has a longer tail to the left of the central maximum
than to the right, the distribution is said to be skewed to the left, or to have negative
skewness. In a negatively skewed distribution, the mean is smaller than the mode.
Kurtosis
Kurtosis is the degree of peakedness of a distribution. A distribution with a relatively high peak is called leptokurtic,
whereas a flat-topped distribution is called platykurtic. A frequency curve which is neither very highly peaked nor very flat-topped
is called mesokurtic, as is the case for the normal curve of a normal distribution.
Exercise 14.
Let $\bar{x}_1$ and $s_1^2$ be the mean and variance of $n_1$ observations and $\bar{x}_2$ and $s_2^2$ be the mean and variance of $n_2$ observations
($n_1$ and $n_2$ are sufficiently large). Prove that the variance $S^2$ of the combined $n_1 + n_2$ observations is
$$ S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2} $$
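Solution
Group I : $x_{11}, x_{12}, x_{13}, x_{14}, \dots, x_{1 n_1}$, with mean $\bar{x}_1$
Group II : $x_{21}, x_{22}, x_{23}, x_{24}, \dots, x_{2 n_2}$, with mean $\bar{x}_2$
and let $\bar{x}$ be the combined mean of both data sets.
Let $x_{ij}$ = jth observation of the ith subgroup, $i = 1, 2$ and $j = 1, 2, \dots, n_i$.
$$ S^2 = \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 \qquad \text{where } \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} $$
$$ = \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} \left[ (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x}) \right]^2 $$
$$ = \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} \left[ (x_{ij} - \bar{x}_i)^2 + (\bar{x}_i - \bar{x})^2 + 2 (x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x}) \right] $$
$$ = \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 + \frac{2}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x}) $$
Since $\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) = 0$, the last term vanishes, therefore
$$ S^2 = \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 $$
$$ = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2}{n_1 + n_2} \qquad \text{(A)} $$
where
$$ \bar{x}_1 - \bar{x} = \bar{x}_1 - \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_1 - n_1 \bar{x}_1 - n_2 \bar{x}_2}{n_1 + n_2} = \frac{n_2 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2} $$
Similarly
$$ \bar{x}_2 - \bar{x} = \bar{x}_2 - \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{n_1 \bar{x}_2 + n_2 \bar{x}_2 - n_1 \bar{x}_1 - n_2 \bar{x}_2}{n_1 + n_2} = \frac{-n_1 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2} $$
From (A)
$$ S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{1}{n_1 + n_2} \left[ n_1 \left( \frac{n_2 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2} \right)^2 + n_2 \left( \frac{-n_1 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2} \right)^2 \right] $$
$$ = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{1}{n_1 + n_2} \left[ \frac{n_1 n_2^2}{(n_1 + n_2)^2} (\bar{x}_1 - \bar{x}_2)^2 + \frac{n_1^2 n_2}{(n_1 + n_2)^2} (\bar{x}_1 - \bar{x}_2)^2 \right] $$
$$ = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{1}{n_1 + n_2} \cdot \frac{n_1 n_2 (n_1 + n_2)}{(n_1 + n_2)^2} (\bar{x}_1 - \bar{x}_2)^2 $$
$$ = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2} $$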
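The combined-variance result can be checked numerically. The sketch below compares the variance of the pooled data with the value given by the formula, using the divisor n throughout; the two groups of observations are hypothetical.

```python
from statistics import fmean, pvariance

group1 = [2.0, 4.0, 4.0, 5.0]                 # hypothetical group I
group2 = [7.0, 9.0, 10.0, 11.0, 13.0, 14.0]   # hypothetical group II

n1, n2 = len(group1), len(group2)
m1, m2 = fmean(group1), fmean(group2)
v1, v2 = pvariance(group1), pvariance(group2)  # variances with divisor n

# Variance computed directly from the pooled observations.
direct = pvariance(group1 + group2)

# Variance from the formula: within-group part plus between-group part.
formula = (n1 * v1 + n2 * v2) / (n1 + n2) + n1 * n2 * (m1 - m2) ** 2 / (n1 + n2) ** 2

print(direct, formula)
```

The first term is the weighted average of the two group variances; the second term grows with the gap between the group means, so pooling two groups with different means always yields a variance larger than that weighted average.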