UNIT I & II QA
a) Calculate the sample mean using the coding method with 0 assigned to the fourth class.
b) Repeat part (a) with 0 assigned to the sixth class.
c) Explain why your answers in parts (a) and (b) are the same.
PRACTICE SUMS
2) The frequency distribution below represents the time in seconds needed to serve a sample
of customers by cashiers at BullsEye Discount Store in December 1996.
TIME (in seconds) FREQUENCY
20-29 6
30-39 16
40-49 21
50-59 29
60-69 25
70-79 22
80-89 11
90-99 7
100-109 4
110-119 0
120-129 2
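The coding (step-deviation) method asked about in parts (a) to (c) can be sketched on the BullsEye table as follows. This is a minimal illustration, assuming integer class limits so that the class mark of 20-29 is 24.5, and equal class widths of 10. The helper names `coded_mean` and `direct_mean` are hypothetical.

```python
# Coding (step-deviation) method on the BullsEye service-time table.
# ASSUMPTION: integer class limits, so 20-29 has class mark (20 + 29) / 2 = 24.5.
LOWER_LIMITS = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
FREQUENCIES = [6, 16, 21, 29, 25, 22, 11, 7, 4, 0, 2]
WIDTH = 10  # class interval c

MARKS = [(lo + lo + WIDTH - 1) / 2 for lo in LOWER_LIMITS]


def direct_mean(marks, freqs):
    """Mean = sum(f * x) / N."""
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)


def coded_mean(marks, freqs, origin_index, width):
    """Mean = A + c * sum(f * u) / N, where u = (x - A) / c and class A gets code 0."""
    a = marks[origin_index]
    # With equal class widths the codes are simply ..., -2, -1, 0, 1, 2, ...
    codes = [i - origin_index for i in range(len(marks))]
    total_fu = sum(f * u for f, u in zip(freqs, codes))
    return a + width * total_fu / sum(freqs)


mean_4th = coded_mean(MARKS, FREQUENCIES, origin_index=3, width=WIDTH)  # 0 at 4th class
mean_6th = coded_mean(MARKS, FREQUENCIES, origin_index=5, width=WIDTH)  # 0 at 6th class
print(round(mean_4th, 2), round(mean_6th, 2), round(direct_mean(MARKS, FREQUENCIES), 2))
```

All three values come out at about 61.00 seconds. Moving the zero from the 4th to the 6th class raises A by 2c but lowers every code by 2, so Σfu falls by 2N and the correction term c·Σfu/N falls by exactly 2c. The two changes cancel, which is why the choice of origin does not affect the mean.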
3) The following information pertains to the daily income of 150 families.
Calculate the arithmetic mean.
Income (in Rs) Number of families
More than 75 150
More than 85 140
More than 95 115
More than 105 95
More than 115 70
More than 125 60
More than 135 40
More than 145 25
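A "more than" cumulative table has to be converted back into class frequencies before the mean can be taken. The sketch below does this by differencing consecutive counts. It assumes classes of width 10 (75-85, 85-95, ...) and, because the table is open-ended, it closes the last class as 145-155; that last choice is an assumption, not part of the question.

```python
# "More than" cumulative table: (lower limit, families earning more than it).
MORE_THAN = [(75, 150), (85, 140), (95, 115), (105, 95),
             (115, 70), (125, 60), (135, 40), (145, 25)]
WIDTH = 10  # ASSUMPTION: every class, including the open last one, is 10 wide (145-155).


def class_frequencies(more_than):
    """Frequency of each class = difference of consecutive 'more than' counts."""
    counts = [c for _, c in more_than]
    freqs = [a - b for a, b in zip(counts, counts[1:])]
    freqs.append(counts[-1])  # everyone above the last limit falls in the last class
    return freqs


def grouped_mean(more_than, width):
    """Mean = sum(f * x) / N, with x the midpoint of each class."""
    freqs = class_frequencies(more_than)
    mids = [lo + width / 2 for lo, _ in more_than]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


freqs = class_frequencies(MORE_THAN)
mean_income = grouped_mean(MORE_THAN, WIDTH)
print(freqs, round(mean_income, 2))  # [10, 25, 20, 25, 10, 20, 15, 25] 116.33
```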
4) The size of land holdings of 380 families in a village is given below.
Find the median size of land holdings.
Size of Land Holdings (in acres) Number of families
Less than 100 40
100–200 89
200–300 148
300–400 64
400 and above 39
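The grouped-data median formula, Median = L + (N/2 − cf)/f × h, can be sketched as below. As an assumption for illustration, "Less than 100" is read as 0-100 and "400 and above" as 400-500; neither choice affects the answer here because the median falls in the 200-300 class.

```python
# Land-holding table (acres).
# ASSUMPTION: "Less than 100" -> 0-100 and "400 and above" -> 400-500.
CLASSES = [(0, 100), (100, 200), (200, 300), (300, 400), (400, 500)]
FAMILIES = [40, 89, 148, 64, 39]


def grouped_median(classes, freqs):
    """Median = L + (N/2 - cf) / f * h for the class containing the N/2-th value."""
    half = sum(freqs) / 2
    cumulative = 0  # cf: cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")


median_size = grouped_median(CLASSES, FAMILIES)
print(round(median_size, 2))  # 241.22
```

Here N/2 = 190, the cumulative frequency before the 200-300 class is 129, so the median is 200 + (61/148) × 100, roughly 241.22 acres.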
5) The following table gives the rice production yield (in kg per hectare)
of 200 farms in a village. Calculate the mean, median and mode.
Production yield (kg per hectare) Number of farms
40-45 13
45-50 18
50-55 14
55-60 30
60-65 36
65-70 38
70-75 26
75-80 20
80-85 5
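A sketch of the three grouped-data formulas applied to the rice-yield table: the mean from class midpoints, the median from L + (N/2 − cf)/f × h, and the mode from L + (f1 − f0)/(2f1 − f0 − f2) × h. The helper names are hypothetical; the mode formula shown is the usual interpolation formula for grouped data.

```python
# Rice yield table: classes 40-45, 45-50, ..., 80-85 (kg per hectare).
CLASSES = [(lo, lo + 5) for lo in range(40, 85, 5)]
FARMS = [13, 18, 14, 30, 36, 38, 26, 20, 5]


def grouped_mean(classes, freqs):
    """Mean = sum(f * x) / N, with x the midpoint of each class."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


def grouped_median(classes, freqs):
    """Median = L + (N/2 - cf) / f * h for the class containing the N/2-th value."""
    half = sum(freqs) / 2
    cumulative = 0
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")


def grouped_mode(classes, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # frequency of preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # frequency of following class
    lower, upper = classes[k]
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


mean_yield = grouped_mean(CLASSES, FARMS)
median_yield = grouped_median(CLASSES, FARMS)
mode_yield = grouped_mode(CLASSES, FARMS)
print(round(mean_yield, 2), round(median_yield, 2), round(mode_yield, 2))
```

This gives a mean of about 62.65, a median of about 63.47 and a mode of about 65.71 kg per hectare.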
MEASURES OF VARIABILITY
• Variability in statistics means the deviation of scores in a group or series from their mean
score.
• It actually refers to the spread of scores in the group in relation to the mean.
• It is also known as dispersion.
• For instance, in a group of 10 participants who have scored differently on a
mathematics test, each individual varies from the other in terms of the marks that
he/she has scored.
• These variations can be measured with measures of variability, which quantify the
dispersion of the individual values around the average value or average score.
• Variability or dispersion also means the scatter of the values in a group.
• High variability in the distribution means that scores are widely spread and are not
homogeneous.
• Low variability means that the scores are similar and homogeneous and are close to the mean.
CENTRAL TENDENCY
Measures of central tendency are numbers used to describe the central (typical) value of a data set.
Measures of central tendency include:
•Mean
•Median
•Mode
MEASURE OF VARIABILITY OR DISPERSION
Measures of dispersion are used to quantify the variability (spread) of the data.
Parameters used as measures of dispersion include:
•Range
•Variance
•Standard Deviation
•Mean Deviation
•Quartile Deviation
FUNCTIONS OF VARIABILITY
The major functions of dispersion or variability are as follows:
• It is used for calculating other statistics such as analysis of variance, degree
of correlation, regression etc.
• It is also used for comparing the variability of data on variables such as
socio-economic status, income, education etc.
• It helps to find out whether the average (mean, median or mode) worked out is reliable.
If the variation is small, the average calculated can be considered reliable; if the
variation is too large, the average may be misleading.
• Dispersion gives us an idea if the variability is adversely affecting the data
and thus helps in controlling the variability.
ABSOLUTE DISPERSION AND RELATIVE DISPERSION
• Absolute dispersion usually refers to the standard deviation, a measure of variation from
the mean. The units of standard deviation are the same as those of the data. In other words,
an absolute measure is expressed in the original units of a distribution. Therefore,
absolute dispersion is not suitable for comparing the variability of two distributions when
the two variables are measured in different units. For instance, the variability in body
height (cm) and body weight (kg) cannot be compared directly because the absolute measure
(standard deviation) is expressed in cm for one and in kg for the other. The absolute measure
is also not appropriate for comparing two sets of scores expressed in the same units whose
means (central values) differ widely. Nevertheless, absolute measures are widely used,
except in cases like those above. The absolute measures include range, quartile deviation,
mean deviation, standard deviation and variance.
• Relative dispersion, sometimes called the coefficient of variation, is the result of
dividing the standard deviation by the mean; it may be presented as a quotient or as a
percentage. Thus, relative measures are computed from the absolute measures of
dispersion and their corresponding central values. A low value of relative dispersion
implies that the standard deviation is small in comparison with the magnitude of the mean.
TYPES OF MEASURES OF DISPERSION OR VARIABILITY
1) Range
2) Quartile Deviation
3) Average Deviation or Mean Deviation
4) Standard Deviation
5) Variance
Range and quartile deviation measure dispersion by computing the
spread within which the values fall, whereas average deviation and
standard deviation compute the extent to which the values differ from
the average.
RANGE AND QUARTILE DEVIATION
• The Range (R) can be defined as the difference between the highest and
lowest scores in the distribution.
Range = Highest Score – Lowest Score (R = H – L)
• Most values in the data lie in the middle of the frequency distribution, while the
range depends only on the extremes (outliers) of a distribution, so we need
another measure of variability. The quartile deviation is a measure that depends
on the relatively stable central portion of a distribution.
• Inter Quartile Range (IQR): The range computed for the middle 50% of the
distribution is the interquartile range. It is computed from the upper quartile (Q3)
and the lower quartile (Q1) as IQR = Q3 – Q1. The IQR is not affected by extreme
values.
• Semi-Interquartile Range (SIQR) or Quartile Deviation (QD): Half of
the IQR is called the semi-interquartile range. SIQR is also called the quartile
deviation or QD. Thus, QD is computed as QD = (Q3 – Q1)/2
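A minimal sketch of range, IQR and quartile deviation, QD = (Q3 − Q1)/2, on a small set of hypothetical scores. Quartile conventions differ between textbooks. This sketch assumes the (n + 1)/4 position rule, which is what Python's `statistics.quantiles` uses with `method="exclusive"`.

```python
import statistics

SCORES = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, for illustration only


def range_iqr_qd(values):
    """Return (range, interquartile range, quartile deviation)."""
    # "exclusive" places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4.
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    r = max(values) - min(values)  # R = H - L
    iqr = q3 - q1                  # IQR = Q3 - Q1
    return r, iqr, iqr / 2         # QD = (Q3 - Q1) / 2


r, iqr, qd = range_iqr_qd(SCORES)
print(r, iqr, qd)  # 7 2.5 1.25
```

Here Q1 = 4.0 and Q3 = 6.5. Changing the single highest score to 90 would change the range a great deal but leave the IQR and QD untouched, which is the stability property described above.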
MEAN DEVIATION & STANDARD DEVIATION
• The two measures of variation, range and quartile deviation, do not show how the
values of the data are scattered about a central value. R and QD measure the spread
of the values, not how far the values are from their average. To measure variation as
the degree to which values within the data deviate from their mean, we use the average
(mean) deviation, MD = Σ|X – M| / N.
• The term standard deviation was first used in writing by Karl Pearson in 1894.
The standard deviation of a population is denoted by ‘σ’ (Greek letter sigma) and
that of a sample by ‘s’. A useful property of SD is that, unlike the variance, it is
expressed in the same unit as the data. It is the most widely used measure of
variability. The standard deviation indicates the typical distance of the
scores from the mean. It is the positive square root of the mean of the squared
deviations of all the scores from the mean. SD is an absolute measure of
dispersion and it is the most stable and reliable measure of variability. Standard
deviation shows how much variation there is from the mean. SD is calculated as
σ = √[Σ(X – M)² / N], where X is a score, M is the mean and N is the number of scores.
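The mean deviation and standard deviation defined above can be sketched as follows, using hypothetical scores. σ divides by N, matching the definition as the root of the mean of squared deviations. For comparison, the sample SD `s` is taken from `statistics.stdev`, which divides by n − 1.

```python
import math
import statistics

SCORES = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, for illustration only


def mean_deviation(values):
    """MD = sum(|X - M|) / N."""
    m = statistics.fmean(values)
    return sum(abs(x - m) for x in values) / len(values)


def population_sd(values):
    """sigma = sqrt(sum((X - M)^2) / N)."""
    m = statistics.fmean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))


md = mean_deviation(SCORES)
sigma = population_sd(SCORES)
s = statistics.stdev(SCORES)  # sample SD, divides by n - 1
print(md, sigma, round(s, 3))  # 1.5 2.0 2.138
```

The variance is the square of these values: σ² = 4.0 for this data.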
VARIANCE
• The term variance was introduced by R.A. Fisher in 1918 to describe the square of the
standard deviation.
• Calculating the variance is an important part of many statistical applications and
analysis. It is a good absolute measure of variability and is useful in computation of
Analysis of Variance (ANOVA) to find out the significance of differences between
sample means.
• The relative measure corresponding to SD is the coefficient of variation. It is a
relative measure of dispersion developed by Karl Pearson. When we want to compare
the variations (dispersion) of two different series, relative measures of standard
deviation must be calculated. This is known as co-efficient of variation or the co-
efficient of SD. It is defined as the SD expressed as a percentage of the mean.
• The formula for computing the coefficient of variation is as follows:
V = 100 × σ / M
Where, V = Coefficient of variation, σ = Standard deviation, M = Mean
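A short sketch of V = 100 × σ / M comparing the height (cm) and weight (kg) example from the discussion of relative dispersion. The two data sets are hypothetical. σ is the population standard deviation.

```python
import statistics

HEIGHTS_CM = [150, 160, 170, 180, 190]  # hypothetical data
WEIGHTS_KG = [50, 60, 70, 80, 90]       # hypothetical data


def coefficient_of_variation(values):
    """V = 100 * sigma / M, using the population standard deviation."""
    return 100 * statistics.pstdev(values) / statistics.fmean(values)


cv_h = coefficient_of_variation(HEIGHTS_CM)
cv_w = coefficient_of_variation(WEIGHTS_KG)
print(round(cv_h, 1), round(cv_w, 1))  # 8.3 20.2
```

Both series have the same absolute spread (σ ≈ 14.14, once in cm and once in kg), so the standard deviations alone cannot be compared meaningfully. Expressed relative to their means, however, the weights are clearly the more variable series.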
PRACTICE SUMS
1) The amount of rainfall on 6 days in a particular season is given as 17.8
cm, 19.2 cm, 16.3 cm, 12.5 cm, 12.8 cm and 11.4 cm. Find the
range and standard deviation.
2) 48 students were asked to write the total number of hours per week they spent
watching television. From this information, find the standard deviation of the hours
spent watching television.
x (hours per week) 6 7 8 9 10 11 12
f (number of students) 3 6 9 13 8 5 4
3) Marks of the students in a particular subject of a class are given below. Find its
variance and standard deviation.
Marks 0-10 10-20 20-30 30-40 40-50 50-60 60-70
No. of students 8 12 17 14 9 7 4
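The three practice sums above differ only in how the data are presented: raw values, a discrete frequency table, and a grouped table. One frequency-weighted helper can therefore cover all three. This sketch assumes the population formula σ = √[Σf(X − M)² / N], consistent with the definition of SD given earlier, and uses class midpoints to stand in for grouped data. The helper name `weighted_stats` is hypothetical.

```python
import math


def weighted_stats(values, freqs):
    """Mean, population variance and SD for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, var, math.sqrt(var)


# 1) Rainfall: raw data, so every value has frequency 1.
rain = [17.8, 19.2, 16.3, 12.5, 12.8, 11.4]
rain_range = max(rain) - min(rain)
rain_mean, rain_var, rain_sd = weighted_stats(rain, [1] * len(rain))

# 2) Hours of television: discrete frequency distribution.
tv_mean, tv_var, tv_sd = weighted_stats([6, 7, 8, 9, 10, 11, 12], [3, 6, 9, 13, 8, 5, 4])

# 3) Marks: grouped data, class midpoints 5, 15, ..., 65 stand in for each class.
marks_mid = [5, 15, 25, 35, 45, 55, 65]
marks_mean, marks_var, marks_sd = weighted_stats(marks_mid, [8, 12, 17, 14, 9, 7, 4])

print(round(rain_range, 1), round(rain_sd, 2))
print(tv_mean, round(tv_sd, 2))
print(round(marks_var, 2), round(marks_sd, 2))
```

This gives a range of about 7.8 cm and σ ≈ 2.92 cm for the rainfall, a mean of 9 hours and σ ≈ 1.61 hours for the television data, and a variance of about 277.92 with σ ≈ 16.67 marks for the grouped data. If a sample SD is wanted instead, divide by N − 1 rather than N.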
SPEARMAN’S RANK COEFFICIENT OF CORRELATION
• When quantifying variables is difficult, as with beauty, leadership ability or a
person’s knowledge, the method of rank correlation developed by British psychologist
Charles Edward Spearman in 1904 is useful. In this method, ranks are allotted to each
element in either ascending or descending order. The correlation coefficient between
these two series of allotted ranks is popularly called “Spearman’s Rank Correlation” and
is denoted by “R”.
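• To find the correlation under this method, the following formula is used:
R = 1 – 6Σd² / [n(n² – 1)]
where d is the difference between the two ranks given to each item and n is the number of pairs of ranks.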
2. Use the rank correlation coefficient to determine which pair of judges has the nearest approach to common
tastes in beauty.
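Spearman's formula for ranks without ties, R = 1 − 6Σd² / [n(n² − 1)], can be applied to every pair of judges, and the pair with the highest R is the one with the closest tastes. The ranks below are hypothetical stand-ins, because the judges' data is not reproduced here. Tied ranks would need the usual correction, which this sketch does not handle.

```python
from itertools import combinations


def spearman_r(ranks_a, ranks_b):
    """R = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid only when there are no tied ranks."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical ranks given by three judges to six contestants.
JUDGES = {
    "Judge 1": [1, 2, 3, 4, 5, 6],
    "Judge 2": [2, 1, 4, 3, 6, 5],
    "Judge 3": [6, 5, 4, 3, 2, 1],
}

# R for every pair of judges; the largest R marks the closest agreement.
scores = {(a, b): spearman_r(JUDGES[a], JUDGES[b]) for a, b in combinations(JUDGES, 2)}
best_pair = max(scores, key=scores.get)
for pair, r in scores.items():
    print(pair, round(r, 3))
print("Closest tastes:", best_pair)
```

With these ranks, Judges 1 and 2 agree most closely (R ≈ 0.829), while Judges 1 and 3 rank in exactly opposite order (R = −1).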
PROBABILITY
• Probability is a mathematical term used to describe the likelihood of something happening. It helps us
understand and predict outcomes. We generally use probability to understand the world around us and to judge what
is likely to happen and what isn’t.
• Real life can be chaotic, and lots of things happen that don’t seem to make any sense. You can’t always know
what’s going to happen. But you can do your best, with the help of mathematics, to predict what is going to happen
so you can make sound decisions every day.
• Probability can describe the likelihood of everything from a coin flip to the weather. We express mathematical
probability as fractions or percentages. Once you know the probability of something, you can usually classify it as follows.
• It’s certain. (This is a probability of 100 percent, which is the highest possible likelihood of something happening)
• It’s likely. (Probability is between 50 percent and 100 percent)
• There’s an even chance. (There’s a 50 percent probability of it going either way)
• It’s unlikely. (The probability is between zero and 50 percent)
• It’s impossible. (The probability is zero)
BAYES’ THEOREM
• Bayes' Theorem, named after 18th-century British mathematician
Thomas Bayes, is a mathematical formula for determining conditional
probability. Conditional probability is the likelihood of an outcome
occurring, based on a previous outcome having occurred in similar
circumstances.
• Bayes' Theorem states that P(A ∣ B) = P(B ∣ A) × P(A) / P(B), provided P(B) > 0.
P(A ∣ B) is the conditional probability of event A occurring, given that
B is true. P(B ∣ A) is the conditional probability of event B occurring,
given that A is true. P(A) and P(B) are the unconditional (marginal) probabilities
of A and B, each considered without regard to the other.
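A small sketch of Bayes' Theorem, P(A ∣ B) = P(B ∣ A) × P(A) / P(B), where P(B) is obtained from the law of total probability, P(B) = P(B ∣ A)P(A) + P(B ∣ not A)P(not A). The screening-test numbers are hypothetical and chosen only to illustrate the calculation.

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) from the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical screening test: A = "has the condition", B = "test is positive".
# 1% prevalence, 95% of true cases test positive, 5% false-positive rate.
posterior = bayes_posterior(p_b_given_a=0.95, p_a=0.01, p_b_given_not_a=0.05)
print(round(posterior, 3))
```

The result is about 0.161. Although the test is positive 95% of the time when the condition is present, only about 16% of positive results come from people who actually have it, because the condition is rare to begin with.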
NORMAL PROBABILITY DISTRIBUTION SUMS
1) Most graduate schools of business require applicants for admission to take the
Graduate Management Admission Council’s GMAT examination. Scores on the
GMAT are roughly normally distributed with a mean of 527 and a standard
deviation of 112. What is the probability of an individual scoring above 500 on the
GMAT?
2) The length of human pregnancies from conception to birth approximates a
normal distribution with a mean of 266 days and a standard deviation of 16 days.
What proportion of all pregnancies will last between 240 and 270 days (roughly
between 8 and 9 months)?
3) The average number of acres burned by forest and range fires in a large New
Mexico county is 4,300 acres per year, with a standard deviation of 750 acres. The
distribution of the number of acres burned is normal. What is the probability that
between 2,500 and 4,200 acres will be burned in any given year?
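The three normal-distribution sums can be checked with Python's `statistics.NormalDist`, whose `cdf` gives P(X ≤ x). This is equivalent to converting to z = (x − μ)/σ and reading a standard normal table. Table-based answers may differ slightly in the third decimal place because z is usually rounded to two decimals.

```python
from statistics import NormalDist

# 1) GMAT scores: P(X > 500)
gmat = NormalDist(mu=527, sigma=112)
p_gmat = 1 - gmat.cdf(500)

# 2) Pregnancy length: P(240 < X < 270)
pregnancy = NormalDist(mu=266, sigma=16)
p_preg = pregnancy.cdf(270) - pregnancy.cdf(240)

# 3) Acres burned: P(2500 < X < 4200)
fires = NormalDist(mu=4300, sigma=750)
p_fires = fires.cdf(4200) - fires.cdf(2500)

print(round(p_gmat, 4), round(p_preg, 4), round(p_fires, 4))
```

The probabilities come out at roughly 0.595, 0.547 and 0.439 respectively.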
POISSON DISTRIBUTION SUMS
1) Suppose it has been observed that, on average, 180 cars per hour pass a
specified point on a particular road in the morning rush hour. Due to
impending roadworks it is estimated that congestion will occur closer to
the city centre if more than 5 cars pass the point in any one minute.
What is the probability of congestion occurring?
2) The number of failures occurring in a machine of a certain type in a
year has a Poisson distribution with mean 0.4. In a factory there are ten
of these machines. What is
(a) the expected total number of failures in the factory in a year?
(b) the probability that there are fewer than two failures in the factory in a
year?
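A sketch of both Poisson sums using P(X = k) = e^(−λ) λ^k / k!. For the first, 180 cars per hour is rescaled to λ = 3 cars per minute. For the second, the ten machines are assumed to fail independently, so the factory total is Poisson with mean 10 × 0.4 = 4.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam^k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k)"""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


# 1) 180 cars per hour -> mean of 3 cars per minute; congestion if more than 5.
lam_cars = 180 / 60
p_congestion = 1 - poisson_cdf(5, lam_cars)

# 2) Ten machines with mean 0.4 failures each per year.
# ASSUMPTION: machines fail independently, so the total is Poisson with mean 4.
expected_failures = 10 * 0.4
p_fewer_than_two = poisson_cdf(1, expected_failures)

print(round(p_congestion, 4), expected_failures, round(p_fewer_than_two, 4))
```

This gives a congestion probability of about 0.0839, an expected total of 4 failures per year, and P(fewer than two failures) = 5e⁻⁴ ≈ 0.0916.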