UNIT I & II QA


MEASURES OF CENTRAL TENDENCY

• Central tendency is the middle point of a distribution.


• Measures of central tendency are also called measures of location.
• The measures of central tendency include:
A) Mean – Arithmetic, Weighted, Geometric mean
B) Median
C) Mode
THE ARITHMETIC MEAN FORMULAS
• Most of the time, when we refer to the “average” of something, we are talking about the arithmetic mean.
• For example: the average winter temperature in New York City, the average life of a flashlight battery, and the average corn yield from an acre of land.
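The formulas for this slide did not survive extraction. As a minimal sketch, assuming the usual textbook definitions (simple mean Σx/n, weighted mean Σwx/Σw, and the geometric mean as the n-th root of the product of n positive values), the three kinds of mean listed above can be computed as follows. The function names and sample values are illustrative only.

```python
import math

def arithmetic_mean(values):
    """Simple arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def geometric_mean(values):
    """Geometric mean: n-th root of the product, computed through logarithms
    to avoid overflow. All values must be positive."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

if __name__ == "__main__":
    print(arithmetic_mean([2, 4, 6, 8]))     # 5.0
    print(weighted_mean([70, 80], [1, 3]))   # 77.5
    print(geometric_mean([2, 8]))            # approximately 4.0
```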
PRACTICE SUMS
1) The frequency distribution below represents the weights in pounds of a sample of packages carried last
month by a small airfreight company.
CLASS FREQUENCY
10.0 – 10.9 1
11.0 – 11.9 4
12.0 – 12.9 6
13.0 – 13.9 8
14.0 – 14.9 12
15.0 – 15.9 11
16.0 – 16.9 8
17.0 – 17.9 7
18.0 – 18.9 6
19.0 – 19.9 2

a) Calculate the sample mean using the coding method, with 0 assigned to the fourth class.
b) Repeat part (a) with 0 assigned to the sixth class.
c) Explain why your answers in parts (a) and (b) are the same.
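A sketch of the coding method for Practice Sum 1, assuming midpoints are taken halfway between the stated class limits (10.45, 11.45, ...) and a constant class width of 1 pound. The coded mean is x̄ = x₀ + w·Σ(u·f)/n, where x₀ is the midpoint of the class coded 0, w is the class width and u is the code. Moving the origin from the fourth class to the sixth raises x₀ by 2w but lowers every code u by 2, so w·Σ(u·f)/n drops by exactly 2w; that is why parts (a) and (b) agree.

```python
def coded_mean(lower_limits, upper_limits, freqs, origin_index):
    """Grouped-data mean by the coding method.

    The class at origin_index gets code u = 0, the classes below it
    u = -1, -2, ... and the classes above it u = +1, +2, ...
    Assumes a constant class width.
    """
    midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]
    width = lower_limits[1] - lower_limits[0]
    x0 = midpoints[origin_index]
    n = sum(freqs)
    sum_uf = sum((i - origin_index) * f for i, f in enumerate(freqs))
    return x0 + width * sum_uf / n

lower = [10.0 + k for k in range(10)]
upper = [10.9 + k for k in range(10)]
freq = [1, 4, 6, 8, 12, 11, 8, 7, 6, 2]

mean_a = coded_mean(lower, upper, freq, origin_index=3)   # 0 at the 4th class
mean_b = coded_mean(lower, upper, freq, origin_index=5)   # 0 at the 6th class
print(round(mean_a, 4), round(mean_b, 4))                 # both about 15.1577
```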
PRACTICE SUMS
2) The frequency distribution below represents the time in seconds needed to serve a sample
of customers by cashiers at BullsEye Discount Store in December 1996.
TIME (in seconds) FREQUENCY
20-29 6
30-39 16
40-49 21
50-59 29
60-69 25
70-79 22
80-89 11
90-99 7
100-109 4
110-119 0
120-129 2
PRACTICE SUMS
3) The following information pertains to the daily income of 150 families.
Calculate the arithmetic mean.
Income (in Rs) Number of families
More than 75 150
More than 85 140
More than 95 115
More than 105 95
More than 115 70
More than 125 60
More than 135 40
More than 145 25
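A "more than" table is cumulative, so each class frequency is the difference between consecutive counts. This sketch assumes the open-ended last class (more than 145) is closed at 155, i.e. it has the same width of 10 as the others; that width is an assumption, and a different choice would change the mean.

```python
def more_than_to_classes(lower_bounds, more_than_counts, last_width):
    """Convert a 'more than' cumulative table into class midpoints and frequencies.

    Frequency of class [b_i, b_(i+1)) = count(> b_i) - count(> b_(i+1)).
    The open-ended last class is closed using last_width (an assumption).
    """
    freqs = [more_than_counts[i] - more_than_counts[i + 1]
             for i in range(len(more_than_counts) - 1)]
    freqs.append(more_than_counts[-1])
    uppers = list(lower_bounds[1:]) + [lower_bounds[-1] + last_width]
    midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_bounds, uppers)]
    return midpoints, freqs

bounds = [75, 85, 95, 105, 115, 125, 135, 145]
counts = [150, 140, 115, 95, 70, 60, 40, 25]
mids, freqs = more_than_to_classes(bounds, counts, last_width=10)
mean = sum(m * f for m, f in zip(mids, freqs)) / sum(freqs)
print(freqs)            # [10, 25, 20, 25, 10, 20, 15, 25]
print(round(mean, 2))   # 116.33
```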
PRACTICE SUMS
4) The size of land holdings of 380 families in a village is given below.
Find the median size of land holdings.
Size of Land Holdings (in acres) Number of families
Less than 100 40
100 – 200 89
200 – 300 148
300 – 400 64
400 and above 39
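A sketch of the grouped-data median, Median = L + (N/2 − cf)/f × h, where L is the lower limit of the median class, cf the cumulative frequency before it, f its frequency and h its width. The open-ended classes "less than 100" and "400 and above" are given nominal limits here (an assumption); they do not affect the answer because the median falls in the 200–300 class.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of grouped data: L + (N/2 - cf) / f * h."""
    n = sum(freqs)
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:          # first class reaching N/2
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")

# Nominal limits 0 and 400 for the open-ended classes (assumption)
lower = [0, 100, 200, 300, 400]
freq = [40, 89, 148, 64, 39]
print(round(grouped_median(lower, 100, freq), 2))   # 241.22 acres
```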
PRACTICE SUMS
5) The following table gives the production yield of rice (kg per hectare) on 200 farms in a village. Calculate the mean, median and mode values.
Production yield (kg per hectare) Number of farms
40-45 13
45-50 18
50-55 14
55-60 30
60-65 36
65-70 38
70-75 26
75-80 20
80-85 5
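The same interpolation ideas give all three values for Practice Sum 5. This sketch assumes the standard grouped-data formulas: the mean from class midpoints, the median L + (N/2 − cf)/f × h, and the mode L + (f₁ − f₀)/(2f₁ − f₀ − f₂) × h, where f₁ is the modal-class frequency and f₀, f₂ are the frequencies of the classes just before and after it.

```python
def grouped_mean_median_mode(lower_limits, width, freqs):
    """Mean, median and mode of grouped data with equal class widths."""
    n = sum(freqs)
    mids = [lo + width / 2 for lo in lower_limits]
    mean = sum(m * f for m, f in zip(mids, freqs)) / n

    # Median: first class whose cumulative frequency reaches N/2
    median = None
    cumulative = 0
    for lo, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:
            median = lo + (n / 2 - cumulative) / f * width
            break
        cumulative += f

    # Mode: interpolate inside the class with the highest frequency
    k = freqs.index(max(freqs))
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    mode = lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width
    return mean, median, mode

lower = list(range(40, 85, 5))                 # 40, 45, ..., 80
freq = [13, 18, 14, 30, 36, 38, 26, 20, 5]
mean, median, mode = grouped_mean_median_mode(lower, 5, freq)
print(round(mean, 2), round(median, 2), round(mode, 2))   # 62.65 63.47 65.71
```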
MEASURES OF VARIABILITY
• Variability in statistics means the deviation of the scores in a group or series from their mean score.
• It refers to the spread of the scores in the group in relation to the mean.
• It is also known as dispersion.
• For instance, in a group of 10 participants who have scored differently on a mathematics test, each individual differs from the others in the marks that he/she has scored.
• These variations can be measured with the help of measures of variability, which measure the dispersion of the different values from the average value or average score.
• Variability or dispersion also means the scatter of the values in a group.
• High variability in the distribution means that the scores are widely spread and are not homogeneous.
• Low variability means that the scores are similar and homogeneous.
CENTRAL TENDENCY VS. MEASURE OF VARIABILITY OR DISPERSION

Central Tendency
• Central tendency refers to the numbers that are used to quantify the properties of the data set.
• Measures of central tendency include:
  •Mean
  •Median
  •Mode

Measure of Variability or Dispersion
• A measure of dispersion is used to quantify the variability, or dispersion, of the data.
• Various parameters included in the measure of dispersion are:
  •Range
  •Variance
  •Standard Deviation
  •Mean Deviation
  •Quartile Deviation
FUNCTIONS OF VARIABILITY
The major functions of dispersion or variability are as follows:
• It is used for calculating other statistics such as analysis of variance, the degree of correlation, regression, etc.
• It is also used for comparing the variability of data, as in the case of socio-economic status, income, education, etc.
• It helps to find out whether the computed average (mean, median or mode) is reliable. If the variation is small, we can state that the average is reliable; if the variation is too large, the average may not be representative.
• Dispersion tells us whether the variability is adversely affecting the data and thus helps in controlling the variability.
ABSOLUTE DISPERSION AND RELATIVE DISPERSION
• Absolute dispersion usually refers to the standard deviation, a measure of variation from the mean. The units of the standard deviation are the same as those of the data; in other words, an absolute measure is expressed in the original units of a distribution. Therefore, absolute dispersion is not suitable for comparing the variability of two distributions when the two variables are measured in different units. For instance, the variability in body height (cm) and body weight (kg) cannot be compared directly, because the absolute measure (standard deviation) is expressed in cm for one and in kg for the other. The absolute measure is also not appropriate for two sets of scores expressed in the same units but with widely different means (central values). Apart from such cases, absolute measures are widely used. The absolute measures include the range, mean deviation, standard deviation and variance.
• Relative dispersion, sometimes called the coefficient of variation, is obtained by dividing the standard deviation by the mean; it may be presented as a quotient or as a percentage. Thus, relative measures are computed from the absolute measures of dispersion and their corresponding central values. A low value of relative dispersion usually implies that the standard deviation is small in comparison with the magnitude of the mean.
TYPES OF MEASURES OF DISPERSION OR VARIABILITY

1) Range
2) Quartile Deviation
3) Average Deviation or Mean Deviation
4) Standard Deviation
5) Variance
Range and quartile deviation measure dispersion by computing the spread within which the values fall, whereas average deviation and standard deviation compute the extent to which the values differ from the average.
RANGE AND QUARTILE DEVIATION
• Range (R): the range is the difference between the highest and the lowest score in the distribution.
Range = Highest Score – Lowest Score (R = H – L)
• Most values in a data set lie in the middle of the frequency distribution, whereas the range depends only on the two extreme values (outliers), so we need another measure of variability. The quartile deviation is a measure that depends on the relatively stable central portion of a distribution.
Inter Quartile Range (IQR): The range computed for the middle 50% of the distribution is the interquartile range. The upper quartile (Q3) and the lower quartile (Q1) are used to compute it: IQR = Q3 – Q1. The IQR is not affected by extreme values.
Semi-Interquartile Range (SIQR) or Quartile Deviation (QD): Half of the IQR is called the semi-interquartile range, also known as the quartile deviation or QD. Thus, QD is computed as QD = (Q3 – Q1)/2.
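For raw (ungrouped) data, the range, IQR and QD can be sketched with Python's standard statistics module. Quartile conventions differ between textbooks; this sketch uses statistics.quantiles with method="exclusive", so hand-computed quartiles may differ slightly. The scores are hypothetical.

```python
import statistics

def range_iqr_qd(data):
    """Return (range, interquartile range, quartile deviation) for raw data."""
    # "exclusive" method: Q1 at position (n + 1)/4, Q3 at 3(n + 1)/4
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    data_range = max(data) - min(data)   # R = H - L
    iqr = q3 - q1                        # IQR = Q3 - Q1
    qd = iqr / 2                         # QD = (Q3 - Q1) / 2
    return data_range, iqr, qd

scores = [2, 4, 4, 5, 7, 9, 10, 12]      # hypothetical scores
print(range_iqr_qd(scores))              # (10, 5.75, 2.875)
```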
MEAN DEVIATION & STANDARD DEVIATION
• The two measures of variation, range and quartile deviation, do not show how the values of the data are scattered about a central value: R and QD describe the spread of the values, not how far the values are from their average. To measure variation as the degree to which the values in a data set deviate from their mean, we use the average (mean) deviation.
• The term standard deviation was first used in writing by Karl Pearson in 1894. The standard deviation of a population is denoted by ‘σ’ (Greek letter sigma) and that of a sample by ‘s’. A useful property of SD is that, unlike the variance, it is expressed in the same unit as the data. It is the most widely used measure of variability. The standard deviation indicates the typical distance of the scores from the mean: it is the positive square root of the mean of the squared deviations of all the scores from the mean. SD is an absolute measure of dispersion and the most stable and reliable measure of variability. Standard deviation shows how much variation there is from the mean; it is calculated as the square root of the variance.
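A minimal sketch of the two deviation-based measures for raw data. The divisor n gives the population standard deviation σ; the sample=True option uses n − 1, a common convention for the sample SD s that the text does not specify. The data are hypothetical.

```python
import math

def mean_deviation(values):
    """Mean (average) deviation about the mean: sum(|x - mean|) / n."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def standard_deviation(values, sample=False):
    """Positive square root of the mean squared deviation from the mean.
    Divisor n by default (sigma); n - 1 when sample=True (s)."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)
    return math.sqrt(ss / (n - 1 if sample else n))

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical scores, mean = 5
print(mean_deviation(data))          # 1.5
print(standard_deviation(data))      # 2.0
```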
VARIANCE
• The term variance was introduced by R.A. Fisher in 1918 to describe the square of the standard deviation.
• Calculating the variance is an important part of many statistical applications and analyses. It is a good absolute measure of variability and is used in the Analysis of Variance (ANOVA) to test the significance of differences between sample means.
• The relative measure corresponding to SD is the coefficient of variation, a relative measure of dispersion developed by Karl Pearson. When we want to compare the variation (dispersion) of two different series, a relative measure of the standard deviation must be calculated. This is known as the coefficient of variation or the coefficient of SD. It is defined as the SD expressed as a percentage of the mean.
• The formula for computing the coefficient of variation is:
V = 100 × σ / M
where V = coefficient of variation (%), σ = standard deviation, M = mean
PRACTICE SUMS
1) The amount of rainfall in a particular season for 6 days is given as 17.8 cm, 19.2 cm, 16.3 cm, 12.5 cm, 12.8 cm and 11.4 cm. Find the range and the standard deviation.
2) 48 students were asked to write down the total number of hours per week they spent watching television. With this information, find the standard deviation of the hours spent watching television.
Hours per week (x) 6 7 8 9 10 11 12
Number of students (f) 3 6 9 13 8 5 4
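Practice Sum 1 worked in code. Because the question does not say whether the six days are a population or a sample, this sketch shows both σ (divisor n) and s (divisor n − 1).

```python
import math

rainfall_cm = [17.8, 19.2, 16.3, 12.5, 12.8, 11.4]

data_range = max(rainfall_cm) - min(rainfall_cm)            # 19.2 - 11.4
mean = sum(rainfall_cm) / len(rainfall_cm)
squared_devs = [(x - mean) ** 2 for x in rainfall_cm]
sigma = math.sqrt(sum(squared_devs) / len(rainfall_cm))     # divisor n
s = math.sqrt(sum(squared_devs) / (len(rainfall_cm) - 1))   # divisor n - 1

print(round(data_range, 1), round(mean, 2), round(sigma, 2), round(s, 2))
# 7.8 15.0 2.92 3.2
```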

3) Marks of the students in a particular subject of a class are given below. Find its
variance and standard deviation.
Marks 0-10 10-20 20-30 30-40 40-50 50-60 60-70
No. of students 8 12 17 14 9 7 4
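Problems 2 and 3 both need the standard deviation of a frequency distribution. This sketch weights each value by its frequency and divides by N (the population form); for the class intervals in Problem 3, each class is represented by its midpoint, which is the usual grouped-data assumption.

```python
import math

def freq_mean_var_sd(values, freqs):
    """Mean, variance and SD of a frequency distribution:
    mean = sum(f x) / N,  variance = sum(f (x - mean)^2) / N."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, var, math.sqrt(var)

# Problem 2: discrete values (hours of TV per week)
hours = [6, 7, 8, 9, 10, 11, 12]
students = [3, 6, 9, 13, 8, 5, 4]
print(freq_mean_var_sd(hours, students))   # mean 9.0, variance ~2.583, SD ~1.607

# Problem 3: class intervals 0-10, ..., 60-70 represented by their midpoints
mids = [5, 15, 25, 35, 45, 55, 65]
counts = [8, 12, 17, 14, 9, 7, 4]
print(freq_mean_var_sd(mids, counts))      # mean ~30.77, variance ~277.92, SD ~16.67
```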
PRACTICE SUMS

4) From the following data calculate:


a) Range and coefficient of range
b) Inter Quartile range and Coefficient of quartile deviation
Marks 0-10 10-20 20-30 30-40 40-50 50-60
No. of students 10 20 30 50 40 30
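A sketch for Problem 4, assuming the standard definitions coefficient of range = (H − L)/(H + L) and coefficient of quartile deviation = (Q3 − Q1)/(Q3 + Q1), with H and L taken as the upper limit of the last class and the lower limit of the first class. Quartiles are interpolated within their classes as L + (pN − cf)/f × h, the same way as a grouped median.

```python
def grouped_quantile(lower_limits, width, freqs, p):
    """Quantile of grouped data by linear interpolation: L + (pN - cf) / f * h."""
    n = sum(freqs)
    target = p * n
    cumulative = 0
    for lo, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            return lo + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must be in (0, 1]")

lower = [0, 10, 20, 30, 40, 50]
freq = [10, 20, 30, 50, 40, 30]
H, L = 60, 0          # upper limit of the last class, lower limit of the first

data_range = H - L
coeff_range = (H - L) / (H + L)
q1 = grouped_quantile(lower, 10, freq, 0.25)
q3 = grouped_quantile(lower, 10, freq, 0.75)
iqr = q3 - q1
coeff_qd = (q3 - q1) / (q3 + q1)
print(data_range, coeff_range, q1, q3, iqr, round(coeff_qd, 4))
# 60 1.0 25.0 46.25 21.25 0.2982
```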

5) Suppose batsman A has a mean of 50 with an SD of 10, and batsman B has a mean of 30 with an SD of 3. What do you infer about their performance?
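Problem 5 calls for the coefficient of variation, V = 100 × σ/M. In this sketch batsman A scores more on average, but batsman B has the lower CV (10% against 20%), so B would be judged the more consistent performer.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation as a percentage: V = 100 * SD / mean."""
    return sd * 100 / mean

cv_a = coefficient_of_variation(50, 10)   # batsman A
cv_b = coefficient_of_variation(30, 3)    # batsman B
print(cv_a, cv_b)                          # 20.0 10.0
```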
MEASURES OF SHAPE
• The measures of central tendency and dispersion can describe a distribution, but they are not sufficient to describe its shape. For this purpose, we use two other statistical measures that compare the shape with the normal curve: skewness and kurtosis.
• Skewness and kurtosis are two important characteristics of a distribution that are studied in descriptive statistics.
1-Skewness
Skewness is a statistical number that tells us whether a distribution is symmetric or not. A distribution is symmetric if its right side is similar to its left side. If a distribution is symmetric, the skewness value is 0; for a symmetric (normal) distribution, mean = median = mode. If skewness is greater than 0, the distribution is called right-skewed (positively skewed): the right tail is longer than the left tail. If skewness is less than 0, it is called left-skewed (negatively skewed): the left tail is longer than the right tail.
2-Kurtosis
• Kurtosis is a statistical number that tells us whether a distribution is more or less peaked (and heavier- or lighter-tailed) than a normal distribution. Measured as excess kurtosis, a distribution similar to the normal distribution has a kurtosis value of 0. If kurtosis is greater than 0, the distribution has a higher peak than the normal distribution; if kurtosis is less than 0, it is flatter than a normal distribution.
• Based on kurtosis, distributions are of three types:
Leptokurtic: sharply peaked with fat tails, and less variable.
Mesokurtic: medium peaked (like the normal curve).
Platykurtic: flattest peak and highly dispersed.
MEASURES OF SKEWNESS
• In order to make a valid comparison between the skewness of two or more distributions, we have to eliminate the disturbing influence of variation (by dividing by a measure of spread).
• The following are the important methods of measuring skewness:
1. Karl Pearson’s Method
2. Bowley’s Method
3. Kelly’s Method
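The three methods are usually written as Karl Pearson's Sk = 3(Mean − Median)/σ (or (Mean − Mode)/σ), Bowley's Sk = (Q3 + Q1 − 2·Median)/(Q3 − Q1), and Kelly's Sk = (P90 + P10 − 2·Median)/(P90 − P10). This sketch assumes those standard forms, uses the median-based Pearson version, and takes quartiles and percentiles from statistics.quantiles with method="exclusive"; the data sets are hypothetical. A symmetric data set gives 0 for all three, and a right-skewed one gives positive values.

```python
import statistics

def pearson_skewness(data):
    """Karl Pearson's coefficient (median form): 3 (mean - median) / SD."""
    return 3 * (statistics.fmean(data) - statistics.median(data)) / statistics.pstdev(data)

def bowley_skewness(data):
    """Bowley's coefficient: (Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

def kelly_skewness(data):
    """Kelly's coefficient: (P90 + P10 - 2 * median) / (P90 - P10)."""
    deciles = statistics.quantiles(data, n=10, method="exclusive")
    p10, p90 = deciles[0], deciles[-1]
    med = statistics.median(data)
    return (p90 + p10 - 2 * med) / (p90 - p10)

symmetric = list(range(1, 10))                    # hypothetical, symmetric
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 10, 20]   # hypothetical, long right tail

for name, fn in (("Pearson", pearson_skewness),
                 ("Bowley", bowley_skewness),
                 ("Kelly", kelly_skewness)):
    print(name, round(fn(symmetric), 3), round(fn(right_skewed), 3))
```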
PRACTICE SUMS
1) Calculate the sample skewness and sample kurtosis from the following grouped data:
CLASS 2-4 4-6 6-8 8-10
FREQUENCY 3 4 2 1

Ans – Skewness = 0.48, Kurtosis = 2.12
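One convention that reproduces the two values quoted in the answer uses class midpoints and divides the summed third and fourth powers of the deviations by (n − 1)s³ and (n − 1)s⁴, where s is the sample standard deviation computed with n − 1. Under that convention the skewness is about 0.488 and the kurtosis about 2.129. This kurtosis is compared with 3 for a normal curve; subtracting 3 gives the excess kurtosis, whose normal value is 0. Other conventions (population moments, bias-corrected formulas) give somewhat different numbers.

```python
import math

mids = [3, 5, 7, 9]      # midpoints of 2-4, 4-6, 6-8, 8-10
freq = [3, 4, 2, 1]

n = sum(freq)
mean = sum(f * x for x, f in zip(mids, freq)) / n
# Summed 2nd, 3rd and 4th powers of the deviations from the mean
dev_sums = {k: sum(f * (x - mean) ** k for x, f in zip(mids, freq)) for k in (2, 3, 4)}
s = math.sqrt(dev_sums[2] / (n - 1))           # sample SD (divisor n - 1)

skewness = dev_sums[3] / ((n - 1) * s ** 3)
kurtosis = dev_sums[4] / ((n - 1) * s ** 4)    # compare with 3 for a normal curve
print(round(skewness, 3), round(kurtosis, 3))  # 0.488 2.129
```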


2) Compute Karl Pearson’s, Bowley’s and Kelly’s coefficients of skewness from the following data.
HEIGHT 58 59 60 61 62 63 64 65
No. of persons 10 18 30 42 35 28 16 8

Ans = 0.153, 0.048, 0.026


KEY DIFFERENCES BETWEEN SKEWNESS AND KURTOSIS

1- The characteristic of a frequency distribution that describes its symmetry about the mean is called skewness. On the other hand, kurtosis refers to the relative pointedness of the curve defined by the frequency distribution, compared with the standard bell curve.
2- Skewness is a measure of the degree of lopsidedness in the frequency distribution. Conversely, kurtosis is a measure of the degree of tailedness in the frequency distribution.
3- Skewness is an indicator of lack of symmetry, i.e. the left and right sides of the curve are unequal with respect to the central point. As against this, kurtosis measures whether the data are peaked or flat relative to the normal distribution.
4- Skewness shows how much, and in which direction, the values deviate from the mean. In contrast, kurtosis explains how tall and sharp the central peak is.
REMEMBER THE FOLLOWING THREE QUESTIONS:
1) Is the relationship between the variables significant?
• Conduct a significance test
2) How strong is the relationship?
• Use a measure of association
3) What is the nature of the relationship between the variables?
• Interpret outputs of your analyses: charts, tables, mathematical
formulas
MEASURES OF ASSOCIATION

• One goal of research may be to establish a relationship between or among variables.
• The first step toward this goal is to demonstrate that a relationship exists.
• The second step is to quantify the strength and direction of the relationship using one or more appropriate measures of association.
• There are several statistical techniques for determining the strength of the association between dependent and independent variables.
• The choice of technique is largely driven by the level of measurement of the data being analyzed (i.e. ratio, interval, ordinal, nominal, binary).
MEASURES OF ASSOCIATION
Choosing a Measure of Association
• Select an appropriate test based on the level of measurement of the IV(s) and the DV.
• Consider the measure’s sensitivity (more on this later).
• The researcher should be familiar with the chosen statistic.
Asymmetric or symmetric?
• Asymmetric measures: preferred when you know which variable is the IV and which is the DV.
• Symmetric measures: chosen when you do not know which is the IV and which is the DV.
MEASURES OF ASSOCIATION
How to Interpret Levels of Association
There is no universal scale to determine whether a relationship is strong or weak, but the following points should be remembered:
• Perfect positive relationship between variables: +1.0
• Perfect negative relationship between variables: -1.0
• No relationship between variables: 0
In general:
The closer to 0, the weaker the relationship; the closer to ±1, the stronger the relationship.
CORRELATION
Correlation is a statistical technique to ascertain the association or relationship between two or more variables. Correlation analysis studies the degree and direction of the relationship between two or more variables. A correlation coefficient is a statistical measure of the degree to which changes in the value of one variable predict changes in the value of another. When the fluctuation of one variable reliably predicts a similar fluctuation in another variable, there is often a tendency to think that the change in one causes the change in the other; however, correlation by itself does not establish causation.
Uses of correlation:
1. Correlation analysis helps in deriving precisely the degree and the direction of such relationships.
2. Correlation reduces the range of uncertainty in our predictions. A prediction based on correlation analysis will be more reliable and closer to reality.
3. Correlation analysis contributes to the understanding of economic behaviour.
4. Economic theory and business studies show relationships between variables such as price and quantity demanded, advertising expenditure and sales promotion measures, etc.
5. The measure
TYPES OF CORRELATION
Correlation is described or classified in several different ways. Three of the most
important are:
I. Positive and Negative
• If both the variables vary in the same direction, correlation is said to be positive; if they vary in opposite directions, it is said to be negative.
II. Simple, Partial and Multiple
• When only two variables are studied, it is a case of simple correlation.
• In partial correlation, one studies three or more variables but considers only two of them to be influencing each other, with the effect of the other influencing variables held constant.
• When three or more variables are studied, it is a case of multiple correlation.
III. Linear and non-linear
• If the amount of change in one variable bears a constant ratio to the amount of change in the other variable, the correlation is said to be linear.
KARL PEARSON’S COEFFICIENT OF CORRELATION
Karl Pearson’s method of calculating the coefficient of correlation is based on the covariance of the two variables in a series. This method is widely used in practice, and the coefficient of correlation is denoted by the symbol “r”. If the two variables under study are X and Y, the following formula suggested by Karl Pearson can be used for measuring the degree of correlation:
r = Σ(X – X̄)(Y – Ȳ) / √[Σ(X – X̄)² × Σ(Y – Ȳ)²]
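A minimal sketch of Pearson's r, computed as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations (equivalently, the covariance divided by the product of the two standard deviations). The data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation between two equal-length series."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]      # hypothetical data
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))   # 0.7746
```

A perfectly increasing straight-line relationship gives +1 and a perfectly decreasing one gives −1, matching the interpretation scale given earlier.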
PRACTICE SUMS
1.

2.

3.
SPEARMAN’S RANK COEFFICIENT OF CORRELATION
• When quantification of variables is difficult, as with beauty, leadership ability or a person’s knowledge, the method of rank correlation developed by British psychologist Charles Edward Spearman in 1904 is useful. In this method, ranks are allotted to each element in either ascending or descending order. The correlation coefficient between the two series of allotted ranks is popularly called “Spearman’s Rank Correlation” and is denoted by “R”.
• To find the correlation under this method, the following formula is used:
R = 1 – 6Σd² / [n(n² – 1)]
where d is the difference between the two ranks of each item and n is the number of items.
• In case of tied (equal) ranks, each tied item receives the average of the ranks it would occupy, and (m³ – m)/12 is added to Σd² for every group of m tied items:
R = 1 – 6[Σd² + Σ(m³ – m)/12] / [n(n² – 1)]
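A minimal sketch of Spearman's rank correlation using the rank-difference formula with the usual (m³ − m)/12 correction for each group of m tied values. Ranks here are assigned in ascending order (1 = smallest); descending order gives the same R as long as both series are ranked the same way. The data are hypothetical.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum((m ** 3 - m) / 12 for m in counts.values() if m > 1)

def spearman_rank_corr(x, y):
    """R = 1 - 6 [sum(d^2) + tie corrections] / (n (n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    sum_d2 += tie_correction(x) + tie_correction(y)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(spearman_rank_corr([10, 20, 20, 30], [1, 2, 3, 4]))   # 0.9
```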


PRACTICE SUMS
1.

2.

Use the rank correlation coefficient to determine which pair of judges has the nearest approach to common tastes in beauty.

3.
PROBABILITY
• Probability is a mathematical term used to talk about the likelihood of something happening. It helps us understand and predict outcomes, and we generally use it to judge what is likely to happen and what isn’t.
• Real life can be chaotic, and lots of things happen that don’t seem to make any sense. You can’t always know what’s going to happen, but with the help of mathematics you can do your best to predict it, so that you can make sound decisions every day.
• Everything from a coin flip to the weather can be described in terms of probability. We express mathematical probability as fractions, decimals or percentages. Once you know the probability of something, you usually classify it as follows:
• It’s certain. (This is a probability of 100 percent, which is the highest possible likelihood of something happening)
• It’s likely. (Probability is between 50 percent and 100 percent)
• There’s an even chance. (There’s a 50 percent probability of it going either way)
• It’s unlikely. (The probability is between zero and 50 percent)
• It’s impossible. (The probability is zero)
BAYES’ THEOREM
• Bayes' Theorem, named after 18th-century British mathematician Thomas Bayes, is a mathematical formula for determining conditional probability. Conditional probability is the likelihood of an outcome occurring, given that another event has already occurred.
• P(A ∣ B) = P(B ∣ A) × P(A) / P(B)
• P(A ∣ B) is the conditional probability of event A occurring, given that B is true. P(B ∣ A) is the conditional probability of event B occurring, given that A is true. P(A) and P(B) are the probabilities of A and B occurring on their own, without regard to each other (the marginal probabilities).
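A minimal sketch of Bayes' theorem, P(A ∣ B) = P(B ∣ A)·P(A)/P(B), where P(B) is expanded by the law of total probability. The screening-test numbers are hypothetical and only illustrate how a fairly accurate test can still give a low posterior probability when A is rare.

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Posterior P(A | B) = P(B | A) P(A) / P(B), with
    P(B) = P(B | A) P(A) + P(B | not A) (1 - P(A))."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical screening test: 1% prevalence, 90% detection rate,
# 5% false-positive rate
posterior = bayes(p_b_given_a=0.90, p_a=0.01, p_b_given_not_a=0.05)
print(round(posterior, 4))   # 0.1538
```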
NORMAL PROBABILITY DISTRIBUTION SUMS
1) Most graduate schools of business require applicants for admission to take the Graduate Management Admission Council’s GMAT examination. Scores on the GMAT are roughly normally distributed with a mean of 527 and a standard deviation of 112. What is the probability of an individual scoring above 500 on the GMAT?
2) The length of human pregnancies from conception to birth approximates a
normal distribution with a mean of 266 days and a standard deviation of 16 days.
What proportion of all pregnancies will last between 240 and 270 days (roughly
between 8 and 9 months)?
3) The average number of acres burned by forest and range fires in a large New
Mexico county is 4,300 acres per year, with a standard deviation of 750 acres. The
distribution of the number of acres burned is normal. What is the probability that
between 2,500 and 4,200 acres will be burned in any given year?
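Each of these problems reduces to the standard normal distribution through z = (x − μ)/σ. This sketch uses Python's statistics.NormalDist in place of a printed z-table; table answers, which round z to two decimals, may differ in the third decimal place.

```python
from statistics import NormalDist

gmat = NormalDist(mu=527, sigma=112)
p1 = 1 - gmat.cdf(500)                          # P(X > 500)

pregnancy = NormalDist(mu=266, sigma=16)
p2 = pregnancy.cdf(270) - pregnancy.cdf(240)    # P(240 < X < 270)

fires = NormalDist(mu=4300, sigma=750)
p3 = fires.cdf(4200) - fires.cdf(2500)          # P(2500 < X < 4200)

print(round(p1, 3), round(p2, 3), round(p3, 3))   # approximately 0.595 0.547 0.439
```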
POISSON DISTRIBUTION SUMS
1) Suppose it has been observed that, on average, 180 cars per hour pass a
specified point on a particular road in the morning rush hour. Due to
impending roadworks it is estimated that congestion will occur closer to
the city centre if more than 5 cars pass the point in any one minute.
What is the probability of congestion occurring?
2) The number of failures occurring in a machine of a certain type in a
year has a Poisson distribution with mean 0.4. In a factory there are ten
of these machines. What is
(a) the expected total number of failures in the factory in a year?
(b) the probability that there are fewer than two failures in the factory in a
year?
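A sketch for both Poisson problems. In Problem 1, 180 cars per hour is a mean of 3 cars per minute, and congestion means X > 5. In Problem 2 the sketch assumes the ten machines fail independently, so the total number of failures in the factory is Poisson with mean 10 × 0.4 = 4.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# 1) 180 cars per hour -> mean of 3 cars per minute; congestion if X > 5
p_congestion = 1 - poisson_cdf(5, 180 / 60)

# 2) ten machines, each with mean 0.4 failures per year (independence assumed)
total_mean = 10 * 0.4                            # (a) expected total failures
p_fewer_than_two = poisson_cdf(1, total_mean)    # (b) P(X < 2) = P(X <= 1)

print(round(p_congestion, 4), total_mean, round(p_fewer_than_two, 4))
# 0.0839 4.0 0.0916
```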
