Module I - Statistical Measures questions

The document outlines statistical measures, focusing on measures of central tendency (mean, median, mode) and variation (range, standard deviation). It includes definitions, formulas, and problems related to these concepts, as well as correlation and regression lines for discrete data. Additionally, it discusses weighted arithmetic mean and composite series means, providing examples and exercises for practical understanding.

Uploaded by

dhanyasuki05

MODULE I STATISTICAL MEASURES

SYLLABUS:
Measures of central tendency: Arithmetic Mean, Median and Mode –
Measures of variation: Range, Mean deviation, Standard deviation
and Coefficient of variation – Correlation (Discrete Data): Karl
Pearson’s Correlation coefficient, Spearman’s Rank Correlation –
Regression lines (Discrete Data).

Measures of central tendency


In the study of a population we may get a large number of
observations. It is not possible to grasp any idea about the
characteristic when we look at all the observations. So it is better to
get a representative for all the observations to give a clear picture
of that characteristic.

An average, or a measure of central tendency or a measure of
location, of a set of observations is a single value which is
representative of all the items and around which the items cluster.

The common measures of central tendency are
(1) Arithmetic Mean (2) Median (3) Mode (4) Geometric Mean
(5) Harmonic Mean

Arithmetic Mean

The A.M. of a set of observations is their sum divided by the number of
observations.
For ungrouped data:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
For an ungrouped frequency distribution with values $x_1, x_2, \ldots, x_n$ and
corresponding frequencies $f_1, f_2, \ldots, f_n$:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

For a grouped frequency distribution,
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
where $x_i$ is the midpoint of the corresponding class.
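
As a minimal sketch of the three formulas above, the following Python snippet computes the A.M. for raw data, for a discrete frequency distribution, and for a grouped distribution using class midpoints. The function names are illustrative and not part of the original notes; the data are taken from Problems 1, 4 and 6 below.

```python
def mean_raw(values):
    """A.M. of raw data: sum of the observations / number of observations."""
    return sum(values) / len(values)


def mean_freq(xs, fs):
    """A.M. of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_grouped(classes, fs):
    """A.M. of a grouped distribution, using the midpoint of each class."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    return mean_freq(midpoints, fs)


# Problem 1: raw data
print(mean_raw([7, 5, 3, 4, 6, 4, 5]))

# Problem 4: discrete frequency distribution
print(mean_freq([1, 2, 3, 4, 5, 6, 7], [5, 9, 12, 17, 14, 10, 6]))

# Problem 6: classes 0-10, 10-20, ..., 50-60
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
print(mean_grouped(classes, [12, 18, 27, 20, 17, 6]))  # 28.0
```

The grouped version simply reuses the frequency formula with midpoints, which mirrors how the two formulas above coincide.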

Problems:
1. Find the arithmetic mean of the values 7, 5, 3, 4, 6, 4, 5
2. Andy has grades of 84, 65, and 76 on three tests. What grade
must he obtain on the next test to have an average of exactly
80 for the four tests?
3. If the mean of five observations x, x + 4, x + 6, x + 8 and x +
12 is 16, find the value of x.

4. Find the A.M. of the frequency distribution:


x: 1 2 3 4 5 6 7
f: 5 9 12 17 14 10 6

5. Six types of workers are employed in each of two workshops, but
at different rates of wages as follows:
S.No.  Type of worker   Workshop I               Workshop II
                        Wages   No. of workers   Wages   No. of workers
1.     Mechanic         250     2                300     18
2.     Fitter           350     14               300     50
3.     Electrician      400     20               425     8
4.     Carpenter        300     7                350     12
5.     Smith            300     6                350     10
6.     Clerk            200     1                500     2
In which workshop is the average rate of wages per worker higher
and by how much?
6. Calculate the arithmetic mean of the marks from the given table
Marks:           0-10 10-20 20-30 30-40 40-50 50-60
No. of students: 12   18    27    20    17    6

7. Calculate Mean for the following data


Less than: 10 20 30 40 50 60 70
Frequency: 7 14 28 45 60 68 70
8. Find the A.M. of the wages of 72 labourers, details of which are
given in the following table:
Wages(Rs.) : 13-17 18-22 23-27 28-32 33-37
No. of labourers: 2 22 19 14 3
Wages(Rs.) : 38-42 43-47 48-52 53-57
No. of labourers: 4 6 1 1

9. Find the missing frequency from the following data, given
that the average mark is 16.82.
Marks Frequency (𝑓)
0-5 10
5-10 12
10-15 16
15-20 𝑓
20-25 14
25-30 10
30-35 8

10. The marks of the students who passed a class test are given
below. If the average mark of all fifty students was 5.16, find
the average mark of the students who failed.

Marks (x)   No. of students (f)
4 8
5 10
6 9
7 6
8 4
9 3

11. The average marks secured by 50 students was 44. Later on,
it was discovered that a score 36 was misread as 56. Find the correct
average marks secured by the students.
Weighted Arithmetic Mean:
If $x_1, x_2, \ldots, x_n$ are the observations and $w_1, w_2, \ldots, w_n$ are the
assigned weights, the weighted arithmetic mean $\bar{x}$ is given by
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
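
The weighted mean can be sketched in a few lines of Python. The function name `weighted_mean` is an illustrative choice; the weights and the first student's scores are taken from Problem 13 below (homework 20%, quizzes 10%, paper 10%, midterm 25%, final 35%).

```python
def weighted_mean(values, weights):
    """Weighted A.M.: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Column order follows the table in Problem 13:
# homework, quizzes, paper, midterm, final
weights = [20, 10, 10, 25, 35]
student_1 = [85, 89, 94, 87, 90]

print(weighted_mean(student_1, weights))  # approximately 88.55
```

Because the formula divides by the sum of the weights, the weights may be given as percentages, fractions, or any other proportional values.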

12. An examination was held to decide the award of a
scholarship. The weights of various subjects were different. The
marks obtained by 3 candidates (out of 100 in each subject) are
given below:
Subject     Weight   Marks of A   Marks of B   Marks of C
Statistics  4        63           60           65
Maths       3        65           64           70
Economics   2        58           56           63
Hindi       1        70           80           52
If the weighted A.M. is calculated, who should get the scholarship?
If the simple A.M. is calculated, who should get the scholarship?
13. A professor has decided to use a weighted average in
figuring final grades for his students. The homework average
will count for 20% of a student’s grade; the midterm, 25%;
the final, 35%; the term paper, 10%; and quizzes, 10%. From
the following data, compute the final average for the five
students.

Student  Homework  Quizzes  Paper  Midterm  Final
1        85        89       94     87       90
2        78        84       88     91       92
3        94        88       93     86       89
4        82        79       88     84       93
5        95        90       92     82       88

Mean of a composite series

If $\bar{x}_1$ is the A.M. of $n_1$ observations and $\bar{x}_2$ is the A.M. of another $n_2$
observations, then the A.M. of the combined set is
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$
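
A short sketch of the combined-mean formula, using the figures of Problem 14 below (280 employees with mean salary Rs. 750 and 320 employees with mean salary Rs. 937.5). The function name is illustrative only.

```python
def combined_mean(n1, mean1, n2, mean2):
    """A.M. of two groups taken together."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


print(combined_mean(280, 750, 320, 937.5))  # 850.0
```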

14. There are two branches of a company employing 280 and 320
persons respectively. If the A.M. of the salaries of the two branches
are Rs. 750 and Rs.937.5 respectively, find the A.M. of the salaries
of the employees of the company as a whole.
15. The average salary of male employees in a firm was Rs.5200
and that of female employees was Rs.4200. If the mean salary of
all the employees was Rs.5000, find the percentage of male and
female employees.
Median

Median is the value of the middle item when the items are arranged
in ascending or descending order of magnitude. It is a positional
average.

For raw data with n observations, the median is
(i) the value of the $\left(\frac{n+1}{2}\right)^{\text{th}}$ item, if $n$ is odd;
(ii) the A.M. of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2} + 1\right)^{\text{th}}$ items, if $n$ is even, i.e., the
A.M. of the two values in the middle.

For a discrete frequency distribution,
(i) Let $N = \sum_{i=1}^{n} f_i$. Find $\frac{N}{2}$.
(ii) Find the (less than) cumulative frequencies.
(iii) Find the c.f. just greater than $\frac{N}{2}$ or $\frac{N+1}{2}$.
(iv) The corresponding value of x is the median.

For a continuous frequency distribution
The class corresponding to the c.f. just greater than $\frac{N}{2}$ is called the
median class.
$$\text{Median} = l + \frac{\left(\frac{N}{2} - m\right)c}{f}$$
where
l = lower limit of the median class
N = Total frequency
m = cumulative frequency of the class just before the median class
c = class width
f = frequency of the median class.
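
The rules for raw data and the median-class formula can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, and `median_grouped` assumes continuous classes of equal width. The raw data come from Problems 1 and 2 below; the grouped data are the marks distribution of Problem 6 in the Arithmetic Mean section.

```python
def median_raw(values):
    """Median of raw data: middle item, or A.M. of the two middle items."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def median_grouped(lower_limits, width, fs):
    """Median = l + (N/2 - m) * c / f for equal-width continuous classes."""
    N = sum(fs)
    m = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, fs):
        if m + f > N / 2:  # first class whose c.f. exceeds N/2
            return l + (N / 2 - m) * width / f
        m += f
    raise ValueError("distribution has no observations")


print(median_raw([5, 7, 9, 12, 10, 8, 7, 15, 21]))    # 9
print(median_raw([10, 18, 9, 17, 15, 24, 30, 11]))    # 16.0

# Marks 0-10, ..., 50-60 with frequencies 12, 18, 27, 20, 17, 6
print(median_grouped([0, 10, 20, 30, 40, 50], 10, [12, 18, 27, 20, 17, 6]))
```

For the grouped data, N/2 = 50 and the cumulative frequencies are 12, 30, 57, so the median class is 20-30 and the median is 20 + (50 - 30) × 10 / 27.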

1. Find out the median of the following items


5, 7, 9, 12, 10, 8, 7, 15, 21

2. Find the value of the median from the following data:


10, 18, 9, 17, 15, 24, 30, 11
3. Find the value of median from the following data:
Wages (Rs.) 100 50 70 110 80
No. of workers 15 20 15 18 12
4. Find the value of the median for the following data
Marks: 10 23 18 38 65 92 40 58
No. of students 8 12 16 12 10 18 4 1

5. Calculate the median for the following data:


Wages: 100-109 110-119 120-129 130-139 140-149
No. of workers: 15 23 38 24 10

6. Compute median from the following data:


Mid values: 115 125 135 145 155 165 175 185 195
Frequency: 6 25 48 72 116 60 38 22 3

7. In the frequency distribution of 100 families given below, the
number of families corresponding to expenditure groups 20-40 and
60-80 are missing from the table. However, the median is known
to be 50. Find the missing frequencies.
Expenditure:     0-20 20-40 40-60 60-80 80-100
No. of families: 14   ?     27    ?     15

Mode
Mode is the value which occurs most frequently in a set of
observations.
For a discrete frequency distribution, mode is value of x
corresponding to the maximum frequency.
For a continuous frequency distribution, modal class is the class
having maximum frequency
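$$\text{Mode} = l + \frac{(f_1 - f_0) \times c}{2f_1 - f_0 - f_2}$$

where
$l$ = lower limit of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the class just above the modal class in the table, i.e., the preceding class
$f_2$ = frequency of the class just below the modal class in the table, i.e., the succeeding class
$c$ = class width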
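
A hedged sketch of the modal-class formula in Python, using the data of Problem 4 below. The function name is illustrative. Two assumptions are made that the notes do not state: the classes have equal width, and a missing neighbouring class (when the modal class is the first or last one) is treated as having frequency 0.

```python
def mode_grouped(lower_limits, width, fs):
    """Mode = l + (f1 - f0) * c / (2*f1 - f0 - f2) for equal-width classes."""
    i = max(range(len(fs)), key=lambda k: fs[k])  # index of the modal class
    f1 = fs[i]
    f0 = fs[i - 1] if i > 0 else 0                # preceding class (assumed 0 if absent)
    f2 = fs[i + 1] if i < len(fs) - 1 else 0      # succeeding class (assumed 0 if absent)
    return lower_limits[i] + (f1 - f0) * width / (2 * f1 - f0 - f2)


# Problem 4: classes 0-10, ..., 40-50 with frequencies 10, 14, 19, 17, 13
print(mode_grouped([0, 10, 20, 30, 40], 10, [10, 14, 19, 17, 13]))
```

Here the modal class is 20-30, so the mode is 20 + (19 - 14) × 10 / (38 - 14 - 17) = 20 + 50/7. The formula breaks down when the three frequencies are equal, since the denominator is then zero.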
1. For the six values 140, 220, 90, 180, 140, 200, find: (a)
the mean (b) the median (c) the mode

2. Find the mode:


(i) 45, 36, 28, 42, 45, 40, 44
(ii) 36, 30, 26, 20, 32, 31
(iii) 10, 18, 15, 17, 10, 16, 18, 12
3. Find the mode of the frequency distribution;
X: 1 2 3 4 5 6 7 8
F: 4 9 16 25 22 15 7 3

4. Calculate mode for the following data:


Class: 0-10 10-20 20-30 30-40 40-50
Frequency: 10 14 19 17 13
5. Calculate mean, median and mode for the following data:
Less than : 10 20 30 40 50 60 70
Frequency: 7 14 28 45 60 68 70

6. The following table gives the length of life of 150 electric lamps.
Calculate the mode.
Life (hours) Frequency
0-400 4
400-800 12
800-1200 40
1200-1600 41
1600-2000 27
2000-2400 13
2400-2800 9
2800-3200 4

7. Find the mode:


Marks:           0-10 10-20 20-40 40-50 50-60 60-80
No. of students: 10   15    50    20    10    20
8. Test scores for a class of 20 students are as follows:
93, 84, 97, 98, 100, 78, 86, 100, 85, 92, 72, 55, 91, 90, 75, 94, 83,
60, 81, 95
a) Copy and complete the table shown below.
b) Find the modal interval.
c) Find the interval that contains the median.

Test scores   Frequency
51-60
61-70
71-80
81-90
91-100

Empirical relation between mean, median and mode

$$\text{Mean} - \text{Median} = \frac{1}{3}(\text{Mean} - \text{Mode})$$
i.e., $\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$ (for a moderately asymmetrical distribution)

For a symmetric distribution, Mean = Median = Mode

9. For a moderately asymmetrical distribution, mode and mean are
32.1 and 35.4 respectively. Find the median.

Requisites of an ideal measure of central tendency

(i) It should be rigidly defined


(ii) It should be readily understandable and easy to calculate.
(iii) It should be based on all observations
(iv) It should be suitable for further mathematical treatment
(v) It should not be much affected by fluctuations in sampling
(vi) It should not be affected by extreme values

Measures of Dispersion
The degree to which numerical data tend to spread about the
average value, is called variation or dispersion of data. The
measures of dispersion are:
(i) Range
(ii) Mean deviation
(iii) Standard deviation
(iv) Quartile deviation

Range
Range is the difference between the greatest and the least of the
given values.
Range = L – S
For continuous frequency distribution, take L = upper limit of the
highest class and S = lower limit of the lowest class.

Coefficient of range
$$\text{Coefficient of range} = \frac{L - S}{L + S}$$
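
As a small illustration, the range and the coefficient of range can be computed as below. The function names are illustrative; the data are the yearly profits of Problem 2 below.

```python
def value_range(values):
    """Range = L - S (greatest value minus least value)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range = (L - S) / (L + S)."""
    L, S = max(values), min(values)
    return (L - S) / (L + S)


profits = [40, 30, 80, 100, 120, 90, 200, 230]
print(value_range(profits))           # 200
print(coefficient_of_range(profits))  # 200 / 260
```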

1. If the marks of 5 students are 45, 92, 26, 81 and 72 , find the
range.
2. The profits (in ‘000 Rs.)of a company for the last 8 years are
given below. Calculate the range and coefficient of range.
Year: 1975 1976 1977 1978 1979 1980 1981 1982
Profit 40 30 80 100 120 90 200 230
3. Calculate the range of the prices of gold from Monday to Saturday
of a week.
Mon Tue Wed Thu Fri Sat
1160 1158 1170 1142 1175 1187
4. Calculate range and coefficient of range:
Daily wages (Rs.): 50-60 60-70 70-80 80-90 90-100 100-110 110-120
No. of labourers:  60    45    45    40    35     30      30
Mean Deviation (from the mean)

It is the A.M. of the absolute values of the deviations of the
observations from the mean.
For raw data, $\text{M.D.} = \frac{1}{n}\sum |x_i - \bar{x}|$
For a frequency distribution, $\text{M.D.} = \frac{1}{N}\sum f_i |x_i - \bar{x}|$, where $N = \sum f_i$

Note: In a similar manner, we can define the mean deviation from the
median and from the mode.
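
The two mean-deviation formulas can be sketched as follows. The function names are illustrative; the data are those of Problems 5 and 7 below.

```python
def mean_deviation(values):
    """M.D. from the mean for raw data: (1/n) * sum |x - mean|."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


def mean_deviation_freq(xs, fs):
    """M.D. from the mean for a frequency distribution: (1/N) * sum f|x - mean|."""
    N = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / N
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / N


# Problem 5: the mean is 4400 and the absolute deviations sum to 1200
print(mean_deviation([4800, 4600, 4400, 4200, 4000]))  # 240.0

# Problem 7: the mean is 27 and sum f|x - mean| = 472, with N = 50
print(mean_deviation_freq([5, 15, 25, 35, 45], [5, 8, 15, 16, 6]))
```

Replacing `mean` by the median or the mode inside these functions would give the mean deviation from the median or the mode mentioned in the note.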

5. Find the M.D. from the mean of the numbers: 4800, 4600, 4400,
4200, 4000
6. Calculate the mean deviation from the mean: 100.500, 100.250,
100.375, 100.625, 100.750, 100.125, 100.375, 100.625, 100.500,
100.125
7. Calculate the mean deviation from the mean, for the following
data

Marks No. of
(x) students(f)
5 5
15 8
25 15
35 16
45 6

8. Calculate the mean deviation from the mean, for the following
data:
Marks:           0-10 10-20 20-30 30-40 40-50 50-60 60-70
No. of students: 6    5     8     15    7     6     3

9. Calculate the mean deviation from the median, from the following
table.
Class interval: 1-3 3-5 5-7 7-9 9-11 11-13 13-15 15-17
Frequency:      6   53  85  56  21   26    4     4
Standard Deviation
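Standard deviation is the positive square root of the arithmetic mean
of the squares of the deviations of the observations from their
arithmetic mean.
For raw data,
$$\sigma = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2} = \sqrt{\frac{1}{n}\sum x_i^2 - \left(\frac{\sum x_i}{n}\right)^2}$$

For a frequency distribution,
$$\sigma = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2} = \sqrt{\frac{1}{N}\sum f_i x_i^2 - \left(\frac{\sum f_i x_i}{N}\right)^2}, \quad \text{where } N = \sum f_i$$

Variance: The square of the S.D., $\sigma^2$, is called the variance.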
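
The two equivalent forms of the raw-data formula can be checked numerically. The sketch below uses the heights of Problem 1 below; the function names are illustrative. Both functions divide by n, as in the formulas above.

```python
import math


def std_dev(values):
    """S.D. from the definition: sqrt of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def std_dev_shortcut(values):
    """S.D. from the shortcut form: sqrt(mean of squares - square of mean)."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


heights = [160, 160, 161, 162, 163, 163, 163, 164, 164, 170]
print(sum(heights) / len(heights))  # 163.0
print(std_dev(heights))             # sqrt(7.4)
print(std_dev_shortcut(heights))    # same value up to rounding
```

For these heights the squared deviations sum to 74, so the variance is 7.4.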

Standard deviation of combined series:


If $n_1, n_2$ are the sizes, $\bar{x}_1, \bar{x}_2$ the means and $\sigma_1, \sigma_2$ the
standard deviations of two series, then the standard deviation $\sigma$ of the
combined series of size $n_1 + n_2$ is given by
$$\sigma^2 = \frac{1}{n_1 + n_2}\left[n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)\right]$$
where
$$d_1 = \bar{x}_1 - \bar{x}, \quad d_2 = \bar{x}_2 - \bar{x}, \quad \bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$
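
A minimal sketch of the combined-series formula, applied to the two firms of Problem 2 in the Coefficient of variation section below (500 workers with mean 186 and variance 81; 600 workers with mean 175 and variance 100). The function name is illustrative.

```python
def combined_mean_and_variance(n1, mean1, var1, n2, mean2, var2):
    """Mean and variance of two series taken together."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean, mean2 - mean
    var = (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n
    return mean, var


mean, var = combined_mean_and_variance(500, 186, 81, 600, 175, 100)
print(mean)  # 180.0
print(var)   # 133500 / 1100
```

Here $d_1 = 6$ and $d_2 = -5$, so the combined variance is $[500(81 + 36) + 600(100 + 25)] / 1100$; the combined S.D. is its square root.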

1. Calculate the mean and standard deviation of the heights (in cms)
of 10 students given below:
160, 160, 161, 162, 163, 163, 163, 164, 164, 170

2. Calculate the standard deviation for the following data:


14, 22, 9, 15, 20, 17, 12,

3. The mean of 30 items is 18 and their standard deviation is 3.
Find the sum of all the items and also the sum of the squares of all
items.
4. The following table gives the number of finished articles turned
out per day by different number of workers in a factory. Find the
mean value and the standard deviation of the daily output of finished
articles.
No. of articles (x)   No. of workers (f)
18 3
19 7
20 11
21 14
22 18
23 17
24 13
25 8
26 5
27 4

5. Calculate the S.D. for the following distribution of net profits
earned by a group of companies.
Profits ('000 Rs.)   No. of companies (f)
20-30 30
30-40 58
40-50 62
50-60 85
60-70 112
70-80 70
80-90 57
90-100 26

Coefficient of variation

When we want to compare the variability of two series which differ
widely in their averages or which are measured in different
units, we calculate the coefficient of variation.

$$\text{Coefficient of variation} = \frac{\text{S.D.}}{\text{Mean}} \times 100$$
It is a pure number independent of the units of measurement.
To compare the variability of two series: The series having
greater C.V. is more variable than the other. The series having
less C.V. is said to be more consistent or more homogeneous
than the other.
Note: $\text{Coefficient of S.D.} = \frac{\text{S.D.}}{\text{Mean}}$
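
The comparison rule can be illustrated with the two firms of Problem 2 below, whose variances are 81 and 100 (so the standard deviations are 9 and 10). The function name is illustrative.

```python
import math


def coefficient_of_variation(sd, mean):
    """C.V. = (S.D. / Mean) * 100."""
    return sd / mean * 100


cv_a = coefficient_of_variation(math.sqrt(81), 186)   # firm A
cv_b = coefficient_of_variation(math.sqrt(100), 175)  # firm B

print(cv_a, cv_b)
print("More variable:", "A" if cv_a > cv_b else "B")
```

Firm B has the larger C.V. (10/175 × 100 against 9/186 × 100), so by the rule above its wages are more variable.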
1. The prices of two commodities over 10 weeks are given below.
Find out which price shows less variation.
A: 54 55 53 56 52 52 58 49 50 51
B: 108 107 105 106 105 103 102 104 104 101

2. An analysis of monthly wages paid to the workers of two firms
A and B belonging to the same industry gives the following results:
                                    Firm A   Firm B
No. of workers                      500      600
Average daily wages (Rs.)           186      175
Variance of distribution of wages   81       100

(i) Which firm, A or B, has a larger wage bill?
(ii) In which firm, A or B, is there greater variability in
individual wages?
(iii) Calculate (a) the average daily wage and (b) the variance
of the distribution of wages of all the workers in the firms
A and B taken together.

Correlation

Correlation refers to the study of the relationship between two or more
variables.
Let X and Y measure some characteristics of a particular system. If
X and Y vary in such a way that a change in one variable corresponds
to a change in the other variable, then the variables X and Y are
correlated.
Types of correlation: Positive and negative

If an increase in one variable is accompanied by an increase in the
other variable, then the variables are said to be positively correlated.
If an increase in one variable is accompanied by a decrease in the
other variable, then the variables are said to be negatively
correlated.
Methods of studying correlation:
(i) Scatter diagram method
(ii) Karl Pearson’s correlation coefficient
(iii) Spearman’s rank correlation coefficient
Scatter diagram method

The given data are plotted on a graph in the form of dots, i.e., for
each pair of X and Y we put a dot, and by looking at the scatter of the
various points we form an idea as to whether the two variables are
related or not. The more the plotted points scatter over the chart, the
lesser is the degree of relationship between the two variables. The
nearer the points come to a line, the higher the relationship. If the
points lie in a haphazard manner, it shows the absence of any
relationship between the variables.

[Figure: scatter diagrams showing positive correlation, negative correlation and no correlation]


Karl Pearson’s correlation coefficient
This gives us a measure of correlation which indicates the degree of
correlation in quantitative terms. It is defined as
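$$r(x, y) = r_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y}$$

Note:
$\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$ is called the covariance between X and Y, written Cov(X, Y).
$$r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$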
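
The definition and the computational form give the same value, which the following sketch checks numerically. The function names are illustrative; the data are those of Problem 1 in the Regression section below (X: 1, 2, 3, 4, 5 and Y: 2, 5, 3, 8, 7).

```python
import math


def pearson_definition(xs, ys):
    """r = Cov(X, Y) / (sigma_x * sigma_y), with divisor n throughout."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


def pearson_sums(xs, ys):
    """Computational form using only the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )


xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
print(pearson_definition(xs, ys))
print(pearson_sums(xs, ys))  # 65 / sqrt(50 * 130), about 0.806
```

The second form is convenient when only the sums are available, as in problems that quote $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$ and $\sum xy$ directly.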

Properties of correlation coefficient


(i) The correlation coefficient lies between -1 and +1, i.e., $|r| \le 1$
Note: If r = 1, there is perfect positive correlation
If r = -1, there is perfect negative correlation
If r = 0, the variables are uncorrelated.
(ii) The correlation coefficient is independent of change of scale
and origin of the variables X and Y, i.e., if $U = \frac{X - a}{h}$, $V = \frac{Y - b}{k}$,
where $a, b, h, k$ are constants, $h > 0$, $k > 0$, then
$$r(X, Y) = r(U, V)$$

1. Find Karl Pearson’s correlation coefficient for the following
heights in inches of fathers (x) and their sons (y):
X: 65 66 67 67 68 69 70 72
Y: 67 68 65 68 72 72 69 71

2. Calculate Karl Pearson’s coefficient of correlation between
price and supply of a commodity from the following data:
Price (Rs.): 17 18 19 20 21 22 23 24 25 26
Supply (Kg): 38 37 38 33 32 33 34 29 26 23
3. A computer, while calculating $r_{xy}$ from 25 pairs of observations,
obtained the following constants: $n = 25$, $\sum x = 125$, $\sum x^2 = 650$,
$\sum y = 100$, $\sum y^2 = 460$, $\sum xy = 508$. A recheck showed that two
pairs of values, (6, 14) and (8, 6), were wrong, while the correct values
were (8, 12) and (6, 8). Obtain the correct value of the correlation
coefficient.
Rank Correlation

Sometimes we have to deal with problems in which data cannot be
quantitatively measured but qualitative measurement is possible.
Here, we give ranks to the values in each series separately and
calculate Spearman’s rank correlation coefficient as
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where d is the difference between the ranks of paired items in the
two series.
Rank correlation coefficient varies between -1 and +1.
Note: 1. $\sum d$ will always be equal to 0.
2. Spearman’s rank correlation coefficient and Karl Pearson’s
correlation coefficient for a given data are usually different.
3. Spearman’s rank correlation coefficient has the same
value as Karl Pearson’s correlation coefficient between the ranks.
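
A minimal sketch of the formula for data that are already ranked and contain no ties. The function name is illustrative; the ranks are those of Judge B and Judge C in Problem 2 below.

```python
def spearman_from_ranks(rank_x, rank_y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied ranks."""
    n = len(rank_x)
    d = [a - b for a, b in zip(rank_x, rank_y)]
    return 1 - 6 * sum(v * v for v in d) / (n * (n * n - 1))


judge_b = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
judge_c = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

d = [a - b for a, b in zip(judge_b, judge_c)]
print(sum(d))                                 # 0, as stated in Note 1
print(spearman_from_ranks(judge_b, judge_c))  # 1 - 6 * 214 / 990
```

For this pair $\sum d^2 = 214$, which gives a negative coefficient of about -0.297.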
Repeated ranks
If two or more individuals have the same value in a series, then each
individual is given the average of the ranks they would otherwise
occupy. Then the rank correlation coefficient is
$$\rho = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)}$$

where $m_i$ is the number of items whose ranks are equal.
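
The tie correction can be sketched as follows, using the data of Problem 3 below. This is an illustrative implementation under stated assumptions: the helper names are hypothetical, and the largest value is given rank 1 (ranking in the opposite direction gives the same coefficient as long as both series are ranked the same way).

```python
from collections import Counter


def average_ranks(values):
    """Rank values (largest = rank 1); tied values share the average rank."""
    order = sorted(values, reverse=True)
    first = {}
    for pos, v in enumerate(order, start=1):
        first.setdefault(v, pos)  # first position occupied by each value
    counts = Counter(values)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def spearman_with_ties(xs, ys):
    """Spearman's rho with the (m^3 - m)/12 correction for each tied group."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values())  # m = 1 contributes 0
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(spearman_with_ties(x, y))  # 1 - 6 * (72 + 3) / 990
```

In this data 75 occurs twice and 64 three times in X, and 68 occurs twice in Y, so the correction is 0.5 + 2 + 0.5 = 3 and $\sum d^2 = 72$.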

1. Calculate the rank correlation coefficient between marks in the
selection test (X) and the proficiency test (Y) of 9 recruits.
Sl.No. 1  2  3  4  5  6  7  8  9
X:     10 15 12 17 13 16 24 14 22
Y:     30 42 45 46 33 34 40 35 39
2. Ten competitors in a music competition are ranked by three
judges in the following order:
Competitor: 1 2 3 4  5 6  7 8  9 10
Judge A:    1 6 5 10 3 2  4 9  7 8
Judge B:    3 5 8 4  7 10 2 1  6 9
Judge C:    6 4 9 8  1 2  3 10 5 7
Using the rank correlation coefficient, determine which pair of judges
have common taste in music.

3. Calculate the rank coefficient of correlation for the following
data:
X: 68 64 75 50 64 80 75 40 55 64
Y: 62 58 68 45 81 60 68 48 50 70

Regression
Regression is the measure of the average relationship between
two or more variables in terms of the original units of the data.
It provides a mechanism for predicting or forecasting.

If two variables X and Y are correlated, we see that the scatter
diagram will be more or less concentrated around a curve,
called the curve of regression. If this curve is a straight line,
then it is called the line of regression.

We have two regression lines. The regression line of Y on X
gives the most probable value of Y for given values of X. The
regression line of X on Y gives the most probable value of X for
given values of Y.
The equation of the line of regression of Y on X is
$$y - \bar{y} = \frac{r\sigma_y}{\sigma_x}(x - \bar{x})$$
where $b_{yx} = \frac{r\sigma_y}{\sigma_x}$ is the regression coefficient of y on x.

The equation of the line of regression of X on Y is
$$x - \bar{x} = \frac{r\sigma_x}{\sigma_y}(y - \bar{y})$$
where $b_{xy} = \frac{r\sigma_x}{\sigma_y}$ is the regression coefficient of x on y.

Note:
1. $b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$, $\quad b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$

2. Both the regression lines pass through the point $(\bar{x}, \bar{y})$.
Hence, by solving the two regression equations, we can find
the means of X and Y.
3. Both the regression coefficients will have the same sign;
either both will be positive or both will be negative.
4. The correlation coefficient is the geometric mean between the
regression coefficients,
i.e., $r_{xy} = \pm\sqrt{b_{xy} \cdot b_{yx}}$
If both the regression coefficients are positive, r will be
positive; if both the regression coefficients are negative, r will
be negative.
5. Regression coefficients are independent of the change of
origin, but not of scale.
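
Notes 1, 2 and 4 can be checked numerically with a short sketch. The function names are illustrative; the data are those of Problem 1 below (X: 1, 2, 3, 4, 5 and Y: 2, 5, 3, 8, 7).

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from the sums in Note 1."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    return num / (n * sxx - sx ** 2), num / (n * syy - sy ** 2)


xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
b_yx, b_xy = regression_coefficients(xs, ys)


def predict_y(x):
    """Line of regression of Y on X: y = y_bar + b_yx * (x - x_bar)."""
    return y_bar + b_yx * (x - x_bar)


def predict_x(y):
    """Line of regression of X on Y: x = x_bar + b_xy * (y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)


# Note 4: r is the geometric mean of the coefficients, with their common sign
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(b_yx, b_xy)  # 65/50 and 65/130
print(r)           # sqrt(0.65), about 0.806
print(predict_y(x_bar), predict_x(y_bar))  # both lines pass through (x_bar, y_bar)
```

With these data the lines are $y - 5 = 1.3(x - 3)$ and $x - 3 = 0.5(y - 5)$.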
Angle between regression lines
If $\theta$ is the angle between the two regression lines, then
$$\tan\theta = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$

Note:
1. When $r = 0$, $\tan\theta = \infty$, so $\theta = \frac{\pi}{2}$,
i.e., the two regression lines are perpendicular to each
other. Their equations are $y = \bar{y}$ and $x = \bar{x}$.

2. If $r = \pm 1$, then $\tan\theta = 0$, so $\theta = 0$ or $\pi$, i.e., the two
regression lines coincide. They cannot be parallel since
they have a common point $(\bar{x}, \bar{y})$.
1. Find the correlation coefficient and the equations of the
regression lines for the following data:
X: 1 2 3 4 5
Y: 2 5 3 8 7

2. Marks obtained by 10 students in Mathematics (x) and
Statistics (y) are given below:
X: 60 34 40 50 45 40 22 43 42 64
Y: 75 32 33 40 45 33 12 30 34 51
Find the two regression lines. Also find y when x = 55.

3. For the following data, find the most likely price at Madras
corresponding to the price 70 at Bombay and that at Bombay
corresponding to the price 68 at Madras.
                 Madras   Bombay
Average price    65       67
S.D. of price    0.5      3.5
S.D. of the difference between the prices at Madras and
Bombay is 3.1

4. In a partially destroyed laboratory record of an analysis of
correlation data, the following results only are legible.
Variance of X = 9
Regression equations are $8x - 10y + 66 = 0$ and $40x - 18y = 214$
Find
(a) The mean values of 𝑋 and 𝑌
(b) The standard deviation of 𝑌
(c) The coefficient of correlation between 𝑋 and 𝑌

Exercises:
1. Calculate Karl Pearson’s Coefficient of correlation between price
and supply of a commodity from the following data:
Price (Rs.): 17 18 19 20 21 22 23 24 25 26
Supply (Kg): 38 37 38 33 32 33 34 29 26 23
2. Compute the coefficient of correlation between the corresponding
values of x and y in the following table:
X: 2 4 5 6 8 11
Y: 18 12 10 8 7 5
3. Calculate Karl Pearson’s correlation coefficient from the data:
Roll No.:           1  2  3  4  5  6  7  8  9  10
Marks in Economics: 78 36 98 25 75 82 90 62 65 39
Marks in Maths:     84 51 91 60 68 62 86 58 53 47
4. Calculate the coefficient of correlation from the following data:
X: 9 8 7 6 5 4 3 2 1
Y: 15 16 14 13 11 12 10 8 9

5. Calculate the coefficient of rank correlation from the following
data:
X: 48 33 40 9 16 16 65 24 16 57
Y: 13 13 24 6 15 4  20 9  6  19
6. The ranking of 10 students in two subjects, maths and physics,
are as follows:
Maths: 3 5 8 4 7 10 2 1 6 9
Physics: 6 4 9 8 1 2 3 10 5 7
Find the correlation coefficient.

8. The coefficient of rank correlation between the debenture prices
and share prices of a company was +0.8. If the sum of the
squares of the differences in ranks was 33, find the value of n.
9. If covariance between X and Y is 10 and the variance of X and Y
are respectively 16 and 9, find the coefficient of correlation
10. Calculate Karl Pearson’s correlation between X and Y from the
following data:
$N = 13$, $\sum X = 117$, $\sum X^2 = 1313$, $\sum Y = 260$, $\sum Y^2 = 6580$, $\sum XY = 2827$
11. In two sets of variables X and Y with 50 observations each, the
following data were observed:
Mean of X = 10, S.D. of X = 3
Mean of Y = 6, S.D. of Y = 2
Coefficient of correlation between X and Y is 0.3. However, on
subsequent verification, it was found that one pair of values
(X = 10, Y = 6) was inaccurate and hence weeded out. With the
remaining 49 pairs of values, how is the original value of the
correlation coefficient affected?

12. From the following data, obtain the two regression
equations
Sales:    91 97 108 121 67 124 51 73 111 57
Purchase: 71 75 69  97  70 91  39 61 80  47
13. Two variables gave the following data:
$\bar{x} = 20$, $\bar{y} = 15$, $\sigma_x = 4$, $\sigma_y = 3$, $r = 0.7$. Obtain the two regression
equations and find the most likely value of Y when X = 24.

14. You are given the following information about advertising and
sales:

                 Adv. Expenses (x)   Sales (y)
                 (Rs. lakhs)         (Rs. lakhs)
Average price    10                  90
S.D. of price    3                   12

Correlation coefficient is 0.8
(a) Find the two regression lines.
(b) Find the likely sales when advertisement expenditure is
Rs. 15 lakhs.
(c) What should be the advertisement expenditure if the
company wants to attain a sales target of Rs. 120 lakhs?

15. The equations of the two lines of regression for a bivariate data
are Y = 10(X – 5) and X = 2.5(Y – 14). Find the arithmetic
means of X and Y as well as the coefficient of correlation between
X and Y.

16. Two variables gave the following data: $\bar{x} = 20$, $\bar{y} = 15$, $\sigma_x = 4$,
$\sigma_y = 3$, $r = +0.7$. Obtain the two regression equations and find the
most likely value of Y when X = 24.

17. Calculate the coefficient of correlation from the following data
and comment on the result:
Experience (x):  16 12 18 4  3  10 5  12
Performance (y): 23 22 24 17 19 20 18 21

25. Calculate Karl Pearson’s correlation between X and Y from the
following data:
$N = 13$, $\sum X = 117$, $\sum X^2 = 1313$, $\sum Y = 260$, $\sum Y^2 = 6580$, $\sum XY = 2827$
