3 Unit (1) - Merged

Uploaded by

KRISHNA RAJ
Copyright
© © All Rights Reserved

Correlation, Regression, Time Series Analysis and Index Numbers

5.1 Correlation and Regression


Correlation analysis is the statistical tool used to measure the degree to which two
variables are linearly related to each other. Correlation measures the degree of
association between two variables.
If the quantities (X, Y) vary in such a way that a change in one variable corresponds to a change in the other variable, then the variables X and Y are correlated.
Example: Price of a commodity and amount of demand.
Correlation can be studied using various methods, such as:

1. Scatter diagram

2. Karl Pearson's coefficient of correlation

3. Spearman's rank correlation coefficient.



5.1.1 Scatter Diagram


The simplest device for studying correlation between two variables is a special
type of dot chart called a scatter diagram. When this method is used, the given data
are plotted on a graph in the form of dots, i.e., for each pair of X and Y values we
put a dot, and thus we obtain as many points as the number of observations. By
looking at the scatter of the various points we can form an idea as to whether the
two variables are related or not. The more the plotted points scatter over the chart,
the lesser is the degree of relationship between the two variables. The nearer the
points come to a line, the higher the degree of relationship. If the plotted points lie
in a haphazard manner, it shows the absence of any relationship between the
variables. Consider the following diagrams.
5.1.3 Properties of correlation coefficient

(i) The coefficient of correlation lies between -1 and +1, i.e., $|r| \le 1$.
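Proof:
The correlation coefficient r between X and Y is given by

$r = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$, where $\mathrm{Cov}(x, y) = \dfrac{1}{n}\sum (x - \bar{x})(y - \bar{y})$.

Say Cov(x, y) = K, so that $r = \dfrac{K}{\sigma_x \sigma_y}$.

Here

$K^2 = \left[\dfrac{1}{n}\sum (x - \bar{x})(y - \bar{y})\right]^2 = \dfrac{(\sum XY)^2}{n^2}$

and

$\sigma_x^2 \sigma_y^2 = \dfrac{\sum (x - \bar{x})^2}{n} \cdot \dfrac{\sum (y - \bar{y})^2}{n} = \dfrac{\sum X^2 \cdot \sum Y^2}{n^2}$,

where $X = x - \bar{x}$ and $Y = y - \bar{y}$.

By Schwarz's inequality, we have

$(\sum XY)^2 \le \sum X^2 \cdot \sum Y^2$.

Dividing both sides by $n^2$, we get

$K^2 \le \sigma_x^2 \sigma_y^2$, i.e., $r^2 = \dfrac{K^2}{\sigma_x^2 \sigma_y^2} \le 1$

$\Rightarrow |r| \le 1$ (or) $-1 \le r \le 1$.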

Note:

1. If r = 1 then there is a perfect positive correlation.
2. If r = -1 then there is a perfect negative correlation.
3. If r = 0 then the variables are uncorrelated.

(ii) The coefficient of correlation is independent of change of scale and origin of the variables X and Y.
Proof:
Let $U = \dfrac{X - a}{h}$, $V = \dfrac{Y - b}{k}$, so that
$X = a + hU$ and $Y = b + kV$, where a, b, h, k are constants; h > 0, k > 0.
We shall prove that r(X, Y) = r(U, V).
Note: Method for finding the correlation coefficient (discrete case)

$r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{n\,\sigma_x \sigma_y} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$
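The raw-sum formula in the note above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the sample data are assumptions made for the sketch, not part of the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from raw sums:
    r = (n*SXY - SX*SY) / (sqrt(n*SXX - SX^2) * sqrt(n*SYY - SY^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    if denom == 0:
        raise ValueError("a constant variable has no defined correlation")
    return (n * sxy - sx * sy) / denom

# Made-up data for illustration: y is an exact increasing linear function of x,
# so r should come out as (numerically) +1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Working from the sums rather than from deviations mirrors how the solved problems tabulate X², Y² and XY column by column.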

5.1.4 Calculation of coefficient of correlation for a bivariate distribution

If the bivariate data in x and y is given by a two-way table and f is the frequency, then

$r_{xy} = \dfrac{N\sum fxy - (\sum fx)(\sum fy)}{\sqrt{N\sum fx^2 - (\sum fx)^2}\,\sqrt{N\sum fy^2 - (\sum fy)^2}}$
5.1.5 Regression

Definition
Regression is the measure of the average relationship between two or more variables in terms of original units of data.
Example: If the sales and advertising are correlated, we can find out the expected amount of sales for a given advertising expenditure, or the amount needed for attaining the given amount of sales.

5.1.6 Lines of regression

If the variables X and Y are correlated, i.e., there exists an association between
them, we can see that the scatter diagram will be more or less concentrated around
a curve. This curve is called the curve of regression.
If the curve is a straight line, it is called the line of regression and the regression
is a linear regression.
We shall have two regression lines: the regression line of X on Y and the regression
line of Y on X. The regression line of Y on X gives the most probable value of Y for
given values of X, and the regression line of X on Y gives the most probable values
of X for given values of Y.

Table 5.1 Relation between Correlation Analysis and Regression Analysis

S.No | Correlation Analysis | Regression Analysis
1. | Correlation coefficient r between X and Y is a measure of linear relationship between X and Y. | The regression coefficients are mathematical measures expressing the average relationship between the two variables.
2. | The correlation coefficient does not reflect upon the nature of variable (independent or dependent variable). | Regression coefficients reflect on the nature of variable, i.e., which is the dependent variable and which the independent variable. In other words, it estimates the value of dependent variable for any given value of independent variable.
5.6 Statistics for Management

3. | It does not imply cause and effect relationship between the variables under study. | It indicates the cause and effect relationship between the variables. The variable corresponding to cause is taken as independent variable, whereas corresponding to effect is taken as dependent variable.
4. | It is a relative measure and is independent of the units of measurement. | Regression coefficients are absolute measures of finding out the relationship between two or more variables.
5. | It indicates the degree of association. | It is used to forecast the nature of dependent variable when the value of independent variable is known.

Uses of Regression Analysis


1. The cause and effect relations are indicated from the study of regression analysis.
2. It establishes the rate of change in one variable in terms of the changes in another variable.
3. It is useful in economic analysis, as a regression equation can determine an increase in the cost of living index for a particular increase in general price level.
4. It helps in prediction and thus it can estimate the value of unknown quantities.
5. It enables us to study the nature of relationship between the variables.
6. It can be useful to all natural, social and physical sciences, where the data are in functional relationship.

5.1.7 Regression Equations

(i) Equation of the line of regression of Y on X is

$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

where $r\,\dfrac{\sigma_y}{\sigma_x}$ is the regression coefficient of Y on X.

(ii) Equation of the line of regression of X on Y is

$x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$

where $r\,\dfrac{\sigma_x}{\sigma_y}$ is the regression coefficient of X on Y.

Note:
1. The regression coefficients can be denoted by
   $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$.
2. The regression coefficients are obtained by the following expressions for discrete values of X and Y:
   $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
   $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$
3. Both the regression lines pass through the point $(\bar{x}, \bar{y})$, where $\bar{x}$ and $\bar{y}$ are the means of X and Y respectively.
4. The correlation coefficient is the geometric mean between the regression coefficients,
   i.e., $b_{xy} \cdot b_{yx} = r^2 \Rightarrow r = \pm\sqrt{b_{xy} \cdot b_{yx}}$.
5. If one of the regression coefficients is greater than unity, the other must be less than unity.
6. Regression coefficients are independent of the change of origin but not of scale.
7. Both the regression coefficients will have the same sign, i.e., they will be either both positive or both negative. The coefficient of correlation will have the same sign as that of the regression coefficients, i.e., if the regression coefficients have a negative sign, r will also have a negative sign, and if the regression coefficients have a positive sign, r will also be positive.
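As a hedged illustration of notes 2, 4 and 7, the sketch below computes both regression coefficients from raw sums and recovers r as their signed geometric mean. The function names and the five data pairs are hypothetical, chosen only for this example.

```python
from math import sqrt

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) using the raw-sum expressions of note 2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy           # shared numerator of both coefficients
    b_yx = num / (n * sxx - sx ** 2)  # regression coefficient of Y on X
    b_xy = num / (n * syy - sy ** 2)  # regression coefficient of X on Y
    return b_yx, b_xy

def r_from_coefficients(b_yx, b_xy):
    """r = +/- sqrt(b_yx * b_xy), taking the common sign of the coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    sign = -1.0 if b_yx < 0 else 1.0
    return sign * sqrt(product)

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
b_yx, b_xy = regression_coefficients(xs, ys)
print(b_yx, b_xy, r_from_coefficients(b_yx, b_xy))
```

Because both coefficients share the same numerator, they necessarily share a sign, which is why the sign of r can be read off either one.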

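5.1.8 Angle between regression lines

If $\theta$ is the (acute) angle between the two regression lines, then

$\tan\theta = \dfrac{1 - r^2}{|r|} \cdot \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$

where r, $\sigma_x$, $\sigma_y$ have the usual meaning.

Proof:
Equations of the regression lines of Y on X and X on Y are

$y - \bar{y} = b_{yx}(x - \bar{x})$
$x - \bar{x} = b_{xy}(y - \bar{y})$

Slopes of the two lines are

$m_1 = b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$

$m_2 = \dfrac{1}{b_{xy}} = \dfrac{1}{r} \cdot \dfrac{\sigma_y}{\sigma_x}$

If $\theta$ is the angle, then

$\tan\theta = \dfrac{|m_1 - m_2|}{1 + m_1 m_2} = \dfrac{\left| r\,\dfrac{\sigma_y}{\sigma_x} - \dfrac{1}{r}\,\dfrac{\sigma_y}{\sigma_x} \right|}{1 + \left(r\,\dfrac{\sigma_y}{\sigma_x}\right)\left(\dfrac{1}{r}\,\dfrac{\sigma_y}{\sigma_x}\right)} = \left| r - \dfrac{1}{r} \right| \cdot \dfrac{\sigma_y / \sigma_x}{1 + \sigma_y^2 / \sigma_x^2} = \dfrac{|r^2 - 1|}{|r|} \cdot \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$

Since $r^2 \le 1$ and $\sigma_x$ and $\sigma_y$ are positive, the angle between the lines is given by

$\tan\theta = \dfrac{1 - r^2}{|r|} \cdot \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$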
Note:
(i) Suppose r = 0. Then $\tan\theta = \infty \Rightarrow \theta = \dfrac{\pi}{2} = 90^\circ$.
The two regression lines are perpendicular to each other and the equations will be
$y = \bar{y}$ and $x = \bar{x}$.
(ii) If $r = \pm 1$, then $\tan\theta = 0 \Rightarrow \theta = 0$ or $\pi$.
Here the lines of regression coincide. They cannot be parallel since they have a common point $(\bar{x}, \bar{y})$.
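The angle formula and its two limiting cases can be checked numerically. The following is a minimal sketch, assuming we want the acute angle (so the absolute value of r is used); the function name `regression_angle_deg` is made up for the illustration.

```python
from math import atan, degrees

def regression_angle_deg(r, sigma_x, sigma_y):
    """Acute angle (in degrees) between the two regression lines."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if sigma_x <= 0 or sigma_y <= 0:
        raise ValueError("standard deviations must be positive")
    if r == 0:
        # tan(theta) is infinite: the lines y = y_bar and x = x_bar are perpendicular
        return 90.0
    tan_theta = (1 - r * r) / abs(r) * (sigma_x * sigma_y) / (sigma_x ** 2 + sigma_y ** 2)
    return degrees(atan(tan_theta))

print(regression_angle_deg(0.0, 1.0, 2.0))   # uncorrelated: perpendicular lines
print(regression_angle_deg(1.0, 1.0, 2.0))   # perfect correlation: lines coincide
print(regression_angle_deg(0.5, 1.0, 1.0))   # an intermediate case
```

The angle shrinks from 90° towards 0° as |r| grows from 0 to 1, which is one way to read the strength of the linear relationship off the two fitted lines.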
Solved Problem 5.1

Calculate the correlation coefficient for the following heights (in inches) of fathers (x) and their sons (y).

x:  65  66  67  67  68  69  70  72
y:  67  68  65  68  72  72  69  71

Solution

         x      y      x²      y²      xy
         65     67     4225    4489    4355
         66     68     4356    4624    4488
         67     65     4489    4225    4355
         67     68     4489    4624    4556
         68     72     4624    5184    4896
         69     72     4761    5184    4968
         70     69     4900    4761    4830
         72     71     5184    5041    5112
Total:   544    552    37028   38132   37560

$\sum X = 544$, $\sum Y = 552$, $\sum X^2 = 37028$, $\sum Y^2 = 38132$, $\sum XY = 37560$, n = 8

$r_{XY} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$

$= \dfrac{8(37560) - (544)(552)}{\sqrt{8(37028) - (544)^2}\,\sqrt{8(38132) - (552)^2}}$

$= \dfrac{192}{\sqrt{288}\,\sqrt{352}} = 0.603$

There is high positive correlation between x and y.


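(or)
By taking u = x - 67, v = y - 68, we have

         x      y      u = x - 67   v = y - 68   u²     v²     uv
         65     67     -2           -1           4      1      2
         66     68     -1           0            1      0      0
         67     65     0            -3           0      9      0
         67     68     0            0            0      0      0
         68     72     1            4            1      16     4
         69     72     2            4            4      16     8
         70     69     3            1            9      1      3
         72     71     5            3            25     9      15
Total:                 8            8            44     52     32

$r_{XY} = r_{uv} = \dfrac{n\sum uv - (\sum u)(\sum v)}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}$

$= \dfrac{8(32) - (8)(8)}{\sqrt{8(44) - (8)^2}\,\sqrt{8(52) - (8)^2}}$

$= 0.603$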
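The two routes above give the same answer because r does not depend on the choice of origin. A short sketch that checks this on the father and son heights of Solved Problem 5.1; the helper name `pearson_r` and the variable names are assumptions for the illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from the raw-sum formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

r_raw = pearson_r(fathers, sons)

# Shift the origin exactly as in the second method: u = x - 67, v = y - 68
u = [x - 67 for x in fathers]
v = [y - 68 for y in sons]
r_shifted = pearson_r(u, v)

print(round(r_raw, 3), round(r_shifted, 3))  # prints 0.603 0.603
```

Shifting only makes the hand arithmetic lighter; the numerator and both terms under the square roots are unchanged.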

Solved Problem 5.2


Find if there is any significant correlation between the heights and weights given below.

Height in inches:  57   59   62   63   64   65   55   58   57
Weight in lbs:     113  117  126  126  130  129  111  116  112

Solution

         x      y      u = x - 60   v = y - 120   u²     v²     uv
         57     113    -3           -7            9      49     21
         59     117    -1           -3            1      9      3
         62     126    2            6             4      36     12
         63     126    3            6             9      36     18
         64     130    4            10            16     100    40
         65     129    5            9             25     81     45
         55     111    -5           -9            25     81     45
         58     116    -2           -4            4      16     8
         57     112    -3           -8            9      64     24
Total:                 0            0             102    472    216

$r_{xy} = r_{uv} = \dfrac{n\sum uv - (\sum u)(\sum v)}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}$

$= \dfrac{9(216) - (0)(0)}{\sqrt{9(102) - 0^2}\,\sqrt{9(472) - 0^2}}$

$= 0.9844$
0.667

Solved Problem 5.10

The following table gives, according to age, the frequency of marks obtained by 100 students in an intelligence test.

                 Age in years
Marks        18     19     20     21     Total
10-20        4      2      2      -      8
20-30        5      4      6      4      19
30-40        6      8      10     11     35
40-50        4      4      6      8      22
50-60        -      2      4      4      10
60-70        -      2      3      1      6
Total        19     22     31     28     100

Calculate the correlation coefficient.

Solution
Let $u = x - 19$ and $v = \dfrac{y - 35}{10}$, where x is the age and y is the mid value of the marks class.

Marks    Mid value y   v     x = 18   x = 19   x = 20   x = 21   f        fv     fv²    fuv
                             u = -1   u = 0    u = 1    u = 2
10-20    15            -2    4        2        2        -        8        -16    32     4
20-30    25            -1    5        4        6        4        19       -19    19     -9
30-40    35            0     6        8        10       11       35       0      0      0
40-50    45            1     4        4        6        8        22       22     22     18
50-60    55            2     -        2        4        4        10       20     40     24
60-70    65            3     -        2        3        1        6        18     54     15
f                            19       22       31       28       N = 100  25     167    52
fu                           -19      0        31       56       68
fu²                          19       0        31       112      162
fuv                          9        0        13       30       52

$r = \dfrac{N\sum fxy - (\sum fx)(\sum fy)}{\sqrt{N\sum fx^2 - (\sum fx)^2}\,\sqrt{N\sum fy^2 - (\sum fy)^2}} = \dfrac{N\sum fuv - (\sum fu)(\sum fv)}{\sqrt{N\sum fu^2 - (\sum fu)^2}\,\sqrt{N\sum fv^2 - (\sum fv)^2}}$

$= \dfrac{(100 \times 52) - (68 \times 25)}{\sqrt{(100 \times 162) - (68)^2}\,\sqrt{(100 \times 167) - (25)^2}}$

$= 0.2566$
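The two-way table calculation can be sketched in code. The function name `grouped_correlation` and the variable names are illustrative assumptions; the cell frequencies are those of Solved Problem 5.10 as fixed by its row and column totals.

```python
from math import sqrt

def grouped_correlation(x_values, y_values, freq):
    """Correlation for a two-way frequency table.

    freq[i][j] is the frequency of the pair (y_values[i], x_values[j]).
    """
    n_total = sum(sum(row) for row in freq)
    sfx = sum(f * x for row in freq for f, x in zip(row, x_values))
    sfxx = sum(f * x * x for row in freq for f, x in zip(row, x_values))
    sfy = sum(sum(row) * y for row, y in zip(freq, y_values))
    sfyy = sum(sum(row) * y * y for row, y in zip(freq, y_values))
    sfxy = sum(f * x * y
               for row, y in zip(freq, y_values)
               for f, x in zip(row, x_values))
    num = n_total * sfxy - sfx * sfy
    den = sqrt(n_total * sfxx - sfx ** 2) * sqrt(n_total * sfyy - sfy ** 2)
    return num / den

ages = [18, 19, 20, 21]
mark_midpoints = [15, 25, 35, 45, 55, 65]
freq = [
    [4, 2, 2, 0],    # marks 10-20
    [5, 4, 6, 4],    # marks 20-30
    [6, 8, 10, 11],  # marks 30-40
    [4, 4, 6, 8],    # marks 40-50
    [0, 2, 4, 4],    # marks 50-60
    [0, 2, 3, 1],    # marks 60-70
]

r = grouped_correlation(ages, mark_midpoints, freq)
print(round(r, 4))  # prints 0.2566
```

The sketch works on the raw ages and mid values rather than the coded u and v; since r is independent of origin and scale, both give the same coefficient.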
Solved Problem 5.12
Calculate the coefficient of correlation and obtain the lines of regression for the following.

X:  1   2   3   4   5   6   7   8   9
Y:  9   8   10  12  11  13  14  16  15

Obtain an estimate of Y which should correspond to the value X = 6.2.

Solution

         X      Y      X²     Y²      XY
         1      9      1      81      9
         2      8      4      64      16
         3      10     9      100     30
         4      12     16     144     48
         5      11     25     121     55
         6      13     36     169     78
         7      14     49     196     98
         8      16     64     256     128
         9      15     81     225     135
Total:   45     108    285    1356    597

$\sum X = 45$, $\sum Y = 108$, $\sum X^2 = 285$, $\sum Y^2 = 1356$, $\sum XY = 597$, n = 9

$\bar{x} = \dfrac{\sum X}{n} = \dfrac{45}{9} = 5$, $\bar{y} = \dfrac{\sum Y}{n} = \dfrac{108}{9} = 12$

Correlation coefficient

$r = \dfrac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \dfrac{(9 \times 597) - (45 \times 108)}{\sqrt{(9 \times 285) - (45)^2}\,\sqrt{(9 \times 1356) - (108)^2}}$

r = 0.95

Regression coefficient of X on Y

$b_{xy} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2} = \dfrac{(9 \times 597) - (45 \times 108)}{(9 \times 1356) - (108)^2} = 0.95$

Regression coefficient of Y on X

$b_{yx} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \dfrac{(9 \times 597) - (45 \times 108)}{(9 \times 285) - (45)^2} = 0.95$

Regression line of X on Y is

$x - \bar{x} = b_{xy}(y - \bar{y})$
$x - 5 = 0.95(y - 12) = 0.95y - 11.4$
$x = 0.95y - 6.4$

Regression line of Y on X is

$y - \bar{y} = b_{yx}(x - \bar{x})$
$y - 12 = 0.95(x - 5) = 0.95x - 4.75$
$y = 0.95x + 7.25$

Value of y corresponding to x = 6.2 is

$y = (0.95 \times 6.2) + 7.25 = 13.14$
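The fitting steps of this problem can be reproduced with a short sketch. The function name `fit_regression_lines` and the (intercept, slope) return convention are assumptions made for the illustration.

```python
def fit_regression_lines(xs, ys):
    """Return ((a_y, b_yx), (a_x, b_xy)) for the lines
    y = a_y + b_yx * x  (Y on X)  and  x = a_x + b_xy * y  (X on Y)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    x_bar, y_bar = sx / n, sy / n
    num = n * sxy - sx * sy
    b_yx = num / (n * sxx - sx ** 2)
    b_xy = num / (n * syy - sy ** 2)
    # Both lines pass through the point of means (x_bar, y_bar)
    return (y_bar - b_yx * x_bar, b_yx), (x_bar - b_xy * y_bar, b_xy)

xs = list(range(1, 10))
ys = [9, 8, 10, 12, 11, 13, 14, 16, 15]

(a_y, b_yx), (a_x, b_xy) = fit_regression_lines(xs, ys)
y_hat = a_y + b_yx * 6.2  # estimate of Y for X = 6.2, using the line of Y on X

print(round(b_yx, 2), round(a_y, 2), round(y_hat, 2))  # prints 0.95 7.25 13.14
```

Note that the estimate of Y uses the line of Y on X; an estimate of X for a given Y would use the other line, `x = a_x + b_xy * y`.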

Solved Problem 5.13

The following data relate to marketing expenditure in lakhs of rupees and the corresponding sales of a product in crores of rupees. Estimate the marketing expenditure to attain a sales target of Rs.40 crores.

Marketing expenditure:  10  12  15  20  23
Product sales:          14  17  23  21  25

Also find the coefficient of correlation between marketing expenditure and sales.

Solution
Let x be marketing expenditure and y be product sales.

         x      y      x²      y²      xy
         10     14     100     196     140
         12     17     144     289     204
         15     23     225     529     345
         20     21     400     441     420
         23     25     529     625     575
Total:   80     100    1398    2080    1684

$\sum x = 80$, $\sum y = 100$, $\sum x^2 = 1398$, $\sum y^2 = 2080$, $\sum xy = 1684$, n = 5

Regression coefficient of X on Y

$b_{xy} = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} = \dfrac{(5)(1684) - (80)(100)}{5(2080) - (100)^2} = 1.05$

Now

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{80}{5} = 16$, $\bar{y} = \dfrac{\sum y}{n} = \dfrac{100}{5} = 20$

Regression line of X on Y is

$x - \bar{x} = b_{xy}(y - \bar{y})$
$x - 16 = 1.05(y - 20)$
$x = 1.05y - 5$

∴ Marketing expenditure to attain a sales target of Rs.40 crores
= Value of X when Y = 40
= (1.05 × 40) - 5
= Rs.37 lakhs

Correlation coefficient

$r_{XY} = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \dfrac{(5)(1684) - (80)(100)}{\sqrt{5(1398) - (80)^2}\,\sqrt{(5)(2080) - (100)^2}}$

$= 0.8646$

Solved Problem 5.14

A research investigator collected data on savings and investment from 16 house


Solved Problem 5.15

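Out of the two lines of regression given by x + 2y - 5 = 0 and 2x + 3y - 8 = 0, which one is the regression line of X on Y?
Use the equations to find the means of X and Y. If the variance of X is 12, find the variance of Y.

Solution
The two regression lines are
x + 2y - 5 = 0    (1) and
2x + 3y - 8 = 0   (2)

Solving the two equations, we get
$\bar{x} = 1$, $\bar{y} = 2$    (using note (3): both regression lines pass through the point of means)

Let us assume that (1) is the regression line of y on x and (2) is the regression line of x on y.

Then (1) ⇒ 2y = -x + 5
$y = -\dfrac{1}{2}x + \dfrac{5}{2}$
$b_{yx} = -\dfrac{1}{2}$

(2) ⇒ 2x = -3y + 8
$x = -\dfrac{3}{2}y + 4$
$b_{xy} = -\dfrac{3}{2}$

$r^2 = b_{yx} \cdot b_{xy} = \dfrac{3}{4}$, so $r = -\dfrac{\sqrt{3}}{2} = -0.866$.

Here |r| < 1, and we choose the negative sign since $b_{yx}$ and $b_{xy}$ are negative.
Our assumption about the regression lines is correct, i.e., the regression line of X on Y is
2x + 3y - 8 = 0.

Given $\sigma_x^2 = 12$.

Now $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = -\dfrac{1}{2}$

i.e., $-\dfrac{\sqrt{3}}{2} \cdot \dfrac{\sigma_y}{\sqrt{12}} = -\dfrac{1}{2}$

i.e., $\sigma_y = \dfrac{\sqrt{12}}{\sqrt{3}} = 2$

Variance of y = $\sigma_y^2 = 4$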
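The trial-and-check argument used here (assume an assignment of the two lines, then verify that r² does not exceed 1) can be sketched as below. The function name, the (a, b, c) encoding of a line a·x + b·y + c = 0, and the returned dictionary keys are assumptions for the illustration; the sketch also assumes all x and y coefficients are non-zero.

```python
from math import sqrt

def identify_regression_lines(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y + c = 0.

    Tries both assignments and returns the first one with 0 <= r^2 <= 1.
    """
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]   # slope when solved for y
        b_xy = -x_on_y[1] / x_on_y[0]   # slope when solved for x
        r_squared = b_yx * b_xy
        if 0 <= r_squared <= 1:
            r = sqrt(r_squared) if b_yx > 0 else -sqrt(r_squared)
            return {"y_on_x": y_on_x, "x_on_y": x_on_y,
                    "b_yx": b_yx, "b_xy": b_xy, "r": r}
    raise ValueError("neither assignment gives a valid correlation coefficient")

def means_from_lines(line1, line2):
    """The point of means is the intersection of the two lines (Cramer's rule)."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    return (b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det

line_1 = (1, 2, -5)   # x + 2y - 5 = 0
line_2 = (2, 3, -8)   # 2x + 3y - 8 = 0

result = identify_regression_lines(line_1, line_2)
x_bar, y_bar = means_from_lines(line_1, line_2)

# From b_yx = r * sigma_y / sigma_x with variance of X equal to 12
sigma_y = result["b_yx"] * sqrt(12) / result["r"]
print(result["x_on_y"], x_bar, y_bar, round(sigma_y ** 2, 6))
```

If the first assumption had produced r² greater than 1, the loop would swap the roles of the two lines, which is the same fallback one applies by hand.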

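Solved Problem 5.16

In a partially destroyed laboratory record on the analysis of correlation data, only the following results are legible:
Variance of x = 9.
Regression equations: 8x - 10y + 66 = 0 and 40x - 18y = 214.
Find (a) the mean values of x and y, (b) the correlation coefficient, (c) the S.D. of y.

Solution
1. The equations of the regression lines are
8x - 10y + 66 = 0   (1)
40x - 18y = 214     (2)

Solving these equations, we get $\bar{x} = 13$, $\bar{y} = 17$.

2. Let (1) be the regression line of y on x and (2) be the regression line of x on y.

$y = \dfrac{8}{10}x + \dfrac{66}{10}$

$x = \dfrac{18}{40}y + \dfrac{214}{40}$

$b_{yx} = \dfrac{8}{10}$, $b_{xy} = \dfrac{18}{40}$

$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{\dfrac{8}{10} \times \dfrac{18}{40}} = 0.6$

3. Now $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{8}{10}$

$0.6 \times \dfrac{\sigma_y}{3} = 0.8$

$\sigma_y = \dfrac{0.8 \times 3}{0.6} = 4$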

Solved Problem 5.17


For the following data, find the most likely price at Chennai corresponding to the price 70 at Mumbai and that at Mumbai corresponding to the price 68 at Chennai.

                   Chennai    Mumbai
Average price      65         67
S.D. of price      0.5        3.5

S.D. of the difference between the prices at Chennai and Mumbai is 3.1.

Solution
Let X be the price at Chennai and Y be the price at Mumbai.
Given $\bar{x} = 65$, $\bar{y} = 67$, $\sigma_x = 0.5$, $\sigma_y = 3.5$, $\sigma_{x-y} = 3.1$.

Correlation coefficient

$r = \dfrac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y} = \dfrac{(0.5)^2 + (3.5)^2 - (3.1)^2}{2(0.5)(3.5)} = 0.8257$

The regression line of X on Y is

$x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$
$x - 65 = \dfrac{0.8257 \times 0.5}{3.5}\,(y - 67)$
$x = 0.1179y + 57.1007$

The price at Chennai corresponding to the price 70 at Mumbai
= (0.1179 × 70) + 57.1007
= Rs. 65.35.

The regression line of Y on X is

$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$
$y - 67 = \dfrac{0.8257 \times 3.5}{0.5}\,(x - 65)$
$y = 5.7799x - 308.6935$

The price at Mumbai corresponding to the price 68 at Chennai
= (5.7799 × 68) - 308.6935
= Rs. 84.34.
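The correlation coefficient in this problem comes from the identity Var(X - Y) = σx² + σy² - 2rσxσy, solved for r. A minimal sketch, with a hypothetical function name:

```python
def r_from_difference_sd(sd_x, sd_y, sd_diff):
    """Solve Var(X - Y) = sd_x^2 + sd_y^2 - 2*r*sd_x*sd_y for r."""
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return (sd_x ** 2 + sd_y ** 2 - sd_diff ** 2) / (2 * sd_x * sd_y)

# Values given in the problem: S.D. 0.5 (Chennai), 3.5 (Mumbai), 3.1 (difference)
r = r_from_difference_sd(0.5, 3.5, 3.1)
print(round(r, 4))  # prints 0.8257
```

With r in hand, the two regression lines follow from the means and standard deviations alone, without access to the individual price pairs.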

Solved Problem 5.18


From the following data, find the equations of the regression lines. (AU, May/June 2006)

          Marks in Mathematics    Marks in English
Mean      62.5                    39
S.D.      9.5                     10

Coefficient of correlation between marks in Mathematics and English = 0.60
1. Estimate the marks in English when marks in Mathematics is 70.
2. Estimate the marks in Mathematics corresponding to 54 marks in English.

Solution
Let marks in Mathematics be x and marks in English be y.
Given $\bar{x} = 62.5$, $\bar{y} = 39$, $\sigma_x = 9.5$, $\sigma_y = 10$, r = 0.60.

Regression line of y on x is

$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$
$y - 39 = 0.60 \times \dfrac{10}{9.5}\,(x - 62.5)$
$= 0.6316(x - 62.5)$
$= 0.6316x - 39.475$
$y = 0.6316x - 0.475$

(i) Marks in English when marks in Mathematics is 70
= (0.6316 × 70) - 0.475 = 43.74

Regression line of x on y is

$x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$
$x - 62.5 = 0.60 \times \dfrac{9.5}{10}\,(y - 39)$
$= 0.57(y - 39)$
$= 0.57y - 22.23$
$x = 0.57y + 40.27$

(ii) Marks in Mathematics corresponding to 54 marks in English
= (0.57 × 54) + 40.27
= 30.78 + 40.27 = 71.05
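When only the means, standard deviations and r are known, both regression lines follow directly from the point-slope forms. The sketch below uses hypothetical function names `predict_y` and `predict_x` and the figures of this problem.

```python
def predict_y(x, x_bar, y_bar, sd_x, sd_y, r):
    """Estimate y from x using the regression line of Y on X."""
    return y_bar + r * sd_y / sd_x * (x - x_bar)

def predict_x(y, x_bar, y_bar, sd_x, sd_y, r):
    """Estimate x from y using the regression line of X on Y."""
    return x_bar + r * sd_x / sd_y * (y - y_bar)

# Summary statistics from the problem: x = Mathematics, y = English
stats = dict(x_bar=62.5, y_bar=39, sd_x=9.5, sd_y=10, r=0.60)

english = predict_y(70, **stats)   # marks in English for 70 in Mathematics
maths = predict_x(54, **stats)     # marks in Mathematics for 54 in English

print(round(english, 2), round(maths, 2))  # prints 43.74 71.05
```

Each estimate uses its own line: English from Mathematics uses Y on X, while Mathematics from English uses X on Y. Inverting one line to make the other prediction would give a different, less appropriate estimate unless r = ±1.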
