P&S unit 2
UNIT - 2
1 CURVE FITTING
Introduction
$y = \alpha + \beta x$ (straight line)
$y = \alpha + \beta x + \gamma x^2$ (parabola)
$y = \alpha e^{\beta x}$ (exponential)
$y = \alpha x^{\beta}$ (power curve)
$y = \alpha \beta^{x}$ (exponential)
2 CORRELATION ANALYSIS
3 REGRESSION ANALYSIS
Lines of regression
Regression co-efficients
Case study:
Adv. cost (000's)    15   20   25   30   35   40   45   50   55   60   65   70
Sales (000's)       198  262  346  395  432  444  482  498  527  541  579  595
If the company spends 1 lakh on advertising, what sales can it expect?
Objective:
Determine the unknown parameters which are involved in the functional relation,
Tools
2 Calculus (derivatives)
PROCEDURE:
Curve fitting is the process of constructing a curve, or mathematical
function, that has the best fit to a series of data points, possibly subject to
constraints.
bivariate(𝑋 , 𝑌).
The functional relation between adv. Cost and sales is assumed to be linear
𝑦 = 𝛼 + 𝛽𝑥 + 𝑒
[Figure: scatter plot of sales (y) against adv. cost (x)]
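$$y = \alpha + \beta x + e \qquad (1)$$
so the error is
$$e = y - (\alpha + \beta x) = y - \alpha - \beta x.$$
To minimise the sum of squares of errors, $E = \sum (y - \alpha - \beta x)^2$, with respect to the
parameters, the necessary conditions are
$$\frac{\partial E}{\partial \alpha} = 0 \quad \text{and} \quad \frac{\partial E}{\partial \beta} = 0.$$
First,
$$\frac{\partial E}{\partial \alpha} = 0 \;\Rightarrow\; \frac{\partial}{\partial \alpha}\Big\{\sum (y - \alpha - \beta x)^2\Big\} = 0$$
$$\Rightarrow\; 2\sum (y - \alpha - \beta x)(0 - 1 - 0) = 0$$
$$\Rightarrow\; \sum (y - \alpha - \beta x) = 0$$
$$\Rightarrow\; \sum y - \sum \alpha - \sum \beta x = 0$$
$$\Rightarrow\; \sum y = \sum \alpha + \sum \beta x$$
$$\sum y = n\alpha + \beta \sum x \qquad (2)$$
Now consider $\dfrac{\partial E}{\partial \beta} = 0$:
$$\Rightarrow\; \frac{\partial}{\partial \beta}\Big\{\sum (y - \alpha - \beta x)^2\Big\} = 0$$
$$\Rightarrow\; 2\sum (y - \alpha - \beta x)(0 - x) = 0$$
$$\Rightarrow\; \sum (yx - \alpha x - \beta x^2) = 0$$
$$\Rightarrow\; \sum yx - \sum \alpha x - \sum \beta x^2 = 0$$
$$\sum yx = \alpha \sum x + \beta \sum x^2 \qquad (3)$$
Equations (2) and (3) are the normal equations. By solving them, we get the values of the
parameters $\alpha$ and $\beta$.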
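As a rough illustration of normal equations (2) and (3), the sketch below solves them for the advertising case-study data given earlier. The function name `fit_line` is hypothetical, and evaluating at x = 100 assumes that "1 lac" means 100 in the table's units of thousands. Since x = 100 lies well outside the observed range (15 to 70), the printed estimate is an extrapolation and should be read with caution.

```python
def fit_line(xs, ys):
    """Return (alpha, beta) solving the normal equations (2) and (3)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Cramer's rule on:  n*alpha     + sum_x*beta  = sum_y
    #                    sum_x*alpha + sum_xx*beta = sum_xy
    det = n * sum_xx - sum_x * sum_x
    alpha = (sum_y * sum_xx - sum_x * sum_xy) / det
    beta = (n * sum_xy - sum_x * sum_y) / det
    return alpha, beta


adv_cost = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
sales = [198, 262, 346, 395, 432, 444, 482, 498, 527, 541, 579, 595]

alpha, beta = fit_line(adv_cost, sales)
sales_at_100 = alpha + beta * 100  # assumed reading of "1 lac" in 000's
print(f"y = {alpha:.3f} + {beta:.3f} x")
print(f"estimated sales at x = 100: {sales_at_100:.1f} (000's)")
```

Cramer's rule is used here only because the system is 2 by 2; any linear solver would do the same job.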
Prob 1. A chemical company wishing to study the effect of extraction time on the efficiency
of an extraction operation obtained the data shown in the following table.
Extraction time in minutes (x)   27  45  41  19   3  39  19  49  15  31
Efficiency (y)                   57  64  80  46  62  72  52  77  57  68
Fit a straight line to the given data by the method of least squares.
Sol: Let $y = \alpha + \beta x$ be the straight line to be fitted to the given data.
The normal equations are
$$m\alpha + \Big(\sum x_i\Big)\beta = \sum y_i$$
$$\Big(\sum x_i\Big)\alpha + \Big(\sum x_i^2\Big)\beta = \sum x_i y_i$$
where $m = 10$ is the number of observations.
x_i   y_i   x_i^2   x_i y_i
27    57     729    1539
45    64    2025    2880
41    80    1681    3280
19    46     361     874
 3    62       9     186
39    72    1521    2808
19    52     361     988
49    77    2401    3773
15    57     225     855
31    68     961    2108
x   10  12  13  16  17  20  25
y   10  22  24  27  29  33  37

x_i   y_i   x_i^2   x_i y_i
10    10    100     100
12    22    144     264
13    24    169     312
16    27    256     432
17    29    289     493
20    33    400     660
25    37    625     925
$\sum x_i = 113$, $\sum y_i = 182$, $\sum x_i^2 = 1983$, $\sum x_i y_i = 3186$
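With these sums the normal equations can be written out and solved directly. The working below is a sketch that assumes the straight-line model $y = \alpha + \beta x$ (the problem statement for this table is not shown) with $n = 7$ data points; $\Delta$ is simply a name for the determinant of the system.

```latex
\begin{aligned}
7\alpha + 113\beta &= 182 \\
113\alpha + 1983\beta &= 3186 \\
\Delta &= 7(1983) - 113^2 = 1112 \\
\alpha &= \frac{182(1983) - 113(3186)}{1112} = \frac{888}{1112} \approx 0.7986 \\
\beta &= \frac{7(3186) - 113(182)}{1112} = \frac{1736}{1112} \approx 1.5612 \\
y &\approx 0.7986 + 1.5612\,x
\end{aligned}
```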
Overtime hrs (x)       1   1   2   2   3   3   4   5   6   7
Additional units (y)   2   7   7  10   8  12  10  14  11  14
Fit a nonlinear curve of the form $y = a + bx + cx^2$.
$y = 1.8022 + 3.4822x - 0.2689x^2$
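The parabola fit can be checked numerically. The sketch below builds the three normal equations for $y = a + bx + cx^2$ (obtained the same way as for the straight line, by setting the partial derivatives of $\sum (y - a - bx - cx^2)^2$ to zero) and solves them by Gaussian elimination. The helper names `solve_linear_system` and `fit_parabola` are hypothetical, chosen only for this example.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_parabola(xs, ys):
    """Return [a, b, c] of the least-squares parabola y = a + b*x + c*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]        # s[k] = sum of x^k
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    normal_matrix = [[s[0], s[1], s[2]],
                     [s[1], s[2], s[3]],
                     [s[2], s[3], s[4]]]
    return solve_linear_system(normal_matrix, t)


hours = [1, 1, 2, 2, 3, 3, 4, 5, 6, 7]
units = [2, 7, 7, 10, 8, 12, 10, 14, 11, 14]
a, b, c = fit_parabola(hours, units)
print(f"y = {a:.4f} + {b:.4f} x + {c:.4f} x^2")
```

The coefficients agree with the quoted answer to about three decimal places; small differences in the last digit come from rounding.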
Prob 4. Fit a second degree polynomial to the following data by the method of least squares.
x 0 1 2 3 4
y 1 1.8 1.3 2.5 6.3
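Prob 5: Find the least squares fit of the form $v = a_0 + a_1 u^2$ to the following data.
u   -1   0   1   2
v    2   5   3   0
Sol: Put $U = u^2$, so that $v = a_0 + a_1 U$. The normal equations are
$$n a_0 + \Big(\sum U\Big) a_1 = \sum v$$
$$\Big(\sum U\Big) a_0 + \Big(\sum U^2\Big) a_1 = \sum Uv$$
 u    v    U = u^2   U^2   Uv
-1    2    1          1    2
 0    5    0          0    0
 1    3    1          1    3
 2    0    4         16    0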
Prob 2: By method of least squares obtain the polynomial of second degree which fits the
following data
T 19 25 30 36 40 45 50
R 76 77 79 80 82 83 85
Prob 4: The velocity of a liquid varies with temperature according to a quadratic law.
T   1     2     3     4     5     6     7
V   2.31  2.01  1.80  1.66  1.55  1.47  1.41
$V = 2.593 - 0.326T + 0.023T^2$
Prob 5: Apply the method of least squares, construct the normal equations of second degree
𝒙 0 1 2 3 4
𝒚 1 1.8 1.3 2.5 6.3
Model - 1
Curve fitting is the way we model or represent a data spread by assigning a best fit
function (curve) along the entire range. Ideally, it will capture the trend in the data
and allow us to make predictions of how the data series will behave in the future.
The exponential curve to be fitted is
$$y = \alpha e^{\beta x} \qquad (1)$$
with error $e = y - \alpha e^{\beta x}$.
This is not linear, but it can be converted into linear form by taking logarithms of both sides:
$$\ln y = \ln \alpha + \beta x \ln e$$
$$\ln y = \ln \alpha + \beta x$$
Put $\ln y = Y$, $\ln \alpha = A$, $\beta = B$ and $x = X$, so that $Y = A + BX$. The normal equations are
$$\sum Y = nA + B \sum X \qquad (2)$$
$$\sum YX = A \sum X + B \sum X^2 \qquad (3)$$
By solving the above normal equations, we get the values of $A$ and $B$, and hence
$\alpha = e^A$ and $\beta = B$.
The procedure is the same as in fitting a straight line. Here $n$ is the number of pairs.
For example, consider the data
x    1   5   7   9  12
y   10  15  12  15  21
x    y    Y = ln y   X    X^2   YX
1    10   2.3026     1      1    2.3026
5    15   2.7081     5     25   13.5405
7    12   2.4849     7     49   17.3943
9    15   2.7081     9     81   24.3729
12   21   3.0445    12    144   36.5340
Normal equations:
$$5A + 34B = 13.2482$$
$$34A + 300B = 94.1443$$
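The log-transform procedure can be sketched in a few lines. The example below fits $y = \alpha e^{\beta x}$ to the data x = 1, 5, 7, 9, 12 and y = 10, 15, 12, 15, 21 (five pairs, consistent with the normal equation $5A + 34B = 13.2482$). The function name `fit_exponential` is hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = alpha * exp(beta * x) by a straight-line fit of Y = ln y on x."""
    n = len(xs)
    log_ys = [math.log(y) for y in ys]           # Y = ln y
    sum_x = sum(xs)
    sum_y = sum(log_ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * ly for x, ly in zip(xs, log_ys))
    det = n * sum_xx - sum_x * sum_x
    big_a = (sum_y * sum_xx - sum_x * sum_xy) / det   # A = ln alpha
    beta = (n * sum_xy - sum_x * sum_y) / det         # B = beta
    return math.exp(big_a), beta


xs = [1, 5, 7, 9, 12]
ys = [10, 15, 12, 15, 21]
alpha, beta = fit_exponential(xs, ys)
print(f"y = {alpha:.4f} * exp({beta:.4f} x)")
```

Note that this minimises the squared errors in $\ln y$, not in $y$ itself, so the result is the least-squares line in the transformed variables rather than the exact least-squares exponential.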
x    0   1   2   3    4    5    6    7     8
y   20  30  52  77  135  211  326  550  1052
Sol: Let $y = ae^{bx}$, and take $X_i = x_i$, $Y_i = \ln y_i$.
x_i   y_i    X_i = x_i   Y_i = ln y_i   X_i^2   X_i Y_i
0       20   0           2.996           0       0
1       30   1           3.401           1       3.401
2       52   2           3.951           4       7.902
3       77   3           4.344           9      13.032
4      135   4           4.905          16      19.62
5      211   5           5.352          25      26.76
6      326   6           5.787          36      34.722
7      550   7           6.310          49      44.17
8     1052   8           6.958          64      55.664
x   2      4       6       8       10
y   4.077  11.084  30.128  81.897  222.62
Sol: Let $y = ae^{bx}$. Taking logarithms,
$$\ln y = \ln a + bx$$
Let $Y = \ln y$, $A = \ln a$, $X = x$. Then the relation is $Y = A + bX$.
The normal equations to fit $Y = A + bX$ are
$$5A + \Big(\sum X_i\Big) b = \sum Y_i$$
$$\Big(\sum X_i\Big) A + \Big(\sum X_i^2\Big) b = \sum X_i Y_i$$
x_i   y_i      X_i   Y_i = ln y_i   X_i^2   X_i Y_i
2       4.077   2    1.405            4      2.81
4      11.084   4    2.406           16      9.624
6      30.128   6    3.405           36     20.43
8      81.897   8    4.405           64     35.24
10    222.62   10    5.405          100     54.05
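The solution can be completed from the table. The working below is a sketch: the column sums are computed from the tabulated values (they are not stated in the notes), and $\Delta$ is just a name for the determinant of the normal equations.

```latex
\begin{aligned}
\sum X_i &= 30, \quad \sum Y_i = 17.026, \quad \sum X_i^2 = 220, \quad \sum X_i Y_i = 122.154 \\
5A + 30b &= 17.026 \\
30A + 220b &= 122.154 \\
\Delta &= 5(220) - 30^2 = 200 \\
b &= \frac{5(122.154) - 30(17.026)}{200} = \frac{99.99}{200} \approx 0.5 \\
A &= \frac{17.026(220) - 30(122.154)}{200} = \frac{81.1}{200} = 0.4055 \\
a &= e^{A} = e^{0.4055} \approx 1.5 \\
y &\approx 1.5\, e^{0.5x}
\end{aligned}
```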
$\alpha$ and $\beta$: unknown parameters
$e \sim N(0, \sigma^2)$
The relation $y = ab^x$ is not linear, but it can be converted into linear form by taking
logarithms of both sides. We obtain
$$\ln y = \ln a + x \ln b$$
Consider $\ln y = Y$, $\ln a = A$ and $B = \ln b$; then
$$Y = A + Bx \qquad (2)$$
To minimise the sum of squares of errors $E = \sum (Y - A - Bx)^2$ with respect to the
parameters, the necessary conditions are
$$\frac{\partial E}{\partial A} = 0 \quad \text{and} \quad \frac{\partial E}{\partial B} = 0,$$
which give the normal equations
$$\sum Y = nA + B \sum x \qquad (3)$$
$$\sum Yx = A \sum x + B \sum x^2 \qquad (4)$$
Prob 4: The following data were collected to determine the relationship between
pressure (x) and the corresponding scale reading (y) for the purpose of
calibration. Fit a curve of the form $y = ab^x$ to the following data by the method of
least squares.
x   2  3  4  5  6
Sol: Taking logarithms,
$$\ln y = \ln a + x \ln b$$
Consider $\ln y = Y$, $\ln a = A$, $B = \ln b$ and $X = x$; then
$$Y = A + BX \qquad (2)$$
The normal equations are
$$\sum Y = nA + B \sum X \qquad (3)$$
$$\sum YX = A \sum X + B \sum X^2 \qquad (4)$$
By solving the above normal equations, we get the values of $A$ and $B$.
x   y   Y = ln y   X = x   X^2   YX
$\sum Y = 17.3438$, $\sum X = 20$, $\sum X^2 = 90$, $\sum YX = 76.2493$
$$5A + 20B = 17.3438$$
$$20A + 90B = 76.2493$$
Solving, $A = 0.7191$ and $B = 0.6874$.
$\therefore a = e^A = 2.0526$,
$b = e^B = 1.9886$
Prob 5. Fit an exponential curve of the form $y = ab^x$ to the following data
x   1  2    3    4    5    6    7    8
y   1  1.2  1.8  2.5  3.6  4.7  6.6  9.1
Sol: The curve to be fitted is $y = ab^x$
$$\Rightarrow \ln y = \ln a + x \ln b$$
Let $Y = \ln y$, $A = \ln a$, $B = \ln b$, $X = x$.
Then the relation is $Y = A + BX$.
The normal equations to fit $Y = A + BX$ are
$$8A + \Big(\sum X_i\Big) B = \sum Y_i$$
$$\Big(\sum X_i\Big) A + \Big(\sum X_i^2\Big) B = \sum X_i Y_i$$
x_i   y_i   X_i = x_i   Y_i = ln y_i   X_i^2   X_i Y_i
1     1     1           0               1       0
2     1.2   2           0.182           4       0.364
3     1.8   3           0.588           9       1.764
4     2.5   4           0.916          16       3.664
5     3.6   5           1.281          25       6.405
6     4.7   6           1.548          36       9.288
7     6.6   7           1.887          49      13.209
8     9.1   8           2.208          64      17.664
$\sum X_i = 36$, $\sum Y_i = 8.61$, $\sum X_i^2 = 204$, $\sum X_i Y_i = 52.358$
The normal equations are
$$8A + 36B = 8.61$$
$$36A + 204B = 52.358$$
By solving these equations we get $A = -0.382$, $B = 0.324$
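The remaining step is to transform back with $a = e^A$ and $b = e^B$. The sketch below repeats the whole computation for the Prob 5 data using unrounded logarithms; the function name `fit_ab_to_the_x` is hypothetical. Because the table rounds $\ln y_i$ to three decimals, the values agree with $A = -0.382$ and $B = 0.324$ only to about that precision.

```python
import math


def fit_ab_to_the_x(xs, ys):
    """Fit y = a * b**x via the line ln y = ln a + x ln b; return (A, B)."""
    n = len(xs)
    log_ys = [math.log(y) for y in ys]           # Y = ln y
    sum_x = sum(xs)
    sum_y = sum(log_ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * ly for x, ly in zip(xs, log_ys))
    det = n * sum_xx - sum_x * sum_x
    big_a = (sum_y * sum_xx - sum_x * sum_xy) / det   # A = ln a
    big_b = (n * sum_xy - sum_x * sum_y) / det        # B = ln b
    return big_a, big_b


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1.2, 1.8, 2.5, 3.6, 4.7, 6.6, 9.1]
big_a, big_b = fit_ab_to_the_x(xs, ys)
a, b = math.exp(big_a), math.exp(big_b)          # back-transform
print(f"A = {big_a:.3f}, B = {big_b:.3f}")
print(f"y = {a:.3f} * ({b:.3f})^x")
```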
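The power curve to be fitted is
$$y = ax^b \qquad (1)$$
This is not linear, but it is converted into linear form by taking logarithms of both sides.
We obtain
$$\ln y = \ln a + b \ln x$$
Consider $\ln y = Y$, $\ln a = A$ and $X = \ln x$; then
$$Y = A + bX \qquad (2)$$
To minimise the sum of squares of errors $E = \sum (Y - A - bX)^2$ with respect to the
parameters, the necessary conditions are
$$\frac{\partial E}{\partial A} = 0 \quad \text{and} \quad \frac{\partial E}{\partial b} = 0,$$
which give the normal equations
$$\sum Y = nA + b \sum X \qquad (3)$$
$$\sum YX = A \sum X + b \sum X^2 \qquad (4)$$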
1. The following data give the percentage (y) of high performance radial tires made by a
manufacturer that are still usable after having been driven for the given number of miles (x).
x   1     2     5     10  20    30    40    50
y   98.2  91.7  81.3  64  36.4  32.6  17.1  11.3
Estimate the following after fitting a power curve to the data:
(i) Percentage usable when $x = 15$
(ii) Percentage usable when $x = 35$
Sol: Let $y = ax^b$.
By taking logarithms, $\ln y = \ln a + b \ln x$.
Put $Y = \ln y$, $A = \ln a$, $B = b$, $X = \ln x$, so that
$Y = A + BX$.
Now the normal equations to fit $Y = A + BX$ (with 8 data points) are
$$8A + \Big(\sum X_i\Big) B = \sum Y_i$$
$$\Big(\sum X_i\Big) A + \Big(\sum X_i^2\Big) B = \sum X_i Y_i$$
x_i   y_i    X_i = ln x_i   Y_i = ln y_i   X_i^2     X_i Y_i
1     98.2   0              4.587           0         0
2     91.7   0.6931         4.5185          0.4805    3.132
5     81.3   1.6094         4.3981          2.5903    7.0785
10    64     2.3026         4.1589          5.3019    9.5762
20    36.4   2.9957         3.5946          8.9744   10.7684
30    32.6   3.4012         3.4843         11.5681   11.8508
40    17.1   3.6889         2.8391         13.6078   10.473
50    11.3   3.912          2.4248         15.3039    9.4859
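To finish the tire problem, the normal equations are solved for $A$ and $B$, the parameters are recovered as $a = e^A$, $b = B$, and the fitted curve is evaluated at $x = 15$ and $x = 35$. The sketch below does this numerically; the function name `fit_power_curve` is hypothetical, and no numerical answer is quoted here because the notes do not state one.

```python
import math


def fit_power_curve(xs, ys):
    """Fit y = a * x**b via the line ln y = ln a + b ln x; return (a, b)."""
    n = len(xs)
    log_xs = [math.log(x) for x in xs]           # X = ln x
    log_ys = [math.log(y) for y in ys]           # Y = ln y
    sum_x = sum(log_xs)
    sum_y = sum(log_ys)
    sum_xx = sum(lx * lx for lx in log_xs)
    sum_xy = sum(lx * ly for lx, ly in zip(log_xs, log_ys))
    det = n * sum_xx - sum_x * sum_x
    big_a = (sum_y * sum_xx - sum_x * sum_xy) / det   # A = ln a
    b = (n * sum_xy - sum_x * sum_y) / det            # B = b
    return math.exp(big_a), b


miles = [1, 2, 5, 10, 20, 30, 40, 50]
usable = [98.2, 91.7, 81.3, 64, 36.4, 32.6, 17.1, 11.3]
a, b = fit_power_curve(miles, usable)
estimate_15 = a * 15 ** b
estimate_35 = a * 35 ** b
print(f"y = {a:.3f} * x^({b:.4f})")
print(f"(i)  x = 15: {estimate_15:.1f} % usable")
print(f"(ii) x = 35: {estimate_35:.1f} % usable")
```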
Prob 6. Fit a power curve of the form $y = ax^b$ to the following data by using the method of
least squares.
x   2    3     4     5     6
y   8.3  15.4  33.1  65.2  127.4
Sol: Let $y = ax^b$.
By taking logarithms, $\ln y = \ln a + b \ln x$.
Put $Y = \ln y$, $A = \ln a$, $B = b$, $X = \ln x$, so that
$Y = A + BX$.
Now the normal equations to fit $Y = A + BX$ are
Tutorial problems
1. Obtain the relation of the form $y = ab^x$ for the following data by the method of least squares.
x   2    3     4     5     6
y   8.3  15.4  33.1  65.2  127.4
2. Fit the curve of the form $y = ax^b$ for the following data
3. Fit the curve of the form $y = ab^x$ for the following data
x   0   1   2   3   4   5    6    7
y   10  21  35  59  92  100  400  610
4. A study was made on the amount of sugar converted in a certain process at various
temperatures. The data were coded and recorded as follows. Apply the method of least squares.
Temperature (x)       50  65  85
Converted sugar (y)   20  40  60
5. Fit the exponential curve of the form $y = ae^{bx}$ for the following data
x   1   5   7   9   12
y   10  15  12  15  21
Correlation:
Introduction:
Correlation:
If a change in one variable is accompanied by a change in the other, then the two variables
are said to be correlated. Correlation is a statistical analysis which measures and
analyses the degree to which the two variables fluctuate with reference to each
other. Correlation refers to the closeness of the relationship between the variables.
The measure of closeness of the relation between the variables is called the
correlation coefficient. It is denoted by '$r$'.
A high correlation means that two or more variables have a strong
relationship with each other, while a weak correlation means that the
variables are hardly related. In other words, it is the process of studying
the strength of that relationship with available statistical data.
Correlations are useful because if you can find out what relationship
variables have, you can make predictions about future behaviour.
Knowing what the future holds is very important in the social sciences like
government and healthcare. Businesses also use these statistics for
budgets and business plans.
Types of Correlation
Y 18 33 48 63
Covariance of X, Y:
The covariance of X, Y is denoted by COV(X,Y) and defined as
COV(X, Y)= E[(X − μX )(Y − μY )]
= E[XY − μX Y − μY X + μX μY ]
= E[XY] − μX E[Y] − μY E[X] + μX μY
= E[XY] − E[X]E[Y] − E[Y]E[X] + E[Y]E[X]
= E[XY] − E[X]E[Y]
Note: If X, Y are independent, then E[XY] = E[X]E[Y] and so COV(X, Y) = 0
Karl Pearson Correlation Coefficient
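It measures the magnitude of the linear relationship between the variables. It is
denoted by '$r$' and defined as
$$r = \frac{\mathrm{COV}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
Limits of $r$: let $a_i = x_i - \mu_X$ and $b_i = y_i - \mu_Y$. Then
$$\sum \Big(\frac{a_i}{\sigma_X} + \frac{b_i}{\sigma_Y}\Big)^2 = \frac{\sum a_i^2}{\sigma_X^2} + \frac{\sum b_i^2}{\sigma_Y^2} + \frac{2\sum a_i b_i}{\sigma_X \sigma_Y}$$
We know that
$$\sigma_X^2 = E[(X-\mu_X)^2] = \frac{\sum (x_i - \mu_X)^2}{n} = \frac{\sum a_i^2}{n} \quad \therefore \frac{\sum a_i^2}{\sigma_X^2} = n$$
$$\sigma_Y^2 = E[(Y-\mu_Y)^2] = \frac{\sum (y_i - \mu_Y)^2}{n} = \frac{\sum b_i^2}{n} \quad \therefore \frac{\sum b_i^2}{\sigma_Y^2} = n$$
and, from the definition of $r$,
$$\frac{\sum a_i b_i}{\sigma_X \sigma_Y} = rn$$
Hence
$$\sum \Big(\frac{a_i}{\sigma_X} + \frac{b_i}{\sigma_Y}\Big)^2 = n + n + 2rn = 2n(1 + r)$$
Since $\sum \Big(\dfrac{a_i}{\sigma_X} + \dfrac{b_i}{\sigma_Y}\Big)^2 \ge 0$, we have $2n(1 + r) \ge 0$
$$\Rightarrow (1 + r) \ge 0$$
$$\Rightarrow r \ge -1 \qquad (1)$$
Similarly,
$$\sum \Big(\frac{a_i}{\sigma_X} - \frac{b_i}{\sigma_Y}\Big)^2 = \frac{\sum a_i^2}{\sigma_X^2} + \frac{\sum b_i^2}{\sigma_Y^2} - \frac{2\sum a_i b_i}{\sigma_X \sigma_Y} = n + n - 2rn = 2n(1 - r)$$
Since $\sum \Big(\dfrac{a_i}{\sigma_X} - \dfrac{b_i}{\sigma_Y}\Big)^2 \ge 0$, we have $2n(1 - r) \ge 0$
$$\Rightarrow (1 - r) \ge 0$$
$$\Rightarrow 1 \ge r \qquad (2)$$
From (1) and (2), $-1 \le r \le 1$.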
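Sol: The correlation coefficient is
$$r = \frac{\mathrm{COV}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}}$$
where $a_i = x_i - \mu_X$ and $b_i = y_i - \mu_Y$.
$$\mu_X = \frac{\sum x_i}{n} = \frac{100 + 101 + \cdots + 95}{10} = \frac{990}{10} = 99$$
$$\mu_Y = \frac{\sum y_i}{n} = \frac{98 + 99 + \cdots + 91}{10} = \frac{950}{10} = 95$$
x_i   y_i   a_i = x_i - mu_X   b_i = y_i - mu_Y   a_i^2   b_i^2   a_i b_i
100   98     1                  3                  1       9       3
101   99     2                  4                  4      16       8
102   99     3                  4                  9      16      12
102   97     3                  2                  9       4       6
100   95     1                  0                  1       0       0
 99   92     0                 -3                  0       9       0
 97   95    -2                  0                  4       0       0
 98   94    -1                 -1                  1       1       1
 96   90    -3                 -5                  9      25      15
 95   91    -4                 -4                 16      16      16
$\sum a_i^2 = 54$, $\sum b_i^2 = 96$, $\sum a_i b_i = 61$, so $r = \dfrac{61}{\sqrt{54}\sqrt{96}} = \dfrac{61}{72}$
Therefore $r = 0.8472$
The given two variables are positively correlated.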
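The deviation-based formula $r = \sum a_i b_i / (\sqrt{\sum a_i^2}\sqrt{\sum b_i^2})$ can be checked with a short script. The sketch below uses the ten $(x_i, y_i)$ pairs from the table; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = [x - mean_x for x in xs]                 # a_i = x_i - mu_X
    b = [y - mean_y for y in ys]                 # b_i = y_i - mu_Y
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    sum_aa = sum(ai * ai for ai in a)
    sum_bb = sum(bi * bi for bi in b)
    return sum_ab / math.sqrt(sum_aa * sum_bb)


xs = [100, 101, 102, 102, 100, 99, 97, 98, 96, 95]
ys = [98, 99, 99, 97, 95, 92, 95, 94, 90, 91]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

The factor $n$ in the covariance and in the two standard deviations cancels, which is why the sums of deviations can be used directly.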
Q2. Find Karl Pearson's coefficient of correlation for the following data.
Fertiliser used   15  18  20   24   30   35   40   50
Productivity      85  93  95  105  120  130  150  160
$$r = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}} = \frac{2317}{\sqrt{1022}\,\sqrt{5343.48}} = \frac{2317}{(31.968)(73.099)}$$
$\therefore r = 0.9915$
$\therefore$ the given two variables are positively correlated.
Q3. From the following data, show that the coefficient of correlation between X
and Y is 0.89.
                                         X series   Y series
No. of items                             15         15
Arithmetic mean                          25         18
Sum of squares of deviations from mean   136        138
$$r = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}} = \frac{122}{\sqrt{136}\,\sqrt{138}} = \frac{122}{(11.6619)(11.7473)}$$
$\therefore r = 0.89$
Q4. Apply the method of Karl Pearson’s correlation to calculate the correlation
coefficient from the following results:
𝑛 = 10 , ∑𝑥 = 100, ∑𝑦 = 150 , ∑(𝑥 − 10)2 = 180, ∑(𝑦 − 15)2 = 215,
∑(𝑥 − 10)(𝑦 − 15) = 60.
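Sol:
$$\mu_X = \frac{\sum x_i}{n} = \frac{100}{10} = 10, \qquad \mu_Y = \frac{\sum y_i}{n} = \frac{150}{10} = 15$$
$$r = \frac{\sum (x - 10)(y - 15)}{\sqrt{\sum (x - 10)^2}\,\sqrt{\sum (y - 15)^2}} = \frac{60}{\sqrt{180}\,\sqrt{215}} = 0.3049$$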
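Sol: The correlation coefficient is
$$r = \frac{\mathrm{COV}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}}$$
$$\mu_Y = \frac{\sum y_i}{n} = \frac{28 + 34 + \cdots + 36}{10} = \frac{314}{10} = 31.4$$
Take assumed mean $B = 31$ (the table uses $A = 39$ for $x$).
x_i   y_i   a_i = x_i - A   b_i = y_i - B   a_i^2   b_i^2   a_i b_i
38    28    -1              -3               1       9        3
45    34     6               3              36       9       18
46    38     7               7              49      49       49
38    34    -1               3               1       9       -3
35    36    -4               5              16      25      -20
38    26    -1              -5               1      25        5
46    28     7              -3              49       9      -21
32    29    -7              -2              49       4       14
36    25    -3              -6               9      36       18
38    36    -1               5               1      25       -5
Clearly, $\sum a_i = 2$, $\sum b_i = 4$, $\sum a_i^2 = 212$, $\sum b_i^2 = 200$, $\sum a_i b_i = 58$
$$r = \frac{n\sum a_i b_i - (\sum a_i)(\sum b_i)}{\sqrt{n\sum a_i^2 - (\sum a_i)^2}\,\sqrt{n\sum b_i^2 - (\sum b_i)^2}} = \frac{10 \times 58 - (2)(4)}{\sqrt{10 \times 212 - 4}\,\sqrt{10 \times 200 - 16}} = 0.2791$$
Therefore $r = 0.2791$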
Practice problems:
1. Psychological tests of intelligence and engineering ability were applied to 10
students. Here is a record of ungrouped data showing intelligence ratio (I.R) and
engineering ratio (E.R). Calculate the coefficient of correlation.
I.R   105  104  102  101  100  99  98   96  93  92
E.R   101  103  100   98   95  96  104  92  97  94
Ans: 0.59
2. Find the coefficient of correlation between X and Y for the following data.
X   10  12  18  24  23  27
Y   13  18  12  25  30  10
Ans: 0.25553
Height in inches   57   59   62   63   64   65   55   58   57
Weight in lbs      113  117  126  126  130  129  111  116  112
Ans: 0.98
Regression
In statistical modelling, regression analysis is a set of statistical processes
for estimating the relationships between a dependent variable (often called the
'outcome variable') and one or more independent variables (often called
'predictors' or 'covariates').
It can be used to assess the strength of the relationship between variables and
to model the future relationship between them.
Scatter diagram:
[Figure: scatter diagram of converted sugar (y) against temperature (x)]
The study of correlation measures the direction and the strength of the
relationship between the variables. Regression is a statistical method which helps
us to estimate the unknown value of one variable from the known value of a related
variable. The line describing the average relationship is
called the line of regression.
Comparison between Correlation and Regression:
Correlation measures the degree of association between the variables, while
regression establishes a functional relationship between the dependent and
independent variables.
Regression equation:
The functional form of the regression equation of $Y$ on $X$ is $Y = a + bX$, where $a$ is the
$Y$ intercept and $b$ is the slope of the regression line, called the regression
coefficient of $Y$ on $X$. It is also denoted by $b_{YX}$.
The values of $a$ and $b$ can be obtained from the normal equations
$$ma + \Big(\sum x_i\Big) b = \sum y_i$$
$$\Big(\sum x_i\Big) a + \Big(\sum x_i^2\Big) b = \sum x_i y_i$$
where $m$ is the number of observations.
The regression line of $X$ on $Y$ is given by $X = a + bY$, where $a$ is the $X$ intercept and $b$
is the slope of the regression line, called the regression coefficient of $X$ on $Y$. It
is also denoted by $b_{XY}$.
The values of $a$ and $b$ can be obtained from the normal equations
$$ma + \Big(\sum y_i\Big) b = \sum x_i$$
$$\Big(\sum y_i\Big) a + \Big(\sum y_i^2\Big) b = \sum x_i y_i$$
Regression lines by mean deviations
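The regression line of $Y$ on $X$ is
$$Y - \mu_Y = r\frac{\sigma_Y}{\sigma_X}(X - \mu_X)$$
This follows from the normal equations. Dividing the first by $m$,
$$ma + \Big(\sum x_i\Big) b = \sum y_i \;\Rightarrow\; \mu_Y = a + \mu_X b$$
and writing the second in terms of deviations from the means,
$$\Big(\sum x_i\Big) a + \Big(\sum x_i^2\Big) b = \sum x_i y_i \;\Rightarrow\; \Big(\sum (x_i - \mu_X)\Big) a + \Big(\sum (x_i - \mu_X)^2\Big) b = \sum (x_i - \mu_X)(y_i - \mu_Y)$$
Since $\sum (x_i - \mu_X) = 0$,
$$\Big(\sum (x_i - \mu_X)^2\Big) b = \sum (x_i - \mu_X)(y_i - \mu_Y)$$
that is, $m\sigma_X^2\, b = m r \sigma_X \sigma_Y$, so $b = \dfrac{r\sigma_Y}{\sigma_X}$.
Here $\mu_X$, $\mu_Y$ are the means of $X$, $Y$ respectively, and $\dfrac{r\sigma_Y}{\sigma_X}$ is called the regression
coefficient of $Y$ on $X$. It is denoted by $b_{YX}$, i.e. $b_{YX} = \dfrac{r\sigma_Y}{\sigma_X}$.
The regression line of $X$ on $Y$ is
$$X - \mu_X = \frac{r\sigma_X}{\sigma_Y}(Y - \mu_Y)$$
where $\dfrac{r\sigma_X}{\sigma_Y}$ is called the regression coefficient of $X$ on $Y$. It is denoted by
$b_{XY}$, i.e. $b_{XY} = \dfrac{r\sigma_X}{\sigma_Y}$.
Consider
$$b_{YX} \cdot b_{XY} = \frac{r\sigma_Y}{\sigma_X} \cdot \frac{r\sigma_X}{\sigma_Y} = r^2$$
With $a_i = x_i - \mu_X$ and $b_i = y_i - \mu_Y$,
$$b_{YX} = \frac{\sum a_i b_i}{\sum a_i^2}, \qquad b_{XY} = \frac{\sum a_i b_i}{\sum b_i^2}$$
The regression line of $Y$ on $X$ is $Y - \mu_Y = b_{YX}(X - \mu_X)$, where $b_{YX} = \dfrac{\sum a_i b_i}{\sum a_i^2}$.
The regression line of $X$ on $Y$ is $X - \mu_X = b_{XY}(Y - \mu_Y)$, where $b_{XY} = \dfrac{\sum a_i b_i}{\sum b_i^2}$.
Note: Both regression lines are satisfied by the point $(\mu_X, \mu_Y)$.
So the point $(\mu_X, \mu_Y)$ is the intersection point of the regression lines.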
Problems:
1. For a sample of 200 points of observations, the following quantities were
calculated:
$\sum x_i = 11.34$, $\sum y_i = 20.78$, $\sum x_i^2 = 12.16$, $\sum y_i^2 = 84.96$,
$\sum x_i y_i = 22.13$
From the above data prepare the two regression lines.
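X   65  66  67  67  68  69  70  72
Y   67  68  65  68  72  72  69  71
Sol: The regression line of $Y$ on $X$ is $Y - \mu_Y = b_{YX}(X - \mu_X)$, where $b_{YX} = \dfrac{\sum a_i b_i}{\sum a_i^2}$,
and the regression line of $X$ on $Y$ is $X - \mu_X = b_{XY}(Y - \mu_Y)$, where $b_{XY} = \dfrac{\sum a_i b_i}{\sum b_i^2}$.
$$\mu_X = \frac{\sum x_i}{8} = \frac{544}{8} = 68, \qquad \mu_Y = \frac{\sum y_i}{8} = \frac{552}{8} = 69$$
x_i   y_i   a_i = x_i - mu_X   b_i = y_i - mu_Y   a_i^2   b_i^2   a_i b_i
65    67    -3                 -2                  9       4       6
66    68    -2                 -1                  4       1       2
67    65    -1                 -4                  1      16       4
67    68    -1                 -1                  1       1       1
68    72     0                  3                  0       9       0
69    72     1                  3                  1       9       3
70    69     2                  0                  4       0       0
72    71     4                  2                 16       4       8
$\sum a_i^2 = 36$, $\sum b_i^2 = 44$, $\sum a_i b_i = 24$
$$b_{YX} = \frac{\sum a_i b_i}{\sum a_i^2} = \frac{24}{36} = \frac{2}{3} = 0.6667, \qquad b_{XY} = \frac{\sum a_i b_i}{\sum b_i^2} = \frac{24}{44} = \frac{6}{11} = 0.5455$$
Hence the regression lines are $Y - 69 = 0.6667(X - 68)$ and $X - 68 = 0.5455(Y - 69)$.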
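The two regression coefficients can be computed directly from the deviations. The sketch below uses the eight $(x_i, y_i)$ pairs from the table; the function name `regression_coefficients` is hypothetical.

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) using deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = [x - mean_x for x in xs]                 # a_i = x_i - mu_X
    b = [y - mean_y for y in ys]                 # b_i = y_i - mu_Y
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    sum_aa = sum(ai * ai for ai in a)
    sum_bb = sum(bi * bi for bi in b)
    return mean_x, mean_y, sum_ab / sum_aa, sum_ab / sum_bb


xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]
mean_x, mean_y, b_yx, b_xy = regression_coefficients(xs, ys)
# b_yx * b_xy = r^2; the positive root is taken because both slopes are positive.
r = (b_yx * b_xy) ** 0.5
print(f"Y on X: Y - {mean_y:g} = {b_yx:.4f} (X - {mean_x:g})")
print(f"X on Y: X - {mean_x:g} = {b_xy:.4f} (Y - {mean_y:g})")
print(f"r = {r:.4f}")
```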
$$b_{xy} = \frac{\sum XY}{\sum Y^2} = \frac{95}{140} = \frac{19}{28}$$
(i) Line of regression of $y$ on $x$ is $y - \bar{y} = b_{yx}(x - \bar{x})$,
i.e. $y - 56 = \dfrac{1}{2}(x - 50)$
$\Rightarrow y = 0.5x + 31$
(ii) Line of regression of $x$ on $y$ is $x - \bar{x} = b_{xy}(y - \bar{y})$,
i.e. $x - 50 = \dfrac{19}{28}(y - 56)$
$\Rightarrow x = 0.6786y + 12$
Practice problems:
1. Find the regression line of y on x from the data and hence find y(20).
x   10  12  13  12  16  15
y   40  38  43  45  37  43
5. In a partially destroyed laboratory record, only the equations of the two
lines of regression are available: $7x - 16y + 9 = 0$ and $5y - 4x - 3 = 0$. Calculate the
coefficient of correlation and the means $\bar{x}$, $\bar{y}$.
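Standard error: The standard error is the root mean square deviation
of the points in the data from the regression line of $Y$ on $X$.
Regression line: $Y = \mu_Y + r\dfrac{\sigma_Y}{\sigma_X}(X - \mu_X)$
Sum of squares of the deviations:
$$S = \sum \Big((Y - \mu_Y) - r\frac{\sigma_Y}{\sigma_X}(X - \mu_X)\Big)^2$$
$$= n\sigma_Y^2 + \Big(r\frac{\sigma_Y}{\sigma_X}\Big)^2 n\sigma_X^2 - 2r\frac{\sigma_Y}{\sigma_X} \cdot n r \sigma_X \sigma_Y$$
$$= n\sigma_Y^2(1 - r^2)$$
Standard error: $S_y = \sqrt{S/n} = \sigma_Y\sqrt{1 - r^2}$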
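The identity $S_y = \sigma_Y\sqrt{1 - r^2}$ can be checked numerically by comparing it with the root mean square of the residuals about the regression line of $Y$ on $X$. The sketch below reuses the eight $(x, y)$ pairs from the regression problem worked earlier in these notes; the function name `standard_error_check` is hypothetical, and the standard deviations use divisor $n$, matching the definitions used in the derivation.

```python
import math


def standard_error_check(xs, ys):
    """Return (rms_residual, sigma_y * sqrt(1 - r^2)) for the line of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    r = cov / (sd_x * sd_y)
    slope = r * sd_y / sd_x                      # b_YX
    residuals = [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]
    rms = math.sqrt(sum(e * e for e in residuals) / n)
    return rms, sd_y * math.sqrt(1 - r * r)


xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]
rms, formula = standard_error_check(xs, ys)
print(f"RMS residual = {rms:.4f}, sigma_Y * sqrt(1 - r^2) = {formula:.4f}")
```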