
P&S unit 2

The document discusses curve fitting, correlation analysis, and regression analysis, focusing on linear and non-linear models. It includes a case study involving a drug organization's advertising expenditures and sales data, demonstrating the use of the least squares method to determine the relationship between these variables. Additionally, it provides examples of fitting straight lines to data using normal equations to derive parameters.

Uploaded by bojjatrilok
Copyright © All Rights Reserved

CLUSTER 2

UNIT - 2

1 CURVE FITTING

Introduction

Fitting of linear curve

$y = \alpha + \beta x$

Fitting of non-linear curves

$y = \alpha + \beta x + \gamma x^2$ (parabola)

$y = \alpha e^{\beta x}$ (exponential)

$y = \alpha x^{\beta}$ (power curve)

$y = \alpha \beta^{x}$ (exponential)

2 CORRELATION ANALYSIS

3 REGRESSION ANALYSIS

Lines of regression

Regression coefficients
Case study:

A drug company studied the relationship between advertising expenditure and the sales of a certain drug that acts on human organs. It wished to determine the relation between weekly advertising expenditure and sales of the product over a year. The recorded advertising costs and sales are shown below.

Adv. cost (000's)   15   20   25   30   35   40   45   50   55   60   65   70
Sales (000's)      198  262  346  395  432  444  482  498  527  541  579  595

If the company spends 1 lakh (100 thousand) on advertising, what sales can be expected?

Sol: This is a problem of prediction (forecasting or estimation).

Let 𝑥: 𝐴𝑑𝑣𝑒𝑟𝑡𝑖𝑠𝑖𝑛𝑔 𝑐𝑜𝑠𝑡, 𝑦: 𝑠𝑎𝑙𝑒𝑠

The functional relation:𝑠𝑎𝑙𝑒𝑠 = 𝑓(𝑎𝑑𝑣. 𝑐𝑜𝑠𝑡)


𝑦 = 𝑓(𝑥)

𝑥: 𝐼𝑛𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 (𝐴𝑑𝑣𝑒𝑟𝑡𝑖𝑠𝑖𝑛𝑔 𝑐𝑜𝑠𝑡) , 𝑦: 𝐷𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 (𝑠𝑎𝑙𝑒𝑠)

Objective:

Determine the unknown parameters involved in the functional relation.

Tools

1 Least squares method

2 Calculus (derivatives)

3 Solution of linear equations

PROCEDURE:
Curve fitting is the process of constructing a curve, or mathematical
function, that has the best fit to a series of data points, possibly subject to
constraints.

Curve fitting is a way to model a spread of data by assigning a 'best fit' function (curve) over the entire range. Ideally, the curve captures the trend in the data and allows us to predict how the data series will behave in the future.

Let $(x_i, y_i)$, $i = 1, 2, \dots, n$, be the set of $n$ pairs of observations of a bivariate $(X, Y)$.

Let us assume the functional relationship is 𝑠𝑎𝑙𝑒𝑠 = 𝑓(𝑎𝑑𝑣. 𝑐𝑜𝑠𝑡)


𝑦 = 𝑓(𝑥)

𝑥: 𝐼𝑛𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 (𝐴𝑑𝑣𝑒𝑟𝑡𝑖𝑠𝑖𝑛𝑔 𝑐𝑜𝑠𝑡), 𝑦: 𝐷𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 (𝑠𝑎𝑙𝑒𝑠)

Plot a scatter diagram of the data:

[Figure: scatter diagram of sales (y) against advertising cost (x)]

From the scatter diagram, the functional relation between advertising cost and sales is assumed to be linear:

$y = \alpha + \beta x + e$

where $x$: independent variable (advertising cost), $y$: dependent variable (sales),

$\alpha$ and $\beta$: unknown parameters,

$e$: error/residual (observed data − expected data), assumed to follow a normal distribution, $e \sim N(0, \sigma^2)$.


$$y = \alpha + \beta x + e \qquad (1)$$

so the error for each observation is

$$e = y - \alpha - \beta x, \qquad e^2 = (y - \alpha - \beta x)^2 .$$

The sum of squares of the errors is

$$E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 .$$

According to the least squares method (LSM), we minimise the sum of squares of errors with respect to the parameters. The necessary conditions are

$$\frac{\partial E}{\partial \alpha} = 0 \quad \text{and} \quad \frac{\partial E}{\partial \beta} = 0 .$$

Consider $\partial E / \partial \alpha = 0$:

$$\frac{\partial}{\partial \alpha} \sum (y - \alpha - \beta x)^2 = 2 \sum (y - \alpha - \beta x)(-1) = 0$$
$$\Rightarrow \sum (y - \alpha - \beta x) = 0 \Rightarrow \sum y - n\alpha - \beta \sum x = 0$$
$$\sum y = n\alpha + \beta \sum x \qquad (2)$$

Now consider $\partial E / \partial \beta = 0$:

$$\frac{\partial}{\partial \beta} \sum (y - \alpha - \beta x)^2 = 2 \sum (y - \alpha - \beta x)(-x) = 0$$
$$\Rightarrow \sum (xy - \alpha x - \beta x^2) = 0 \Rightarrow \sum xy - \alpha \sum x - \beta \sum x^2 = 0$$
$$\sum xy = \alpha \sum x + \beta \sum x^2 \qquad (3)$$

Equations (2) and (3) are known as the normal equations:

$$\sum y = n\alpha + \beta \sum x \qquad (2)$$
$$\sum xy = \alpha \sum x + \beta \sum x^2 \qquad (3)$$

Solving the normal equations gives the values of the parameters $\alpha$ and $\beta$.
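The two normal equations form a 2×2 linear system, so they can be solved in closed form. The sketch below shows one possible way to do this in Python, assuming plain lists of numbers. The function name `fit_line` is illustrative and does not come from the notes. It applies the procedure to the case-study data and evaluates the fitted line at x = 100 (1 lakh, since both variables are in thousands).

```python
def fit_line(xs, ys):
    """Least-squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:
    #   n * alpha     + sum_x * beta  = sum_y    ... (2)
    #   sum_x * alpha + sum_xx * beta = sum_xy   ... (3)
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are equal, so the line is not determined")
    # Cramer's rule on the 2x2 system.
    alpha = (sum_y * sum_xx - sum_x * sum_xy) / det
    beta = (n * sum_xy - sum_x * sum_y) / det
    return alpha, beta


# Case-study data: advertising cost and sales, both in thousands.
adv_cost = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
sales = [198, 262, 346, 395, 432, 444, 482, 498, 527, 541, 579, 595]

alpha, beta = fit_line(adv_cost, sales)
predicted = alpha + beta * 100  # 1 lakh = 100 thousand
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
print(f"predicted sales at x = 100: {predicted:.1f} thousand")
```

On this data the sketch gives roughly α ≈ 159.8 and β ≈ 6.63, so the predicted sales are about 823 thousand. Treat that figure with caution. The value x = 100 lies well outside the observed range of 15 to 70, and sales grow more slowly at higher advertising costs, so a straight-line extrapolation is likely to overstate sales.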

Prob 1. A chemical company wishing to study the effect of extraction time on the efficiency of an extraction operation obtained the data shown in the following table.

Extraction time in minutes (x)   27  45  41  19   3  39  19  49  15  31
Efficiency (y)                   57  64  80  46  62  72  52  77  57  68

Fit a straight line to the given data by the method of least squares.
Sol: Let $y = \alpha + \beta x$ be the straight line to be fitted to the given data.
The normal equations are
$m\alpha + (\sum x_i)\beta = \sum y_i$

$(\sum x_i)\alpha + (\sum x_i^2)\beta = \sum x_i y_i$


Given that the number of data points is $m = 10$.
Now consider the table

$x_i$   $y_i$   $x_i^2$   $x_i y_i$
27 57 729 1539
45 64 2025 2880
41 80 1681 3280
19 46 361 874
3 62 9 186
39 72 1521 2808
19 52 361 988
49 77 2401 3773
15 57 225 855
31 68 961 2108

$\sum x_i = 288$, $\sum y_i = 635$, $\sum x_i^2 = 10274$, $\sum x_i y_i = 19291$

The normal equations are


$10\alpha + 288\beta = 635$
$288\alpha + 10274\beta = 19291$
By solving, we get $\alpha = 48.9$, $\beta = 0.507$.
The best fit straight line is $y = 48.9 + 0.507x$
Prob 2. The amount of money spent on research and development (x) by a large corporation is believed to affect its gross sales (y). The following data have been recorded for the past 7 years. Fit a linear equation of the form $y = a + bx$ to the given data.

𝒙 10 12 13 16 17 20 25
𝒚 10 22 24 27 29 33 37

Let $y = a + bx$ be the straight line to be fitted to the given data.

The normal equations are
$ma + (\sum x_i)b = \sum y_i$

$(\sum x_i)a + (\sum x_i^2)b = \sum x_i y_i$

Given that $m = 7$ (number of data points).
Consider the table

$x_i$   $y_i$   $x_i^2$   $x_i y_i$
10 10 100 100
12 22 144 264
13 24 169 312
16 27 256 432
17 29 289 493
20 33 400 660
25 37 625 925
$\sum x_i = 113$, $\sum y_i = 182$, $\sum x_i^2 = 1983$, $\sum x_i y_i = 3186$

So the normal equations are


7𝑎 + 113𝑏 = 182
113𝑎 + 1983𝑏 = 3186
By solving these normal equations
𝑎 = 0.7985 , 𝑏 = 1.5611
The best fitted straight line is 𝑦 = 0.7985 + (1.5611)𝑥
(b) Fitting of second degree polynomial (parabola)
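Let $y = \alpha + \beta x + \gamma x^2$ be the parabola to be fitted to the given data $(x_i, y_i)$, $i = 1, 2, \dots, m$.
The normal equations are

$m\alpha + (\sum x_i)\beta + (\sum x_i^2)\gamma = \sum y_i$

$(\sum x_i)\alpha + (\sum x_i^2)\beta + (\sum x_i^3)\gamma = \sum x_i y_i$

$(\sum x_i^2)\alpha + (\sum x_i^3)\beta + (\sum x_i^4)\gamma = \sum x_i^2 y_i$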
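The three normal equations only need the power sums $\sum x_i^k$ for $k = 0, \dots, 4$ and $\sum x_i^k y_i$ for $k = 0, 1, 2$. The Python sketch below is minimal and assumes small data sets with no external libraries. It builds that 3×3 system and solves it by Gaussian elimination; `fit_parabola` and `solve_linear` are illustrative names. On the data of Prob 4 below it should give $a_0 = 1.42$, $a_1 = -1.07$, $a_2 = 0.55$ up to rounding.

```python
def solve_linear(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [A | b]; copies so the caller's lists are untouched.
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def fit_parabola(xs, ys):
    """Least-squares (alpha, beta, gamma) for y = alpha + beta*x + gamma*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # s[0] = m, s[1] = sum x, ...
    t = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]
    normal_matrix = [
        [s[0], s[1], s[2]],
        [s[1], s[2], s[3]],
        [s[2], s[3], s[4]],
    ]
    return solve_linear(normal_matrix, t)


# Data of Prob 4.
a0, a1, a2 = fit_parabola([0, 1, 2, 3, 4], [1, 1.8, 1.3, 2.5, 6.3])
print(f"y = {a0:.2f} + ({a1:.2f})x + ({a2:.2f})x^2")
```

Pivoting is not strictly needed for such small, well-behaved systems, but it costs little and avoids dividing by a tiny pivot.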


Prob 3. For 10 selected observations the following data were recorded.

Overtime hours (x)     1   1   2   2   3   3   4   5   6   7
Additional units (y)   2   7   7  10   8  12  10  14  11  14

Fit a non-linear curve of the form $y = a + bx + cx^2$.

Let $y = a + bx + cx^2$ be the parabola to be fitted to the given data.


The normal equations are

$10a + (\sum x_i)b + (\sum x_i^2)c = \sum y_i$

$(\sum x_i)a + (\sum x_i^2)b + (\sum x_i^3)c = \sum x_i y_i$

$(\sum x_i^2)a + (\sum x_i^3)b + (\sum x_i^4)c = \sum x_i^2 y_i$


Now consider the table

$x_i$   $y_i$   $x_i^2$   $x_i^3$   $x_i^4$   $x_i y_i$   $x_i^2 y_i$

1 2 1 1 1 2 2
1 7 1 1 1 7 7
2 7 4 8 16 14 28
2 10 4 8 16 20 40
3 8 9 27 81 24 72
3 12 9 27 81 36 108
4 10 16 64 256 40 160
5 14 25 125 625 70 350
6 11 36 216 1296 66 396
7 14 49 343 2401 98 686
$\sum x_i = 34$, $\sum y_i = 95$, $\sum x_i^2 = 154$, $\sum x_i^3 = 820$, $\sum x_i^4 = 4774$, $\sum x_i y_i = 377$, $\sum x_i^2 y_i = 1849$

Now the normal equations are


10𝑎 + 34 𝑏 + 154𝑐 = 95
34𝑎 + 154𝑏 + 820𝑐 = 377
154𝑎 + 820𝑏 + 4774𝑐 = 1849
By solving we get 𝑎 = 1.8022, 𝑏 = 3.4822, 𝑐 = −0.2689
The best fit parabola is

$y = 1.8022 + 3.4822x - 0.2689x^2$
Prob 4. Fit a second degree polynomial to the following data by the method of least squares.

x 0 1 2 3 4
y 1 1.8 1.3 2.5 6.3

Sol: Let $y = a_0 + a_1 x + a_2 x^2$ be the polynomial to be fitted to the given data.


The normal equations are

$5a_0 + (\sum x_i)a_1 + (\sum x_i^2)a_2 = \sum y_i$

$(\sum x_i)a_0 + (\sum x_i^2)a_1 + (\sum x_i^3)a_2 = \sum x_i y_i$

$(\sum x_i^2)a_0 + (\sum x_i^3)a_1 + (\sum x_i^4)a_2 = \sum x_i^2 y_i$


Now consider the table

$x_i$   $y_i$   $x_i^2$   $x_i^3$   $x_i^4$   $x_i y_i$   $x_i^2 y_i$

0 1 0 0 0 0 0
1 1.8 1 1 1 1.8 1.8
2 1.3 4 8 16 2.6 5.2
3 2.5 9 27 81 7.5 22.5
4 6.3 16 64 256 25.2 100.8
Clearly
$\sum x_i = 10$, $\sum y_i = 12.9$, $\sum x_i^2 = 30$, $\sum x_i^3 = 100$, $\sum x_i^4 = 354$

$\sum x_i y_i = 37.1$, $\sum x_i^2 y_i = 130.3$


The normal equations are
5𝑎0 + 10𝑎1 + 30𝑎2 = 12.9
10𝑎0 + 30𝑎1 + 100𝑎2 = 37.1
30𝑎0 + 100𝑎1 + 354𝑎2 = 130.3
By solving these normal equations for $a_0$, $a_1$, $a_2$ we get
$a_0 = 1.42$, $a_1 = -1.07$, $a_2 = 0.55$
The best fit second degree polynomial is
$y = 1.42 - 1.07x + 0.55x^2$

Prob 5: Find the least squares fit of the form $v = a_0 + a_1 u^2$ to the following data.
u   -1   0   1   2
v    2   5   3   3

Sol: The curve to be fitted is $v = a_0 + a_1 u^2$.

Put $U = u^2$; then the equation becomes $v = a_0 + a_1 U$, a straight line in $U$.
The corresponding normal equations are
$4a_0 + (\sum U)a_1 = \sum v$

$(\sum U)a_0 + (\sum U^2)a_1 = \sum Uv$

$u$   $v$   $U = u^2$   $U^2$   $Uv$
-1   2   1   1   2
0    5   0   0   0
1    3   1   1   3
2    3   4   16  12

$\sum v = 13$, $\sum U = 6$, $\sum U^2 = 18$, $\sum Uv = 17$

So the normal equations are

$4a_0 + 6a_1 = 13$
$6a_0 + 18a_1 = 17$

By solving these normal equations,

$a_0 = 3.667$, $a_1 = -0.278$
Hence the curve of best fit is $v = 3.667 - 0.278U$, i.e. $v = 3.667 - 0.278u^2$.
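The substitution $U = u^2$ turns the model into an ordinary straight line in $U$, so a straight-line routine can be reused unchanged. The following is a minimal Python sketch, assuming the data exactly as given in the problem statement (u = −1, 0, 1, 2 and v = 2, 5, 3, 3). `fit_line` is an illustrative helper that solves the two normal equations.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys = intercept + slope * xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # non-zero unless all xs are equal
    return (sy * sxx - sx * sxy) / det, (n * sxy - sx * sy) / det


u = [-1, 0, 1, 2]
v = [2, 5, 3, 3]

big_u = [ui ** 2 for ui in u]  # transformed regressor U = u^2
a0, a1 = fit_line(big_u, v)    # v = a0 + a1 * U is linear in U
print(f"v = {a0:.3f} + ({a1:.3f}) u^2")
```

For these values the code should give $a_0 \approx 3.667$ and $a_1 \approx -0.278$. The same trick works for any model that is linear in its parameters after a change of variable.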

Prob 6. The production (in thousand tons) of a sugar factory is given as follows.

Year (x)          2002  2004  2006  2008  2010
Production (y)      17    21    36    42    55

Fit a straight line of the form $y = a_0 + a_1 x$.

Sol: Let $u = x - 2006$; then $y = a_0 + a_1 x$ becomes $y = a + bu$ for some $a$, $b$.
$x_i$   $u_i = x_i - 2006$   $y_i$   $u_i^2$   $u_i y_i$
2002 -4 17 16 -68
2004 -2 21 4 -42
2006 0 36 0 0
2008 2 42 4 84
2010 4 55 16 220
The normal equations are
$na + (\sum u_i)b = \sum y_i$
$(\sum u_i)a + (\sum u_i^2)b = \sum u_i y_i$
i.e. $5a + (0)b = 171$
$(0)a + (40)b = 194$
By solving these equations, $a = 34.2$ and $b = 4.85$.
Thus $y = 34.2 + 4.85u$, i.e. $y = 34.2 + 4.85(x - 2006)$.
Thus the best least squares fit is $y = 4.85x - 9694.9$
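Choosing the origin at the middle year makes $\sum u_i = 0$, so the two normal equations decouple into $a = \sum y_i / n$ and $b = \sum u_i y_i / \sum u_i^2$. The following is a minimal Python sketch, assuming as here that the chosen origin makes the coded values sum to zero. It does this computation and then substitutes $u = x - 2006$ back to recover $a_0$ and $a_1$.

```python
years = [2002, 2004, 2006, 2008, 2010]
production = [17, 21, 36, 42, 55]  # thousand tons

origin = 2006  # middle year, so the coded values u are symmetric about 0
u = [x - origin for x in years]
if sum(u) != 0:
    raise ValueError("origin must make the coded values sum to zero")

n = len(u)
# With sum(u) == 0 the normal equations reduce to
#   n * a = sum(y)   and   sum(u^2) * b = sum(u * y)
a = sum(production) / n
b = sum(ui * yi for ui, yi in zip(u, production)) / sum(ui * ui for ui in u)

# Undo the coding: y = a + b*(x - origin) = (a - b*origin) + b*x
a0 = a - b * origin
a1 = b
print(f"y = {a:.2f} + {b:.2f}u,  i.e.  y = {a1:.2f}x + ({a0:.1f})")
```

Coding also keeps the numbers small. Working directly with values such as $\sum x_i^2 \approx 2 \times 10^7$ invites rounding slips by hand.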
Tutorial:
Prob 1: The weight S of potassium bromide that will dissolve in 100 g of water at temperature T is given in the following table. Show that the least squares line $S = mT + b$ is
$S = 0.52T + 54.2$.
T 0 1 2 3 4
Y 1 1.8 1.3 2.5 6.3

Prob 2: By method of least squares obtain the polynomial of second degree which fits the
following data

X 1 1.5 2 2.5 3 3.5 4


Y 1.1 1.3 1.6 2.0 2.7 3.4 4.1
Ans: $y = 1.04 - 0.193x + 0.243x^2$
Prob 3: The results of measuring the electric resistance R of a copper bar at various temperatures t °C are listed below. Find the best possible curve of the form $R = a + bt$ to fit the data.

T 19 25 30 36 40 45 50
R 76 77 79 80 82 83 85

Prob 4: The velocity V of a liquid varies with temperature T according to the quadratic law

$V = a + bT + cT^2$. Find the best values of $a$, $b$, $c$ for the following table.

T    1     2     3     4     5     6     7
V  2.31  2.01  1.80  1.66  1.55  1.47  1.41
Ans: $V = 2.593 - 0.326T + 0.023T^2$
Prob 5: Applying the method of least squares, construct the normal equations for a second degree curve of the form $y = a + bx + cx^2$ for the following data.

𝒙 0 1 2 3 4
𝒚 1 1.8 1.3 2.5 6.3

FITTING OF EXPONENTIAL CURVE

Model - 1

Curve fitting is a way to model a spread of data by assigning a best fit function (curve) over the entire range. Ideally, the curve captures the trend in the data and allows us to predict how the data series will behave in the future.

Let $(x_i, y_i)$, $i = 1, 2, \dots, n$, be the set of $n$ pairs of observations of a bivariate $(X, Y)$.

Let us assume that the functional relationship between $x$ and $y$ is an exponential curve of the form

$$y = \alpha e^{\beta x} \qquad (1)$$

where $x$: independent variable, $y$: dependent variable, and $\alpha$ and $\beta$: unknown parameters. The error $y - \alpha e^{\beta x}$ (observed data − expected data) is assumed to follow $N(0, \sigma^2)$.

Equation (1) is not linear in the parameters, but it can be made linear by taking natural logarithms of both sides:

$$\ln y = \ln \alpha + \beta x \ln e = \ln \alpha + \beta x .$$

Put $Y = \ln y$, $A = \ln \alpha$ and $B = \beta$. Then the relation becomes

$$Y = A + Bx \qquad (2)$$

which is a straight line in terms of $Y$ and $x$.

According to the least squares method (LSM), we minimise the sum of squares of errors $E = \sum (Y - A - Bx)^2$ with respect to the parameters, i.e. $\partial E / \partial A = 0$ and $\partial E / \partial B = 0$. The normal equations of $Y = A + Bx$ are

$$\sum Y = nA + B \sum x \qquad (3)$$
$$\sum xY = A \sum x + B \sum x^2 \qquad (4)$$

By solving the above normal equations we get $A$ and $B$, and hence $\alpha = e^{A}$ and $\beta = B$.
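Because the fit is done on $Y = \ln y$, it minimises squared errors in $\ln y$ rather than in $y$ itself. This is the standard log-linearisation used in these notes, and it only works when every $y_i > 0$. The following is a minimal Python sketch, with `fit_exponential` and `fit_line` as illustrative names, applied to the data of Prob 1 below:

```python
import math


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys = intercept + slope * xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    return (sy * sxx - sx * sxy) / det, (n * sxy - sx * sy) / det


def fit_exponential(xs, ys):
    """Fit y = alpha * exp(beta * x) via the straight line ln y = A + B x."""
    if any(y <= 0 for y in ys):
        raise ValueError("ln y is undefined unless every y > 0")
    big_a, big_b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(big_a), big_b  # alpha = e^A, beta = B


alpha, beta = fit_exponential([1, 5, 7, 9, 12], [10, 15, 12, 15, 21])
print(f"y = {alpha:.4f} e^({beta:.4f} x)")
```

The result should agree with the hand calculation of Prob 1 (about $\alpha \approx 9.475$, $\beta \approx 0.059$) up to rounding of the tabulated logarithms.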


Prob 1: In a certain type of metal test specimen, the normal stress (x) on a specimen is known to be functionally related to the shear resistance (y). The following is a set of coded experimental data on the two variables. Fit a curve of the form $y = \alpha e^{\beta x}$.

x    1   5   7   9  12
y   10  15  12  15  21

Solution: The curve to be fitted is $y = \alpha e^{\beta x}$ (1)

Taking natural logarithms of both sides, we obtain

$\ln y = \ln \alpha + \beta x \ln e = \ln \alpha + \beta x$

Put $Y = \ln y$, $A = \ln \alpha$, $B = \beta$ and $X = x$. The relation becomes

$Y = A + BX$ (2)

which is a straight line in terms of $Y$ and $X$.

According to the least squares method (LSM), the normal equations of $Y = A + BX$ are

$\sum Y = nA + B \sum X$ (3)

$\sum XY = A \sum X + B \sum X^2$ (4)

By solving the above normal equations, we get the values of $A$ and $B$.

The procedure is the same as in fitting a straight line. Here the number of pairs is $n = 5$.
$x$   $y$   $Y = \ln y$   $X$   $X^2$   $XY$

1 10 2.3026 1 1 2.3026

5 15 2.7081 5 25 13.5405

7 12 2.4849 7 49 17.3943

9 15 2.7081 9 81 24.3729

12 21 3.0445 12 144 36.534

∑ 𝑋 = 34 ∑ 𝑌 = 13.2482 ∑ 𝑋 2 = 300 ∑ 𝑌𝑋 = 94.1443

Normal equations

5𝐴 + 34𝐵 = 13.2482 ,

34𝐴 + 300𝐵 = 94.1443

On solving, $A = 2.2487$ and $B = 0.059$.

$\therefore \alpha = e^{A} = 9.4754$ and $\beta = B = 0.059$

The best fit curve of the form $y = \alpha e^{\beta x}$ is $y = 9.4754\, e^{0.059x}$

2. Fit the curve $y = ae^{bx}$ to the following data.

x   0   1   2   3    4    5    6    7     8
y  20  30  52  77  135  211  326  550  1052

Sol: Let $y = ae^{bx}$.

Taking natural logarithms, $\ln y = \ln a + \ln e^{bx}$, i.e.

$\ln y = \ln a + bx$
Let $Y = \ln y$, $A = \ln a$, $X = x$, $B = b$. Then the relation is $Y = A + BX$.

The normal equations are

$mA + (\sum X_i)B = \sum Y_i$
$(\sum X_i)A + (\sum X_i^2)B = \sum X_i Y_i$

with $m = 9$ data points. Consider the table

$x_i$   $y_i$   $X_i = x_i$   $Y_i = \ln y_i$   $X_i^2$   $X_i Y_i$
0   20   0   2.996   0   0
1   30   1   3.401   1   3.401
2 52 2 3.951 4 7.902
3 77 3 4.344 9 13.032
4 135 4 4.905 16 19.62
5 211 5 5.352 25 26.76
6 326 6 5.787 36 34.722
7 550 7 6.310 49 44.17
8 1052 8 6.958 64 55.664

Clearly $\sum X_i = 36$, $\sum X_i^2 = 204$, $\sum Y_i = 44.004$, $\sum X_i Y_i = 205.271$


The normal equations are
$9A + 36B = 44.004$
$36A + 204B = 205.271$

By solving for $A$ and $B$ we get $A = 2.939$, $B = 0.488$.

Since $A = \ln a \Rightarrow a = e^{A} = e^{2.939} = 18.897$

$B = b \Rightarrow b = 0.488$

$\therefore$ The best fit exponential curve is $y = 18.897\, e^{0.488x}$


Prob 3: By the method of least squares, determine the constants $a$ and $b$ such that $y = ae^{bx}$ fits the following data.

x     2       4        6        8        10
y   4.077  11.084   30.128   81.897   222.62

Sol: Let $y = ae^{bx}$, so $\ln y = \ln a + bx$.
Let $Y = \ln y$, $A = \ln a$, $X = x$. Then the relation is $Y = A + bX$.
The normal equations to fit $Y = A + bX$ are
$5A + (\sum X_i)b = \sum Y_i$
$(\sum X_i)A + (\sum X_i^2)b = \sum X_i Y_i$

Consider the following table

$x_i$   $y_i$   $X_i$   $Y_i = \ln y_i$   $X_i^2$   $X_i Y_i$
2 4.077 2 1.405 4 2.81
4 11.084 4 2.406 16 9.624
6 30.128 6 3.405 36 20.43
8 81.897 8 4.405 64 35.24
10 222.62 10 5.405 100 54.05

$\sum X_i = 30$, $\sum X_i^2 = 220$, $\sum Y_i = 17.026$, $\sum X_i Y_i = 122.154$

The normal equations are


$5A + 30b = 17.026$
$30A + 220b = 122.154$
By solving for $A$ and $b$ we get $A = 0.406$, $b = 0.5$.

$A = \ln a \Rightarrow a = e^{A} = e^{0.406} = 1.501$

The least squares best fit is $y = 1.501\, e^{0.5x}$


Model-2

Let $(x_i, y_i)$, $i = 1, 2, \dots, n$, be the set of $n$ pairs of observations of a bivariate $(X, Y)$.

Let us assume that the functional relationship between $x$ and $y$ is an exponential curve of the form

$$y = a b^{x} \qquad (1)$$

where $x$: independent variable, $y$: dependent variable, and $a$ and $b$: unknown parameters. The error (observed data − expected data) is assumed to follow $N(0, \sigma^2)$.

The relation is not linear, but it can be made linear by taking logarithms of both sides of $y = ab^{x}$:

$$\ln y = \ln a + x \ln b$$

Put $Y = \ln y$, $A = \ln a$ and $B = \ln b$. The curve to be fitted becomes

$$Y = A + Bx \qquad (2)$$

which is a straight line in terms of $Y$ and $x$.

According to the least squares method (LSM), we minimise the sum of squares of errors $E = \sum (Y - A - Bx)^2$ with respect to the parameters; the necessary conditions are $\partial E / \partial A = 0$ and $\partial E / \partial B = 0$. The resulting normal equations are

$$\sum Y = nA + B \sum x \qquad (3)$$
$$\sum xY = A \sum x + B \sum x^2 \qquad (4)$$

By solving the above normal equations we get $A$ and $B$, and so $a = e^{A}$ and $b = e^{B}$.
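The only change from Model 1 is the back-transformation: here both parameters come back through the exponential, $a = e^{A}$ and $b = e^{B}$. The following is a minimal Python sketch; the helper names are illustrative. It is run on the data of Prob 5 below:

```python
import math


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys = intercept + slope * xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    return (sy * sxx - sx * sxy) / det, (n * sxy - sx * sy) / det


def fit_exponential_base(xs, ys):
    """Fit y = a * b**x via the straight line ln y = A + B x."""
    if any(y <= 0 for y in ys):
        raise ValueError("ln y is undefined unless every y > 0")
    big_a, big_b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(big_a), math.exp(big_b)  # a = e^A, b = e^B


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1.2, 1.8, 2.5, 3.6, 4.7, 6.6, 9.1]
a, b = fit_exponential_base(xs, ys)
print(f"y = {a:.3f} * {b:.3f}^x")
```

Note that $y = ab^{x}$ and $y = \alpha e^{\beta x}$ describe the same family of curves (with $b = e^{\beta}$ for $b > 0$). The two models differ only in how the parameters are reported.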

Prob 4: The following data were collected to determine the relationship between pressure (x) and the corresponding scale reading (y) for the purpose of calibration. Fit a curve of the form $y = ab^{x}$ to the following data by the method of least squares.

x     2     3     4     5      6
y   8.3  15.4  33.1  65.2  127.4

Solution: The curve to be fitted is $y = ab^{x}$.

Taking logarithms of both sides,

$\ln y = \ln a + x \ln b$

Put $Y = \ln y$, $A = \ln a$, $B = \ln b$ and $X = x$. The curve to be fitted becomes

$Y = A + BX$ (2)

which is a straight line in terms of $Y$ and $X$.

According to the least squares method (LSM), the normal equations are

$\sum Y = nA + B \sum X$ (3)

$\sum XY = A \sum X + B \sum X^2$ (4)

By solving the above normal equations, we get the values of $A$ and $B$.

To fit the above curve we form the table as follows. The number of pairs is $n = 5$.

$x$   $y$   $Y = \ln y$   $X = x$   $X^2$   $XY$
2   8.3     2.1163   2   4    4.2325
3   15.4    2.7344   3   9    8.2031
4   33.1    3.4995   4   16   13.9981
5   65.2    4.1775   5   25   20.8873
6   127.4   4.8473   6   36   29.0840

$\sum Y = 17.3750$, $\sum X = 20$, $\sum X^2 = 90$, $\sum XY = 76.4050$

Substituting these values in the normal equations, we get

$5A + 20B = 17.3750$
$20A + 90B = 76.4050$

On solving, $A = 0.7130$ and $B = 0.6905$.

$\therefore a = e^{A} = 2.040$, $b = e^{B} = 1.995$

Hence the required curve is $y = 2.040\,(1.995)^{x}$

Prob 5. Fit an exponential curve of the form $y = ab^{x}$ to the following data.

x   1   2    3    4    5    6    7    8
y   1  1.2  1.8  2.5  3.6  4.7  6.6  9.1
Sol: The curve to be fitted is $y = ab^{x}$
$\Rightarrow \ln y = \ln a + x \ln b$
Let $Y = \ln y$, $A = \ln a$, $B = \ln b$, $X = x$.
Then the relation is $Y = A + BX$.
The normal equations to fit $Y = A + BX$ are
$8A + (\sum X_i)B = \sum Y_i$

$(\sum X_i)A + (\sum X_i^2)B = \sum X_i Y_i$

Consider the table

$x_i$   $y_i$   $X_i = x_i$   $Y_i = \ln y_i$   $X_i^2$   $X_i Y_i$
1 1 1 0 1 0
2 1.2 2 0.182 4 0.364
3 1.8 3 0.588 9 1.764
4 2.5 4 0.916 16 3.664
5 3.6 5 1.281 25 6.405
6 4.7 6 1.548 36 9.288
7 6.6 7 1.887 49 13.209
8 9.1 8 2.208 64 17.664
$\sum X_i = 36$, $\sum Y_i = 8.61$, $\sum X_i^2 = 204$, $\sum X_i Y_i = 52.358$
The normal equations are
$8A + 36B = 8.61$
$36A + 204B = 52.358$
By solving these equations we get $A = -0.382$, $B = 0.324$.

Since $A = \ln a \Rightarrow a = e^{A} = e^{-0.382} = 0.682$

$B = \ln b \Rightarrow b = e^{B} = e^{0.324} = 1.383$
The best fit curve is $y = 0.682\,(1.383)^{x}$

C) FITTING OF POWER CURVES:

Let $(x_i, y_i)$, $i = 1, 2, \dots, n$, be the set of $n$ pairs of observations of a bivariate $(X, Y)$.

Let us assume that the functional relationship between $x$ and $y$ is a power curve of the form

$$y = a x^{b} \qquad (1)$$

where $x$: independent variable, $y$: dependent variable, and $a$ and $b$: unknown parameters. The error (observed data − expected data) is assumed to follow $N(0, \sigma^2)$.

The relation is not linear, but it can be made linear by taking logarithms of both sides:

$$\ln y = \ln a + b \ln x$$

Put $Y = \ln y$, $A = \ln a$ and $X = \ln x$. The curve to be fitted becomes

$$Y = A + bX \qquad (2)$$

which is a straight line in terms of $Y$ and $X$.

According to the least squares method (LSM), we minimise the sum of squares of errors $E = \sum (Y - A - bX)^2$ with respect to the parameters. By the maxima-minima principle of calculus, the necessary conditions are $\partial E / \partial A = 0$ and $\partial E / \partial b = 0$.

The normal equations of $Y = A + bX$ by LSM are

$$\sum Y = nA + b \sum X \qquad (3)$$
$$\sum XY = A \sum X + b \sum X^2 \qquad (4)$$

By solving the above normal equations we get $A$ and $b$, where $a = e^{A}$.
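For the power curve both variables are logged, so every $x_i$ and $y_i$ must be positive. The following is a minimal Python sketch with illustrative names. It is applied to the radial-tire data in the next problem and includes the two estimates that problem asks for:

```python
import math


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys = intercept + slope * xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    return (sy * sxx - sx * sxy) / det, (n * sxy - sx * sy) / det


def fit_power(xs, ys):
    """Fit y = a * x**b via the straight line ln y = A + b ln x."""
    if any(v <= 0 for v in list(xs) + list(ys)):
        raise ValueError("logarithms need every x > 0 and every y > 0")
    big_a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(big_a), b  # a = e^A


miles = [1, 2, 5, 10, 20, 30, 40, 50]
usable = [98.2, 91.7, 81.3, 64, 36.4, 32.6, 17.1, 11.3]
a, b = fit_power(miles, usable)


def predict(x):
    """Fitted percentage usable at x, from y = a * x**b."""
    return a * x ** b


print(f"y = {a:.4f} x^({b:.4f})")
print(f"y(15) = {predict(15):.2f}, y(35) = {predict(35):.2f}")
```

The fitted curve is only trustworthy between the smallest and largest observed $x$ values. Both requested points, 15 and 35, lie inside the observed range of 1 to 50.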

1. The following data give the percentage of high-performance radial tires made by a manufacturer that are still usable after having been driven for the given number of miles.

x    1     2     5     10    20    30    40    50
y  98.2  91.7  81.3   64   36.4  32.6  17.1  11.3
Estimate the following after fitting a power curve to the data:
(i) the percentage usable when $x = 15$
(ii) the percentage usable when $x = 35$

Let $y = ax^{b}$.
Taking logarithms, $\ln y = \ln a + b \ln x$.
Put $Y = \ln y$, $A = \ln a$, $B = b$, $X = \ln x$; then
$Y = A + BX$
Now the normal equations to fit $Y = A + BX$ are
$8A + (\sum X_i)B = \sum Y_i$

$(\sum X_i)A + (\sum X_i^2)B = \sum X_i Y_i$

$x_i$   $y_i$   $X_i = \ln x_i$   $Y_i = \ln y_i$   $X_i^2$   $X_i Y_i$
1   98.2   0        4.587    0        0
2   91.7   0.6931   4.5185   0.4805   3.132
5   81.3   1.6094   4.3981   2.5903   7.0785
10 64 2.3026 4.1589 5.3019 9.5762
20 36.4 2.9957 3.5946 8.9744 10.7684
30 32.6 3.4012 3.4843 11.5681 11.8508
40 17.1 3.6889 2.8391 13.6078 10.473
50 11.3 3.912 2.4248 15.3039 9.4859

$\sum X_i = 18.603$, $\sum Y_i = 30.0053$, $\sum X_i^2 = 57.827$, $\sum X_i Y_i = 62.3648$

The normal Equations are


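$8A + 18.603B = 30.0053$
$18.603A + 57.827B = 62.3648$
By solving we get $A = 4.9333$, $B = -0.5086$.

$A = \ln a \Rightarrow a = e^{A} = e^{4.9333} = 138.8369$

$B = b \Rightarrow b = -0.5086$
The best fit power curve is $y = 138.8369\, x^{-0.5086}$
(i) $y(15) = 138.8369 \times 15^{-0.5086} \approx 35.02$, i.e. about 35% usable.
(ii) $y(35) = 138.8369 \times 35^{-0.5086} \approx 22.76$, i.e. about 22.8% usable.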

Prob 6. Fit a power curve of the form $y = ax^{b}$ to the following data by the method of least squares.

x    2     3     4     5      6
y   8.3  15.4  33.1  65.2  127.4

Sol: Let $y = ax^{b}$.
Taking logarithms, $\ln y = \ln a + b \ln x$.
Put $Y = \ln y$, $A = \ln a$, $B = b$, $X = \ln x$; then
$Y = A + BX$
Now the normal equations to fit $Y = A + BX$ are

$5A + (\sum X_i)B = \sum Y_i$, $(\sum X_i)A + (\sum X_i^2)B = \sum X_i Y_i$

Consider the table
$x_i$   $y_i$   $X_i = \ln x_i$   $Y_i = \ln y_i$   $X_i^2$   $X_i Y_i$
2   8.3     0.6931   2.1163   0.4805   1.4669
3   15.4    1.0986   2.7344   1.2069   3.0040
4   33.1    1.3863   3.4995   1.9218   4.8514
5   65.2    1.6094   4.1775   2.5903   6.7234
6   127.4   1.7918   4.8473   3.2104   8.6853
$\sum X_i = 6.5792$, $\sum Y_i = 17.3750$, $\sum X_i^2 = 9.4099$, $\sum X_i Y_i = 24.7310$
Number of data points $n = 5$.
The normal equations are
$5A + 6.5792B = 17.3750$
$6.5792A + 9.4099B = 24.7310$
By solving we get $A = 0.2091$, $B = 2.4820$.

$A = \ln a \Rightarrow a = e^{A} = e^{0.2091} = 1.2325$

$B = b \Rightarrow b = 2.4820$
The best fit power curve is $y = 1.2325\, x^{2.482}$

Tutorial problems
1. Obtain a relation of the form $y = ab^{x}$ for the following data by the method of least squares.

x    2     3     4     5      6
y   8.3  15.4  33.1  65.2  127.4

2. Fit a curve of the form $y = ax^{b}$ to the following data.

x    77   100   185   239   285
y   2.4   3.4     7  11.1  19.6

3. Fit a curve of the form $y = ab^{x}$ to the following data.

x    0   1   2   3   4    5    6    7
y   10  21  35  59  92  100  400  610

4. A study was made on the amount of sugar converted in a certain process at various temperatures. The data were coded and recorded as follows. Apply the method of least squares to fit a curve of the form $y = ax^{b}$.

Temperature (x)       50   65   85
Converted sugar (y)   20   40   60

5. Fit an exponential curve of the form $y = ae^{bx}$ to the following data.

x    1   5   7   9  12
y   10  15  12  15  21

Correlation:
Introduction:

If there exists a relationship between two variables, the statistical analysis of such data is called bivariate analysis. We may be interested in finding the relationship between two or more variables under study. Correlation refers to the relationship between two or more variables; in other words, it is a measure of how things are related. The study of how variables are correlated is called correlation analysis. Correlation analysis is a statistical method used to evaluate the strength of the relationship between two quantitative variables.

Correlation:
If a change in one variable is accompanied by a change in the other, the two variables are said to be correlated. (Correlation by itself does not show that one change causes the other.) Correlation is a statistical analysis which measures the degree to which two variables fluctuate with reference to each other, i.e. the closeness of the relationship between them. The measure of the closeness of the relation between the variables is called the correlation coefficient. It is denoted by $r$.
A high correlation means that two or more variables have a strong relationship with each other, while a weak correlation means that the variables are hardly related. Correlation analysis is the process of studying the strength of that relationship with the available statistical data.

Correlations are useful because if you know how variables are related, you can make predictions about future behaviour.

Forecasting is very important in fields such as government and healthcare. Businesses also use these statistics for budgets and business plans.

Types of Correlation

1. Positive correlation: If the two variables tend to move together in the same direction, they are said to be positively correlated, i.e. an increase in the value of one variable is accompanied by an increase in the value of the other (or a decrease in one is accompanied by a decrease in the other).
Ex 1: x: height, y: weight
Ex 2: x: rainfall, y: yield
2. Negative correlation: If the two variables tend to move in opposite directions, they are said to be negatively correlated, i.e. an increase in the value of one variable is accompanied by a decrease in the value of the other (or a decrease in one is accompanied by an increase in the other).
Ex: x: price, y: sales
3. Simple and multiple correlation: The study of two variables is described as simple correlation. If we study more than two variables simultaneously, it is called multiple correlation.
Ex: x: price, y: demand, z: supply
4. Linear and non-linear correlation: If the ratio of change between two variables is uniform, there is a linear correlation between them. For example:
X    5   10   15   20
Y   18   33   48   63

Otherwise, it is called non-linear correlation.
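For the table in point 4, "uniform change" means the ratio of successive changes $\Delta Y / \Delta X$ is constant. Here every 5-unit step in X adds 15 to Y, a ratio of 3. A short Python check of that idea is below. It assumes the points are listed in increasing order of X with no repeated X values:

```python
from fractions import Fraction

# Table from point 4 (assumed sorted by X, with distinct X values).
xs = [5, 10, 15, 20]
ys = [18, 33, 48, 63]

# Ratio of successive changes, Delta Y / Delta X, as exact fractions.
points = list(zip(xs, ys))
slopes = [
    Fraction(y2 - y1, x2 - x1)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
]

# Uniform change (one common ratio) means the points lie on a straight line.
is_linear = len(set(slopes)) == 1
print(slopes, is_linear)
```

Exact fractions are used so that floating-point noise cannot make equal ratios look different.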

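Covariance of X, Y:
The covariance of $X$ and $Y$ is denoted by $\mathrm{COV}(X, Y)$ and defined as

$$\begin{aligned}
\mathrm{COV}(X, Y) &= E[(X - \mu_X)(Y - \mu_Y)] \\
&= E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] \\
&= E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y \\
&= E[XY] - E[X]E[Y] - E[Y]E[X] + E[X]E[Y] \\
&= E[XY] - E[X]E[Y]
\end{aligned}$$

Note: If $X$ and $Y$ are independent, then $E[XY] = E[X]E[Y]$ and so $\mathrm{COV}(X, Y) = 0$. The converse does not hold in general: zero covariance does not imply independence.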
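Both forms of the covariance give the same number. Below is a quick Python check on the wages and cost-of-living data of Q1 below, using the population (divide-by-$n$) convention of these notes. The function names are illustrative.

```python
def cov_definition(xs, ys):
    """COV(X, Y) = E[(X - mu_X)(Y - mu_Y)], dividing by n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def cov_shortcut(xs, ys):
    """COV(X, Y) = E[XY] - E[X]E[Y], dividing by n."""
    n = len(xs)
    exy = sum(x * y for x, y in zip(xs, ys)) / n
    return exy - (sum(xs) / n) * (sum(ys) / n)


wages = [100, 101, 102, 102, 100, 99, 97, 98, 96, 95]
cost = [98, 99, 99, 97, 95, 92, 95, 94, 90, 91]
print(cov_definition(wages, cost), cov_shortcut(wages, cost))
```

Both should print 6.1, up to floating-point rounding. The shortcut form is convenient by hand. On a computer, however, it can lose accuracy when the means are large compared with the spread, because it subtracts two nearly equal numbers.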
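Karl Pearson Correlation Coefficient
It measures the strength of the linear relationship between the variables. It is denoted by $r$ and defined as

$$r = \frac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$

where

$$E[(X - \mu_X)(Y - \mu_Y)] = \frac{\sum (x_i - \mu_X)(y_i - \mu_Y)}{n}, \quad \sigma_X^2 = E[(X - \mu_X)^2] = \frac{\sum (x_i - \mu_X)^2}{n}, \quad \sigma_Y^2 = E[(Y - \mu_Y)^2] = \frac{\sum (y_i - \mu_Y)^2}{n}.$$

Hence

$$r = \frac{\frac{1}{n}\sum (x_i - \mu_X)(y_i - \mu_Y)}{\sqrt{\frac{1}{n}\sum (x_i - \mu_X)^2}\,\sqrt{\frac{1}{n}\sum (y_i - \mu_Y)^2}} = \frac{\sum (x_i - \mu_X)(y_i - \mu_Y)}{\sqrt{\sum (x_i - \mu_X)^2}\,\sqrt{\sum (y_i - \mu_Y)^2}} .$$

Let $a_i = x_i - \mu_X$ and $b_i = y_i - \mu_Y$; then

$$r = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}} .$$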
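The deviation form of $r$ translates directly into code. Below is a minimal Python sketch; the name `pearson_r` is illustrative. It is checked against the data of Q1 below, where $\sum a_i b_i = 61$, $\sum a_i^2 = 54$ and $\sum b_i^2 = 96$:

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r = sum(a*b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    a = [x - mu_x for x in xs]  # deviations from the mean of X
    b = [y - mu_y for y in ys]  # deviations from the mean of Y
    saa = sum(ai * ai for ai in a)
    sbb = sum(bi * bi for bi in b)
    if saa == 0 or sbb == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sum(ai * bi for ai, bi in zip(a, b)) / (math.sqrt(saa) * math.sqrt(sbb))


wages = [100, 101, 102, 102, 100, 99, 97, 98, 96, 95]
cost = [98, 99, 99, 97, 95, 92, 95, 94, 90, 91]
r = pearson_r(wages, cost)
print(f"r = {r:.4f}")  # equals 61 / sqrt(54 * 96) = 61 / 72
```

The factor $1/n$ cancels between numerator and denominator, which is why the sketch never divides the sums by $n$.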

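Properties of Correlation coefficient

1. Result: The correlation coefficient $r$ lies between $-1$ and $1$, i.e. $-1 \le r \le 1$.

Proof: Let $a_i = x_i - \mu_X$, $b_i = y_i - \mu_Y$. Consider

$$\sum \left( \frac{a_i}{\sigma_X} + \frac{b_i}{\sigma_Y} \right)^2 = \sum \left( \frac{a_i^2}{\sigma_X^2} + \frac{b_i^2}{\sigma_Y^2} + \frac{2 a_i b_i}{\sigma_X \sigma_Y} \right) = \frac{\sum a_i^2}{\sigma_X^2} + \frac{\sum b_i^2}{\sigma_Y^2} + \frac{2 \sum a_i b_i}{\sigma_X \sigma_Y} .$$

We know that $\sigma_X^2 = E[(X - \mu_X)^2] = \frac{\sum (x_i - \mu_X)^2}{n} = \frac{\sum a_i^2}{n}$, so $\frac{\sum a_i^2}{\sigma_X^2} = n$.

Similarly, $\sigma_Y^2 = E[(Y - \mu_Y)^2] = \frac{\sum (y_i - \mu_Y)^2}{n} = \frac{\sum b_i^2}{n}$, so $\frac{\sum b_i^2}{\sigma_Y^2} = n$.

Also

$$r = \frac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} = \frac{\sum (x_i - \mu_X)(y_i - \mu_Y)}{n \sigma_X \sigma_Y} = \frac{\sum a_i b_i}{n \sigma_X \sigma_Y}, \quad \text{so} \quad \frac{\sum a_i b_i}{\sigma_X \sigma_Y} = rn .$$

Therefore

$$\sum \left( \frac{a_i}{\sigma_X} + \frac{b_i}{\sigma_Y} \right)^2 = n + n + 2rn = 2n(1 + r) .$$

Since a sum of squares is non-negative, $2n(1 + r) \ge 0 \Rightarrow 1 + r \ge 0 \Rightarrow r \ge -1$ ... (1)

Similarly,

$$\sum \left( \frac{a_i}{\sigma_X} - \frac{b_i}{\sigma_Y} \right)^2 = \frac{\sum a_i^2}{\sigma_X^2} + \frac{\sum b_i^2}{\sigma_Y^2} - \frac{2 \sum a_i b_i}{\sigma_X \sigma_Y} = n + n - 2rn = 2n(1 - r) .$$

Since this is also non-negative, $2n(1 - r) \ge 0 \Rightarrow 1 - r \ge 0 \Rightarrow r \le 1$ ... (2)

From (1) and (2), $-1 \le r \le 1$.

Note:
1. If $r = 1$, the correlation is said to be perfect positive.
2. If $r = -1$, the correlation is said to be perfect negative.
3. If $0 < r < 1$, the correlation is said to be positive.
4. If $-1 < r < 0$, the correlation is said to be negative.
5. If $r = 0$, there is no (linear) correlation; the variates are uncorrelated.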

Q1. Find Karl Pearson's coefficient of correlation for the following data.

Wages            100  101  102  102  100   99   97   98   96   95
Cost of living    98   99   99   97   95   92   95   94   90   91

Sol: The correlation coefficient is

$$r = \frac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}}$$

where $a_i = x_i - \mu_X$, $b_i = y_i - \mu_Y$.

Now $\mu_X = \frac{\sum x_i}{n} = \frac{100 + 101 + \dots + 95}{10} = \frac{990}{10} = 99$

$\mu_Y = \frac{\sum y_i}{n} = \frac{98 + 99 + \dots + 91}{10} = \frac{950}{10} = 95$

Consider the following table,

$x_i$   $y_i$   $a_i = x_i - \mu_X$   $b_i = y_i - \mu_Y$   $a_i^2$   $b_i^2$   $a_i b_i$
100 98 1 3 1 9 3
101 99 2 4 4 16 8
102 99 3 4 9 16 12
102 97 3 2 9 4 6
100 95 1 0 1 0 0
99 92 0 -3 0 9 0
97 95 -2 0 4 0 0
98 94 -1 -1 1 1 1
96 90 -3 -5 9 25 15
95 91 -4 -4 16 16 16

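Clearly, $\sum a_i^2 = 54$, $\sum b_i^2 = 96$, $\sum a_i b_i = 61$

Then
$$r = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}} = \frac{61}{\sqrt{54}\,\sqrt{96}} = \frac{61}{(7.3485)(9.7980)} = \frac{61}{72} = 0.8472$$

Therefore $r = 0.8472$.
The two given variables are positively correlated.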

Q2. Find Karl Pearson's coefficient of correlation for the following data.

Fertiliser used   15   18   20   24   30   35   40   50
Productivity      85   93   95  105  120  130  150  160

Sol: Let X denote the fertiliser used and Y the productivity.

The correlation coefficient is
$$r = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}}$$

where $a_i = x_i - \mu_X$, $b_i = y_i - \mu_Y$.

Now $\mu_X = \frac{\sum x_i}{n} = \frac{15 + 18 + \dots + 50}{8} = \frac{232}{8} = 29$

$\mu_Y = \frac{\sum y_i}{n} = \frac{85 + 93 + \dots + 160}{8} = \frac{938}{8} = 117.25$

Consider the table:

$x_i$   $y_i$   $a_i = x_i - \mu_X$   $b_i = y_i - \mu_Y$   $a_i^2$   $b_i^2$   $a_i b_i$
15 85 -14 -32.25 196 1040.06 451.5
18 93 -11 -24.25 121 588.06 266.75
20 95 -9 -22.25 81 495.06 200.25
24 105 -5 -12.25 25 150.06 61.25
30 120 1 2.75 1 7.5625 2.75
35 130 6 12.75 36 162.56 76.5
40 150 11 32.75 121 1072.56 360.25
50 160 21 42.75 441 1827.56 897.75

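$\sum a_i^2 = 1022$, $\sum b_i^2 = 5343.5$, $\sum a_i b_i = 2317$

$$\therefore r = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}} = \frac{2317}{\sqrt{1022}\,\sqrt{5343.5}} = \frac{2317}{(31.969)(73.099)} = 0.9915$$

$\therefore$ The two given variables are positively correlated.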

Q3. From the following data, show that the coefficient of correlation between $X$ and $Y$ is 0.89.

                                          X series   Y series
No. of items                                    15         15
Arithmetic mean                                 25         18
Sum of squares of deviations from mean         136        138

The sum of the products of the deviations of the X and Y series from their respective arithmetic means is 122.
Sol: Given that
$\mu_X = 25$, $\mu_Y = 18$, $\sum (x_i - \mu_X)^2 = 136$, $\sum (y_i - \mu_Y)^2 = 138$, $\sum (x_i - \mu_X)(y_i - \mu_Y) = 122$.

We know that
$$r = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum (x_i - \mu_X)(y_i - \mu_Y)}{\sqrt{\sum (x_i - \mu_X)^2}\,\sqrt{\sum (y_i - \mu_Y)^2}} = \frac{122}{\sqrt{136}\,\sqrt{138}} = \frac{122}{(11.6619)(11.7473)}$$

$\therefore r = 0.89$

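Q4. Apply Karl Pearson's method to calculate the correlation coefficient from the following results:
$n = 10$, $\sum x = 100$, $\sum y = 150$, $\sum (x - 10)^2 = 180$, $\sum (y - 15)^2 = 215$, $\sum (x - 10)(y - 15) = 60$.
Sol:
$\mu_X = \frac{\sum x_i}{n} = \frac{100}{10} = 10$, $\mu_Y = \frac{\sum y_i}{n} = \frac{150}{10} = 15$

Since the deviations are taken from the means,
$\sum (x - 10)^2 = \sum (x_i - \mu_X)^2 = \sum a_i^2$, $\sum (y - 15)^2 = \sum (y_i - \mu_Y)^2 = \sum b_i^2$ and $\sum (x - 10)(y - 15) = \sum a_i b_i$.

$$\therefore r = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}} = \frac{60}{\sqrt{180}\,\sqrt{215}} = 0.305$$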

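Q5. Find Karl Pearson's coefficient of correlation for the following data.

x   38  45  46  38  35  38  46  32  36  38
y   28  34  38  34  36  26  28  29  25  36

Sol: The correlation coefficient is
$$r = \frac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}}$$
where $a_i = x_i - \mu_X$, $b_i = y_i - \mu_Y$.

Now $\mu_X = \frac{\sum x_i}{n} = \frac{38 + 45 + \dots + 38}{10} = \frac{392}{10} = 39.2$ and $\mu_Y = \frac{\sum y_i}{n} = \frac{28 + 34 + \dots + 36}{10} = \frac{314}{10} = 31.4$.

Since the means are not whole numbers, take assumed means $A = 39$ and $B = 31$, and use the deviations $a_i = x_i - A$, $b_i = y_i - B$ instead. The correlation does not change when a constant is subtracted from every value of a variable, but the short-cut formula below must then be used.

Consider the following table,

$x_i$   $y_i$   $a_i = x_i - A$   $b_i = y_i - B$   $a_i^2$   $b_i^2$   $a_i b_i$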
38 28 -1 -3 1 9 3
45 34 6 3 36 9 18
46 38 7 7 49 49 49
38 34 -1 3 1 9 -3
35 36 -4 5 16 25 -20
38 26 -1 -5 1 25 5
46 28 7 -3 49 9 -21
32 29 -7 -2 49 4 14
36 25 -3 -6 9 36 18
38 36 -1 5 1 25 -5
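Clearly, $\sum a_i = 2$, $\sum b_i = 4$, $\sum a_i^2 = 212$, $\sum b_i^2 = 200$, $\sum a_i b_i = 58$

Then
$$r = \frac{n \sum a_i b_i - (\sum a_i)(\sum b_i)}{\sqrt{n \sum a_i^2 - (\sum a_i)^2}\,\sqrt{n \sum b_i^2 - (\sum b_i)^2}} = \frac{10 \times 58 - (2)(4)}{\sqrt{10 \times 212 - 4}\,\sqrt{10 \times 200 - 16}} = \frac{572}{(46)(44.542)} = 0.2792$$

Therefore $r = 0.2792$.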
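The short-cut formula gives the same $r$ whatever constants are subtracted, which is why assumed means are allowed. Below is a minimal Python sketch with illustrative names. It computes $r$ from deviations about arbitrary origins $A$ and $B$ and compares the table's choice $A = 39$, $B = 31$ with no shift at all:

```python
import math


def pearson_r_assumed(xs, ys, A, B):
    """Short-cut Pearson r using deviations a = x - A, b = y - B from assumed means."""
    n = len(xs)
    a = [x - A for x in xs]
    b = [y - B for y in ys]
    sa, sb = sum(a), sum(b)
    num = n * sum(ai * bi for ai, bi in zip(a, b)) - sa * sb
    den = math.sqrt(n * sum(ai * ai for ai in a) - sa ** 2) * math.sqrt(
        n * sum(bi * bi for bi in b) - sb ** 2
    )
    return num / den


x = [38, 45, 46, 38, 35, 38, 46, 32, 36, 38]
y = [28, 34, 38, 34, 36, 26, 28, 29, 25, 36]
r_assumed = pearson_r_assumed(x, y, 39, 31)  # assumed means from the table
r_raw = pearson_r_assumed(x, y, 0, 0)        # no shift at all
print(f"{r_assumed:.4f} {r_raw:.4f}")
```

Both calls reduce to $572 / (46\sqrt{1984})$. Choosing $A$ and $B$ close to the true means only keeps the hand arithmetic small.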

Practice problems:
1. Psychological tests of intelligence and engineering ability were applied to 10
students. Here is a record of ungrouped data showing intelligence ratio (I.R) and
engineeering ratio (E.R) . calculate the coefficient of correlation.
I.R 105 104 102 101 100 99 98 96 93 92
E.R 101 103 100 98 95 96 104 92 97 94

ANS : 0.59

2. Find the coefficient of correlation between X and Y for the following data.
X 10 12 18 24 23 27
Y 13 18 12 25 30 10
Ans: 0.2555

3. From the following data, show that the coefficient of correlation between X and Y is 0.89.

                                          X-series   Y-series
Number of items                           15         15
Arithmetic mean                           25         18
Sum of squares of deviations from mean    136        138

The sum of the products of the deviations of the X and Y series from their respective arithmetic means is 122.
4. Apply Pearson’s method to calculate the correlation coefficient from the following results:
$$n = 10,\ \sum x = 100,\ \sum y = 150,\ \sum (x - 10)^2 = 180,\ \sum (y - 15)^2 = 215,\ \sum (x - 10)(y - 15) = 60.$$

5. Apply Pearson’s method to find the coefficient of correlation from the following results:
$$n = 10,\ \sum x = 490,\ \sum y = 450,\ \sum (x - 49)^2 = 5150,\ \sum (y - 45)^2 = 920,\ \sum (x - 49)(y - 45) = 1449.$$
6. Determine whether there is a significant correlation between the heights and weights given below.

Height in inches   57   59   62   63   64   65   55   58   57

Weight in lbs      113  117  126  126  130  129  111  116  112

Ans: 0.98

Regression
In statistical modelling, regression analysis is a set of statistical processes for estimating the relationship between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors' or 'covariates'). It can be used to assess the strength of the relationship between variables and to model their future relationship.

Scatter diagram: [Figure: scatter plot of converted sugar (y) against temperature (x)]

The study of correlation measures the direction and the strength of the relationship between the variables. Regression is a statistical method that helps us estimate the unknown value of one variable from the known value of a related variable. The line describing the average relationship between the variables is called the line of regression.
Comparison between Correlation and Regression:
Correlation measures the degree of association (co-variability) between the variables, while regression establishes a functional relationship between the dependent and independent variables, which can then be used for estimation.
Regression equation:
The functional form of the regression equation of $Y$ on $X$ is $Y = a + bX$, where $a$ is the $Y$-intercept and $b$ is the slope of the regression line, called the regression coefficient of $Y$ on $X$ and also denoted by $b_{YX}$. The values of $a$ and $b$ are obtained from the normal equations
$$n a + \left(\sum x_i\right) b = \sum y_i$$
$$\left(\sum x_i\right) a + \left(\sum x_i^2\right) b = \sum x_i y_i$$
where $n$ is the number of pairs of observations.
The regression line of $X$ on $Y$ is $X = a + bY$, where $a$ is the $X$-intercept and $b$ is the slope of the regression line, called the regression coefficient of $X$ on $Y$ and also denoted by $b_{XY}$. The values of $a$ and $b$ are obtained from the normal equations
$$n a + \left(\sum y_i\right) b = \sum x_i$$
$$\left(\sum y_i\right) a + \left(\sum y_i^2\right) b = \sum x_i y_i$$
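The two normal equations form a 2×2 linear system in a and b, so they can be solved directly by Cramer's rule. The Python sketch below is one way to do this from raw data (the function name `fit_line_normal_equations` is our own); swapping the arguments fits X on Y instead of Y on X. As a check it uses the data of Practice problem 1 in the regression practice set, whose stated answer is y = 44.25 − 0.25x.

```python
def fit_line_normal_equations(us, vs):
    """Least-squares line v = a + b*u from the normal equations
        n*a + (sum u)*b       = sum v
        (sum u)*a + (sum u^2)*b = sum u*v
    solved by Cramer's rule."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    det = n * suu - su * su          # zero only if all u values are equal
    a = (sv * suu - su * suv) / det
    b = (n * suv - su * sv) / det
    return a, b


# Fit y on x (call fit_line_normal_equations(y, x) for x on y)
x = [10, 12, 13, 12, 16, 15]
y = [40, 38, 43, 45, 37, 43]
a, b = fit_line_normal_equations(x, y)
print(a, b)  # 44.25 -0.25
```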
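Regression lines by mean deviations
The regression line of $Y$ on $X$ is
$$Y - \mu_Y = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \mu_X).$$
To see this, divide the first normal equation by $n$:
$$\mu_Y = a + b\,\mu_X.$$
Next, subtract $\mu_X$ times the first normal equation from the second. Since $n\mu_X = \sum x_i$, the terms in $a$ cancel, and since $\sum x_i^2 - \mu_X \sum x_i = \sum (x_i - \mu_X)^2$ and $\sum x_i y_i - \mu_X \sum y_i = \sum (x_i - \mu_X)(y_i - \mu_Y)$, we get
$$\left( \sum (x_i - \mu_X)^2 \right) b = \sum (x_i - \mu_X)(y_i - \mu_Y)$$
$$\Rightarrow b = \frac{\sum (x_i - \mu_X)(y_i - \mu_Y)}{\sum (x_i - \mu_X)^2} = \frac{n\,r\,\sigma_X\sigma_Y}{n\,\sigma_X^2} = \frac{r\,\sigma_Y}{\sigma_X}.$$
Subtracting $\mu_Y = a + b\,\mu_X$ from $Y = a + bX$ gives
$$Y - \mu_Y = b\,(X - \mu_X) = \frac{r\,\sigma_Y}{\sigma_X}\,(X - \mu_X).$$
Here $\mu_X, \mu_Y$ are the means of $X, Y$ respectively, and $\dfrac{r\sigma_Y}{\sigma_X}$ is called the regression coefficient of $Y$ on $X$, denoted by $b_{YX}$, i.e. $b_{YX} = \dfrac{r\sigma_Y}{\sigma_X}$.
Similarly, the regression line of $X$ on $Y$ is
$$X - \mu_X = \frac{r\,\sigma_X}{\sigma_Y}\,(Y - \mu_Y),$$
where $\dfrac{r\sigma_X}{\sigma_Y}$ is called the regression coefficient of $X$ on $Y$, denoted by $b_{XY}$, i.e. $b_{XY} = \dfrac{r\sigma_X}{\sigma_Y}$.
Consider
$$b_{YX}\cdot b_{XY} = \frac{r\sigma_Y}{\sigma_X}\cdot\frac{r\sigma_X}{\sigma_Y} = r^2.$$
$$\therefore r = \pm\sqrt{b_{YX}\, b_{XY}},$$
i.e. the correlation coefficient is the geometric mean of the two regression coefficients. Since $b_{YX}$, $b_{XY}$ and $r$ all have the same sign as $\operatorname{Cov}(X,Y)$, $r$ takes the common sign of the two regression coefficients.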
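Consider
$$b_{YX} = \frac{r\sigma_Y}{\sigma_X} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}\cdot\frac{\sigma_Y}{\sigma_X} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X^2} = \frac{\frac{1}{n}\sum (x_i - \mu_X)(y_i - \mu_Y)}{\frac{1}{n}\sum (x_i - \mu_X)^2} = \frac{\sum (x_i - \mu_X)(y_i - \mu_Y)}{\sum (x_i - \mu_X)^2}.$$
Thus, writing $a_i = x_i - \mu_X$ and $b_i = y_i - \mu_Y$,
$$b_{YX} = \frac{\sum a_i b_i}{\sum a_i^2}, \qquad \text{and similarly} \qquad b_{XY} = \frac{\sum a_i b_i}{\sum b_i^2}.$$
The regression line of $Y$ on $X$ is $Y - \mu_Y = b_{YX}(X - \mu_X)$, where $b_{YX} = \dfrac{\sum a_i b_i}{\sum a_i^2}$.
The regression line of $X$ on $Y$ is $X - \mu_X = b_{XY}(Y - \mu_Y)$, where $b_{XY} = \dfrac{\sum a_i b_i}{\sum b_i^2}$.
Note: Both regression lines are satisfied by the point $(\mu_X, \mu_Y)$, so $(\mu_X, \mu_Y)$ is the point of intersection of the two regression lines.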
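These relations can be checked numerically. The Python sketch below (function name hypothetical) computes b_YX and b_XY from deviations about the means, confirms that b_YX · b_XY = r², and finds the intersection of the two lines by Cramer's rule, which should come out at (μ_X, μ_Y). It reuses the x, y data of Problem 5 in the correlation section, for which μ_X = 39.2 and μ_Y = 31.4.

```python
import math


def regression_coefficients(xs, ys):
    """Return (mu_x, mu_y, b_yx, b_xy, r) from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = [x - mx for x in xs]                    # a_i = x_i - mu_X
    b = [y - my for y in ys]                    # b_i = y_i - mu_Y
    saa = sum(ai * ai for ai in a)
    sbb = sum(bi * bi for bi in b)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    b_yx = sab / saa                            # regression coefficient of Y on X
    b_xy = sab / sbb                            # regression coefficient of X on Y
    r = sab / math.sqrt(saa * sbb)              # Karl Pearson's r
    return mx, my, b_yx, b_xy, r


x = [38, 45, 46, 38, 35, 38, 46, 32, 36, 38]
y = [28, 34, 38, 34, 36, 26, 28, 29, 25, 36]
mx, my, b_yx, b_xy, r = regression_coefficients(x, y)

# r^2 equals the product of the two regression coefficients
print(math.isclose(b_yx * b_xy, r * r))  # True

# Intersection of the two lines, written as a 2x2 linear system
#   -b_yx*X +      Y = my - b_yx*mx    (line of Y on X)
#        X - b_xy*Y  = mx - b_xy*my    (line of X on Y)
# solved by Cramer's rule; det = r^2 - 1 is nonzero unless |r| = 1
det = b_yx * b_xy - 1
c1 = my - b_yx * mx
c2 = mx - b_xy * my
x_int = (c1 * (-b_xy) - c2) / det
y_int = ((-b_yx) * c2 - c1) / det
print(round(r, 4), round(x_int, 4), round(y_int, 4))  # 0.2792 39.2 31.4
```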
Problems:
1. For a sample of 200 observations, the following quantities were calculated:
$$\sum x_i = 11.34,\quad \sum y_i = 20.78,\quad \sum x_i^2 = 12.16,\quad \sum y_i^2 = 84.96,\quad \sum x_i y_i = 22.13.$$
From these data, obtain the two regression lines.

Sol: The regression line of $Y$ on $X$ is $Y = a + bX$. The normal equations are
$$200a + \left(\sum x_i\right) b = \sum y_i, \qquad \left(\sum x_i\right) a + \left(\sum x_i^2\right) b = \sum x_i y_i.$$
Substituting the given values,
$$200a + 11.34b = 20.78, \qquad 11.34a + 12.16b = 22.13.$$
Solving these equations, we get $a = 0.00075$ and $b = 1.8192$. Thus the regression line of $Y$ on $X$ is
$$Y = 0.00075 + 1.8192X.$$
The regression line of $X$ on $Y$ is $X = a + bY$. The normal equations are
$$200a + \left(\sum y_i\right) b = \sum x_i, \qquad \left(\sum y_i\right) a + \left(\sum y_i^2\right) b = \sum x_i y_i.$$
Substituting the given values,
$$200a + 20.78b = 11.34, \qquad 20.78a + 84.96b = 22.13.$$
Solving these equations, we get $a = 0.03$ and $b = 0.253$. Thus the regression line of $X$ on $Y$ is
$$X = 0.03 + 0.253Y.$$
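Solving two pairs of normal equations by hand is error-prone. The Python sketch below (helper name hypothetical) applies Cramer's rule to the given summary sums; reusing one helper for both lines only requires swapping the roles of the x and y sums.

```python
def solve_normal_equations(n, s_u, s_v, s_uu, s_uv):
    """Solve  n*a + s_u*b = s_v  and  s_u*a + s_uu*b = s_uv  for (a, b)
    by Cramer's rule; this fits v = a + b*u by least squares."""
    det = n * s_uu - s_u ** 2
    a = (s_v * s_uu - s_u * s_uv) / det
    b = (n * s_uv - s_u * s_v) / det
    return a, b


n = 200
sx, sy = 11.34, 20.78          # sum x_i, sum y_i
sxx, syy = 12.16, 84.96        # sum x_i^2, sum y_i^2
sxy = 22.13                    # sum x_i*y_i

a1, b1 = solve_normal_equations(n, sx, sy, sxx, sxy)   # Y on X: Y = a1 + b1*X
a2, b2 = solve_normal_equations(n, sy, sx, syy, sxy)   # X on Y: X = a2 + b2*Y
print(f"Y = {a1:.5f} + {b1:.4f}X")  # Y = 0.00075 + 1.8192X
print(f"X = {a2:.4f} + {b2:.4f}Y")  # X = 0.0304 + 0.2530Y
```

To more digits the X-on-Y coefficients are about 0.0304 and 0.2530, which round to the 0.03 and 0.253 quoted in the solution.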
2. Construct the two regression lines for the following data. Also, obtain the estimate of $X$ when $Y = 70$.

X   65  66  67  67  68  69  70  72
Y   67  68  65  68  72  72  69  71

Sol: The regression line of $Y$ on $X$ is $Y - \mu_Y = b_{YX}(X - \mu_X)$, where $b_{YX} = \dfrac{\sum a_i b_i}{\sum a_i^2}$.
The regression line of $X$ on $Y$ is $X - \mu_X = b_{XY}(Y - \mu_Y)$, where $b_{XY} = \dfrac{\sum a_i b_i}{\sum b_i^2}$.
$$\mu_X = \frac{\sum x_i}{n} = \frac{544}{8} = 68, \qquad \mu_Y = \frac{\sum y_i}{n} = \frac{552}{8} = 69$$

xi   yi   ai = xi − μX   bi = yi − μY   ai²   bi²   ai bi
65   67   -3             -2             9     4     6
66   68   -2             -1             4     1     2
67   65   -1             -4             1     16    4
67   68   -1             -1             1     1     1
68   72   0              3              0     9     0
69   72   1              3              1     9     3
70   69   2              0              4     0     0
72   71   4              2              16    4     8

$$\sum a_i^2 = 36, \qquad \sum b_i^2 = 44, \qquad \sum a_i b_i = 24$$
$$b_{YX} = \frac{\sum a_i b_i}{\sum a_i^2} = \frac{24}{36} = \frac{2}{3} \approx 0.6667, \qquad b_{XY} = \frac{\sum a_i b_i}{\sum b_i^2} = \frac{24}{44} = \frac{6}{11} \approx 0.5455$$

The regression line of $Y$ on $X$ is $Y - \mu_Y = b_{YX}(X - \mu_X)$,
i.e. $Y - 69 = \frac{2}{3}(X - 68)$
i.e. $Y \approx 23.6667 + 0.6667X$
The regression line of $X$ on $Y$ is $X - \mu_X = b_{XY}(Y - \mu_Y)$,
i.e. $X - 68 = \frac{6}{11}(Y - 69)$
i.e. $X \approx 30.3636 + 0.5455Y$
If $Y = 70$, then $X = 68 + \frac{6}{11}(70 - 69) \approx 68.5455$.
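The same steps can be carried out with exact arithmetic using Python's standard `fractions` module. This is only a sketch for checking the arithmetic, not part of the original solution. It shows that b_YX = 2/3 and b_XY = 6/11 exactly, so the intercepts are 71/3 ≈ 23.6667 and 334/11 ≈ 30.3636 (values computed from the rounded slopes 0.6667 and 0.5455 differ slightly in the last digits).

```python
from fractions import Fraction

X = [65, 66, 67, 67, 68, 69, 70, 72]
Y = [67, 68, 65, 68, 72, 72, 69, 71]
n = len(X)
mx = Fraction(sum(X), n)                          # mu_X = 68
my = Fraction(sum(Y), n)                          # mu_Y = 69
a = [x - mx for x in X]                           # a_i = x_i - mu_X
b = [y - my for y in Y]                           # b_i = y_i - mu_Y
s_ab = sum(ai * bi for ai, bi in zip(a, b))       # 24
b_yx = s_ab / sum(ai * ai for ai in a)            # 24/36
b_xy = s_ab / sum(bi * bi for bi in b)            # 24/44

# Intercepts of Y = c_yx + b_yx*X and X = c_xy + b_xy*Y
c_yx = my - b_yx * mx
c_xy = mx - b_xy * my

# Estimate of X when Y = 70, from the line of X on Y
x_est = mx + b_xy * (70 - my)

print(b_yx, b_xy)              # 2/3 6/11
print(c_yx, c_xy)              # 71/3 334/11
print(round(float(x_est), 4))  # 68.5455
```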
3. If $\sum X^2 = 190$, $\sum Y^2 = 140$ and $\sum XY = 95$, where $X = x - \bar{x}$, $Y = y - \bar{y}$, $\bar{x} = 50$ and $\bar{y} = 56$, find the two regression lines.
Ans: The regression coefficients are
$$b_{yx} = \frac{\sum XY}{\sum X^2} = \frac{95}{190} = \frac{1}{2}, \qquad b_{xy} = \frac{\sum XY}{\sum Y^2} = \frac{95}{140} = \frac{19}{28}.$$
(i) The line of regression of $y$ on $x$ is $y - \bar{y} = b_{yx}(x - \bar{x})$,
i.e. $y - 56 = \frac{1}{2}(x - 50)$
$\Rightarrow y = 0.5x + 31$
(ii) The line of regression of $x$ on $y$ is $x - \bar{x} = b_{xy}(y - \bar{y})$,
i.e. $x - 50 = \frac{19}{28}(y - 56)$
$\Rightarrow x = 0.6786y + 12$

Practice problems:
1. Find the regression line of $y$ on $x$ from the following data, and hence estimate $y$ when $x = 20$.
x   10  12  13  12  16  15

y   40  38  43  45  37  43

Ans: $y = 44.25 - 0.25x$, $y(20) = 39.25$

2. The equations of the two regression lines obtained in a correlation analysis are $20X - 9Y = 107$ and $4X - 5Y = -33$. Find (i) the correlation coefficient and (ii) the means of $X$ and $Y$.
3. The correlation coefficient between $X$ and $Y$ is 0.6, the variances of $X$ and $Y$ are 2.25 and 4 respectively, and the means of $X$ and $Y$ are 10 and 20 respectively. Find the two regression lines.
4. The following table gives data on rainfall ($X$, in inches) and discharge ($Y$, in 1000 cc) in a certain river. Obtain the lines of regression.
X   1.53   1.78   2.6    2.95   3.42

Y   33.5   36.3   40     45.8   53.8

5. In a partially destroyed laboratory record, only the equations of the two lines of regression are available: $7x - 16y + 9 = 0$ and $5y - 4x - 3 = 0$. Calculate the coefficient of correlation and the means $\bar{x}$, $\bar{y}$.

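Standard error: The standard error of estimate of $Y$ on $X$ is the square root of the mean of the squared deviations of the data points from the regression line of $Y$ on $X$.
The regression line is
$$Y = \mu_Y + r\,\frac{\sigma_Y}{\sigma_X}\,(X - \mu_X).$$
The sum of the squares of the deviations is
$$S = \sum \left( (y_i - \mu_Y) - r\,\frac{\sigma_Y}{\sigma_X}\,(x_i - \mu_X) \right)^2$$
$$= n\sigma_Y^2 + \left( r\,\frac{\sigma_Y}{\sigma_X} \right)^2 n\sigma_X^2 - 2r\,\frac{\sigma_Y}{\sigma_X}\cdot n\,r\,\sigma_X\sigma_Y$$
$$= n\sigma_Y^2 (1 - r^2),$$
since $\sum (x_i - \mu_X)^2 = n\sigma_X^2$, $\sum (y_i - \mu_Y)^2 = n\sigma_Y^2$ and $\sum (x_i - \mu_X)(y_i - \mu_Y) = n\,r\,\sigma_X\sigma_Y$.
Hence the standard error is
$$S_y = \sqrt{\frac{S}{n}} = \sigma_Y \sqrt{1 - r^2}.$$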
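To see the identity S = nσ_Y²(1 − r²) in action, the Python sketch below computes the standard error twice for the data of Problem 2 in the regression problems. It computes it once directly as √(S/n) from the deviations about the line of Y on X, and once as σ_Y√(1 − r²). Standard deviations use divisor n, matching the definitions above; the function name is hypothetical. Both values should equal √3.5 ≈ 1.8708.

```python
import math


def standard_error_y_on_x(xs, ys):
    """Return (direct, formula) for the standard error of Y on X:
    direct  = sqrt(S / n), S = sum of squared deviations from the line of Y on X
    formula = sigma_Y * sqrt(1 - r^2)
    Standard deviations use divisor n, as in the notes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b_yx = sxy / sxx                      # equals r * sigma_Y / sigma_X
    S = sum(((y - my) - b_yx * (x - mx)) ** 2 for x, y in zip(xs, ys))
    direct = math.sqrt(S / n)
    sigma_y = math.sqrt(syy / n)
    r = sxy / math.sqrt(sxx * syy)
    formula = sigma_y * math.sqrt(1 - r ** 2)
    return direct, formula


X = [65, 66, 67, 67, 68, 69, 70, 72]
Y = [67, 68, 65, 68, 72, 72, 69, 71]
direct, formula = standard_error_y_on_x(X, Y)
print(round(direct, 4), round(formula, 4))  # 1.8708 1.8708
```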
