Applications of Numerical Methods in Civil Engineering
Lecture 7
Curve Fitting
Linear Regression
Polynomial Regression
Multiple Linear Regression
Mongkol JIRAVACHARADET
SURANAREE UNIVERSITY OF TECHNOLOGY
INSTITUTE OF ENGINEERING
SCHOOL OF CIVIL ENGINEERING
LINEAR REGRESSION
[Figure: data points plotted as y versus x]
No exact solution, but many approximate solutions
Observation: $(x_i, y_i)$, $i = 1, \dots, n$
Model: $y = \alpha x + \beta$
Error: $e_i = y_i - \alpha x_i - \beta$
Criteria for a Best Fit
Find the best line: the one that minimizes the sum of the squared errors over all data points.
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2$$
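Differentiate with respect to each coefficient:
$$\frac{\partial S_r}{\partial \beta} = -2 \sum (y_i - \alpha x_i - \beta)$$
$$\frac{\partial S_r}{\partial \alpha} = -2 \sum \left[ (y_i - \alpha x_i - \beta)\, x_i \right]$$
Setting the derivatives to zero:
$$0 = \sum y_i - \sum \beta - \alpha \sum x_i$$
$$0 = \sum y_i x_i - \beta \sum x_i - \alpha \sum x_i^2$$
From $\sum \beta = n\beta$, express the equations as a set of two equations in the 2 unknowns $(\alpha, \beta)$:
$$n\beta + \alpha \sum x_i = \sum y_i$$
$$\beta \sum x_i + \alpha \sum x_i^2 = \sum y_i x_i$$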
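Solving for the slope and the intercept:
$$\alpha = \frac{\sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2} = \frac{S_{xy}}{S_{xx}}$$
$$\beta = \bar{y} - \alpha \bar{x}$$
where $\bar{y}$ and $\bar{x}$ are the means of $y$ and $x$.
Define:
$$S_{xy} = \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i$$
$$S_{xx} = \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2$$
$$S_{yy} = \sum y_i^2 - \frac{1}{n} \left( \sum y_i \right)^2$$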
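Fit $y = \alpha x + \beta$ to the data:

        x_i    y_i    x_i^2    x_i y_i    y_i^2
          1    0.5        1        0.5     0.25
          2    2.5        4        5.0     6.25
          3    2.0        9        6.0     4
          4    4.0       16       16.0    16
          5    3.5       25       17.5    12.25
          6    6.0       36       36.0    36
          7    5.5       49       38.5    30.25
Sum      28   24        140      119.5   105

$$\alpha = \frac{119.5 - (28)(24)/7}{140 - (28)^2/7} = 0.8393$$
$n = 7$, $\bar{x} = 28/7 = 4$, $\bar{y} = 24/7 = 3.4286$
$$\beta = \bar{y} - \alpha \bar{x} = 3.4286 - 0.8393(4) = 0.0714$$
Least-squares fit:
$$y = 0.8393x + 0.0714$$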
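The worked example can be checked numerically. The following is a minimal sketch in plain Python (the lecture itself uses MATLAB); the function name `linear_fit` is an illustrative choice, not something defined in the lecture. It computes the slope as $S_{xy}/S_{xx}$ and the intercept as the mean of y minus the slope times the mean of x.

```python
# Data of the worked example.
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]


def linear_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # S_xy and S_xx as defined in the lecture.
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


slope, intercept = linear_fit(x, y)
print(f"y = {slope:.4f} x + {intercept:.4f}")
```

Working with the centered sums $S_{xy}$ and $S_{xx}$ avoids solving the 2-by-2 normal equations explicitly.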
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha x_i - \beta)^2 = \frac{S_{xx} S_{yy} - S_{xy}^2}{S_{xx}}$$
$$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2 = S_{yy}$$
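The closed form for $S_r$ is stated without proof. A short derivation sketch follows, writing the fitted line as $y = \alpha x + \beta$ (slope $\alpha$, intercept $\beta$) and using the equivalent centered forms $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, and $S_{yy} = \sum (y_i - \bar{y})^2$. The first step substitutes $\beta = \bar{y} - \alpha \bar{x}$ and the last substitutes $\alpha = S_{xy}/S_{xx}$.

```latex
\begin{aligned}
S_r &= \sum_{i=1}^{n} \bigl( y_i - \alpha x_i - \beta \bigr)^2
     = \sum_{i=1}^{n} \bigl[ (y_i - \bar{y}) - \alpha (x_i - \bar{x}) \bigr]^2 \\
    &= S_{yy} - 2 \alpha S_{xy} + \alpha^2 S_{xx}
     = S_{yy} - \frac{S_{xy}^2}{S_{xx}}
     = \frac{S_{xx} S_{yy} - S_{xy}^2}{S_{xx}}
\end{aligned}
```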
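Standard deviation:
$$s_{y/x} = \sqrt{\frac{S_r}{n - 2}}, \qquad s_y = \sqrt{\frac{S_t}{n - 2}}$$
[Figure: spread of the data about the mean ($s_y$) compared with the spread about the regression line ($s_{y/x}$)]
Linear regression has merit when $s_y > s_{y/x}$.
Coefficient of determination:
$$r^2 = \frac{S_t - S_r}{S_t} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$$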
With $\bar{y} = 3.4286$, $\alpha = 0.8393$, $\beta = 0.0714$:

        x_i    y_i    $(y_i - \bar{y})^2$    $(y_i - \beta - \alpha x_i)^2$
          1    0.5         8.5765                 0.1687
          2    2.5         0.8622                 0.5626
          3    2.0         2.0408                 0.3473
          4    4.0         0.3265                 0.3265
          5    3.5         0.0051                 0.5896
          6    6.0         6.6122                 0.7972
          7    5.5         4.2908                 0.1993
Sum      28   24          22.7143 ($S_t$)         2.9911 ($S_r$)

$$s_y = \sqrt{\frac{22.7143}{7 - 2}} = 2.131$$
$$s_{y/x} = \sqrt{\frac{2.9911}{7 - 2}} = 0.773$$
Since $s_{y/x} < s_y$, linear regression has merit.
$$r = \sqrt{\frac{22.7143 - 2.9911}{22.7143}} = \sqrt{0.868} = 0.932$$
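The goodness-of-fit numbers of this example can be reproduced with a few lines of plain Python. This is a sketch under the lecture's own convention of dividing both sums by n - 2; the variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
n = len(x)

x_mean, y_mean = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Spread about the mean (S_t) and about the fitted line (S_r).
s_t = sum((yi - y_mean) ** 2 for yi in y)
s_r = sum((yi - slope * xi - intercept) ** 2 for xi, yi in zip(x, y))

s_y = math.sqrt(s_t / (n - 2))    # the lecture divides by n - 2 here as well
s_yx = math.sqrt(s_r / (n - 2))   # standard error of the estimate
r = math.sqrt((s_t - s_r) / s_t)  # correlation coefficient

print(f"S_t = {s_t:.4f}, S_r = {s_r:.4f}")
print(f"s_y = {s_y:.3f}, s_y/x = {s_yx:.3f}, r = {r:.3f}")
```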
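The same results follow directly from the column sums of the example ($n = 7$, $\sum x_i = 28$, $\sum y_i = 24$, $\sum x_i^2 = 140$, $\sum x_i y_i = 119.5$, $\sum y_i^2 = 105$):
$$S_{xx} = 140 - 28^2/7 = 28$$
$$S_{yy} = 105 - 24^2/7 = 22.7$$
$$S_{xy} = 119.5 - (28)(24)/7 = 23.5$$
$$s_y = \sqrt{\frac{22.7}{7 - 2}} = 2.131$$
$$S_r = \frac{S_{xx} S_{yy} - S_{xy}^2}{S_{xx}} = \frac{(28)(22.7) - 23.5^2}{28} = 2.977, \qquad s_{y/x} = \sqrt{\frac{2.977}{7 - 2}} = 0.772$$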
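For a 95% confidence interval (CI), you can be 95% confident that the two curved confidence bands enclose the true regression line. At $x = x_i$ the band is
$$\hat{y}_i \pm t_{\alpha/2}\, s_{y/x} \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}}$$
where $\hat{y}_i$ is the value of the fitted line at $x_i$ and $t_{\alpha/2}$ is the critical value of the t-distribution with $n - 2$ degrees of freedom (here $\alpha$ denotes the significance level, not the slope).
Example: at $x = 3.4$, with $t_{0.025} = 2.571$ for 5 degrees of freedom, the fitted line gives $\hat{y} = 0.8393(3.4) + 0.0714 = 2.9250$.
$$\text{Interval: } 2.9250 \pm (2.571)(0.772) \sqrt{\frac{1}{7} + \frac{(3.4 - 4)^2}{28}} = 2.9250 \pm 0.7832$$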
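The interval calculation can be scripted as follows. This is a sketch in plain Python that takes the fitted line, the standard error, and the tabulated critical value as given constants from the example. The function name `confidence_interval` is hypothetical, and the center of the interval is evaluated from the fitted line y = 0.8393x + 0.0714.

```python
import math

n = 7                    # number of data points
x_mean = 4.0             # mean of x
s_xx = 28.0              # S_xx of the example
slope, intercept = 0.8393, 0.0714
s_yx = 0.772             # standard error of the estimate used on this slide
t_crit = 2.571           # t_0.025 for n - 2 = 5 degrees of freedom (from the table)


def confidence_interval(x0):
    """Return (lower, upper, half_width) of the 95% band at x0."""
    y_hat = slope * x0 + intercept
    half_width = t_crit * s_yx * math.sqrt(1 / n + (x0 - x_mean) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width, half_width


low, high, hw = confidence_interval(3.4)
print(f"half-width at x = 3.4: {hw:.4f}")
print(f"interval: [{low:.4f}, {high:.4f}]")
```

The half-width is smallest at the mean of x and grows with the distance from it, which is why the two bands are curved.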
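T-Distribution
[Figure: t-distribution density; the critical values $\pm t_{0.025}$ bound the central 95% of the area and $\pm t_{0.005}$ bound the central 99%]
Probability density function:
$$f(x) = \frac{\left( 1 + x^2/\nu \right)^{-(\nu + 1)/2}}{B(0.5,\, 0.5\nu)\, \sqrt{\nu}}$$
where $B$ is the beta function (its arguments $\alpha$ and $\beta$ are unrelated to the regression coefficients):
$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\, dt$$
$\nu$ = df, the degrees of freedom (the shape parameter)
[Figure: plot of the t probability density function for 4 different values of the shape parameter]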
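To make the density formula concrete, here is a minimal sketch in plain Python. It evaluates the beta function through the log-gamma function, using the identity B(a, b) = Γ(a)Γ(b)/Γ(a + b). The function names and the four df values printed are illustrative assumptions, not taken from the lecture.

```python
import math


def beta_function(a, b):
    """Beta function B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b), via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    numerator = (1.0 + x * x / df) ** (-(df + 1) / 2)
    return numerator / (beta_function(0.5, 0.5 * df) * math.sqrt(df))


# Peak height of the density for four values of the shape parameter.
for df in (1, 2, 5, 30):
    print(df, round(t_pdf(0.0, df), 4))
```

For df = 1 the density reduces to the Cauchy density, whose peak value is 1/π; as df grows the peak approaches that of the standard normal density.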
Critical Values of t
Each column is headed by the two-sided confidence level and, below it, the corresponding one-tail area.

  df      80%      90%      95%      98%      99%    99.8%
         0.10     0.05    0.025     0.01    0.005    0.001
   1    3.078    6.314   12.706   31.821   63.657  318.313
   2    1.886    2.920    4.303    6.965    9.925   22.327
   3    1.638    2.353    3.182    4.541    5.841   10.215
   4    1.533    2.132    2.776    3.747    4.604    7.173
   5    1.476    2.015    2.571    3.365    4.032    5.893
   6    1.440    1.943    2.447    3.143    3.707    5.208
   7    1.415    1.895    2.365    2.998    3.499    4.782
   8    1.397    1.860    2.306    2.896    3.355    4.499
   9    1.383    1.833    2.262    2.821    3.250    4.296
  10    1.372    1.812    2.228    2.764    3.169    4.143
  11    1.363    1.796    2.201    2.718    3.106    4.024
  12    1.356    1.782    2.179    2.681    3.055    3.929
  13    1.350    1.771    2.160    2.650    3.012    3.852
  14    1.345    1.761    2.145    2.624    2.977    3.787
  15    1.341    1.753    2.131    2.602    2.947    3.733
  16    1.337    1.746    2.120    2.583    2.921    3.686
  17    1.333    1.740    2.110    2.567    2.898    3.646
  18    1.330    1.734    2.101    2.552    2.878    3.610
  19    1.328    1.729    2.093    2.539    2.861    3.579
  20    1.325    1.725    2.086    2.528    2.845    3.552
  21    1.323    1.721    2.080    2.518    2.831    3.527
  22    1.321    1.717    2.074    2.508    2.819    3.505
  23    1.319    1.714    2.069    2.500    2.807    3.485
  24    1.318    1.711    2.064    2.492    2.797    3.467
  25    1.316    1.708    2.060    2.485    2.787    3.450
  26    1.315    1.706    2.056    2.479    2.779    3.435
  27    1.314    1.703    2.052    2.473    2.771    3.421
  28    1.313    1.701    2.048    2.467    2.763    3.408
  29    1.311    1.699    2.045    2.462    2.756    3.396
  30    1.310    1.697    2.042    2.457    2.750    3.385
  31    1.309    1.696    2.040    2.453    2.744    3.375
  32    1.309    1.694    2.037    2.449    2.738    3.365
  33    1.308    1.692    2.035    2.445    2.733    3.356
  34    1.307    1.691    2.032    2.441    2.728    3.348
  35    1.306    1.690    2.030    2.438    2.724    3.340
  36    1.306    1.688    2.028    2.434    2.719    3.333
  37    1.305    1.687    2.026    2.431    2.715    3.326
  38    1.304    1.686    2.024    2.429    2.712    3.319
  39    1.304    1.685    2.023    2.426    2.708    3.313
  40    1.303    1.684    2.021    2.423    2.704    3.307
  41    1.303    1.683    2.020    2.421    2.701    3.301
  42    1.302    1.682    2.018    2.418    2.698    3.296
  43    1.302    1.681    2.017    2.416    2.695    3.291
  44    1.301    1.680    2.015    2.414    2.692    3.286
  45    1.301    1.679    2.014    2.412    2.690    3.281
  46    1.300    1.679    2.013    2.410    2.687    3.277
  47    1.300    1.678    2.012    2.408    2.685    3.273
  48    1.299    1.677    2.011    2.407    2.682    3.269
  49    1.299    1.677    2.010    2.405    2.680    3.265
  50    1.299    1.676    2.009    2.403    2.678    3.261
  51    1.298    1.675    2.008    2.402    2.676    3.258
  52    1.298    1.675    2.007    2.400    2.674    3.255
  53    1.298    1.674    2.006    2.399    2.672    3.251
  54    1.297    1.674    2.005    2.397    2.670    3.248
  55    1.297    1.673    2.004    2.396    2.668    3.245
  56    1.297    1.673    2.003    2.395    2.667    3.242
  57    1.297    1.672    2.002    2.394    2.665    3.239
  58    1.296    1.672    2.002    2.392    2.663    3.237
  59    1.296    1.671    2.001    2.391    2.662    3.234
  60    1.296    1.671    2.000    2.390    2.660    3.232
  61    1.296    1.670    2.000    2.389    2.659    3.229
  62    1.295    1.670    1.999    2.388    2.657    3.227
  63    1.295    1.669    1.998    2.387    2.656    3.225
  64    1.295    1.669    1.998    2.386    2.655    3.223
  65    1.295    1.669    1.997    2.385    2.654    3.220
  66    1.295    1.668    1.997    2.384    2.652    3.218
  67    1.294    1.668    1.996    2.383    2.651    3.216
  68    1.294    1.668    1.995    2.382    2.650    3.214
  69    1.294    1.667    1.995    2.382    2.649    3.213
  70    1.294    1.667    1.994    2.381    2.648    3.211
  71    1.294    1.667    1.994    2.380    2.647    3.209
  72    1.293    1.666    1.993    2.379    2.646    3.207
  73    1.293    1.666    1.993    2.379    2.645    3.206
  74    1.293    1.666    1.993    2.378    2.644    3.204
  75    1.293    1.665    1.992    2.377    2.643    3.202
  76    1.293    1.665    1.992    2.376    2.642    3.201
  77    1.293    1.665    1.991    2.376    2.641    3.199
  78    1.292    1.665    1.991    2.375    2.640    3.198
  79    1.292    1.664    1.990    2.374    2.640    3.197
  80    1.292    1.664    1.990    2.374    2.639    3.195
  81    1.292    1.664    1.990    2.373    2.638    3.194
  82    1.292    1.664    1.989    2.373    2.637    3.193
  83    1.292    1.663    1.989    2.372    2.636    3.191
  84    1.292    1.663    1.989    2.372    2.636    3.190
  85    1.292    1.663    1.988    2.371    2.635    3.189
  86    1.291    1.663    1.988    2.370    2.634    3.188
  87    1.291    1.663    1.988    2.370    2.634    3.187
  88    1.291    1.662    1.987    2.369    2.633    3.185
  89    1.291    1.662    1.987    2.369    2.632    3.184
  90    1.291    1.662    1.987    2.368    2.632    3.183
  91    1.291    1.662    1.986    2.368    2.631    3.182
  92    1.291    1.662    1.986    2.368    2.630    3.181
  93    1.291    1.661    1.986    2.367    2.630    3.180
  94    1.291    1.661    1.986    2.367    2.629    3.179
  95    1.291    1.661    1.985    2.366    2.629    3.178
  96    1.290    1.661    1.985    2.366    2.628    3.177
  97    1.290    1.661    1.985    2.365    2.627    3.176
  98    1.290    1.661    1.984    2.365    2.627    3.175
  99    1.290    1.660    1.984    2.365    2.626    3.175
 100    1.290    1.660    1.984    2.364    2.626    3.174
   ∞    1.282    1.645    1.960    2.326    2.576    3.090
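Tabulated critical values can also be reproduced numerically. The sketch below, in plain Python, integrates the t density with Simpson's rule and uses bisection to find the t whose upper-tail area equals a given value. The function names, the step count, and the search interval [0, 1000] are assumptions made for illustration; in practice a statistics library would normally be used instead.

```python
import math


def make_t_pdf(df):
    """Return the density function of the t-distribution with df degrees of freedom."""
    log_beta = math.lgamma(0.5) + math.lgamma(0.5 * df) - math.lgamma(0.5 * df + 0.5)
    norm = 1.0 / (math.exp(log_beta) * math.sqrt(df))
    return lambda x: norm * (1.0 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_area(t, df, steps=2000):
    """Area to the right of t >= 0: 0.5 minus the Simpson integral of the density on [0, t]."""
    pdf = make_t_pdf(df)
    h = t / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 - total * h / 3.0


def t_critical(tail_area, df, upper=1000.0):
    """Bisection for the t whose upper-tail area equals tail_area."""
    low, high = 0.0, upper
    for _ in range(60):
        mid = 0.5 * (low + high)
        if upper_tail_area(mid, df) > tail_area:
            low = mid                  # tail still too large: move right
        else:
            high = mid
    return 0.5 * (low + high)


# 95% confidence level (one-tail area 0.025) with 5 degrees of freedom.
print(round(t_critical(0.025, 5), 3))
```

The printed value should agree with the df = 5 entry of the 95% column to three decimals.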
Polynomial Regression
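Second-order polynomial:
$$y = a_0 + a_1 x + a_2 x^2$$
Sum of the squares of the residuals:
$$S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)^2$$
Normal equations:
$$n\, a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i$$
$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i$$
$$\left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i$$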
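Matrix form: define
$$A = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_m^2 & x_m & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \quad C = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$$
and show that the least-squares solution of $AC = Y$ is
Fit norm (normal equations): $C = (A'A)^{-1} A'Y$
Fit QR (MATLAB backslash, since $A$ is not square and has no ordinary inverse): $C = A \backslash Y$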
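The "Fit norm" route can be sketched in plain Python. The example below builds the design matrix with columns in descending powers, forms A'A and A'Y, and solves the small system by Gaussian elimination with partial pivoting. The names `solve_linear_system` and `polyfit_normal` are hypothetical helpers, not MATLAB or lecture functions, and the quadratic test data are made up so that the exact answer is known.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def polyfit_normal(x, y, degree):
    """Least-squares polynomial coefficients in descending powers (normal equations)."""
    powers = range(degree, -1, -1)
    A = [[xi ** p for p in powers] for xi in x]              # design matrix
    k = degree + 1
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(k)] for i in range(k)]
    AtY = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(k)]
    return solve_linear_system(AtA, AtY)


# Straight-line fit of the linear regression example (slope first, then intercept).
line = polyfit_normal([1, 2, 3, 4, 5, 6, 7], [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5], 1)
# Points lying exactly on y = 2x^2 - 3x + 1 are recovered exactly (up to rounding).
quad = polyfit_normal([0, 1, 2, 3], [1, 0, 3, 10], 2)
print(line, quad)
```

Forming A'A squares the condition number of the problem, which is one reason a QR-based solve is usually preferred for higher polynomial degrees.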
>> C = polyfit(x, y, n)
>> [C, S] = polyfit(x, y, n)
x = independent variable
y = dependent variable
n = degree of polynomial
C = coefficients of the polynomial in descending powers
S = data structure for use with the polyval function
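Example: fit a second-order polynomial ($m = 2$) to $n = 6$ data points.

        x_i     y_i    $(y_i - \bar{y})^2$    $(y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$
          0     2.1        544.44                 0.14332
          1     7.7        314.47                 1.00286
          2    13.6        140.03                 1.08158
          3    27.2          3.12                 0.80491
          4    40.9        239.22                 0.61951
          5    61.1       1272.11                 0.09439
Sum      15   152.6       2513.39                 3.74657

From the given data:
$m = 2$, $n = 6$, $\bar{x} = 2.5$, $\bar{y} = 25.433$
$\sum x_i = 15$, $\sum x_i^2 = 55$, $\sum x_i^3 = 225$, $\sum x_i^4 = 979$
$\sum y_i = 152.6$, $\sum x_i y_i = 585.6$, $\sum x_i^2 y_i = 2488.8$
Normal equations:
$$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{bmatrix}$$
Solving gives $a_0 = 2.47857$, $a_1 = 2.35929$, $a_2 = 1.86071$.
$$r^2 = \frac{2513.39 - 3.74657}{2513.39} = 0.99851, \qquad r = 0.99925$$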
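The numbers of this example can be reproduced from the raw data. The following is a sketch in plain Python that assembles the 3-by-3 normal equations from power sums and solves them by Cramer's rule, which is chosen here only because the system is tiny. The helper names `det3` and `with_column` are illustrative.

```python
x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
n = len(x)

# Power sums: sx[p] = sum of x_i^p (so sx[0] = n); sxy[p] = sum of x_i^p * y_i.
sx = [sum(xi ** p for xi in x) for p in range(5)]
sxy = [sum(xi ** p * yi for xi, yi in zip(x, y)) for p in range(3)]

# Normal-equation matrix: entry (i, j) is the sum of x^(i + j).
M = [[sx[i + j] for j in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def with_column(m, k, col):
    """Copy of m with column k replaced by col."""
    return [[col[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]


d = det3(M)
a = [det3(with_column(M, k, sxy)) / d for k in range(3)]   # a0, a1, a2

y_mean = sum(y) / n
s_t = sum((yi - y_mean) ** 2 for yi in y)
s_r = sum((yi - a[0] - a[1] * xi - a[2] * xi ** 2) ** 2 for xi, yi in zip(x, y))
r2 = (s_t - s_r) / s_t

print("a0, a1, a2 =", [round(v, 5) for v in a])
print(f"S_t = {s_t:.2f}, S_r = {s_r:.5f}, r^2 = {r2:.5f}")
```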
Polynomial Interpolation
[Figure: data points and the fitted polynomial, y versus x]
>> y2 = polyval(C, x)
>> plot(x, y, 'o', x, y2)
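Evaluating a fitted polynomial amounts to Horner's rule applied to coefficients in descending powers. Below is a minimal Python sketch of that idea; the function is named `polyval` only by analogy with the MATLAB command and is not its implementation. The coefficients are those of the quadratic least-squares fit to the example data (x = 0 to 5), rounded to five decimals.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients in descending powers (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


c = [1.86071, 2.35929, 2.47857]          # a2, a1, a0 of the example fit
fitted = [polyval(c, xi) for xi in range(6)]
print([round(v, 3) for v in fitted])
```

Horner's rule needs only one multiplication and one addition per coefficient, so it is both cheaper and numerically better behaved than summing explicit powers.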
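Error Bounds
Error bounds on the fitted values are obtained by passing the optional second output of polyfit (the structure S) as an input to polyval, for example:
>> [y2, delta] = polyval(C, x, S)
Data:
x_i    1     2     3     4     5     6     7
y_i    0.5   2.5   2.0   4.0   3.5   6.0   5.5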
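Multiple Linear Regression
Model: $y = c_0 + c_1 x_1 + c_2 x_2$
Normal equations:
$$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{bmatrix}$$
Example:

        x1      x2      y
        0       0       5
        2       1      10
        2.5     2       9
        1       3       0
        4       6       3
        7       2      27
Sum    16.5    14      54

$$\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 54 \\ 243.5 \\ 100 \end{bmatrix}$$
Solution: $c_0 = 5$, $c_1 = 4$, $c_2 = -3$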
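The two-variable example can be verified with a short script. This sketch, in plain Python, forms the normal equations as sums of products of the regressor columns (a column of ones, x1, and x2) and solves them by Gauss-Jordan elimination. The helper names `dot` and `gauss_jordan` are illustrative.

```python
x1 = [0, 2, 2.5, 1, 4, 7]
x2 = [0, 1, 2, 3, 6, 2]
y = [5, 10, 9, 0, 3, 27]


def dot(u, v):
    """Sum of elementwise products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


ones = [1.0] * len(y)
columns = [ones, x1, x2]                 # regressors multiplying c0, c1, c2

# Normal equations: entry (i, j) is the sum of products of columns i and j.
M = [[dot(ci, cj) for cj in columns] for ci in columns]
rhs = [dot(ci, y) for ci in columns]


def gauss_jordan(m, b):
    """Solve m * c = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]       # scale pivot row to 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


c0, c1, c2 = gauss_jordan(M, rhs)
print(round(c0, 6), round(c1, 6), round(c2, 6))
```

The data lie exactly on the plane y = 5 + 4 x1 - 3 x2, so the residuals of this fit are zero up to rounding.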
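General form for $p$ independent variables and $m$ observations, with model $y = c_0 + c_1 x_1 + \cdots + c_p x_p$:
$$A = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{m1} & x_{m2} & \cdots & x_{mp} \end{bmatrix}, \quad c = \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_p \end{bmatrix}, \quad \text{and} \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
The column of ones multiplies the constant term $c_0$.
Fit norm
>> c = (A'*A)\(A'*y)
Fit QR
>> c = A\y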
Example:

        x1      x2      y
        0       0       5
        2       1      10
        2.5     2       9
        1       3       0
        4       6       3
        7       2      27
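The "Fit QR" route can be illustrated without MATLAB. The sketch below, in plain Python, solves the least-squares problem through a thin QR factorization built with modified Gram-Schmidt. This is a simpler stand-in chosen for brevity, not the algorithm MATLAB's backslash actually uses. The function name `least_squares_qr` is hypothetical, and the design matrix is assumed to have a leading column of ones for the constant term.

```python
import math

x1 = [0, 2, 2.5, 1, 4, 7]
x2 = [0, 1, 2, 3, 6, 2]
y = [5, 10, 9, 0, 3, 27]

# Design matrix with rows [1, x1, x2].
A = [[1.0, a, b] for a, b in zip(x1, x2)]


def least_squares_qr(A, y):
    """Minimize ||A c - y|| using a thin QR factorization (modified Gram-Schmidt)."""
    m, p = len(A), len(A[0])
    Q = [[A[i][j] for i in range(m)] for j in range(p)]   # columns of A, orthonormalized below
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        R[j][j] = math.sqrt(sum(v * v for v in Q[j]))
        Q[j] = [v / R[j][j] for v in Q[j]]
        for k in range(j + 1, p):
            R[j][k] = sum(a * b for a, b in zip(Q[j], Q[k]))
            Q[k] = [b - R[j][k] * a for a, b in zip(Q[j], Q[k])]
    # Solve the triangular system R c = Q' y by back substitution.
    qty = [sum(a * b for a, b in zip(Q[j], y)) for j in range(p)]
    c = [0.0] * p
    for i in range(p - 1, -1, -1):
        c[i] = (qty[i] - sum(R[i][k] * c[k] for k in range(i + 1, p))) / R[i][i]
    return c


c = least_squares_qr(A, y)
print([round(v, 6) for v in c])
```

Unlike the normal equations, this route never forms A'A, so it does not square the condition number of the problem.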