Nonlinear curve fitting fits mathematical models, in the form of nonlinear functions with multiple parameters, to observational data. Common nonlinear fitting problems include mixture distributions, fluorescence spectroscopy, and electrophoresis gel analysis. Solutions are found iteratively, using algorithms such as Levenberg-Marquardt, by adjusting the parameters until the derivatives of the error function with respect to the parameters are zero. Software packages use numeric or analytic derivatives to perform the fit, and goodness of fit is evaluated with metrics such as the coefficient of determination R².


Nonlinear Curve Fitting

Earl F. Glynn
Scientific Programmer, Bioinformatics
11 Oct 2006

Nonlinear Curve Fitting


Mathematical Models
Nonlinear Curve Fitting Problems
  Mixture of Distributions
  Quantitative Analysis of Electrophoresis Gels
  Fluorescence Correlation Spectroscopy (FCS)
  Fluorescence Recovery After Photobleaching (FRAP)
Linear Curve Fitting
Nonlinear Curve Fitting
  Gaussian Case Study
  Math
  Algorithms
  Software
Analysis of Results
  Goodness of Fit: R²
  Residuals
Summary

2

Mathematical Models
Want a mathematical model to describe observations based on the independent variable(s) under experimental control.
Need a good understanding of the underlying biology, physics, and chemistry of the problem to choose the right model.
Use curve fitting to connect observed data to a mathematical model.
3

Nonlinear Curve Fitting Problems


Mixture Distribution Problem
Heming Lake Pike: Length Distribution
[Figure: probability density vs. length (cm)]
Adapted from www.math.mcmaster.ca/peter/mix/demex/expike.html

Nonlinear Curve Fitting Problems


Mixture Distribution Problem
Heming Lake Pike: Distribution by Age Groups

Normal Probability Density Function


f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}

Coefficient of Variation

cv = \sigma / \mu

[Figure: probability density vs. length (cm), with the pike length distribution resolved into age-group components]

Data are fitted by five normal distributions with a constant coefficient of variation.

5
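This kind of constant-cv mixture can be fitted in a few lines of R. The sketch below is illustrative only: it uses synthetic lengths rather than the pike data, assumes three components instead of five, and parameterizes the weights through a softmax so that optim can work on unconstrained values.

## Minimal sketch: maximum-likelihood fit of a mixture of normal distributions
## sharing one coefficient of variation (sd_k = cv * mu_k).
set.seed(1)
lengths <- c(rnorm(300, 23, 23 * 0.10),   # hypothetical age groups
             rnorm(200, 40, 40 * 0.10),
             rnorm(100, 55, 55 * 0.10))

mix_density <- function(x, w, mu, cv)
  rowSums(sapply(seq_along(mu), function(k) w[k] * dnorm(x, mu[k], cv * mu[k])))

negloglik <- function(p) {
  K  <- 3
  w  <- exp(p[1:K]); w <- w / sum(w)      # softmax keeps weights positive and summing to 1
  mu <- p[(K + 1):(2 * K)]
  cv <- exp(p[2 * K + 1])                 # log-parameterized so cv stays positive
  -sum(log(mix_density(lengths, w, mu, cv)))
}

fit <- optim(c(0, 0, 0, 25, 40, 55, log(0.1)), negloglik, method = "BFGS")
fit$par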

Nonlinear Curve Fitting Problems


Quantitative Analysis of Electrophoresis Gels

Deconvolve a pixel profile of a banding pattern into a family of Gaussian or Lorentzian curves.

Das, et al, RNA (2005), 11:348

http://papakilo.icmb.utexas.edu/cshl-2005/lectures/CSHL_Lecture05_khodursky.ppt#23

Takamato, et al, Nucleic Acids Research, 32(15), 2004.

6

Nonlinear Curve Fitting Problems


Quantitative Analysis of Electrophoresis Gels

Many functional forms besides Gaussian or Lorentzian curves have been proposed.

DiMarco and Bombi, Mathematical functions for the representation of chromatographic peaks, Journal of Chromatography A, 931(2001), 1-30.

Nonlinear Curve Fitting Problems


Fluorescence Correlation Spectroscopy (FCS)

Bacia, Kim & Schwille, Fluorescence cross-correlation spectroscopy in living cells, Nature Methods, Vol 3, No 2, p. 86, Feb. 2006.

Nonlinear Curve Fitting Problems


Fluorescence Correlation Spectroscopy (FCS)

Note likely heteroscedasticity in data


From discussion by Joe Huff at Winfried Wiegraebe's Lab Meeting, 11 Aug 2006

9

Nonlinear Curve Fitting Problems


Fluorescence Recovery After Photobleaching (FRAP)

From discussion by Juntao Gao at Rong Li's Lab Meeting, 25 Sept 2006

10

Nonlinear Curve Fitting Problems


Fluorescence Recovery After Photobleaching (FRAP)

From discussion by Juntao Gao at Rong Li's Lab Meeting, 25 Sept 2006

11

Linear Curve Fitting


Linear regression
Polynomial regression
Multiple regression
Stepwise regression
Logarithm transformation

12

Linear Curve Fitting


Linear Regression: Least Squares

Given data points (xi, yi).

We want the best straight line, (xi, ŷi), through these points, where ŷi is the fitted value at point xi:

\hat{y}_i = a + b x_i

13

Linear Curve Fitting


Linear Regression: Least Squares
[Figure: (x, y) data points with the linear fit \hat{y}_i = a + b x_i and the residual at a point (xi, yi)]

Error Function

\chi^2(a, b) = \sum_{i=1}^{N} [y_i - (a + b x_i)]^2

Assume homoscedasticity (same variance).

14

Linear Curve Fitting


Linear Regression: Least Squares

Search the (a, b) parameter space to minimize the error function χ².

Error Function

\chi^2(a, b) = \sum_{i=1}^{N} [y_i - (a + b x_i)]^2

\chi^2(1.2, 0.9) = 1.9

[Figure: χ² error surface over (a, b); the linear fit \hat{y}_i = 1.2 + 0.9 x_i lies at the minimum]

15
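To make this search concrete, here is a small R sketch that evaluates χ²(a, b) on a grid for the five-point dataset used on the following slides and keeps the smallest value; the grid range and step size are arbitrary choices.

## Minimal sketch: brute-force grid search of (a, b) minimizing chi-square.
x <- 0:4
y <- c(1, 3, 2, 4, 5)

chisq <- function(a, b) sum((y - (a + b * x))^2)

a_grid <- seq(-1, 3, by = 0.01)
b_grid <- seq(-1, 3, by = 0.01)
err <- outer(a_grid, b_grid, Vectorize(chisq))

best <- arrayInd(which.min(err), dim(err))
c(a = a_grid[best[1]], b = b_grid[best[2]], chisq = min(err))  # near a = 1.2, b = 0.9, chi-square = 1.9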

Linear Curve Fitting


Linear Regression: Least Squares
Least Squares Line

y = 1.2 + 0.9x

[Figure: the least-squares line through the five data points]

How can the (a, b) parameters be found directly, without a search?

16

Linear Curve Fitting


Linear Regression: Least Squares

How can the (a, b) parameters be found directly, without a search? Differentiate χ² with respect to the parameters a and b, and set the derivatives to 0:

\frac{\partial \chi^2}{\partial a} = -2 \sum_{i=1}^{N} (y_i - a - b x_i) = 0

\frac{\partial \chi^2}{\partial b} = -2 \sum_{i=1}^{N} x_i (y_i - a - b x_i) = 0

17

Linear Curve Fitting


Linear Regression: Least Squares

How can the (a, b) parameters be found directly, without a search?

Linear Fit: \hat{y}_i = a + b x_i

Simultaneous Linear Equations:

\begin{pmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}

18

Linear Curve Fitting


Linear Regression: Least Squares

How can the (a, b) parameters be found directly, without a search?

Linear Fit: \hat{y}_i = a + b x_i

 i    x    y    x^2   x*y
 1    0    1     0     0
 2    1    3     1     3
 3    2    2     4     4
 4    3    4     9    12
 5    4    5    16    20
Sum  10   15    30    39

\begin{pmatrix} 5 & 10 \\ 10 & 30 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 15 \\ 39 \end{pmatrix}

Solving by Cramer's rule (ratios of 2x2 determinants):

a = (15·30 - 10·39) / (5·30 - 10·10) = 60/50 = 1.2
b = (5·39 - 15·10) / (5·30 - 10·10) = 45/50 = 0.9

19
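The same simultaneous linear equations can be handed straight to a linear solver; a minimal R sketch using the sums from the table above:

## Minimal sketch: solve the normal equations directly.
x <- 0:4
y <- c(1, 3, 2, 4, 5)

A   <- rbind(c(length(x), sum(x)),
             c(sum(x),    sum(x^2)))   # [ N       sum(x)  ]
rhs <- c(sum(y), sum(x * y))           # [ sum(y), sum(x*y) ]

solve(A, rhs)                          # a = 1.2, b = 0.9, matching the determinants above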

Linear Curve Fitting


Linear Regression: Least Squares
R solution using lm (linear model):

> x <- 0:4
> y <- c(1,3,2,4,5)
> summary( lm(y ~ x) )

Call:
lm(formula = y ~ x)

Residuals:
   1    2    3    4    5
-0.2  0.9 -1.0  0.1  0.2

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2000     0.6164   1.947   0.1468
x             0.9000     0.2517   3.576   0.0374 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7958 on 3 degrees of freedom
Multiple R-Squared: 0.81,     Adjusted R-squared: 0.7467
F-statistic: 12.79 on 1 and 3 DF,  p-value: 0.03739

20

Linear Curve Fitting


Linear Regression: Least Squares

Assume homoscedasticity (σi = constant = 1), or assume heteroscedasticity and weight each point by its own σi:

\chi^2(a, b) = \sum_{i=1}^{N} \left[ \frac{y_i - (a + b x_i)}{\sigma_i} \right]^2

Often the weights σi are assumed to be 1. Experimental measurement errors can be used if known.

21
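When per-point measurement errors σi are known, they enter lm as weights 1/σi². A minimal sketch; the σi values below are invented for illustration:

## Minimal sketch: weighted least squares in R.
x <- 0:4
y <- c(1, 3, 2, 4, 5)
sigma <- c(0.2, 0.5, 1.0, 0.5, 0.2)        # assumed per-point measurement errors

fit_w <- lm(y ~ x, weights = 1 / sigma^2)  # noisier points count less in chi-square
coef(fit_w)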

Nonlinear Curve Fitting


Gaussian Case Study
Gaussian Data

  x      y
-2.0   0.00004
-1.5   0.00055
-1.0   0.00472
-0.5   0.02739
 0.0   0.10748
 0.5   0.28539
 1.0   0.51275
 1.5   0.62335
 2.0   0.51275
 2.5   0.28539
 3.0   0.10748
 3.5   0.02739
 4.0   0.00472
 4.5   0.00055
 5.0   0.00004

[Figure: plot of the Gaussian data, y vs. x]

Normal Probability Density Function

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}

22

Nonlinear Curve Fitting


Gaussian Case Study

Minimum: μ = 1.5, σ = 0.8

Gradient descent works well only inside the valley here.

[Figure: χ² error surface over (μ, σ) with its minimum at μ = 1.5, σ = 0.8]

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}

\chi^2(\mu, \sigma) = \sum_{i=1}^{N} [y_i - f(x_i)]^2

Assume homoscedasticity.

23
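A minimal R sketch of this fit with nls, using the tabulated data and starting values read off the error surface above; a little noise is added because nls can struggle on exact zero-residual toy problems (a point noted on the Software slide later).

## Minimal sketch: nonlinear least-squares fit of the two-parameter Gaussian model.
x <- seq(-2, 5, by = 0.5)
y <- c(0.00004, 0.00055, 0.00472, 0.02739, 0.10748, 0.28539, 0.51275,
       0.62335, 0.51275, 0.28539, 0.10748, 0.02739, 0.00472, 0.00055, 0.00004)

set.seed(1)
y_noisy <- y + rnorm(length(y), sd = 1e-4)

fit <- nls(y_noisy ~ exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi)),
           start = list(mu = 1.5, sigma = 0.8))
summary(fit)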

Nonlinear Curve Fitting


Gaussian Case Study

Derivatives may be useful for estimating parameters.

[Figure: a single Gaussian with its 1st and 2nd derivatives (y, y', y'' vs. x)]

24

U:/efg/lab/R/MixturesOfDistributions/SingleGaussian.R
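A small R sketch of what such numerical derivative estimates look like; the sampled curve here is a standard normal rather than the case-study data.

## Minimal sketch: numerical 1st and 2nd derivatives of a sampled curve via diff().
## Zero crossings of y' and extrema of y'' help locate peak centres and widths.
x   <- seq(-4, 4, by = 0.01)
y   <- dnorm(x)                  # a single Gaussian, as on this slide
dy  <- diff(y)  / diff(x)        # first derivative (length n - 1)
d2y <- diff(dy) / diff(x[-1])    # second derivative (length n - 2)
x[which.max(y)]                  # estimated peak location, near 0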

Nonlinear Curve Fitting


Gaussian Case Study

Derivatives may be useful for determining the number of terms.

[Figure: the Heming Lake Pike density with its 1st and 2nd derivatives (y, y', y'' vs. x)]

25

Nonlinear Curve Fitting


Math

Given data points (xi, yi). Given a desired model to fit (not always known): y = y(x; a), where there are M unknown parameters ak, k = 1, 2, ..., M. The error function (merit function) is

\chi^2(a) = \sum_{i=1}^{N} \left[ \frac{y_i - y(x_i; a)}{\sigma_i} \right]^2

From Press, et al, Numerical Recipes in C (2nd Ed), 1992, p. 682

26

Nonlinear Curve Fitting


Math

Need to search a multidimensional parameter space to minimize the error function χ².

27

Nonlinear Curve Fitting


Math

The gradient of χ² with respect to the parameters a will be zero at the minimum:

\frac{\partial \chi^2}{\partial a_k} = -2 \sum_{i=1}^{N} \frac{y_i - y(x_i; a)}{\sigma_i^2} \, \frac{\partial y(x_i; a)}{\partial a_k}, \qquad k = 1, 2, ..., M

(Dropping the -2 gives the quantity βk.)

Taking the second derivative of χ²:

\frac{\partial^2 \chi^2}{\partial a_k \partial a_l} = 2 \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left[ \frac{\partial y(x_i; a)}{\partial a_k} \frac{\partial y(x_i; a)}{\partial a_l} - [y_i - y(x_i; a)] \frac{\partial^2 y(x_i; a)}{\partial a_k \partial a_l} \right]

The second term in the brackets is often small and ignored. Dropping the factor of 2 gives αkl, the Hessian or curvature matrix.

From Press, et al, Numerical Recipes in C (2nd Ed), 1992, p. 682

28

Nonlinear Curve Fitting


Algorithms

Levenberg-Marquardt is the most widely used algorithm:

When far from the minimum, use gradient descent:  \delta a_l = \text{constant} \times \beta_l

When close to the minimum, switch to the inverse-Hessian method:  \sum_{l=1}^{M} \alpha_{kl} \, \delta a_l = \beta_k

Full Newton-type methods keep the dropped term in the second derivative; they are considered more robust but more complicated. Simplex is an alternative algorithm.

29
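To make the iteration concrete, here is a deliberately small, hand-rolled Levenberg-Marquardt sketch for the Gaussian case study, with a numeric Jacobian and Marquardt's diagonal scaling; production implementations (MINPACK, MPFIT, etc.) handle step control, scaling, and convergence tests far more carefully.

## Minimal Levenberg-Marquardt sketch (illustrative only).
x <- seq(-2, 5, by = 0.5)
y <- c(0.00004, 0.00055, 0.00472, 0.02739, 0.10748, 0.28539, 0.51275,
       0.62335, 0.51275, 0.28539, 0.10748, 0.02739, 0.00472, 0.00055, 0.00004)

model <- function(x, p) exp(-(x - p[1])^2 / (2 * p[2]^2)) / (p[2] * sqrt(2 * pi))

jacobian <- function(x, p, h = 1e-6) {          # central-difference Jacobian, N x M
  sapply(seq_along(p), function(k) {
    dp <- replace(numeric(length(p)), k, h)
    (model(x, p + dp) - model(x, p - dp)) / (2 * h)
  })
}

lm_fit <- function(x, y, p, lambda = 1e-3, iters = 50) {
  for (i in seq_len(iters)) {
    r <- y - model(x, p)                        # residuals
    J <- jacobian(x, p)
    H <- t(J) %*% J                             # approximate Hessian (second-derivative term dropped)
    g <- t(J) %*% r                             # beta: (half) the negative gradient
    step  <- solve(H + lambda * diag(diag(H)), g)
    p_new <- p + as.vector(step)
    if (sum((y - model(x, p_new))^2) < sum(r^2)) {
      p <- p_new; lambda <- lambda / 10         # success: act more like the inverse-Hessian method
    } else {
      lambda <- lambda * 10                     # failure: act more like gradient descent
    }
  }
  p
}

lm_fit(x, y, c(mu = 1, sigma = 1))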

Nonlinear Curve Fitting


Algorithms

The fitting procedure is iterative.
Usually need a good initial guess, based on understanding of the selected model.
No guarantee of convergence.
No guarantee of an optimal answer.
Solution requires derivatives: numeric or analytic derivatives can be used by some packages.

30

Nonlinear Curve Fitting


Software

IDL: (3 limited licenses)
  curvefit function; MPFIT: robust non-linear least squares curve fitting
  Joe Huff in Advanced Instrumentation is quite well versed in using MPFIT and applying it in IDL

Mathematica (1 limited license)

MatLab: (1 limited license)
  Curve Fitting Toolbox

OriginPro: (10 limited licenses)
  Peak Fitting Module

PeakFit: (1 limited license)
  Nonlinear curve fitting for spectroscopy, chromatography and electrophoresis

R: nls function
  many statistics
  symbolic derivatives (if desired)
  flawed implementation: exact toy problems fail unless noise is added

31
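On the R side, nls can use analytic rather than numeric derivatives if the model function carries a "gradient" attribute, which deriv() builds symbolically. A minimal sketch for the Gaussian model of the case study:

## Minimal sketch: symbolic derivatives of the Gaussian model via deriv().
g <- deriv(~ exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi)),
           c("mu", "sigma"), function.arg = c("x", "mu", "sigma"))
g(0:3, mu = 1.5, sigma = 0.8)   # model values plus a "gradient" attribute (d/dmu, d/dsigma)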

Nonlinear Curve Fitting


Software

NIST reference datasets with certified computational results:

http://www.itl.nist.gov/div898/strd/general/dataarchive.html

32

Analysis of Results
Goodness of Fit: R²
Residuals

33

Goodness of Fit: R²

Coefficient of Determination
Percentage of Variance Explained

R^2 = 1 - \frac{\text{Residual Sum of Squares (RSS)}}{\text{Total Sum of Squares (SS), corrected for the mean}}

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad 0 \le R^2 \le 1

Adjusted R² compensates for R² increasing as terms are added. A good value of R² depends on the application. In the biological and social sciences, with weakly correlated variables and considerable noise, R² ~ 0.6 might be considered good. In the physical sciences, in controlled experiments, R² ~ 0.6 might be considered low.

Faraway, Linear Models with R, 2005, p. 16-18

34
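For the linear example earlier, R² and adjusted R² can be reproduced by hand, matching the lm() summary shown on the earlier slide:

## Minimal sketch: R-squared and adjusted R-squared from residual and total sums of squares.
x <- 0:4
y <- c(1, 3, 2, 4, 5)
fit <- lm(y ~ x)

rss <- sum(residuals(fit)^2)                    # residual sum of squares
tss <- sum((y - mean(y))^2)                     # total sum of squares, corrected for the mean
r2  <- 1 - rss / tss                            # 0.81
n <- length(y); p <- length(coef(fit))
adj_r2 <- 1 - (rss / (n - p)) / (tss / (n - 1)) # 0.7467
c(R2 = r2, adjusted = adj_r2)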

Residuals
Residuals are estimates of the true and unobservable errors. Residuals are not independent (they sum to 0).

Curve fitting made easy, Marko Ledvij, The Industrial Physicist, April/May 2003. http://www.aip.org/tip/INPHFA/vol-9/iss-2/p24.html

35

Analysis of Residuals
Are the residuals random?
Is the mathematical model appropriate?
Is the mathematical model sufficient to characterize the experimental data?
Subtle behavior in the residuals may suggest a significant overlooked property.
Good Reference: Analysis of Residuals: Criteria for Determining Goodness-of-Fit, Straume and Johnson, Methods in Enzymology, Vol. 210, 87-105, 1992.
36

Analysis of Residuals
Synthetic FRAP Data: Fit with 1 term when 2 terms are better

Near-perfect fit, but why is there a pattern in the residuals?

37

Analysis of Residuals
Lomb-Scargle periodogram can indicate periodicity in the residuals

A flat line with all non-significant p-values would indicate random residuals.

38
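A minimal R sketch of the Lomb-Scargle periodogram in its classic (Press et al.) form, applied to made-up residuals containing a leftover oscillation; dedicated packages exist, but the core computation is short.

## Minimal Lomb-Scargle periodogram sketch; a strong peak suggests unmodelled periodicity.
lomb_scargle <- function(t, y, freqs) {
  y  <- y - mean(y)
  vy <- var(y)
  sapply(freqs, function(f) {
    w   <- 2 * pi * f
    tau <- atan2(sum(sin(2 * w * t)), sum(cos(2 * w * t))) / (2 * w)
    ct  <- cos(w * (t - tau)); st <- sin(w * (t - tau))
    (sum(y * ct)^2 / sum(ct^2) + sum(y * st)^2 / sum(st^2)) / (2 * vy)
  })
}

set.seed(1)
t     <- seq(0, 10, by = 0.1)
resid <- 0.05 * sin(2 * pi * 1.3 * t) + rnorm(length(t), sd = 0.02)  # hypothetical residuals

freqs <- seq(0.1, 5, by = 0.01)
power <- lomb_scargle(t, resid, freqs)
freqs[which.max(power)]   # should be near 1.3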

Analysis of Residuals
Synthetic FRAP Data: Fit with 2 terms

39

Analysis of Residuals
FCS Data and Heteroscedasticity

\chi^2(a) = \sum_{i=1}^{N} \left[ \frac{y_i - y(x_i; a)}{\sigma_i} \right]^2

The σi act as a scaling factor.

[Figure: heteroscedasticity in the raw residuals vs. the scaled residuals]

Use the F test to test for unequal variances.

FCS residual plots courtesy of Joseph Huff, Advanced Instrumentation & Physics

40
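In R, the F test for equality of two variances is var.test; a minimal sketch on two made-up groups of residuals (for example, early vs. late lag times):

## Minimal sketch: F test for unequal variances between two groups of residuals.
set.seed(1)
res_early <- rnorm(50, sd = 1)    # hypothetical residuals, low-variance region
res_late  <- rnorm(50, sd = 3)    # hypothetical residuals, high-variance region
var.test(res_early, res_late)     # small p-value indicates unequal variances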

Analysis of Residuals
Heteroscedasticity and Studentized Residuals

A studentized residual is a residual divided by an estimate of its standard deviation. The leverage hii is the ith diagonal entry of the hat matrix.

\text{Studentized Residual} = \frac{e_i}{\hat{\sigma} \sqrt{1 - h_{ii}}}

Externally studentized residuals follow Student's t-distribution and can be used to statistically reject outliers.

See http://en.wikipedia.org/wiki/Studentized_residual

41
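R computes leverages and studentized residuals for linear models directly; a minimal sketch with the five-point example used earlier (rstandard gives internally, rstudent externally studentized residuals):

## Minimal sketch: leverages and studentized residuals for a linear fit.
fit <- lm(y ~ x, data = data.frame(x = 0:4, y = c(1, 3, 2, 4, 5)))
data.frame(leverage     = hatvalues(fit),   # diagonal of the hat matrix
           standardized = rstandard(fit),   # internally studentized
           studentized  = rstudent(fit))    # externally studentized (Student's t)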

Summary
A mathematical model may or may not be appropriate for any given dataset.
Linear curve fitting is deterministic. Nonlinear curve fitting is non-deterministic, involves searching a huge parameter space, and may not converge.
Nonlinear curve fitting is powerful (when the technique works).
The R² and adjusted R² statistics provide easy-to-understand, dimensionless values to assess goodness of fit.
Always study the residuals to see if there are unexplained patterns and missing terms in the model.
Beware of heteroscedasticity in your data. Make sure the analysis doesn't assume homoscedasticity if your data are heteroscedastic.
Use the F test to compare the fits of two equations.
42

Acknowledgements
Advanced Instrumentation & Physics
Joseph Huff
Winfried Wiegraebe

43
