Extracting Information From Spectral Data: Nicole Labbé, University of Tennessee
SWST, Advanced analytical tools for the wood industry
Data collection
Near Infrared spectra
2150 data points, 350-2500 nm, 1 nm resolution, 8 scans
[Figure: example spectra in three panels; x-axes: wavelength (nm), 400-2400; wavenumber (cm-1), 3200-800; wavelength (nm), 300-800]
Data pretreatment
-Local filters
-Smoothing
-Derivatives
-Baseline correction
[Figure: signal and absorbance plotted against wavelength (nm)]
Corrected spectral data after MSC (multiplicative scatter correction)
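The pretreatment steps listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which smoothing or derivative filters were used, so a moving average and a finite-difference derivative stand in here, the function names are hypothetical, and the spectra are synthetic.

```python
import numpy as np

def moving_average(spectra, window=5):
    """Smooth each spectrum (one per row) with a centered moving average."""
    kernel = np.ones(window) / window
    # mode="same" keeps the number of points; the edges are damped by zero padding.
    return np.array([np.convolve(row, kernel, mode="same") for row in spectra])

def first_derivative(spectra, step=1.0):
    """Finite-difference first derivative along the wavelength axis."""
    return np.gradient(spectra, step, axis=1)

def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference (the mean spectrum by default),
    then corrected as (spectrum - intercept) / slope.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

# Synthetic spectra: one band shape with different offsets and scalings,
# mimicking additive and multiplicative scatter effects.
wavelengths = np.arange(800.0, 2401.0, 1.0)
band = np.exp(-((wavelengths - 1500.0) / 200.0) ** 2)
spectra = np.array([0.1 + 1.0 * band, 0.3 + 1.5 * band, -0.05 + 0.7 * band])

smoothed = moving_average(spectra)
derivative = first_derivative(spectra)
corrected = msc(spectra)
```

Because the synthetic spectra differ only by an offset and a scale factor, MSC collapses them onto the mean spectrum; real spectra keep their chemical differences after correction.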
Two approaches:
-Univariate analysis
-Multivariate analysis
[Figure: absorbance (0.0-2.5) versus wavelength (nm), 800-2400 nm; 375 x-variables]
Univariate analysis
Measured cellulose content versus predicted cellulose content using one variable (1530 nm) as a predictor (R² = 0.12)
[Figure: measured versus predicted cellulose content, both axes 25-50]
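A univariate calibration of this kind amounts to a straight-line fit on a single wavelength. The sketch below uses synthetic data, not the data behind the slide; the variable names and the noise level are assumptions chosen so that one wavelength explains little of the variation in cellulose content.

```python
import numpy as np

def univariate_fit(x, y):
    """Least-squares line y = slope * x + intercept; returns predictions and R^2."""
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)   # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return predicted, 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_samples = 60
cellulose = rng.uniform(25.0, 50.0, n_samples)  # synthetic reference values (%)
# Hypothetical absorbance at one wavelength: weak dependence on cellulose plus noise.
absorbance_1530 = 0.002 * cellulose + rng.normal(0.0, 0.05, n_samples)

predicted, r2 = univariate_fit(absorbance_1530, cellulose)
print(f"R2 = {r2:.2f}")
```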
Multivariate analysis
[Figure: scatter plot with both axes spanning 25-50]
Qualitative information
Principal Components Analysis (PCA)
Recognize patterns in data: outliers, trends, groups
[Figure: PCA score plot, PC1 (89%) versus PC2 (8%)]
Spectral data matrix: n samples (rows) by x variables (columns)
[Figure: PCA score plot (PC1 versus PC2) for bagasse, corn stover, red oak, yellow poplar, hickory, and switchgrass, shown with a plot of intensity (A.U.) versus wavelength (nm), 800-2400 nm]
1 variable = 1 dimension
2 variables = 2 dimensions
[Figure: coordinate axes x1, x2, x3]
[Figure: absorbance (0.0-2.5) versus wavelength (nm); 375 x-variables]
PC3 will be orthogonal to both PC1 and PC2 while simultaneously lying along the
direction of the third largest variation.
The new variables (PCs) are uncorrelated with each other (orthogonal)
[Figure: original variable space and PC space]
Loadings: relationship between the original variable space and the new PC space
There is a set of loadings for each PC (loading vector)
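Scores, loadings, and the orthogonality of the PCs can be illustrated with a singular value decomposition of the mean-centered data matrix. This is a minimal sketch on synthetic spectra; the `pca` helper and the two Gaussian bands are assumptions made for the example, not the software or data used in the slides.

```python
import numpy as np

def pca(X, n_components):
    """PCA by SVD of the mean-centered matrix X (samples x variables)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates in PC space
    loadings = Vt[:n_components]                      # one loading vector per PC
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

rng = np.random.default_rng(1)
wavelengths = np.linspace(800.0, 2400.0, 375)         # 375 x-variables
band_a = np.exp(-((wavelengths - 1450.0) / 60.0) ** 2)
band_b = np.exp(-((wavelengths - 1930.0) / 80.0) ** 2)
conc = rng.uniform(0.0, 1.0, size=(30, 2))            # 30 synthetic samples
X = conc @ np.vstack([band_a, band_b]) + rng.normal(0.0, 0.01, size=(30, 375))

scores, loadings, explained = pca(X, 3)

# Loading vectors are orthonormal and score columns are mutually uncorrelated.
print(np.round(loadings @ loadings.T, 6))
print(np.round(explained, 3))
```

Plotting the first two columns of `scores` against each other gives a score plot; plotting a row of `loadings` against `wavelengths` gives the loading plot for that PC.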
[Figure: PCA score plot (PC1 versus PC2) for red oak, corn stover, yellow poplar, hickory, and switchgrass, with the PC1 and PC2 loadings plotted as intensity (A.U.) versus wavelength (nm), 800-2400 nm]
Quantitative information
Projection to Latent Structures or Partial Least Squares Regression (PLS)
Establish relationships between input and output variables, creating predictive models.
PLS can be seen as two interconnected PCA analyses, PCA(X) and PCA(Y)
PLS uses the Y-data structure (variation) to guide the decomposition of X
The X-data structure also influences the decomposition of Y
[Diagram: spectral data matrix (samples by x variables) paired with a y matrix (samples by y variables)]
If y variables are not correlated: PLS1
If y variables are correlated: PLS2
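For a single y variable (PLS1), the way y guides the decomposition of X can be sketched with the classical NIPALS-style algorithm. The code below is an illustrative implementation on synthetic spectra, not the commercial software listed later; the function names, the two-band data generator, and the choice of two components are all assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression: returns coefficients plus the means used for centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: direction in X most covariant with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        q_a = (yk @ t) / tt           # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - q_a * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    return (X - x_mean) @ coef + y_mean

# Synthetic spectra: two bands; y depends only on the first band's concentration.
wavelengths = np.linspace(800.0, 2400.0, 375)
bands = np.vstack([np.exp(-((wavelengths - 1450.0) / 60.0) ** 2),
                   np.exp(-((wavelengths - 1930.0) / 80.0) ** 2)])

def make_spectra(rng, n_samples):
    conc = rng.uniform(0.0, 1.0, size=(n_samples, 2))
    X = conc @ bands + rng.normal(0.0, 0.01, size=(n_samples, wavelengths.size))
    return X, 25.0 + 25.0 * conc[:, 0]

rng = np.random.default_rng(2)
X_cal, y_cal = make_spectra(rng, 40)   # calibration set
X_val, y_val = make_spectra(rng, 20)   # independent validation set

coef, x_mean, y_mean = pls1_fit(X_cal, y_cal, n_components=2)
y_pred = pls1_predict(X_val, coef, x_mean, y_mean)
rmsep = float(np.sqrt(np.mean((y_val - y_pred) ** 2)))
print(f"RMSEP on synthetic validation set: {rmsep:.3f}")
```

Plotting `coef` against `wavelengths` shows which spectral regions drive the prediction.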
Calibration model to predict cellulose content in pine
Calibration: r = 0.95, RMSEC = 1.6
Prediction: r = 0.95, RMSEP = 1.55
[Figure: predicted versus measured cellulose content (%), both axes 25-50, for the calibration and prediction sets; plot of intensity (-12 to 12) versus wavelength (nm), 1000-2400 nm]
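The two figures of merit quoted above, the correlation coefficient r and the root mean square error (RMSEC on the calibration set, RMSEP on the prediction set), can be computed as follows. The numbers in the example are hypothetical, and the plain mean-square definition is an assumption: some software applies a degrees-of-freedom correction to RMSEC.

```python
import numpy as np

def rmse(measured, predicted):
    """Root mean square error between reference and predicted values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))

def pearson_r(measured, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    return float(np.corrcoef(measured, predicted)[0, 1])

# Hypothetical cellulose contents (%), for illustration only.
measured = np.array([30.0, 35.0, 40.0, 45.0])
predicted = np.array([31.0, 34.0, 41.0, 44.0])

error = rmse(measured, predicted)
r = pearson_r(measured, predicted)
print(f"RMSE = {error:.2f}, r = {r:.3f}")
```

The same function gives RMSEC when applied to calibration samples and RMSEP when applied to samples held out from model building.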
PLS-DA
Calibration model: one y variable, spectral data as x variables
r = 0.99, RMSEC = 0.04
[Figure: score plot, PC1 (92%) versus PC2 (6%), with yellow poplar and hickory groups; predicted Y variable versus measured Y variable, both axes -0.2 to 1.2]
PLS-DA
Validation model

Sample            Y-reference   Predicted Y
Spectrum 00008    0.0000        -0.0338
Spectrum 00009    0.0000         0.0270
Spectrum 00015    1.0000         0.9340
Spectrum 00016    1.0000         1.0220

r = 0.99, RMSEP = 0.04
[Figure: predicted Y versus Y reference, both axes -0.2 to 1.2, with hickory and yellow poplar groups]
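In PLS-DA the predicted Y is continuous, so a decision rule is needed to turn it into a class. The sketch below applies a cut-off of 0.5 to the four validation spectra tabulated above; the cut-off and the helper name are assumptions, since the slides do not state which rule was used.

```python
import math

# (Y-reference, predicted Y) for the four validation spectra listed above.
validation = {
    "Spectrum 00008": (0.0, -0.0338),
    "Spectrum 00009": (0.0, 0.0270),
    "Spectrum 00015": (1.0, 0.9340),
    "Spectrum 00016": (1.0, 1.0220),
}

def assign_class(predicted_y, threshold=0.5):
    """Assign class 1 if the predicted Y reaches the threshold, else class 0."""
    return 1 if predicted_y >= threshold else 0

assigned = {name: assign_class(pred) for name, (_, pred) in validation.items()}
n_correct = sum(assigned[name] == int(ref) for name, (ref, _) in validation.items())

# Root mean square error over these four spectra only.
rmse_four = math.sqrt(
    sum((ref - pred) ** 2 for ref, pred in validation.values()) / len(validation)
)

for name, cls in assigned.items():
    print(name, "-> class", cls)
print("correctly classified:", n_correct, "of", len(validation))
```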
Perturbation: mechanical, electrical, chemical, magnetic, optical, thermal
Electromagnetic probe (e.g., IR, UV, LIBS)
System
2D correlation maps
[Figure: two panels of intensity (A.U.) versus wavelength (nm), 1000-2400 nm]
Reference spectrum
Synchronous matrix
Asynchronous matrix
$$[\mathrm{ASYN}]_{(n \times n)} = [\mathrm{DYN}]^{T}_{(n \times m)} \, [N]_{(m \times m)} \, [\mathrm{DYN}]_{(m \times n)}$$
where [N] is the Noda-Hilbert matrix
Generation of orthogonal components: synchronous and asynchronous 2D correlation intensities
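The synchronous and asynchronous matrices can be computed directly from a set of spectra recorded along the perturbation. The sketch below follows the matrix product given above, with two assumptions that are not on the slide: the mean spectrum is used as the reference spectrum, and both matrices are scaled by 1/(m - 1), as is common in the generalized 2D correlation formalism. The data are synthetic.

```python
import numpy as np

def hilbert_noda_matrix(m):
    """Noda-Hilbert matrix: N[j, k] = 0 if j == k, else 1 / (pi * (k - j))."""
    j = np.arange(m)[:, None]
    k = np.arange(m)[None, :]
    diff = k - j
    N = np.zeros((m, m))
    off_diagonal = diff != 0
    N[off_diagonal] = 1.0 / (np.pi * diff[off_diagonal])
    return N

def two_d_correlation(spectra, reference=None):
    """spectra: m perturbation steps (rows) by n spectral variables (columns)."""
    m = spectra.shape[0]
    ref = spectra.mean(axis=0) if reference is None else reference
    dyn = spectra - ref                                   # dynamic spectra [DYN]
    sync = dyn.T @ dyn / (m - 1)                          # synchronous matrix
    asyn = dyn.T @ hilbert_noda_matrix(m) @ dyn / (m - 1) # asynchronous matrix
    return sync, asyn

# Synthetic series: two bands whose intensities change at different rates.
wavelengths = np.linspace(1000.0, 2400.0, 50)
steps = np.linspace(0.0, 1.0, 10)
band_a = np.exp(-((wavelengths - 1450.0) / 60.0) ** 2)
band_b = np.exp(-((wavelengths - 1930.0) / 80.0) ** 2)
spectra = np.outer(steps, band_a) + np.outer(steps ** 2, band_b)

sync, asyn = two_d_correlation(spectra)
```

For a hetero-correlation, the second `dyn` factor would come from the other technique's data collected over the same perturbation steps.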
Homo-correlation: NIR/NIR
Hetero-correlation: NIR/MBMS
Non-linear data:
Demuth H., Beale M. and Hagan M., Neural Network Toolbox 5 User's Guide, MATLAB
Trygg J., Wold S. (2002) Orthogonal projections to latent structures (O-PLS). J. Chemometrics 16:119-128
Software
www.camo.com
www.umetrics.com
www.infometrix.com
www.mathworks.com
References
A User-Friendly Guide to Multivariate Calibration and Classification, T. Næs, T. Isaksson, T. Fearn, T. Davies, NIR Publications, Chichester, UK, 2002
Multivariate Calibration, H. Martens and T. Næs, John Wiley & Sons, Chichester, UK, 1989
Chemometric Techniques for Quantitative Analysis, Marcel Dekker, New York, 1998
Two-Dimensional Correlation Spectroscopy, I. Noda and Y. Ozaki, John Wiley & Sons, Chichester, UK, 2004
Neural Network Toolbox 5 User's Guide, MATLAB, H. Demuth, M. Beale and M. Hagan
Questions?