Batch Effects Correction for Metabolomics: Andrés G. Camacho-Bonet and Wandaliz Torres-García, Ph.D.
Andrés G. Camacho-Bonet¹ and Wandaliz Torres-García², Ph.D.
¹ Department of Industrial Engineering, University of Puerto Rico, Mayagüez, PR [email protected]
² Department of Industrial Engineering, University of Puerto Rico, Mayagüez, PR [email protected]
Justification
CAR T-Cell Therapies
• What are T-cells?
White blood cells that circulate around our bodies, scanning for infections [1].
They kill infected cells and can naturally eradicate cancer cells [1].
• What are CAR T-Cell therapies?
Challenge: What characteristics of these cells are important for the potency of the therapies?
Approach: Metabolomic Characterization [2]
Metabolites
• What are Metabolites?
Small molecules which are the reactants, intermediates, or products of
enzyme-mediated biochemical reactions [3]
• Metabolomic Characterization
Knowing which metabolites are present lets us understand the T-cell therapies at a micro scale:
• What has happened?
• What is happening?
[4]
Metabolomic Characterization Methods
1. Liquid chromatography (LC)
2. Mass spectrometry (MS)
Justification for analysis
Data Type
1. Develop safer and more effective cancer therapies.
2. Understand which metabolites are critical to quality in manufacturing, to produce reliable medicine.
3. Reduce manufacturing costs by focusing on what matters.
[5]
Problem & Objectives
Batch effects problem
• Data acquisition is highly sensitive to variation.
• If samples are acquired in batches, they cluster by batch.
E.g., different operators, machines, acquisition times, etc.
• Extracting insightful information is challenging because the data are biased.
[Figure: PCA plots (sample of Batch 1; samples clustered by batches) [7]]
Objectives
• Select a batch effect correction algorithm.
• Determine an analysis to detect the presence of batch effects before and after correction.
Methodology
Batch Correction Algorithm
LIMMA – Linear models for microarray data
• Variant of ANalysis Of VAriance (ANOVA) [6].
• Removes any measurable, technical variation not associated with the treatment condition or
biological signal of interest [6].
• Fits a linear model with the known batch and treatment effects; the procedure essentially performs an ANOVA decomposition of the data and removes the variability associated with the batches while retaining the variability associated with the experimental design [6].
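A minimal NumPy sketch of the LIMMA-style correction described above, in the spirit of limma's R function `removeBatchEffect` but not a port of it. Assumptions: the data are a features × samples matrix on a log scale, batch effects are additive, there are at least two batches, and `remove_batch_effect` is a hypothetical helper name. Each feature gets one least-squares fit with treatment and batch terms; only the fitted batch component is subtracted.

```python
import numpy as np

def remove_batch_effect(y, batch, treatment):
    """Hypothetical helper: subtract fitted batch effects from a log-scale matrix.

    y         -- (n_features, n_samples) array, e.g. log intensities
    batch     -- length n_samples batch labels (at least two batches)
    treatment -- length n_samples treatment/condition labels
    """
    y = np.asarray(y, dtype=float)
    batch, treatment = np.asarray(batch), np.asarray(treatment)
    n = y.shape[1]

    # Experimental design: intercept + one indicator per non-reference treatment.
    t_levels = np.unique(treatment)
    x_design = np.column_stack(
        [np.ones(n)] + [(treatment == lv).astype(float) for lv in t_levels[1:]]
    )

    # Batch terms with sum-to-zero coding, so the correction removes
    # deviations from the average batch rather than from one reference batch.
    b_levels = np.unique(batch)
    last = (batch == b_levels[-1]).astype(float)
    x_batch = np.column_stack(
        [(batch == lv).astype(float) - last for lv in b_levels[:-1]]
    )

    # One least-squares fit per feature (ANOVA-style decomposition).
    x = np.hstack([x_design, x_batch])
    beta, *_ = np.linalg.lstsq(x, y.T, rcond=None)  # shape (n_params, n_features)

    # Keep treatment + residual variation, drop the fitted batch component.
    beta_batch = beta[x_design.shape[1]:, :]
    return y - (x_batch @ beta_batch).T
```

Because the treatment columns stay in the fit, a treatment effect that is balanced across batches survives the correction; if treatment and batch were fully confounded, no linear model could separate them.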
Combat
• Fits a linear model like LIMMA.
• Uses empirical Bayes to estimate the linear model parameters.
• Removes the components associated with the batch effect in the linear model.
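A simplified sketch of the parametric empirical Bayes steps that ComBat applies, written to mirror the structure of the R `sva::ComBat` routine but not taken from it. Assumptions, all ours: only batch labels are modelled (no biological covariates, unlike the LIMMA sketch), each batch has at least two samples, the data are features × samples, convergence uses an absolute-change threshold, and `combat_sketch` is a hypothetical name.

```python
import numpy as np

def combat_sketch(y, batch, tol=1e-4, max_iter=1000):
    """Hypothetical helper: simplified parametric ComBat (batch labels only).

    y     -- (n_features, n_samples) array, e.g. log intensities
    batch -- length n_samples labels; each batch needs >= 2 samples
    """
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch)

    # 1. Standardise: grand mean per feature and pooled within-batch variance.
    grand_mean = y.mean(axis=1, keepdims=True)
    resid = y.copy()
    for lv in np.unique(batch):
        idx = batch == lv
        resid[:, idx] -= y[:, idx].mean(axis=1, keepdims=True)
    var_pooled = (resid ** 2).mean(axis=1, keepdims=True)
    s = (y - grand_mean) / np.sqrt(var_pooled)

    out = np.empty_like(y)
    for lv in np.unique(batch):
        idx = batch == lv
        sb, n = s[:, idx], idx.sum()
        # 2. Per-feature batch location (gamma) and scale (delta^2) estimates.
        g_hat, d_hat = sb.mean(axis=1), sb.var(axis=1, ddof=1)
        # 3. Empirical Bayes priors pooled across all features of this batch:
        #    Normal(g_bar, t2) for gamma, inverse gamma(a, b) for delta^2.
        g_bar, t2 = g_hat.mean(), g_hat.var(ddof=1)
        m, v = d_hat.mean(), d_hat.var(ddof=1)
        a, b = (2 * v + m ** 2) / v, (m * v + m ** 3) / v
        # 4. Iterate the posterior means until they stop changing.
        g_star, d_star = g_hat, d_hat
        for _ in range(max_iter):
            g_new = (n * t2 * g_hat + d_star * g_bar) / (n * t2 + d_star)
            sum2 = ((sb - g_new[:, None]) ** 2).sum(axis=1)
            d_new = (0.5 * sum2 + b) / (n / 2 + a - 1)
            change = max(np.abs(g_new - g_star).max(), np.abs(d_new - d_star).max())
            g_star, d_star = g_new, d_new
            if change < tol:
                break
        # 5. Remove the shrunken batch location and scale.
        out[:, idx] = (sb - g_star[:, None]) / np.sqrt(d_star[:, None])

    # Map back to the original units of each feature.
    return out * np.sqrt(var_pooled) + grand_mean
```

The shrinkage in step 4 is what distinguishes this from a plain per-batch standardisation: features with noisy batch estimates borrow strength from all other features in the same batch, which matters when batches are small.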
Principal Component Analysis (PCA)
• Reduces the dimensionality of the data set, allowing most of the variability to be explained by fewer variables (the principal components).
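The PCA scores used in the scores plots can be obtained from a singular value decomposition of the centred data. This is a sketch assuming a samples × features layout (the transpose of the correction sketches above) and no feature scaling; autoscaling each feature before PCA is a common alternative in metabolomics. `pca_scores` is a hypothetical helper name.

```python
import numpy as np

def pca_scores(x, n_components=2):
    """Hypothetical helper: PCA scores via SVD of the column-centred data.

    x -- (n_samples, n_features) array; rows are samples, as in a scores plot.
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)                             # centre every feature
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]     # equals xc @ vt[:k].T
    explained = s ** 2 / np.sum(s ** 2)                 # share of total variance per PC
    return scores, explained[:n_components]
```

Plotting the first two score columns coloured by batch label gives the kind of plot in which batch clustering becomes visible.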
PCA – Scores Plot
Correction Metric: Bhattacharyya distance
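• Average distance between batches, based on the PCA scores [8]:
$$D_{1,2} = \frac{1}{8}(\mu_1-\mu_2)^{\top}\,\Sigma^{-1}(\mu_1-\mu_2) + \frac{1}{2}\ln\!\left(\frac{\det\Sigma}{\sqrt{\det\Sigma_1\,\det\Sigma_2}}\right)$$
Where:
D1,2: distance between batch 1 and batch 2
µ1: mean of batch 1
Σ1: covariance matrix of batch 1
Σ = (Σ1 + Σ2)/2: average of the two batch covariance matrices
[8]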
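The distance above can be computed directly from batch means and covariances of the PCA scores. This sketch assumes each batch is summarised by its sample mean and sample covariance, and that the "average distance between batches" is the mean over all batch pairs; `bhattacharyya` and `mean_batch_distance` are hypothetical helper names.

```python
from itertools import combinations
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between Gaussians N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.atleast_1d(mu1), np.atleast_1d(mu2)
    cov1, cov2 = np.atleast_2d(cov1), np.atleast_2d(cov2)
    cov = (cov1 + cov2) / 2                    # Sigma = (Sigma1 + Sigma2) / 2
    diff = mu1 - mu2
    mahal = diff @ np.linalg.solve(cov, diff) / 8
    # Log-determinants are more stable than det() for small variances.
    logdet = np.linalg.slogdet(cov)[1]
    logdet1, logdet2 = np.linalg.slogdet(cov1)[1], np.linalg.slogdet(cov2)[1]
    return float(mahal + 0.5 * (logdet - 0.5 * (logdet1 + logdet2)))

def mean_batch_distance(scores, batch):
    """Hypothetical helper: average pairwise distance between batches in PCA space."""
    scores, batch = np.asarray(scores, dtype=float), np.asarray(batch)
    stats = []
    for lv in np.unique(batch):
        pts = scores[batch == lv]
        stats.append((pts.mean(axis=0), np.cov(pts, rowvar=False)))
    pairs = combinations(stats, 2)
    return float(np.mean([bhattacharyya(m1, c1, m2, c2) for (m1, c1), (m2, c2) in pairs]))
```

For example, two one-dimensional batches with unit variance and means 0 and 2 give D = (1/8)·2² = 0.5; the log-determinant term only contributes when the batch covariances differ.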
Between Within
SET2 (after removal) 165 753 15
SET3 (after removal) 32 240 4
[Figure: Set 2 batch effect correction; panels: No Correction, LIMMA, Combat; R (repeatability): 0.36]
[Figure: Set 3 batch correction; panels: No Correction, LIMMA, Combat]
Evaluation metrics
Database | Batch Correction Method | Interbatch Distance | % Reduction (Interbatch Distance) | Repeatability | % Change (Repeatability)
Set 2 | Uncorrected | 104.19 | - | 0.283 | -
Set 2 | Limma | 0.186 | 100% | 0.36 | 27%
Set 2 | Combat | 20.05 | 81% | 0.275 | -3%
Set 3 | Uncorrected | 0.387 | - | 0.346 | -
Set 3 | Limma | 0.076 | 80% | 0.361 | 4%
Set 3 | Combat | 0.022 | 100% | 0.363 | 5%
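The percentage columns are relative to the uncorrected row of each data set. Assuming % reduction = (uncorrected - corrected) / uncorrected × 100 and % change = (corrected - uncorrected) / uncorrected × 100 (our reading, consistent with the Set 2 rows), the two metrics can be sketched as follows; `pct_reduction` and `pct_change` are hypothetical helper names.

```python
def pct_reduction(before, after):
    """Percent reduction of a distance (positive means the distance shrank)."""
    return 100.0 * (before - after) / before

def pct_change(before, after):
    """Signed percent change of a metric such as repeatability."""
    return 100.0 * (after - before) / before
```

For the Set 2 LIMMA row, `pct_reduction(104.19, 0.186)` is about 99.8 (reported as 100%) and `pct_change(0.283, 0.36)` is about 27.2 (reported as 27%).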