Encoding and Decoding
Jack Gallant
University of California at Berkeley
Main messages for today
● The classical deductive, null-hypothesis-testing approach used in cognitive neuroscience is weak and inefficient, and it often produces misleading results.
● A more open-ended, abductive approach can be much more efficient than the classical approach, and it can quickly produce powerful predictive models.
● The standard SPM pipeline discards much useful information because it focuses almost solely on Type I error and ignores Type II error.
● A voxel-wise modeling approach preserves much more of the useful information in the data and so minimizes both Type I and Type II error.
For further background, see Martin's lectures.
fMRI as functional mapping
Using VM to Decode
The brain is organized at multiple scales
[Figures: neurons; layers and columns; maps; areas. Brodmann, 1909; Ohki et al., 2006]
Information is represented in functional maps
[Figures: tonotopic map in A1; whisker map in S1 (rodent); retinotopic, spatial frequency, orientation, and ocular dominance maps; 2DG retinotopic map of a stimulus on flattened macaque cortex. Tootell et al., J. Neurosci., 1988; Issa, Trepel & Stryker, J. Neurosci., 2000]
Human cortex could contain hundreds of areas
Macaque connectome: 410 tracing studies; 383 areas and structures; 6,602 connections. Modha & Singh, 2010
Mammalian vision as a model system
● Dozens of distinct areas.
● Areas arranged in a hierarchical, parallel network.
● Transformations between areas are nonlinear.
● Areas contain systematic, high-dimensional maps.
● Each area represents different visual information.
[Figure: 383 areas. Modha & Singh, PNAS, 2010; Felleman & Van Essen, Cerebral Cortex, 1991]
fMRI as functional mapping
Using VM to Decode
The deductive approach to task-based fMRI
● Find out how the brain mediates behavior: use a simple stimulus or task with few conditions.
● Find out how the brain is organized into areas: use anatomy, localizers, or a searchlight to discover ROIs.
● Find out what information is mapped within each area: test for statistically significant differences in responses across conditions, or use a classifier (see the sketch after this list).
● Find out how these maps vary across individuals: map individual brains into standardized anatomical coordinates and do the analysis at the group level.
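To make the classical pipeline concrete, here is a minimal sketch of the kind of GLM contrast test it relies on. Everything in it is illustrative rather than any package's implementation: the double-gamma HRF parameters, the block timing, and all variable names are assumptions.

```python
import numpy as np
from scipy.stats import gamma, t

TR, n_vols = 2.0, 200
times = np.arange(n_vols) * TR

def hrf(t_sec):
    # Double-gamma HRF; shape parameters are illustrative, not SPM's exact values.
    return gamma.pdf(t_sec, 6) - 0.35 * gamma.pdf(t_sec, 16)

def regressor(onsets, dur=10.0):
    # Boxcar for one condition, convolved with the HRF.
    box = np.zeros(n_vols)
    for on in onsets:
        box[(times >= on) & (times < on + dur)] = 1.0
    return np.convolve(box, hrf(np.arange(0, 32, TR)))[:n_vols]

X = np.column_stack([
    regressor(np.arange(0, 400, 80)),    # condition A onsets (made up)
    regressor(np.arange(40, 400, 80)),   # condition B onsets (made up)
    np.ones(n_vols),                     # intercept
])
y = np.random.randn(n_vols)              # one voxel's time series (placeholder)

# OLS fit and a t-test on the A-minus-B contrast.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
c = np.array([1.0, -1.0, 0.0])
dof = n_vols - np.linalg.matrix_rank(X)
sigma2 = np.sum((y - X @ beta) ** 2) / dof
se = np.sqrt(sigma2 * c @ np.linalg.pinv(X.T @ X) @ c)
t_stat = (c @ beta) / se
p = 2 * t.sf(abs(t_stat), dof)
```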
The deductive approach to task-based fMRI
EBA (extrastriate body area): anterior to MT+ on the middle temporal gyrus; Bodies – Objects; Downing et al., 2001
FBA (fusiform body area): fusiform sulcus/gyrus anterior to FFA; Bodies – Objects; Peelen & Downing, 2005; Schwarzlose et al., 2005
PPA (parahippocampal place area): collateral fissure; Scenes – Objects; Epstein & Kanwisher, 1998
TOS (transverse occipital sulcus): just inferior to V7; Scenes – Objects; Nakamura et al., 2000; Hasson et al., 2003
RSC (retrosplenial cortex): medial wall just superior to PPA; Scenes – Objects; Aguirre et al., 1996
FEF (frontal eye field): precentral sulcus adjoining superior frontal sulcus; Saccades – Fixation; Luna et al., 1998
iFEF/FO (inferior frontal eye field): inferior portion of precentral sulcus; Saccades – Fixation; Berman et al., 1999; Corbetta et al., 1998
Localizers produce misleading results
● Many assumptions are implicit in the operational definitions and the selection of task conditions.
● The approach cannot recover detailed information about representations within areas.
● It is both too conservative (Type I error control that is too strict) and insufficiently sensitive (Type II error control that is too weak).
● Results often generalize poorly beyond the tested subspace.
● It offers no method for determining when you should be satisfied with a model.
What we would like from an approach
● Provides a method to test alternative hypotheses quickly.
● Provides rich behavioral and brain data at low cost.
● Makes all assumptions and operational definitions quantitatively explicit.
● Easily bridges between psychological concepts and brain measurements.
● Recovers fine detail in cortical maps.
● Minimizes Type I error.
● Minimizes Type II error.
● Provides objective measures of significance and effect size (i.e., importance).
fMRI as functional mapping
Using VM to Decode
What analysis would be optimal for these data?
● Find out how the brain mediates behavior: use a broad range of stimulus/task conditions.
● Find out how the brain is organized into areas: for each voxel in each subject, fit competing linearized models that embody different feature spaces and compare model predictions (see the sketch after this list).
● Find out what information is mapped within each area: visualize voxel tuning and find the feature subspace that best describes the tuning of the population.
● Find out how these maps vary across individuals: define maps and areas in individual subjects and aggregate across subjects.
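A minimal sketch of the model comparison this implies: fit each candidate feature space to the same voxel with ridge regression and compare held-out prediction correlations. The feature matrices, the synthetic voxel, and the single fixed ridge penalty are placeholder assumptions.

```python
import numpy as np

# Two hypothetical candidate feature spaces (time x features) and one
# simulated voxel whose response is driven by the category features.
rng = np.random.default_rng(0)
n = 1000
X_motion = rng.standard_normal((n, 50))
X_category = rng.standard_normal((n, 20))
y = X_category @ rng.standard_normal(20) + rng.standard_normal(n)

def ridge_fit_predict(X, y, train, test, alpha=10.0):
    # Closed-form ridge on the training set; predict the held-out set.
    Xtr, ytr = X[train], y[train]
    w = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(X.shape[1]), Xtr.T @ ytr)
    return X[test] @ w

train, test = np.arange(0, 800), np.arange(800, n)
for name, X in [("motion energy", X_motion), ("category", X_category)]:
    pred = ridge_fit_predict(X, y, train, test)
    r = np.corrcoef(pred, y[test])[0, 1]
    print(f"{name}: held-out prediction r = {r:.2f}")
```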
VM differs from the classical approach in that...
● The stimuli and tasks can be complicated and high-dimensional.
● Two separate, interleaved data sets are acquired: one for fitting, one for testing.
● Regression occurs within a high-dimensional feature space that mediates between stimulus/task variables and BOLD responses.
● A separate spatio-temporal HRF is estimated for each voxel, each feature, and each delay (see the sketch after this list).
● No spatial smoothing is performed.
● No cross-subject averaging is performed.
● Predictions are used to evaluate and compare models.
● Interpretation involves visualizing voxels and maps.
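One concrete way to estimate a separate weight per feature and per delay is to build the design matrix from time-lagged copies of the features, i.e., a finite-impulse-response HRF. A minimal sketch; the 2–4 TR delays are a common choice for a ~2 s TR but are an assumption here.

```python
import numpy as np

def make_lagged(X, delays=(2, 3, 4)):
    """Stack copies of the feature matrix X (time x features) shifted by
    the given delays (in TRs), so each feature gets its own weight at
    each delay: an FIR-style HRF per voxel and feature."""
    n_t, n_f = X.shape
    lagged = np.zeros((n_t, n_f * len(delays)))
    for i, d in enumerate(delays):
        lagged[d:, i * n_f:(i + 1) * n_f] = X[:n_t - d]
    return lagged

X = np.random.randn(300, 10)   # hypothetical stimulus features
X_lagged = make_lagged(X)      # shape (300, 30): 10 features x 3 delays
```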
Voxel-wise modeling (VM)
Collect functional data → Estimate voxel-wise models → Visualize & interpret results → Decode information
fMRI as functional mapping
Using VM to Decode
Sampling the stimulus and task space
● The fit set is used to estimate voxel-wise models, so the stimuli/tasks should sample the relevant feature spaces as completely as possible. Optimize by collecting few trials from many different states.
● The test set is used to validate fit models, so the signals must be measured accurately. Optimize by collecting many repeated trials from a few states.
● Make sure the stimulus/task spaces for the fit and test sets overlap!
[Figure: variance components: total possible variance; total variance in the stimulus/task subspace; potentially explainable variance given data size; explained variance; significant variance]
Wu, David and Gallant, Annual Review of Neuroscience, 2006
Sampling the stimulus and task space
● Sample as much of the stimulus/task space as possible.
● Obtain a good estimate of responses to the validation data; the repeats also bound the explainable variance (see the sketch after this list).
● Make sure non-stationary responses are evenly distributed across both the estimation and validation data.
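One way the "potentially explainable variance" in the figure above might be estimated is from repeated presentations of the validation stimuli. The estimator below is a standard signal/noise decomposition and simulated data, not necessarily the lab's exact formula.

```python
import numpy as np

# responses: (n_repeats x n_timepoints) for one voxel on the repeated
# validation stimulus. Hypothetical data for illustration.
rng = np.random.default_rng(1)
signal = rng.standard_normal(120)
responses = signal + rng.standard_normal((10, 120))

n_rep = responses.shape[0]
mean_resp = responses.mean(axis=0)
total_var = responses.var()
# Trial-to-trial noise variance, estimated per timepoint across repeats.
noise_var = responses.var(axis=0, ddof=1).mean()
# Variance of the trial-averaged response still contains noise_var / n_rep
# of residual noise; subtract it to estimate the explainable (signal) part.
explainable_var = mean_resp.var() - noise_var / n_rep
print(f"explainable fraction ~ {explainable_var / total_var:.2f}")
```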
[Preprocessing flowchart: Unwarp → Motion correct (against reference*) → Temporal mean → Coregister → Reslice → Detrend. Tools: FSL, Freesurfer, pycortex, custom code.
* Reference volume from the first scan or an alternate session]
Minimizing head motion
[Figure: head motion, ~2 mm]
Complexities of hemodynamic coupling
Using VM to Decode
Typical example: VM of silent movie data
Fitting the models
● Replicate the data at time lags covering the HRF.
● Separate the estimation set into three subsets: 80% to fit the weights, 10% to fit the regularization parameter, and 10% to evaluate predictions.
● Bootstrap the regularization and prediction sets.
● Average model weights across bootstrap samples (see the sketch after this list).
[Figure: data split into fit, regularization, and prediction sets. NOTE: the validation data are saved; they are never touched during fitting.]
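A minimal sketch of this recipe for one voxel, assuming a precomputed lagged feature matrix X and response vector y. Only the 80/10/10 bootstrap structure comes from the slide; the closed-form ridge solver, the alpha grid, and all names are assumptions.

```python
import numpy as np

def fit_voxel(X, y, alphas=np.logspace(0, 4, 10), n_boot=20, seed=0):
    """Ridge regression with a bootstrapped 80/10/10 split: on each
    bootstrap sample, 80% fits the weights, 10% selects the
    regularization parameter, and 10% checks predictions. Weights are
    averaged across bootstrap samples; validation data stay untouched."""
    rng = np.random.default_rng(seed)
    n, n_feat = len(y), X.shape[1]
    weights, pred_rs = [], []
    for _ in range(n_boot):
        idx = rng.permutation(n)
        fit_i = idx[: int(0.8 * n)]
        reg_i = idx[int(0.8 * n): int(0.9 * n)]
        pred_i = idx[int(0.9 * n):]
        Xf, yf = X[fit_i], y[fit_i]
        # Choose alpha by prediction correlation on the regularization set.
        best_alpha, best_r = alphas[0], -np.inf
        for a in alphas:
            w = np.linalg.solve(Xf.T @ Xf + a * np.eye(n_feat), Xf.T @ yf)
            r = np.corrcoef(X[reg_i] @ w, y[reg_i])[0, 1]
            if r > best_r:
                best_alpha, best_r = a, r
        w = np.linalg.solve(Xf.T @ Xf + best_alpha * np.eye(n_feat), Xf.T @ yf)
        weights.append(w)
        pred_rs.append(np.corrcoef(X[pred_i] @ w, y[pred_i])[0, 1])
    return np.mean(weights, axis=0), np.mean(pred_rs)
```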
A category model for high-level vision (Alex Huth)
Using VM to Decode
The most common decoding method is MVPA
P(f(S) | R) ∝ P(R | f(S)) P(f(S))
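Decoding with an explicit encoding model applies the identity above directly: score each candidate stimulus by the likelihood of the measured response under the model, times a prior. A minimal identification-style sketch with a Gaussian likelihood; the linear encoding weights W, the candidate set, and all names are assumptions.

```python
import numpy as np

def identify(resp, candidate_features, W, noise_var=1.0, log_prior=None):
    """Score each candidate stimulus by the log posterior
    log P(f(S)|R) = log P(R|f(S)) + log P(f(S)) + const,
    with P(R|f(S)) Gaussian around the encoding-model prediction f @ W."""
    preds = candidate_features @ W                 # one predicted response per candidate
    log_lik = -0.5 * np.sum((preds - resp) ** 2, axis=1) / noise_var
    if log_prior is not None:
        log_lik = log_lik + log_prior
    return np.argmax(log_lik)

# Hypothetical usage: pick which of 100 clips evoked the measured response.
rng = np.random.default_rng(2)
W = rng.standard_normal((30, 500))     # feature-to-voxel weights (assumed known)
feats = rng.standard_normal((100, 30)) # candidate stimulus features
resp = feats[42] @ W + 0.5 * rng.standard_normal(500)
print(identify(resp, feats, W))        # likely prints 42
```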
Encoding versus decoding
● Quality of brain activity measurements.
● Accuracy of brain models.
● Computer power.
[Image: Joseph Niépce, 1825]
Advantages of voxel-wise modeling
● More sensitive and specific than any other method.
● Produces useful results in single subjects.
● Produces maps at the finest scale of detail available.
● Does not require defining ROIs, but can be used to discover ROIs and gradients.
● Reveals substructure and detailed tuning within ROIs.
● Produces estimates of both significance AND effect size.
● Makes visualization and interpretation simple.
● Allows predictions beyond the fit set, and provides a principled platform for decoding.
● Can be generalized to include voxel cross-correlations or group-level analysis.
● Can be used to decode brain activity with the highest accuracy currently attainable.
Current lab members: Natalia Bilenko, James Gao, Alex Huth, Fatma Imamoglu, Mark Lescroart, Lydia Majur, Anwar Nunez, Michael Oliver, Dustin Stansbury
Collaborating labs: Frederic Theunissen, Bin Yu, Tom Griffiths, Cheryl Olman, Bertrand Thirion, Essa Yacoub, Kamil Ugurbil
Opinions: Science
● The 1st law of science: there is no free lunch.
● The 2nd law of science: one person's signal is another person's noise.
● The 1st law of neuroscience: no matter what your theory is, it is insufficient to explain the brain.
● The 2nd law of neuroscience: the brain doesn't care what you think about the brain.
● Statistical significance is necessary but not sufficient for doing science.
● The goal of science is to formulate an intelligible explanation of the system that predicts accurately.
● You can learn a lot about a little, or a little about a lot, but the amount learned is determined by the size of the data.
Opinions: fMRI
● The biggest problem with fMRI data isn't Type I error, it's Type II error.
● If you don't have a cortical mapping question, you shouldn't be using fMRI.
● All fMRI studies measure an entangled combination of representation and intention information.
● The biggest factors determining individual variability in BOLD signal quality are (1) the size of the brain relative to the receive coil, (2) head and body motion, and (3) attention.
● Remember that the people who built your magnet were trying to make clinical radiologists happy.
Opinions: fMRI
● Many fMRI studies make implicit assumptions of linearity (e.g., hemodynamic coupling or cognitive superposition). These are almost always wrong.
● Virtually every fMRI study spheres the data to remove non-stationary components. This is the wrong thing to do, but no one knows what the right thing is.
● Flowchart models developed in cognitive psychology often have little to do with cortical organization.
● Functional connectivity has nothing to do with connectivity and little to do with function.
● MVPA decoding has nothing whatsoever to do with decoding.
● Granger causality has nothing to do with causality.
● It takes more data to accurately estimate functional connectivity than to estimate task-related effects.
Opinions: Design and data collection
● It is usually better to collect more data from fewer subjects than to collect fewer data from more subjects.
● Optimize fMRI data acquisition for every experiment.
● Always collect separate, interleaved data sets for estimation (fit) and validation (test).
● Use well-trained subjects who attend and who do not move.
● Place subjects consistently in the magnet and collect field maps.
● Measure the field distortion caused by your peripherals and place them consistently.
● Collect physiological data and all other telemetry possible.
Opinions: Pre-processing
● Check for artifacts and alignment BY HAND in every single run.
● Detrend with a Savitzky-Golay filter (or at least a median filter); see the sketch after this list.
● Z-score data within voxels and within runs.
● Estimate a separate HRF for every feature, every voxel, and every subject.
● Never smooth the data blindly. Avoid smoothing at all if possible.
● Be very careful when aggregating data across runs or sessions.
● Whatever automated pipeline you are using, it doesn't work well enough.
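A minimal sketch of the detrending and z-scoring steps above, using scipy's Savitzky-Golay filter. The window length and polynomial order are illustrative choices, not prescribed values.

```python
import numpy as np
from scipy.signal import savgol_filter

def detrend_zscore(run, window_tr=121, polyorder=3):
    """Remove slow scanner drift with a Savitzky-Golay low-pass estimate
    of the trend, then z-score each voxel within the run.
    `run` is (time x voxels); window/order values are illustrative."""
    trend = savgol_filter(run, window_length=window_tr,
                          polyorder=polyorder, axis=0)
    clean = run - trend
    return (clean - clean.mean(axis=0)) / clean.std(axis=0)

# Hypothetical run with a linear drift added to every voxel.
run = np.random.randn(300, 1000) + np.linspace(0, 5, 300)[:, None]
run_clean = detrend_zscore(run)
```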
Opinions: Data analysis and modeling
● Smoothing is usually bad; blind smoothing is always bad.
● If you are discarding data to make your statistics work, you are doing the wrong statistics.
● Focus on single subjects first. Only proceed to group-level analysis after you thoroughly understand the single subjects.
● Focus on prediction and effect size, not significance.
Opinions: Interpretation and visualization
● Comparisons of activity/correlations between conditions/areas are not valid unless the SNR across the conditions/areas is equal.
● If you are running a study on cortical activation, show your data on both inflated hemispheres and flat maps!
● If you show thresholded data, show the un-thresholded data as well.
● If you are showing group-level results, show the individual results as well (and report how often the phenomenon was seen in individual subjects).
● Always report variance explained as a portion of the potentially explainable variance.
Opinions: Decoding
● Decoding is a good way to do engineering, but it is generally a bad way to do science.
● The MVPA classifier approach is not really decoding.
● The best encoding model will create the best decoding model.
● There are a few special cases where scientific issues can be addressed with decoding.
Voxel-wise modeling as statistical inference
[Figure: BOLD response]
gallantlab.org strflab.berkeley.edu
neurotree.org crcns.org