EEG Preprocessing Protocol Guideline

The document describes the typical steps involved in preprocessing EEG data: 1. Import the raw EEG data and apply preprocessing steps such as filtering, removing bad channels, and re-referencing. 2. Create epochs by extracting specific time windows related to events of interest from the continuous data. 3. Perform additional preprocessing on the epoched data, such as artifact rejection and baseline correction, to clean the data before analysis. 4. Analyses such as time-frequency decomposition, ERP extraction, and connectivity measures can then be performed on the preprocessed epoched data.

Preprocessing EEG Data

Sequence of EEG Pre-processing


Stage 1 (Pre-ICA): Import data; Channel locations; Filter; Change sampling rate (if applicable); Interpolate bad electrodes; Re-reference data; Segment the data into epochs; Baseline correction; Trial rejection
Stage 2 (ICA, if applicable): Run ICA; Remove bad ICs
Stage 3 (Type of analysis): Time-frequency; Connectivity; Condition-specific ERPs; Laplacian; FFTs
Stage 4 (Post-analysis): Plotting; Statistics; Extracting data
Concepts
7.1. What is preprocessing?
• Any organization or transformation that occurs between collecting the data and analyzing the data:
• Without changing any of the data (e.g. extracting epochs from continuous data)
• Without changing clean data (e.g. removing bad electrodes or rejecting epochs with artifacts)
• Modifying clean data (e.g. applying temporal filters or spatial transformations)
• Transform raw data → a suitable and interpretable format
• EEG: remove noise → get closer to the true neural signal
How do you choose your preprocessing pipeline?
• There is no universally adopted EEG preprocessing pipeline. The choice depends on:
• Details of the experiment design
• The equipment used to collect the data
• The analysis you plan on performing
• Idiosyncratic protocols and preferences

• Keep track of all the details of preprocessing for each subject


• Use the same preprocessing procedures for all conditions to minimize the
possibility of bias.
7.2. The balance between signal and noise
• EEG data contain signal and noise

• Appropriate preprocessing will attenuate the noise in the data


7.2. The balance between signal and noise
• Different preprocessing strategies lead to different balances between
the amount of noise versus signal

• Goal of the study and type of the analysis being performed


• How much data you have
• How difficult the data are to acquire

• Time-frequency-based analyses tend to increase the signal-to-noise characteristics of the data, particularly for single-trial analyses and relatively low frequencies (<20 Hz)
7.3. Creating Epochs
• A specific time window with respect to the occurrence of an event is extracted from the continuous EEG data.
7.3. Creating Epochs
• Before epoching → 2-D matrix (time × electrodes)
• After epoching → 3-D matrix (time × electrodes × trials)

• Epoching is not necessary for resting-state datasets


• Continuous data can be segmented into non-overlapping segments of a few seconds to facilitate analyses

• What event to define as time = 0


• Buffer Zone
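The epoching step described above can be sketched in plain Python (the pipeline itself runs in EEGLAB/MATLAB; the channel data, event positions, and window lengths below are made up for illustration, and toolboxes differ in how they order the three dimensions):

```python
def epoch_data(continuous, event_samples, pre, post):
    """Cut a continuous recording (channels x samples, the 2-D matrix)
    into epochs (trials x channels x samples, the 3-D structure)."""
    n_samples = len(continuous[0])
    epochs = []
    for ev in event_samples:
        start, stop = ev - pre, ev + post
        if start < 0 or stop > n_samples:
            continue  # skip events too close to the recording edges
        epochs.append([ch[start:stop] for ch in continuous])
    return epochs

# Two channels, 20 samples of fake data; events (time = 0) at samples 5 and 12.
continuous = [list(range(20)), list(range(100, 120))]
epochs = epoch_data(continuous, event_samples=[5, 12], pre=2, post=3)
# 2 trials x 2 channels x 5 samples per epoch
```

Events whose window would run past either end of the recording are simply skipped, which is why a buffer zone matters when choosing epoch length.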
7.3. Creating Epochs
Which event to call time 0?
• Straightforward → the onset of the stimulus in each trial
• Decision-making is needed when:
• Both task-related activity and response-related activity (which is temporally variable with respect to stimulus onset) are of interest
• Several stimuli are presented with variable delays → multiple events that could be used as time 0
• Option: time-lock the data to the earliest event in each trial
• Option: time-lock to the event on which you will focus most of your analyses
7.3. Creating Epochs

• The time series data can be temporally shifted during analyses → the decision of what to use as the time 0 event will not necessarily limit your analyses
7.3. Creating Epochs
Buffer Zone
• How much time to include before and after time = 0?
• Depends on the experiment
• What type of analyses you want to perform
• At least as long as the duration of the trial
• Time-frequency analysis → longer epochs, to avoid contaminating your trials with edge artifacts
7.3. Creating Epochs
• Edge artifacts can happen as a result of applying temporal filters to sharp edges (e.g. a step function)
• They can contaminate up to three cycles of activity

Long-epoch caveat: epochs with overlapping data
• Overlap biases ICA, because it will run on some time points more often than others → do not expose ICA to the same data more than once
7.3. Creating Epochs
Reflection Approach
• Used when the dataset has already been epoched and cannot be re-epoched from the continuous data
• It should be used out of necessity, not as a substitute for including a buffer zone when epoching
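The reflection idea can be sketched as follows (a plain-Python illustration, not EEGLAB's implementation): the epoch is padded on both sides with a time-reversed copy of itself, the filter is applied to the padded signal, and the reflected buffers are then discarded along with any edge artifacts they absorbed.

```python
def reflect_pad(epoch):
    """Pad a 1-D epoch with time-reversed copies of itself on both sides,
    creating an artificial buffer zone for filtering."""
    return epoch[::-1] + epoch + epoch[::-1]

epoch = [1, 2, 3, 4]
padded = reflect_pad(epoch)          # [4, 3, 2, 1, 1, 2, 3, 4, 4, 3, 2, 1]
# ... apply the temporal filter to `padded` here ...
n = len(epoch)
recovered = padded[n:2 * n]          # keep only the middle (real) third
```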
7.4. Matching Trial Counts across Conditions

• It is ideal for all conditions to have the same number of trials
• A small difference → OK!
• If the experiment necessarily entails unbalanced trial counts → match across conditions
7.4. Matching Trial Counts across Conditions

• Select the first N trials from each condition, where N is the number of trials in the smallest condition;
• Select trials at random;
• Match trials on a relevant behavioral or experimental variable (e.g. reaction time, saccade speed, pupil response)
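The random-selection strategy can be sketched in Python (the condition names and trial indices here are invented for illustration; in practice this would operate on your epoched dataset):

```python
import random

def match_trial_counts(conditions, seed=0):
    """Randomly subsample every condition down to the smallest trial count.
    conditions: dict mapping condition name -> list of trial indices."""
    n_min = min(len(trials) for trials in conditions.values())
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    return {name: sorted(rng.sample(trials, n_min))
            for name, trials in conditions.items()}

conditions = {"A": list(range(40)), "B": list(range(25))}
matched = match_trial_counts(conditions)
# both conditions now contain 25 trials
```

Fixing the random seed matters for reproducibility: report it, so the same subset of trials can be re-selected later.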
7.4. Matching Trial Counts across Conditions

Trial count matching across subjects
• Not necessary, unless you:
• Compare different groups of subjects (patient vs. control)
• Correlate the EEG results across subjects with a behavioral variable related to trial count
• Report this in the methods section, along with behavioral results before and after trial selection
7.5. Filtering
• Remove high-frequency artifacts and low-frequency drifts
• High-pass filter
• Useful and recommended to attenuate slow drifts
• Should be applied only to continuous data, not epoched data (due to edge artifacts lasting longer than your epoch)
• Most time-frequency methods apply a set of temporal filters:
• Wavelet convolution
• FFT
• Filter-Hilbert
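The idea of high-pass filtering (removing slow drift while keeping fast activity) can be illustrated with a deliberately crude sketch; real pipelines use proper FIR/IIR filters (e.g. EEGLAB's filtering tools), not this moving-average trick:

```python
def highpass_moving_average(signal, window):
    """Crude high-pass: estimate the slow drift with a centered moving
    average and subtract it from each sample."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        drift = sum(signal[lo:hi]) / (hi - lo)
        out.append(signal[i] - drift)
    return out

# A pure slow ramp (drift only) is almost entirely removed away from the edges.
drift = [0.1 * t for t in range(100)]
filtered = highpass_moving_average(drift, window=11)
```

Note how the first and last few samples are biased by the truncated window: a toy version of the edge artifacts discussed in the epoching section.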
7.6. Trial Rejection

The preprocessing step most open to idiosyncratic preferences
• Automatic vs. manual trial rejection
• Sharp edge artifacts are detrimental to time-frequency decomposition → small sharp edges may go undetected by automatic procedures
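The moving-window peak-to-peak test used later in the ERPLAB walkthrough can be sketched like this (the threshold and toy numbers are invented; real data would be in microvolts):

```python
def peak_to_peak_reject(epoch, threshold, window, step):
    """Flag an epoch if any moving window's peak-to-peak amplitude
    (max minus min inside the window) exceeds the threshold."""
    for start in range(0, len(epoch) - window + 1, step):
        seg = epoch[start:start + window]
        if max(seg) - min(seg) > threshold:
            return True  # mark the epoch for rejection
    return False

clean = [0, 1, 0, -1, 0, 1, 0, -1]
blink = [0, 1, 0, 40, 45, 1, 0, -1]  # large, sharp deflection
flag_clean = peak_to_peak_reject(clean, threshold=10, window=4, step=2)  # False
flag_blink = peak_to_peak_reject(blink, threshold=10, window=4, step=2)  # True
```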
7.7. Spatial filtering
• Localizing results
• Isolate topographical feature of data
• Preprocessing step for connectivity analyses
7.8. Referencing

• Only for EEG data (MEG is a reference free measurement)


• Voltage values recorded from each electrode are relative to a
voltage value recorded elsewhere!

Where is this elsewhere?


• Theoretically, anywhere!
• Any activity present in the reference electrode will be
reflected as activity in other electrodes
7.8. Referencing

• When picking a reference, it is important that the electrode(s) you select as a reference have as little influence on the locations of your signal of interest as possible.
• In practice, this means that either the references are located far away from the signal of interest or an average of several electrodes is used.
7.8. Referencing
• Mastoids (the electrodes placed roughly behind a person’s ears), due to being relatively far from the brain yet close to the other electrodes. However, there is still some neural activity at that location.
• Cz (the central electrode) is frequently chosen when looking at activity that is distant from that location.
• The average of all electrodes (also known as the Common Average Reference) → requires a sufficient number of electrodes.
• Some electrodes have a bipolar reference: one electrode is measured relative to another (e.g. eye electrodes, leaving one signal).
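Because referencing is a linear transformation, the common average reference can be sketched in a few lines (plain Python for illustration; in EEGLAB this is done through the re-referencing menus):

```python
def common_average_reference(data):
    """Subtract the instantaneous mean across all electrodes from every
    electrode (data: channels x time)."""
    n_ch, n_t = len(data), len(data[0])
    out = [list(ch) for ch in data]  # copy so the input is untouched
    for t in range(n_t):
        avg = sum(ch[t] for ch in data) / n_ch
        for ch in out:
            ch[t] -= avg
    return out

data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
rerefed = common_average_reference(data)
# After CAR, the channels sum to zero at every time point.
```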
7.8. Referencing
Choosing a reference depends on:
• How many electrodes you have
• Where the electrodes are placed
• What analyses will be performed
• Which cognitive tasks you use, and which brain regions they elicit activity from
7.8. Referencing
• Referencing is a linear transformation of the data;
• it can be done offline, not necessarily during recording → the electrode that serves as the reference during recording is not very important!
• Important to know: the online reference electrode itself records only a flat line, so you may mistakenly think you have a bad electrode.
7.9. Interpolating bad electrodes
• A process by which data from missing electrodes are estimated based on the activity and location of other electrodes
• Important for some spatial filters (surface Laplacian, source reconstruction)
• Important before re-referencing to the average of all electrodes (the activity of one bad electrode would contaminate the signal of the other, clean electrodes)
7.9. Interpolating bad electrodes

• Interpolated electrodes do not provide unique data: they are a weighted sum of the activity of other electrodes → reduced rank of the data matrix (problematic for matrix-inverse analyses)
• Alternative: delete the bad electrode
• Problematic when averaging across subjects
7.9. Interpolating bad electrodes
• Inspect the data carefully
• There may be a true brain signal recorded by the electrode, but with a lot of noise
• Apply a low-pass filter at 30 Hz (does the low-frequency signal from that electrode look similar to the surrounding electrodes?)
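A toy interpolation sketch follows (an assumption-laden illustration: EEGLAB actually uses spherical spline interpolation, while this uses simple inverse-distance weighting in a flat 2-D layout, with invented positions and values):

```python
import math

def interpolate_bad_channel(data, positions, bad):
    """Estimate a bad channel as an inverse-distance-weighted average of the
    good channels. This also shows why an interpolated channel adds no
    unique information (it is a weighted sum of the others), reducing the
    rank of the data matrix."""
    good = [i for i in range(len(data)) if i != bad]
    weights = [1.0 / math.dist(positions[i], positions[bad]) for i in good]
    total = sum(weights)
    n_t = len(data[0])
    return [sum(w * data[i][t] for w, i in zip(weights, good)) / total
            for t in range(n_t)]

positions = [(0, 0), (2, 0), (1, 0)]          # bad channel sits midway
data = [[1.0, 1.0], [3.0, 5.0], [0.0, 0.0]]   # channel 2 is flat (bad)
data[2] = interpolate_bad_channel(data, positions, bad=2)
# equidistant neighbours -> a plain average: [2.0, 3.0]
```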
7.10. Start with clean data
• There is no substitute for clean data
• Preprocessing will turn good data into very good data, but no amount of preprocessing will turn low-quality, noisy data into very good data
Preprocessing Pipelines Using EEGLAB
Approach I

• Loading and importing data
• High-pass and low-pass filtering
• Creating an Event List
• Creating Bin-based EEG Epochs
• Artifact Detection
• Creating Averaged ERPs
EEGLAB and ERPLAB
• EEGLAB is a MATLAB toolbox for working with EEG data
• ERPLAB is a plugin for EEGLAB

To install EEGLAB:
1. Download latest version: http://sccn.ucsd.edu/eeglab/currentversion/eeglab_current.zip
2. Unzip folder into “Documents\MATLAB\”
for specific versions, see: https://sccn.ucsd.edu/eeglab/downloadtoolbox.php

To install ERPLAB:
1. Download latest version:
https://github.com/lucklab/erplab/releases/download/7.0.0/erplab7.0.0.zip
2. Unzip folder into “Documents\MATLAB\eeglab2019_0\plugins\”
for specific versions, see: https://github.com/lucklab/erplab/releases
EEGLAB and ERPLAB
• In the EEGLAB menu bar, select File > Memory and other options, and uncheck "If set, keep at most one dataset in memory".

• This will configure EEGLAB's memory settings to allow multiple


datasets to be loaded in memory.
• This will make switching between datasets faster when the datasets
are small relative to the amount of RAM in your computer.
• However, you might want to check this option if you are processing
large datasets.
Load Existing Dataset

In the EEGLAB menu bar:
• Select File > Load existing dataset
• Select the file S1_EEG.set and click the Open button.
View the EEG data for the file you just loaded

In the EEGLAB menu bar:
• Select Plot > Channel data (scroll)
Import Dataset
In the EEGLAB menu bar:
• Select File > Import data > Using the BIOSIG interface
• From Biosemi BDF file (BIOSIG toolbox)
*You will be required to install the BIOSIG interface
Import Dataset

• Reference electrodes 65 and 66


• Name the dataset (if you
desire)
Add Channel Location
High Pass/Low Pass Filter

• You need high pass


filter your EEG data
prior looking at
your data!
• Otherwise your
EEG data does not
display correctly
Going Back to S1 Dataset
• Edit > Channel locations
• EEGLAB and ERPLAB require electrode coordinates for plotting topographic maps
• Plot > Channel spectra and maps
Topographic Map
Creating an Event List
• To use ERPLAB, first create an EventList for the EEG stored in your dataset.
• In the ERPLAB menu, select EventList > Create EEG EVENTLIST. (A warning may pop up, telling you that some or all of your events contain a text-based event label and not a numeric event code. For now, ignore it and click the Continue button.)
Creating Bin-Based EEG Epochs
• Once the events have been assigned to bins, the next step is to divide the continuous EEG into a set of fixed-length epochs (also known as segments), each of which is time-locked to an event that has been assigned to a bin.
• In the present example, we will extract the EEG during a period that
begins 200 ms prior to the onset of a stimulus and ends at 800 ms.
• Note that EEGLAB has an epoching function (Tools > Extract epochs);
you should not use this function if you are using the ERPLAB functions
for processing the epochs (e.g., averaging, plotting, etc.).
• Instead in ERPLAB, select ERPLAB > Extract Bin-based Epochs. This will
open the Extract Bin Epochs window:
Creating Bin-Based EEG Epochs
Artifact Detection
• Select ERPLAB > Artifact detection in epoched data > Moving window peak-to-peak threshold
• ERPLAB > Artifact detection in epoched data > Summarize EEG artifact detection > Table
Artifact Detection
• ERPLAB > Artifact
detection in epoched
data > Step-like artifacts.
Average Epochs to create single-subject ERPs
• Make sure to exclude
epochs marked with
artifacts!
• Plot ERPs
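Conceptually, the averaging step reduces the artifact-free epochs to a single-subject ERP. A minimal plain-Python sketch (the epochs passed in are assumed to have already survived artifact detection):

```python
def average_erp(epochs):
    """Average epochs (trials x channels x time) into an ERP (channels x time).
    Only artifact-free epochs should be passed in."""
    n_trials = len(epochs)
    n_ch, n_t = len(epochs[0]), len(epochs[0][0])
    erp = [[0.0] * n_t for _ in range(n_ch)]
    for ep in epochs:
        for c in range(n_ch):
            for t in range(n_t):
                erp[c][t] += ep[c][t] / n_trials
    return erp

# Two trials, one channel: trial-varying noise cancels in the average.
epochs = [[[1.0, 3.0]], [[3.0, 1.0]]]
erp = average_erp(epochs)  # [[2.0, 2.0]]
```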
Approach II. Independent Component Analysis
ICA Concept

• A blind source separation method that was developed to identify individual sources in a crowd of noise and other sources.
ICA Concept

• A blind decomposition that finds maximally independent sources of variance in the EEG
• Useful for identifying and removing blink artifacts, because:
• They are large in amplitude
• Have a discrete source
• Extremely reliable from blink to blink
ICA Concept

• The result of ICA is a set of weights for all the electrodes; the weighted sum of all the electrode activities is an independent component’s time course.
• Each component has one time course and one weight per electrode
• For instance, with a 64-channel montage:
• Each component has 64 weights → take the weighted sum of the activity of all electrodes → a single time course
• Each component → 64 weights and one time course
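The weighted-sum idea can be made concrete (the numbers here are toy values; a real weight vector would be one row of the unmixing matrix produced by running ICA):

```python
def component_timecourse(weights, data):
    """One ICA component's time course: the weighted sum of all electrode
    time series (weights: one value per electrode; data: channels x time)."""
    n_t = len(data[0])
    return [sum(w * ch[t] for w, ch in zip(weights, data))
            for t in range(n_t)]

# 3 "electrodes", 4 samples, and an invented 3-element weight vector.
data = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
tc = component_timecourse([1.0, -1.0, 0.5], data)  # [1.5, -0.5, 1.5, -0.5]
```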
ICA Concept

• It is best to use ICA to identify the noise and subtract the noise from the signal.
• Some prefer to analyze components instead of channels (rather than using components to subtract the noise from the data) → that is OK, but not recommended here; it follows a different philosophy.
Which component to remove?
• Those containing physiological or non-physiological noise that is not caused by brain dynamics
• When using ICA, the null hypothesis is:
• All components are accepted except obvious artifacts (e.g. oculomotor artifact components)
• You need very compelling evidence to reject the null hypothesis
• ICA is sometimes not very accurate at separating the noise from the signal
• It is also important to take into account how much of the variance in the data the component accounts for.
ICA Decomposition using EEGLAB interface in MATLAB
Participant 1001
At a glance
• Tools > Run ICA → a couple of different algorithms to use
• Plot > Component activations → plot the actual EEG signals from each component
• Recording from 70 electrodes → 70 components
• The order they appear in reflects how much variance is explained by each component; the first one explains the most variance
• Tools > Reject data using ICA > Reject components by map → flag the components you want to remove
• Tools > Remove components
Steps prior to running ICA
• Import data
• Channel Locations
• Filter
• High-pass: 0.1–0.2 Hz
• Low-pass: 30 Hz
• Change sampling rate → 250 Hz for the Juniper project
• Interpolate bad electrodes
• Re-reference the data to all channels
• Segment the data
• Reject artifactual or paroxysmal data epochs
Import data
• Import data through the BIOSIG interface
• It is highly recommended that you choose a reference channel if these are BioSemi data (e.g., a mastoid or other channel). Otherwise the data will lose 40 dB of SNR!
Reference to Mastoids

• Reference to mastoids: electrodes 65 and 66 (EXG1 and EXG2)
• Rename your new dataset if you like
Channel Locations
Filtering

• High-pass: 0.1–0.2 Hz
• Low-pass: 30 Hz
Change Sampling rate

• Change sampling rate to 250 Hz
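Conceptually, downsampling keeps every k-th sample. A naive sketch follows (real resampling tools low-pass filter first to prevent aliasing, which this sketch deliberately omits):

```python
def downsample(signal, old_rate, new_rate):
    """Naive decimation: keep every (old_rate // new_rate)-th sample.
    No anti-aliasing filter is applied here, unlike real resampling tools."""
    factor = old_rate // new_rate
    return signal[::factor]

sig = list(range(8))
down = downsample(sig, old_rate=1000, new_rate=250)  # keeps samples 0 and 4
```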
Re-reference the data

• Re-reference the data to the average of all channels
Extract Epochs

• Extract your epochs
• An epoch is a section of EEG data that has been time-locked to a stimulus. We produce event-related potentials (ERPs) by averaging all the epochs for each stimulus type or condition.
• Time-lock your events of interest
• Choose the period of your epoch → here, 200 ms before and 800 ms after the stimulus
Baseline Correction
• Refers to the procedure of expressing the brain signal of interest relative to a control (baseline) signal, commonly taken shortly before a stimulus event → here, the 200 ms before the stimulus
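Baseline correction reduces to subtracting the pre-stimulus mean, channel by channel. A minimal sketch (sample counts are invented, with the first samples standing in for the pre-stimulus window):

```python
def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline (pre-stimulus) samples
    from every sample of each channel (epoch: channels x time)."""
    corrected = []
    for ch in epoch:
        base = sum(ch[:n_baseline]) / n_baseline
        corrected.append([v - base for v in ch])
    return corrected

epoch = [[2.0, 2.0, 5.0, 3.0]]   # one channel; first two samples are baseline
out = baseline_correct(epoch, n_baseline=2)  # [[0.0, 0.0, 3.0, 1.0]]
```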
Scroll your data
Bad Channels

• To deal with bad channels, either:
• Delete that channel, or
• Interpolate that channel
• Interpolation requires adding channel locations
• When you add channel locations the reference becomes unknown; therefore, re-reference prior to adding channel locations.
Interpolate or Delete Bad channels
Rejecting Paroxysmal Data Epochs

• To reject data epochs:
• Select Tools > Reject data epochs > Reject by inspection
Guidelines for rejecting epochs
• If you’ll be running ICA, it is sometimes helpful to remove additional epochs at this step if they are very clearly messy/not usable.
• This is because ICA fills in artifactual data with what would be probable based on the other remaining data.
• So if it is working with some very messy epochs, that sometimes leads to wonky data after ICA removal.
• You can always go back after the ICA, if you are having difficulty, and see if removing any more epochs would help the process.
• To delete the marked epochs, select Tools > Reject data epochs > Reject marked epochs
Run ICA

• Now that you have carefully preprocessed your data, you are ready to run ICA.
• Take your preprocessing steps carefully, because if you need to change something, you will have to re-run the ICA.

• Remove ocular artifacts from EEG data using ICA
• To compute ICA components of a dataset of EEG epochs (or of a continuous EEGLAB dataset):
• Select Tools > Run ICA (Decompose data by ICA)
• This calls the function pop_runica.m.
• To test this function, simply press OK.
• Two windows should pop up sequentially

• After you press OK, you will see a series of “steps” being calculated in the MATLAB command window. This takes some time, depending on the data, and will usually work its way up to approximately step 300
• Before checking the ICA components map, look at characteristic patterns
in your data:
• Tools > Reject data epochs > Reject by inspection (you are not going to be rejecting anything here – just inspecting)

• When you see a blink, right click on that point. You will see a map of what
that pattern looks like. You can click Edit > Copy Figure and paste the
image into a word document, to remind yourself later.

• You can do this same thing for other notable ocular artifacts. For other
types, you’ll want to be sure to specify for yourself what each figure
referred to, in case you are making a decision later (e.g., “small ocular
blip”, “end of a blink”, “artifact in a few ocular electrodes”)
Running ICA Decompositions

(The next couple of slides provide more information.)


Runica options
extended
• Generally used for finding line noise; isolates line noise into one single component
• Default: 0; 1 is recommended to find sub-Gaussian sources
stop
• Stop when the final weight change falls below this value (default 1e-7)
• To go to the next step, the weight change has to be larger than this particular number; if it is smaller, ICA stops
lrate
• Learning rate, determined from the data
• Too small → small steps to get to the end → takes too long
• Too large → weights blow up
maxsteps
• Limits how many steps ICA takes (another way to stop ICA)
pca
• Decompose only a principal data subspace of the EEG.nbchan channels
• Infomax (runica, binica) is the ICA algorithm we use most.
• It is based on Tony Bell's infomax algorithm as implemented for automated use by Scott Makeig et al., using the natural gradient of Amari et al.
• It can also extract sub-Gaussian sources using the (recommended) 'extended' option of Lee and Girolami.
• Function runica() is the all-MATLAB version;
• function binica() calls the (1.5x faster) binary version (a separate download) translated into C from runica() by Sigurd Enghoff.
• ICA works best when given a large amount of basically similar and mostly clean data.
• When the number of channels (N) is large (>>32), a very large amount of data may be required to find N components.
• When insufficient data are available, then using the 'pca' option to jader.m to find fewer than N components may be the only good option.
• Important note: If run twice on the same data, ICA decompositions
under runica/binica will differ slightly.

• That is, the ordering, scalp topography and activity time courses of best-
matching components may appear slightly different.

• This is because ICA decomposition starts with a random weight matrix (and
randomly shuffles the data order in each training step), so the convergence
is slightly different every time.

• Is this a problem? At the least, features of the decomposition that do not remain stable across decompositions of the same data should not be interpreted, except as irresolvable ICA uncertainty.
In Sum
• Garbage in  Garbage out
• Do you have enough data?
• Remove large, non-stereotyped artifacts
• The more channels, the more time points you need
• The fewer channels, the less data you need
• High-pass filter to remove slow drifts
• Remove bad channels
• Data must be in double precision
Reject ICA Components

• Now it’s time to reject components → to do this, go to Tools > Reject data using ICA > Reject components by map
Plotting 2-D Component Scalp Maps
Participant
1001
Zoom-in some components
To Consider when running ICA
• When the data are generally quite clean, we may ultimately benefit from being conservative at the ICA step and focusing on removing major, recurrent ocular artifacts like blinks. Less regular artifacts, like the one in the previous slide that is focused on one trial, can be good to keep.
• At the next “probability” step, highly improbable data get cut anyway.
• If we cleaned up the data so much that we have no artifacts or irregular
data, then clean data will often end up getting cut at the probability phase.
• If the data is very messy, you may end up coming back to ICA and cutting
some additional artifacts like this one here, which may in that case help
retain more trials.
• The goal is to get to a final dataset that is free from artifacts that are clearly
not ERPs, and there are multiple ways to get there! Starting with being
conservative is the best approach.
• Only remove bad components if you can match them with artifactual points in the data.
• For fairly clean data, focus only on the ocular components on the first pass
Plot component spectra and map
• What frequency are you interested in looking at?
• Which component at that frequency has the highest power?
Plot ICA ERPs
• How the ERPs look for different components
• How one component, with all its channels, contributes to the overall ERP
• IC back-projection envelope
• Largest ERP component (just to explore the data)
IC ERP Images

• ERP image: color coding of single-trial activations
• Blue for negative and red for positive
• At the bottom is the average ERP
After removing ICA components (focused on ocular artifacts):

• Copy/paste the components you removed into your data log
• Check the data to see how it turned out after removing those components:
• Tools > Reject data epochs > Reject by inspection (you are not going to be rejecting anything here – just inspecting)
• Look through the data to check that blinks and major ocular artifacts were removed, and that the data does not otherwise look more wonky.
After removing ICA components (focused on ocular artifacts):

• If you noticed any issues in step 3 (e.g., blinks not removed, or weirdness apparent in the data that was not there before), go back to your original ICA dataset and check the map again.
• Re-consider the components for removal (e.g., perhaps you were too conservative and did not remove enough components? Or perhaps you removed one that was not really an artifact?)
• Try again, and see if things improve. This may be an iterative process until you are satisfied with the data after ICA removal. If at any point you are not confident that an IC is an artifact, do not remove it.
Post-ICA Trial Rejection

• Automatic rejection vs. rejection of improbable data (the latter is the technique used by the Queen’s folks)
Automatic rejection

• 8/95 trials marked for rejection
ERP image after automatic rejection
Comparing with the results from previous preprocessing
measures
Reject data epochs with improbable data
• Tools > Reject data epochs > Reject by probability
• Single-channel limit = “4” (default was likely 3)
• All-channel limit = “4” (default was likely 3)
• Click “OK”
• When the next window with a figure appears, click “UPDATE” to update the marked trials.
• Click Tools > Reject data epochs > Reject marked epochs > YES
• Save the new file under a new name
37/95 rejected → 58 remaining; remaining epochs in each category: NPM: 14; PPM: 13; NPF: 17; PPF: 14
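The spirit of probability-based rejection (flagging epochs whose statistics are improbable given the rest) can be sketched with a simple z-score rule. This is an illustration only, not EEGLAB's exact joint-probability computation; the limit of 4 mirrors the settings above, and the epoch summary values are invented:

```python
def reject_improbable(epoch_values, limit=4.0):
    """Flag epochs whose summary value lies more than `limit` standard
    deviations from the mean across epochs."""
    n = len(epoch_values)
    mean = sum(epoch_values) / n
    sd = (sum((v - mean) ** 2 for v in epoch_values) / n) ** 0.5
    return [abs(v - mean) > limit * sd for v in epoch_values]

# 20 ordinary epochs plus one wildly improbable one.
values = [1.0, -1.0] * 10 + [50.0]
flags = reject_improbable(values, limit=4.0)  # only the last epoch is flagged
```

This is also why overly aggressive cleaning earlier can backfire: if nothing irregular remains, the threshold tightens and ordinary epochs start getting cut.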
Removing Baseline
• Tools > Remove baseline
• Enter the minimum and maximum of your pre-stimulus baseline timeframe, typically 0–100 ms, entered as: [0 100]
• Default is 100 ms
• I personally prefer 200 ms
• Click OK
• File > Save current dataset
Creating Categories

• Working from the baseline-corrected data file, click Edit > Select epochs or events
• In the “category” section, type in one of your category labels (e.g., NPM or PPM)
• Check off “keep only selected events and remove all others”
• Click OK. It will ask if you are okay with it rejecting/deleting the remaining number of trials – select OK/yes again.
• Re-name your new dataset, and select “save it as a file” to save the file in your folder as well. Select OK.
• Repeat for each of your stimulus categories.
Final Notes:
• In general, we want to have at least 20 trials per category.
• Fewer than 15 trials is too few (amplitudes will be high and the data will be noisy). Between 15 and 20 trials is an iffy range.
• If you have any stimuli in this range (which you likely will), you can look at the participant’s ERPs to get an idea of their data quality. To do so, load that participant’s data in EEGLAB, then go to Plot > Channel ERPs > In scalp/rect. array
• You can also always do this on your data files that are averaged across
the stimulus categories, just to check on the overall quality of the waves
with more trials.
• Also, once the data are in MATLAB, we will average across participants and can look at the waveforms that way.
Resources
• https://www.youtube.com/watch?v=JOvhHSEt-ZU
