Fabien LOTTE
7.1 Introduction
One of the critical steps in the design of Brain-Computer Interface (BCI) applica-
tions based on ElectroEncephaloGraphy (EEG) is to process and analyse such EEG
signals in real-time, in order to identify the mental state of the user. Musical EEG-
based BCI applications are no exception. For instance, in (Miranda et al, 2011),
the application had to recognize the visual target the user was attending to from
his/her EEG signals, in order to execute the corresponding musical command. Un-
fortunately, identifying the user’s mental state from EEG signals is no easy task,
such signals being noisy, non-stationary, complex and of high dimensionality (Lotte
et al, 2007). Therefore, mental state recognition from EEG signals requires specific
signal processing and machine learning tools. This chapter aims at providing the
reader with a basic knowledge about how to do EEG signal processing and the kind
of algorithms to use to do so.

Fabien LOTTE
Inria Bordeaux Sud-Ouest / LaBRI, 200 avenue de la vieille tour, 33405, Talence Cedex, France,
e-mail: [email protected]
In BCI design, EEG signal processing aims at translating raw EEG signals into the
class of these signals, i.e., into the estimated mental state of the user. This translation
is usually achieved using a pattern recognition approach, whose two main steps are
the following:
• Feature Extraction: The first signal processing step is known as “feature extrac-
tion” and aims at describing the EEG signals by (ideally) a few relevant values
called “features” (Bashashati et al, 2007). Such features should capture the in-
formation embedded in EEG signals that is relevant to describe the mental states
to identify, while rejecting the noise and other non-relevant information. All fea-
tures extracted are usually arranged into a vector, known as a feature vector.
• Classification: The second step, denoted as “classification”, assigns a class to a
set of features (the feature vector) extracted from the signals (Lotte et al, 2007).
This class corresponds to the kind of mental state identified. This step can also
be denoted as “feature translation” (Mason and Birch, 2003). Classification al-
gorithms are known as “classifiers”.
Fig. 7.1 A classical EEG signal processing pipeline for BCI, here in the context of a motor
imagery-based BCI, i.e., a BCI that can recognize imagined movements from EEG signals.
It should be mentioned that EEG signal processing is often built using machine
learning. This means the classifier and/or the features are automatically tuned, gen-
erally for each user, according to examples of EEG signals from this user. These
examples of EEG signals are called a training set, and are labeled with their class of
belonging (i.e., the corresponding mental state). Based on these training examples,
the classifier will be tuned in order to recognize as appropriately as possible the
class of the training EEG signals. Features can also be tuned in such a way, e.g., by
automatically selecting the most relevant channels or frequency bands to recognize
the different mental states. Designing BCI based on machine learning (most current
BCI are based on machine learning) therefore consists of 2 phases:
• an offline training phase, during which the features and the classifier are tuned
on the recorded training examples;
• an online use phase, during which the system classifies new EEG signals in
real-time in order to recognize the user’s mental state.
7.2.1 Classification
As mentioned above, the classification step in a BCI aims at translating the features
into commands (McFarland et al, 2006) (Mason and Birch, 2003). To do so, one can
use either regression algorithms (McFarland and Wolpaw, 2005) (Duda et al, 2001)
or classification algorithms (Penny et al, 2000) (Lotte et al, 2007), the classification
algorithms being by far the most used in the BCI community (Bashashati et al,
2007) (Lotte et al, 2007). As such, in this chapter, we focus only on classification
algorithms. Classifiers are able to learn how to identify the class of a feature vector,
thanks to training sets, i.e., labeled feature vectors extracted from the training EEG
examples.
Typically, in order to learn which kind of feature vector corresponds to which
class (or mental state), classifiers try either to model which area of the feature space
is covered by the training feature vectors from each class - in this case the classi-
fier is a generative classifier - or they try to model the boundary between the areas
covered by the training feature vectors of each class - in which case the classifier is
a discriminant classifier. For BCI, the most used classifiers so far are discriminant
classifiers, and notably Linear Discriminant Analysis (LDA) classifiers.
The aim of LDA (also known as Fisher’s LDA) is to use hyperplanes to sepa-
rate the training feature vectors representing the different classes (Duda et al, 2001)
(Fukunaga, 1990). The location and orientation of this hyperplane is determined
from training data. Then, for a two-class problem, the class of an unseen (a.k.a.,
test) feature vector depends on which side of the hyperplane the feature vector is
(see Figure 7.2). LDA has very low computational requirements, which makes it
suitable for online BCI systems. Moreover, this classifier is simple, which makes it
naturally good at generalizing to unseen data, hence generally providing good re-
sults in practice (Lotte et al, 2007). LDA is probably the most used classifier for
BCI design.
Fig. 7.2 Discriminating two types of motor imagery with a linear hyperplane using a Linear Dis-
criminant Analysis (LDA) classifier.
7 EEG Signal Processing for BCI 5
Another very popular classifier for BCI is the Support Vector Machine (SVM)
(Bennett and Campbell, 2000). An SVM also uses a discriminant hyperplane to
identify classes (Burges, 1998). However, with SVM, the selected hyperplane is
the one that maximizes the margins, i.e., the distance from the nearest training
points, which has been found to increase the generalization capabilities (Burges,
1998) (Bennett and Campbell, 2000).
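As an illustration, both classifiers are available off-the-shelf in common machine-learning libraries. The following sketch, assuming scikit-learn and purely synthetic band-power features (all names and values here are illustrative, not from the chapter), trains an LDA and a linear SVM on a two-class training set and classifies an unseen feature vector:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic training set: 100 labeled feature vectors per class,
# each with 4 band-power features (purely illustrative values).
X_train = np.vstack([rng.normal(0.0, 1.0, size=(100, 4)),
                     rng.normal(1.5, 1.0, size=(100, 4))])
y_train = np.array([0] * 100 + [1] * 100)  # class labels = mental states

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)  # hyperplane from class statistics
svm = SVC(kernel="linear").fit(X_train, y_train)          # max-margin hyperplane

# The class of an unseen (test) feature vector depends on which side of
# the learned hyperplane it falls.
x_test = np.array([[1.4, 1.6, 1.3, 1.5]])
print(lda.predict(x_test), svm.predict(x_test))
```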
Generally, regarding classification algorithms, it seems that very good recogni-
tion performances can be obtained using appropriate off-the-shelf classifiers such as
LDA or SVM (Lotte et al, 2007). What seems to be really important is the design
and selection of appropriate features to describe EEG signals. With this purpose,
specific EEG signal processing tools have been proposed to design BCI. In the rest
of this chapter we will therefore focus on EEG feature extraction tools for BCI. For
readers interested in learning more about classification algorithms, we refer them to
(Lotte et al, 2007), a review paper on this topic.
• Spatial information: Such features would describe where (spatially) the relevant
signal comes from. In practice, this would mean selecting specific EEG channels,
or focusing more on specific channels than on others. This amounts to fo-
cusing on the signal originating from specific areas of the brain.
Note that these three sources of information are not the only ones, and alterna-
tives can be used (see Section 7.5). However, they are by far the most used ones,
and, at least so far, the most efficient ones in terms of classification performances.
It should be mentioned that so far, nobody has managed to discover or design a set
of features that would work for all types of BCI. As a consequence, different kinds
of BCI currently use different sources of information. Notably, BCI based on oscil-
latory activity (e.g., BCI based on motor imagery) mostly need and use the spectral
and spatial information whereas BCI based on event related potentials (e.g., BCI
based on the P300) mostly need and use the temporal and spatial information. The
next sections detail the corresponding tools for these two categories of BCI.
BCI based on oscillatory activity are BCI that use mental states which lead to
changes in the oscillatory components of EEG signals, i.e., that lead to changes in the
power of EEG signals in some frequency bands. Increase of EEG signal power in a
given frequency band is called an Event Related Synchronisation (ERS), whereas a
decrease of EEG signal power is called an Event Related Desynchronisation (ERD)
(Pfurtscheller and da Silva, 1999). BCI based on oscillatory activity notably include
motor imagery-based BCI (Pfurtscheller and Neuper, 2001), Steady State Visual
Evoked Potentials (SSVEP)-based BCI (Vialatte et al, 2010) as well as BCI based
on various cognitive imagery tasks such as mental calculation, mental geometric
figure rotation, mental word generation, etc. (Friedrich et al, 2012) (Millán et al,
2002). As an example, imagination of a left hand movement leads to a contralateral
ERD in the motor cortex (i.e., in the right motor cortex for left hand movement)
in the µ and β bands during movement imagination, and to an ERS in the β band
(a.k.a., beta rebound) just after the movement imagination ending (Pfurtscheller and
da Silva, 1999). This section first describes a basic design for oscillatory activity-
based BCI. Then, due to the limitations exhibited by this design, it exposes more
advanced designs based on multiple EEG channels. Finally, it presents a key tool to
design such BCIs: the Common Spatial Pattern (CSP) algorithm, as well as some of
its variants.
Fig. 7.3 Signal processing steps to extract band power features from raw EEG signals. The EEG
signal displayed here was recorded during right hand motor imagery (the instruction to perform
the imagination was provided at t = 0 s on the plots). The contralateral ERD during imagination is
here clearly visible. Indeed, the signal power in channel C3 (left motor cortex) in 8-12 Hz clearly
decreases during this imagination of a right hand movement.
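The band-power pipeline of Figure 7.3 can be written in a few lines. The sketch below, assuming SciPy, a 250 Hz sampling rate and a synthetic one-channel signal (all illustrative assumptions), band-pass filters the signal in 8-12 Hz, squares the samples and averages them over the time window:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 250.0                              # sampling rate in Hz (assumption)
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)
# Synthetic single-channel epoch: a 10 Hz (mu band) oscillation plus noise.
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

# 1) Band-pass filter the raw signal in the 8-12 Hz band.
b, a = butter(4, [8.0 / (fs / 2), 12.0 / (fs / 2)], btype="band")
filtered = filtfilt(b, a, eeg)

# 2) Square the filtered samples and 3) average them over the time window:
# this yields the band power (a log transform is often applied afterwards).
band_power = np.mean(filtered ** 2)
print(band_power)
```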
Unfortunately, this basic design is far from being optimal. Indeed, it uses only
two fixed channels. As such, relevant information, measured by other channels
might be missing, and C3 and C4 may not be the best channels for the subject at
hand. Similarly, using the fixed frequency bands 8-12 Hz and 16-24 Hz may not
be the optimal frequency bands for the current subject. In general, much better per-
formances are obtained when using subject-specific designs, with the best channels
and frequency bands optimized for this subject. Using more than two channels is
also known to lead to improved performances, since it makes it possible to collect
the relevant information spread over the various EEG sensors.
Both the need to use subject-specific channels and the need to use more than 2
channels lead to the necessity to design BCI based on multiple channels. This is
confirmed by various studies which suggested that, for motor imagery, 8 channels
is a minimum to obtain reasonable performances (Sannelli et al, 2010) (Arvaneh
et al, 2011), with optimal performances achieved with a much larger number, e.g.,
48 channels in (Sannelli et al, 2010). However, simply using more channels will not
solve the problem. Indeed, using more channels means extracting more features,
thus increasing the dimensionality of the data and suffering more from the curse-
of-dimensionality. As such, just adding channels may even decrease performances
if too little training data is available. In order to efficiently exploit multiple EEG
channels, 3 main approaches are available, all of which contribute to reducing the
dimensionality: feature selection, channel selection and spatial filtering.
Feature selection algorithms are classical tools widely used in machine learning (Guyon
and Elisseeff, 2003) (Jain and Zongker, 1997) and as such are also very popular in
BCI design (Garrett et al, 2003). There are two main families of feature selection
algorithms:
• Univariate algorithms: They evaluate the discriminative (or descriptive) power
of each feature individually. Then, they select the N best individual features (N
needs to be defined by the BCI designer). The usefulness of each feature is typ-
ically assessed using measures such as Student t-statistics, which measures the
feature value difference between two classes, correlation-based measures such
as R², mutual information, which measures the dependence between the feature
value and the class label, etc. (Guyon and Elisseeff, 2003). Univariate methods
are usually very fast and computationally efficient but they are also suboptimal.
Indeed, since they only consider the individual feature usefulness, they ignore
possible redundancies or complementarities between features. As such, the best
subset of N features is usually not the N best individual features. As an example,
the N best individual features might be highly redundant and measure almost the
same information. As such, using them together would add very little discrimi-
nant power. On the other hand, adding a feature that is individually not very good
but which captures different information from that of the best individual ones
is likely to improve the discriminative power much more.
• Multivariate algorithms: They evaluate subsets of features together, and keep the
best subset with N features. These algorithms typically use measures of global
performance for the subsets of features, such as measures of classification per-
formances on the training set (typically using cross-validation (Browne, 2000))
or multivariate mutual information measures, see, e.g., (Hall, 2000) (Pudil et al,
1994) (Peng et al, 2005). This global measure of performance makes it possible to
actually consider the impact of redundancies or complementarities between features.
Some measures also remove the need to manually select the value of N (the num-
ber of features to keep), the best value of N being the number of features in the
best subset identified. However, evaluating the usefulness of subsets of features
leads to very high computational requirements. Indeed, there are many more pos-
sible subsets of any size than individual features. As such there are many more
evaluations to perform. In fact, the number of possible subsets to evaluate is
very often far too high to actually perform all the evaluations in practice. Con-
sequently, multivariate methods usually rely on heuristics or greedy solutions in
order to reduce the number of subsets to evaluate. They are therefore also sub-
optimal but usually give much better performances than univariate methods in
practice. On the other hand, if the initial number of features is very high, multi-
variate methods may be too slow to use in practice.
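As an illustration of the univariate family, the sketch below (assuming scikit-learn; the synthetic data and the ANOVA F-test scoring function are illustrative choices) scores each candidate feature individually and keeps the N best:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n_trials = 200
X = rng.standard_normal((n_trials, 10))   # 10 candidate features per trial
y = rng.integers(0, 2, size=n_trials)     # class labels
X[y == 1, 0] += 2.0                       # only feature 0 is made discriminant

# Score each feature individually, then keep the N = 3 best ones
# (N must be chosen by the BCI designer).
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
selected = selector.get_support(indices=True)
print(selected)  # the discriminant feature 0 should be among the selected indices
```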
Rather than selecting features, one can also select channels and only use features
extracted from the selected channels. While both channel and feature selection re-
duce the dimensionality, selecting channels instead of features has some additional
advantages. In particular, using fewer channels means a faster setup time for the EEG
cap and also a lighter and more comfortable setup for the BCI user. It should be
noted, however, that with the development of dry EEG electrodes, selecting channels
may become less crucial. Indeed, the setup time will not depend on the number of
channels used, and the BCI user will not have more gel in his/her hair if more chan-
nels are used. With dry electrodes, using fewer channels will still be lighter and more
comfortable for the user though.
Algorithms for EEG channel selection are usually based on or inspired by generic
feature selection algorithms. Several of them are actually analogous algorithms that
assess the usefulness of individual channels or the discriminative power of subsets
of channels instead of individual features or subsets of features. As such, they also use
similar performance measures, and have similar properties. Some other channel selection
algorithms are based on spatial filter optimization (see below). Readers interested
in knowing more about EEG channel selection may refer to the following papers and
associated references (Schröder et al, 2005) (Arvaneh et al, 2011) (Lal et al, 2004)
(Lan et al, 2007), among many others.
Spatial filtering consists in using a small number of new channels that are defined
as a linear combination of the original ones:
x̃ = ∑i wi xi = wX    (7.1)
with x̃ the spatially filtered signal, xi the EEG signal from channel i, wi the weight
given to that channel in the spatial filter and X a matrix whose ith row is xi , i.e., X is
the matrix of EEG signals from all channels.
It should be noted that spatial filtering is useful not only because it reduces the
dimension from many EEG channels to a few spatially filtered signals (we typically
use much less spatial filters than original channels), but also because it has a neu-
rophysiological meaning. Indeed, with EEG, the signals measured on the surface
of the scalp are a blurred image of the signals originating from within the brain. In
other words, due to the smearing effect of the skull and brain (a.k.a., volume con-
duction effect), the underlying brain signal is spread over several EEG channels.
Therefore spatial filtering can help recover this original signal by gathering the
relevant information that is spread over different channels.
There are different ways to define spatial filters. In particular, the weights wi can
be fixed in advance, generally according to neurophysiological knowledge, or they
can be data driven, that is, optimized on training data. Among the fixed spatial filters
we can notably mention the bipolar and Laplacian filters, which are local spatial filters that
try to locally reduce the smearing effect and some of the background noise (McFar-
land et al, 1997). A bipolar filter is defined as the difference between 2 neighboring
channels, while a Laplacian filter is defined as 4 times the value of a central chan-
nel minus the values of the 4 channels around. For instance, a bipolar filter over
channel C3 would be defined as C3bipolar = FC3 − CP3, while a Laplacian filter
over C3 would be defined as C3Laplacian = 4C3 − FC3 − C5 − C1 − CP3, see also
Figure 7.4. Extracting features from bipolar or Laplacian spatial filters rather than
from the single corresponding electrodes has been shown to significantly increase
classification performances (McFarland et al, 1997). An inverse solution is another
kind of fixed spatial filter (Michel et al, 2004) (Baillet et al, 2001). Inverse solutions
are algorithms that make it possible to estimate the signals originating from sources within
the brain based on the measurements taken from the scalp. In other words, inverse
solutions enable us to look into the activity of specific brain regions. A word of
caution though: inverse solutions do not provide more information than what is al-
ready available in scalp EEG signals. As such, using inverse solutions will NOT
make a non-invasive BCI as accurate and efficient as an invasive one. However, by
focusing on some specific brain areas, inverse solutions can contribute to reducing
background noise, the smearing effect and irrelevant information originating from
other areas. As such, it has been shown that extracting features from the signals spa-
tially filtered using inverse solutions (i.e., from the sources within the brain) leads to
higher classification performances than extracting features directly from scalp EEG
signals (Besserve et al, 2011) (Noirhomme et al, 2008). In general, using inverse so-
lutions has been shown to lead to high classification performances (Congedo et al,
2006) (Lotte et al, 2009b) (Qin et al, 2004) (Kamousi et al, 2005) (Grosse-Wentrup
et al, 2005). It should be noted that since the number of source signals obtained with
inverse solutions is often larger than the initial number of channels, it is necessary
to use feature selection or dimensionality reduction algorithms.
Fig. 7.4 Left: channels used in bipolar spatial filtering over channels C3 and C4. Right: channels
used in Laplacian spatial filtering over channels C3 and C4.
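The bipolar and Laplacian filters over C3 described above can be written directly as weight vectors applied to the channel matrix, as in equation 7.1. A minimal sketch (the channel ordering and the synthetic signals are illustrative assumptions):

```python
import numpy as np

# Rows of X are channels, columns are time samples (as in equation 7.1).
channels = ["FC3", "C5", "C3", "C1", "CP3"]
idx = {name: i for i, name in enumerate(channels)}
rng = np.random.default_rng(0)
X = rng.standard_normal((len(channels), 500))

# Bipolar filter over C3: C3_bipolar = FC3 - CP3
w_bipolar = np.zeros(len(channels))
w_bipolar[idx["FC3"]], w_bipolar[idx["CP3"]] = 1.0, -1.0

# Laplacian filter over C3: C3_Laplacian = 4*C3 - FC3 - C5 - C1 - CP3
w_laplacian = np.zeros(len(channels))
w_laplacian[idx["C3"]] = 4.0
for name in ("FC3", "C5", "C1", "CP3"):
    w_laplacian[idx[name]] = -1.0

c3_bipolar = w_bipolar @ X      # spatially filtered signal x̃ = wX
c3_laplacian = w_laplacian @ X
print(c3_bipolar.shape, c3_laplacian.shape)
```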
The second category of spatial filters, i.e., data driven spatial filters, are opti-
mized for each subject according to training data. As any data driven algorithm, the
spatial filter weights wi can be estimated in an unsupervised way, that is without the
knowledge of which training data belongs to which class, or in a supervised way,
with each training data being labelled with its class. Among the unsupervised spatial
filters we can mention Principal Component Analysis (PCA), which finds the spa-
tial filters that explain most of the variance of the data, or Independent Component
Analysis (ICA), which finds spatial filters whose resulting signals are independent
from each other (Kachenoura et al, 2008). The latter has been shown rather useful to
design spatial filters able to remove or attenuate the effect of artifacts (EOG, EMG,
etc. (Fatourechi et al, 2007)) on EEG signals (Tangermann et al, 2009) (Xu et al,
2004) (Kachenoura et al, 2008) (Brunner et al, 2007). Alternatively, spatial filters
can be optimized in a supervised way, i.e., the weights will be defined in order to
optimize some measure of classification performance. For BCI based on oscillatory
EEG activity, such a spatial filter has been designed: the Common Spatial Patterns
(CSP) algorithm (Ramoser et al, 2000) (Blankertz et al, 2008b). This algorithm has
greatly contributed to the increase of performances of this kind of BCI, and, thus,
has become a standard tool in the repertoire of oscillatory activity-based BCI de-
signers. It is described in more detail in the following section, together with some
of its variants.
Informally, the CSP algorithm finds spatial filters w such that the variance of the
filtered signal is maximal for one class and minimal for the other class. Since the
variance of a signal band-pass filtered in band b is actually the band-power of this
signal in band b, this means that CSP finds spatial filters that lead to optimally
discriminant band-power features since their values would be maximally different
between classes. As such, CSP is particularly useful for BCI based on oscillatory
activity since their most useful features are band-power features. As an example,
for BCI based on motor imagery, EEG signals are typically filtered in the 8-30 Hz
band before being spatially filtered with CSP (Ramoser et al, 2000). Indeed this
band contains both the µ and β rhythms.
Formally, CSP uses the spatial filters w which extremize the following function:

JCSP(w) = (wX1 X1T wT) / (wX2 X2T wT)    (7.2)

        = (wC1 wT) / (wC2 wT)    (7.3)

with Xi the matrix of band-pass filtered EEG signals from class i (channels as rows)
and Ci the spatial covariance matrix of the EEG signals from class i.
Fig. 7.5 EEG signals spatially filtered using the CSP (Common Spatial Patterns) algorithm. The
first two spatial filters (top filters) are those maximizing the variance of signals from class “Left
Hand Motor Imagery” while minimizing that of class “Right Hand Motor Imagery”. They corre-
spond to the largest eigenvalues of the Generalized Eigen Value Decomposition
(GEVD). The last two filters (bottom filters) are the opposite: they maximize the
variance of class “Right Hand Motor Imagery” while minimizing that of class
“Left Hand Motor Imagery” (they correspond to the lowest eigenvalues of the GEVD). This
can be clearly seen during the periods of right or left hand motor imagery, in light and dark grey
respectively.
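In practice, such filters are classically obtained by solving a generalized eigenvalue problem on the two class covariance matrices. The sketch below (assuming NumPy/SciPy; the trial shapes and random stand-in data are illustrative) follows this standard construction:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_1, trials_2, n_pairs=2):
    """trials_i: array (n_trials, n_channels, n_samples) of band-passed EEG."""
    def mean_cov(trials):
        # Normalized spatial covariance of each trial, averaged over trials.
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
        return np.mean(covs, axis=0)

    C1, C2 = mean_cov(trials_1), mean_cov(trials_2)
    # Generalized eigendecomposition C1 v = λ (C1 + C2) v, eigenvalues ascending.
    eigvals, eigvecs = eigh(C1, C1 + C2)
    order = np.argsort(eigvals)
    # Keep the filters for the largest and smallest eigenvalues: they maximize
    # the variance of one class while minimizing that of the other.
    picks = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return eigvecs[:, picks].T  # one spatial filter w per row

rng = np.random.default_rng(0)
trials_left = rng.standard_normal((20, 8, 250))        # stand-in data
trials_right = rng.standard_normal((20, 8, 250)) * 1.5
W = csp_filters(trials_left, trials_right)
print(W.shape)  # 4 spatial filters over 8 channels
```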
The CSP algorithm has numerous advantages: first, it leads to high classification
performances. CSP is also versatile, since it works for any ERD/ERS BCI. Finally,
it is computationally efficient and simple to implement. Altogether this makes CSP
one of the most popular and efficient approaches for BCI based on oscillatory activity
(Blankertz et al, 2008b).
Nevertheless, despite all these advantages, CSP is not exempt from limitations
and is still not the ultimate signal processing tool for EEG-based BCI. In particu-
lar, CSP has been shown to be non-robust to noise, to non-stationarities and prone
to overfitting (i.e., it may not generalize well to new data) when little training data
is available (Grosse-Wentrup and Buss, 2008) (Grosse-Wentrup et al, 2009) (Reud-
erink and Poel, 2008). Finally, despite its versatility, CSP only identifies the relevant
spatial information but not the spectral one. Fortunately, there are ways to make CSP
robust and stable with limited training data and with noisy training data. An idea is
to integrate prior knowledge into the CSP optimization algorithm. Such knowledge
could represent any information we have about what should be a good spatial filter
for instance. This can be neurophysiological prior, data (EEG signals) or meta-data
(e.g., good channels) from other subjects, etc. This knowledge is used to guide and
constrain the CSP optimization algorithm towards good solutions even with noise,
limited data and non-stationarities (Lotte and Guan, 2011). Formally, this knowl-
edge is represented in a regularization framework that penalizes unlikely solutions
(i.e., spatial filters) that do not satisfy this knowledge, therefore enforcing it. Simi-
larly, prior knowledge can be used to stabilize statistical estimates (here, covariance
matrices) used to optimize the CSP algorithm. Indeed, estimating covariance matri-
ces from few training data usually leads to poor estimates (Ledoit and Wolf, 2004).
JRCSP1(w) = (wC̃1 wT) / (wC̃2 wT + λP(w))    (7.4)

JRCSP2(w) = (wC̃2 wT) / (wC̃1 wT + λP(w))    (7.5)

with C̃i a (possibly regularized) estimate of the spatial covariance matrix of class i,
P(w) a penalty term encoding the prior knowledge (it penalizes unlikely spatial
filters), and λ a user-defined regularization parameter weighting this penalty.
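For instance, with the quadratic penalty P(w) = wwT (the Tikhonov-regularized CSP of Lotte and Guan (2011)), maximizing JRCSP1 amounts to a generalized eigenvalue problem with a ridge term added to the denominator covariance. A sketch under these assumptions, with stand-in covariance matrices:

```python
import numpy as np
from scipy.linalg import eigh

def rcsp_filters(C1, C2, reg=0.1, n_filters=2):
    """Filters maximizing w C1 wT / (w C2 wT + reg * w wT)."""
    n = C1.shape[0]
    # The penalty P(w) = w wT turns the denominator into w (C2 + reg*I) wT.
    eigvals, eigvecs = eigh(C1, C2 + reg * np.eye(n))  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                  # largest ratio first
    return eigvecs[:, order[:n_filters]].T             # one filter per row

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 200))
B = rng.standard_normal((8, 200))
C1, C2 = A @ A.T / 200, B @ B.T / 200  # stand-in class covariance matrices
W = rcsp_filters(C1, C2)
print(W.shape)
```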
Fig. 7.6 Spatial filters (i.e., weight attributed to each channel) obtained to classify left hand versus
right hand motor imagery. The electrodes, represented by black dots, are here seen from above,
with the subject nose on top. a) basic CSP algorithm, b) RCSP with a penalty term imposing
spatial smoothness, c) RCSP with a penalty term penalizing unlikely channels according to EEG
data from other subjects.
For instance, this can be done manually (by trial and error), or by looking
at the average EEG frequency spectrum in each class. In a more automatic way, pos-
sible methods include extracting band power features in multiple frequency bands
and then selecting the relevant ones using feature selection (Lotte et al, 2010), by
computing statistics on the spectrum to identify the relevant frequencies (Zhong
et al, 2008), or even by computing optimal band-pass filters for classification (De-
vlaminck, 2011). These ideas can be used within the CSP framework in order to
optimize the use of both the spatial and spectral information. Several variants of
CSP have been proposed in order to optimize spatial and spectral filters at the same
time (Lemm et al, 2005) (Dornhege et al, 2006) (Tomioka et al, 2006) (Thomas
et al, 2009). A simple and computationally efficient method is worth describing:
the Filter Bank CSP (FBCSP) (Ang et al, 2012). This method, illustrated in Figure
7.7, consists in first filtering EEG signals in multiple frequency bands using a filter
bank. Then, for each frequency band, spatial filters are optimized using the clas-
sical CSP algorithm. Finally, among the multiple spatial filters obtained, the best
resulting features are selected using feature selection algorithms (typically mutual
information-based feature selection). As such, this selects both the best spectral and
spatial filters since each feature corresponds to a single frequency band and CSP
spatial filter. This algorithm, although simple, has proven to be very efficient in
practice. It was indeed the algorithm used in the winning entries of all EEG data
sets from the last BCI competition² (Ang et al, 2012).
Fig. 7.7 Principle of Filter Bank Common Spatial Patterns (FBCSP): 1) band-pass filtering the
EEG signals in multiple frequency bands using a filter bank; 2) optimizing CSP spatial filter for
each band; 3) selecting the most relevant filters (both spatial and spectral) using feature selection
on the resulting features.
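Schematically, and with stand-in CSP filters and random data (the shapes, the seven 4 Hz-wide bands and the 250 Hz rate are illustrative assumptions; in a real FBCSP the CSP filters would be optimized separately per band rather than reused), the three FBCSP steps can be sketched as:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.feature_selection import SelectKBest, mutual_info_classif

fs = 250.0
bands = [(4, 8), (8, 12), (12, 16), (16, 20), (20, 24), (24, 28), (28, 32)]

def bandpass(trials, low, high):
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, trials, axis=-1)

def csp_log_var_features(trials, W):
    # Log-variance (i.e., log band power) of each CSP-filtered signal.
    return np.log(np.var(np.einsum("fc,ncs->nfs", W, trials), axis=-1))

rng = np.random.default_rng(0)
trials = rng.standard_normal((30, 8, 250))  # (n_trials, n_channels, n_samples)
labels = rng.integers(0, 2, size=30)
W = rng.standard_normal((4, 8))  # stand-in CSP filters (reused across bands for brevity)

# 1) filter bank, 2) CSP features per band, 3) mutual-information feature selection.
features = np.hstack([csp_log_var_features(bandpass(trials, lo, hi), W)
                      for lo, hi in bands])
selected = SelectKBest(mutual_info_classif, k=8).fit_transform(features, labels)
print(features.shape, selected.shape)
```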
In summary, when designing BCI aiming at recognizing mental states that involve
oscillatory activity, it is important to consider both the spectral and the spatial in-
formation. In order to exploit the spectral information, using band power features
in relevant frequency bands is an efficient approach. Feature selection is also a nice
tool to find the relevant frequencies. Concerning the spatial information, using or
selecting relevant channels is useful. Spatial filtering is a very efficient solution for
EEG-based BCI in general, and the Common Spatial Patterns (CSP) algorithm is
a must-try for BCI based on oscillatory activity in particular. Moreover, there are
several variants of CSP that are available in order to make it robust to noise, non-
stationarity, limited training data sets or to jointly optimize spectral and spatial fil-
ters. The next section will address the EEG signal processing tools for BCI based
² BCI competitions are contests to evaluate the best signal processing and classification algorithms
on given brain signals data sets. See http://www.bbci.de/competition/ for more info.
on evoked potentials, which are different from the ones described so far, but share
some general concepts.
7.4 EEG signal processing tools for BCI based on event related
potentials
An Event Related Potential (ERP) is a brain response due to some specific stimulus
perceived by the BCI user. A typical ERP used for BCI design is the P300, which is a
positive deflection of the EEG signal occurring about 300 ms after the user has perceived
a rare and relevant stimulus (Fazel-Rezai et al, 2012) (see also Figure 7.8).
Averaged ERP waveforms (electrode Cz) for targets and non-targets - S1 - Standing
(plot of signal amplitude as a function of time, from 0 to 0.6 s after the stimulus)
Fig. 7.8 An example of an average P300 ERP after a rare and relevant stimulus (Target). We can
clearly observe the increase in amplitude about 300 ms after the stimulus, as compared to the non-
relevant stimulus (Non target).
ERP are characterized by specific temporal variations with respect to the stim-
ulus onset. As such, contrary to BCI based on oscillatory activity, ERP-based BCI
exploit mostly temporal information, but rarely spectral information. However, as for
BCI based on oscillatory activity, ERP-based BCI can also benefit a lot from using the
spatial information. The next section illustrates how the spatial and temporal informa-
tion is used in basic P300-based BCI designs.
Fig. 7.9 Recommended electrodes for P300-based BCI design, according to (Krusienski et al,
2006).
Once the relevant spatial information has been identified, here using, for instance, only
the electrodes mentioned above, features can be extracted from the signal of each of them.
For ERP in general, including the P300, the features generally exploit the temporal
information of the signals, i.e., how the amplitude of the EEG signal varies with
time. This is typically achieved by using the values of preprocessed EEG time points
as features. More precisely, features for ERP are generally extracted by 1) low-pass
or band-pass filtering the signals (e.g., in 1-12 Hz for the P300), ERP being generally
slow waves, 2) downsampling the filtered signals, in order to reduce the number of
EEG time points and thus the dimensionality of the problem and 3) gathering the
values of the remaining EEG time points from all considered channels into a feature
vector that will be used as input to a classifier. This process is illustrated in Figure
7.10 to extract features from channel Pz for a P300-based BCI experiment.
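The three steps above can be sketched as follows, with SciPy; the sampling rate, epoch length and decimation factor are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

fs = 250.0
rng = np.random.default_rng(0)
epoch = rng.standard_normal((8, 150))  # 8 channels x 0.6 s of post-stimulus EEG

# 1) Band-pass filter the signals in 1-12 Hz, ERP being generally slow waves.
sos = butter(4, [1.0, 12.0], btype="band", fs=fs, output="sos")
filtered = sosfiltfilt(sos, epoch, axis=-1)

# 2) Downsample the filtered signals (here by a factor of 10) to reduce the
# number of EEG time points, and thus the dimensionality of the problem.
downsampled = decimate(filtered, q=10, axis=-1)

# 3) Gather the remaining time points from all channels into a feature vector.
feature_vector = downsampled.ravel()
print(feature_vector.shape)  # 8 channels x 15 time points = 120 features
```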
Once the features are extracted, they can be provided to a classifier which will be
trained to assign them to the target class (presence of an ERP) or to the non-target
class (absence of an ERP). This is often achieved using classical classifiers such
as LDA or SVM (Lotte et al, 2007). More recently, automatically regularized LDA
have been increasingly used (Lotte and Guan, 2009) (Blankertz et al, 2010), as well
Fig. 7.10 Typical process to extract features from a channel of EEG data for a P300-based BCI
design. In this figure we can see the P300 becoming more visible with the different processing
steps.
as Bayesian LDA (Hoffmann et al, 2008) (Rivet et al, 2009). Both variants of LDA
are specifically designed to be more resistant to the curse-of-dimensionality through
the use of automatic regularization. As such, they have proven to be very effective in
practice, and superior to classical LDA. Indeed, the number of features is generally
higher for ERP-based BCI than for those based on oscillatory activity: many time
points are usually needed to describe an ERP, but only a few frequency bands (or
even a single one) to describe oscillatory activity. Alternatively, feature selection or
channel selection techniques can also be used to deal with this high dimensionality
(Lotte et al, 2009a; Rakotomamonjy and Guigue, 2008; Krusienski et al, 2006).
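As a minimal sketch of such a regularized classifier, scikit-learn's LDA with automatic Ledoit-Wolf shrinkage is similar in spirit to the automatically regularized LDA cited above (it is not the exact algorithm of those papers). The data below is synthetic and purely illustrative of the few-trials, many-features regime typical of ERP-based BCI.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic high-dimensional ERP-like features: few trials, many features
# (illustrative data, not real EEG)
n_trials, n_features = 60, 300
X = rng.standard_normal((n_trials, n_features))
y = rng.integers(0, 2, n_trials)   # target vs. non-target labels
X[y == 1, :10] += 1.0              # small class difference on a few features

# shrinkage='auto' uses the Ledoit-Wolf estimator to regularize the
# covariance matrix automatically, which is what makes LDA resistant to
# the curse-of-dimensionality discussed in the text.
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
clf.fit(X, y)
print(clf.score(X, y))
```

Without shrinkage, estimating a 300 x 300 covariance matrix from 60 trials would be hopelessly ill-conditioned; the automatic regularization is what makes training feasible here.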
As for BCI based on oscillatory activity, spatial filters can also prove very useful.
As mentioned above, with ERP the number of features is usually quite large, with
many features per channel and many channels used. The tools described for oscilla-
tory activity-based BCI, i.e., feature selection, channel selection and spatial filtering,
can be used to deal with this. While feature and channel selection algorithms are
the same (these are generic algorithms), spatial filtering algorithms for ERP are dif-
ferent. One may wonder why CSP could not be used for ERP classification. This
is because crucial information for classifying ERP lies in the EEG time course,
which CSP completely ignores, as it only considers the average signal power.
Therefore, CSP is not suitable for ERP classification. Fortunately,
other spatial filters have been specifically designed for this task.
One useful spatial filter available is the Fisher spatial filter (Hoffmann et al,
2006). This filter uses the Fisher criterion for optimal class separability. Informally,
this criterion aims at maximizing the between-class variance, i.e., the distance be-
tween the different classes (we want the feature vectors from the different classes to
be as far apart from each other as possible, i.e., as different as possible), while mini-
mizing the within-class variance, i.e., the distance between the feature vectors from
the same class (we want the feature vectors from the same class to be as similar as
possible). Formally, this means maximizing the following objective function:
                         J_Fisher = tr(S_b) / tr(S_w)                        (7.9)

with

                S_b = Σ_{k=1}^{N_c} p_k (x̄_k − x̄)(x̄_k − x̄)^T              (7.10)

and

           S_w = Σ_{k=1}^{N_c} p_k Σ_{i∈C_k} (x_i − x̄_k)(x_i − x̄_k)^T      (7.11)

where N_c is the number of classes, p_k the probability of class k, x̄_k the mean
feature vector of class k, x̄ the mean of all feature vectors and C_k the set of
samples belonging to class k. Another spatial filter specifically designed for ERP
classification is xDAWN (Rivet et al, 2009), which aims at maximizing the signal-
to-signal-plus-noise ratio:

              J_xDAWN = (w A D D^T A^T w^T) / (w X X^T w^T)                 (7.12)
where A is the time course of the ERP response to detect for each channel (esti-
mated from the data, usually using a least-squares estimate) and D is a matrix
containing the positions of the target stimuli that should evoke the ERP. In this
equation, the numerator represents the signal, i.e., the relevant information we want
to enhance. Indeed, w A D D^T A^T w^T is the power of the time course of the ERP
responses after spatial filtering. By contrast, in the denominator, w X X^T w^T is the
variance of all EEG signals after spatial filtering; it thus contains both the signal
(the ERP) and the noise. Therefore, maximizing J_xDAWN maximizes the signal,
i.e., it enhances the ERP response, while simultaneously minimizing the signal plus
the noise, i.e., it makes the noise as small as possible (Rivet et al, 2009). This has indeed been
shown to lead to much better ERP classification performance.
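As a concrete illustration of the Fisher criterion of equations (7.9)-(7.11), spatial filters can be obtained by solving a generalized eigenproblem on the between- and within-class scatter matrices. The sketch below is a deliberate simplification of (Hoffmann et al, 2006), not the exact published algorithm; all data and parameter values are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_spatial_filters(trials, labels, n_filters=2):
    """Sketch of Fisher-criterion spatial filters (simplified).

    trials: array (n_trials, n_channels, n_times); labels: array (n_trials,).
    Each time point of each trial is treated as one sample x_i (a vector of
    channel values). S_b and S_w are built as in equations (7.10)-(7.11), and
    the generalized eigenvectors of (S_b, S_w) with the largest eigenvalues
    are kept as spatial filters.
    """
    n_trials, n_channels, n_times = trials.shape
    # Samples: all time points of all trials, as channel vectors
    samples = trials.transpose(0, 2, 1).reshape(-1, n_channels)
    sample_labels = np.repeat(labels, n_times)
    grand_mean = samples.mean(axis=0)
    Sb = np.zeros((n_channels, n_channels))
    Sw = np.zeros((n_channels, n_channels))
    for k in np.unique(labels):
        xk = samples[sample_labels == k]
        pk = len(xk) / len(samples)            # class probability p_k
        d = (xk.mean(axis=0) - grand_mean)[:, None]
        Sb += pk * (d @ d.T)                   # between-class scatter
        Sw += pk * np.cov(xk.T, bias=True)     # within-class scatter
    # Generalized eigenproblem S_b w = lambda S_w w; the largest
    # eigenvalues maximize the Fisher criterion J_Fisher
    eigvals, eigvecs = eigh(Sb, Sw + 1e-10 * np.eye(n_channels))
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_filters]].T     # filters as rows

# Toy usage: 20 trials, 8 channels, 50 time points (random data)
rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 8, 50))
labels = rng.integers(0, 2, 20)
W = fisher_spatial_filters(trials, labels)
print(W.shape)  # (2, 8): two spatial filters over eight channels
```

Note that, unlike CSP, this criterion operates on class means of the time course rather than on average power, which is why it remains applicable to ERP.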
In practice, spatial filters have proven to be useful for ERP-based BCI (in particu-
lar for P300-based BCI), especially when little training data is available. From a the-
oretical point of view, this was to be expected. Indeed, contrary to CSP and band
power, which extract non-linear features (the power of the signal is a quadratic
operation), features for ERP are all linear, and the composition of linear operations
is itself a linear operation. Since BCI classifiers, e.g., LDA, are generally also
linear, this means that the classifier could theoretically learn the spatial filter as
well. Indeed, both linearly combining the signals X for spatial filtering (F = WX),
then linearly combining the spatially filtered signals for classification
(y = wF = w(WX) = ŴX), and directly linearly combining the original signals for
classification (y = ŵX) are overall a single
linear operation. If enough training data is available, the classifier, e.g., LDA, would
not need spatial filtering. However, in practice, there is often little training data
available, and first performing spatial filtering eases the subsequent task of the
classifier by reducing the dimensionality of the problem. Altogether, this means that
with enough training data, spatial filtering for ERP may not be necessary, and
letting the classifier learn everything may be the better option. Otherwise, if little
training data is available, which is often the case in practice, then spatial filtering
can greatly benefit ERP classification (see also (Rivet et al, 2009) for more discussion of this
topic).
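The linearity argument above can be checked numerically: spatial filtering followed by a linear classifier is, by associativity of matrix products, a single linear map on the raw signals. All matrices below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 100))   # 16 channels x 100 time points
W = rng.standard_normal((4, 16))     # 4 (hypothetical) spatial filters
w = rng.standard_normal(4)           # linear classifier weights on the
                                     # spatially filtered signals

y_two_steps = w @ (W @ X)            # filter, then classify
y_one_step = (w @ W) @ X             # equivalent single linear map ŵ = wW
print(np.allclose(y_two_steps, y_one_step))  # True
```

The practical difference is thus not expressive power but estimation: learning the low-dimensional w after spatial filtering requires far less training data than learning the full map directly.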
So far, this chapter has described the main tools used to recognize mental states in
EEG-based BCI. They are efficient and usually simple tools that have become part
of the standard toolbox of BCI designers. However, there are other signal process-
ing tools, and in particular other kinds of features or information sources that can
be exploited to process EEG signals. Without being exhaustive, this section briefly
presents some of these tools for interested readers, together with corresponding
references. The alternative EEG feature representations that can be used include the
following 4 categories:
While these various alternative features may not be as efficient as standard tools
such as band power features, they usually extract complementary information.
Consequently, using band power features together with some of these alternative
features has led to increased classification performance, higher than the perfor-
mance obtained with any of these features used alone (Dornhege et al, 2004;
Brodu et al, 2012; Lotte, 2012).
It is also important to realize that while several spatial filters have been designed
for BCI, they are optimized for a specific type of feature. For instance, CSP is the
optimal spatial filter for Band Power features and xDAWN or Fisher spatial filters
are optimal spatial filters for EEG time point features. However, using such spa-
tial filters with other features, e.g., with the alternative features described above,
would be clearly suboptimal. Designing and using spatial filters dedicated to these
alternative features thus remains an open research question.
7.6 Discussion
Many EEG signal processing tools are available in order to classify EEG signals
into the corresponding user’s mental state. However, EEG signal processing is a
very difficult task, due to the noise, non-stationarity, complexity of the signals as
well as due to the limited amount of training data available. As such, the existing
tools are still not perfect, and many research challenges are still open. In particular,
it is necessary to explore and design EEG features that are 1) more informative, in
order to reach better performances, 2) robust to noise and artifacts, in order to use the
BCI outside laboratories, potentially with moving users, 3) invariant, to deal with
non-stationarity and session-to-session transfer and 4) universal, in order to design
subject-independent BCI, i.e., BCI that can work for any user, without the need for
individual calibration. As we have seen, some existing tools can partially address,
or at least mitigate, such problems. Nevertheless, there is so far no EEG signal pro-
cessing tool that simultaneously has all these properties and that is perfectly robust,
invariant and universal. Therefore, there are still exciting research works ahead.
7.7 Conclusion
In this chapter, we have provided a tutorial and overview of EEG signal processing
tools for users’ mental state recognition. We have presented the importance of the
feature extraction and classification components. As we have seen, there are 3 main
sources of information that can be used to design EEG-based BCI: 1) the spectral
information, which is mostly used with band power features; 2) the temporal infor-
mation, represented as the amplitude of preprocessed EEG time points and 3) the
spatial information, which can be exploited by using channel selection and spatial
filtering (e.g., CSP or xDAWN). For BCI based on oscillatory activity, the spectral
and spatial information are the most useful, while for ERP-based BCI, the temporal
and spatial information are the most relevant. We have also briefly explored some
alternative sources of information that can also complement the 3 main sources
mentioned above.
This chapter aimed at being didactic and easily accessible, in order to help people
not already familiar with EEG signal processing to start working in this area or to
start designing and using BCI in their own work or activities. Indeed, BCI being
such a multidisciplinary topic, it is usually difficult to understand enough of the
different scientific domains involved to appropriately use BCI systems. It should
also be mentioned that several software tools are now freely available to help users
design BCI systems, e.g., Biosig (Schlögl et al, 2007), BCI2000 (Mellinger and
Schalk, 2007) or OpenViBE (Renard et al, 2010). For instance, with OpenViBE, it
is possible to design a new and complete BCI system without writing a single line of
code. With such tools and this tutorial, we hope to make BCI design and use more
accessible, e.g., to design musical BCI.
7.8 Questions
Please find below 10 questions to reflect on this chapter and try to grasp the essential
messages:
1. Do we need feature extraction? In particular, why not use the raw EEG signals
as input to the classifier?
2. What part of the EEG signal processing pipeline can be trained/optimized based
on the training data?
3. Can we design a BCI system that would work for all users (a so-called subject-
independent BCI)? If so, are BCI designed specifically for one subject still rele-
vant?
4. Are univariate and multivariate feature selection methods both suboptimal in gen-
eral? If so, why use one type or the other?
5. By using an inverse solution with scalp EEG signals, can I always obtain similar
information about brain activity to what I would get with invasive recordings?
6. What would be a good reason to avoid using spatial filters for BCI?
7. Which spatial filter should you try when designing an oscillatory activity-based
BCI?
8. Let us assume that you want to design an EEG-based BCI, whatever its type: can
CSP always be useful to design such a BCI?
9. Among typical features for oscillatory activity-based BCI (i.e., band power fea-
tures) and ERP-based BCI (i.e., amplitude of the preprocessed EEG time points),
which ones are linear and which ones are not (if applicable)?
10. Let us assume you want to explore a new type of features to classify EEG data:
could they benefit from spatial filtering and if so, which one?
References
Ang K, Chin Z, Wang C, Guan C, Zhang H (2012) Filter bank common spatial pat-
tern algorithm on BCI competition IV datasets 2a and 2b. Frontiers in Neuroscience
6
Arvaneh M, Guan C, Ang K, Quek H (2011) Optimizing the channel selection and
classification accuracy in EEG-based BCI. IEEE Transactions on Biomedical Engi-
neering 58:1865–1873
Lotte F (2012) A new feature and associated optimal spatial filter for EEG signal
classification: Waveform length. In: International Conference on Pattern Recog-
nition (ICPR), pp 1302–1305
Lotte F, Guan C (2009) An efficient P300-based brain-computer interface with min-
imal calibration time. In: Assistive Machine Learning for People with Disabilities
symposium (NIPS’09 Symposium)
Lotte F, Guan C (2010a) Learning from other subjects helps reducing brain-
computer interface calibration time. In: International Conference on Audio,
Speech and Signal Processing (ICASSP’2010), pp 614–617
Lotte F, Guan C (2010b) Spatially regularized common spatial patterns for EEG
classification. In: International Conference on Pattern Recognition (ICPR)
Lotte F, Guan C (2011) Regularizing common spatial patterns to improve BCI de-
signs: Unified theory and new algorithms. IEEE Transactions on Biomedical En-
gineering 58(2):355–362
Lotte F, Congedo M, Lécuyer A, Lamarche F, Arnaldi B (2007) A review of classi-
fication algorithms for EEG-based brain-computer interfaces. Journal of Neural
Engineering 4:R1–R13
Lotte F, Fujisawa J, Touyama H, Ito R, Hirose M, Lécuyer A (2009a) Towards am-
bulatory brain-computer interfaces: A pilot study with P300 signals. In: 5th Ad-
vances in Computer Entertainment Technology Conference (ACE), pp 336–339
Lotte F, Lécuyer A, Arnaldi B (2009b) FuRIA: An inverse solution based feature
extraction algorithm using fuzzy set theory for brain-computer interfaces. IEEE
transactions on Signal Processing 57(8):3253–3263
Lotte F, Langhenhove AV, Lamarche F, Ernest T, Renard Y, Arnaldi B, Lécuyer A
(2010) Exploring large virtual environments by thoughts using a brain-computer
interface based on motor imagery and high-level commands. Presence: teleoper-
ators and virtual environments 19(1):54–70
Mason S, Birch G (2003) A general framework for brain-computer interface design.
IEEE Transactions on Neural Systems and Rehabilitation Engineering 11(1):70–
85
McFarland DJ, Wolpaw JR (2005) Sensorimotor rhythm-based brain-computer in-
terface (BCI): feature selection by regression improves performance. IEEE Trans-
actions on Neural Systems and Rehabilitation Engineering 13(3):372–379
McFarland DJ, McCane LM, David SV, Wolpaw JR (1997) Spatial filter selection
for EEG-based communication. Electroencephalographic Clinical Neurophysiol-
ogy 103(3):386–394
McFarland DJ, Anderson CW, Müller KR, Schlögl A, Krusienski DJ (2006) BCI
meeting 2005-workshop on BCI signal processing: feature extraction and trans-
lation. IEEE Transactions on Neural Systems and Rehabilitation Engineering
14(2):135 – 138
Mellinger J, Schalk G (2007) BCI2000: A general-purpose software platform for
BCI research. In: Dornhege G, Millán JR, et al (eds) Toward Brain-Computer
Interfacing, MIT Press, pp 372–381
Michel C, Murray M, Lantz G, Gonzalez S, Spinelli L, de Peralta RG (2004) EEG
source imaging. Clin Neurophysiol 115(10):2195–2222