A Comparative Analysis On Feature Extraction and Classification of EEG Signal For Brain-Computer Interface Applications
Keywords: Brain-Computer Interface (BCI), Time Domain Parameters (TDP), Adaptive Auto-Regressive Parameters (AAR),
Linear Discriminant Analysis (LDA), Support Vector Machine (SVM)
band pass filtered signals is equal to the band power, CSP increases the discrimination of mental states that are characterized by ERD/ERS [9].

Let X ∈ R^(N×T) be the multichannel EEG data of N channels and T sample points, and let {c_1, c_2, ..., c_M} be the set of classes. CSP finds a linear transformation W ∈ R^(N×L) such that

    X_CSP = W^T X    (1)

When L = N, there is no reduction of dimensionality; only the discriminativity is increased. On the other hand, for L < N, dimensionality reduction is provided along with increased discrimination between the classes. The columns of W are denoted w_j, where j = 1, 2, ..., L; each column of W is called a spatial filter.

To calculate the transformation matrix W, a covariance matrix is estimated for each of the two classes. The optimal W is then computed by simultaneous diagonalization of the two covariance matrices. The simultaneous diagonalization can be posed as a generalized eigenvalue problem, and dimensionality reduction is implemented by keeping the spatial filters with the smallest/largest eigenvalues.

As the dataset used in this research requires a multiclass implementation of CSP, Joint Approximate Diagonalization (JAD) was used to solve the multiclass problem [9].

To implement CSP together with the dimensionality reduction scheme, L was taken to be 8 in this paper; hence the dimensionality of the processed dataset was reduced to 8 spatially filtered channels with 1875 data points.
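To make the two-class construction above concrete, here is a minimal Python sketch (an illustration only, not the paper's implementation; the trial arrays trials_a and trials_b of shape (n_trials, N, T) are assumed names) that solves the simultaneous diagonalization as a generalized eigenvalue problem:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, L=8):
    """Two-class CSP: returns W of shape (N, L); columns are spatial filters."""
    def mean_cov(trials):
        # Trial-wise normalized spatial covariance, averaged over trials
        covs = [X @ X.T / np.trace(X @ X.T) for X in trials]
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenvalue problem: Ca w = lambda (Ca + Cb) w
    eigvals, eigvecs = eigh(Ca, Ca + Cb)
    # Keep the filters with the smallest and largest eigenvalues: variance is
    # maximal for one class and minimal for the other
    order = np.argsort(eigvals)
    keep = np.concatenate([order[:L // 2], order[-(L // 2):]])
    return eigvecs[:, keep]

# A spatially filtered trial is then X_csp = W.T @ X, as in Equation 1.
```

The multiclass JAD variant used in the paper generalizes this step by approximately diagonalizing all class covariance matrices at once.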
2.3 Feature extraction
In this research, two different types of features were extracted from the pre-processed EEG data. One of them is a time domain feature; the other is based on time series analysis. These are: Time Domain Parameters (TDP) and Adaptive Auto-Regressive (AAR) parameters. For different classes these features show different discriminative properties, which is what allows a classifier (at the next step) to classify the data.

2.3.1 Time Domain Parameters (TDP)

Time Domain Parameters (TDP) were first introduced in [10] and can be considered a more generalized representation of the Hjorth parameters. TDPs are obtained by calculating the variances of derivatives of the signal of different orders. For each order of derivative, a TDP can be obtained as

    p_i(t) = var( d^i x(t) / dt^i ),  i = 1, 2, ..., k    (2)

However, for better performance, the parameters are smoothed with an IIR filter implementing the following expression:

    p̄_i(t) = (1 − u) p̄_i(t − 1) + u p_i(t)    (3)

where u is called the update coefficient and has an important effect on the performance of TDP as features in BCI. As shown in the results section, the optimal value of k is 5; hence the size of the feature vector is 40 for TDP features (5 parameters for each of the 8 spatially filtered channels).
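The following sketch shows how Equations 2 and 3 can be turned into a per-trial TDP vector in Python. It assumes a spatially filtered trial X of shape (8, 1875) from the CSP stage; estimating each derivative's running variance as the smoothed instantaneous power, and taking the logarithm of its final value, are assumptions of this sketch rather than details stated in the paper.

```python
import numpy as np

def tdp_features(X, k=5, u=0.0045):
    """TDP features for one trial.

    X: (channels, samples) spatially filtered EEG, e.g. (8, 1875).
    Returns channels * k features (8 * 5 = 40 with the optimal k).
    """
    feats = []
    for x in X:
        for i in range(1, k + 1):
            d = np.diff(x, n=i)               # i-th order derivative (Eq. 2)
            p = np.empty_like(d)
            p[0] = d[0] ** 2
            for t in range(1, len(d)):        # IIR smoothing (Eq. 3)
                p[t] = (1 - u) * p[t - 1] + u * d[t] ** 2
            feats.append(np.log(p[-1] + 1e-12))
    return np.array(feats)
```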
2.3.2 Adaptive Auto-Regressive (AAR) parameters

The AAR method is appropriate for on-line and single-trial analysis of the time-varying EEG spectrum. When no averaging over an ensemble of recordings is possible, the AAR method is very useful for extracting features from the EEG signal for BCI classifiers [11].

An Auto-Regressive model is useful for describing the stochastic behavior of an EEG time series. This can be described as

    y_t = a_1 y_{t−1} + a_2 y_{t−2} + ... + a_p y_{t−p} + x_t    (4)

with

    x_t ~ N(0, σ_x²),  t ∈ {1, 2, ..., T}    (5)

where x_t is a zero-mean Gaussian noise process with variance σ_x². The index t is an integer and describes discrete, equidistant time points. Here, p is the model order, and the AR parameters a_1, ..., a_p of an AR(p) model can be used as features.

However, to account for the non-stationarity of the EEG signal, the AR parameters are allowed to vary in time; these time-varying parameters are known as AAR parameters. The model therefore changes to

    y_t = a_{1,t} y_{t−1} + a_{2,t} y_{t−2} + ... + a_{p,t} y_{t−p} + x_t    (6)

The AAR parameters were estimated using scalar Kalman filtering. As shown in the results section, the optimal value of the AAR model order is 6; hence the size of the feature vector is 48 for AAR features (6 parameters for each of the 8 spatially filtered channels).
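A minimal sketch of the scalar Kalman estimation for one channel follows, assuming a random-walk state model for the coefficient vector and using the update coefficient UC to inflate the state covariance; this is one common formulation, and the paper's exact Kalman variant may differ.

```python
import numpy as np

def aar_kalman(y, p=6, uc=0.004):
    """Time-varying AR(p) coefficients of one channel via Kalman filtering.

    y: 1-D EEG signal. Returns the final coefficient vector a_t (using it
    as the per-channel feature is an assumption of this sketch).
    """
    a = np.zeros(p)                     # AAR coefficients (state estimate)
    P = np.eye(p)                       # state error covariance
    v = np.var(y)                       # observation noise variance estimate
    for t in range(p, len(y)):
        h = y[t - p:t][::-1]            # regression vector of p past samples
        P = P + uc * (np.trace(P) / p) * np.eye(p)   # random-walk prediction
        e = y[t] - h @ a                # one-step prediction error (Eq. 6)
        k = P @ h / (h @ P @ h + v)     # Kalman gain
        a = a + k * e
        P = P - np.outer(k, h) @ P
    return a

# One trial: concatenating aar_kalman(ch) over 8 channels gives 48 features.
```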
2.4 Classification

To classify each trial of the multichannel EEG data, in other words to predict a class label for each trial, the extracted features were fed to a classifier. The performance of a BCI system depends significantly on which type of classifier is used. In this research, two different types of classifiers were used on the extracted EEG features: Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM).

2.4.1 Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA) is a generalization of Fisher's linear discriminant, used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events [12].

Let x = [x_1, x_2, ..., x_n]^T be the feature vector of EEG data of M different classes represented by the set {c_1, c_2, ..., c_M}. Then the discriminant functions are represented as

    g_i(x) = w_i^T x + w_{i0},  i = 1, 2, ..., M    (7)

where w_i is the weight vector and w_{i0} is the bias of the i-th discriminant function. The classification is done as: if g_i(x) > g_j(x) for all j ≠ i, then x ∈ c_i.

Training of the classifier is done using a one-vs-rest scheme, calculating the optimal value of the weight vector for the discriminant function of Equation 7 as

    ŵ_i = Σ̂⁻¹ ( x̄_i − x̄_r )    (8)

where x̄_i is the mean of the data of the class for which ŵ_i is to be calculated, x̄_r is the mean of the data of the rest of the classes, and Σ̂ is the pooled covariance matrix of the two groups.
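A compact NumPy sketch of this one-vs-rest training (the midpoint bias and the argmax decision over the M discriminant values are assumptions of this sketch):

```python
import numpy as np

def lda_train_ovr(X, y, n_classes):
    """One-vs-rest LDA per Equations 7 and 8.

    X: (n_trials, n_features) features, y: integer labels 0..M-1.
    Returns per-class weight vectors W and biases b.
    """
    W, b = [], []
    for i in range(n_classes):
        Xi, Xr = X[y == i], X[y != i]
        mi, mr = Xi.mean(axis=0), Xr.mean(axis=0)
        ni, nr = len(Xi), len(Xr)
        # Pooled covariance of class i versus the rest
        S = ((ni - 1) * np.cov(Xi.T) + (nr - 1) * np.cov(Xr.T)) / (ni + nr - 2)
        w = np.linalg.solve(S, mi - mr)              # Equation 8
        W.append(w)
        b.append(-w @ (mi + mr) / 2)                 # bias at the midpoint
    return np.array(W), np.array(b)

def lda_predict(W, b, X):
    return np.argmax(X @ W.T + b, axis=1)            # Equation 7 + argmax
```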
2.4.2 Support Vector Machine (SVM)

SVM classifies linearly separable two-class data by constructing an optimal hyperplane so that the margin of separation between the two classes is maximized [13].

Let x = [x_1, x_2, ..., x_n]^T be the feature vector of EEG data for two linearly separable classes. Then the hyperplane separating the two classes can be represented as

    w^T x + b = 0    (9)

where w is the weight vector and b is the bias. For a given weight vector w and bias b, the separation between the closest data point (known as a support vector) and the hyperplane represented by Equation 9 is called the margin of separation, ρ. Training an SVM is nothing but finding optimal values of w and b so that ρ is maximized; this is done by solving an optimization problem with the method of Lagrange multipliers. However, for data which are not linearly separable, a non-linear mapping to a higher dimensional space is necessary. To execute the non-linear mapping, inner-product kernels are used in the SVM. These kernels are of different types, such as the polynomial kernel, the radial-basis function (RBF) kernel, etc. In this research the RBF kernel was used, which can be expressed as

    K(x, x_i) = exp( −‖x − x_i‖² / (2σ²) )    (10)

The parameter σ is called the kernel width and has a significant effect on the performance of the classifier. However, as the EEG data classified in this research consist of more than two classes, a one-vs-one scheme of SVM was implemented [13].
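With a library SVM this reduces to mapping the kernel width onto the library's kernel parameter. Below is a sketch using scikit-learn (the library choice and the array names X_train, y_train, X_test are assumptions, not the paper's stated implementation); its RBF kernel exp(−γ‖x − x_i‖²) matches Equation 10 when γ = 1/(2σ²), and SVC handles multiclass data with exactly the one-vs-one scheme described above:

```python
from sklearn.svm import SVC

sigma = 4.0                          # optimal kernel width from the results
clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2),
          decision_function_shape="ovo")   # one-vs-one multiclass scheme
clf.fit(X_train, y_train)            # X_train: (n_trials, n_features)
y_pred = clf.predict(X_test)
```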
2.5 Evaluation

To analyze the performance of BCI systems, some evaluation criteria must be applied. The most popular is accuracy. However, because of some strict prerequisites, accuracy is not always a suitable criterion, and other evaluation criteria have been proposed [14]. In this research work, Cohen's kappa coefficient was applied to evaluate the performance of the classifiers with the different features.

At first, a confusion matrix, as shown in Table 1, was calculated from the output of the classifier, where the rows represent the true classes and the columns represent the classes predicted by the classifier.

Table 1. Example of a Confusion Matrix for M = 4 classes

    Class      1     2     3     4   Total
    1         63     4     2     3      72
    2          2    67     2     1      72
    3          5     3    58     6      72
    4          2     3     3    64      72
    Total     72    77    65    74     288

Then the classification accuracy can be calculated as

    ACC = p_0 = ( Σ_{i=1..M} n_ii ) / ( Σ_{i=1..M} Σ_{j=1..M} n_ij )    (11)

where n_ij is the entry of the confusion matrix in row i and column j.

Finally, Cohen's kappa coefficient κ was calculated as

    κ = (p_0 − p_e) / (1 − p_e)    (12)

where p_0 is the overall agreement (the accuracy of Equation 11) and p_e = ( Σ_{i=1..M} n_i· n_·i ) / N² is the chance agreement, with n_i· and n_·i the row and column totals of the confusion matrix and N the total number of trials.
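Applied to the example confusion matrix of Table 1, Equations 11 and 12 evaluate as follows (a short NumPy sketch):

```python
import numpy as np

# Table 1 (rows: true classes, columns: predicted classes)
C = np.array([[63,  4,  2,  3],
              [ 2, 67,  2,  1],
              [ 5,  3, 58,  6],
              [ 2,  3,  3, 64]])

N = C.sum()                                         # 288 trials
p0 = np.trace(C) / N                                # Eq. 11: accuracy = 0.875
pe = (C.sum(axis=1) * C.sum(axis=0)).sum() / N**2   # chance agreement = 0.25
kappa = (p0 - pe) / (1 - pe)                        # Eq. 12: kappa = 0.833
print(f"accuracy = {p0:.3f}, kappa = {kappa:.3f}")
```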
2.6 Optimization of performance parameters

The performance parameters (e.g. the update coefficient u and the number of derivatives k for TDP, the update coefficient UC and model order p for AAR, and the kernel width σ for the SVM) were optimized using cross-validation. It is important to note that during cross-validation only the training set was used.

In this process, for each candidate value of a parameter, 8-fold cross-validation was carried out by dividing the training set of each subject into 8 subsets: 7 of them were used for training and the remaining subset was used for testing. The process was repeated 8 times, each time with a different subset as the test set, and the average value of kappa was calculated. The value of the parameter with the highest average kappa was chosen as the optimal value.
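The search itself can be sketched as follows (scikit-learn's KFold is an assumed convenience, and X_train, y_train are the training features and labels; the kappa computation repeats the one shown above). The SVM kernel width is used as the example parameter:

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.svm import SVC

def cohen_kappa(y_true, y_pred):
    C = confusion_matrix(y_true, y_pred)
    n = C.sum()
    p0 = np.trace(C) / n
    pe = (C.sum(axis=1) * C.sum(axis=0)).sum() / n ** 2
    return (p0 - pe) / (1 - pe)

def cv_kappa(make_clf, X, y, n_folds=8):
    """Average kappa over an 8-fold split of the training set only."""
    scores = []
    for tr, te in KFold(n_splits=n_folds).split(X):
        clf = make_clf()
        clf.fit(X[tr], y[tr])
        scores.append(cohen_kappa(y[te], clf.predict(X[te])))
    return np.mean(scores)

# Sweep the kernel width and keep the value with the highest average kappa
best_sigma = max([0.5, 1, 2, 4, 8], key=lambda s: cv_kappa(
    lambda: SVC(kernel="rbf", gamma=1 / (2 * s ** 2)), X_train, y_train))
```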
3. Results and Discussion

3.1 Optimized performance parameters

As stated earlier, the performance parameters were optimized using cross-validation. The results are presented below.

Figure 3 represents the variation of the cross-validation kappa κ with the change of the update coefficient u for TDP feature extraction. As seen in the figure, the cross-validation kappa decreases with the increase of u, and the optimal value of u was found to be 0.0045.

Fig. 3: Change of cross-validation kappa with update coefficient, u, for TDP features.

Figure 4 represents the variation of the cross-validation kappa with the number of derivatives, k, used for TDP feature extraction; the optimal value is 5. It is important to note that the inclusion of higher orders of derivatives increases the processing time of TDP feature extraction.

Fig. 4: Variation of cross-validation kappa with no. of derivatives, k, as TDP features.

Figure: Variation of cross-validation kappa with SVM kernel width.

The optimized values of the performance parameters are summarized below.

    Parameter Name                                      Optimal Value
    AAR Update Coefficient, UC & Model Order, p         0.004 & 6
    Update Coefficient to extract TDP features, u       0.0045
    No. of derivatives extracted as TDP features, k     5
    SVM Kernel Width, σ                                 4

3.2 Final Experiment Results on Test Set

Finally, after selection of the optimized parameters, these parameters were used in the corresponding methods to classify the EEG trials of the test set. In these final experiments, the corresponding classifiers were trained using the
13. Haykin, S. S. (2009). Neural Networks and Learning Machines (3rd ed.). Upper Saddle River, NJ, USA: Pearson.

14. Schlögl, A., Kronegg, J., Huggins, J., & Mason, S. (2007). Evaluation criteria for BCI research. In Toward Brain-Computer Interfacing.

15. BCI Competition IV: Results. [online] Available at: http://www.bbci.de/competition/iv/results/index.html [Accessed 22 Aug. 2018].