678948ab0fca4_Impulse_2025_Problem_Statement
1 Instructions
1. The hackathon begins at 15:00hrs on 17 January 2025 and ends at 15:00hrs
on 19 January 2025.
2. Submissions can be in the form of Jupyter Notebooks or GitHub repositories.
Make sure that the outputs are clearly visible in the submission.
Participants must also submit a test outputs.csv file for the evaluation
of their model on the test set.
2 Introduction
Biomedical signal processing refers to the application of techniques and
algorithms to analyze physiological signals from the human body, such as ECG,
EEG, EMG, and others. As healthcare becomes more data-driven, biomedical
signal processing can play an important role in extracting meaningful
information from such data. The field aims to turn raw physiological signals
into concrete insights and actionable diagnoses. It not only enhances clinical
decision-making but also supports the development of new medical technologies,
offering the potential for real-time health monitoring, early detection of
diseases, and remote patient care.
For the final round of Impulse 2025, participants are challenged to develop a
robust model for classifying seizure types from Electroencephalogram (EEG)
signals. An equally important part of the task is to implement explainability
techniques that promote clinical trust in a hospital setting. The problem
statement is motivated by the critical role of EEG in neurology, where it helps
in diagnosing and managing neurological disorders. Manual EEG interpretation
requires skill and is subjective. Integrating machine learning with
explainability can yield a practical and clinically accurate tool that is ready
for use.
3 The Dataset
The dataset provided for this task consists of EEG signals categorized into four
classes: three seizure classes and one seizure-free class. The seizure classes
are Complex Partial Seizures, Electrographic Seizures, and Video detected
Seizures with no visual change over the EEG. Participants are required to
download the dataset from this link. The README file at the link has more
in-depth information about the dataset.
domain, help identify potential signal patterns that can serve as features for
machine learning models in future classification tasks.
4.1 Tasks
• As the first part of this task, select one data point from each of the four
classes in the train data folder. For each selected data point, plot the EEG
signal for all 19 channels separately and create a single plot where all
19 EEG channels are superimposed on the same graph. This will result
in a total of 20 plots per data point: 19 individual channel plots and 1
combined plot. These visualizations will help in inspecting the amplitude
and general behavior of the signal over time.
• Also, compute the following basic time-domain statistics for each channel:
– Mean
– Zero Crossing Rate
– Range
– Energy
– RMS
– Variance
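The six metrics listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name time_domain_metrics, the (n_channels, n_samples) array layout, and the toy two-channel input are assumptions made for the example. Definitions also vary between sources; here energy is the sum of squared samples, variance is the population variance, and the zero crossing rate is the fraction of consecutive sample pairs whose signs differ.

```python
import numpy as np

def time_domain_metrics(signal):
    """Per-channel time-domain metrics.

    signal: array of shape (n_channels, n_samples) (assumed layout).
    Returns a dict mapping metric name -> array of shape (n_channels,).
    """
    signal = np.asarray(signal, dtype=float)
    signs = np.signbit(signal)  # True where a sample is negative
    return {
        "mean": signal.mean(axis=1),
        # fraction of neighbouring sample pairs with different signs
        "zero_crossing_rate": np.mean(signs[:, 1:] != signs[:, :-1], axis=1),
        "range": signal.max(axis=1) - signal.min(axis=1),
        "energy": np.sum(signal ** 2, axis=1),
        "rms": np.sqrt(np.mean(signal ** 2, axis=1)),
        "variance": signal.var(axis=1),  # population variance (ddof=0)
    }

# Toy input standing in for one recording (hypothetical, 2 channels only).
toy = np.array([[1.0, -1.0, 1.0, -1.0],
                [0.0, 2.0, 4.0, 6.0]])
metrics = time_domain_metrics(toy)
for name, values in metrics.items():
    print(name, values)
```

For a real data point the same function would be called on the full 19-channel array once it has been loaded according to the dataset README.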
both time and frequency. This allows for the analysis of non-stationary signals,
which exhibit changes in frequency content over time, as is often the case with
biomedical signals like EEG.
5.1 Tasks
• Extract frequency domain features from the data points in train data
folder. Also, present the extracted features for at least 1 data point from
each class. The frequency domain features that need to be extracted are:
– Fourier Transform
– Wavelet Decomposition
• Perform Wavelet Decomposition for 4 levels. Present the Approximation
and Detail coefficients channel-wise and, for a given channel, show
which coefficient is most similar to the original signal.
• Generate and analyze spectrograms for a data point to visualize and in-
terpret the time-frequency characteristics of each channel.
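As a rough sketch of the Fourier and wavelet steps above, the example below computes a magnitude spectrum with NumPy and a 4-level decomposition with a hand-written Haar wavelet. Several things here are assumptions for illustration only: the Haar wavelet itself (the task does not name a wavelet, and a library such as PyWavelets with a different wavelet family would be a common choice), the 256 Hz sampling rate, the synthetic single-channel test signal, and the use of correlation with a band-wise reconstruction as the measure of "most similar". Spectrograms are not covered here.

```python
import numpy as np

def haar_wavedec(x, level=4):
    """Multi-level Haar DWT. Returns [cA_level, cD_level, ..., cD_1]."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    details, approx = [], x
    for _ in range(level):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail coefficients
        approx = (even + odd) / np.sqrt(2)         # approximation coefficients
    return [approx] + details[::-1]

def reconstruct_band(coeffs, index, level=4):
    """Signal rebuilt from one coefficient array, all others set to zero."""
    c = coeffs[index]
    if index == 0:  # approximation at the deepest level
        k, pattern = level, np.ones(2 ** level)
    else:           # detail cD_k with k = level - index + 1
        k = level - index + 1
        half = 2 ** (k - 1)
        pattern = np.concatenate([np.ones(half), -np.ones(half)])
    return np.kron(c, pattern) / np.sqrt(2) ** k

# Synthetic stand-in for one EEG channel (assumed fs and content).
fs = 256.0
t = np.arange(1024) / fs
rng = np.random.default_rng(0)
channel = np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.normal(size=t.size)

# Fourier transform: magnitude spectrum and its dominant frequency.
spectrum = np.abs(np.fft.rfft(channel))
freqs = np.fft.rfftfreq(channel.size, d=1 / fs)
peak_freq = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin

# Wavelet decomposition and similarity of each band to the original.
names = ["cA4", "cD4", "cD3", "cD2", "cD1"]
coeffs = haar_wavedec(channel, level=4)
bands = [reconstruct_band(coeffs, i) for i in range(len(coeffs))]
similarity = {n: float(np.corrcoef(channel, b)[0, 1]) for n, b in zip(names, bands)}
most_similar = max(similarity, key=similarity.get)
print("dominant frequency (Hz):", peak_freq)
print("most similar band:", most_similar, similarity)
```

Because the band-wise reconstructions add up to the original signal, comparing each one against it gives a like-for-like similarity measure even though the coefficient arrays have different lengths.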
6.1 Tasks
• Use the Fourier features and the Zero Crossing Rate extracted from the
train data in the previous task.
• Use Support Vector Machine (SVM) as your classification model.
• Evaluate the SVM model on the validation set and present the classification
report, ROC AUC score, and balanced accuracy from scikit-learn.
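The steps above might be wired together roughly as follows. This is a sketch under stated assumptions, not the expected solution: the data are randomly generated stand-ins (four classes distinguished by a sinusoid of a different frequency), the helper names extract_features and make_toy_split are hypothetical, and choices such as keeping the first 64 FFT bins, standardizing the features, and using an RBF kernel are illustrative. Only the scikit-learn calls (SVC, classification_report, roc_auc_score, balanced_accuracy_score) come from the library itself.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, classification_report, roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ (last axis)."""
    signs = np.signbit(x)
    return np.mean(signs[..., 1:] != signs[..., :-1], axis=-1)

def extract_features(X, n_fft_bins=64):
    """X: (n_trials, n_channels, n_samples) -> (n_trials, n_channels * (n_fft_bins + 1))."""
    spectrum = np.abs(np.fft.rfft(X, axis=-1))[..., :n_fft_bins]
    zcr = zero_crossing_rate(X)[..., np.newaxis]
    return np.concatenate([spectrum, zcr], axis=-1).reshape(len(X), -1)

def make_toy_split(n_per_class, rng, n_channels=19, n_samples=256, fs=128.0):
    """Synthetic stand-in data: NOT the hackathon dataset."""
    t = np.arange(n_samples) / fs
    X, y = [], []
    for label, freq in enumerate([2.0, 6.0, 12.0, 20.0]):
        tone = np.sin(2 * np.pi * freq * t)
        X.append(tone + rng.normal(0, 1.0, (n_per_class, n_channels, n_samples)))
        y.append(np.full(n_per_class, label))
    return np.concatenate(X), np.concatenate(y)

rng = np.random.default_rng(0)
X_train, y_train = make_toy_split(30, rng)
X_val, y_val = make_toy_split(10, rng)

# probability=True is needed so that predict_proba can feed roc_auc_score.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=0))
clf.fit(extract_features(X_train), y_train)

F_val = extract_features(X_val)
y_pred = clf.predict(F_val)
y_proba = clf.predict_proba(F_val)

print(classification_report(y_val, y_pred))
auc = roc_auc_score(y_val, y_proba, multi_class="ovr")  # one-vs-rest, macro average
bal_acc = balanced_accuracy_score(y_val, y_pred)
print("ROC AUC (OvR):", auc, "balanced accuracy:", bal_acc)
```

For a multiclass problem roc_auc_score needs class probabilities and an explicit multi_class setting; one-vs-rest is used here, and one-vs-one ("ovo") is the other available option.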
7.1 Tasks
• Participants are encouraged to explore more feature extraction methods
and models, and are free to use whatever methods they deem fit, including
deep learning models, to train the best possible model. The goal is to get
the best possible metrics as mentioned in the evaluation criteria. Submit
the test outputs.csv file in the format shown here. The label-class
mapping is given in Table 1.
• Report the total number of model parameters (trainable and non-trainable).
Models with fewer parameters are encouraged.
• Do not use the validation set for training the model. Use only the training
set to train your model.
Table 1: Label-Class Mapping
Label  Class
0      Normal
1      Complex Partial Seizures
2      Electrographic Seizures
3      Video detected Seizures with no visual change over EEG
8.1 Tasks
• Identify the top 3 most important EEG channels contributing to the pre-
diction of each class. You are free to use any xAI (Explainable AI) tech-
niques. Some suggestions include SHAP (SHapley Additive exPlanations)
and saliency maps.
• Evaluate the impact of these channels on your best model performance
by masking or removing them from the training set and re-evaluating the
model’s accuracy on the validation set.
• Analyze and document how masking crucial channels impacts classifica-
tion performance across all classes.
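One simple way to rank channels, shown below as a sketch, is occlusion: zero out one channel at a time at evaluation time and record how much the recall of each class drops. This is a swapped-in, model-agnostic technique rather than SHAP or saliency maps, and it does not retrain the model on masked training data as the second task asks; it only illustrates the ranking step. The function names, the (n_trials, n_channels, n_samples) layout, and the toy threshold "model" are all assumptions for the example.

```python
import numpy as np

def per_class_recall(y_true, y_pred, n_classes):
    """Recall for each class; NaN if a class has no samples."""
    recalls = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            recalls[c] = np.mean(y_pred[mask] == c)
    return recalls

def occlusion_importance(predict_fn, X, y, n_classes):
    """Drop in per-class recall when each channel is zeroed out.

    X has shape (n_trials, n_channels, n_samples). Returns an array of
    shape (n_classes, n_channels); larger values mean the channel matters more.
    """
    baseline = per_class_recall(y, predict_fn(X), n_classes)
    drops = np.zeros((n_classes, X.shape[1]))
    for ch in range(X.shape[1]):
        X_masked = X.copy()
        X_masked[:, ch, :] = 0.0  # mask a single channel
        drops[:, ch] = baseline - per_class_recall(y, predict_fn(X_masked), n_classes)
    return drops

# Toy data (hypothetical): class 1 has a large amplitude on channel 3 only.
rng = np.random.default_rng(0)
n_trials, n_channels, n_samples = 40, 19, 256
y = np.repeat([0, 1], n_trials // 2)
X = rng.normal(0, 1, (n_trials, n_channels, n_samples))
X[y == 1, 3, :] *= 5.0

def toy_predict(X):
    """Stand-in for a trained model: thresholds the spread of channel 3."""
    return (X[:, 3, :].std(axis=1) > 2.5).astype(int)

drops = occlusion_importance(toy_predict, X, y, n_classes=2)
top3 = np.argsort(-drops, axis=1)[:, :3]  # top-3 channel indices per class
print("recall drop for class 1, channel 3:", drops[1, 3])
print("top-3 channels per class:\n", top3)
```

With a real classifier, predict_fn would wrap feature extraction plus the model's predict call, and the same per-class table of recall drops can be reported when documenting the effect of masking.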
9 Denoising
In real-world applications, data collected from sensors or medical instruments,
such as EEG signals, are often noisy due to various factors like environmental
interference, hardware limitations, and motion artifacts. These imperfections in
the data can significantly degrade the performance of tasks, such as classification
and anomaly detection, by masking the underlying signal of interest. Therefore,
effective signal denoising is a critical step in preprocessing, ensuring that the
data used for analysis is as clean and representative as possible.
9.1 Tasks
• Participants have been provided with a noisy train data folder. Denoise
this data with any method of your choice. Report the Peak Signal to
Noise Ratio (PSNR) value in dB by comparing the denoised signal with
the ground truth.
• Train your classifier using the denoised data. Evaluate this model on the
validation set and present the classification report from scikit-learn.
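The PSNR computation can be sketched as below. The task does not define the peak value for a signal, so this example assumes it is the maximum absolute amplitude of the ground-truth signal, with PSNR = 10 log10(peak^2 / MSE). The moving-average filter is only a placeholder denoiser chosen for brevity, and the clean and noisy signals are synthetic stand-ins, not the provided data.

```python
import numpy as np

def psnr_db(reference, estimate, peak=None):
    """PSNR in dB of `estimate` against the ground-truth `reference`.

    peak defaults to max(|reference|), which is an assumption; other
    conventions (e.g. peak-to-peak range) give different numbers.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if peak is None:
        peak = np.max(np.abs(reference))
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

def moving_average(x, window=9):
    """Placeholder denoiser: simple moving-average low-pass filter."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

# Synthetic single-channel example (assumed fs, frequency and noise level).
fs = 256.0
t = np.arange(1024) / fs
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5.0 * t)
noisy = clean + rng.normal(0, 0.5, size=t.size)
denoised = moving_average(noisy)

psnr_noisy = psnr_db(clean, noisy)
psnr_denoised = psnr_db(clean, denoised)
print(f"PSNR noisy: {psnr_noisy:.2f} dB, denoised: {psnr_denoised:.2f} dB")
```

For multi-channel recordings the same function can be applied per channel and averaged, or applied to the whole array at once; whichever convention is used should be stated alongside the reported value.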
10.1 Tasks
• Participants are tasked with generating synthetic EEG data class-wise us-
ing modern generative algorithms. The synthetic data should aim to repli-
cate the distribution and characteristics of the real EEG data as closely
as possible. You are free to use any generative models of your choice.
• Using the generated synthetic data, participants must train a classifier
and evaluate its performance on the validation set. The evaluation will
be based on how closely the performance of this model aligns with the
performance of a classifier trained on real data.
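The evaluation described above (train on synthetic data, test on real validation data, compare with a real-data baseline) can be sketched as follows. Everything data-related here is an assumption for illustration: the features are toy Gaussian clusters, and the per-class Gaussian sampler is only a placeholder standing in for whatever generative model is actually used; it is not a modern generative algorithm. The helper names are hypothetical.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.svm import SVC

def make_toy_features(n_per_class, rng, n_features=6):
    """Toy 4-class feature vectors: NOT the hackathon dataset."""
    centers = 3.0 * np.eye(4, n_features)
    F = np.concatenate([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
    y = np.repeat(np.arange(4), n_per_class)
    return F, y

def sample_class_gaussian(F, y, n_per_class, rng):
    """Placeholder generator: samples from a Gaussian fitted to each class."""
    synth_F, synth_y = [], []
    for label in np.unique(y):
        F_c = F[y == label]
        mean, cov = F_c.mean(axis=0), np.cov(F_c, rowvar=False)
        synth_F.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        synth_y.append(np.full(n_per_class, label))
    return np.concatenate(synth_F), np.concatenate(synth_y)

rng = np.random.default_rng(0)
F_train, y_train = make_toy_features(50, rng)
F_val, y_val = make_toy_features(20, rng)

# Class-wise synthetic data generated from the real training set only.
F_synth, y_synth = sample_class_gaussian(F_train, y_train, n_per_class=50, rng=rng)

# Same classifier, trained once on real and once on synthetic data.
acc_real = balanced_accuracy_score(y_val, SVC().fit(F_train, y_train).predict(F_val))
acc_synth = balanced_accuracy_score(y_val, SVC().fit(F_synth, y_synth).predict(F_val))
gap = acc_real - acc_synth
print(f"real: {acc_real:.3f}  synthetic: {acc_synth:.3f}  gap: {gap:.3f}")
```

Keeping the classifier and its settings identical in both runs means the gap reflects the quality of the synthetic data rather than a difference in models.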
10.2 Evaluation Criteria
• Performance of the model using generated synthetic data. Present the
classification report from scikit-learn for the validation set.
• Visualization and comparison of synthetic data with real data.
The End