
Impulse - 2025: Final Hackathon

Biomedical Signal Processing


17-19 January 2025

1 Instructions
1. The hackathon begins at 15:00 hrs on 17 January 2025 and ends at 15:00 hrs on 19 January 2025.
2. Submissions can be in the form of Jupyter Notebooks or GitHub repositories. Make sure that the outputs are clearly visible in the submission. Participants also need to submit a test_outputs.csv file for the evaluation of their model on the test set.
3. In case of GitHub submissions, mention the commit to be considered for submission. By default, the latest commit before the deadline will be considered.
4. All submissions must be made via Unstop. Submissions through any other platform or medium will not be considered for evaluation.
5. Participants must provide a clear explanation of their approach. This can be included as markdown cells in the Jupyter Notebook or as a README file in the GitHub repository.
6. For hardware acceleration, you can use Google Colab, Kaggle Notebooks, or Gradient Notebooks, or set up GPU support locally.
7. Only use the dataset provided by the organisers.
8. The code should be well organised and documented.
9. All the code MUST be written by you. Do NOT indulge in plagiarism of any form.
10. In case of any queries, feel free to contact the organisers for clarification.

2 Introduction
Biomedical signal processing refers to the application of techniques and algorithms to analyze physiological signals from the human body, such as the ECG, EEG, and EMG. In an era where healthcare is moving towards a more data-driven approach, biomedical signal processing plays an important role in extracting meaningful information from such data. The field aims to convert raw physiological signals into concrete insights and actionable diagnoses. It not only enhances clinical decision-making but also supports the development of new medical technologies, offering the potential for real-time health monitoring, early detection of diseases, and remote patient care.

For the final round of Impulse 2025, participants are challenged to develop a robust model for classifying Electroencephalogram (EEG) seizure types from raw signals. An equally important aspect of the challenge is to implement explainability techniques, promoting clinical trust in a hospital setting. The problem statement derives its motivation from the critical role of EEG in neurology, where it helps in diagnosing and managing neurological disorders. Manual EEG interpretation requires skill and is subjective; integrating machine learning with explainability can produce a practical and clinically accurate tool ready for use.

A primary use of an EEG model is, first, to separate seizure-containing EEGs from normal ones and, second, within the seizure-containing recordings, to demarcate seizure-specific regions and classify the seizure type, which helps neurologists establish a diagnosis quickly and effectively. Videographic seizure detection adds another layer of usefulness, catching rare seizures that are not reflected on the EEG and hence improving the robustness of an EEG machine learning model.

3 The Dataset
The dataset provided for this task consists of EEG signals categorized into four classes, three of which show seizures while one is seizure-free. The seizure classes are as follows: Complex Partial Seizures, Electrographic Seizures, and Video detected Seizures with no visual change over the EEG. Participants are required to download the dataset from this link. The README file at the link has more in-depth information on the dataset.

4 Basic Analysis of EEG Signals


In the field of biomedical signal processing, raw EEG signals often require initial preprocessing and feature extraction to understand and model the underlying patterns for classification tasks. The first step in this process is to visualize the EEG data and compute basic statistical metrics that provide foundational insights into the signal characteristics. These metrics, derived from the time domain, help identify potential signal patterns that can serve as features for machine learning models in future classification tasks.

4.1 Tasks
• As the first part of this task, select one data point from each of the four classes in the train data folder. For each selected data point, plot the EEG signal for all 19 channels separately and create a single plot where all 19 EEG channels are superimposed on the same graph. This results in a total of 20 plots per data point: 19 individual channel plots and 1 combined plot. These visualizations will help in inspecting the amplitude and general behavior of the signal over time.
• Also, compute the following basic statistical time-domain metrics from each channel (a minimal sketch of these computations is given after this list):
  – Mean
  – Zero Crossing Rate
  – Range
  – Energy
  – RMS
  – Variance
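
As a reference, the sketch below computes the listed metrics with NumPy. It assumes one data point is loaded as an array of shape (19 channels x samples); the array shape, the placeholder signal, and the loading step are assumptions, not part of the provided dataset interface.

    import numpy as np

    def time_domain_metrics(eeg):
        """Basic time-domain metrics for an EEG array of shape (n_channels, n_samples)."""
        signs = np.sign(eeg)
        return {
            "mean": eeg.mean(axis=1),
            # fraction of consecutive sample pairs whose sign changes
            "zero_crossing_rate": (np.diff(signs, axis=1) != 0).mean(axis=1),
            "range": eeg.max(axis=1) - eeg.min(axis=1),
            "energy": (eeg ** 2).sum(axis=1),
            "rms": np.sqrt((eeg ** 2).mean(axis=1)),
            "variance": eeg.var(axis=1),
        }

    # Placeholder standing in for one loaded data point (19 channels)
    eeg = np.random.randn(19, 2560)
    for name, values in time_domain_metrics(eeg).items():
        print(name, values)  # one value per channel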

4.2 Evaluation Criteria


The task will be evaluated based on the completeness and clarity of the plots.
They should also be interpretable, reflecting the key characteristics of the signals
and providing some insights into the data set. The computed metrics should be
clearly presented, either as tables or summary statistics, providing a structured
overview of the signal characteristics.

5 Extracting Frequency Domain Features


In this task, you will extract frequency-based features from the EEG signals. These features provide valuable insights into the signal's frequency content, which is critical for understanding brain activity and for classification tasks in biomedical signal processing. The Fourier Transform is a mathematical technique that converts a signal from its time-domain representation into the frequency domain. It breaks down a time-series signal into its constituent sinusoidal components (sines and cosines), revealing the different frequencies present within the signal.

Another technique is Wavelet Decomposition, which overcomes the limitation of the Fourier transform by providing both time and frequency information. It involves breaking down a signal into smaller wavelets, short waves localized in both time and frequency. This allows for the analysis of non-stationary signals, which exhibit changes in frequency content over time, as is often the case with biomedical signals like EEG.

Wavelets can capture both the high-frequency (fast changes) and low-frequency (slow changes) components of the signal at different times. The Approximation Coefficients represent the low-frequency components of the signal, which capture the general shape or trend over time. The Detail Coefficients represent the high-frequency components, which capture the fast, detailed variations of the signal. These detail coefficients are further split at multiple levels (cD1, cD2, etc.), capturing the signal's details at different frequency resolutions.
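
To make both techniques concrete, the following sketch computes the magnitude spectrum and a 4-level wavelet decomposition for a single channel using NumPy and PyWavelets. The sampling rate, the db4 wavelet, and the placeholder signal are assumptions; take the actual sampling rate from the dataset README and choose the wavelet family as you see fit.

    import numpy as np
    import pywt  # PyWavelets

    fs = 256                            # assumed sampling rate in Hz (check the dataset README)
    signal = np.random.randn(fs * 10)   # placeholder for one EEG channel

    # Fourier Transform: frequency bins and magnitude spectrum of the channel
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    spectrum = np.abs(np.fft.rfft(signal))

    # Wavelet Decomposition: 4 levels with an assumed Daubechies-4 mother wavelet
    cA4, cD4, cD3, cD2, cD1 = pywt.wavedec(signal, "db4", level=4)
    print(cA4.shape, cD4.shape, cD3.shape, cD2.shape, cD1.shape)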

5.1 Tasks
• Extract frequency domain features from the data points in the train data folder. Also, present the extracted features for at least 1 data point from each class. The frequency domain features that need to be extracted are:
  – Fourier Transform
  – Wavelet Decomposition
• Perform Wavelet Decomposition for 4 levels. Present the Approximation and Detail coefficients channel-wise, and for a given channel, show which coefficient is most similar to the original signal.
• Generate and analyze spectrograms for a data point to visualize and interpret the time-frequency characteristics of each channel (a spectrogram sketch follows this list).
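
A minimal spectrogram sketch for one channel is shown below, assuming SciPy and Matplotlib are available; the sampling rate, window length, and placeholder signal are assumptions.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import spectrogram

    fs = 256                             # assumed sampling rate in Hz
    channel = np.random.randn(fs * 10)   # placeholder for one EEG channel

    # Time-frequency representation of the channel
    f, t, Sxx = spectrogram(channel, fs=fs, nperseg=256, noverlap=128)

    plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
    plt.xlabel("Time [s]")
    plt.ylabel("Frequency [Hz]")
    plt.title("Spectrogram of one EEG channel")
    plt.colorbar(label="Power [dB]")
    plt.show()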

5.2 Evaluation Criteria


The task will be evaluated based on the correctness, visualization and presen-
tation of the frequency features. They must be presented neatly in a structured
manner with clear separation for each channel.

6 Building the Baseline Model


In this task, you need to build a baseline machine learning model using the
Fourier features extracted from the EEG signals as well as Zero Crossing Rate.
The purpose of the baseline model is to establish an initial performance metric,
which can be used as a reference for further improvement.

6.1 Tasks
• Use the Fourier features and the Zero Crossing Rate extracted from the previous tasks (from the train data).
• Use a Support Vector Machine (SVM) as your classification model.
• Evaluate the SVM model on the validation set and present the classification report, roc_auc_score, and balanced_accuracy_score from scikit-learn (a baseline sketch follows this list).
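
A minimal baseline sketch using scikit-learn is shown below. The feature matrices and labels are random placeholders; in practice they would be built from the Fourier features and Zero Crossing Rate of the train and validation data.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import classification_report, roc_auc_score, balanced_accuracy_score

    # Placeholder features and labels; replace with your extracted feature matrices.
    X_train, y_train = np.random.randn(200, 40), np.tile(np.arange(4), 50)
    X_val, y_val = np.random.randn(40, 40), np.tile(np.arange(4), 10)

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_val)
    y_prob = clf.predict_proba(X_val)

    print(classification_report(y_val, y_pred))
    print("balanced accuracy:", balanced_accuracy_score(y_val, y_pred))
    print("roc auc (one-vs-rest):", roc_auc_score(y_val, y_prob, multi_class="ovr"))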

7 Building the Best Model


In this task, participants need to build the best performing model by experimenting with various advanced techniques to improve upon the baseline model developed earlier. The focus will be on optimizing the feature set, tuning hyperparameters, and possibly using more sophisticated models beyond the baseline SVM.

7.1 Tasks
• Participants are encouraged to explore further feature extraction methods and models, and are free to use whatever methods they deem fit for training the best possible model, including deep learning models. The goal is to obtain the best possible metrics as mentioned in the evaluation criteria. Submit the test_outputs.csv file in the format shown here. The label-class mapping is given in Table 1.
• Report the total number of model parameters (trainable and non-trainable); a parameter-counting sketch follows this list. Models with fewer parameters are encouraged.
• Do not use the validation set for training the model. Use only the training set to train your model.
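
If the best model happens to be a neural network, the parameter counts can be reported with a short snippet such as the one below (shown for PyTorch; the toy model is a placeholder for your actual architecture).

    import torch.nn as nn

    # Toy model standing in for your best classifier
    model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 4))

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable}, non-trainable: {total - trainable}, total: {total}")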

7.2 Evaluation Criteria


• The task will be graded based on the performance on the test set (classification report, balanced_accuracy_score, and roc_auc_score from scikit-learn).
• It will also be evaluated on the performance on the validation set (classification report, balanced_accuracy_score, and roc_auc_score from scikit-learn).
• The model should be optimized to achieve the best performance with few parameters and be computationally efficient. Models with fewer parameters will be awarded higher marks.

8 Interpretability of the Best Model


Explainability is crucial in healthcare applications to ensure trust, accountability, and actionable insights for medical professionals. It is not sufficient for models to achieve high accuracy; they must also provide transparency about their decision-making processes. This section focuses on understanding the model's reliance on different EEG channels for each class prediction.

Table 1: Label-Class Mapping

Label   Class
0       Normal
1       Complex Partial Seizures
2       Electrographic Seizures
3       Video detected Seizures with no visual change over EEG

8.1 Tasks
• Identify the top 3 most important EEG channels contributing to the prediction of each class. You are free to use any xAI (Explainable AI) techniques. Some suggestions include SHAP (SHapley Additive exPlanations) and saliency maps.
• Evaluate the impact of these channels on your best model's performance by masking or removing them from the training set and re-evaluating the model's accuracy on the validation set (a masking sketch follows this list).
• Analyze and document how masking crucial channels impacts classification performance across all classes.
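
One simple way to run the masking experiment is to zero out the identified channels before feature extraction, as sketched below. The recording shapes, the channel indices, and the retraining step are placeholders and assumptions, not part of the provided pipeline.

    import numpy as np

    def mask_channels(eeg, channels):
        """Zero out the given channel indices of an EEG array of shape (n_channels, n_samples)."""
        masked = eeg.copy()
        masked[channels, :] = 0.0
        return masked

    # Placeholder recordings standing in for the train and validation sets (19 channels each)
    train_recordings = [np.random.randn(19, 2560) for _ in range(4)]
    val_recordings = [np.random.randn(19, 2560) for _ in range(2)]

    # Hypothetical channel indices identified by your xAI analysis for one class
    top_channels = [3, 7, 11]

    train_masked = [mask_channels(rec, top_channels) for rec in train_recordings]
    val_masked = [mask_channels(rec, top_channels) for rec in val_recordings]
    # Extract features from train_masked / val_masked, retrain the best model,
    # and compare its validation accuracy against the unmasked baseline.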

8.2 Evaluation Criteria


• Correctness of the top 3 channels identified across classes.
• Clarity and comprehensiveness of the reasoning provided for channel selection and the observed performance changes.

9 Denoising
In real-world applications, data collected from sensors or medical instruments,
such as EEG signals, are often noisy due to various factors like environmental
interference, hardware limitations, and motion artifacts. These imperfections in
the data can significantly degrade the performance of tasks, such as classification
and anomaly detection, by masking the underlying signal of interest. Therefore,
effective signal denoising is a critical step in preprocessing, ensuring that the
data used for analysis is as clean and representative as possible.

9.1 Tasks
• Participants have been provided with a noisy train data folder. Denoise this data with any method of your choice. Report the Peak Signal to Noise Ratio (PSNR) value in dB by comparing the denoised signal with the ground truth (a PSNR sketch follows this list).
• Train your classifier using the denoised data. Evaluate this model on the validation set and present the classification report from scikit-learn.
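
A minimal PSNR computation is sketched below. Using the maximum absolute amplitude of the ground-truth signal as the peak value is an assumption; check whether the organisers specify a different convention.

    import numpy as np

    def psnr_db(clean, denoised):
        """PSNR in dB between a ground-truth signal and its denoised estimate."""
        mse = np.mean((clean - denoised) ** 2)
        peak = np.max(np.abs(clean))  # assumed peak convention
        return 10 * np.log10(peak ** 2 / mse)

    # Placeholder signals standing in for one channel of ground truth and its denoised version
    clean = np.random.randn(2560)
    denoised = clean + 0.05 * np.random.randn(2560)
    print(f"PSNR: {psnr_db(clean, denoised):.2f} dB")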

9.2 Evaluation Criteria


• Effectiveness of denoising will be evaluated by the value of the Peak Signal to Noise Ratio (PSNR) obtained.
• Performance of the model trained on the denoised data (classification report from scikit-learn).
• Innovativeness and clarity of the denoising approach.

10 Generative Modeling Techniques for Synthetic EEG Data

Modern generative algorithms have revolutionized the ability to generate realistic synthetic data. In the context of EEG signal processing, these models can be employed to generate class-wise synthetic EEG signals that mimic the statistical properties of real data. This synthetic data can be used to augment training datasets, helping to improve the performance of models in downstream tasks such as classification. By enabling the generation of diverse datasets, generative approaches provide an opportunity to address problems in a dataset and enhance model robustness, ultimately improving generalization to unseen data. In the biomedical domain, where annotated data is often scarce and expensive to collect, generative modeling becomes crucial for overcoming data limitations and advancing model development.

10.1 Tasks
• Participants are tasked with generating synthetic EEG data class-wise using modern generative algorithms. The synthetic data should aim to replicate the distribution and characteristics of the real EEG data as closely as possible. You are free to use any generative models of your choice.
• Using the generated synthetic data, participants must train a classifier and evaluate its performance on the validation set. The evaluation will be based on how closely the performance of this model aligns with the performance of a classifier trained on real data.
• Suggest metrics that could be used to evaluate the quality of the generated synthetic EEG data (one illustrative metric is sketched after this list).
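
As one illustrative (not prescribed) quality metric, the sketch below compares the average Welch power spectral densities of real and synthetic recordings; the array shapes, sampling rate, and the particular distance are assumptions.

    import numpy as np
    from scipy.signal import welch

    def psd_distance(real, synthetic, fs=256):
        """Mean absolute difference between the averaged Welch PSDs of two sets of
        recordings, each of shape (n_recordings, n_channels, n_samples)."""
        _, psd_real = welch(real, fs=fs, axis=-1)
        _, psd_syn = welch(synthetic, fs=fs, axis=-1)
        return np.abs(psd_real.mean(axis=0) - psd_syn.mean(axis=0)).mean()

    # Placeholder arrays standing in for one class of real and synthetic EEG data
    real = np.random.randn(10, 19, 2560)
    synthetic = np.random.randn(10, 19, 2560)
    print("PSD distance:", psd_distance(real, synthetic))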

10.2 Evaluation Criteria
• Performance of the model using generated synthetic data. Present the
classification report from scikit-learn for the validation set.
• Visualization and comparison of synthetic data with real data.

• Correctness of the metrics that have been suggested.

The End