
CSE 4554

Machine Learning Lab

Project Guidelines

Dr. Hasan Mahmud
Professor, Department of CSE

Md. Tanvir Hossain Saikat
Junior Lecturer, Department of CSE

September 11, 2024

Contents
1  Project Timeline
2  Project Objectives
3  Project Mark Distribution
4  Evaluation Criteria
5  Project Types
   5.1  Innovative Novel Work
   5.2  Understanding Existing Work and Comparing it with Different Datasets
   5.3  Machine Learning Tool
6  Choosing a Topic
7  Project Ideas
8  Some Project Ideas
9  Report Preparation
10 Presentation or Viva Preparation
11 Submission

1 Project Timeline
1. Group Formation: By September 15, 2024

2. Proposal Submission: By September 22, 2024

3. Final Viva and Project Report Submission: TBD

THE PROJECT WILL BE ASSESSED THROUGHOUT EVERY LAB SESSION.

2 Project Objectives
The objective of this machine learning project is to learn how to apply the full machine learning
workflow: data collection, dataset preparation, feature engineering, understanding and applying
the right algorithms, and reporting the evaluation results.
This project will allow you to explore a real-world application of machine learning, which may
be either a problem of your own choice or a topic given by us. In job interviews, it is often your
course projects that you end up discussing, so the project has some importance even beyond this
class. That said, it is better to pick a project that you will be able to go deep with (trying different
methods, error analysis, etc.) than to choose a very ambitious project that requires so much setup
that you will only have time to try one or two approaches.
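
As a rough illustration of that workflow, the following minimal sketch uses scikit-learn and one of
its bundled datasets as a stand-in for your own data collection; the split, scaling, model choice, and
metrics are illustrative assumptions that you would replace with your project's own dataset and
algorithm.

# Minimal workflow sketch (illustrative only); assumes scikit-learn is installed and
# uses a bundled dataset in place of real data collection.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)                    # "data collection"
X_train, X_test, y_train, y_test = train_test_split(          # dataset preparation
    X, y, test_size=0.2, random_state=42)

model = Pipeline([                                             # feature scaling + chosen algorithm
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))   # reporting evaluation results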

3 Project Mark Distribution


Marks will be distributed across the following categories:

• Project Implementation (15 Marks)

• Project Proposal and Progress (5 Marks)

• Project Final Viva (10 Marks)

• Project Report (5 Marks)

4 Evaluation Criteria
• Each member of the project group will be assessed on their individual contribution to the
project and will be scrutinized accordingly.

• Technical quality (i.e., Does the technical material make sense? Are the things tried
reasonable? Are the proposed algorithms or applications clever and interesting? Do the
members convey novel insight into the machine learning problems and/or algorithms?)

• Significance (Did the members choose an interesting or a "real" problem to work on, or
only a small "toy" problem? Is this work likely to be useful or/and have an impact?)

• Novelty of the work (Is the proposed application and approach novel or especially innova-
tive?)

• Understandability (Are the members well informed about the implementation details of
the project?)

5 Project Types
5.1 Innovative Novel Work
This type of project focuses on the creation of new algorithms, models, or methodologies that
address unexplored or underexplored problems in machine learning. The goal is to contribute
to the advancement of the field by proposing a solution that offers improvements in terms of
accuracy, efficiency, or scalability.
Key characteristics of this project type include:

• Originality: The solution must demonstrate an element of novelty, either through a new
algorithm, architecture, or by applying an existing technique in an unexplored domain.

• Research-driven: The project often requires extensive literature review and deep under-
standing of current state-of-the-art methods.

• Evaluation: The performance of the novel approach should be thoroughly evaluated using
appropriate metrics, and it must be compared against established baselines to demonstrate
its superiority.

• Impact: The solution should have the potential to solve real-world problems or contribute
to academic advancements, leading to publication-worthy results.

Examples:

• For my undergraduate research, I developed a novel Generative Adversarial Network (GAN)
architecture that translates PET images to CT images in an unpaired way. It incorporates
an attention mechanism, structure-consistency loss functions, and several other components.

• Developing a novel optimization algorithm that enhances model training speed. The Adam
optimizer, which is now used in more or less all deep learning models, is a perfect example
of this; a minimal sketch of its update rule is given below.
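
To give a concrete sense of what implementing (or modifying) an optimizer involves, here is a
minimal NumPy sketch of the standard Adam update rule; the toy quadratic objective, learning
rate, and step count are illustrative assumptions, and a novel project would change the rule itself
rather than re-implement it.

# Minimal NumPy sketch of the Adam update rule (Kingma & Ba, 2015), shown only to
# illustrate what implementing or modifying an optimizer involves.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` given gradient `grad`."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5 (gradient is 2x)
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
print(theta)  # approaches 0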

5.2 Understanding Existing Work and Comparing it with Different Datasets
This type of project is focused on gaining a deep understanding of established machine learning
models or algorithms and testing their performance across a variety of datasets. The objective
is to critically assess how these models behave under different conditions and gain insights into
their strengths and weaknesses.
Key aspects of this project type include:

• Reproducibility: Successfully implementing existing models from research papers or
open-source repositories, ensuring they are trained and evaluated properly.

• Dataset diversity: Applying the models to various datasets, which may vary in size,
complexity, domain (e.g., text, image, or time series data), and noise level.

• Comparative analysis: Assessing model performance using consistent evaluation metrics
(e.g., accuracy, F1 score, RMSE) to draw meaningful conclusions. The comparison could
reveal model generalizability, overfitting tendencies, or robustness.

• Discussion of findings: Highlighting insights gained from comparing results, including
unexpected behavior or limitations of models on certain types of data.

Examples:
• Comparing a state-of-the-art image classification model across medical imaging and natural
image datasets.

• Evaluating the generalization capabilities of a natural language processing model on
low-resource versus high-resource languages. (A minimal sketch of such a comparative
analysis follows.)
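
As a minimal sketch of the comparative analysis described above, the snippet below evaluates
one fixed model with the same metrics on two bundled scikit-learn datasets; the model, datasets,
and metrics are placeholder assumptions you would swap for the ones in your own study.

# Comparative-analysis sketch: one fixed model, two datasets, one consistent set of
# metrics (accuracy and macro F1 via 5-fold cross-validation). Assumes scikit-learn.
from sklearn.datasets import load_digits, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

datasets = {"digits": load_digits(return_X_y=True),
            "wine": load_wine(return_X_y=True)}

for name, (X, y) in datasets.items():
    scores = cross_validate(RandomForestClassifier(random_state=0), X, y,
                            cv=5, scoring=["accuracy", "f1_macro"])
    print(name,
          "accuracy=%.3f" % scores["test_accuracy"].mean(),
          "f1_macro=%.3f" % scores["test_f1_macro"].mean())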

5.3 Machine Learning Tool
This type of project aims to develop a practical, user-friendly tool or software that simplifies the
process of applying machine learning algorithms to real-world problems. These tools can target
developers, data scientists, or non-technical users, helping them streamline tasks such as data
preprocessing, model selection, training, and evaluation.
Key characteristics of this project type include:
• User Interface (UI) and Experience (UX): The tool should be designed to be intuitive
and accessible, allowing users to perform complex machine learning tasks with minimal
technical knowledge.

• Functionality: It should automate or simplify key stages of the machine learning pipeline,
such as data cleaning, feature engineering, model training, hyperparameter tuning, or result
visualization.

• Customization: The tool should offer flexibility, enabling users to modify parameters,
choose different algorithms, or input different datasets.

• Scalability: The system should handle a variety of tasks efficiently, from small-scale
experiments to large-scale datasets, without sacrificing performance.

• Documentation and Tutorials: To facilitate widespread adoption, the tool should be
well-documented and include examples, tutorials, and possibly an active support community.
Examples:
• Developing a web-based tool that allows small business owners to upload data and auto-
matically generate predictive models for customer behavior.

• Building a machine learning library that integrates multiple algorithms and allows seamless
model comparison and hyperparameter optimization.
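
As a rough sketch of the first example (a web-based tool where a user uploads data and gets a
trained model), the snippet below assumes Streamlit, pandas, and scikit-learn are installed; the
UI layout, the naive one-hot feature encoding, and the choice of a random forest are illustrative
assumptions rather than requirements.

# Tiny web-based ML tool sketch. Assumes Streamlit, pandas, scikit-learn.
# Save as app.py and run with:  streamlit run app.py
import streamlit as st
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

st.title("Simple AutoML demo")
uploaded = st.file_uploader("Upload a CSV file", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    target = st.selectbox("Choose the target column", list(df.columns))
    if st.button("Train model"):
        X = pd.get_dummies(df.drop(columns=[target]))   # naive feature encoding
        y = df[target]
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        clf = RandomForestClassifier().fit(X_tr, y_tr)
        st.write("Hold-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))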

6 Choosing a Topic
Your first task as a team is to identify a topic for your project. One of the best ways to identify
a topic is to choose an application domain that interests you and recognize problems in that
domain. For instance, students can select a problem domain like Human-Computer Interaction,
Computer Vision, Natural Language Processing, Software Engineering, Data Mining, and so on,
then explore how best to apply machine learning algorithms to solve problems in that domain.
Let the problem drive your choice of technique, rather than the other way around. Most projects
will be based on particular applications.
Alternatively, you can also choose a problem or set of problems and then develop a new learning
algorithm (or novel variant of an existing learning algorithm) to solve it. Although this class is
not intended to prepare you to develop novel learning methods, you may choose to develop a
novel learning method (or novel variant) if you want a challenge.
Regardless, most projects will combine aspects of both applications and algorithms. Your project
must include an evaluation of real-world data (i.e., not a "toy" domain or synthetic data). The
techniques used should be relevant to our class, so most likely you will be building a prediction
system. A deep learning model would also be acceptable, though we will not cover these topics
until later in the semester.
If you intend to work on the latest methodologies in machine learning and aim to turn your
project into a publication, you should look at the proceedings of top conferences in various
domains, such as OOPSLA, ICSE, ICST, ISSTA, CHI, ACL, CVPR, EMNLP, AAAI, ICLR,
ICPR, Big Data, etc. Many authors share their code on GitHub for free, and it can often be
found through the website paperswithcode.com.
Also, think about feasibility. You should not choose a project that is so ambitious that you
cannot show any progress in your final viva. So, choose a project that you will be able to make
reasonable progress on and, hopefully, complete within the given timeline.

7 Project Ideas
Many fantastic course projects will come from students choosing either an application that they
are interested in or picking some sub-field of machine learning that they want to explore more
and working on that topic. If you have been thinking about starting a research project, this
project may also provide you an opportunity to do so.
Alternatively, if you are already working on a research project that machine learning might
apply to, then working out how to apply ML to the project will often make an excellent project
topic. Similarly, if you currently work in industry and have an application that machine learning
might help with, that could also be a great project.
As mentioned already, students working in the domains of SE, HCI, DIP, BIOINFORMATICS,
NETWORKING, and DATA MINING can select project ideas from their domain and apply
Machine Learning algorithms to develop their project.

8 Some Project Ideas


• Frontal to Facial Face Generation: This project aims to develop a machine learning
model that can take a frontal view image of a face and generate realistic images from differ-
ent angles. The challenge lies in preserving facial features and expressions while changing
the viewpoint. Deep learning models like Generative Adversarial Networks (GANs) or
Variational Autoencoders (VAEs) can be explored for this task. Key aspects include: Col-
lecting a diverse dataset of facial images from multiple angles, Training a model to map the
frontal view to other viewpoints, Evaluating the generated images for realism and facial
feature consistency.

• Automated Code Review: ML models analyze code syntax, structure, changes, and
comments to identify potential bugs, security flaws, and style issues. Provides real-time
feedback to developers as they make changes. Complements human code reviews focused
on design and architecture. Tools like CodeGuru use neural networks, random forests, and
logistic regression trained on codebases. Key challenges are model accuracy, explainability,
and integration into developer workflows.

• Real-time Object Detection in Videos: This project involves building a real-time
object detection system that can accurately identify and track objects in video footage.
Models like YOLO (You Only Look Once) or SSD (Single Shot Multibox Detector) can
be used for this purpose. Key aspects include: Training an object detection model on a
large video dataset, Optimizing the model for speed and accuracy to work in real-time,
Implementing techniques for object tracking across frames.

• Automated Testing: ML generates more effective test cases by modeling code execution
paths to find edge cases. Focuses testing on high-risk modules based on past defect data.
Increases test coverage and defect detection compared to manual or random testing. Uses
techniques like neural networks, static analysis, and evolutionary fuzzing. Key challenges
are computational complexity, explaining auto-generated tests, and developer adoption.

• Sentiment Analysis of Social Media Posts: This project involves developing a system
to automatically analyze the sentiment (positive, negative, or neutral) expressed in social
media posts. Natural Language Processing (NLP) techniques, such as word embeddings
and Transformer models, can be applied to understand the context and emotion behind
the text. Key aspects include: Collecting social media posts (e.g., Twitter, Reddit) and la-
beling them with sentiment categories, Preprocessing text data to handle noise, slang, and
abbreviations, Training a sentiment analysis model using supervised learning techniques.

• Debugging Assistance: ML analyzes logs, memory dumps, and execution traces to iden-
tify root causes of crashes, errors, or performance issues. Flags abnormal events indicative
of bugs by learning expected execution flows. Reduces debugging time by pointing devel-
opers directly at likely causes. Tools like Lookout use random forests and neural networks.
Key challenges are model interpretability, sufficient training data, and integration with
debuggers.

• Performance Optimization: ML profiles code execution to detect hot spots, inefficient
algorithms, and memory leaks. Recommends optimizations like caching, parallelization,
and data structure changes. Avoids need for manual code instrumentation and line-by-line
profiling. Tools like Sapienz use MCMC and clustering. The key challenge is model
generalizability across different codebases.

• Security Enhancement: ML detects vulnerabilities like SQL injection and XSS by
analyzing code patterns and commits. Flags unauthorized access or anomalous events
indicative of attacks. Reinforces secure coding best practices. Integrates into workflows
via tools like CodeQL. Key challenges are false positives, rare security event modeling, and
adversarial attacks.

• Code Generation: ML generates boilerplate code, documentation, and unit tests based
on learned patterns, saving developer time. Enforces consistent style via tools like TabNine
using GPT-style models. Key challenges are contextual awareness, controlling code quality,
and building developer trust.

• Code Search: ML indexes and abstracts code into semantic representations to enable
intelligent search for relevant code examples. Improves discoverability within large, unfa-
miliar codebases. Uses transformer-based techniques, as in GitHub Copilot. Key challenges
are search relevance, plagiarism detection, and licensing.

• Refactoring Suggestions: ML identifies code quality issues and recommends targeted
refactorings such as extract, move, and encapsulate to improve maintainability. Saves time
compared to manual code reviews. Tools like SapFix use logistic regression on code metrics.
Key challenges are justifying recommendations, preventing over-engineering, and model
interpretability.

• Predictive Analytics: ML analyzes project artifacts to forecast risks, defects, and time-
lines. Supports early risk detection and resource planning. Predicts outcomes like delivery
date, quality, and productivity. Uses techniques like clustering, regression, and DNNs. A
key challenge is model accuracy across projects.
• Speech Emotion Recognition: Speech Emotion Recognition (SER) aims to identify a
person's emotional state from their speech. (A minimal feature-extraction sketch appears
after this list.)
• Online Product Recommendation System Using Eye Gaze Data: A recommen-
dation system uses information about users' habits, interests, or profiles to suggest items
that the users are likely to be interested in. This project builds a recommendation system
that uses users' eye gaze data as an implicit feedback signal to infer their interest in
products and recommend accordingly.
• Static hand gesture recognition: Symbolic gestures are static hand postures that con-
vey meaning without voice. Existing hand gesture recognition approaches underutilize
depth information of finger shapes and bending as contextual features. However, finger
bending extracted from depth maps can help distinguish between very similar static ges-
tures that differ subtly in hand-finger poses. Static or even dynamic hand gesture analysis
using ML algorithms can be an exciting topic.
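
For the sentiment analysis idea listed above, a pretrained Transformer can serve as a quick
baseline before you train your own model; this sketch assumes the Hugging Face transformers
library is installed and simply uses its default English sentiment model.

# Quick baseline sketch for sentiment analysis of social media posts.
# Assumes the Hugging Face `transformers` package and its default sentiment model.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
posts = ["I absolutely love this phone!", "Worst customer service ever."]
for post, result in zip(posts, classifier(posts)):
    print(post, "->", result["label"], round(result["score"], 3))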
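For the speech emotion recognition idea, one common baseline is to extract MFCC features from
each clip and train a classical classifier on them; the sketch below assumes librosa and scikit-learn,
and the file paths and emotion labels are placeholders for a real labelled corpus (e.g., RAVDESS).

# MFCC + SVM baseline sketch for speech emotion recognition.
# Assumes librosa and scikit-learn; paths and labels below are placeholders.
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(path, sr=16000, n_mfcc=13):
    # Load the clip and average MFCCs over time to get one fixed-length vector
    signal, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

paths = ["data/happy_01.wav", "data/sad_01.wav"]   # placeholder file paths
labels = ["happy", "sad"]                          # placeholder emotion labels

X = np.stack([mfcc_features(p) for p in paths])
clf = SVC().fit(X, labels)
print(clf.predict(X[:1]))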

9 Report Preparation
You have to write the report using the IEEE official conference template, which can be found on
Overleaf at the following link:
https://www.overleaf.com/latex/templates/ieee-conference-template/grfzhhncsfqn
Use the necessary figures, images, diagrams, flowcharts, and tables to explain your project in
full detail. The writing should be professional and academic, and free of spelling and grammar
mistakes. Your report must not contain any plagiarism; reports will be checked with Turnitin
for plagiarism detection.

10 Presentation or Viva Preparation


Prepare to explain your code and report fully. You should be able to express each of the member’s
contributions to the project. Be prepared to explain any difficulties, challenges, or problems you
faced during development and how you were able to solve them.

11 Submission
Upload all of your files in a GitHub repository. Ideally, your repository should have the following
folders:
• Code
• Dataset
• Presentations
• Report
• Others
Provide the link to your GitHub repository in the classroom submission assignment.
