2nd Review PPT Template
11/06/2024 3
Objectives
Data Exploration and Preparation:
• Merge and consolidate provided datasets.
• Visualize gene expression using a hierarchically-clustered heatmap.
• Conduct null-hypothesis testing for significant observations.
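The merge step could be sketched as an inner join on sample ID. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function name `merge_by_sample`, the dictionary layout, and the toy sample IDs and labels are all hypothetical.

```python
def merge_by_sample(expression, labels):
    """Inner-join two per-sample tables on their shared sample IDs."""
    shared = sorted(expression.keys() & labels.keys())
    merged = [
        {"sample": s, "cancer_type": labels[s], "expression": expression[s]}
        for s in shared
    ]
    # Samples present in only one table are reported rather than silently dropped.
    unmatched = sorted(expression.keys() ^ labels.keys())
    return merged, unmatched


# Hypothetical toy inputs: sample ID -> expression values, sample ID -> label.
expression = {"s1": [2.1, 0.4], "s2": [1.9, 0.6], "s3": [0.2, 3.3]}
labels = {"s1": "type_A", "s2": "type_A", "s4": "type_B"}

merged, unmatched = merge_by_sample(expression, labels)
```

Returning the unmatched IDs makes it easy to check how many samples were lost during consolidation.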
Dimensionality Reduction:
• Extract relevant attributes from the roughly 20,000 gene expression features using PCA, LDA, and t-SNE.
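To illustrate what PCA does here, the sketch below finds the first principal component of a small samples-by-genes matrix by power iteration. It uses only the standard library; in practice a library routine such as scikit-learn's `PCA` would likely be used. The function names and the toy matrix are assumptions made for illustration.

```python
import math


def center(rows):
    """Subtract each column (gene) mean so every gene has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_principal_component(rows, iterations=200):
    """Return (unit direction, per-sample scores) for the top component."""
    X = center(rows)
    d = len(X[0])
    v = [1.0 / math.sqrt(d)] * d  # deterministic start vector (an assumption)
    for _ in range(iterations):
        # Apply X^T X to v without forming the d x d covariance matrix.
        scores = [sum(x * w for x, w in zip(row, v)) for row in X]
        w_new = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(d)]
        norm = math.sqrt(sum(w * w for w in w_new))
        if norm == 0.0:
            break  # data has no variance along the current direction
        v = [w / norm for w in w_new]
    scores = [sum(x * w for x, w in zip(row, v)) for row in X]
    return v, scores


# Hypothetical toy matrix: 4 samples x 3 genes; gene 0 carries almost all variance.
toy = [
    [1.0, 5.0, 2.0],
    [2.0, 5.1, 2.0],
    [9.0, 5.0, 2.1],
    [10.0, 5.1, 2.0],
]
direction, scores = first_principal_component(toy)
```

The direction vector loads almost entirely on gene 0, and the scores separate the two low-expression samples from the two high-expression ones.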
Gene and Sample Clustering:
• Group genes with similar expression patterns.
• Cluster samples based on similarity and cancer type, and identify any outliers.
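Sample clustering with k-means can be sketched as follows. This is a plain-Python version of Lloyd's algorithm; the farthest-first initialization is an assumption chosen to keep the example deterministic (k-means is usually started from random centers), and the toy points are hypothetical.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Pick k starting centers: the first point, then repeatedly the farthest one."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    centers = [list(c) for c in farthest_first(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy samples in a 2-D reduced space.
samples = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]]
labels, centers = kmeans(samples, 2)

# Distance to the assigned center; unusually large values flag outlier candidates.
distances = [math.sqrt(dist2(p, centers[l])) for p, l in zip(samples, labels)]
```

Comparing the cluster labels against the known cancer types then shows how well the unsupervised grouping matches the diagnosis.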
Model Development and Classification:
• Develop classification models (multiclass SVM, Random Forest, Deep Neural Network) to predict cancer type.
• Refine models for accuracy and robustness.
Feature Selection and Validation:
• Employ forward selection and backward elimination for feature refinement.
• Validate selected genes using t-test and F-test.
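Forward selection can be sketched as a greedy loop that adds one gene at a time while a score improves. The scoring function below, leave-one-out accuracy of a nearest-centroid classifier, is a stand-in chosen for brevity and is not one of the project's models; the function names and toy data are hypothetical. Backward elimination works the same way in reverse, starting from all genes and removing one at a time.

```python
def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for t, f in enumerate(features):
                acc[t] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def d2(label):
            return sum(
                (X[i][f] - sums[label][t] / counts[label]) ** 2
                for t, f in enumerate(features)
            )

        predicted = min(sorted(sums), key=d2)
        correct += predicted == y[i]
    return correct / len(X)


def forward_selection(X, y, max_features):
    """Greedily add the feature that most improves the score; stop when none does."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [
            (nearest_centroid_loo_accuracy(X, y, selected + [f]), f)
            for f in remaining
        ]
        score, feature = max(scored, key=lambda sf: sf[0])
        if score <= best_score:
            break  # no candidate improves the score
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical toy data: 6 samples x 3 genes; only gene 1 separates the classes.
X = [
    [5.0, 1.0, 3.1], [5.1, 1.2, 2.9], [4.9, 0.9, 3.0],
    [5.0, 9.0, 3.0], [5.2, 9.1, 3.1], [4.8, 8.8, 2.9],
]
y = ["A", "A", "A", "B", "B", "B"]
selected, score = forward_selection(X, y, max_features=2)
```

On this toy data the loop keeps only gene 1, because adding either uninformative gene does not raise the score.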
Outcome:
• Identify cancer type early and precisely so that treatment can begin sooner.
• Overall, the project uses gene expression data to diagnose cancer type efficiently and to support treatment recommendations.
Literature Survey
• Genetic algorithms (GAs) are metaheuristics that belong
to the class of evolutionary algorithms. GAs can find
optimal or near-optimal solutions in huge, difficult search
spaces and are widely used for search and optimization.
This makes them well suited to cancer detection, where they
are used to build models that interpret test results,
especially those of noninvasive tests. The reviewed article
surveys the existing literature comprehensively, analyzes it
critically, compares the state-of-the-art techniques, and
identifies the future challenges in the development of such
techniques by medical professionals.
Architecture diagram / Block diagram of the modules
UML Diagrams
Algorithms used
• Exploratory Data Analysis:
– Hierarchically-clustered heatmap (used for visualization).
• Dimensionality Reduction:
– PCA (Principal Component Analysis)
– LDA (Linear Discriminant Analysis)
– t-SNE (t-Distributed Stochastic Neighbor Embedding)
• Clustering:
– k-means clustering
– Hierarchical clustering
– Mean shift clustering
• Classification Models:
– Multiclass SVM (Support Vector Machine)
– Random Forest
– Deep Neural Network
• Feature Selection:
– Forward selection
– Backward elimination
• Model Validation (Statistical Significance Testing):
– t-test (for one vs. all)
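A one-vs-all t-test for a single gene can be sketched as below: expression values from one cancer type are compared with the values from all other types. The sketch assumes Welch's unequal-variance form of the test, which the slides do not specify, and returns only the t statistic and degrees of freedom; converting these to a p-value needs a t-distribution CDF, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The function names, labels, and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two values, and the groups must not both be constant.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def one_vs_all(values, labels, target):
    """Compare one class against all remaining samples for a single gene."""
    inside = [v for v, l in zip(values, labels) if l == target]
    outside = [v for v, l in zip(values, labels) if l != target]
    return welch_t(inside, outside)


# Hypothetical expression values of one gene across 8 samples from 3 cancer types.
values = [9.0, 9.5, 10.0, 2.0, 2.5, 3.0, 2.2, 2.8]
labels = ["A", "A", "A", "B", "B", "B", "C", "C"]

t_stat, df = one_vs_all(values, labels, target="A")
```

A large absolute t statistic suggests the gene is expressed differently in the target type than in the rest. Because the test is repeated for every gene, a multiple-testing correction is normally applied before calling any gene significant.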
Output
Thanks