final_synopsis

The document presents a project titled 'ANN based Classification of Dietary Restricted Genes', which aims to develop a deep learning model for classifying genes based on their response to calorie restriction. The model achieved an accuracy of 98% and could contribute to advancements in personalized medicine and health strategies. The project involves data collection, preprocessing, model training, and evaluation, focusing on the relationship between dietary habits and genetic responses.

Uploaded by

officialravi7978

SYNOPSIS

ON
ANN based Classification of Dietary Restricted Genes

Submitted By

PRADUMN MONDAL
SONAM KUMARI
ANUJ KUMAR
VISHAL KUMAR ARYA
KUMAR ROUNAK

Under the guidance of


Mrs. Sunidhi Priyadarshini
Department of Computer Science & Engineering

Techno India, Polytechnic Compound, Dumka, Jharkhand 814101


Synopsis
Submitted to
Dumka Engineering College
in partial fulfilment
for the award of the degree of
BACHELOR OF TECHNOLOGY
In the department of
Computer Science & Engineering
March 2024

Jharkhand University of Technology, Ranchi


DUMKA ENGINEERING COLLEGE
(Estd. by govt. of Jharkhand & run by Techno India under ppp)
Techno India, polytechnic compound road, Dumka, Jharkhand 814101

CERTIFICATE OF THE
SUPERVISOR

Certified that this project report titled "ANN based Classification of Dietary
Restricted Genes" is the bonafide work of "Pradumn Mandal (20040440027)", who carried
out the project work under my supervision. Certified further that, to the best of my
knowledge, the work reported herein does not form part of any other project report or
dissertation on the basis of which a degree or award was conferred on an earlier occasion
on this or any other candidate.

Signature
Mrs. Sunidhi Priyadarshini
Assistant Professor
(Department of Computer Science & Engineering)

Signature
Mr. Ranadeep Dey
Head of the Department
(Department of Computer Science & Engineering)

DECLARATION

I declare that this written submission represents my ideas in my own words and where others'
ideas or words have been included, I have adequately cited and referenced the original
sources. I also declare that I have adhered to all principles of academic honesty and integrity
and have not misrepresented or fabricated or falsified any idea/data/fact/source in my
submission. I understand that any violation of the above will be cause for disciplinary action
by the Institute and can also evoke penal action from the sources which have thus not been
properly cited or from whom proper permission has not been taken when needed.

….…………………………….

Name and Signature of the Students

ACKNOWLEDGEMENT

I would like to express my deepest gratitude to my guide, Mrs. Sunidhi Priyadarshini, for her
valuable guidance, consistent encouragement, personal care, and timely help, and for providing
me with an excellent atmosphere for doing the project. Throughout the work, in spite of her
busy schedule, she extended cheerful and cordial support to me in completing this project
work.

..…..……………………….

Name and Signature of the Student

ABSTRACT

The ANN-based model for classifying dietary-restricted genes is delivered as a smart web
application. Calorie restriction is one of the approaches being studied for its ability to
extend lifespan and improve health. In this project, we use a deep learning (a subset of
machine learning) approach to classify genes based on their response to calorie restriction.
The input to the model describes the effect of biological processes on a particular gene, and
the model predicts whether the gene is calorie-restricted or non-calorie-restricted. Preparing
the model involved the following steps: collecting data of both kinds, i.e. calorie-restricted
and non-calorie-restricted; preprocessing the data into a format suitable for analysis;
splitting the data into training and test sets; and creating and training the model on labeled
data (calorie-restricted and non-calorie-restricted genes) and evaluating its performance
using cross-validation. Our model achieved an accuracy of 98% in classifying genes into the
correct category.
This project could therefore play an important role in improving health, longevity, and
personalized medicine. It may help to identify specific targets for treatments related to
calorie restriction. By exploring how genes respond to different diets, especially calorie
restriction, the project adds valuable information to the growing field that combines genetics
and nutrition.

TABLE OF CONTENTS

CERTIFICATE OF THE SUPERVISOR

DECLARATION

ACKNOWLEDGEMENT

ABSTRACT

LIST OF FIGURES

CHAPTER 1: INTRODUCTION

1.1. PURPOSE

1.2. OBJECTIVE & SCOPE

1.3. MOTIVATION

1.4. SIGNIFICANCE

1.5. LITERATURE REVIEW

CHAPTER 2: WORKING APPROACH FOR PROJECT

2.1. TOOLS AND TECHNOLOGY

CHAPTER 3: IMPLEMENTATION AND MODIFICATION

3.1. IMPLEMENTATION AND MODIFICATION

CHAPTER 4: SNAPSHOTS

4.1. SNAPSHOTS

CHAPTER 5: FUTURE ENHANCEMENT AND CONCLUSION

5.1. FUTURE ENHANCEMENTS

5.2. CONCLUSION

CHAPTER 6: REFERENCES

6.1. REFERENCES

LIST OF FIGURES

Figure 3.1: Dataset First Part
Figure 3.2: Dataset Second Part
Figure 3.3: Processed Dataset First Part
Figure 3.4: Processed Dataset Second Part
Figure 3.5: Data Splitting Process
Figure 4.1: Accuracy After First Fold
Figure 4.2: Accuracy After Second Fold
Figure 4.3: Accuracy After Third Fold
Figure 4.4: Accuracy After Fourth Fold
Figure 4.5: Accuracy After Fifth Fold

PURPOSE

The project aims to provide insight into the complex biology of calorie restriction and to
contribute to strategies for promoting health, longevity, and healthy ageing. It involves
developing a deep learning model that classifies genes based on their response to calorie
restriction. Calorie restriction means eating less without malnutrition, and it has been
linked to health benefits such as improved metabolism, longer lifespan, and delayed onset of
age-associated diseases. By classifying genes, the model aims to identify which ones are
associated with these positive effects. The model uses machine learning and deep learning
algorithms for classification and is used to investigate how a gene reacts to calorie
restriction. Some genes might be more sensitive to changes in caloric intake, affecting
overall health and well-being. The work is intended to benefit healthcare professionals, the
pharmaceutical industry, and others.

OBJECTIVE & SCOPE

Objective
Living a healthy life and promoting healthy ageing is a universal right, and various
approaches are being explored to enhance good health. Calorie restriction is one such
approach to promoting health and longevity. The main objective of the project is to develop a
deep learning model that classifies genes, based on their response to calorie restriction,
into two distinct categories: calorie-restricted (CR) and non-calorie-restricted (NotCR). It
does so by analyzing a dataset containing information about genes or proteins, their
relationships with biological processes, and their association with calorie restriction.
Based on the outputs of different biological processes for a particular gene, the model
decides whether the gene belongs to the CR or the NotCR category. By knowing which genes are
affected, we might be able to create drugs or therapies that mimic the positive effects of
calorie restriction.

Scope
The main focus is on creating a deep learning model capable of accurately classifying genes
into two categories: "calorie-restricted" and "non-calorie-restricted."

• Data Collection: Gather gene data from relevant sources (such as public databases or
experimental studies).
• Feature Extraction: Extract valuable features from the data.
• Model Development: Build a deep learning model to predict whether a gene is CR or
Non-CR.
• Evaluation: Assess the model's performance using appropriate metrics (AUC-ROC, etc.).
• Biological Insights: Investigate the biological significance of the identified genes and
their response to calorie restriction.
• Application: The model can be used as follows:

o Personalized Medicine: Contribute to the development of medicines based on an
individual's genetic structure, considering their response to calorie restriction.
o Disease Prevention: Contribute to strategies for preventing diseases associated with
ageing or poor metabolic health.
o Contribute to Scientific Literature: Publish findings to contribute to the scientific
community's understanding of the molecular aspects of calorie restriction.

MOTIVATION

The motivation for using Artificial Neural Networks (ANNs) to classify dietary-restricted
genes is to gain insight into the fundamental mechanisms of dietary restriction, a
well-researched long-term strategy. ANNs offer a data-driven approach to examining the
intricate relationship between genes and dietary restriction. This may shed light on the
mechanisms underlying lifespan and suggest paths toward treatment.

SIGNIFICANCE

• Dietary restriction involves intentionally restricting or limiting calorie intake in one's
diet. It is a well-established way to improve lifespan and promote good health.

• ANNs are used because they learn complex patterns well. Applied to a dataset containing
gene expression profiles under various dietary conditions, an ANN yields a model that
classifies the data into two groups: Calorie Restricted and Non-Calorie Restricted.

• This classification can help identify specific genes that are associated with dietary
restriction. These genes may serve as potential biomarkers for monitoring the effects of
dietary restrictions or predicting responses to dietary changes.

• It is useful in recommending nutrition to people of different ages based on the response
of genes to dietary restrictions.

• It can contribute to the development of medicines or drugs based on genes' response to
calorie restriction. Such genes represent drug targets, and drugs that modulate them could
bring relief from ageing-related problems.

LITERATURE REVIEW

Ageing increases the risk of many health-related diseases because the capacity to respond to
external stimuli declines. Many efforts have been made to address these problems and to
promote healthy ageing and longevity, and Dietary Restriction (DR) is one of the proposed
solutions. DR, a reduction in total dietary intake while maintaining adequate vitamin and
mineral levels, is currently the most promising way of increasing both lifespan and
healthspan. A machine learning model has been developed to classify genes into Calorie
Restricted (CR) and Non-Calorie Restricted groups based on the response of different
biological processes on the gene [6].

Recent studies have shown that ML can effectively identify features associated with DR. This
makes it easier for health workers to suggest diets to patients and supports the development
of medicines based on an individual's genetic structure and response to calorie restriction.
However, some of the data were biased towards one of the categories, and the model was less
effective on such data [5]. Research to improve its performance is ongoing. The studies
suggest that such a model can be a boon to mankind, helping to improve lifespan and to
support healthy ageing with fewer health-related problems.

TOOLS AND TECHNOLOGY

Tools

objective of disseminating machine learning education and research

• Chrome: Google Chrome is a free web browser used for accessing the internet and running
web-based applications. Chrome is a fast, secure, and easy-to-use browser. It also gives
quick access to Google Search and to Google Workspace products such as Google Docs, Sheets,
Slides, and more.

• Google Colab: Colaboratory, or "Colab" for short, is a product from Google Research.
It allows us to write and execute arbitrary Python code through the browser, and is
especially well suited to machine learning, data analysis, and education. It provides
Python runtimes pre-configured with essential machine learning and artificial intelligence
libraries such as TensorFlow, Matplotlib, and Keras [2].

Technology
• Machine Learning: A branch of artificial intelligence in which models learn patterns from
data. In classical machine learning, engineers and scientists typically select features
within the data manually and then train the model. Common machine learning algorithms
include decision trees, support vector machines, neural networks, and ensemble methods.

• Python Libraries:

o Keras: Keras is an API designed for human beings, not machines. Keras follows
best practices for reducing cognitive load: it offers consistent & simple APIs, it
minimizes the number of user actions required for common use cases, and it
provides clear & actionable error messages [3].

o Pandas: A powerful open-source library for exploring, analysing, and manipulating
data. It is used here because it works well in conjunction with other libraries for
data science and machine learning [1].

o Sklearn: Scikit-learn (Sklearn) is a widely used and robust library for machine
learning in Python. It provides a selection of efficient tools for machine learning
and statistical modelling, including classification, regression, and clustering. This
library, which is largely written in Python, is built upon NumPy, SciPy, and
Matplotlib [4].

IMPLEMENTATION AND MODIFICATION

1. Data Preprocessing and Transformation:


Data preprocessing is the method of analyzing, filtering, transforming and encoding data so
that a machine learning algorithm can understand and work with the processed output. In data
extracted from real-world scenarios, there’s always noise and missing values. This happens
due to manual errors, unexpected events, technical issues, or a variety of other obstacles.
Incomplete and noisy data can’t be consumed by algorithms, because they’re usually not
designed to handle missing values, and the noise causes disruption in the true pattern of the
sample. Data preprocessing aims to solve these problems by thorough treatment of the data at
hand [1].

The algorithms used by the model work only with numerical data, but the available data
contains a string column and categorical values. A data transformation step is therefore
applied.

Figure 3.1: Dataset First Part

Figure 3.2: Dataset Second Part

The first column contains names, which can be removed. The last column, "Class", is
categorical and is mapped to numerical values. After transformation, the data looks as
follows.

Figure 3.3: Processed Dataset First Part

Figure 3.4: Processed Dataset Second Part
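
The transformation described in this step can be sketched with pandas roughly as follows. This is a minimal illustration rather than the project's actual code: the dataset appears only as screenshots, so the column names (`Gene`, `GO_1`, `GO_2`), the label strings (`CR`, `NotCR`), and the 1/0 encoding are all assumptions.

```python
import pandas as pd

# Hypothetical toy data; the real column names and values are assumptions.
raw = pd.DataFrame(
    {
        "Gene": ["gene_a", "gene_b", "gene_c"],  # name column (strings)
        "GO_1": [1, 0, 1],                        # numeric biological-process features
        "GO_2": [0, 0, 1],
        "Class": ["CR", "NotCR", "CR"],           # categorical label
    }
)

# Drop the first column (names); it is not a numerical input for the model.
processed = raw.drop(columns=raw.columns[0])

# Map the categorical label to numbers (assumed encoding: CR -> 1, NotCR -> 0).
processed["Class"] = processed["Class"].map({"CR": 1, "NotCR": 0})

print(processed)
```

Dropping by position (`raw.columns[0]`) instead of by name keeps the sketch independent of what the name column is actually called.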

2. Data Splitting in Training and Testing:

The processed dataset contains both the input values and the corresponding labels. First,
the input values are extracted as inputs and the labels as outputs. The data is then split
into a training dataset and a testing dataset.

A train-test split is a model validation process that simulates how the model would perform
on new data.

Figure 3.5: Data Splitting Process
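
A train-test split of this kind might look like the sketch below, which uses scikit-learn's `train_test_split`. The toy arrays, the 80/20 ratio, and the random seed are assumptions, since the text does not state the values used in the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical processed data: 10 genes, 4 numeric features, binary labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10, 4))          # input values
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])  # labels (assumed: 1 = CR, 0 = NotCR)

# Hold out 20% of the rows for testing; the ratio and the seed are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible, which helps when comparing model variants on the same held-out rows.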

3. Model creation and compilation:

The model creation process involves defining the architecture of the model, which includes
the number of layers, the type of layers, and the number of nodes in each layer. The model
used in this project is a deep neural network implemented using the Keras library, which is a
high-level neural networks API, written in Python and capable of running on top of
TensorFlow.

The model is created using the Model class from Keras. The architecture of the model is
defined as follows:

• An input layer (visible) with 8640 nodes, corresponding to the number of features in the
dataset.
• Five hidden layers (hidden1, hidden2, hidden3, hidden4, hidden5) with 50, 20, 20, 10, and
10 nodes respectively, all using the ReLU (Rectified Linear Unit) activation function. The
ReLU function outputs its input directly if it is positive; otherwise, it outputs zero.
• An output layer (output) with one node, using the sigmoid activation function. The
sigmoid function is commonly used in binary classification problems because it squashes its
input values between 0 and 1, which can be treated as probabilities.
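
The two activation functions named in the list can be written out in a few lines of plain Python. This is only an illustrative sketch of the definitions given above; Keras supplies its own vectorised implementations, and the naive sigmoid here is meant for small inputs only.

```python
import math


def relu(x: float) -> float:
    """Return x if it is positive, otherwise 0."""
    return x if x > 0 else 0.0


def sigmoid(x: float) -> float:
    """Squash x into the open interval (0, 1); naive form, fine for small |x|."""
    return 1.0 / (1.0 + math.exp(-x))


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), round(sigmoid(value), 4))
```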

The model is then compiled with the compile method, which configures the model for
training. The model uses the binary_crossentropy loss function, which is suitable for binary
classification problems. The optimizer used is Adam. The metric evaluated during training
and testing is accuracy.
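
Based on the description above, the architecture and compile step might be sketched with the Keras functional API as follows. The layer sizes, activations, loss, optimizer, and metric come from the text; the exact code structure (for example building the hidden layers in a loop) is an assumption, not the project's original source.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Input layer: 8640 features per gene, as stated in the text.
visible = keras.Input(shape=(8640,), name="visible")

# Five ReLU hidden layers with 50, 20, 20, 10 and 10 nodes.
x = visible
for i, units in enumerate([50, 20, 20, 10, 10], start=1):
    x = layers.Dense(units, activation="relu", name=f"hidden{i}")(x)

# Single sigmoid output node for the binary CR / NotCR decision.
output = layers.Dense(1, activation="sigmoid", name="output")(x)

model = keras.Model(inputs=visible, outputs=output)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

model.summary()
```

Training would then be a call along the lines of `model.fit(X_train, y_train, epochs=500, batch_size=15)`, where `X_train` and `y_train` are hypothetical names for the training inputs and labels.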

The model is then trained using the fit method, which trains the model for a fixed number of
epochs (iterations on a dataset). The model is trained for 500 epochs with a batch size of 15.

The batch size is a hyperparameter that defines the number of samples to work through
before updating the internal model parameters. The number of epochs is a hyperparameter
that defines the number of times the learning algorithm works through the entire training
dataset. One epoch means that each sample in the training dataset has had an opportunity to
update the internal model parameters. An epoch consists of one or more batches.
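
The relationship between batch size, epochs, and the number of parameter updates can be made concrete with a short calculation. The batch size and epoch count are taken from the text; the number of training samples is not stated, so `n_train = 900` is purely a hypothetical value for illustration.

```python
import math

n_train = 900    # hypothetical number of training samples (not given in the text)
batch_size = 15  # from the text
epochs = 500     # from the text

# One parameter update per batch; the last batch may be smaller than batch_size.
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)
```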

The model is trained and tested using K-fold cross-validation with 5 splits. After testing
with K-fold cross-validation, the accuracy of the model comes out to be 99.55%.
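
To show what 5-fold cross-validation does with the data, here is a minimal pure-Python sketch of how sample indices can be divided into folds. It is an illustration of the idea only; the project presumably relied on a library routine such as scikit-learn's `KFold`, and the helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Every sample is used for testing exactly once across the five folds.
folds = list(k_fold_indices(n_samples=12, k=5))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(fold_number, len(train_idx), test_idx)
```

In each fold the model is trained from scratch on the training indices and scored on the test indices, which gives one accuracy value per fold.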

SNAPSHOTS

Figure 4.1: Accuracy After First Fold

Figure 4.2: Accuracy After Second Fold

Figure 4.3: Accuracy After Third Fold

Figure 4.4: Accuracy After Fourth Fold

Figure 4.5: Accuracy After Fifth Fold

FUTURE ENHANCEMENTS

The ANN-based model was developed to classify genes into two groups, Calorie Restriction
(CR) and Non-Calorie Restriction (Non-CR), and it achieved this goal. A cross-validation
test was performed on the model and gave 99.55% accuracy. There is still scope to improve
performance, interface, and accuracy.

The following tests and improvements are planned as future enhancements:

• Compute the confusion matrix for the model.
• Compute the AUC-ROC curve.
• Try different numbers of nodes to improve accuracy and performance.
• Plot the model to visualize it.
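
The first two items in this list can be sketched in plain Python as shown below. The labels and scores are made-up toy values, not project results, and the 0.5 decision threshold is an assumption. In practice, `confusion_matrix` and `roc_auc_score` from `sklearn.metrics` would be the more usual choice.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (assumed: 1 = CR, 0 = Non-CR).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]          # sigmoid outputs of a model
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_matrix(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Unlike accuracy, these two measures show how errors are distributed between the classes, which matters when the data is skewed towards one category.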

CONCLUSION

In conclusion, this project has produced a web tool for research on health and longevity.
The tool, powered by Artificial Neural Networks (ANNs), distinguishes between genes
influenced by calorie restriction and those unaffected, with a cross-validated accuracy of
99.55%. By exploring the complex relationship between genetics and diet, the project offers
valuable insight into how calorie restriction affects our biology. Going forward, the model
will be refined and enhanced further, with the aim of improving understanding of the
molecular mechanisms underlying calorie restriction and supporting advances in healthcare
and well-being.

REFERENCES

[1] https://www.geeksforgeeks.org/introduction-to-pandas-in-python/

[2] https://www.researchgate.net/publication/328158184_Performance_Analysis_of_Google_Colaboratory_as_a_Tool_for_Accelerating_Deep_Learning_Applications

[3] https://keras.io/about/

[4] https://www.tutorialspoint.com/scikit_learn/index.htm

[5] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8729156/

[6] MacNee W., Is chronic obstructive pulmonary disease an accelerated aging disease? (2016)
