final_synopsis
ON
ANN based Classification of Dietary Restricted Genes
Submitted By
PRADUMN MONDAL
SONAM KUMARI
ANUJ KUMAR
VISHAL KUMAR ARYA
KUMAR ROUNAK
CERTIFICATE OF THE
SUPERVISOR
Certified that this project report titled “ANN based Classification of Dietary
Restricted Genes” is the bonafide work of “Pradumn Mandal (20040440027)” who carried
out the project work under my supervision. Certified further, that to the best of my
knowledge, the work reported herein does not form any other project report or dissertation
on the basis of which a degree or award was conferred on an earlier occasion on this or any
other candidate.
Signature
Mrs. Sunidhi Priyadarshini
Assistant Professor
(Department of Computer Science & Engineering)
Signature
Mr. Ranadeep Dey
Head of the Department
(Department of Computer Science & Engineering)
DECLARATION
I declare that this written submission represents my ideas in my own words and where others'
ideas or words have been included, I have adequately cited and referenced the original
sources. I also declare that I have adhered to all principles of academic honesty and integrity
and have not misrepresented or fabricated or falsified any idea/data/fact/source in my
submission. I understand that any violation of the above will be cause for disciplinary action
by the Institute and can also evoke penal action from the sources which have thus not been
properly cited or from whom proper permission has not been taken when needed.
….…………………………….
ACKNOWLEDGEMENT
I would like to express my deepest gratitude to my guide, Mrs. Sunidhi Priyadarshini, for her
valuable guidance, consistent encouragement, personal care, and timely help, and for providing
me with an excellent atmosphere for doing the project. Throughout the work, in spite of her busy
schedule, she has extended cheerful and cordial support to me in completing this project
work.
..…..……………………….
ABSTRACT
The ANN-based classification of dietary-restricted genes is a smart web application. Calorie
restriction is one of the approaches being studied for its ability to extend lifespan and improve
health. In this project, we use a deep learning approach (deep learning is a subset of machine
learning) to classify genes based on their response to calorie restriction. The inputs fed to the
model describe the effect of biological processes on a particular gene, and the model predicts
whether the gene is calorie restricted or non-calorie restricted. Preparing the model involved
the following steps: data collection, in which data of both kinds (calorie restricted and
non-calorie restricted) was collected; data preprocessing, in which the data was converted into a
format suitable for analysis; splitting the data into training data and test data; and creating
and training the model on labeled data (calorie-restricted and non-calorie-restricted genes) and
evaluating its performance using cross-validation. Our model achieved an accuracy of 98% in
classifying genes into the correct category.
This project could therefore play an important role in improving health, longevity, and
personalized medicine. It may help to identify specific targets for treatments related to calorie
restriction. By exploring how genes respond to different diets, especially calorie restriction,
the project adds valuable information to the growing field that combines genetics and nutrition.
TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1. PURPOSE
1.3. MOTIVATION
1.4. SIGNIFICANCE
CHAPTER 4: SNAPSHOTS
5.2. CONCLUSION
CHAPTER 6: REFERENCES
6.1. REFERENCES
LIST OF FIGURES
PURPOSE
The project aims to provide valuable information about the complex biology of calorie
restriction and to contribute to strategies for promoting health and longevity, adding to the
knowledge of healthy ageing. It involves developing a deep learning model that classifies genes
based on their response to calorie restriction. Calorie restriction involves eating less without
malnutrition, and it has been connected to health benefits such as improved metabolism, longer
lifespan, and delayed onset of age-associated diseases. By classifying genes, the model aims to
identify which ones have a positive impact. The model uses machine learning and deep learning
algorithms for classification and investigates how a gene reacts to calorie restriction. Some
genes might be more sensitive to changes in caloric intake, affecting overall health and
well-being. The project will benefit healthcare professionals, the pharmaceutical industry, and
others.
OBJECTIVE & SCOPE
Objective
Living a healthy life and ageing healthily is a universal right, and various approaches are
being explored to improve health. Calorie restriction is one such approach that has been used to
promote health and longevity. The main objective of the project is to develop a deep learning
model that classifies genes, based on their response to calorie restriction, into two distinct
categories: calorie-restricted (CR) and non-calorie-restricted (NotCR). It does so by analyzing a
dataset containing information about genes or proteins, their relationships with biological
processes, and their association with calorie restriction. Based on the outputs of different
biological processes for a particular gene, the model decides whether the gene is
calorie-restricted or not. By knowing which genes are affected, we might be able to create drugs
or therapies that mimic the positive effects of calorie restriction.
Scope
The main focus is on creating a deep learning model capable of accurately classifying genes
into two categories: "calorie-restricted" and "non-calorie-restricted."
Data Collection: Gather gene data from relevant sources (such as public databases or
experimental studies).
Feature Extraction: Extract valuable features from the data.
Model Development: Build a Deep Learning model to predict whether a gene is CR or
Non-CR.
Evaluation: Assess the model’s performance using appropriate metrics (AUC-ROC, etc.).
Biological Insights: Investigate the biological significance of the identified genes and
their response to calorie restriction.
Application: The model can be used as follows:
o Contribute to Scientific Literature: Publish findings to contribute to the scientific
community's understanding of the molecular aspects of calorie restriction.
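As an illustration of the AUC-ROC metric named under Evaluation, the following minimal sketch computes it in plain Python as the fraction of (positive, negative) pairs that are ranked correctly. The labels and scores are made-up values, not project results, and the function name `roc_auc` is hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = CR, 0 = NotCR; scores are predicted probabilities of CR.
example_labels = [1, 0, 1, 0, 1]
example_scores = [0.9, 0.2, 0.6, 0.7, 0.8]
example_auc = roc_auc(example_labels, example_scores)
print(round(example_auc, 4))
```

A value of 1.0 would mean every CR gene is scored above every NotCR gene, while 0.5 corresponds to random ranking.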
MOTIVATION
The motivation behind using Artificial Neural Networks (ANNs) to classify dietary-restricted
genes is to gain insight into the fundamental mechanisms of dietary restriction, a
well-researched strategy for extending lifespan. ANNs can provide a data-driven approach to
examining the intricate relationship between genes and dietary restriction. This may shed light
on the mechanisms underlying lifespan and suggest a path towards treatments.
SIGNIFICANCE
ANNs are used because they learn complex patterns well. Applied to a dataset containing gene
expression profiles under various dietary conditions, an ANN yields a model that classifies the
data into their respective groups, Calorie Restricted and Non-Calorie Restricted.
This classification can help identify specific genes that are associated with dietary
restriction. These genes may serve as potential biomarkers for monitoring the effects of
dietary restrictions or predicting responses to dietary changes.
LITERATURE REVIEW
Ageing increases the risk of many health-related diseases because the capacity to respond to
external stimuli declines. Many efforts have been made to address these problems and to promote
healthy ageing and longevity, and Dietary Restriction (DR) is one of the solutions. DR, which
involves reducing total dietary intake while maintaining adequate vitamin and mineral levels, is
currently the most promising way of increasing both lifespan and healthspan. A machine learning
model has been developed to classify genes into Calorie Restricted (CR) and Non-Calorie
Restricted groups based on the effect of different biological processes on the gene [6].
Recent studies have shown that ML can effectively identify features associated with DR, which
makes it easier for health workers to suggest diets to patients and supports the development of
medicines based on an individual's genetic makeup, taking into account their response to calorie
restriction. However, some data was biased towards one of the categories, and on such data the
model's performance was lower [5]. Research to improve this performance is ongoing. Studies
therefore suggest that such a model can be a boon to mankind, helping to extend lifespan and
support healthy ageing with fewer health-related problems.
TOOLS AND TECHNOLOGY
Tools
Google Colab: Colaboratory, or “Colab” for short, is a product from Google Research.
It allows us to write and execute arbitrary Python code through the browser and is
especially well suited to machine learning, data analysis, and education. It provides
Python runtimes pre-configured with the essential machine learning and artificial
intelligence libraries, such as TensorFlow, Matplotlib, and Keras [2].
Technology
Machine Learning: A branch of artificial intelligence where engineers and scientists
manually select features within the data and train the model. Common machine learning
algorithms include decision trees, support vector machines, neural networks, and
ensemble methods.
Python Libraries:
o Keras: Keras is an API designed for human beings, not machines. Keras follows
best practices for reducing cognitive load: it offers consistent & simple APIs, it
minimizes the number of user actions required for common use cases, and it
provides clear & actionable error messages[3].
o Pandas: A powerful open-source library for exploring, analysing, and manipulating
data. It is used here because it works well in conjunction with other libraries
used for data science and machine learning [1].
o Sklearn: Scikit-learn (sklearn) is one of the most useful and robust libraries for
machine learning in Python. It provides a selection of efficient tools for machine
learning and statistical modelling, including classification, regression, and
clustering. This library, which is largely written in Python, is built upon NumPy,
SciPy, and Matplotlib [4].
The algorithms used by the model work only with numerical data, but the available data has a
string column and categorical values, so a data transformation is performed.
The first column contains names and can be removed, and the last column, “Class”, is categorical
and is mapped to numerical values. After transformation, the data looks like this.
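A minimal sketch of this transformation, using hypothetical column names and values rather than the project's actual data. The project names pandas for data handling; to stay self-contained, this sketch uses only Python's standard `csv` module. The encoding CR = 1 and NotCR = 0 is an assumption, as are the names `RAW_CSV`, `CLASS_TO_NUMBER`, and `transform`.

```python
import csv
import io

# Hypothetical raw data: the first column holds gene names and the last
# column, "Class", holds the label. Feature names and values are invented.
RAW_CSV = """Gene,feature_1,feature_2,feature_3,Class
geneA,1,0,1,CR
geneB,0,0,1,NotCR
geneC,1,1,0,CR
"""

# Assumed encoding of the categorical label.
CLASS_TO_NUMBER = {"CR": 1, "NotCR": 0}


def transform(raw_csv):
    """Drop the name column and map the "Class" column to numbers."""
    reader = csv.reader(io.StringIO(raw_csv))
    header = next(reader)
    features, labels = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        features.append([float(value) for value in row[1:-1]])
        labels.append(CLASS_TO_NUMBER[row[-1]])
    return header[1:-1], features, labels


feature_names, X, y = transform(RAW_CSV)
print(feature_names)
print(X)
print(y)
```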
The processed dataset has both input values and the corresponding labels. First, the input
values are extracted as inputs and the labels as outputs. The data is then split into training
and testing datasets. A train-test split is a model validation process that lets us simulate how
the model would perform on new data.
Figure 3.5: - Data Splitting Process
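The splitting step can be sketched as follows. This is a library-free illustration under stated assumptions: the report does not give the split ratio, so the 20% test fraction, the fixed seed, and the function name `split_train_test` are all hypothetical. Scikit-learn's `train_test_split` offers the same idea as a ready-made routine.

```python
import random


def split_train_test(features, labels, test_fraction=0.2, seed=42):
    """Shuffle sample indices once, then cut them into train and test parts."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Toy data: 10 samples with 2 features each; the label is derived from the
# first feature so that feature/label alignment is easy to check.
demo_X = [[i, i + 1] for i in range(10)]
demo_y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_train_test(demo_X, demo_y)
print(len(X_train), len(X_test))
```

Keeping the test samples out of training is what lets the test accuracy approximate performance on genes the model has not seen.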
The model creation process involves defining the architecture of the model, which includes
the number of layers, the type of layers, and the number of nodes in each layer. The model
used in this project is a deep neural network implemented using the Keras library, which is a
high-level neural networks API, written in Python and capable of running on top of
TensorFlow.
The model is created using the Model class from Keras. The architecture of the model is
defined as follows:
An input layer (visible) with 8640 nodes, corresponding to the number of features in the
dataset.
Five hidden layers (hidden1, hidden2, hidden3, hidden4, hidden5) with varying numbers
of nodes (50, 20, 20, 10, 10 respectively) and all using the ReLU (Rectified Linear Unit)
activation function. The ReLU function is an activation function that outputs the input
directly if it is positive, otherwise, it outputs zero.
An output layer (output) with one node, using the sigmoid activation function. The
sigmoid function is commonly used in binary classification problems as it squashes its
input values between 0 and 1, which can be treated as probabilities.
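To make the architecture concrete, the sketch below writes out the two activation functions and counts the trainable parameters implied by the layer sizes listed above. It assumes every layer is fully connected with a bias term per node, which the report does not state explicitly; the names `LAYER_SIZES` and `dense_parameter_counts` are hypothetical, and this is plain Python rather than the Keras code used in the project.

```python
import math

# Layer widths from the description: input, five hidden layers, output.
LAYER_SIZES = [8640, 50, 20, 20, 10, 10, 1]


def relu(x):
    """Return x if it is positive, otherwise zero."""
    return x if x > 0 else 0.0


def sigmoid(x):
    """Squash x into (0, 1); written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense_parameter_counts(layer_sizes):
    """Weights plus biases for each fully connected layer (assumed layer type)."""
    return [n_in * n_out + n_out
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


per_layer = dense_parameter_counts(LAYER_SIZES)
total_parameters = sum(per_layer)
print(per_layer)
print(total_parameters)
```

Under these assumptions almost all of the parameters sit in the first hidden layer, because it connects the 8640 input features to 50 nodes.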
The model is then compiled with the compile method, which configures the model for
training. The model uses the binary_crossentropy loss function, which is suitable for binary
classification problems. The optimizer used is Adam. The metric evaluated during training and
testing is accuracy.
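For reference, the loss and metric named here can be written out by hand. This is an illustrative plain-Python sketch, not the Keras implementation: the clipping constant `eps`, the 0.5 decision threshold, and the example values are assumptions.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(1 for y, p in zip(y_true, y_pred)
                  if (p > threshold) == (y == 1))
    return correct / len(y_true)


# Hypothetical labels (1 = CR, 0 = NotCR) and sigmoid outputs.
labels = [1, 0, 1, 1]
predictions = [0.9, 0.2, 0.4, 0.8]
print(binary_crossentropy(labels, predictions))
print(accuracy(labels, predictions))
```

The loss penalises confident wrong predictions heavily, whereas accuracy only checks which side of the threshold each prediction falls on.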
The model is then trained using the fit method, which trains the model for a fixed number of
epochs (iterations over the dataset). The model is trained for 500 epochs with a batch size of
15. The batch size is a hyperparameter that defines the number of samples to work through
before updating the internal model parameters. The number of epochs is a hyperparameter
that defines the number of times the learning algorithm works through the entire training
dataset. One epoch means that each sample in the training dataset has had an opportunity to
update the internal model parameters. An epoch consists of one or more batches.
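The relationship between samples, batches, and epochs can be checked with a little arithmetic. The batch size of 15 and the 500 epochs come from the text; the training-set size is not given in the report, so `HYPOTHETICAL_N = 1000` below is purely an assumption for illustration, as are the function names.

```python
import math


def training_schedule(n_samples, batch_size=15, epochs=500):
    """Return (batches per epoch, total parameter updates)."""
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs


def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


HYPOTHETICAL_N = 1000  # assumed training-set size, not from the report
batches, updates = training_schedule(HYPOTHETICAL_N)
print(batches, updates)

# The last batch is smaller when n_samples is not a multiple of batch_size.
batch_list = list(iterate_batches(list(range(HYPOTHETICAL_N)), 15))
print(len(batch_list), len(batch_list[-1]))
```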
This model uses K-fold cross-validation with 5 splits for training and testing. Tested with
K-fold cross-validation, the model's accuracy comes out to 99.55%.
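A minimal sketch of how 5-fold cross-validation partitions the data is given below. The report does not show its cross-validation code, so this is an assumption-laden illustration in plain Python: the function name `k_fold_indices` is hypothetical, no shuffling is applied, and the 12-sample example is arbitrary. Scikit-learn's `KFold` provides this functionality as a library class.

```python
def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


folds = k_fold_indices(12, k=5)
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(fold_number, len(train_idx), test_idx)
```

Each sample is used for testing exactly once and for training in the other four folds; the per-fold accuracies are typically averaged to give a single score.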
SNAPSHOTS
Figure 4.4: - Accuracy After Fourth Fold
FUTURE ENHANCEMENTS
An ANN-based model was developed to classify genes into their respective dietary restriction
categories, i.e. Calorie Restricted (CR) and Non-Calorie Restricted (Non-CR), and the
model successfully achieved this target. A cross-validation test was performed on the model and
gave 99.55% accuracy. There is still considerable scope to improve performance, interface, and
accuracy. Some tests still need to be performed and are kept as future enhancements:
CONCLUSION
In conclusion, this project has reached a significant milestone in the world of health and
longevity by creating an advanced web tool. This tool, powered by sophisticated Artificial
Neural Networks (ANNs), serves a crucial purpose: it distinguishes between genes influenced
by calorie restriction and those unaffected, with an outstanding accuracy rate of 99.55%. The
significance of this achievement cannot be overstated. By exploring the complex relationship
between genetics and diet, this project offers valuable insights into how calorie restriction
impacts our biology. Moving forward, the model will be refined and enhanced even further.
Through these endeavours, it aims to unlock new ways of understanding the molecular
mechanisms underlying calorie restriction, paving the way for transformative advancements
in healthcare and well-being.
REFERENCES
[1] https://www.geeksforgeeks.org/introduction-to-pandas-in-python/
[2] https://www.researchgate.net/publication/328158184_Performance_Analysis_of_Google_Colaboratory_as_a_Tool_for_Accelerating_Deep_Learning_Applications
[3] https://keras.io/about/
[4] https://www.tutorialspoint.com/scikit_learn/index.htm
[5] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8729156/
[6] MacNee W., Is chronic obstructive pulmonary disease an accelerated aging disease? (2016)