LUNG CANCER DIAGNOSIS AND PREDICTION USING AIML
Synopsis submitted in the partial fulfillment of the requirements
for the award of the degree of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
Submitted by
Suraj Yadav
28231275
Shweta Kumari
28231276
Neha Kumari
2822784
Abhishek Kumar Maurya
2822783
February, 2025
CANDIDATE'S DECLARATION
We hereby declare that the work presented here, entitled "Lung Cancer
Diagnosis and Prediction Using AIML" by Suraj Yadav, Shweta Kumari, Neha
Kumari, and Abhishek Kumar Maurya, in partial fulfillment of the requirements for the
award of the degree of B.Tech. in Information Technology, submitted in the Department of
Information Technology, PIET, Panipat, affiliated to Kurukshetra University, is an
authentic record of our own work under the supervision of Ms. Urvinder Kaur. The matter
presented in this synopsis has not been submitted by us in full or in part to any other
University / Institute for the award of the B.Tech. degree.
Name of Students:
Suraj Yadav
28231275
Shweta Kumari
28231276
Neha Kumari
2822784
Abhishek Kumar Maurya
2822783
This is to certify that the above statement made by the candidates is correct to the best
of our knowledge.
Faculty Name:
Ms. Urvinder Kaur
Assistant Professor
PIET, Panipat
Signature of HOD
(Dr. Neeraj Gupta)
Professor & Head, Department of Information Technology
Abstract
Lung cancer remains one of the leading causes of mortality worldwide, necessitating
early detection and intervention to improve patient outcomes. This project introduces
an advanced Artificial Intelligence and Machine Learning (AIML) framework designed
to predict lung cancer at an early stage. Utilizing a comprehensive dataset sourced from
various medical records and public databases, the model leverages algorithms such as
Random Forest, Support Vector Machines (SVM), and Neural Networks to analyze a
diverse set of features, including demographic information, medical history, and
environmental factors.
TABLE OF CONTENTS
S. No.   Title
8        Conclusions
10       Key References
1. INTRODUCTION
1.1 OVERVIEW
Machine learning (ML) is the field of study that gives computers the capability to learn
without being explicitly programmed. It is one of the most exciting technologies in
current use. As the name suggests, it gives the computer the quality that makes it more
similar to humans: the ability to learn. Machine learning is actively used today, perhaps
in many more places than one would expect.
Machine learning, as a powerful approach to achieving Artificial Intelligence, has
been widely used in pattern recognition, a basic skill for humans but a challenge
for machines. With the development of computer technology, pattern recognition has
become an essential technique in the field of Artificial Intelligence. Pattern
recognition can identify letters, images, voices, or other objects, and can also
identify status, extent, or other abstractions.
Since its invention, the computer has affected our daily lives. It improves our
quality of life and makes it more convenient and efficient. A fascinating idea is to
let a computer think and learn as a human does. Essentially, machine learning lets a
computer develop learning skills by itself from given knowledge. Pattern recognition
can be viewed as a computer's ability to recognize different classes of objects.
Machine learning is therefore closely connected with pattern recognition.
Machine learning is the scientific study of the statistical procedures and methods
that computer systems use to perform a task without specific instructions, relying
instead on patterns and inference drawn from data. It is regarded as a part of
artificial intelligence. A machine learning algorithm builds a mathematical model from
example data, called "training data", in order to make predictions without being
explicitly programmed to perform the task.
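As a minimal illustration of building a model from training data, the sketch below fits a nearest-centroid classifier in plain Python. The two features (age and smoking history in pack-years), the four training records, and the class labels are hypothetical values chosen only for this example; they are not drawn from the project dataset, and the classifier is not one of the algorithms named in the abstract.

```python
from math import dist

def fit_centroids(samples, labels):
    """Learn one centroid (mean feature vector) per class from training data."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical training data: [age in years, smoking history in pack-years].
train_x = [[45, 0], [50, 5], [62, 40], [70, 55]]
train_y = ["low_risk", "low_risk", "high_risk", "high_risk"]

model = fit_centroids(train_x, train_y)   # the "learning" step
print(predict(model, [66, 45]))           # prediction for an unseen record
print(predict(model, [48, 2]))
```

No rule such as "more than 30 pack-years means high risk" is written by hand here; the decision boundary follows entirely from the training examples, which is the sense in which the model is not explicitly programmed.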
With the rapid growth of the population, the incidence of diseases such as cancer,
chikungunya, and cholera is also increasing. Among them, cancer is becoming a common
cause of death. Cancer can start almost anywhere in the human body, which is made up
of trillions of cells. Normally, human cells grow and divide to form new cells as the
body needs them. When cells grow old or become damaged, they die, and new cells
take their place. When cancer develops, however, this orderly process breaks down.
As cells become more and more abnormal, old or damaged cells survive when they
should die, and new cells form when they are not needed. These extra cells can divide
without stopping and may form growths called tumors. A tumor may start spreading to
different parts of the body. Tumors are of two types, benign and malignant. A benign
(noncancerous) tumor is a mass of cells that lacks the ability to spread to other parts
of the body, whereas a malignant (cancerous) tumor is a growth of cells that can spread
to other parts of the body; this spread is called metastasis. There are various types of
cancer, such as lung cancer, leukemia, and colon cancer. The incidence of lung cancer
has significantly increased from the early 19th century. There are various causes of
lung cancer, such as smoking, exposure to radon gas, secondhand smoke, and exposure to
asbestos.
1.4 OBJECTIVE
2. LITERATURE REVIEW
In the 21st century, cancer is still considered a serious disease as the mortality rates are high.
Among all cancer types, lung cancer ranks first regarding morbidity and mortality [1, 2].
There are two main categories of lung cancer: non-small-cell lung cancer (NSCLC) and
small cell lung cancer (SCLC). For non-small-cell lung cancer, a subcategorization into
lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) is further used.
These types of cancers account for approximately 85% of lung cancer cases [3]. Compared
with distinguishing benign from malignant tumors, the finer-grained classification of lung
cancers into LUSC, LUAD, and SCLC is of great significance for the prognosis of lung
cancer. Accurately determining the category of lung cancer at early diagnosis directly
influences the effect of the treatment and thus the patients’ survival rate [1, 4]. Positron
emission tomography (PET) and computed tomography (CT) are both widely used
noninvasive diagnostic imaging techniques for clinical diagnosis in general and for the
diagnosis of lung cancer in particular [4]. Immunohistochemical evaluation is considered
the gold standard for lung cancer classification. However, this procedure requires a tissue
biopsy, an invasive procedure with the inherent risk of a delayed diagnosis and thus
exacerbation of the patient’s pain.
Advances in artificial intelligence research have enabled numerous studies on the automatic
diagnosis of lung cancer. The data used in lung cancer-type classification fall roughly
into three categories: CT images, PET images, and pathological images [5].
The well-known data science community Kaggle provides high-quality CT images for
participants, with the task of distinguishing malignant from benign pulmonary
nodules. Kaggle competitions repeatedly produce excellent deep learning approaches for
these tasks [6, 7]. With progress in research on automatic lung cancer diagnosis,
studies are no longer limited to the classification of benign and malignant nodules, and data
sets are no longer limited to CT images [8–12]. Wu et al. [9] use quantitative imaging
characteristics such as statistical, histogram-related, morphological, and textural features
from PET images to predict the distant metastasis of NSCLC, which shows that
quantitative features based on PET images can effectively characterize intratumor
heterogeneity and complexity. Two recent publications propose the application of deep
learning to pathological images to classify NSCLC and SCLC [10] and to classify
transcriptome subtypes of LUAD [11]. The complexity of the clinical diagnosis of lung
cancer is also characterized by the wide range of imaging modalities employed in
the diagnosis [13, 14]. The presentation of these attention mechanisms illustrates the
source of characteristic noise from different perspectives. There are few related studies on
how to use the attention mechanism more effectively on images with different imaging
modalities, so deep learning models based on multimodality datasets still struggle
with fine-grained problems.
Many approaches to cancer prediction have already been proposed by various researchers.
Among them, Palani et al. [5] proposed IoT-based predictive modeling that uses fuzzy C-means
clustering for segmentation and an incremental classification algorithm, based on
association rule mining and decision trees, for classifying the tumor sets.
Based on the output of the incremental classification model, a convolutional neural
network is then applied with other features to predict whether a tumor is benign or malignant.
Lynch et al. [6] implemented various machine learning algorithms for predicting the
survivability of patients, with performance measured by root mean square error.
Each model is trained using 10-fold cross-validation, as the parameters are preprocessed by
assigning default values, so cross
Previous research has already shown that deep learning approaches can not only use the
feature distribution patterns from different pulmonary imaging modalities but also merge
different features to achieve computer-aided diagnosis. Liang et al. [15] employ
multichannel techniques to predict the IDH genotype from PET/CT data using a
convolutional neural network (CNN), while other approaches use a parallel CNN
architecture to extract several features of different imaging modalities [16, 17].
Compared with the classification of benign and malignant tumors, the classification of the
three types of lung cancer from medical images is better treated as a fine-grained
image recognition problem, as diverse distributions of features and potential pathological
features need to be considered. Because fine-grained features need to be extracted from the
images, while the lesion region is only a small part of the whole image, the deep
learning framework is susceptible to feature noise. At present, most methods based on
various deep learning frameworks have been shown to hit a bottleneck in fine-grained
problems. To solve this problem, previous research mainly implements the
attention mechanism along two dimensions (channel and spatial) of the feature
representation. The channel attention mechanism models the relationship between feature
channels [18], while the spatial attention mechanism ensures that noise is suppressed by
weighting the feature representation spatially [19–21]. So far, the spatial attention
mechanism has been used in medical image processing to enhance extracted features [20, 21].
The channel attention mechanism has been used in the detection and classification of pulmonary
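
The difference between the two attention dimensions can be sketched on a tiny feature map. The code below is a deliberately simplified, parameter-free illustration and not the method of any cited work: attention modules in the literature normally compute their gates with learned layers, whereas here each gate is simply the sigmoid of an average. The function names and the 2-channel, 2x2 feature map are hypothetical.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def channel_attention(feature_map):
    """Reweight whole channels: one gate per channel, from its global average."""
    out = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))  # simplification: no learned layers
        out.append([[v * gate for v in row] for row in channel])
    return out

def spatial_attention(feature_map):
    """Reweight positions: one gate per (row, col), from the mean across channels."""
    n_channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    gates = [[sigmoid(sum(ch[r][c] for ch in feature_map) / n_channels)
              for c in range(width)] for r in range(height)]
    return [[[ch[r][c] * gates[r][c] for c in range(width)]
             for r in range(height)] for ch in feature_map]

# Hypothetical feature map: 2 channels of size 2x2, with a single active
# position (row 0, col 1) standing in for a small lesion region.
fmap = [
    [[0.0, 4.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 0.0]],
]

by_channel = channel_attention(fmap)    # same gate for every pixel of a channel
by_position = spatial_attention(fmap)   # same gate for every channel at a pixel
print(by_channel[0][0][1], by_channel[1][0][1])
print(by_position[0][0][1], by_position[1][0][1])
```

In this sketch, channel attention favors the channel with the stronger overall response, while spatial attention favors the active position in every channel. That matches the roles described above: modeling relationships between channels versus suppressing noise by spatial weighting.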
METHODOLOGY:
1. Data Collection: Collect lung cancer patient data, including medical images
(e.g., CT scans, X-rays), clinical features (e.g., age, sex, smoking history), and
genetic information (e.g., gene mutations).
2. Data Preprocessing: Clean, preprocess, and normalize the collected data to
prepare it for analysis.
3. Feature Extraction: Extract relevant features from the preprocessed data,
including image features (e.g., texture, shape) and clinical features (e.g., tumor size,
location).
4. Model Development: Develop and train AI/ML models using the extracted
features to diagnose and predict lung cancer.
5. Model Evaluation: Evaluate the performance of the developed models using
metrics such as accuracy, precision, recall, and F1-score.
6. Model Deployment: Deploy the best-performing model in a clinical setting to
support lung cancer diagnosis and prediction.
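
Two of the steps above reduce to short formulas, sketched here in plain Python: min-max scaling as one possible normalization for step 2, and the four evaluation metrics named in step 5. The function names, the sample ages, and the label vectors are hypothetical and serve only to make the arithmetic concrete; the synopsis does not specify which normalization the project uses.

```python
def min_max_normalize(values):
    """Rescale a feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical clinical feature (age in years) before and after scaling.
ages = [40, 55, 70]
scaled_ages = min_max_normalize(ages)
print(scaled_ages)

# Hypothetical ground truth and predictions (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For a diagnostic setting, recall deserves particular attention alongside accuracy, since a false negative corresponds to a missed malignant case.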
Workflow:
Hardware Requirements:
• Processor: Intel Core i5 or higher
• RAM: 8GB or more
• Storage: 256GB SSD or more
• Graphics Card: NVIDIA GTX 1050 or equivalent
• Peripherals: Keyboard, Mouse, Monitor
Software Requirements:
• Operating System: Windows 10 or higher
• Development Environment: Visual Studio Code, Eclipse, or PyCharm
• Programming Languages: Python, Java
• Libraries/Frameworks: TensorFlow, Scikit-learn
• Other Tools: Git, Docker
The Lung Cancer Diagnosis and Prediction Using AIML project aligns with the
following engineering Program Outcomes (POs):
- PO1: Engineering Knowledge
- PO2: Problem Analysis
- PO3: Design/Development of Solutions
- PO4: Modern Tool Usage
- PO5: The Engineer and Society
- PO6: Environment and Sustainability
- PO7: Ethics
- PO8: Individual and Team Work
6. PROJECT CATEGORIZATION
The Lung Cancer Diagnosis and Prediction Using AIML project aligns with the
following Sustainable Development Goal (SDG):
- SDG 3: Good Health and Well-being
8. CONCLUSIONS
The "Lung Cancer Diagnosis and Prediction Using AIML" project leverages AI and
ML technologies to facilitate early detection and personalized treatment of lung cancer.
By integrating advanced predictive models into clinical practice, this project aims to
improve patient outcomes and survival rates, aligning with the goal of promoting good
health and well-being. The success of this project could revolutionize lung cancer
diagnosis and set a precedent for future innovations in healthcare.
9. SPECIFIC CONTRIBUTIONS OF TEAM MEMBERS
1. Shweta:
2. Suraj:
3. Neha:
4. Abhishek: