
synopsis__final3

The document presents a project titled 'Lung Cancer Diagnosis and Prediction Using AIML' submitted by students for their B.Tech degree in Information Technology. It outlines the use of Artificial Intelligence and Machine Learning to predict lung cancer at an early stage, utilizing various algorithms and a comprehensive dataset. The project emphasizes the importance of early detection in improving patient outcomes and includes a detailed methodology, literature review, and hardware/software requirements.

Uploaded by

aryapankajbug

LUNG CANCER DIAGNOSIS AND PREDICTION

USING AIML
Synopsis submitted in the partial fulfillment of the requirements
for the award of the degree of

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
Submitted by
Suraj Yadav
28231275
Shweta Kumari
28231276
Neha Kumari
2822784
Abhishek Kumar Maurya
2822783

Under the Supervision of


Ms. Urvinder Kaur
(Assistant Professor)

Department of Information Technology

PANIPAT INSTITUTE OF ENGINEERING & TECHNOLOGY

SAMALKHA, PANIPAT 132102

February, 2025
CANDIDATE'S DECLARATION

We hereby declare that the work presented here, entitled "Lung Cancer
Diagnosis and Prediction Using AIML" by Suraj Yadav, Shweta Kumari, Neha
Kumari, and Abhishek Kumar Maurya, in partial fulfillment of the requirements for
the award of the degree of B.Tech. in Information Technology, submitted to the
Department of Information Technology, PIET, Panipat, affiliated to Kurukshetra
University, is an authentic record of our own work carried out under the supervision
of Ms. Urvinder Kaur. The matter presented in this synopsis has not been submitted
by us, in full or in part, to any other University / Institute for the award of the
B.Tech. degree.

Signature of the Students Date: 17.02.2025

Name of Students:
Suraj Yadav
28231275
Shweta Kumari
28231276
Neha Kumari
2822784
Abhishek Kumar Maurya
2822783

This is to certify that the above statement made by the candidates is correct to the
best of our knowledge.

Faculty Name:
Ms. Urvinder Kaur
Assistant Professor
PIET, Panipat

Signature of HOD
(Dr. Neeraj Gupta)
Professor & Head, Department of Information Technology
Abstract

Lung cancer remains one of the leading causes of mortality worldwide, necessitating
early detection and intervention to improve patient outcomes. This project introduces
an advanced Artificial Intelligence and Machine Learning (AIML) framework designed
to predict lung cancer at an early stage. Utilizing a comprehensive dataset sourced from
various medical records and public databases, the model leverages algorithms such as
Random Forest, Support Vector Machines (SVM), and Neural Networks to analyze a
diverse set of features, including demographic information, medical history, and
environmental factors.
TABLE OF CONTENTS

S. No.   Title
1        Introduction of the problem and objectives
2        Literature Survey
3        Methodology and Workflow
4        Hardware and Software Requirements
5        Project Title Map with Engineering POs and Program/Department Specific Outcomes
6        Project Categorization
7        Project Map with SDG Goal
8        Conclusions
9        Specific Contributions of Team Members
10       Key References
1. INTRODUCTION

1.1 OVERVIEW

Machine learning (ML) is the field of study that gives computers the capability to learn
without being explicitly programmed. It is one of the most exciting technologies in
current use. As the name suggests, it gives the computer the ability that makes it more
similar to humans: the ability to learn. Machine learning is actively used today,
perhaps in many more places than one would expect.
Machine learning, as a powerful approach to achieving Artificial Intelligence, has
been widely used in pattern recognition, a very basic skill for humans but a challenge
for machines. With the development of computer technology, pattern recognition has
become an essential technique in the field of Artificial Intelligence. Pattern
recognition can identify letters, images, voice, or other objects, as well as status,
extent, or other abstractions.
Since its invention, the computer has affected our daily lives: it improves their
quality and makes them more convenient and more efficient. A fascinating idea is to
let a computer think and learn as a human does. Essentially, machine learning lets a
computer develop learning skills by itself from given knowledge. Pattern recognition
can be viewed as a computer's ability to recognize different kinds of objects.
Machine learning is therefore closely connected with pattern recognition.
Machine learning is the scientific study of the statistical procedures and methods
that computer systems use to perform a task without explicit instructions, relying
instead on patterns and inference. It is regarded as a part of artificial intelligence.
Machine learning algorithms build a mathematical model from example data, called
"training data", in order to make predictions without being explicitly programmed to
perform the task.
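
As a minimal illustration of building a model from training data, the sketch below fits a straight line to a few example pairs by least squares and then predicts an unseen value. The function names and the toy data are assumptions made for illustration; they are not part of the project's implementation.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept from training pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Toy training data (hypothetical): generated from y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
prediction = predict(model, 5.0)  # the rule was learned, not hand-coded
```

The relationship between input and output is never written into the program; it is recovered from the examples, which is the sense in which the model "learns".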

1.2 PURPOSE OF MACHINE LEARNING

Machine learning is an application of artificial intelligence (AI) that gives systems
the ability to learn and improve automatically from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that
can access data and use it to learn for themselves.
The process of learning begins with observations or data, such as examples,
direct experience, or instruction, in order to look for patterns in the data and make
better decisions in the future based on the examples provided. The primary aim is to
allow computers to learn automatically, without human intervention or assistance,
and to adjust their actions accordingly.

1.3 PROBLEM STATEMENT

With the rapid growth in population, the incidence of diseases such as cancer,
chikungunya, and cholera is also increasing. Among them, cancer is becoming a common
cause of death. Cancer can start almost anywhere in the human body, which is made up
of trillions of cells. Normally, human cells grow and divide to form new cells as the
body needs them. When cells grow old or become damaged, they die, and new cells
take their place. When cancer develops, however, this orderly process breaks down.
As cells become more and more abnormal, old or damaged cells survive when they
should die, and new cells form when they are not needed. These extra cells can divide
without stopping and may form growths called tumors. A tumor may start spreading to
other parts of the body. Tumors are of two types, benign and malignant: a benign
(noncancerous) tumor is a mass of cells that lacks the ability to spread to other parts
of the body, whereas a malignant (cancerous) tumor is a growth of cells that is able to
spread to other parts of the body. This spread is called metastasis. There are various
types of cancer, such as lung cancer, leukemia, and colon cancer. The incidence of lung
cancer has increased significantly since the early 20th century. Its causes include
smoking, exposure to radon gas, secondhand smoke, and exposure to asbestos.

1.4 OBJECTIVE

1. Input design is the process of converting a user-oriented description of the input
into a computer-based system. This design is important to avoid errors in the data input
process and to show management the correct way to obtain accurate information from
the computerized system.
2. It is achieved by creating user-friendly data entry screens that can handle large
volumes of data. The goal of input design is to make data entry easier and free from
errors. The data entry screen is designed so that all data manipulations can be
performed. It also provides record viewing facilities.
3. When data is entered, it is checked for validity. Data can be entered with the help
of screens. Appropriate messages are provided as and when needed so that the user is
not left in a maze at any instant. Thus, the objective of input design is to create an
input layout that is easy to follow.
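
The validity check described in the third objective could be sketched as follows. This is only an illustrative outline: the field names (age, sex, smoker), the accepted values, and the message wording are assumptions, since the synopsis does not specify the input form.

```python
def validate_record(record):
    """Return a list of user-facing error messages; an empty list means valid.

    The fields checked here are hypothetical examples of clinical inputs.
    """
    errors = []

    age = record.get("age")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    age_ok = isinstance(age, int) and not isinstance(age, bool) and 1 <= age <= 120
    if not age_ok:
        errors.append("Age must be a whole number between 1 and 120.")

    if record.get("sex") not in ("M", "F", "Other"):
        errors.append("Sex must be one of: M, F, Other.")

    if not isinstance(record.get("smoker"), bool):
        errors.append("Smoking history must be answered yes or no.")

    return errors


good_errors = validate_record({"age": 58, "sex": "F", "smoker": True})
bad_errors = validate_record({"age": -3, "sex": "X"})
```

Returning every message at once, rather than stopping at the first problem, lets the entry screen show the user all corrections that are needed in a single pass.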

2. LITERATURE REVIEW

In the 21st century, cancer is still considered a serious disease because its mortality rates
are high. Among all cancer types, lung cancer ranks first in both morbidity and mortality [1, 2].
There are two main categories of lung cancer: non-small-cell lung cancer (NSCLC) and
small-cell lung cancer (SCLC). NSCLC is further subcategorized into lung squamous cell
carcinoma (LUSC) and lung adenocarcinoma (LUAD). These types of cancers account for
approximately 85% of lung cancer cases [3]. Compared with distinguishing benign from
malignant tumors, the finer-grained classification of lung cancers into LUSC, LUAD, and
SCLC is of great significance for prognosis. Accurately determining the category of lung
cancer at early diagnosis directly influences the effect of the treatment and thus the
patient's survival rate [1, 4]. Positron emission tomography (PET) and computed
tomography (CT) are both widely used noninvasive imaging techniques for clinical
diagnosis in general and for the diagnosis of lung cancer in particular [4].
Immunohistochemical evaluation is considered the gold standard for lung cancer
classification. However, it requires a tissue biopsy, an invasive procedure with the
inherent risk of a delayed diagnosis and thus exacerbation of the patient's pain.
Advances in artificial intelligence research have enabled numerous studies on the
automatic diagnosis of lung cancer. The data used for lung cancer-type classification fall
roughly into three categories: CT images, PET images, and pathological images [5].
The well-known data science community Kaggle provides high-quality CT images to
participants, who are tasked with distinguishing malignant from benign pulmonary
nodules. Kaggle competitions repeatedly produce excellent deep learning approaches for
these tasks [6, 7]. With progress in research on automatic lung cancer diagnosis, studies
are no longer limited to the classification of benign and malignant nodules, and data sets
are no longer limited to CT images [8–12]. Wu et al. [9] use quantitative imaging
characteristics, such as statistical, histogram-related, morphological, and textural features
from PET images, to predict the distant metastasis of NSCLC, showing that quantitative
features based on PET images can effectively characterize intratumor heterogeneity and
complexity. Two recent publications propose applying deep learning to pathological
images to classify NSCLC and SCLC [10] and to classify transcriptome subtypes of
LUAD [11]. The complexity of the clinical diagnosis of lung cancer is also reflected in the
wide range of imaging modalities employed in the diagnosis [13, 14]. The presentation of
these attention mechanisms illustrates the source of characteristic noise from different
perspectives. There are few related studies on how to use the attention mechanism more
effectively on images from different imaging modalities, so deep learning models based
on multimodality datasets still struggle with fine-grained problems.
Many approaches have already been proposed by various researchers for the prediction of
cancer. Among them, Palani et al. [5] proposed IoT-based predictive modeling that uses
fuzzy C-means clustering for segmentation and an incremental classification algorithm,
based on association rule mining and decision trees, for classifying the tumor sets. Based
on the output of the incremental classification model, a convolutional neural network is
then applied with other features to predict whether a tumor is benign or malignant.
Lynch et al. [6] implemented various machine learning algorithms to predict the
survival of patients, measuring performance by root mean square error. Each model is
trained using 10-fold cross-validation, as the parameters are preprocessed by assigning
default values, so cross
Previous research has already shown that deep learning approaches can not only exploit the
feature distribution patterns of different pulmonary imaging modalities but also merge
different features to achieve computer-aided diagnosis. Liang et al. [15] employ
multichannel techniques to predict the IDH genotype from PET/CT data using a
convolutional neural network (CNN), while other approaches use a parallel CNN
architecture to extract several features from different imaging modalities [16, 17].
Compared with classifying tumors as benign or malignant, classifying the three types of
lung cancer from medical images is better framed as a fine-grained image recognition
problem, because diverse feature distributions and potential pathological features need
to be considered. Because fine-grained features must be extracted from the images while
the lesion region is only a small part of the whole image, the deep learning framework is
susceptible to feature noise. At present, most methods based on various deep learning
frameworks have been shown to hit a bottleneck on fine-grained problems. To address
this, previous research mainly implements the attention mechanism along two dimensions
(channel and spatial) of the feature representation. The channel attention mechanism
models the relationship between feature channels [18], while the spatial attention
mechanism suppresses noise by weighting the feature representation spatially [19–21].
So far, the spatial attention mechanism has been used in medical image processing to
enhance extracted features [20, 21]. The channel attention mechanism has been used in
the detection and classification of pulmonary

3. METHODOLOGY AND WORKFLOW

METHODOLOGY:

1. Data Collection: Collect lung cancer patient data, including medical images
(e.g., CT scans, X-rays), clinical features (e.g., age, sex, smoking history), and
genetic information (e.g., gene mutations).
2. Data Preprocessing: Clean, preprocess, and normalize the collected data to
prepare it for analysis.
3. Feature Extraction: Extract relevant features from the preprocessed data,
including image features (e.g., texture, shape) and clinical features (e.g., tumor size,
location).
4. Model Development: Develop and train AI/ML models using the extracted
features to diagnose and predict lung cancer.
5. Model Evaluation: Evaluate the performance of the developed models using
metrics such as accuracy, precision, recall, and F1-score.
6. Model Deployment: Deploy the best-performing model in a clinical setting to
support lung cancer diagnosis and prediction.

Workflow:

Phase 1: Data Collection and Preprocessing


1. Collect lung cancer patient data from various sources (e.g., hospitals, research
institutions).
2. Preprocess the collected data by cleaning, normalizing, and transforming it into
a suitable format for analysis.
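
The cleaning and normalizing step of Phase 1 can be sketched for a single numeric clinical feature as below. Median imputation and min-max scaling are assumed here as representative choices; the synopsis does not state which techniques the project uses, and the sample values are invented.

```python
from statistics import median


def impute_missing(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def min_max_normalize(column):
    """Rescale values linearly to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical patient ages with one missing entry.
ages = [40, None, 60, 80]
clean_ages = impute_missing(ages)            # [40, 60, 60, 80]
scaled_ages = min_max_normalize(clean_ages)  # [0.0, 0.5, 0.5, 1.0]
```

Scaling features to a common range matters mainly for distance-based and gradient-based models such as SVMs and neural networks; tree-based models such as Random Forest are largely insensitive to it.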

Phase 2: Feature Extraction and Selection


1. Extract relevant features from the preprocessed data, including image features
and clinical features.
2. Select the most informative features using feature selection techniques (e.g.,
correlation analysis, recursive feature elimination).
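
The correlation analysis mentioned in Phase 2 might look like the following sketch, which ranks features by the absolute Pearson correlation between each feature and the diagnosis label. The feature names and values are made up for illustration and do not come from the project's dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(features, labels):
    """Order feature names from most to least correlated with the labels."""
    scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: six patients, label 1 = lung cancer diagnosed.
features = {
    "pack_years": [0, 5, 20, 30, 40, 10],
    "age": [50, 40, 55, 45, 60, 50],
    "clinic_id": [1, 1, 1, 1, 1, 1],  # constant, so uninformative
}
labels = [0, 0, 1, 1, 1, 0]

ranking = rank_features(features, labels)
```

Pearson correlation captures only linear, one-feature-at-a-time relationships, which is one reason it is usually paired with a wrapper method such as recursive feature elimination.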

Phase 3: Model Development and Evaluation


1. Develop and train AI/ML models using the selected features to diagnose and
predict lung cancer.
2. Evaluate the performance of the developed models using metrics such as
accuracy, precision, recall, and F1-score.
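
The four evaluation metrics named above follow directly from the counts of true and false positives and negatives. The sketch below computes them for a binary benign/malignant task; the label vectors are invented solely to exercise the function.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set: 4 malignant (1) and 6 benign (0) cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
```

For a screening task, recall deserves particular attention, because a false negative (a missed cancer) is generally costlier than a false positive that leads to a follow-up examination.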

Phase 4: Model Deployment and Maintenance


1. Deploy the best-performing model in a clinical setting to support lung cancer
diagnosis and prediction.
2. Continuously monitor and update the deployed model to ensure its performance
and accuracy.
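
One simple way to monitor a deployed model, sketched here under assumptions, is to track accuracy over the most recent cases whose true diagnosis has since been confirmed and to flag the model for review when that accuracy falls below a threshold. The class name, window size, and threshold are hypothetical; the synopsis does not describe a monitoring mechanism.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling accuracy over the most recent confirmed cases."""

    def __init__(self, window=50, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # oldest results drop off automatically
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether a prediction matched the confirmed diagnosis."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return rolling accuracy, or None if nothing has been recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """True when rolling accuracy has fallen below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1), (1, 1)]:
    monitor.record(predicted, actual)

review_flag = monitor.needs_review()
```

With a window of four, only the last four of the five recorded cases count, so the rolling accuracy here is 0.5 and the model is flagged for review.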

4. HARDWARE AND SOFTWARE REQUIREMENTS

Hardware Requirements:
• Processor: Intel Core i5 or higher
• RAM: 8GB or more
• Storage: 256GB SSD or more
• Graphics Card: NVIDIA GTX 1050 or equivalent
• Peripherals: Keyboard, Mouse, Monitor

Software Requirements:
• Operating System: Windows 10 or higher
• Development Environment: Visual Studio Code, Eclipse, or PyCharm
• Programming Languages: Python, Java
• Libraries/Frameworks: TensorFlow, Scikit-learn
• Other Tools: Git, Docker

5. Project Title Map with Engineering Program Outcomes and Program/Department Specific Outcomes

The Lung Cancer Diagnosis and Prediction Using AIML project aligns with the
following engineering Program Outcomes (POs):
- PO1: Engineering Knowledge
- PO2: Problem Analysis
- PO3: Design/Development of Solutions
- PO4: Modern Tool Usage
- PO5: The Engineer and Society
- PO6: Environment and Sustainability
- PO7: Ethics
- PO8: Individual and Team Work
6. PROJECT CATEGORIZATION

Project Category: Healthcare/Medical

• Sub-Category: Medical Imaging Analysis
• Project Type: Research/Development
• Application Domain: Oncology (Cancer Research)
• Technologies Used: AI, ML, DL, Computer Vision, Medical Imaging
• Tools and Frameworks: Scikit-learn, pandas, NumPy, Python,
TensorFlow, Keras, PyTorch, OpenCV
• Methodologies:
  1. Convolutional Neural Networks (CNNs)
  2. Transfer Learning
  3. Data Preprocessing
  4. Feature Extraction
  5. Model Training and Evaluation
• Deliverables:
  - A predictive model for lung cancer diagnosis
  - Software applications for the analysis of medical images
  - Research paper/publication

7. PROJECT MAP WITH SDG GOALS

The Lung Cancer Diagnosis and Prediction Using AIML project aligns with the
following Sustainable Development Goal (SDG):

SDG 3: Good Health and Well-being

8. CONCLUSIONS

The "Lung Cancer Diagnosis and Prediction Using AIML" project leverages AI and
ML technologies to facilitate early detection and personalized treatment of lung cancer.
By integrating advanced predictive models into clinical practice, this project aims to
improve patient outcomes and survival rates, aligning with the goal of promoting good
health and well-being. The success of this project could revolutionize lung cancer
diagnosis and set a precedent for future innovations in healthcare.
9. SPECIFIC CONTRIBUTIONS OF TEAM MEMBERS

1. Shweta:

➢ Role: Data Scientist

➢ Contributions: Responsible for data collection and preprocessing. Handled data
cleaning, transformation, and normalization. Conducted exploratory data analysis
to identify key features for model development.

2. Suraj:

➢ Role: Machine Learning

➢ Contributions: Developed and trained the machine learning models. Implemented
various algorithms such as Random Forest, SVM, and Neural Networks. Optimized
hyperparameters and evaluated model performance using metrics such as accuracy,
precision, and recall.

3. Neha:

➢ Role: Software Developer

➢ Contributions: Designed and developed the IHMS (Integrated Health Monitoring
System) application. Integrated the trained ML models into the application.
Implemented the user interface and real-time prediction features for clinical use.

4. Abhishek:

➢ Role: Quality Assurance

➢ Contributions: Conducted thorough testing and evaluation of the IHMS
application. Performed system validation and verification to ensure accuracy and
reliability. Provided feedback and made necessary adjustments to enhance the
application's performance.

10. KEY REFERENCES

[1] Krishnaiah, V., G. Narsimha, and N. Subhash Chandra. "Diagnosis of
lung cancer prediction system using data mining classification techniques."
International Journal of Computer Science and Information Technologies 4.1
(2013): 39-45.
[2] Zhang, Junjie, et al. "Pulmonary nodule detection in medical images: a
survey." Biomedical Signal Processing and Control 43 (2018): 138-147.
[3] Fenwa, Olusayo D., Funmilola A. Ajala, and A. Adigun. "Classification of
cancer of the lungs using SVM and ANN." Int. J. Comput. Technol. 15.1 (2016):
6418-6426.
[4] Daoud, Maisa, and Michael Mayo. "A survey of neural network-based cancer
prediction models from microarray data." Artificial intelligence in medicine
(2019).
[5] Palani, D., and K. Venkatalakshmi. "An IoT based predictive modelling for
predicting lung cancer using fuzzy cluster based segmentation and
classification." Journal of medical systems 43.2 (2019): 21.
[6] Lynch, Chip M., et al. "Prediction of lung cancer patient survival via
supervised machine learning classification techniques." International journal of
medical informatics 108 (2017): 1-8.
[7] Öztürk, Şaban, and Bayram Akdemir. "Application of feature extraction and
classification methods for histopathological image using GLCM, LBP, LBGLCM,
GLRLM and SFTA." Procedia computer science 132 (2018): 40-46.
[8] Jin, Xin-Yu, Yu-Chen Zhang, and Qi-Liang Jin. "Pulmonary nodule detection
based on CT images using convolution neural network." 2016 9th International
symposium on computational intelligence and design (ISCID). Vol. 1. IEEE,
2016.
[9] Sumathipala, Yohan, et al. "Machine learning to predict lung nodule biopsy
method using CT image features: A pilot study." Computerized Medical Imaging
and Graphics 71 (2019): 1-8.
[10] Jemal, A., et al. "Global cancer statistics." CA: A Cancer Journal for
Clinicians 61.2 (2011): 69-90.
[11] Siegel, R. L., K. D. Miller, and A. Jemal. "Cancer statistics, 2018." CA: A
Cancer Journal for Clinicians 68.1 (2018): 7-30.
