Project Publish1
Learning
ANU C.S
Akash M J, Abhishek I Hanchinamani, Venkatesha H, Karthik S Badiger
Abstract:
Liver cirrhosis is one of the most common forms of chronic liver disease worldwide. The
ability to forecast the onset of liver cirrhosis is critical for successful treatment and for
preventing serious health consequences. We therefore design a prediction model using
machine learning techniques. The proposed model for the prediction of liver cirrhosis
uses an ensemble of learning models (a Naive Bayes classifier, a Classification and
Regression Tree (CART) and a Support Vector Machine (SVM)), evaluated with 10-fold
cross-validation. Accuracy, precision, recall and F1 score were used to evaluate the
model’s performance. Ensemble learning techniques may provide more accurate
predictions of liver cirrhosis, and this approach can help doctors make better clinical
decisions.
Keywords: Liver function tests, Data preprocessing, Deep learning, Ensemble Model.
1. Introduction
Liver cirrhosis is an important form of liver damage. It usually results from long-term
damage to the liver caused by various liver diseases and conditions, such as hepatitis,
chronic alcoholism or genetic disorders. Each time the liver is injured, it tries to repair
itself, and in the process fibrous scar tissue can be deposited in place of the lost cells; this
scarring constitutes cirrhosis. As cirrhosis progresses, more and more scar tissue forms,
making it increasingly difficult for the liver to function. Advanced cirrhosis is
life-threatening. The liver damage done by cirrhosis generally cannot be undone, but if
cirrhosis is diagnosed early and its cause is treated, further damage can be limited and,
rarely, reversed. In addition to fibrosis, the complications of cirrhosis include portal
hypertension, ascites, hepatorenal syndrome and hepatic encephalopathy.
A poor correlation exists between histologic findings of cirrhosis and the clinical
picture. Some patients with cirrhosis are completely asymptomatic and have a
reasonably normal life expectancy while some individuals have severe symptoms of end-
stage liver disease and limited chance for survival. Common signs and symptoms may
arise from decreased hepatic synthetic function (coagulopathy), decreased detoxification
capabilities of the liver (hepatic encephalopathy) or portal hypertension (variceal
bleeding) (Wolf & Katz, 2013). Information and communication technology (ICT) has
been credited globally with changing the course of history and adding value to human
lives in many ways. Of all the technologies that add value to and enhance human life,
telemedicine may go down in history as the most defining, and it has the potential to
make a positive impact on people, especially those living in rural areas (Ezeorah,
Ayatalumo & IbeEnwo, 2009).
Causes of Liver Cirrhosis:
• Chronic Alcoholism: Excessive and prolonged alcohol consumption is a common
cause of liver cirrhosis. Alcohol-related liver disease occurs when the liver is unable
to metabolize alcohol effectively.
• Viral Hepatitis: Chronic infections with hepatitis B or C viruses can lead to
inflammation and damage to the liver over time, contributing to cirrhosis.
• Non-Alcoholic Fatty Liver Disease (NAFLD): Accumulation of fat in the liver,
often associated with obesity and metabolic syndrome, can progress to non-alcoholic
steatohepatitis (NASH) and eventually lead to cirrhosis.
• Autoimmune Hepatitis: The immune system mistakenly attacks healthy liver cells,
causing inflammation and, in some cases, cirrhosis.
• Genetic Disorders: Inherited conditions such as hemochromatosis (iron overload),
Wilson's disease (copper accumulation) and cystic fibrosis (thickened secretions that
obstruct the bile ducts) can damage the liver over time, contributing to cirrhosis.
• Biliary Atresia: This is a rare condition where the bile ducts inside or outside the
liver are absent or damaged, leading to bile accumulation and liver damage,
especially in infants.
• Primary Biliary Cirrhosis (PBC) and Primary Sclerosing Cholangitis (PSC):
These are autoimmune conditions affecting the bile ducts, leading to inflammation,
scarring, and eventually cirrhosis.
The prognosis of liver cirrhosis varies widely depending on various factors, including the
underlying etiology, degree of liver dysfunction, presence of complications, and response
to treatment. Decompensated cirrhosis, characterized by the development of ascites,
variceal bleeding, hepatic encephalopathy, or hepatorenal syndrome, is associated with
significantly worse outcomes and higher mortality rates compared to compensated
cirrhosis.
2. Research questions
A review has been carried out in the area of Machine Learning (ML) and liver disease
prediction. Many studies have addressed this area from different viewpoints. In this
study, the following five research questions (Qs) are framed.
Q1: What is the objective of liver disease prediction using machine learning?
Q2: How do machine learning algorithms compare in predicting liver disease?
Q3: Which algorithms are used for predicting liver disease?
Q4: Which model is best for prediction in machine learning?
Q5: What is the future scope of disease prediction systems?
3. Literature review
C. Geetha et al. proposed a work on “Evaluation based Approaches for Liver Disease
Prediction using Machine Learning Algorithms” in 2021. The study used Support Vector
Machine and Decision Tree classifiers and reported an accuracy of 70%. It focused on
algorithms for distinguishing healthy individuals from liver patients in liver datasets.
Based on their performance, the study also compared the classification algorithms in
terms of prediction accuracy.
Jianxia Wen et al. presented a work on “Research Progress and Treatment Status of
Liver Cirrhosis with Hypoproteinemia” in 2022. In this paper, a Support Vector Machine
was used, with a reported accuracy of 55%. The study comprehensively analyzed the
common complications, pathogenic mechanisms and treatment status of liver cirrhosis
with hypoproteinemia, and proposed research directions for dealing with this
increasingly serious problem.
Manjula Devarakonda Venkata Sumalatha Lingamgunta et al. proposed a work on
“Health Care Automation” in 2022. Using the Random Forest (RF) algorithm, they
developed an ML-based system for the early prediction of liver disease on an Indian
dataset. The performance of the technique is reported in terms of standard evaluation
metrics. The recall of approximately 95% shows that most of the positive cases in the
dataset are correctly identified, while the precision of approximately 74% indicates that
about 74% of the cases predicted as positive are actually positive.
Md. Fazle Rabbi et al. presented a work on “Prediction of Liver Disorder Using
Machine Learning Algorithm” in 2020. In this research, ML algorithms such as Logistic
Regression (LR), Decision Tree (DT), Random Forest (RF) and Extra Trees (ET) were
used to classify the Indian Liver Patient Dataset (ILPD). Pearson Correlation Coefficient
based feature selection (PCC-FS) was applied to eliminate irrelevant features from the
dataset, and a boosting algorithm (AdaBoost) was used to enhance the predictive
performance of those algorithms. The comparative analysis was evaluated in terms of
accuracy, ROC, F1 score, precision and recall, and boosting on ET provided the highest
accuracy, 92.19%.
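Correlation-based feature selection of the kind described above can be sketched in plain Python. This is a minimal illustration that assumes one common variant: each feature is ranked by the absolute value of its Pearson correlation with the class label, and features below a threshold are dropped. The cited study may define PCC-FS differently. The threshold, column names and values are hypothetical toy data, not ILPD records.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, labels, threshold=0.1):
    """Keep feature names whose |r| with the label meets the (hypothetical) threshold."""
    return [name for name, values in columns.items()
            if abs(pearson_r(values, labels)) >= threshold]

# Hypothetical toy columns (illustrative names and values only).
columns = {
    "total_bilirubin": [0.7, 10.9, 7.3, 1.0, 3.9, 1.8],
    "noise":           [5.0, 1.0, 5.0, 1.0, 5.0, 1.0],
}
labels = [0, 1, 1, 0, 1, 0]  # 1 = liver disease, 0 = healthy
print(select_features(columns, labels, threshold=0.5))  # ['total_bilirubin']
```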
Hartatik et al. presented a work on “Prediction for Diagnosing Liver Disease in
Patients using KNN and Naive Bayes Algorithms” in 2020. The study tested the Naive
Bayes and KNN algorithms, implemented in Python, on the problem of predicting
whether or not a patient has liver disease. The data were taken from the UCI Machine
Learning Repository, namely the Indian Liver Patient Dataset (ILPD). The results show
that, of the two algorithms, Naive Bayes performs better than KNN when six variables
are used in the prediction model, giving an increase in accuracy compared to the results
of previous studies[5].
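To illustrate the Naive Bayes classifier compared in this study, the sketch below implements a Gaussian Naive Bayes model. It assumes that each continuous feature follows an independent normal distribution within each class; the cited work may use a different variant. The two feature columns (loosely, total bilirubin and alkaline phosphatase) and their values are hypothetical toy data.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance for each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps variances positive for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)

# Hypothetical toy data: [bilirubin, alkaline phosphatase]; 1 = liver disease.
X = [[0.7, 187], [1.0, 160], [0.9, 195], [10.9, 699], [7.3, 490], [3.9, 195]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [8.0, 500]))  # 1
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.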
Maria Alex Kuzhippallil et al. proposed a work on “Comparative Analysis of Machine
Learning Techniques for Indian Liver Disease Patients” in 2020. In this work, liver
disease prediction was studied and analyzed. The data were cleaned using several
techniques: missing values were imputed with the median, categorical data were
label-encoded into numerical form for easier analysis, duplicate values were eliminated,
and outliers were removed using Isolation Forest to improve performance. A genetic
algorithm combined with XGBoost was used to select the best attributes for the
prediction of liver disease.
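Two of the cleaning steps mentioned above, median imputation of missing values and label encoding of categorical columns, can be sketched in plain Python. This illustration assumes that missing values are stored as None. The column names and values are hypothetical, and the code is not the authors' pipeline. Duplicate removal, Isolation Forest outlier removal and the genetic-algorithm/XGBoost feature selection are not shown.

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def label_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

# Hypothetical columns: an albumin/globulin ratio with gaps, and gender.
ag_ratio = [0.9, None, 1.1, 0.4, None, 1.0]
gender = ["Male", "Female", "Male", "Male", "Female", "Male"]

print(impute_median(ag_ratio))  # [0.9, 0.95, 1.1, 0.4, 0.95, 1.0]
encoded, mapping = label_encode(gender)
print(encoded, mapping)         # [1, 0, 1, 1, 0, 1] {'Female': 0, 'Male': 1}
```

In practice, the median should be computed on the training split only and then reused for the test split, so that no information leaks from the test data.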
A. Sivasangari et al. proposed a work on “Diagnosis of Liver Disease using Machine
Learning Models” in 2020. In this paper, different machine learning algorithms are
evaluated for the prediction of liver disease, which is particularly difficult to diagnose
because of the subtle nature of its symptoms. The prediction process consisted of
collecting data from a public database, preprocessing the data (including replacement of
-1 values) and splitting the data into training and test sets. Finally, quantitative metrics
such as precision, accuracy and recall were measured for the various machine learning
models[8].
Sateesh Ambesange et al. presented a work on “Liver Diseases Prediction using
KNN with Hyper Parameter Tuning Techniques” in 2020. In this work, they developed a
K-Nearest Neighbor model to diagnose and predict liver disease. The data were
transformed, and dimensionality reduction was performed to reduce the number of
features and improve model performance. The classification and prediction techniques
were evaluated using several performance measures, including precision, accuracy,
recall and F1 score.
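The sketch below shows a K-Nearest Neighbor classifier with min-max scaling. Scaling matters because KNN uses distances, and an unscaled enzyme level in the hundreds would otherwise dominate a bilirubin value near 1. The value k=3 and the toy data are hypothetical. Hyperparameter tuning would compare several values of k under cross-validation, and the transformation and dimensionality-reduction steps of the cited work are not shown.

```python
import math
from collections import Counter

def min_max_scale(X):
    """Rescale each column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*X))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    scaled = [[(v - lo) / (hi - lo) if hi > lo else 0.0
               for v, lo, hi in zip(row, lows, highs)] for row in X]
    return scaled, lows, highs

def knn_predict(train_X, train_y, query, k=3):
    """Majority label among the k nearest training rows (Euclidean distance)."""
    distances = sorted((math.dist(row, query), label)
                       for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [bilirubin, alkaline phosphatase]; 1 = liver disease.
X = [[0.7, 187], [1.0, 160], [0.9, 195], [10.9, 699], [7.3, 490], [3.9, 195]]
y = [0, 0, 0, 1, 1, 1]
scaled, lows, highs = min_max_scale(X)
# The query must be scaled with the training minima and maxima.
query = [(v - lo) / (hi - lo) for v, lo, hi in zip([8.0, 500], lows, highs)]
print(knn_predict(scaled, y, query, k=3))  # 1
```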
Golmei Shaheamlung et al. proposed a work on “A Survey on machine learning
techniques for the diagnosis of liver disease” in 2020. This survey gives an overview of
previously published papers on the detection and diagnosis of liver disease using
different machine learning algorithms. It observed that some algorithms, such as
Decision Tree, J48 and ANN, provide better accuracy in detecting and predicting liver
disease. Different algorithms perform differently in different scenarios, but, most
importantly, the choice of dataset and feature selection is crucial for obtaining good
prediction results.
4. Publication Details
The graph below displays the total number of publications released annually over the
previous seven years (2017 to 2023).
[Figure: Distribution of Publications per Year (x-axis: Publication Year, 2017 to 2023; y-axis: number of publications)]
Q3: Algorithms:
The choice of algorithm for predicting liver disease depends on various factors, including
the nature of the data, the size of the dataset, the desired level of interpretability, and the
specific requirements of the application. Here are several machine learning algorithms
commonly used for predicting liver disease:
Logistic Regression: Logistic regression is a simple and interpretable algorithm used for
binary classification tasks. It models the probability of the presence of liver disease based
on input features. Advantages: easy to interpret, computationally efficient and suitable
for small to moderately sized datasets.
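A minimal sketch of how logistic regression models the probability of liver disease follows. The predicted probability is the sigmoid of a weighted sum of the features, and the weights are fitted by gradient descent on the log-loss. Plain batch gradient descent, the learning rate, the epoch count and the single toy feature are assumptions made for illustration. Library implementations typically use more sophisticated solvers and add regularization.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            for j, xj in enumerate(row):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, row):
    """Estimated probability that the row belongs to the positive (disease) class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)

# One hypothetical, already-scaled feature (e.g. a normalized bilirubin level).
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print(predict_proba(w, b, [0.95]), predict_proba(w, b, [0.05]))
```

The fitted weights are directly interpretable: a positive weight means that higher values of that feature raise the predicted probability of disease.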
Decision Trees: Decision trees recursively partition the feature space into regions based
on feature values, aiming to maximize information gain or purity. Each leaf node
represents a class label (presence or absence of liver disease). Advantages: interpretable,
handles nonlinear relationships well and robust to irrelevant features.
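To make "purity" concrete, the sketch below scores candidate thresholds on a single feature using Gini impurity, the criterion used by CART. Entropy-based information gain is a common alternative. The code shows only the split search on one feature, using hypothetical toy values. A full tree applies this search to every feature and then recurses into each child node.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Find the split `value <= t` on one feature that minimizes weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    n = len(values)
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send samples to both children
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

bilirubin = [0.7, 1.0, 0.9, 10.9, 7.3, 3.9]  # hypothetical toy values
labels = [0, 0, 0, 1, 1, 1]                  # 1 = liver disease
print(best_threshold(bilirubin, labels))     # (1.0, 0.0)
```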
Random Forests: Random forests are an ensemble learning method that builds multiple
decision trees and combines their predictions through voting or averaging. They reduce
overfitting and improve prediction accuracy compared to individual decision trees.
Advantages: robust to overfitting, handles high-dimensional data well and provides feature
importance estimates.
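The voting step that combines the individual predictions can be sketched as hard (majority) voting over class labels. The same rule could combine the Naive Bayes, CART and SVM models described in the abstract, assuming hard voting is used there. The three prediction lists below are hypothetical. Bootstrap sampling and random feature selection, which make a random forest's trees differ from one another, are not shown.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine class predictions from several models by hard (majority) voting.

    predictions_per_model: one list of predicted labels per model, all of equal length.
    With an odd number of models and two classes, ties cannot occur.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three base classifiers on five patients (1 = disease).
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]
print(majority_vote([model_a, model_b, model_c]))  # [1, 0, 1, 1, 0]
```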
Support Vector Machines (SVM): SVMs aim to find the hyperplane that best separates the
classes in the feature space by maximizing the margin between them. They can handle
nonlinear relationships through kernel functions. Advantages: effective in
high-dimensional spaces, versatile thanks to kernel functions and relatively robust to
overfitting.
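As a rough illustration of margin maximization, the following sketch trains a linear soft-margin SVM using the Pegasos stochastic subgradient method on the hinge loss. Pegasos is a substitute training method chosen for brevity; library SVMs usually use other solvers. Kernels are not shown. The regularization strength `lam`, the epoch count and the two standardized toy features are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear soft-margin SVM.

    Labels must be -1 or +1. A bias is handled by appending a constant 1 feature
    (as a simplification, the bias is regularized too).
    """
    rng = random.Random(seed)
    data = [(row + [1.0], label) for row, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for row, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, row))
            # Shrink the weights (regularization), then step on the hinge loss if violated.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, row)]
    return w

def svm_predict(w, row):
    """Sign of the decision function w . [row, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, row + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical standardized features; +1 = liver disease, -1 = healthy.
X = [[-2.0, -1.0], [-1.5, -2.0], [-1.0, -1.5], [1.0, 1.5], [1.5, 2.0], [2.0, 1.0]]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
print(svm_predict(w, [3.0, 3.0]), svm_predict(w, [-3.0, -3.0]))  # 1 -1
```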
Gradient Boosting Machines (GBM): GBM sequentially builds an ensemble of weak
learners (e.g., decision trees) by focusing on the errors made by the previous models. It
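combines their predictions to produce a strong learner. Advantages: high predictive
accuracy and good handling of heterogeneous features; overfitting can be controlled
through the learning rate, tree depth and number of boosting rounds.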
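A compact sketch of the boosting idea follows. Each new weak learner, here a one-split regression stump on a single feature, is fitted to the residual errors of the ensemble built so far, and its output is added after scaling by a shrinkage factor. For brevity, the sketch uses squared loss on 0/1 labels, and its score can be thresholded at 0.5. Practical GBM implementations for classification usually optimize log-loss over many features. The round count, learning rate and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump (least squares) on one feature.

    Needs at least two distinct feature values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Squared-loss gradient boosting: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 1/2 * (y - p)^2.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

bilirubin = [0.7, 1.0, 0.9, 10.9, 7.3, 3.9]  # hypothetical toy values
labels = [0, 0, 0, 1, 1, 1]                  # 1 = liver disease
model = gradient_boost(bilirubin, labels)
print(model(8.0), model(0.8))  # scores near 1 and near 0
```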
Neural Networks: Neural networks consist of interconnected layers of neurons that learn
complex patterns from the data through forward and backward propagation. They can
capture intricate relationships in the data but require large amounts of data and
computational resources. Advantages: can capture complex patterns and achieve
state-of-the-art performance in many tasks.
The choice of algorithm depends on factors such as the size and complexity of the dataset,
the desired level of interpretability, computational resources available, and the specific
goals of the liver disease prediction task. Experimentation with multiple algorithms and
careful evaluation of their performance can help identify the most suitable algorithm(s)
for a given application.
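The 10-fold cross-validation used in this work to evaluate the proposed model can be sketched as follows. The data are shuffled once and split into k folds. Each fold then serves once as the test set while the remaining folds are used for training. The `fit`/`predict` callables and the majority-class baseline are hypothetical placeholders for any of the algorithms above. Stratified folds, which preserve the class ratio, are often preferred for imbalanced liver datasets but are not shown.

```python
import random

def k_fold_indices(n_samples, k=10, seed=42):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation (requires k <= n_samples).

    Indices are shuffled once; fold sizes differ by at most one sample.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_val_accuracy(fit, predict, X, y, k=10):
    """Average test-fold accuracy of a model given as fit/predict callables."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical baseline model: always predict the most frequent training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, row):
    return model

X = [[i] for i in range(25)]
y = [1] * 20 + [0] * 5
print([len(test) for _, test in k_fold_indices(25, k=10)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(cross_val_accuracy(fit_majority, predict_majority, X, y, k=10))
```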
Determining the "best" model for prediction in machine learning depends on several
factors, including the nature of the data, the specific task or problem being addressed, the
available computational resources, and the desired balance between accuracy,
interpretability, and computational efficiency. There is no one-size-fits-all answer, as
different models excel in different scenarios. However, here are some guidelines for
selecting a model:
Start with Simplicity: Simple models such as logistic regression or decision trees are
often a good starting point, especially for small to moderately sized datasets or when
interpretability is important. These models are easy to understand, fast to train, and can
provide insights into the underlying relationships in the data.
Evaluate Multiple Models: It's essential to evaluate multiple models and compare their
performance using appropriate evaluation metrics on validation or test data. Consider
factors such as accuracy, precision, recall, F1-score, area under the ROC curve (AUC-
ROC), and computational efficiency when comparing models.
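The threshold-based metrics listed above follow directly from the confusion-matrix counts. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The minimal sketch below computes them for a binary classifier on hypothetical labels. AUC-ROC needs predicted scores rather than hard labels and is not shown.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for a binary classifier."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = liver disease, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1]
print(classification_metrics(y_true, y_pred))
# TP = 4, FP = 2, FN = 1: accuracy 0.7, precision 4/6, recall 0.8, F1 8/11
```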
The feature scope for predicting liver disease encompasses a wide range of factors
that may influence the development, progression, and severity of liver conditions. These
features can be broadly categorized into demographic, clinical, laboratory, imaging,
genetic, and lifestyle-related factors. The features in each category are outlined below:
Demographic Features:
Age: Age of the individual, as liver disease prevalence often increases with age.
Gender: Some liver diseases may have different prevalence rates or risk factors between
males and females.
Ethnicity: Certain ethnic groups may have higher susceptibility to specific liver diseases.
Clinical Features:
Medical History: Previous liver diseases, hepatitis infections, autoimmune disorders, or
other comorbidities.
Symptoms: Presence and severity of symptoms such as fatigue, jaundice, abdominal
pain, and ascites.
Alcohol Consumption: Amount and frequency of alcohol consumption, as excessive
alcohol intake is a major risk factor for liver disease.
Smoking History: Smoking can contribute to the progression of liver diseases and
influence treatment outcomes.
Laboratory Features:
Liver Function Tests: Levels of enzymes such as ALT, AST and ALP, together with bilirubin
and albumin, which indicate liver function and damage.
Complete Blood Count (CBC): Hemoglobin, platelet count, and white blood cell count,
which may be altered in liver disease.
Coagulation Profile: Prothrombin time (PT) and international normalized ratio (INR),
which reflect liver synthetic function and clotting ability.
Markers of Liver Injury: Serum levels of markers such as alpha-fetoprotein (AFP),
gamma-glutamyl transferase (GGT), and alkaline phosphatase (ALP).
Imaging Features:
Ultrasonography: Ultrasound findings such as liver size, surface nodularity, echogenicity,
and the presence of focal lesions.
Computed Tomography (CT) or Magnetic Resonance Imaging (MRI): Imaging
characteristics of liver parenchyma, vasculature, and lesions indicative of cirrhosis,
fibrosis, or tumors.
Genetic Features:
Genetic Polymorphisms: Variations in genes associated with liver diseases, such as
PNPLA3, MTHFR, HFE, and SERPINA1.
Family History: Presence of liver diseases or related conditions in first-degree relatives,
indicating genetic predisposition.
Lifestyle Factors:
Diet: Consumption of fatty or processed foods, which may contribute to non-alcoholic
fatty liver disease (NAFLD).
Physical Activity: Level of physical activity and exercise habits, which can impact
metabolic health and liver function.
Medication and Substance Use: Use of hepatotoxic medications, illicit drugs, or herbal
supplements that may affect liver health.
Conclusion
Predicting liver cirrhosis using machine learning holds great promise for improving early
detection, prognosis, and patient outcomes. By leveraging diverse sources of patient data,
including demographics, clinical history, laboratory tests, imaging, and genetic
information, machine learning models can learn complex patterns and relationships
associated with the development and progression of liver cirrhosis. However, several
challenges must be addressed to realize the full potential of machine learning in this
domain. These challenges include handling imbalanced data, selecting relevant features,
ensuring data quality, interpreting model decisions, generalizing to new patient
populations, integrating models into clinical practice, and addressing ethical and
regulatory considerations.
REFERENCES
[1.] Al-Aiad, A., Abualrub, S., Alnsour, Y., & Alsharo, M. (2020). Data mining algorithms
predicting different types of cancer: Integrative literature review. AMCIS 2020 TREOs.
https://aisel.aisnet.org/treos_amcis2020/59
[2.] Canlas, R. D., Jr. (2009, August). Data mining in healthcare: Current applications and
issues (Unpublished master's thesis, 10 pages).
[3.] Ibrahim, & Abdulazeez, A. (2021). The role of machine learning algorithms in disease
diagnosis. Journal of Applied Science and Technology Trends, 2(1), 10-19.
doi:10.38094/jastt20179
[4.] Duncan, J. D., Urbanowicz, R. A., Tarr, A. W., & Ball, J. K. (2020). Hepatitis C virus
vaccine: Challenges and prospects. Vaccines, 8(1), 1-23. doi:10.3390/vaccines8010090
[5.] Syafa'ah, L., Zulfatman, Z., Pakaya, I., & Lestandy, M. (2021). Comparison of machine
learning classification methods in hepatitis C virus. Jurnal Online Informatika, 6(1), 73.
doi:10.15575/join.v6i1.719
[6.] Günaydin, Günay, M., & Şengel. (2019). Comparison of lung cancer detection algorithms.
In 2019 Scientific Meeting on Electrical, Biomedical Engineering, and Computer Science
(EBBT 2019). doi:10.1109/EBBT.2019.8741826
[7.] Rao, G. S., Kumari, G. V., & Rao, B. P. (2019, January). Network for biomedical
applications. Springer Singapore, 2(1). doi:10.1007/978-981-13-1595-4
[8.] Sharma, M., & Kumar, N. (2022). Enhanced prognosis of hepatocellular carcinoma fatality
using ensemble learning approach. Journal of Ambient Intelligence and Humanized
Computing, 13(12), 5763-5777. doi:10.1007/s12652-021-03256-z
[9.] Patel, S., & Shah, C. (2023). Diagnosis of liver diseases and prediction of liver disease
stage using hybrid machine learning classifiers, 38(3), 945-954.
doi:10.5281/zenodo.7923033
[10.] Banu Priya, M., Laura Juliet, P., & Tamilselvi, P. R. (2018). Evaluation of liver disease
prediction using machine learning algorithms. International Research Journal of
Engineering and Technology, 5(1), 206-211. www.irjet.net
[11.] Wu, C. C., et al. (2019). Machine learning algorithms for predicting fatty liver disease.
Computer Methods and Programs in Biomedicine, 170, 23-29.
doi:10.1016/j.cmpb.2018.12.032
[12.] Geetha, C., & Arunachalam, A. R. (2021). Evaluation based approaches for liver disease
prediction using machine learning algorithms. In 2021 International Conference on
Computer Communications and Informatics (ICCCI 2021) (pp. 55-58).
doi:10.1109/ICCCI50826.2021.9402463
[13.] Alkhateeb, A., Ghazal, S., & Khater, A. (2022). Machine learning algorithms for the
prediction of cirrhosis in chronic hepatitis C patients. Digestive and Liver Disease.
[14.] Cheng, H., Li, H., & Wu, Z. (2021). A machine learning-based model for predicting
the risk of decompensated cirrhosis in patients with hepatitis B virus infection. Journal
of Viral Hepatitis.
[15.] Elbattay, A., Naman, F., & Mohammed, M. (2020). A machine learning approach to
predict liver fibrosis in patients with non-alcoholic fatty liver disease. Liver
International.
[16.] Gomaa, A., Hashim, M., & Ata, H. (2023). Machine learning models for predicting
the development of liver cirrhosis in patients with chronic hepatitis C. Saudi Journal of
Gastroenterology.
[17.] Hassan, M., Saeed, A., & Abd El-Maksoud, A. (2021). Prediction of liver cirrhosis
using machine learning algorithms based on clinical and laboratory data. The Egyptian
Journal of Internal Medicine.
[18.] Ibrahim, A., Elkholy, R., & Elshehawy, A. (2020). A machine learning-based approach
for predicting liver fibrosis progression in patients with chronic hepatitis B. Journal of
Medical Virology.
[19.] Lee, H., Hwang, I., & Lee, S. (2019). Predicting liver cirrhosis in patients with chronic
hepatitis B using machine learning algorithms. BMC Medical Informatics and Decision
Making.
[20.] Mahmud, N., Abe, N., & Suzuki, K. (2022). Machine learning-based prediction of liver
cirrhosis in patients with chronic hepatitis C using electronic medical records.
Scientific Reports, 12(1), 3174.
[21.] Mostafa, A., Abd El-Razek, R., & Abd El-Maksoud, A. (2021). Machine learning
models for predicting the development of liver cirrhosis in patients with non-alcoholic
fatty liver disease. European Journal of Gastroenterology & Hepatology, 33(3), 376-
383.
[22.] Nour, M., Mohamed, A., & Mahmoud, A. (2020). A machine learning-based model for
predicting liver fibrosis in patients with chronic hepatitis B virus infection. Journal of
Medical Imaging and Health Informatics, 10(12), 2982-2989.
[23.] Sato, K., Kobayashi, T., & Yamaguchi, H. (2022). A machine learning approach for
predicting the progression of liver fibrosis in patients with chronic hepatitis B. Journal
of Gastroenterology, 57(1), 51-60.
[24.] Wang, X., Zhou, S., & Zhou, W. (2021). Predicting the risk of decompensated cirrhosis
in patients with hepatitis B using machine learning models. World Journal of
Hepatology, 13(8), 1115-1124.
[25.] Yang, X., Xie, S., & Wang, Y. (2020). A machine learning-based approach for
predicting the development of liver cirrhosis in patients with non-alcoholic fatty liver
disease. Journal of Clinical and Experimental Hepatology, 10(6), 534-541.
[26.] Zhang, J., Feng, X., & Wang, L. (2023). Development and validation of a machine
learning-based model for predicting the risk of hepatocellular carcinoma in patients
with liver cirrhosis. Digestive Diseases and Sciences.