Proceedings of the Second International Conference on Electronics and Renewable Systems (ICEARS-2023).

IEEE Xplore Part Number: CFP23AV8-ART; ISBN: 979-8-3503-4664-0

Effective Machine Learning Techniques to Detect Fatty Liver Disease
2023 Second International Conference on Electronics and Renewable Systems (ICEARS) | 979-8-3503-4664-0/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICEARS56392.2023.10085622

1st N.V. Naik, 2nd Dudekula Nasreen, 3rd Somu Chowdeswar Reddy, 4th Pajjuru Lasya Sri
Department of Computer Science and Engineering
Lakireddy Bali Reddy College of Engineering (Autonomous)
Mylavaram, India

Abstract—Heart disease, lung disease, and respiratory disease are currently among the top killers. The majority of liver problems are difficult to detect early on. One of these is fatty liver disease, also called hepatic steatosis, a common disorder caused by the accumulation of too much fat in the liver. Alcoholism causes the cells of the liver to accumulate fat, which hampers the liver's ability to function and can result in liver damage and liver cancer. A person who does not regularly consume alcohol can nonetheless develop fatty liver disease. Blood tests, ultrasounds, and computerized tomography scans are the three main types of diagnostic tests. Automated software is needed for more precise and dependable early identification of fatty liver disease, and machine learning models are created for this purpose. To identify fatty liver disease with specificity, accuracy, and dependability, Naive Bayes (NB), Random Forest (RF), and eXtreme Gradient Boosting with ANN are proposed in this study. The dataset contains a total of 70,000 cases. The classification system is assessed using a confusion matrix.

Keywords—Machine Learning, Random Forest, eXtreme Gradient Boosting, Naïve Bayes, Artificial Neural Networks.

I. INTRODUCTION

The liver is a vital organ that performs many life-sustaining functions. It manufactures bile, which assists digestion. It assembles proteins for the body and supplies iron [9]. It converts nutrients into energy and creates the materials that help blood clot. It resists infections by creating immune factors and removing bacteria and toxins from the blood.

Stage 1: The liver swells and increases in size abnormally.
Stage 2: Liver tissues are damaged.
Stage 3: Complete liver damage.

Fig. 1. Stages of Fatty Liver Disease

Fatty liver disease (FLD) is a frequent clinical issue with substantial morbidity and mortality rates. FLD can ultimately result in hepatocellular cancer and non-cholesteric cirrhosis [7]. Obesity, metabolic syndrome, and diabetes have all been on the rise along with FLD, and an increased economic cost of FLD has been identified. Accurate risk assessment of individuals and early detection of FLD may therefore be extremely helpful for diagnosis, prevention, and effective therapy. For the last ten years, biopsy has been used to categorize patients and is recognized as the gold standard for evaluating fatty infiltration of the liver. However, the procedure can cause adverse effects and sampling errors, and it is also very invasive and expensive. Ultrasonography is used to diagnose FLD with greater accuracy, but the accuracy of identification is extremely operator dependent. The signs and symptoms of fatty liver disease are as follows:

• Male breast growth and discomfort in the abdomen
• Fluid buildup in the belly (ascites)
• Cluster of blood vessels under the skin
• Dark urine
• Easy bruising or bleeding
• Itchy skin


• Appetite loss; nausea; pale stools; swelling (oedema) of the legs; loss of weight; weakness or exhaustion; and yellowing of the skin and eyes [10].

Fig. 2. Stages of Fatty Liver Disease

Some patients develop fatty liver disease without any apparent cause. Known causes include excessive weight gain, type 2 diabetes, insulin resistance, metabolic syndrome, high blood fat levels (particularly triglycerides), adverse effects from several drugs, certain infections such as hepatitis C, and uncommon hereditary diseases [11].

Risk factors for fatty liver disease include heavy alcohol consumption, exposure to certain toxins, genetics, obesity, obstructive sleep apnea, older age, polycystic ovary syndrome (PCOS), pregnancy, starvation, rare genetic conditions such as Wilson disease and hypobetalipoproteinemia, smoking, and the use of certain medications such as methotrexate (Trexall), tamoxifen (Nolvadex), and amiodarone (Pacerone) [8].

Machine learning (ML) is the process of analysing massive amounts of data to identify patterns that may be used to forecast a variety of outcomes. In a variety of disciplines, machine learning approaches have emerged as a potential tool for prediction and decision-making. A machine learning model would be a great help in recognizing disorders and making sound healthcare decisions in real time. It would also allow individuals with significant risk factors to be classified earlier, which in turn allows hospital resources to be optimized [21].

In this paper, Naïve Bayes (NB), Random Forest (RF), and a hybrid of ANN with eXtreme Gradient Boosting (XGBoost) are applied to the dataset to find accuracy, sensitivity, specificity, duration and AUROC.

II. LITERATURE SURVEY

Pei X et al. proposed a model to predict FLD that can support medical professionals in categorising people who are at high risk of FLD and in making individual diagnoses, decisions about treatment, and plans for FLD prevention. A total of 3,419 participants were chosen, and 845 of them had FLD screenings. Classification models were applied to find the disease. The models included in the study are LDA, KNN, ANN, LR, RF and XGBoost. Prediction accuracy was measured using AUC, sensitivity, specificity, positive predictive value, and negative predictive value [1]. The study demonstrated that machine learning models yield more accurate predictions, with the XGBoost model having the best accuracy.

Weidong Ji et al. created and evaluated machine learning (ML) models that can be used for identifying a group of individuals. The research involved 304,145 individuals who took part in the national physical examination, and their survey results and physical measurement data were used as candidate covariates in the model. The least absolute shrinkage and selection operator (LASSO) was used to select features from the candidate covariates. The screening model for NAFLD was then developed using four ML approaches, and the classifier with the highest performance was used to generate the relevance score of each covariate in NAFLD. XGBoost performed best of the four ML algorithms, with BMI, age and waist circumference ranking highest in significance [2].

Chieh-Chen Wu et al. set out to design a model to predict FLD that would help medical practitioners categorise patients, establish a diagnosis, and treat and prevent FLD. To predict FLD, classification algorithms such as RF, NB, ANN, and LR were developed. The area under the receiver operating characteristic (ROC) curve was used to assess the performance of the four models. The experiment involved 577 individuals, 377 of whom had fatty livers. The random forest model outperformed the others [3].

Cheng-fu Xu et al. used machine learning techniques to assess the best clinical prediction model for NAFLD. Participants in a health examination at Zhejiang University took part in a cross-sectional study that used questionnaires, lab testing, physical exams, and hepatic ultrasonography. Machine learning techniques were then put into practice using the free program Weka. The tasks included feature selection and classification. A screening model was created using feature selection approaches that delete unnecessary elements. A prediction model was created using classification and assessed using the F-measure [4]. Eleven state-of-the-art machine learning methods were investigated. Of the 10,508 registered participants, 2,522 (24%) matched the NAFLD diagnostic criteria. Using a variety of statistical testing methodologies, the top five risk variables for NAFLD were found to be BMI, triglycerides, gamma-glutamyl transpeptidase (γGT), serum alanine aminotransferase (ALT), and uric acid [20]. A 10-fold cross-validation was used to classify the data. Among the 11 techniques tested, the Bayesian network model performed the best, reaching up to 83% accuracy, 0.878 specificity, 0.675 sensitivity, and 0.655 F-measure. The Bayesian network model increases the F-measure score by 9.17% compared to logistic regression [22].

III. ARCHITECTURE

Following is the architecture of the proposed system. First, data is collected. Unnecessary data is then removed, the models are trained on the remaining data, and the algorithms are applied to compare accuracy [1].
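The architecture described in Section III (collect data, remove unnecessary data, train, compare accuracy) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the synthetic data, and the two stand-in models (a majority-class baseline and a nearest-centroid classifier) are all hypothetical, chosen only so that the example runs with the Python standard library. The paper itself compares Naïve Bayes, Random Forest, and an ANN with XGBoost hybrid.

```python
import random

def drop_incomplete(rows):
    """Pre-processing: discard any record that has a missing (None) field."""
    return [row for row in rows if all(value is not None for value in row)]

def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then hold out a fraction of the rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_majority(train):
    """Stand-in baseline: always predict the most frequent training label."""
    labels = [row[-1] for row in train]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

def fit_nearest_centroid(train):
    """Stand-in model: predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in {row[-1] for row in train}:
        members = [row[:-1] for row in train if row[-1] == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(features):
        def squared_distance(label):
            return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))
        return min(centroids, key=squared_distance)

    return predict

def accuracy(model, test):
    """Fraction of held-out rows whose label the model predicts correctly."""
    return sum(model(row[:-1]) == row[-1] for row in test) / len(test)

def compare_models(rows, trainers):
    """Clean the collected rows, train each model, and compare test accuracy."""
    train, test = split_rows(drop_incomplete(rows))
    return {name: accuracy(fit(train), test) for name, fit in trainers.items()}

# Synthetic (feature, label) records, plus a few incomplete ones (assumed data).
rows = [(float(i), 0) for i in range(50)] + [(float(i), 1) for i in range(100, 150)]
rows += [(None, 1), (3.0, None), (None, None)]

results = compare_models(rows, {"majority": fit_majority,
                                "nearest_centroid": fit_nearest_centroid})
print(results)
```

Each model is passed in as a training function, so a further classifier can be added to the comparison without changing the cleaning or splitting steps.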


Fig. 3. Flow Chart of Architecture

A. Data Collection

The dataset was gathered from Kaggle. The liver dataset contains a total of 70,000 instances with 13 attributes [12]. The feature "diagnostic", which is presented as a quantifiable value, separates individuals with liver infection from individuals who do not currently have liver disease.

B. Pre-Processing

One crucial stage in machine learning is the elimination of unnecessary data from the dataset in order to reduce noise [15]. The remaining features are those that influence the predicted outcome, which improves the efficiency of the later stages.

C. Training

This process is essential for identifying FLD. The observations in the training set make up the experience that the algorithm uses to learn. In supervised learning problems, every observation consists of an observed output variable and at least one observed input feature.

D. Algorithm

a) Naïve Bayes: Naïve Bayes is based on Bayes' theorem, which it applies to the dataset to predict the class of new data. It is very easy to use. It works on the assumption that the features are independent of one another [14].

$$P(b \mid A) = \frac{P(A \mid b)\,P(b)}{P(A)} \tag{1}$$

where $A = (a_1, a_2, a_3, \dots, a_n)$ is the feature vector, $P(b)$ is the prior probability of class $b$, and $P(b \mid A)$ is the posterior probability of class $b$ given $A$.

b) Random Forest: Random forest is a commonly used machine learning algorithm that has been applied to tasks such as handwriting processing. In this algorithm, the dataset is divided into subsets based on features, and a decision tree is constructed for each subset [17]. The output of the random forest is the class predicted by the largest number of trees, that is, the majority vote of all decision trees. It brings together the results of various decision trees instead of considering the output of a single decision tree. In a variety of fields, including medical diagnostics, RF has proven to be a very accurate approach [5].

c) ANN: Artificial neural networks (ANN) are computing models that imitate biological brain networks. An ANN is an extremely potent nonlinear modelling technique that has been shown to produce precise forecasts in numerous CDS. The model comprises several artificial neural units called "perceptrons". An ANN closely resembles a biological neural cell, in which a signal is transferred into the neuron by a dendrite [16]. It recreates the signal's journey through the input layer, multiple hidden layers, and finally the output layer. Each layer contains numerous perceptrons, and the layers are connected by weights that are modified as the algorithm is trained. The network learns from samples in the training dataset until the best prediction is made and each input matches the correct output.

$$Y = W_1 X_1 + W_2 X_2 + b \tag{2}$$

where $X_1, X_2, \dots$ are the features, $W_1, W_2, \dots$ are the weights of the corresponding features, and $b$ is a constant.

d) XGBoost: XGBoost, which stands for extreme gradient boosting, is a boosting algorithm used to convert a weak classifier into a strong classifier by boosting the feature set [18]. It is a very popular advanced algorithm. It constructs decision trees for the attributes in the liver dataset to obtain higher accuracy.

e) Hybrid: The majority of learning algorithms used in machine learning are excellent at finishing one task or using one dataset. These methods are quite beneficial and far superior to manual work, but they do not make it possible to fully utilise AI across all of the data. Hybrid machine learning (HML) can help with that. Several straightforward algorithms support and improve one another, and together they can find solutions to issues that they were not intended to handle separately.

E. Result Analysis

TABLE I. COMPARISON TABLE

Metric            Naïve Bayes   Random Forest   Hybrid (ANN+XGBoost)
Accuracy (%)      62.570455     82.424546       86.8836
Sensitivity (%)   62.676507     82.530598       86.9896
Specificity (%)   62.348799     82.202890       86.6619
AUROC             0.788671      0.888671        0.9060

The hybrid of ANN with XGBoost gave better results than Naïve Bayes and Random Forest on accuracy, sensitivity, specificity and AUROC.

a) Accuracy: Accuracy is the proportion of correct predictions made by a model. The hybrid model gave the highest accuracy in the detection of fatty liver disease.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

where $TP$, $TN$, $FP$ and $FN$ are the numbers of true positives, true negatives, false positives and false negatives in the confusion matrix.
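The accuracy, sensitivity and specificity reported in Table I are all derived from the four cells of a binary confusion matrix. The sketch below shows one way to compute them. The function names and the ten example labels are illustrative assumptions, not values from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1  # predicted disease, disease present
            else:
                fp += 1  # predicted disease, disease absent
        else:
            if truth == positive:
                fn += 1  # predicted healthy, disease present
            else:
                tn += 1  # predicted healthy, disease absent
    return tp, fp, tn, fn

def summary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels: 1 = fatty liver disease, 0 = no disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)   # (tp, fp, tn, fn)
metrics = summary_metrics(*counts)
print(counts, metrics)
```

With these labels the counts are 3 true positives, 2 false positives, 4 true negatives and 1 false negative, which gives an accuracy of 0.7, a sensitivity of 0.75 and a specificity of 4/6.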


Fig. 4. Graphical representation of Model Accuracy

b) Sensitivity: In machine learning, sensitivity is a metric used to assess a model's ability to predict the true positives of each available category [19]. In the literature, it is also referred to as the true positive rate.

Fig. 5. Graphical representation of Model Sensitivity

c) Specificity: Specificity measures the capacity of an algorithm or model to predict the true negatives of each available category. It is sometimes referred to as the true negative rate in the literature [13].

Fig. 6. Graphical representation of Model Specificity

d) AUROC: The ROC curve is used to visualize the performance of a model, and the AUROC is the area under that curve. For unbalanced data, the AUROC is more revealing than accuracy. It is a widely reported performance statistic that is simple to calculate using multiple software packages, so calculating the AUROC for models that perform binary classification tasks is frequently a good idea [11].

Fig. 7. Graphical representation of Model AUROC

IV. CONCLUSION

The three machine learning methods used in this work are compared in order to accurately predict fatty liver disease. The hybrid of ANN with XGBoost demonstrated superior performance to the conventional machine learning methods. Implementing a hybrid ANN-XGBoost approach in the clinical setting could assist doctors in classifying individuals for monitoring, early treatment, and care of fatty liver.

REFERENCES

[1] Pei X, Deng Q, Liu Z, Yan X, Sun W: Machine Learning Algorithms for Predicting Fatty Liver Disease. Ann Nutr Metab 2021;77:38-45. doi: 10.1159/000513654.
[2] Ji W, Xue M, Zhang Y, Yao H, Wang Y. A Machine Learning Based Framework to Identify and Classify Non-alcoholic Fatty Liver Disease in a Large-Scale Population. Front Public Health. 2022 Apr 4;10:846118. doi: 10.3389/fpubh.2022.846118. PMID: 35444985; PMCID: PMC9013842.
[3] Wu CC, Yeh WC, Hsu WD, Islam MM, Nguyen PAA, Poly TN, Wang YC, Yang HC, Jack Li YC. Prediction of fatty liver disease using machine learning algorithms. Comput Methods Programs Biomed. 2019 Mar;170:23-29. doi: 10.1016/j.cmpb.2018.12.032. Epub 2018 Dec 29. PMID: 30712601.
[4] Han Ma, Cheng-fu Xu, Zhe Shen, Chao-hui Yu, You-ming Li, "Application of Machine Learning Techniques for Clinical Predictive Modeling: A Cross-Sectional Study on Nonalcoholic Fatty Liver Disease in China", BioMed Research International, vol. 2018, Article ID 4304376, 9 pages, 2018. https://doi.org/10.1155/2018/4304376
[5] M. F. Rabbi, S. M. Mahedy Hasan, A. I. Champa, M. AsifZaman and M. K. Hasan, "Prediction of Liver Disorders using Machine Learning Algorithms: A Comparative Study," 2020 2nd International Conference on Advanced Information and Communication Technology (ICAICT), Dhaka, Bangladesh, 2020, pp. 111-116, doi: 10.1109/ICAICT51780.2020.9333528.
[6] C. Anuradha, D. Swapna, B. Thati, V. N. Sree and S. P. Praveen, "Diagnosing for Liver Disease Prediction in Patients Using Combined Machine Learning Models," 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 2022, pp. 889-896, doi: 10.1109/ICSSIT53264.2022.9716312.
[7] Islam, Md & Wu, Chieh-Chen & Poly, Tahmina & Nguyen, Phung Anh & Yang, Hsuan-Chia & Li, Yu-Chuan. (2019). Prediction of Fatty Liver Disease using Machine Learning Algorithms. Computer methods and programs in biomedicine. 170. 10.1016/j.cmpb.2018.12.032.
[8] Rahman, A. K. M. & Shamrat, F.M. & Tasnim, Zarrin & Roy, Joy & Hossain, Syed. (2019). A Comparative Study On Liver Disease Prediction Using Supervised Machine Learning Algorithms. 8. 419-422.


[9] El-Shafeiy, Engy & El-Desouky, Ali & Elghamrawy, Sally. (2018). Prediction of Liver Diseases Based on Machine Learning Technique for Big Data. 10.1007/978-3-319-74690-6_36.
[10] A.M. Hall and A.L. Smith. (1999), "Feature Selection for Machine Learning: Comparing a Correlation-Based Filter Approach to the Wrapper", In Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, AAAI Press pp. 235-239.
[11] Torkadi, P.P.; Apte, I.C.; Bhute, A.K. Biochemical evaluation of patients of alcoholic liver disease and non-alcoholic liver disease. Indian J. Clin. Biochem. 2014, 29, 79–83.
[12] Robles-Diaz, M.; Garcia-Cortes, M.; Medina-Caliz, I.; Gonzalez-Jimenez, A.; Gonzalez-Grande, R.; Navarro, J.M.; Castiella, A.; Zapata, E.M.; Romero-Gomez, M.; Blanco, S.; et al. The value of serum aspartate aminotransferase and gamma-glutamyl transpetidase as biomarkers in hepatotoxicity. Liver Int. 2015, 35, 2474–2482.
[13] Arieira, C.; Monteiro, S.; Xavier, S.; Dias de Castro, F.; Magalhaes, J.; Moreira, M.J.; Marinho, C.; Cotter, J. Hepatic steatosis and patients with inflammatory bowel disease: When transient elastography makes the difference. Eur. J. Gastroenterol. Hepatol. 2019, 31, 998–1003.
[14] M. Ghosh, M. Mohsin Sarker Raihan, M. Raihan, L. Akter, A. Kumar Bairagi et al., "A comparative analysis of machine learning algorithms to predict liver disease," Intelligent Automation & Soft Computing, vol. 30, no. 3, pp. 917–928, 2021.
[15] Ravi Kumar R., Babu Reddy M. and Praveen, 2019 "An evaluation of feature selection algorithms in machine learning", International journal of scientific & technology research, 8(12) PP. 2071–2074.
[16] Dundar, T.T.; Yurtsever, I.; Pehlivanoglu, M.K.; Yildiz, U.; Eker, A.; Demir, M.A.; Mutluer, A.S.; Tektaş, R.; Kazan, M.S.; Kitis, S.; et al. Machine Learning-Based Surgical Planning for Neurosurgery: Artificial Intelligent Approaches to the Cranium. Front. Surg. 2022, 9, 863633.
[17] Sakatani, K.; Oyama, K.; Hu, L.; Warisawa, S. Estimation of Human Cerebral Atrophy Based on Systemic Metabolic Status Using Machine Learning. Front. Neurol. 2022, 13, 869915.
[18] Yen, H.H.; Wu, P.Y.; Chen, M.F.; Lin, W.C.; Tsai, C.L.; Lin, K.P. Current Status and Future Perspective of Artificial Intelligence in the Management of Peptic Ulcer Bleeding: A Review of Recent Literature. J. Clin. Med. 2021, 10, 3527.
[19] Yen, H.-H.; Wu, P.-Y.; Su, P.-Y.; Yang, C.-W.; Chen, Y.-Y.; Chen, M.-F.; Lin, W.-C.; Tsai, C.-L.; Lin, K.-P. Performance Comparison of the Deep Learning and the Human Endoscopist for Bleeding Peptic Ulcer Disease. J. Med. Biol. Eng. 2021, 41, 504–513.
[20] Yen, H.H.; Su, P.Y.; Zeng, Y.H.; Liu, I.L.; Huang, S.P.; Hsu, Y.C.; Chen, Y.Y.; Yang, C.W.; Wu, S.S.; Chou, K.C. Glecaprevir-pibrentasvir for chronic hepatitis C: Comparing treatment effect in patients with and without end-stage renal disease in a real-world setting. PLoS ONE 2020, 15, e0237582.
[21] Sakatani, K.; Oyama, K.; Hu, L.; Warisawa, S. Estimation of Human Cerebral Atrophy Based on Systemic Metabolic Status Using Machine Learning. Front. Neurol. 2022, 13, 869915.
[22] Demšar, J.; Curk, T.; Erjavec, A.; Gorup, Č.; Hočevar, T.; Milutinovič, M.; Možina, M.; Polajnar, M.; Toplak, M.; Starič, A.; et al. Orange: Data mining toolbox in Python. J. Mach. Learn. Res. 2013, 14, 2349–2353.
