This paper reviews the applications of large language models (LLMs) in healthcare, highlighting their potential to enhance clinical decision support systems, natural language processing in electronic health records, and healthcare education. While LLMs can significantly improve patient outcomes and streamline processes, challenges such as data privacy, model bias, and the need for extensive training data must be addressed. The successful implementation of LLMs in healthcare requires collaboration among various stakeholders, including computer scientists, healthcare professionals, and policymakers.


2023 7th International Symposium on Computer Science and Intelligent Control (ISCSIC)

Large Language Models in Healthcare: A Review


2023 7th International Symposium on Computer Science and Intelligent Control (ISCSIC) | 979-8-3503-4298-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ISCSIC60498.2023.00038

Shun Zou*, Jun He

College of Information and Communication
National University of Defense Technology
Wuhan, China

Abstract: This paper examines the potential of large language models (LLMs) in the healthcare sector, delving into their prospective applications, challenges, and future trajectories. LLMs have demonstrated encouraging results in various healthcare-related domains, including the development of clinical decision support systems, natural language processing in electronic health records, healthcare question/answer systems, and healthcare education. However, integrating these models into healthcare practice also raises several concerns, such as data privacy and security issues, the requirement for vast amounts of training data, model biases, and the limited interpretability of model predictions. Overcoming these hurdles necessitates a collaborative effort from experts across multiple disciplines. Despite these obstacles, the deployment of LLMs in healthcare holds the potential to transform the industry and significantly enhance patient outcomes.

Keywords: Large Language Models, ChatGPT, Applications in Healthcare

I. INTRODUCTION

The rise of large language models (LLMs) such as GPT [2] [3], Llama [1], Alpaca [4], and others has revolutionized various sectors, including healthcare. These models, powered by machine learning and artificial intelligence, have the potential to reshape healthcare delivery, research, and education.

Large language models have been instrumental in the development and application of Clinical Decision Support Systems (CDSS). These systems leverage the power of these models to analyze vast amounts of data and provide evidence-based treatment recommendations [5] [6] [7] [8] [9]. In addition to CDSS, large language models have found application in Natural Language Processing (NLP) in Electronic Health Records (EHRs). EHRs contain a wealth of valuable patient information, but much of this data is unstructured and difficult to analyze. Large language models can process this data, extract relevant information, and present it in a structured format that is easy to understand and analyze [10] [11] [12] [13] [14].

Large language models are also being used in healthcare Q/A systems. These systems can answer patient queries, provide information about diseases and treatments, and even guide patients through the healthcare system [15] [16] [17] [18]. In the field of healthcare education, large language models can be used to develop intelligent tutoring systems. These systems can provide personalized learning experiences, adapt to the learning pace of individual students, and provide instant feedback [19] [20] [21] [22].

Large language models also have great potential to revolutionize medical research and drug discovery. They enable rapid, in-depth analysis of medical record data to identify typical patterns or trends and to generate hypotheses. This can accelerate the pace of medical research and lead to the discovery of new treatments and cures [23] [24] [25].

Despite these potential benefits, there are also several challenges related to the implementation of large language models in healthcare. These include issues related to data privacy and security, the requirement for large training datasets, the risk of model bias, and insufficient interpretability of model results. Moreover, the successful implementation of these models requires multidisciplinary collaboration, involving not only computer scientists and engineers, but also doctors, patients, and policymakers [6].

Large language models have the potential to reshape healthcare, but their implementation requires careful consideration of both the advantages and challenges. The following sections review large language models in healthcare from several aspects: the applications of LLMs in healthcare, the challenges of using LLMs in healthcare, and future directions. This will give us a more comprehensive understanding of the current development, applications, limitations, and future direction of LLMs in healthcare.

II. APPLICATIONS OF LARGE LANGUAGE MODELS IN HEALTHCARE

The application of LLMs in healthcare has been a topic of interest in recent years. These models have the potential to revolutionize various aspects of healthcare, including clinical decision support systems, natural language processing in electronic health records, healthcare Q/A systems, healthcare education, and medical research and drug discovery.

A. Clinical Decision Support Systems

Clinical Decision Support Systems (CDSS) have long been at the forefront of integrating technology into healthcare to improve patient outcomes and healthcare efficiency. With the rise of LLMs like GPT-3 and its successors or variations, there has been a paradigm shift in how these systems assist healthcare professionals in making clinical decisions. Large language models, trained on large text corpora, can generate human-like answers based on the input they receive. This ability can be harnessed in a CDSS to provide real-time, evidence-based recommendations to healthcare professionals. For instance, when a physician inputs patient symptoms and medical history, the LLM can quickly scan through the latest medical literature and provide potential diagnoses,
treatment options, and even predict patient outcomes based on similar cases [5].

The potential impact of integrating LLMs into CDSS on patient outcomes cannot be overstated. Unlike human beings, large language models are not affected by experience, emotion, or physical fatigue, and consider only practical factors. These systems can therefore reduce diagnostic errors, especially in complex cases where symptoms might be ambiguous [6]. Moreover, they can assist in personalizing treatment plans by considering a broader range of factors than a human could feasibly analyze in a short time. Furthermore, the efficiency of healthcare delivery can be significantly improved. For example, in emergency situations where time is of the essence, an LLM-enhanced CDSS can provide rapid insights, helping medical professionals make quicker decisions.

A two-stage recommendation framework for assisted medical decision making was proposed in study [7], based on the publicly available MIMIC-III dataset, where a pre-trained language model was used to extract relevant and useful information from the clinical records. The findings indicated LLMs' potential to enhance clinical decision-making capabilities and reduce the burden of information processing. In study [8], ChatGPT demonstrated strong performance when assessed for its suitability for radiologic decision making. For breast cancer screening prompts, the model averaged an open-ended (OE) score of 1.83 out of 2 and a select-all-that-apply (SATA) percentage correct of 88.9%. For breast pain prompts, it achieved an average OE score of 1.125 out of 2 and a SATA percentage correct of 58.3%. These results suggest that ChatGPT has the potential to aid radiologists in making accurate decisions. In study [9], clinicians evaluated both human-generated and AI-generated suggestions for enhancing clinical decision support (CDS) alerts. Interestingly, among the top 20 rated suggestions, 9 were created by ChatGPT. This outcome indicates that AI-generated ideas can contribute to optimizing CDS alerts and support experts in developing their own recommendations for CDS enhancement.

However, it is crucial to note that while LLMs offer immense potential, they are not without limitations. Their recommendations are based on the data they were trained on, and they lack the intuitive reasoning that human professionals possess. Thus, while they can be a valuable and powerful tool, they should be utilized judiciously alongside human intuition and proficiency.

B. Natural Language Processing in Electronic Health Records

Electronic Health Records (EHRs) offer a rich source of data regarding patients' medical backgrounds, treatment paths, and health outcomes. However, the majority of this information is stored in unstructured text formats, making it difficult to analyze and extract meaningful insights manually. Natural Language Processing (NLP) techniques, especially LLMs, offer a solution to this problem by enabling the automatic processing and analysis of EHR data. These models can analyze unstructured data in electronic health records, such as clinical notes, and extract relevant information. This can help healthcare professionals understand a patient's medical history and current health status more quickly and accurately, facilitating the creation of large datasets for analysis and research [11]. By automating coding, information extraction, and data mining, these models can significantly reduce the workload of healthcare professionals and improve the efficiency of healthcare services [10].

LLMs have proven to be powerful tools for processing EHRs. In study [12], a novel clinical language model called GatorTron was developed and assessed on five distinct clinical NLP tasks: clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). The notable improvements in performance across all five tasks suggest that GatorTron can be effectively integrated into medical AI systems, ultimately enhancing healthcare provision. In study [13], the Med7 model was trained for named-entity recognition over seven categories: drug name, route of administration, frequency, dosage, strength, form, and duration. With a micro-averaged F1 score of 0.957 across all seven categories, the model demonstrated its potential for identifying medical concepts and extracting relevant information. In study [14], a cancer domain-specific language model, CancerBERT, was developed to extract breast cancer phenotypes from electronic health records. The model outperformed all other models on this task, suggesting its potential to support clinical decision-making.

In conclusion, NLP, and large language models in particular, holds great potential for enhancing the use of EHRs. Through automated coding, information extraction, and data mining, it can streamline healthcare processes and contribute to research and policy-making. As the volume and complexity of EHRs grow and the capabilities and applications of LLMs expand, the importance of LLMs in healthcare will only increase. Therefore, further investment in LLM research and development is warranted to fully realize their potential in improving patient care and health outcomes.

C. Healthcare Q/A Systems

Healthcare Q/A systems aim to provide patients and healthcare professionals with quick and accurate answers to their medical queries. Large language models can play a key role in future healthcare Q/A systems. These models can understand and answer complex medical questions, thereby improving the efficiency and accuracy of healthcare services. Unlike traditional systems that often rely on hardcoded knowledge bases, LLMs are trained on vast amounts of data, enabling them to generate human-like responses in real time and handle a wide array of medical inquiries. These models can swiftly provide detailed explanations or recommend further reading based on the context of the question, potentially streamlining the information retrieval process for medical professionals.

The accuracy of these models, especially when combined with domain-specific data, can rival or even surpass that of manual searches. Large language models have significantly advanced medical question answering, with Med-PaLM achieving a milestone score of 67.2% on the MedQA dataset, surpassing the "passing" mark on US Medical Licensing Examination (USMLE) style questions. Its successor, Med-PaLM 2, has shown even greater promise, scoring up to 86.5%. Notably, a
comprehensive human evaluation of 1066 consumer medical questions revealed that Med-PaLM 2's answers were preferred over those provided by physicians on eight of nine criteria pertinent to clinical utility, highlighting its potential for practical application in medicine [17]. In another study, the Codex model with 175B parameters demonstrated an answering accuracy of 60.2% on the USMLE test, 59.7% on the MedMCQA validation set, and 78.2% on the PubMedQA test [18]. These results are comparable to the performance of human experts.

Furthermore, the implementation of LLMs in healthcare Q/A systems can lead to significant cost reductions. The traditional model, involving manual searches or consultations with specialists, can be time-consuming and expensive. In contrast, once an LLM is trained, it reduces the need for human intervention, so the marginal cost of answering additional questions is almost negligible.

D. Healthcare Education

Healthcare education is another area where large language models can have a significant impact. They can be leveraged to create customized learning experiences for students. By analyzing a student's individual learning style, these models can tailor the educational content to better meet the student's needs, providing a more effective and engaging learning experience. Moreover, these models can provide instant feedback, identifying important concepts and highlighting knowledge gaps, thereby helping students improve their learning outcomes [20].

In a recent study [22], the capabilities of the large language model ChatGPT were assessed on the United States Medical Licensing Exam (USMLE). The pretrained ChatGPT delivered impressive results, performing at or near the passing threshold for all three exams and showcasing a high degree of consistency and depth in its explanations. These findings hint at the possibility of ChatGPT serving as a valuable tool in medical education. Another study discussed the potential use of large language models in healthcare education [21]. LLMs could analyze medical texts and generate quizzes based on the most important concepts. LLMs could also generate virtual patient simulations. Traditional medical simulations often rely on mannequins or actors to mimic patient scenarios. With the advent of LLMs, these simulations can be enhanced with realistic patient dialogues, allowing medical students to practice their communication skills in a controlled environment.

E. Medical Research and Drug Discovery

LLMs can play a crucial role in medical research, particularly in drug discovery. LLMs have shown promise in analyzing vast amounts of literature and data to identify potential drug candidates. The traditional drug discovery process is time-consuming and resource-intensive, often taking years and significant financial investment to bring a drug from concept to market. With the computational capabilities of LLMs, however, researchers can expedite the initial stages of drug discovery by rapidly sifting through existing literature to identify potential drug candidates or mechanisms [23]. This can significantly speed up the drug discovery process and reduce the cost of drug development.

For example, researchers have explored the use of LLMs in developing new treatments for COVID-19, including drug repurposing, which could inform prevention strategies for future pandemics [24]. Additionally, customized domain-specific LLMs, such as chemical language models, have shown promising results in accelerating de novo drug design by generating new molecules with desired properties [25].

In conclusion, LLMs hold great potential for transforming various areas of healthcare, though further investigation is necessary to thoroughly understand their advantages and limitations in this field.

III. CHALLENGES OF USING LLMS IN HEALTHCARE

The use of large language models (LLMs) in healthcare has shown great promise in various applications, including clinical text analysis, patient communication, and medical decision-making. However, there are several challenges associated with the adoption of LLMs in healthcare, which must be addressed to fully realize their potential.

A. Need for Large Amounts of Training Data

One of the primary challenges in developing and deploying LLMs in healthcare is access to large amounts of high-quality training data. Deep learning models require vast amounts of data to learn patterns and relationships within the data, and healthcare is no exception. The larger the training dataset, the better the model's ability to understand context, make predictions, and provide relevant information. Pre-trained models usually perform poorly on specific problems, which makes it essential to create domain-specific datasets. However, obtaining such data can be difficult due to privacy concerns, data fragmentation, and the time-consuming process of manual annotation.

Moreover, collecting data from diverse sources, such as electronic health records (EHRs), clinical notes, and medical imaging reports, poses additional challenges in terms of data heterogeneity and quality variability. Medical data can vary in format, structure, and language, especially when sourced from different healthcare systems or countries. This diversity requires additional preprocessing and standardization effort before the data can be used for training LLMs.

B. Lack of Interpretability of Model Predictions

Another significant challenge in using LLMs in healthcare is the lack of interpretability of model predictions. Unlike traditional machine learning models, deep learning models are often criticized for their opaqueness, which makes it difficult to understand the reasoning behind their predictions. In healthcare, where decisions can have life-altering consequences, understanding the rationale behind a model's prediction is crucial. When an LLM suggests a particular diagnosis or treatment plan, clinicians need to understand the basis for that recommendation to trust and act upon it.

The lack of transparency in LLMs poses a significant challenge in healthcare, where understanding the thought process behind a recommendation or suggestion is critical. The so-called "black box" nature of these models makes it difficult to comprehend how they reach their conclusions, leading to mistrust among healthcare professionals who may be
reluctant to rely on a tool that doesn't offer clear reasoning for its decisions. Furthermore, the absence of interpretability hinders the detection and correction of potential biases within the model, which can result in inaccurate recommendations and further undermine confidence in the technology. To tackle this issue, researchers have put forth various methods, such as attention mechanisms, feature importance scores, and visualizations, aimed at offering insights into the decision-making process of LLMs. Nevertheless, the interpretability of model predictions remains an open problem.

C. Ethical Implications

A key ethical consideration regarding the utilization of large language models in healthcare revolves around privacy and data security. These models necessitate extensive sets of personal health data for training, giving rise to worries about patient confidentiality and data protection. The integration of artificial intelligence in healthcare presents substantial hurdles in safeguarding patients' privacy. Hence, it becomes imperative to establish and maintain adequate measures to protect patient data and uphold confidentiality standards.

Another ethical aspect to consider involves the risk of bias influencing the decision-making process. If large language models are trained on biased data or influenced by a specific worldview, they can inadvertently perpetuate existing health disparities. For instance, a language model created for predicting patient mortality may exhibit bias against certain racial groups. To tackle this challenge, it is essential to prioritize the use of diverse and representative datasets when training these models.

D. Legal Considerations

The use of large language models in healthcare also raises several legal considerations. One of the most pressing issues is liability and accountability. Who is responsible when an AI system makes a mistake or provides suboptimal advice? Is it the developer, the clinician who used the system, or the patient who consented to its use? These questions highlight the need for clearer regulations and guidelines regarding the use of AI in healthcare.

Another legal concern relates to intellectual property rights. Who owns the intellectual property rights to the data used to train these models? Is it the patients who provided the data, the clinicians who collected it, or the companies that developed the models? Clarifying ownership rights is essential to prevent disputes and ensure that data are used responsibly.

E. Biases and Limitations

Despite their promise, large language models in healthcare are not without their biases and limitations. One limitation is that they may not perform well outside the dataset used for training. This means that the models may not generalize well to new populations or situations, leading to poor performance or incorrect recommendations.

Another limitation is that these models rely heavily on structured data, which may not capture the full complexity of patient experiences. Unstructured data, such as clinical notes, can provide valuable insights into patient symptoms and experiences, but they are often difficult to analyze using machine learning techniques.

IV. FUTURE DIRECTIONS

As the preceding sections show, ensuring data privacy and security, mitigating bias, establishing accountability, and critically evaluating performance are vital steps towards integrating LLMs into healthcare safely and effectively.

To overcome these challenges, potential research directions include improving data quality and integrity through better data governance and curation practices, developing more advanced techniques for data preprocessing and normalization, and implementing robust validation methods to assess the performance and reliability of LLMs in healthcare applications. Moreover, there is growing interest in developing hybrid approaches that combine LLMs with domain-specific knowledge and expertise to improve the interpretability and trustworthiness of AI models in healthcare.
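
One building block of the validation methods mentioned in Section IV is the choice of evaluation metric. As a minimal sketch, the Python snippet below computes the micro-averaged F1 score (the kind of metric reported for the Med7 model in Section II-B) alongside the macro-average, from per-category true-positive, false-positive, and false-negative counts. The `Counts` class, the function names, and every number are hypothetical and chosen only for illustration; they are not taken from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Entity-level tallies for one category (hypothetical numbers)."""
    tp: int  # predicted entities that match a gold entity
    fp: int  # predicted entities with no gold match
    fn: int  # gold entities the model missed


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_category: dict[str, Counts]) -> float:
    """Pool the counts over all categories, then compute a single F1."""
    tp = sum(c.tp for c in per_category.values())
    fp = sum(c.fp for c in per_category.values())
    fn = sum(c.fn for c in per_category.values())
    return f1(tp, fp, fn)


def macro_f1(per_category: dict[str, Counts]) -> float:
    """Compute F1 per category, then take the unweighted mean."""
    scores = [f1(c.tp, c.fp, c.fn) for c in per_category.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative tallies only; these are NOT results from any cited study.
example = {
    "drug": Counts(tp=90, fp=5, fn=5),
    "dosage": Counts(tp=40, fp=10, fn=10),
    "duration": Counts(tp=5, fp=5, fn=5),
}

print(f"micro F1 = {micro_f1(example):.3f}")
print(f"macro F1 = {macro_f1(example):.3f}")
```

With these made-up tallies the micro-average (about 0.871) sits well above the macro-average (about 0.749), because the frequent "drug" category dominates the pooled counts while the rare "duration" category pulls down the unweighted mean. Reporting both, possibly per patient subgroup, is one simple way to check whether a strong headline score hides weak performance on rare categories or populations.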

REFERENCES

[1] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. A.,
Lacroix, T., ... & Lample, G. (2023). Llama: Open and efficient
foundation language models. arXiv preprint arXiv:2302.13971.
[2] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever,
I. (2019). Language models are unsupervised multitask learners.
OpenAI blog, 1(8), 9.
[3] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D.,
Dhariwal, P., ... & Amodei, D. (2020). Language models are
few-shot learners. Advances in Neural Information Processing
Systems, 33, 1877-1901.
[4] Stanford Alpaca: an instruction-following Llama model. (2023).
Accessed: April 3, 2023: https://github.com/tatsu-lab/stanford_alpaca.
[5] Rajkomar, A., Dean, J., & Kohane, I. (2019). Machine learning in
medicine. New England Journal of Medicine, 380(14), 1347-1358.
[6] Topol, E. J. (2019). High-performance medicine: the convergence
of human and artificial intelligence. Nature Medicine, 25(1), 44-56.
[7] Raza, S. (2023). Improving Clinical Decision Making with a
Two-Stage Recommender System Based on Language Models: A Case
Study on MIMIC-III Dataset. medRxiv, 2023-02.
[8] Rao, A., Kim, J., Kamineni, M., Pang, M., Lie, W., & Succi, M. D.
(2023). Evaluating ChatGPT as an adjunct for radiologic
decision-making. medRxiv, 2023-02.
[9] Liu, S., Wright, A. P., Patterson, B. L., Wanderer, J. P., Turer, R.
W., Nelson, S. D., ... & Wright, A. (2023). Using AI-generated
suggestions from ChatGPT to optimize clinical decision support.
Journal of the American Medical Informatics Association, 30(7),
1237-1245.
[10] Pons, E., Braun, L. M., Hunink, M. M., & Kors, J. A. (2016).
Natural language processing in radiology: a systematic review.
Radiology, 279(2), 329-343.
[11] Ford, E., Carroll, J. A., Smith, H. E., Scott, D., & Cassell, J. A.
(2016). Extracting information from the text of electronic medical
records to improve case detection: a systematic review. Journal of
the American Medical Informatics Association, 23(5), 1007-1015.
[12] Yang, X., Chen, A., PourNejatian, N., Shin, H. C., Smith, K. E.,
Parisien, C., ... & Wu, Y. (2022). A large language model for
electronic health records. NPJ Digital Medicine, 5(1), 194.
[13] Kormilitzin, A., Vaci, N., Liu, Q., & Nevado-Holgado, A. (2021).
Med7: A transferable clinical natural language processing model
for electronic health records. Artificial Intelligence in Medicine,
118, 102086.
[14] Zhou, S., Wang, N., Wang, L., Liu, H., & Zhang, R. (2022).
CancerBERT: a cancer domain-specific language model for
extracting breast cancer phenotypes from electronic health records.
Journal of the American Medical Informatics Association, 29(7),
1208-1216.
[15] Miner, A. S., Milstein, A., Schueller, S., Hegde, R., Mangurian, C.,
& Linos, E. (2016). Smartphone-Based Conversational Agents and
Responses to Questions About Mental Health, Interpersonal
Violence, and Physical Health. JAMA Internal Medicine, 176(5),
619-625.
[16] Miner, A. S., Laranjo, L., & Kocaballi, A. B. (2020). Chatbots in
the fight against the COVID-19 pandemic. NPJ digital medicine,
3(1), 65.
[17] Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Hou, L., ...
& Natarajan, V. (2023). Towards expert-level medical question
answering with large language models. arXiv preprint
arXiv:2305.09617.
[18] Liévin, V., Hother, C. E., & Winther, O. (2022). Can large language
models reason about medical questions?. arXiv preprint
arXiv:2207.08143.
[19] Woolf, B. P. (2010). A roadmap for education technology.
[20] Wartman, S. A., & Combs, C. D. (2018). Medical education must
move from the information age to the age of artificial intelligence.
Academic Medicine, 93(8), 1107-1109.
[21] Eysenbach, G. (2023). The role of ChatGPT, generative language
models, and artificial intelligence in medical education: a
conversation with ChatGPT and a call for papers. JMIR Medical
Education, 9(1), e46885.
[22] Kung, T. H., Cheatham, M., Medenilla, A., Sillos, C., De Leon, L.,
Elepaño, C., ... & Tseng, V. (2023). Performance of ChatGPT on
USMLE: Potential for AI-assisted medical education using large
language models. PLOS Digital Health, 2(2), e0000198.
[23] Chen, H., Engkvist, O., Wang, Y., Olivecrona, M., & Blaschke,
T. (2018). The rise of deep learning in drug discovery. Drug
Discovery Today, 23(6), 1241-1250.
[24] Liu, Z., Roberts, R. A., Lal-Nag, M., Chen, X., Huang, R., & Tong,
W. (2021). AI-based language models powering drug discovery
and development. Drug Discovery Today, 26(11), 2593-2607.
[25] Grisoni, F. (2023). Chemical language models for de novo drug
design: Challenges and opportunities. Current Opinion in Structural
Biology, 79, 102527.
