Abstract— This paper examines the potential of large language models (LLMs) in the healthcare sector, delving into their prospective applications, challenges, and future trajectories. LLMs have demonstrated encouraging results in various healthcare-related domains, including the development of clinical decision support systems, natural language processing in electronic health records, healthcare question/answer systems, and healthcare education. However, integrating these models into healthcare practice also raises several concerns, such as data privacy and security issues, the requirement for vast amounts of training data, model biases, and the limited interpretability of model predictions. Overcoming these hurdles necessitates a collaborative effort from experts across multiple disciplines. Despite these obstacles, the deployment of LLMs in healthcare holds the potential to transform the industry and significantly enhance patient outcomes.

Keywords—Large Language Models, ChatGPT, Applications in Healthcare

I. INTRODUCTION

The rise of large language models (LLMs) such as GPT [2] [3], Llama [1], Alpaca [4], and others has revolutionized various sectors, including healthcare. These models, powered by machine learning and artificial intelligence, have the potential to reshape healthcare delivery, research, and education.

Large language models have been instrumental in the development and application of Clinical Decision Support Systems (CDSS). These systems leverage the power of these models to analyze vast amounts of data and provide evidence-based treatment recommendations [5] [6] [7] [8] [9]. In addition to CDSS, large language models have found application in Natural Language Processing (NLP) in Electronic Health Records (EHR). EHRs contain a wealth of valuable patient information, but much of this data is unstructured and difficult to analyze. Large language models can process this data, extract relevant information, and present it in a structured format that is easy to understand and analyze [10] [11] [12] [13] [14].

Large language models are also being used in healthcare Q/A systems. These systems can answer patient queries, provide information about diseases and treatments, and even guide patients through the healthcare system [15] [16] [17] [18]. In the field of healthcare education, large language models can be used to develop intelligent tutoring systems. These systems can provide personalized learning experiences, adapt to the learning pace of individual students, and provide instant feedback [19] [20] [21] [22].

Large language models also have great potential to revolutionize medical research and drug discovery. They enable rapid, in-depth analysis of medical record data to identify typical patterns or trends and to generate hypotheses. This can accelerate the pace of medical research and lead to the discovery of new treatments and cures [23] [24] [25].

Despite these potential benefits, there are also several challenges related to the implementation of large language models in healthcare. These include issues related to data privacy and security, the requirement for large training datasets, the risk of model bias, and insufficient interpretability of model results. Moreover, the successful implementation of these models requires multidisciplinary collaboration, involving not only computer scientists and engineers, but also doctors, patients, and policymakers [6].

Large language models have the potential to reshape healthcare, but their implementation requires careful consideration of both the advantages and challenges. The following sections review large language models in healthcare from several aspects: their applications, the challenges of using them, and future directions. This gives a more comprehensive understanding of the current development, applications, limitations, and future direction of LLMs in healthcare.

II. APPLICATIONS OF LARGE LANGUAGE MODELS IN HEALTHCARE

The application of LLMs in healthcare has been a topic of interest in recent years. These models have the potential to revolutionize various aspects of healthcare, including clinical decision support systems, natural language processing in electronic health records, healthcare Q/A systems, healthcare education, and medical research and drug discovery.

A. Clinical Decision Support Systems

Clinical Decision Support Systems (CDSS) have long been at the forefront of integrating technology into healthcare to improve patient outcomes and healthcare efficiency. With the rise of LLMs like GPT-3 and its successors or variations, there has been a paradigm shift in how these systems assist healthcare professionals in making clinical decisions. Large language models, trained on large text corpora, can generate human-like answers based on the input they receive. This ability can be harnessed in a CDSS to provide real-time, evidence-based recommendations to healthcare professionals. For instance, when a physician inputs patient symptoms and medical history, the LLM can quickly scan through the latest medical literature and provide potential diagnoses,
Authorized licensed use limited to: VIT University- Chennai Campus. Downloaded on March 11,2025 at 15:14:35 UTC from IEEE Xplore. Restrictions apply.
comprehensive human evaluation of 1066 consumer medical questions revealed that Med-PaLM 2's answers were preferred over those provided by physicians on eight of nine criteria pertinent to clinical utility, highlighting its potential for practical application in medicine [17]. In another study, the Codex model with 175B parameters demonstrated an answering accuracy of 60.2% on the USMLE test, 59.7% on the MedMCQA validation set, and 78.2% on the PubMedQA test [18]. These results are comparable to the performance of human experts.

Furthermore, the implementation of LLMs in healthcare Q/A systems can lead to significant cost reductions. The traditional model, involving manual searches or consultations with specialists, can be time-consuming and expensive. In contrast, once an LLM is trained, it can reduce the need for human intervention, so its marginal cost for answering additional questions is almost negligible.

D. Healthcare Education

Healthcare education is another area where large language models can have a significant impact. In healthcare education, large language models can be leveraged to create customized learning experiences for students. By analyzing a student's individual learning style, these models can tailor the educational content to better meet the student's needs, providing a more effective and engaging learning experience. Moreover, these models can also provide instant feedback, identifying important concepts and highlighting knowledge gaps, thereby helping students improve their learning outcomes [20].

In a recent study [22], the capabilities of the large language model ChatGPT were assessed on the United States Medical Licensing Exam (USMLE). The pretrained ChatGPT delivered impressive results, performing at or near the passing threshold for all three exams and showcasing a high degree of consistency and depth in its explanations. These findings hint at the possibility of ChatGPT serving as a valuable tool in medical education. Another study discussed the potential use of large language models in healthcare education [21]. LLMs could analyze medical texts and generate quizzes based on the most important concepts. LLMs could also generate virtual patient simulations. Traditional medical simulations often rely on mannequins or actors to mimic patient scenarios. With the advent of LLMs, these simulations can be enhanced with realistic patient dialogues, allowing medical students to practice their communication skills in a controlled environment.

E. Medical Research and Drug Discovery

LLMs can play a crucial role in medical research, particularly in drug discovery. LLMs have shown promise in analyzing vast amounts of literature and data to identify potential drug candidates. The traditional drug discovery process is time-consuming and resource-intensive, often taking years and significant financial investments to bring a drug from concept to market. However, with the computational capabilities of LLMs, researchers can expedite the initial stages of drug discovery by rapidly sifting through existing literature to identify potential drug candidates or mechanisms [23]. This can significantly speed up the drug discovery process and reduce the cost of drug development.

For example, researchers have explored the use of LLMs in developing new treatments for COVID-19, including drug repurposing, which could inform prevention strategies for future pandemics [24]. Additionally, customized domain-specific LLMs, such as chemical language models, have shown promising results in accelerating de novo drug design by generating new molecules with desired properties [25].

In conclusion, LLMs hold great potential in transforming various areas of healthcare, though further investigation is necessary to thoroughly understand their advantages and limitations in this field.

III. CHALLENGES OF USING LLMs IN HEALTHCARE

The use of large language models (LLMs) in healthcare has shown great promise in various applications, including clinical text analysis, patient communication, and medical decision-making. However, there are several challenges associated with the adoption of LLMs in healthcare, which must be addressed to fully realize their potential.

A. Need for Large Amounts of Training Data

One of the primary challenges in developing and deploying LLMs in healthcare is access to large amounts of high-quality training data. Deep learning models require vast amounts of data to learn patterns and relationships within the data, and healthcare is no exception. The larger the training dataset, the better the model's ability to understand context, make predictions, and provide relevant information. Pre-trained models usually perform poorly on specific problems, which makes it essential to create domain-specific datasets. However, obtaining such data can be difficult due to privacy concerns, data fragmentation, and the time-consuming process of manual annotation.

Moreover, collecting data from diverse sources, such as electronic health records (EHRs), clinical notes, and medical imaging reports, poses additional challenges in terms of data heterogeneity and quality variability. Medical data can vary in format, structure, and language, especially when sourced from different healthcare systems or countries. This diversity requires additional preprocessing and standardization efforts before the data can be used for training LLMs.

B. Lack of Interpretability of Model Predictions

Another significant challenge in using LLMs in healthcare is the lack of interpretability of model predictions. Unlike traditional machine learning models, deep learning models are often criticized for their opaqueness, making it difficult to understand the reasoning behind their predictions. In healthcare, where decisions can have life-altering consequences, understanding the rationale behind a model's prediction is crucial. When an LLM suggests a particular diagnosis or treatment plan, clinicians need to understand the basis for that recommendation to trust and act upon it.

The lack of transparency in large language models (LLMs) poses a significant challenge in healthcare, where understanding the thought process behind a recommendation or suggestion is critical. The so-called "black box" nature of these models makes it difficult to comprehend how they reach their conclusions, leading to mistrust among healthcare professionals who may be
reluctant to rely on a tool that doesn't offer clear reasoning for its decisions. Furthermore, the absence of interpretability hinders the detection and correction of potential biases within the model, which can result in inaccurate recommendations and further undermine confidence in the technology. To tackle this issue, researchers have put forth various methods, such as attention mechanisms, feature importance scores, and visualizations, aimed at offering insights into the decision-making process of LLMs. Nevertheless, the ability to interpret model predictions remains an essential problem that requires resolution.

C. Ethical Implications

A key ethical consideration regarding the utilization of large language models in healthcare revolves around privacy and data security. These models necessitate extensive sets of personal health data for training, giving rise to worries about patient confidentiality and data protection. The integration of artificial intelligence in healthcare presents substantial hurdles in safeguarding patients' privacy. Hence, it becomes imperative to establish and maintain adequate measures to protect patient data and uphold confidentiality standards.

Another ethical aspect to consider involves the risk of bias influencing the decision-making process. If large language models are trained on biased data or influenced by a specific worldview, they can inadvertently perpetuate existing health disparities. For instance, a language model created for predicting patient mortality may exhibit bias against certain racial groups. To tackle this challenge, it is essential to prioritize the use of diverse and representative datasets when training these models.

D. Legal Considerations

The use of large language models in healthcare also raises several legal considerations. One of the most pressing issues is liability and accountability. Who is responsible when an AI system makes a mistake or provides suboptimal advice? Is it the developer, the clinician who used the system, or the patient who consented to its use? These questions highlight the need for clearer regulations and guidelines regarding the use of AI in healthcare.

Another legal concern relates to intellectual property rights. Who owns the intellectual property rights to the data used to train these models? Is it the patients who provided the data, the clinicians who collected it, or the companies that developed the models? Clarifying ownership rights is essential to prevent disputes and ensure that data are used responsibly.

E. Biases and Limitations

Despite their promise, large language models in healthcare are not without their biases and limitations. One limitation is that they may not perform well outside the dataset used for training. This means that the models may not generalize well to new populations or situations, leading to poor performance or incorrect recommendations.

Another limitation is that these models rely heavily on structured data, which may not capture the full complexity of patient experiences. Unstructured data, such as clinical notes, can provide valuable insights into patient symptoms and experiences, but they are often difficult to analyze using machine learning techniques.

IV. FUTURE DIRECTIONS

The preceding sections show that ensuring data privacy and security, mitigating bias, establishing accountability, and critically evaluating performance are vital steps towards integrating LLMs into healthcare safely and effectively.

To overcome these challenges, potential research directions include improving data quality and integrity through better data governance and curation practices, developing more advanced techniques for data preprocessing and normalization, and implementing robust validation methods to assess the performance and reliability of LLMs in healthcare applications. Moreover, there is a growing interest in developing hybrid approaches that combine LLMs with domain-specific knowledge and expertise to improve the interpretability and trustworthiness of AI models in healthcare.
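
The concerns about biased predictions and the call for robust validation methods in Section IV can be illustrated with a small subgroup audit. The following Python sketch is illustrative only and is not drawn from any of the reviewed studies. It assumes that a model's predictions have already been collected alongside reference labels and a population group tag, and it reports accuracy per group together with the largest gap between groups. The function names (subgroup_accuracy, max_accuracy_gap), the record layout, and the toy records are all hypothetical.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Return {group: accuracy} for an iterable of (group, y_true, y_pred)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Synthetic toy records (hypothetical, not real patient data):
# (group tag, reference label, model prediction)
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 1),
]

per_group = subgroup_accuracy(records)
gap = max_accuracy_gap(per_group)
print(per_group)  # accuracy for each group
print(gap)        # largest accuracy gap between groups
```

A large gap does not by itself prove that a model is biased, but it flags subgroups whose performance deserves closer review before deployment.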
REFERENCES

[1] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. A.,
Lacroix, T., ... & Lample, G. (2023). Llama: Open and efficient
foundation language models. arXiv preprint arXiv:2302.13971.
[2] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever,
I. (2019). Language models are unsupervised multitask learners.
OpenAI blog, 1(8), 9.
[3] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D.,
Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot
learners. Advances in neural information processing systems,
33, 1877-1901.
[4] Stanford alpaca: an instruction-following llama model. (2023).
Accessed: April 3, 2023: https://github.com/tatsu-lab/stanford_alpaca.
[5] Rajkomar, A., Dean, J., & Kohane, I. (2019). Machine learning in
medicine. New England Journal of Medicine, 380(14), 1347-1358.
[6] Topol, E. J. (2019). High-performance medicine: the convergence
of human and artificial intelligence. Nature medicine, 25(1), 44-56.
[7] Raza, S. (2023). Improving Clinical Decision Making with a Two-Stage
Recommender System Based on Language Models: A Case
Study on MIMIC-III Dataset. medRxiv, 2023-02.
[8] Rao, A., Kim, J., Kamineni, M., Pang, M., Lie, W., & Succi, M. D.
(2023). Evaluating ChatGPT as an adjunct for radiologic decision-making.
medRxiv, 2023-02.
[9] Liu, S., Wright, A. P., Patterson, B. L., Wanderer, J. P., Turer, R.
W., Nelson, S. D., ... & Wright, A. (2023). Using AI-generated
suggestions from ChatGPT to optimize clinical decision support.
Journal of the American Medical Informatics Association, 30(7),
1237-1245.
[10] Pons, E., Braun, L. M., Hunink, M. M., & Kors, J. A. (2016).
Natural language processing in radiology: a systematic review.
Radiology, 279(2), 329-343.
[11] Ford, E., Carroll, J. A., Smith, H. E., Scott, D., & Cassell, J. A.
(2016). Extracting information from the text of electronic medical
records to improve case detection: a systematic review. Journal of
the American Medical Informatics Association, 23(5), 1007-1015.
[12] Yang, X., Chen, A., PourNejatian, N., Shin, H. C., Smith, K. E.,
Parisien, C., ... & Wu, Y. (2022). A large language model for
electronic health records. NPJ Digital Medicine, 5(1), 194.
[13] Kormilitzin, A., Vaci, N., Liu, Q., & Nevado-Holgado, A. (2021).
Med7: A transferable clinical natural language processing model
for electronic health records. Artificial Intelligence in Medicine,
118, 102086.
[14] Zhou, S., Wang, N., Wang, L., Liu, H., & Zhang, R. (2022).
CancerBERT: a cancer domain-specific language model for
extracting breast cancer phenotypes from electronic health records.
Journal of the American Medical Informatics Association, 29(7),
1208-1216.
[15] Miner, A. S., Milstein, A., Schueller, S., Hegde, R., Mangurian, C.,
& Linos, E. (2016). Smartphone-Based Conversational Agents and
Responses to Questions About Mental Health, Interpersonal
Violence, and Physical Health. JAMA Internal Medicine, 176(5),
619-625.
[16] Miner, A. S., Laranjo, L., & Kocaballi, A. B. (2020). Chatbots in
the fight against the COVID-19 pandemic. NPJ digital medicine,
3(1), 65.
[17] Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Hou, L., ...
& Natarajan, V. (2023). Towards expert-level medical question
answering with large language models. arXiv preprint
arXiv:2305.09617.
[18] Liévin, V., Hother, C. E., & Winther, O. (2022). Can large language
models reason about medical questions?. arXiv preprint
arXiv:2207.08143.
[19] Woolf, B. P. (2010). A roadmap for education technology.
[20] Wartman, S. A., & Combs, C. D. (2018). Medical education must
move from the information age to the age of artificial intelligence.
Academic Medicine, 93(8), 1107-1109.
[21] Eysenbach, G. (2023). The role of ChatGPT, generative language
models, and artificial intelligence in medical education: a
conversation with ChatGPT and a call for papers. JMIR Medical
Education, 9(1), e46885.
[22] Kung, T. H., Cheatham, M., Medenilla, A., Sillos, C., De Leon, L.,
Elepaño, C., ... & Tseng, V. (2023). Performance of ChatGPT on
USMLE: Potential for AI-assisted medical education using large
language models. PLOS Digital Health, 2(2), e0000198.
[23] Chen, H., Engkvist, O., Wang, Y., Olivecrona, M., & Blaschke,
T. (2018). The rise of deep learning in drug discovery. Drug
Discovery Today, 23(6), 1241-1250.
[24] Liu, Z., Roberts, R. A., Lal-Nag, M., Chen, X., Huang, R., & Tong,
W. (2021). AI-based language models powering drug discovery
and development. Drug Discovery Today, 26(11), 2593-2607.
[25] Grisoni, F. (2023). Chemical language models for de novo drug
design: Challenges and opportunities. Current Opinion in Structural
Biology, 79, 102527.