Decoding ChatGPT: A Primer on Large Language Models for Clinicians
Intelligence-Based Medicine
journal homepage: www.sciencedirect.com/journal/intelligence-based-medicine
Abstract
The rapid progress of artificial intelligence (AI) and the adoption of Large Language Models (LLMs) suggest that these technologies will transform healthcare in the coming years. We present a primer on LLMs for clinicians, focusing on OpenAI's Generative Pre-trained Transformer 4 (GPT-4) model, which powers ChatGPT, as a use case, since it has already seen record-breaking uptake in usage. ChatGPT generates natural-sounding text based on patterns observed in vast amounts of training data. The core strengths of ChatGPT and LLMs in healthcare applications include summarization and text generation, rapid adaptation and learning, and ease of customization and integration into existing applications. However, clinicians should also recognize the limitations of LLMs, most notably concerns about inaccuracy, privacy, accountability, transparency, and explainability. Clinicians must embrace the opportunity to explore, engage, and lead in the responsible integration of LLMs, harnessing their potential to revolutionize patient care and drive advancements in an ever-evolving healthcare landscape.
https://doi.org/10.1016/j.ibmed.2023.100114
Received 5 July 2023; Received in revised form 11 October 2023; Accepted 11 October 2023
Available online 12 October 2023
2666-5212/© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
R.B. Hunter et al. Intelligence-Based Medicine 8 (2023) 100114
Fig. 1. Stephen Wolfram's Illustration of ChatGPT's Text Generation Process [11]. *For this example, we use the term "words" for simplicity and clarity, but in reality ChatGPT utilizes tokens, or small units of meaningful text.
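The caption's distinction between words and tokens can be made concrete with a toy tokenizer. This is a sketch under simplified assumptions: the five-entry vocabulary and greedy longest-match rule are invented for illustration and are unrelated to the byte-pair encoding GPT-4 actually uses.

```python
# Toy illustration of tokenization: common words map to single tokens,
# while rarer words ("tachycardia") split into subword pieces.
# The vocabulary and greedy longest-match rule below are hypothetical
# simplifications; real GPT models use byte-pair encoding over a large
# learned vocabulary.

TOY_VOCAB = {"the", "patient", "has", "tachy", "cardia"}

def toy_tokenize(text: str) -> list[str]:
    """Greedily split each word into the longest known vocabulary pieces."""
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest remaining slice first; fall back to one char.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in TOY_VOCAB or end - start == 1:
                    tokens.append(piece)
                    start = end
                    break
    return tokens

print(toy_tokenize("The patient has tachycardia"))
# ['the', 'patient', 'has', 'tachy', 'cardia']
```

The point of the sketch is only that a model's unit of text need not align with a clinician's notion of a word.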
healthcare practice:

1. Summarization and Text Generation: GPT-4 is a decoder model, which means that at each stage, for a given word, the attention layers of the model can only access words positioned before it in that sentence [3]. This attribute is excellent for language modeling, text generation, and summarization. It is no surprise that it has the potential to revolutionize interaction with electronic health records. Epic has already announced plans to integrate large language models, which could save time for clinicians through the generation of advanced text notes, concise summarization of medical care history, finding information faster, and more [23]. It could also streamline revenue cycles by simplifying prior authorizations, speeding up coding and charging, and identifying potential errors that lead to denials before they are submitted.

2. Rapid adaptation and learning: ChatGPT can quickly adapt to new information and learn from user interactions, enabling it to improve its performance over time. One way ChatGPT achieves rapid adaptation is through few-shot learning, where the general model learns to perform a task with only a small number of provided examples [24]. Fine-tuning is the process of adjusting the model's pre-trained weights using a smaller, domain-specific dataset; once fine-tuned, examples no longer need to be provided to the model to get desired outputs. Because ChatGPT has been trained on vast amounts of data, it can often be fine-tuned with significantly less data than prior models, on the order of several hundred examples, which makes it a more versatile and adaptable tool in healthcare.

3. Customization and integration: The GPT architecture and ChatGPT can be integrated into existing applications and programs with relative ease. This adaptability allows healthcare organizations to leverage the power of LLMs in a manner tailored to their unique needs and workflows. A notable example of this integration came with Nuance announcing the Dragon Ambient eXperience Express, a fully automated clinical documentation application employing GPT-4, which aims to generate full clinical notes based on passive listening to patient-clinician conversations [23,25]. With the ability to integrate with electronic health records, telemedicine platforms, and various other healthcare-related software, ChatGPT and other LLMs can enhance collaboration and communication within the medical community, streamline processes, and improve patient care.

4. Core weaknesses of LLMs in healthcare applications

As new technologies make their way into clinical practice, evaluating the potential risks and real-world consequences for patients is crucial. Clinicians seeking to incorporate these tools into their daily workflow, or discussing them with patients and families, must be aware of the key limitations and barriers associated with using such models in clinical medicine. Most of these models, including ChatGPT, pose challenges when assessing these risks, as evaluations are primarily limited to examining model outputs (generated responses) rather than conducting a comprehensive analysis of the model architecture, training data, and other factors. Here we outline three key limitations of these models that are relevant to clinicians:

1. Accuracy: Internal factual accuracy evaluation by OpenAI was only 81 % when discussing scientific topics for GPT-4, up from 62 % with GPT-3.5 [8]. Few clinical studies have evaluated the accuracy of ChatGPT, and to our knowledge, none have yet evaluated the model powered by GPT-4. Sarraju et al. tested ChatGPT's (GPT-3.5) responses to simple preventive cardiology questions and found 84 % accuracy with 100 % reliability (i.e., answer meaning was consistent with repeated prompting) [17]. Ayers et al. recently tested ChatGPT's responses to online questions from a public social media forum (Reddit's r/AskDocs) and found that ChatGPT's responses to medical questions were more likely to be Good or Very Good in quality (79 % compared to 22 % of human responses) and were more likely to be Empathetic or Very Empathetic (45 % compared to 5 % of physician responses) [20]. Although these data are promising, ChatGPT is also capable of "hallucinations," where the model produces fabricated data or information to respond to a user's query, often in a confident, coherent, and very believable way. A notable example of ChatGPT's tendency to hallucinate is that it will often fabricate very plausible but invalid references when prompted to justify where a fact or response came from. The frequency of hallucinations is unknown in general and will vary between medical and non-medical contexts.

2. Privacy and Accountability: GPT-4 has been trained on trillions of words scraped from various online sources, some of which may contain personal information obtained without consent, posing a potential violation of privacy standards. When using ChatGPT, OpenAI has previously stored user data automatically, raising concerns about privacy and security. In April 2023, a data breach affected less than 1 % of users' data, highlighting the risks associated with transmitting secure data to ChatGPT [26]. It is important to note that when using the GPT API (not ChatGPT), OpenAI stores user data for 30 days before permanently deleting it, and only uses the data for model training if the user opts in [27]. Furthermore, when users interact with ChatGPT, they may inadvertently share sensitive information. For example, a medical professional may use ChatGPT to review a patient's health record, inadvertently exposing confidential patient information to the system. We note that these policies have been in flux over the last several months and will differ depending on the specific LLM, so users must be vigilant regarding the exact terms and conditions of the tool being used.

3. Transparency and Explainability: While these models can generate remarkable text that appears to exhibit sound reasoning, the process by which they curate and select information is not always apparent. Although LLMs can provide step-by-step reasoning using natural language, there is still no definitive method for interpreting the inner workings of LLMs due to their complexity, making it challenging to discern the types of knowledge, reasoning, or goals employed by the model when generating certain outputs [28]. Furthermore, unlike predictive models trained for specific outcomes, which can be assessed for performance despite their black-box nature, the outputs of LLMs lack objective measures of accuracy when engaged in general dialogue. This absence of clear evaluation criteria may hinder the adoption of performance-based assessments in favor of transparent processes that elucidate the model's decision-making for a given output.

5. Final thoughts and encouraging collaboration: A path forward for clinicians

In conclusion, integrating LLMs like ChatGPT into healthcare offers immense potential to enhance care delivery and alleviate administrative burdens. LLMs are capable of efficiently analyzing and summarizing large volumes of language data while exhibiting great potential for rapid adaptability and customization, making them invaluable assets for streamlining healthcare delivery. However, clinicians must remain vigilant of the possible risks, such as inaccurate responses and hallucinations, significant privacy concerns, and a lack of transparency in understanding model training and behavior. By proactively engaging with these emerging technologies and fostering a robust knowledge base, healthcare professionals can collaboratively work towards the safe and effective implementation of LLMs, ultimately fostering better patient care and driving innovation within the healthcare landscape.

During the preparation of this work the author(s) used ChatGPT during early-stage outlining and for concept generation. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.
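The step-by-step next-word selection illustrated in Fig. 1 can be sketched as a short loop. The bigram probability table below is a hypothetical stand-in for GPT-4's learned network; only the autoregressive loop itself, in which the model may look only at the words already produced, is the point of the sketch.

```python
# Mimics the process in Fig. 1: look at the text so far, score candidate
# next words, append the most probable one, and repeat. The probability
# table is a hypothetical stand-in for a trained language model.

BIGRAM_PROBS = {
    ("the", "patient"): 0.6, ("the", "best"): 0.4,
    ("patient", "is"): 0.7, ("patient", "was"): 0.3,
    ("is", "stable"): 0.8, ("is", "here"): 0.2,
}

def next_word(context: list[str]):
    """Score continuations of the last word seen; return the most probable."""
    candidates = {nxt: p for (prev, nxt), p in BIGRAM_PROBS.items()
                  if prev == context[-1]}
    return max(candidates, key=candidates.get) if candidates else None

def generate(prompt: str, max_new_words: int = 5) -> str:
    words = prompt.lower().split()
    for _ in range(max_new_words):
        word = next_word(words)   # may only look at words already produced
        if word is None:
            break                 # no known continuation: stop generating
        words.append(word)
    return " ".join(words)

print(generate("the"))  # the patient is stable
```

ChatGPT in fact samples among high-probability tokens rather than always taking the single most probable one, which is why repeated prompts can yield different wording.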
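Few-shot learning, discussed above under rapid adaptation, amounts to packing worked examples into the prompt itself, with no change to the model's weights. The sketch below assembles such a prompt; the triage task, labels, and examples are hypothetical, and the role/content message format follows the general shape of chat-style APIs.

```python
# Few-shot prompting: a handful of worked examples is placed in the
# prompt itself, and the general model infers the task from them.
# Task, labels, and examples are hypothetical illustrations.

FEW_SHOT_EXAMPLES = [
    ("Chest pain radiating to the left arm", "cardiology"),
    ("New-onset seizures in a 6-year-old", "neurology"),
    ("Persistent wheezing despite albuterol", "pulmonology"),
]

def build_few_shot_messages(query: str) -> list[dict]:
    """Assemble a chat prompt: instruction, worked examples, then the query."""
    messages = [{"role": "system",
                 "content": "Classify each complaint into a medical specialty."}]
    for complaint, specialty in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": complaint})
        messages.append({"role": "assistant", "content": specialty})
    messages.append({"role": "user", "content": query})
    return messages

msgs = build_few_shot_messages("Blood in urine for two days")
print(len(msgs))  # 8: one instruction, three example pairs, one query
```

Fine-tuning, by contrast, bakes such examples into the model's weights ahead of time, so the prompt at inference can contain only the query.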
Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: A. Chang is a senior editor for Intelligence-Based Medicine, is founder and medical director of the Medical Intelligence and Innovation Institute, and has founded CardioGenomic Intelligence, Artificial Intelligence in Medicine (AIMed), and Medical Intelligence 10 (MI10). A. Limon serves as a principal consultant at Oneirix Labs.

References

[1] What is deep learning? | Microsoft Azure. https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-is-deep-learning/. [Accessed 2 May 2023].
[2] What is Natural Language Processing? | IBM. https://www.ibm.com/topics/natural-language-processing. [Accessed 3 May 2023].
[3] Introduction - Hugging Face NLP Course. https://huggingface.co/learn/nlp-course/chapter1/1. [Accessed 3 May 2023].
[4] Large Language Models: complete guide in 2023. https://research.aimultiple.com/large-language-models/. [Accessed 4 May 2023].
[5] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc.; 2017. https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html. [Accessed 12 May 2023].
[6] Infographic: ChatGPT sprints to one million users. Statista Infographics. Published January 24, 2023. https://www.statista.com/chart/29174/time-to-one-million-users. [Accessed 30 March 2023].
[7] ChatGPT sets record for fastest-growing user base - analyst note | Reuters. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/. [Accessed 30 March 2023].
[8] GPT-4. https://openai.com/product/gpt-4. [Accessed 18 March 2023].
[9] What are tokens and how to count them? | OpenAI Help Center. https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them. [Accessed 3 May 2023].
[10] What is a large language model (LLM)? TechTarget Definition. WhatIs.com. https://www.techtarget.com/whatis/definition/large-language-model-LLM. [Accessed 3 May 2023].
[11] Wolfram S. What is ChatGPT doing … and why does it work? Stephen Wolfram Writings. Published February 14, 2023. https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/. [Accessed 30 March 2023].
[12] Introducing ChatGPT. https://openai.com/blog/chatgpt. [Accessed 11 April 2023].
[13] Moor M, Banerjee O, Abad ZSH, et al. Foundation models for generalist medical artificial intelligence. Nature 2023;616(7956):259–65. https://doi.org/10.1038/s41586-023-05881-4.
[14] Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst 2023;47(1):33. https://doi.org/10.1007/s10916-023-01925-4.
[15] Haman M, Školník M. Exploring the capabilities of ChatGPT in academic research recommendation. Resuscitation 2023. https://doi.org/10.1016/j.resuscitation.2023.109795.
[16] Boßelmann CM, Leu C, Lal D. Are AI language models such as ChatGPT ready to improve the care of individuals with epilepsy? Epilepsia. https://doi.org/10.1111/epi.17570.
[17] Sarraju A, Bruemmer D, Van Iterson E, Cho L, Rodriguez F, Laffin L. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA 2023. https://doi.org/10.1001/jama.2023.1044. Published online February 3.
[18] Ayoub NF, Lee YJ, Grimm D, Balakrishnan K. Comparison between ChatGPT and Google search as sources of postoperative patient instructions. JAMA Otolaryngology–Head & Neck Surgery 2023. https://doi.org/10.1001/jamaoto.2023.0704. Published online April 27.
[19] Johnson D, Goodman R, Patrinely J, et al. Assessing the accuracy and reliability of AI-generated medical responses: an evaluation of the Chat-GPT model. Review 2023. https://doi.org/10.21203/rs.3.rs-2566942/v1.
[20] Ayers JW, Poliak A, Dredze M, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 2023. https://doi.org/10.1001/jamainternmed.2023.1838. Published online April 28.
[21] Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digital Health 2023;2(2):e0000198. https://doi.org/10.1371/journal.pdig.0000198.
[22] Nori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of GPT-4 on medical challenge problems.
[23] Diaz N. Epic to use Microsoft's GPT-4 in EHRs. Published March 30, 2023. https://www.beckershospitalreview.com/ehrs/epic-to-use-microsofts-open-ai-in-ehrs.html. [Accessed 12 April 2023].
[24] OpenAI API. https://platform.openai.com. [Accessed 12 April 2023].
[25] Landi H. Microsoft's Nuance integrates OpenAI's GPT-4 into voice-enabled medical scribe software. Fierce Healthcare. Published March 21, 2023. https://www.fiercehealthcare.com/health-tech/microsofts-nuance-integrates-openais-gpt-4-medical-scribe-software. [Accessed 12 April 2023].
[26] ChatGPT confirms data breach, raising security concerns. Security Intelligence. Published May 2, 2023. https://securityintelligence.com/articles/chatgpt-confirms-data-breach/. [Accessed 4 May 2023].
[27] API data usage policies. https://openai.com/policies/api-data-usage-policies. [Accessed 4 May 2023].
[28] Bowman SR. Eight things to know about Large Language Models.

R. Brandon Hunter*
Department of Pediatrics, Division of Critical Care, Texas Children's Hospital and Baylor College of Medicine, Houston, TX, 77030, USA

Sanjiv D. Mehta
Department of Pediatrics, Division of Anesthesiology and Critical Care Medicine, Children's Hospital of Philadelphia and the University of Pennsylvania, Philadelphia, PA, 19104, USA

Alfonso Limon
Oneirix Labs, Carlsbad, CA, 92008, USA

Anthony C. Chang
Department of Pediatrics, Division of Cardiology, Children's Hospital of Orange County, Orange, CA, 92868, USA

* Corresponding author.
E-mail address: [email protected] (R.B. Hunter).