
American Journal of Emergency Medicine 78 (2024) 170–175


Assessing the precision of artificial intelligence in ED triage decisions: Insights from a study with ChatGPT

Sinan Paslı a,⁎, Abdul Samet Şahin b, Muhammet Fatih Beşer c, Hazal Topçuoğlu d, Metin Yadigaroğlu e, Melih İmamoğlu a

a Karadeniz Technical University, Faculty of Medicine, Department of Emergency Medicine, Trabzon, Turkey
b Çarşamba State Hospital, Emergency Department, Samsun, Turkey
c Business & Decision Life Sciences, Brussels, Belgium
d Siirt Education & Research Hospital, Department of Emergency Medicine, Siirt, Turkey
e Samsun University, Faculty of Medicine, Department of Emergency Medicine, Samsun, Turkey

Article history:
Received 20 October 2023
Received in revised form 25 December 2023
Accepted 21 January 2024

Keywords: Artificial intelligence; ChatGPT; Chatbot; Emergency department; Triage

Abstract

Background: The rise in emergency department presentations globally poses challenges for efficient patient management. To address this, various strategies aim to expedite patient management. Artificial intelligence's (AI) consistent performance and rapid data interpretation extend its healthcare applications, especially in emergencies. The introduction of a robust AI tool like ChatGPT, based on GPT-4 developed by OpenAI, can benefit patients and healthcare professionals by improving the speed and accuracy of resource allocation. This study examines ChatGPT's capability to predict triage outcomes based on local emergency department rules.

Methods: This study is a single-center prospective observational study. The study population consists of all patients who presented to the emergency department with any symptoms and agreed to participate. The study was conducted on three non-consecutive days for a total of 72 h. Patients' chief complaints, vital parameters, medical history and the area to which they were directed by the triage team in the emergency department were recorded. Concurrently, an emergency medicine physician inputted the same data into previously trained GPT-4, according to local rules, and the triage decisions made by GPT-4 were recorded. In the same process, an emergency medicine specialist determined where the patient should be directed based on the data collected, and this decision was considered the gold standard. Accuracy rates and reliability for directing patients to specific areas by the triage team and GPT-4 were evaluated using Cohen's kappa test. Furthermore, the accuracy of the patient triage process performed by the triage team and GPT-4 was assessed by receiver operating characteristic (ROC) analysis. Statistical analysis considered a value of p < 0.05 as significant.

Results: The study was carried out on 758 patients. Among the participants, 416 (54.9%) were male and 342 (45.1%) were female. Evaluating the primary endpoints of our study, the agreement between the decisions of the triage team, the GPT-4 decisions in emergency department triage, and the gold standard, we observed almost perfect agreement both between the triage team and the gold standard and between GPT-4 and the gold standard (Cohen's kappa 0.893 and 0.899, respectively; p < 0.001 for each).

Conclusion: Our findings suggest GPT-4 possesses outstanding predictive skills in triaging patients in an emergency setting. GPT-4 can serve as an effective tool to support the triage process.

© 2024 Elsevier Inc. All rights reserved.

⁎ Corresponding author at: Karadeniz Technical University, Department of Emergency Medicine, Trabzon 61080, Turkey. E-mail address: [email protected] (S. Paslı).

https://doi.org/10.1016/j.ajem.2024.01.037
0735-6757/© 2024 Elsevier Inc. All rights reserved.

1. Introduction

Emergency departments operate continuously and provide various emergency health services. The increasing number of presentations to emergency departments worldwide has become a growing concern for these units. Various methods have been developed to manage the high demand effectively and quickly administer the correct treatment to patients. In modern emergency services, patients are assessed in the triage room based on their symptoms and vital signs and are directed to the appropriate emergency service area. Today, several triage systems are implemented in emergency departments, including the Emergency Severity Index (ESI), the Canadian Triage and Acuity Scale (CTAS) and the Australian Triage Scale (ATS) [1–2]. The initial assessment in the triage room helps to reduce mortality and morbidity by distinguishing

patients who require urgent intervention and those who can wait longer for examination.

Artificial intelligence (AI) is a machine-learning process that mimics human cognitive abilities by drawing inferences from new data [4]. The growing ability to perform tasks and produce consistent results has expanded the application of AI in healthcare. The swift inference capabilities of AI enhance its significance in emergency departments, where rapid interpretation of clinical data is required to categorize the condition of patients [5]. ChatGPT is a GPT-based AI model developed by OpenAI that represents a significant advance in natural language processing (NLP) [6]. This model provides a human-like experience in text-based communication [7]. By utilizing the data it has been trained on, GPT-4 can generate responses to users' inquiries in a logical, coherent, and grammatically correct manner. Furthermore, end users can easily train ChatGPT through text commands (prompts). Given the ever-growing need in the healthcare sector, introducing a fast, consistent, and easy-to-train AI tool like GPT-4 has the potential to offer significant advantages to both patients and healthcare workers. These models can quickly analyze patient symptoms, vital signs and health status by using shared data to prioritize patients [8]. This can enhance the effectiveness and speed of resource allocation in emergency departments. These potential advantages make AI tools such as GPT-4 candidates for an important role in emergency department triage areas.

The purpose of this study is to evaluate the triage prediction ability of ChatGPT trained according to the local triage rules of the emergency department where the study was conducted.

2. Methods

2.1. Study design, setting and population

This study is a single-center prospective observational study. The study population consists of all patients who presented to the emergency department with any symptoms and agreed to participate. Following ethical approval (protocol number 2023/182), the study was conducted in the emergency department of a tertiary health center, where approximately 250 patients present daily. In the emergency department routine, patients first arrive at the triage room, where they are assessed by the triage team, which consists of paramedics. During the triage assessment, the patient's complaints and medical history are queried, alongside a thorough evaluation of their vital signs. All collected data is then recorded in the hospital information management system. Upon gathering all relevant data, the triage team identifies the appropriate area for the patient's follow-up and treatment in the ED, and the patient is directed to that area. The architectural structure of the emergency department has the following areas for patient care: green zone, yellow zone, red zone, trauma area and other areas (resuscitation room, ophthalmology/ear nose and throat room, psychiatry room and obstetrics/gynecology room). The green zone is usually used for the least urgent cases. Green zone patients have minor ailments that are not life-threatening and do not require long-term medical attention. Examples include mild headaches or cold symptoms. The yellow zone refers to moderate emergencies. These patients are referred for conditions that are not serious but require prompt medical assessment and treatment. Examples include severe infections, abdominal pain, and patients with malignancies. The red zone is where the most severe and urgent patients are assessed. These patients are in life-threatening situations that require immediate medical attention. Examples may include heart attack, stroke, or severe respiratory distress. The trauma area is where all kinds of trauma patients are cared for, and procedures such as wound dressing and incision sutures are performed. The Emergency Severity Index (ESI) is used to triage patients in the emergency department. Nevertheless, due to the architectural structure of the emergency department, the ESI scale has been modified with some local rules. The architectural structure of the emergency department is shown in detail in Fig. 1.

Fig. 1. Architectural structure of the emergency department.

2.2. ChatGPT and its training according to local rules

In our study, we used ChatGPT (25 September version), a GPT-4 based Large Language Model (LLM) developed by OpenAI. During GPT-4's training process, we introduced the architectural structure of the emergency department and the areas for patient follow-ups. It was stated that patient triage in the emergency department is performed

according to the ESI scale. Moreover, we trained GPT-4 on the local rules of the emergency department. These rules were related to the areas where patients with specific symptoms should be directed during triage. For example, a patient with a foreign body in the ear/nose or a nosebleed should be directed to the ear-nose-throat room in the emergency department, while patients with signs of an upper respiratory tract infection should be directed to the relevant area according to the ESI scale. Another example is that pregnant women presenting with gynecological or obstetric symptoms should be directed to the gynecology room, while pregnant women presenting with other symptoms should be directed to the appropriate area according to the ESI scale.

2.3. Data collection

The study was conducted on three non-consecutive days for a total of 72 h. Patients' presenting complaints, vital parameters and the area to which they were directed in the emergency department were recorded in the hospital system by paramedics who had worked in the triage area for at least three years. Concurrently, an emergency medicine physician inputted the following data into the ChatGPT tool, formatted as "Age / Sex / Systolic Blood Pressure / Diastolic Blood Pressure / Pulse Rate / Respiratory Rate / Body Temperature / Oxygen Saturation (SpO2) / General Condition / Medical History / Presenting Symptoms" (Fig. 2). According to this information, the triage decisions made by GPT-4 were recorded. In the same process, an emergency medicine specialist determined where the patient should be sent based on the data collected by the triage team, and this decision was considered the gold standard. The gold standard decision was not shared with the triage team, and patients continued to be directed to areas deemed appropriate by the triage team. In addition, the emergency medicine specialist making the gold standard decision was blind to the GPT-4 responses.

2.4. Statistical analysis

The data collected were analyzed using the IBM SPSS package (version 25). Conformity to the normal distribution was assessed with the Kolmogorov-Smirnov test. In descriptive statistics, normally distributed numerical data were expressed as mean and standard deviation, non-normally distributed data as median (minimum-maximum), and nominal data as number (n) and percentage (%). Group comparisons were conducted using the Student t-test for normally distributed numerical data and the Mann-Whitney U test for non-normally distributed numerical data. The Chi-square test was used for categorical data. Sensitivity, specificity, negative predictive value and positive predictive value were calculated for the decisions

Fig. 2. Format of the data entered in ChatGPT and how the application responds.
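To make the input format described in Section 2.3 concrete, the slash-separated record can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the field names and the example values are hypothetical.

```python
# Illustrative sketch of the slash-separated patient string from Section 2.3.
# Field names and the example record below are hypothetical, not study data.

def format_patient(rec: dict) -> str:
    """Serialize one patient record as:
    Age / Sex / SBP / DBP / Pulse / RR / Temp / SpO2 /
    General Condition / Medical History / Presenting Symptoms."""
    fields = [
        rec["age"], rec["sex"], rec["sbp"], rec["dbp"], rec["pulse"],
        rec["resp_rate"], rec["temp"], rec["spo2"],
        rec["general_condition"], rec["history"], rec["symptoms"],
    ]
    return " / ".join(str(f) for f in fields)

record = {
    "age": 57, "sex": "M", "sbp": 118, "dbp": 72, "pulse": 75,
    "resp_rate": 15, "temp": 36.7, "spo2": 96,
    "general_condition": "Good", "history": "Hypertension",
    "symptoms": "Chest pain",
}
print(format_patient(record))
# 57 / M / 118 / 72 / 75 / 15 / 36.7 / 96 / Good / Hypertension / Chest pain
```

A fixed serialization like this keeps every entry sent to the chatbot in a uniform shape, which matters when decisions are later compared field-by-field against the triage team and the gold standard.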


regarding the areas to which patients were directed by the triage team and the ChatGPT tool. Accuracy rates and reliability for directing patients to specific areas by the triage team and GPT-4 were evaluated using Cohen's kappa test. Furthermore, the accuracy of the patient triage process performed by the triage team and GPT-4 was assessed by receiver operating characteristic (ROC) analysis. Multiclass ROC analysis was performed in Python using the Scikit-learn and Matplotlib libraries. Statistical analysis considered a value of p < 0.05 as significant.

3. Results

During the three-day study period, a total of 791 patients presented to the emergency department; 24 patients were excluded because they did not volunteer for the study and 9 patients were excluded because of incomplete data. Thus, the study was conducted with 758 patients. Of the patients who participated in the study, 416 (54.9%) were male and 342 (45.1%) were female. The mean age of the women was 46.84 ± 22.18 years, while the mean age of the men was 41.81 ± 21.08 years (p < 0.001). The patients' general condition at the time of presentation to the emergency department was assessed: the majority of patients, 711 (93.8%), were in good general condition, 29 (3.8%) were in fair general condition and 18 (2.4%) in poor general condition. Hypertension was the most common comorbidity in the study patients, 116 (15.3%), followed by diabetes mellitus and malignancy. Table 1 presents the patients' demographic data, vital parameters, and general condition characteristics.

The chief complaints in patient presentations to the ED were analyzed. The most common reason for presentation was gastrointestinal complaints, 186 (24.6%), followed by trauma with 130 (17.2%) (Table 2). The distribution of patients presented to the emergency department and referred to the relevant area during the study period according to the triage team, GPT-4 and the gold standard decision is presented in Table 3.

Table 1
Demographic characteristics, vital parameters, general conditions and comorbid diseases of the patients included in the study.

Gender, n (%); age, mean ± standard deviation:
  Male      416 (54.9)   41.81 ± 21.08   p = 0.001
  Female    342 (45.1)   46.84 ± 22.18

Vital signs, median (min-max):
  Body temperature (°C)             36.7 (35.9–39.2)
  Systolic blood pressure (mmHg)    118 (0–220)
  Diastolic blood pressure (mmHg)   72 (0–125)
  Mean blood pressure (mmHg)        87 (0–157)
  Pulse (beats/min)                 75 (0–200)
  Respiratory rate (/min)           15 (0–30)
  SpO2 (%)                          96 (0–100)

General condition, n (%):
  Good   711 (93.8)
  Fair   29 (3.8)
  Poor   18 (2.4)

Comorbidities, n (%):
  Hypertension                            116 (15.3)
  Diabetes mellitus                       55 (7.3)
  Malignancy                              43 (5.7)
  Coronary artery disease                 27 (3.6)
  Congestive heart failure                20 (2.6)
  Chronic kidney disease                  18 (2.4)
  Asthma                                  16 (2.1)
  Chronic obstructive pulmonary disease   13 (1.7)
  Atrial fibrillation                     11 (1.3)
  Cerebrovascular disease                 12 (1.6)
  Epilepsy                                5 (0.7)
  Panic disorder                          5 (0.7)
  Gastroesophageal reflux disease         5 (0.7)
  Benign prostatic hyperplasia            5 (0.7)
  Migraine                                4 (0.5)
  Other                                   9 (1.2)

Table 2
Complaints of the patients presenting to the emergency department.

Chief complaints, n (%):
  Gastrointestinal symptoms (abdominal pain/nausea/vomiting/diarrhea/constipation/GI bleeding)   186 (24.6)
  Trauma (head/thorax/abdomen/spinal/extremity/incision in any tissue)                           130 (17.2)
  Cardiopulmonary system-related (dyspnea/chest pain/palpitations)                               85 (12.1)
  Neurological symptoms (headache/change in consciousness/loss of strength)                      84 (11.1)
  Upper/lower respiratory tract disease symptoms                                                 62 (8.2)
  Other                                                                                          60 (7.9)
  Musculoskeletal symptoms (low back/back/neck/lumbar pain)                                      46 (6.1)
  Fatigue                                                                                        45 (5.9)
  Extremity complaints (non-traumatic)                                                           42 (5.5)
  Genitourinary system-related (dysuria, hematuria, scrotal pain, discharge)                     40 (5.3)
  Eye/nose/ear/dental complaints                                                                 39 (5.1)
  Dermatological/allergy                                                                         24 (3.2)
  Administration of prescribed medication (IV/IM)                                                18 (2.4)
  Traffic accident                                                                               15 (2.0)
  Poor general condition                                                                         12 (1.6)
  Wound dressing renewal                                                                         11 (1.4)
  Fever                                                                                          10 (1.3)
  Syncope                                                                                        10 (1.3)
  Pregnancy-related symptoms                                                                     7 (0.9)
  Cardiorespiratory arrest                                                                       4 (0.5)

Table 3
Distribution of patients who presented to the emergency department during the study period.

Emergency department area      Triage, n (%)   ChatGPT, n (%)   Gold standard, n (%)
  Green zone                   443 (58.4)      410 (54.1)       422 (55.7)
  Yellow zone                  80 (10.6)       88 (11.6)        90 (11.9)
  Red zone                     53 (7.0)        79 (10.4)        66 (8.7)
  Trauma area                  146 (19.3)      143 (18.9)       144 (19.0)
  Ear/nose/throat room         7 (0.9)         8 (1.1)          8 (1.1)
  Ophthalmology room           12 (1.6)        11 (1.5)         11 (1.5)
  Obstetrics/gynecology room   8 (1.1)         10 (1.3)         8 (1.1)
  Psychiatry room              2 (0.3)         2 (0.3)          2 (0.3)
  Resuscitation room           7 (0.9)         7 (0.9)          7 (0.9)

In assessing one of our study's primary endpoints, the agreement between the triage team's and GPT-4's decisions in emergency department triage and the gold standard, we observed near-perfect agreement both between the triage team and the gold standard and between GPT-4 and the gold standard (Cohen's kappa 0.893 and 0.899, respectively; p < 0.001 for each). We conducted a receiver operating characteristic (ROC) analysis to evaluate GPT-4's and the triage team's predictive performance for the emergency areas. Fig. 3 presents the results of the multiclass ROC analysis, showing the area under the curve (AUC) values for each area for both methods.

When assessing each area within the emergency department, the data on the evaluation of both GPT-4 and the triage team in directing patients to the appropriate area are presented in Table 4. Accordingly, we found that both the triage team and GPT-4 were consistent with the gold standard for referring patients to the appropriate areas.

4. Discussion

The main purpose of patient triage in the emergency department is to accurately and effectively separate high-risk patients from others, thus ensuring the most efficient use of emergency department resources. This is the first study in the literature on using ChatGPT in an actual patient triage setting by training it on local rules in an emergency department. The outcomes indicate that the effectiveness of this AI tool in triaging patients in the emergency room is similar to that of an


Fig. 3. Receiver operating characteristic curve for triage team and GPT-4 for each area.
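Section 2.4 names Python, Scikit-learn and Matplotlib for the multiclass ROC analysis behind Fig. 3. A minimal one-vs-rest sketch is below; it runs on synthetic labels and scores, not the study data, the five area names are stand-ins for the paper's zones, and plotting is omitted for brevity.

```python
# One-vs-rest multiclass ROC sketch with scikit-learn (synthetic data, not
# the study data; area labels and score construction are assumptions).
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

areas = ["green", "yellow", "red", "trauma", "other"]
rng = np.random.default_rng(0)

# Synthetic gold-standard labels and per-area scores for 200 patients.
y_true = rng.integers(0, len(areas), size=200)
scores = rng.random((200, len(areas)))
# Bias scores toward the true class so the curves are non-trivial.
scores[np.arange(200), y_true] += 0.5

# Binarize labels and compute one ROC curve (and AUC) per area.
y_bin = label_binarize(y_true, classes=np.arange(len(areas)))
for i, area in enumerate(areas):
    fpr, tpr, _ = roc_curve(y_bin[:, i], scores[:, i])
    print(f"{area}: AUC = {auc(fpr, tpr):.2f}")
```

In a real analysis, `y_true` would be the gold-standard area and `scores` the per-area decision indicators for GPT-4 or the triage team; a Matplotlib `plot(fpr, tpr)` call per class would reproduce a figure of the kind shown above.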

experienced triage team and gold standard decisions. This suggests that using AI to support triage teams, especially those with little experience, can positively contribute to patient triage. Various studies in the literature show that AI can be used in patient management [7,9,10]. However, there is a limited number of studies using AI in actual patient management in the emergency department [8,11]. The strengths of our study include the fact that an AI tool is trained and used according to local rules, that this tool is open to end-user access, and that no additional software is required. A study evaluated the performance of ChatGPT in pre-established emergency department triage scenarios and showed reasonable agreement between emergency physicians and ChatGPT (Cohen's kappa: 0.341). The authors found an overall sensitivity of 57.1% (95% confidence interval [CI]: 34–78.2) and specificity of 34.5%, which were lower than the results observed in our study [7]. This could be because ChatGPT's evaluation was based solely on its internal data without any training process, or due to version differences. In our study, we found that the triage performance of GPT-4 has high sensitivity and specificity for all emergency department areas. When the areas were evaluated separately, the sensitivity and compliance with the gold standard decision in patients referred to the yellow zone were relatively lower than in other areas. This may be because the patient profile of the yellow zone is wider, and unclear patients are also directed to this area during patient triage. In the "other" categorization areas, including the resuscitation room, ear-nose-throat/ophthalmology room, psychiatry room, and obstetrics-gynecology room, the highest triage performance of GPT-4 was attributed to the patients' more evident symptoms, well-defined rules, and the prior training of GPT-4 with these specific rules. In another study, the efficacy of GPT-4 in making patient admission or discharge decisions for those with metastatic prostate cancer presenting to emergency departments was contrasted against emergency physicians, revealing a sensitivity of 95.7% and specificity of 18.2% [8]. Similar to our study, this study was performed on actual patients, but the triage decision on a specific diagnosis was evaluated. Krusche et al. conducted a study comparing the diagnostic success of AI with clinicians in distinguishing inflammatory rheumatic diseases from other illnesses. The findings indicate that AI is as successful as clinicians in the top three diagnoses [12]. In a study by Levine et al., the authors asked the AI model to classify 48 pre-generated cases into one of four categories, "emergent, within one day, within 1 week, and self-care", and compared this with lay individuals and clinicians; they found that AI was more likely to have the correct diagnosis in its top 3 for 88% (95% CI, 75% to 94%) of cases than lay individuals at 54% (95% CI, 53% to 55%) (p < 0.001), though it was less likely to have the correct diagnosis compared to physicians at 96% (95% CI, 94% to 97%) (p = 0.03). This study is similar to ours in providing highly accurate results, except for the number of cases, and it was conducted on pre-prepared scenarios [13]. In another study, the triage of 44 cases presenting with ophthalmic complaints was evaluated with ophthalmologists in training, ChatGPT (based on GPT-4) and Bing Chat (based on GPT-4) and compared with two experts' decisions. Triage urgency was appropriate in 38 (86%), 43 (98%), and 37 (84%) cases for ophthalmology trainees, ChatGPT, and Bing Chat, respectively. The results indicate that AI tools such as ChatGPT and Bing Chat can effectively perform triage for patients with ophthalmic complaints [14]. Although the results are satisfactory, issues

Table 4
Predictive performance of ChatGPT and triage team versus gold standard for each emergency area. PPV, positive predictive value; NPV, negative predictive value.

Emergency department area    Sensitivity   Specificity   PPV     NPV     Kappa   p value
Green zone (triage team)     98.34         91.66         93.67   97.77   0.906   <0.001
Green zone (ChatGPT)         94.78         97.02         97.56   93.67   0.915   <0.001
Yellow zone (triage team)    74.4          98.05         83.75   96.60   0.762   <0.001
Yellow zone (ChatGPT)        77.77         97.30         79.54   97.01   0.758   <0.001
Red zone (triage team)       74.24         99.42         92.45   98.00   0.809   <0.001
Red zone (ChatGPT)           92.42         97.39         77.21   99.26   0.825   <0.001
Trauma area (triage team)    99.30         99.51         97.94   99.83   0.983   <0.001
Trauma area (ChatGPT)        98.61         99.83         99.30   99.67   0.987   <0.001
Other areas (triage team)    94.44         99.72         94.44   99.72   0.942   <0.001
Other areas (ChatGPT)        100           99.72         94.73   99.72   0.972   <0.001
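Per-area metrics of the kind reported in Table 4 follow mechanically from a one-vs-rest confusion matrix. The sketch below is not the authors' code, and the counts are hypothetical examples, not the study data; it shows how sensitivity, specificity, PPV, NPV and Cohen's kappa relate to the four cells.

```python
# Sketch: one-vs-rest metrics for a single area versus the gold standard.
# tp/fp/fn/tn count patients triaged to / not to that area; the example
# counts below are hypothetical, not taken from the study.

def one_vs_rest_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    n = tp + fp + fn + tn
    po = (tp + tn) / n  # observed agreement
    # Expected agreement by chance (row totals x column totals) for kappa.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (po - pe) / (1 - pe),
    }

# Hypothetical example: 90 correct referrals, 10 misses, 10 false referrals
# out of 758 patients.
m = one_vs_rest_metrics(tp=90, fp=10, fn=10, tn=648)
print({k: round(v, 3) for k, v in m.items()})
```

Note how kappa corrects the raw agreement for the agreement expected by chance, which is why a zone with many negatives (like the resuscitation room) can show near-perfect accuracy yet still needs kappa to demonstrate genuine agreement.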


such as ethics, patient consent and data privacy are very important in the context of the use of artificial intelligence in medical decision making, and guidelines are needed for the application of AI technologies such as ChatGPT [15]. In our study, we did not enter fully identifying patient information into ChatGPT. However, since everything written in ChatGPT is recorded and can be viewed by OpenAI and its affiliates, running AI models on a local server, or hosting a GPT-based model on a server that the health system can control, seems more appropriate.

5. Limitations

The study has several limitations. The first and most obvious limitation is that although the performance of GPT-4 in emergency triage is compared with humans, the data input to this model is performed by humans. In addition, since the study was conducted on only one LLM, it does not provide information about the performance of other LLMs. A significant limitation is the inability of GPT-4 to remember the chat history after a certain number of words. This means the model forgets local emergency service rules after a certain period. Considering the data entry format and length in our study protocol, we observed that this limitation occurred after approximately 50 patient entries. To overcome this problem, we repeated the triage rules of the emergency department in one message after every 50 patient entries. Another important limitation is that this study was conducted according to local emergency service rules. Therefore, the model's performance under different local procedures and rules may not be directly reflected. As another limitation, the term "general condition" in this study refers to a subjective evaluation based on various parameters, including vital signs, symptom severity, and patient appearance. The evaluation may vary depending on the perspective of the person performing it. Providing this information to GPT-4 may have affected the results.

6. Conclusion

Overall, the results suggest that GPT-4 can be a useful tool for triage in emergency departments, especially where the triage team may have limited resources or expertise. However, it is important to note that GPT-4 is not a replacement for the triage team, and its use should be considered as a complementary tool to support the triage process. Further studies are needed to validate the results of this study and to investigate the feasibility and effectiveness of implementing GPT-based applications in different emergency settings.

Presented at a meeting

No.

Grant

No.

Funding

No.

CRediT authorship contribution statement

Sinan Paslı: Writing – review & editing, Writing – original draft, Methodology, Conceptualization. Abdul Samet Şahin: Writing – original draft. Muhammet Fatih Beşer: Writing – review & editing, Visualization, Formal analysis. Hazal Topçuoğlu: Writing – original draft. Metin Yadigaroğlu: Methodology, Data curation. Melih İmamoğlu: Writing – review & editing, Writing – original draft.

Declaration of competing interest

The authors declare that they have no competing financial, professional or personal interests that might have influenced the performance or presentation of the work described in this manuscript.

References

[1] Baumann MR, Strout TD. Evaluation of the emergency severity index (version 3) triage algorithm in pediatric patients. Acad Emerg Med. 2005;12:219–24.
[2] Bullard MJ, Musgrave E, Warren D, Unger B, Skeldon T, Grierson R, et al. Revisions to the Canadian emergency department triage and acuity scale (CTAS) guidelines 2016. CJEM. 2017;19:S18–27.
[3] Considine J, Ung L, Thomas S. Triage nurses' decisions using the National Triage Scale for Australian emergency departments. Accid Emerg Nurs. 2000;8(4):201–9.
[4] Berlyand Y, Raja AS, Dorner SC, Prabhakar AM, Sonis JD, Gottumukkala RV, et al. How artificial intelligence could transform emergency department operations. Am J Emerg Med. 2018;36(8):1515–7.
[5] Caldarini G, Jaf S, McGarry K. A literature survey of recent advances in chatbots. Information. 2022;13(1):41.
[6] ChatGPT: Optimizing language models for dialogue. Available from: https://openai.com/blog/chatgpt/. [Last accessed on 2023 Oct 10].
[7] Sarbay İ, Berikol GB, Özturan İU. Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): a preliminary, scenario-based cross-sectional study. Turk J Emerg Med. 2023;23(3):156.
[8] Gebrael G, Sahu KK, Chigarira B, Tripathi N, Mathew Thomas V, Sayegh N, et al. Enhancing triage efficiency and accuracy in emergency rooms for patients with metastatic prostate cancer: a retrospective analysis of artificial intelligence-assisted triage using ChatGPT 4.0. Cancers. 2023;15(14):3717.
[9] Razzaki S, Baker A, Perov Y, Middleton K, Baxter J, Mullarkey D, et al. A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis. arXiv preprint. 2018;arXiv:1806.10698.
[10] Weisberg EM, Chu LC, Fishman EK. The first use of artificial intelligence (AI) in the ER: triage not diagnosis. Emerg Radiol. 2020;27:361–6.
[11] Farahmand S, Shabestari O, Pakrah M, Hossein-Nejad H, Arbab M, Bagheri-Hariri S. Artificial intelligence-based triage for patients with acute abdominal pain in emergency department; a diagnostic accuracy study. Adv J Emerg Med. 2017;1(1).
[12] Krusche M, Callhoff J, Knitza J, Ruffer N. Diagnostic accuracy of a large language model in rheumatology: comparison of physician and ChatGPT-4. Rheumatol Int. 2023:1–4.
[13] Levine DM, Tuwani R, Kompa B, Varma A, Finlayson SG, Mehrotra A, et al. The diagnostic and triage accuracy of the GPT-3 artificial intelligence model. medRxiv. 2023:2023-01.
[14] Lyons RJ, Arepalli SR, Fromal O, Choi JD, Jain N. Artificial intelligence chatbot performance in triage of ophthalmic conditions. Can J Ophthalmol. 2023.
[15] Nature editorial. Tools such as ChatGPT threaten transparent science; here are our ground rules for their use. Nature. 2023;613:612.
