
Review

Artificial Intelligence–Based Chatbots for Promoting Health Behavioral Changes: Systematic Review

Abhishek Aggarwal1,2, BA; Cheuk Chi Tam1,2, PhD; Dezhi Wu3,4, PhD; Xiaoming Li1,2,3, PhD; Shan Qiao1,2,3, PhD
1 Department of Health Promotion, Education and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, United States
2 SC SmartState Center for Healthcare Quality (CHQ), University of South Carolina, Columbia, SC, United States
3 UofSC Big Data Health Science Center (BDHSC), University of South Carolina, Columbia, SC, United States
4 Department of Integrated Information Technology, College of Engineering and Computing, University of South Carolina, Columbia, SC, United States

Corresponding Author:
Shan Qiao, PhD
Department of Health Promotion, Education and Behavior
Arnold School of Public Health
University of South Carolina
915 Greene St
Columbia, SC, 29201
United States
Phone: 1 803 777 6844
Email: [email protected]

Abstract
Background: Artificial intelligence (AI)–based chatbots can offer personalized, engaging, and on-demand health promotion
interventions.
Objective: The aim of this systematic review was to evaluate the feasibility, efficacy, and intervention characteristics of AI
chatbots for promoting health behavior change.
Methods: A comprehensive search was conducted in 7 bibliographic databases (PubMed, IEEE Xplore, ACM Digital Library,
PsycINFO, Web of Science, Embase, and JMIR publications) for empirical articles published from 1980 to 2022 that evaluated
the feasibility or efficacy of AI chatbots for behavior change. The screening, extraction, and analysis of the identified articles
were performed by following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.
Results: Of the 15 included studies, several demonstrated the high efficacy of AI chatbots in promoting healthy lifestyles (n=6,
40%), smoking cessation (n=4, 27%), treatment or medication adherence (n=2, 13%), and reduction in substance misuse (n=1,
7%). However, there were mixed results regarding feasibility, acceptability, and usability. Selected behavior change theories and
expert consultation were used to develop the behavior change strategies of AI chatbots, including goal setting, monitoring,
real-time reinforcement or feedback, and on-demand support. Real-time user-chatbot interaction data, such as user preferences
and behavioral performance, were collected on the chatbot platform to identify ways of providing personalized services. The AI
chatbots demonstrated potential for scalability by deployment through accessible devices and platforms (eg, smartphones and
Facebook Messenger). The participants also reported that AI chatbots offered a nonjudgmental space for communicating sensitive
information. However, the reported results need to be interpreted with caution because of the moderate to high risks to internal
validity, insufficient description of AI techniques, and limited generalizability.
Conclusions: AI chatbots have demonstrated the efficacy of health behavior change interventions among large and diverse
populations; however, future studies need to adopt robust randomized controlled trials to establish definitive conclusions.

(J Med Internet Res 2023;25:e40789) doi: 10.2196/40789

KEYWORDS
chatbot; artificial intelligence; AI; health behavior change; engagement; efficacy; intervention; feasibility; usability; acceptability;
mobile phone


Introduction

Background

Artificial intelligence (AI)–driven chatbots (AI chatbots) are conversational agents that mimic human interaction through written, oral, and visual forms of communication with a user [1,2]. With the increased access to technological devices (eg, smartphones and computers) and the internet, AI chatbots offer the potential to provide accessible, autonomous, and engaging health-related information and services, which can be promising for technology-facilitated interventions. The existing digital therapeutic and telehealth interventions with didactic components, which enable health care providers to communicate with patients via digital platforms (eg, email and video call), have encountered several challenges, including relatively low adherence, unsustainability, and inflexibility [3,4]. AI chatbots offer the flexibility of on-demand support, personalized support and content, and consistent connectivity (sustainability), contributing to addressing the shortfalls of telehealth services. The overall conversational flexibility offered by AI chatbots in terms of communicating at any time from anywhere offers a safe space to facilitate interactions with patients who feel or experience stigmatization while seeking health care services [5].

AI chatbots demonstrate their potential for effective behavior change through key steps of data processing in health-related conversations: data input, data analysis, and data output. First, AI chatbots can collect data sets from diverse sources: electronic health records, unstructured clinical notes, real-time physiological data points using additional sensors (eye-movement tracking, facial recognition, movement tracking, and heartbeat), and user interactions [5,6]. Second, the AI algorithm uses machine learning (ML) and natural language processing (NLP) techniques to identify clinically meaningful patterns and understand user needs [7]. Third, AI chatbots can mimic real-life human support by offering services that can assist users in achieving their health behavior goals [6]. Overall, by acknowledging user needs, demonstrating understanding, and delivering timely services tailored to user preferences (eg, goal setting, behavioral monitoring, and information or knowledge provision), AI chatbots have the potential to effectively deliver interventions that promote diverse health behaviors (eg, smoking cessation, physical activity, and medication adherence). AI chatbots can also be integrated into embodied functions (eg, virtual reality) that offer additional benefits, such as an immersive experience, which can catalyze the process of health behavior change [8].

Prior Work

In the past decade, evidence regarding the feasibility and efficacy of AI chatbots in delivering health care services has focused on different health contexts and technological perspectives, and most of these chatbots aim to improve mental health outcomes. Of the extant systematic reviews on AI chatbots, 6 articles assessed the efficacy of AI chatbots in enhancing mental health outcomes [1,7,9-12], 2 examined the feasibility of AI chatbots in health care settings [8,13], and 1 described the technical architectures and characteristics of the AI chatbots used in chronic conditions [14].

Given the merits of AI chatbots in health promotion, recent literature has paid increasing attention to the use of AI chatbots for health behavior changes. Oh et al [2] conducted a systematic review that assessed the efficacy of AI chatbots for lifestyle modification (eg, physical activity, diet, and weight management). However, the scope and inclusion criteria of that review had several limitations. First, the review did not distinguish between AI-driven chatbots and other chatbots; for example, chatbots that performed rule-based or constrained conversation were included. Second, the selected studies targeted only a limited set of behaviors, namely physical activity, diet, and weight management. Third, the review did not cover all platforms that could possibly deploy AI chatbots, including emerging technology platforms; for example, it excluded AI chatbots that were integrated into virtual reality, augmented reality, embodied agents, and therapeutic robots. Therefore, to provide a state-of-the-art understanding of AI chatbots for promoting health behavior changes, we were motivated to conduct a systematic review that covers the latest developments in AI chatbots: their integration into diverse devices (robots, smartphones, and computers) and platforms (messenger and SMS text message), their capacity for "unconstrained" conversations, and their targeting of a wide range of behavioral outcomes (smoking cessation, treatment or medication adherence, healthy lifestyle, and related health behavior domains). As such, this study aimed to provide critical evaluations of published empirical studies that describe AI chatbots' intervention characteristics, components, or functionality and to investigate their feasibility and efficacy in promoting a wide range of healthy behaviors on traditional and emerging platforms.

Methods

Data Sources and Search Algorithms

The study protocol of this systematic literature review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [15] in each step. A comprehensive search was conducted in June 2022 by 3 authors (CCT, SQ, and AA) in 7 bibliographic databases, namely PubMed, IEEE Xplore, ACM Digital Library, PsycINFO, Web of Science, Embase, and JMIR publications.

The search was conducted using a combination of various keywords from 3 categories. The first category comprised keywords related to AI-based chatbots, including chatbot, chatterbot, chatter robot, artificial intelligence, conversational AI, conversational agency, virtual agent, conversational agents, and bot. The second category was related to health behaviors and included the keywords health promotion, health behaviors, behavior change, substance use, alcohol use, drinking, cigarette use, smoking, drug abuse, drug use disorder, risk behaviors, lifestyle, exercise, nutrition behavior, sleep, adherence, body weight, physical activity, diet, risky behaviors, healthcare seeking behaviors, prescribed medical treatment, tobacco use, and vaping. The third category focused on intervention study and included 1 keyword: intervention.

Keywords were organized using the following approaches: (1) keywords within one category were linked using the OR operator (eg, chatbot OR conversational AI), and (2) keywords across different categories were connected using the AND operator (eg, chatbot AND health behaviors AND intervention; Multimedia Appendix 1).

Inclusion and Exclusion Criteria

This review selected empirical studies on health behavior interventions applying AI-based chatbot techniques according to the following inclusion criteria: (1) intervention research focusing on health behaviors; (2) empirical studies using chatbots; (3) chatbots developed upon existing AI platforms (eg, IBM Watson Assistant [IBM Corp]) or AI algorithms, such as ML, deep learning, natural language understanding, and NLP; (4) studies reporting qualitative or quantitative results on interventions; and (5) English articles published from 1980 to 2022 (as of June 2, 2022). Articles were excluded if they were (1) not full-text empirical studies (eg, conference abstracts or proposals); (2) intervention studies with chatbots based on non-AI methods, such as the rule-based approach; (3) studies that did not clarify their AI algorithms; or (4) studies that focused only on mental health and not on health behaviors.

A total of 1961 articles were initially retrieved and screened based on these criteria. Finally, 15 articles met the inclusion criteria and were selected for this review (Figure 1). Disagreements in selection were resolved through team discussions.
Figure 1. Eligibility screening process. AI: artificial intelligence.
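As an illustration only, the keyword-combination logic described above (OR within a category, AND across categories) can be expressed as a small script; the keyword lists below are abridged, and the exact syntax submitted to each database (eg, field tags or truncation) is an assumption rather than something reported in the paper.

```python
# Minimal sketch of the Boolean query construction described in the Methods:
# terms within a category are joined with OR; categories are joined with AND.
# Keyword lists are abridged; database-specific syntax would differ.
chatbot_terms = ["chatbot", "chatterbot", "conversational AI", "virtual agent", "bot"]
behavior_terms = ["health promotion", "behavior change", "smoking", "physical activity", "diet", "adherence"]
intervention_terms = ["intervention"]

def or_block(terms):
    quoted = [f'"{t}"' if " " in t else t for t in terms]  # quote multiword phrases
    return "(" + " OR ".join(quoted) + ")"

query = " AND ".join(or_block(block) for block in (chatbot_terms, behavior_terms, intervention_terms))
print(query)
# (chatbot OR chatterbot OR "conversational AI" OR "virtual agent" OR bot) AND ("health promotion" OR ...) AND (intervention)
```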

Data Extraction and Quality Assessment

Several summary tables were used to extract information from the selected articles, including study characteristics (ie, author, publication year, study design, participants, age of the sample, sample size, country, and target health behaviors), chatbot-based intervention features (ie, chatbot types, chatbot components or functionality, settings, existing AI technology, input data sources, platform, theoretical foundation, and AI algorithms), and intervention outcomes (ie, health behavioral outcomes or primary outcomes, feasibility, usability, acceptability, and engagement).

Feasibility, acceptability, and usability did not have a consistent definition across the studies. Therefore, for ease of comprehension and systematic representation, the authors categorized the data on feasibility, acceptability, and usability based on their definitions. Feasibility was defined as the demand of the intervention, that is, the actual use of the intervention and whether the intervention is doable in a certain setting [16], for example, the number of messages exchanged with the chatbot and the engagement rate of the participants. Acceptability was defined as the quality of user experience with the AI chatbot [17], for example, the satisfaction score or number of likes to the interaction with the AI chatbot. Usability was defined as the level of contribution by the intervention to achieving the prespecified goals of users [18], such as the usability of the content provided by the AI chatbot in achieving health behavior goals.

Quality assessment of the selected studies was performed in accordance with the National Institutes of Health's quality assessment tool for controlled intervention studies [19]. This assessment tool suggests an evaluation of 6 types of bias risks. Specifically, (1) the risk of reporting outcomes based on ad hoc analyses was assessed based on prespecified outcomes;
(2) the risk of bias in the randomization process was assessed based on randomized treatment allocation, concealment of the allocation sequence (blinding), and similarity of groups at baseline; (3) the risk of bias caused by deviations from the intended interventions was assessed based on concealment of the assigned interventions from the participants, implementors, and evaluators; (4) the risk of outcomes from unintended sources was assessed based on measures to avoid the influence of other interventions and fidelity to the intervention protocol; (5) the risk of bias in the measurement of the outcomes was assessed based on concealment of the assigned intervention from evaluators and the validity and reliability of outcome measures; and (6) the risk of bias in analysis was assessed based on dropout rate, power calculation, and intent-to-treat analysis. Apart from the assessment of each risk type across studies, all studies were rated on the following scale: 1=compliant, 0=not clear, 0=not compliant, and 0=not reported or not applicable. A total score was calculated for each study.

AI techniques specific to AI chatbot interventions were also appraised using the CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) extension guidance for AI studies [20]. We used a checklist of 4 domains: (1) whether the rationale for using AI was specified through the use of AI in the context of the clinical pathway; (2) whether the inclusion and exclusion criteria at the level of the input data and the description of the approaches to handle unavailable input data were specified; (3) whether the input data acquisition processes and the specifications of human-AI interaction in the collection of input data were described; and (4) whether the output of the AI algorithm and its significance in the context of the studies' outcomes were described. The data extraction and quality assessment were conducted by 2 authors, CCT and AA, independently. All disagreements were resolved through sufficient discussions among CCT, AA, and SQ.

Results

Characteristics of the Reviewed Studies

The characteristics of the reviewed studies are summarized in Table 1. The included journal articles (N=15) were published in the following years: 2 (13%) from 2021, 3 (20%) from 2020, 6 (40%) from 2019, and 1 (7%) each from 2018, 2017, 2013, and 2011. Out of the 15 studies, 13 (87%) reported their geographical locations. All 13 studies were distributed across high-income countries, with 4 (31%) from the United States, 2 (15%) from Australia, and 1 (8%) from each remaining country (ie, South Korea, Spain, the United Kingdom, Japan, France, Switzerland, and the Netherlands). The sample size in the studies varied from 20 to 99,217, with a median of 116 and a mean of approximately 7224 (SD 25,495.82) participants. Overall, 40% (6/15) of studies had >200 participants, followed by 27% (4/15) of studies with 100 to 200 participants, 13% (2/15) of studies with 50 to 100 participants, and 20% (3/15) of studies with <50 participants.

Out of the 14 studies that reported the mean age of the participants, most had adult participants aged 18 to 30 years (n=2, 14%), 30 to 40 years (n=3, 21%), 40 to 50 years (n=5, 36%), 50 to 60 years (n=1, 7%), and >60 years (n=1, 7%), with only 2 (14%) studies having participants aged <18 years. The selected studies included participants with diverse preexisting conditions: individuals with lower physical exercise and healthy diet levels (4/15, 27%), smokers (4/15, 27%), patients with obesity (2/15, 13%), patients with breast cancer (1/15, 7%), patients with substance use disorder (1/15, 7%), the general population (2/15, 13%), and Medicare recipients (1/15, 7%). The target health behaviors of the reviewed studies included promotion of a healthy lifestyle (physical exercise and diet; 5/15, 33%), smoking cessation (4/15, 27%), treatment or medication adherence (3/15, 20%), and reducing problematic substance use (1/15, 7%). Only 27% (4/15) of studies used randomized controlled trials (RCTs), and most of the studies (9/15, 60%) adopted a quasiexperimental design (ie, pre- and posttests) with no control group, followed by 7% (1/15) of studies with a cross-sectional design and 7% (1/15) of studies with a postexperimental research method.
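The sample size summary reported above (range 20-99,217; median 116; mean approximately 7224, SD 25,495.82) can be reproduced from the per-study Ns listed in Table 1; the following minimal sketch simply recomputes those descriptive statistics.

```python
# Minimal sketch: recompute the sample size summary from the per-study Ns in Table 1.
from statistics import mean, median, stdev

sample_sizes = [106, 31, 240, 23, 6111, 55, 958, 270, 20, 70, 920, 99217, 101, 116, 122]

print(min(sample_sizes), max(sample_sizes))  # 20 99217
print(median(sample_sizes))                  # 116
print(round(mean(sample_sizes)))             # 7224
print(round(stdev(sample_sizes), 2))         # 25495.82 (sample SD)
```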


Table 1. Characteristics of the reviewed studies (N=15).

| Study | Study design | Participants | Average (SD) or median age (years) | Sample size | Country | Target health behaviors or purposes |
| --- | --- | --- | --- | --- | --- | --- |
| Piao et al [21] | RCT^a | Office workers | 35 | N=106; n=57 (intervention group), n=49 (control group) | South Korea | Healthy lifestyle (physical activity) |
| Maher et al [22] | Pre-post study^b | Australians who did not meet Australia's physical activity guidelines and did not follow a Mediterranean dietary pattern | 56.2 (SD 8) | N=31 | Australia | Healthy lifestyle (physical activity and healthy diet) |
| Carrasco-Hernandez et al [23] | RCT | Smokers at an outpatient clinic | 49.655 | N=240; n=120 (intervention: chatbot + pharmaceutical treatment), n=120 (control: pharmaceutical treatment) | Spain | Smoking cessation |
| Stephens et al [6] | Pre-post study^b | Youths with obesity symptoms at a children's health care system | 15.20 | N=23 | The United States | Treatment adherence (obesity) |
| Perski et al [24] | RCT | Smokers who purchased the Smoke Free app | N/A^c | N=6111; n=1061 (intervention: chatbot + Smoke Free app), n=5050 (control: Smoke Free app) | The United Kingdom | Smoking cessation |
| Masaki et al [25] | Pre-post study^b | Adult smokers with nicotine dependence | 43.5 (SD 10.5) | N=55 | Japan | Smoking cessation |
| Chaix et al [26] | Pre-post study^b | Patients with breast cancer | 48 | N=958 | France | Medication adherence |
| Calvaresi et al [27] | Pre-post study^b | Smokers from Facebook communities | N/A | N=270 | Switzerland | Smoking cessation |
| Galvão Gomes da Silva et al [5] | Qualitative study | Volunteers from School of Psychology's pool | 23 | N=20 | The United Kingdom | Healthy lifestyle (physical activity) |
| Stein and Brooks [28] | Pre-post study^b | Adults with overweight and obesity (BMI ≥25) | 46.9 (SD 1.89) | N=70 | The United States | Healthy lifestyle (weight loss, healthy diet, physical activity, and healthy sleep duration) |
| Crutzen et al [29] | Pre-post study^b | Adolescents interested in the intervention | 15 | N=920 | The Netherlands | Healthy lifestyle |
| Brar Prayaga et al [30] | Cross-sectional study (poststudy) | Medicare recipients | Median 71 | N=99,217 | The United States | Medication adherence |
| Prochaska et al [31] | Pre-post study^b | American adults screened positive for substance misuse | 36.8 (SD 10) | N=101 | The United States | Reducing problematic substance use |
| To et al [32] | Quasiexperimental design without a control group | Individuals who were inactive (<20 min per day of moderate-to-vigorous physical activity) | 49.1 (SD 9.3) | N=116 | Australia | Healthy lifestyle (physical activity) |
| Bickmore et al [33] | RCT (4-arm) | Individuals in precontemplation or contemplation stages of change with respect to moderate-or-greater intensity physical activity or consumption of fruits and vegetables | 33 (SD 12.6) | N=122 | NR^d | Healthy lifestyle (physical activity and healthy diet) |

^a RCT: randomized controlled trials.
^b Pre-post studies had no control group.
^c N/A: not applicable.
^d NR: not reported.
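As described in the Methods, each study was rated on every applicable assessment item (1=compliant; 0 if not clear, not compliant, or not reported) and the item ratings were summed into a total compliance score; the compliance fractions reported in the next subsection (eg, 8/12 for an RCT) follow this tally. The sketch below illustrates the arithmetic only; the item labels and ratings are hypothetical and are not taken from Multimedia Appendix 2.

```python
# Minimal sketch of the per-study compliance tally (NIH quality assessment tool):
# 1 = compliant; 0 = not clear, not compliant, or not reported/not applicable.
# Item labels and ratings are hypothetical, for illustration only.
ratings = {
    "prespecified outcomes": 1,
    "randomized allocation": 1,
    "allocation concealment": 1,
    "baseline similarity": 0,
    "blinding of participants": 0,
    "avoidance of co-interventions": 0,
    "adherence to protocol": 0,
    "blinded outcome assessment": 0,
    "valid and reliable measures": 1,
    "acceptable dropout rate": 0,
    "power calculation": 1,
    "intent-to-treat analysis": 1,
}

total = sum(ratings.values())
print(f"Compliance: {total}/{len(ratings)} ({100 * total / len(ratings):.0f}%)")  # Compliance: 6/12 (50%)
```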

Intervention Study Quality Assessment

The results of the quality assessment are presented in Multimedia Appendix 2 [5,6,21-33]. The risk of reporting outcomes was low, as all the studies prespecified their outcomes and hypotheses. The risk of bias in the randomization process was low. All 4 RCTs (4/15, 27%) adopted appropriate randomized treatment allocation and reported concealment of the allocation sequence from the participants, and 75% (3/4) of them established similarity of groups at baseline. The non-RCT studies (11/15, 73%) were not applicable for the assessment of the randomization process.

The risk of bias from deviations from the intended interventions was considered low to moderate. None of the included studies (N=15) reported concealment of the assigned interventions from the facilitators, evaluators, and participants, mainly because concealment from the persons providing and receiving behavioral, lifestyle, or surgical interventions is difficult [19]. The risk of outcomes from unintended sources was high. First, none of the studies reported any explicit measures to avoid the influence of other interventions on the outcomes or the existing intervention. In the case of RCTs (4/15, 27%), this bias was minimized because of the experimental setting of the interventions; however, for non-RCT studies (11/15, 73%), there was a high risk of bias owing to the potential effect of confounding variables. Second, most studies (13/15, 87%) did not report whether participants adhered to the intervention protocols.

The risk of bias in the measurement of the outcomes was moderate. First, none of the studies reported whether the assigned intervention was concealed from the evaluators. Second, 60% (9/15) of studies reported the reliability and validity of the outcome measures. In the remaining studies (6/15, 40%), the reliability and validity of the outcome measures were either not clear (3/6, 50%) or not reported (3/6, 50%). The risk of bias in analysis was moderate to high. First, studies with a ≥15% differential dropout rate between groups and a ≥20% dropout rate for the intervention or control group were considered to have a high dropout rate [19]. Only 33% (5/15) of studies had a lower dropout rate than the cutoff limits, 20% (3/15) of studies did not report the dropout rate, and 47% (7/15) of studies had a higher dropout rate than the cutoff limits. Second, only 33% (5/15) of studies reported the use of a power calculation to estimate a sample size that can detect a significant difference in the primary outcomes. Third, only 40% (6/15) of studies adopted an intent-to-treat analysis.

Among the 4 RCT studies, the study by Carrasco-Hernandez et al [23] reported the highest compliance (8/12, 67%), followed closely by the study by Piao et al [21] (7/12, 58%). The study by Perski et al [24] reported compliance on only 50% (6/12) of the factors, closely followed by the study by Bickmore et al [33] (5/12, 42%). None of the 4 RCTs reported concealment of the assigned intervention, efforts to avoid other interventions, or adherence measures, and none of the RCTs were compliant with the dropout rate criteria. Among the non-RCT studies, the studies by Maher et al [22] and Brar Prayaga et al [30] reported the highest compliance (5/9, 56%), followed closely by the studies by Masaki et al [25], Prochaska et al [31], and To et al [32] (4/9, 44%). Chaix et al [26] reported compliance on only 33% (3/9) of the factors, followed by the studies by Stein and Brooks [28] and Crutzen et al [29] (2/9, 22%). The remaining studies, that is, the studies by Stephens et al [6], Calvaresi et al [27], and Galvão Gomes da Silva et al [5], were compliant on only 11% (1/9) of the factors. Please note that the results of studies with <40% compliance need to be interpreted with caution.

AI Quality Assessment

The AI component of the chatbots was evaluated to demonstrate AI's impact on health outcomes (Multimedia Appendix 3 [5,6,21-33]). The rationale for using AI was prespecified in all the studies (N=15). Characteristics and handling of the input data for AI were described in only 7% (1/15) of studies. Input data acquisition processes for AI were mentioned in 87% (13/15) of studies. Specifications of the human-AI interaction in the collection of input data were reported in most of the studies (9/15, 60%). The output of the AI algorithms and its significance in the context of the studies' outcomes were described in 87% (13/15) of studies. In conclusion, there was sufficient description for all factors, except for the input data characteristics and the handling of unavailable input data.

Outcomes of the Reviewed Studies

Efficacy

Quantitative Studies: Healthy Lifestyle

Out of the 15 studies, 7 (47%) studies [5,21,22,28,29,32,33] targeted healthy lifestyles, and 5 (33%) studies [21,22,28,32,33]
assessed the efficacy of AI chatbots in promoting healthy lifestyles through (1) physical activity levels, (2) healthy diet, (3) blood pressure, and (4) BMI. First, 80% (4/5) of these studies [21,22,28,32] reported an increase in physical activity. Stein and Brooks [28] reported that the increase in physical activity led to an average weight loss of 2.38% in 75.7% of the users (n=53). Maher et al [22] reported an increase in physical activity by 109.8 minutes (P=.005) and a decrease in the average weight and waist circumference by 1.3 kg (P=.01) and 2.1 cm (P=.003), respectively. Piao et al [21] reported significant between-group differences in the Self-Report Habit Index when controlled for intrinsic reward via a chatbot-enabled app (P=.008). To et al [32] reported that the participants recorded more steps (P<.01) and more total physical activity (3.58 times higher; P<.001). Moreover, the participants were also more likely to meet the physical activity guidelines (95% CI 3.31-12.27) at follow-up. However, only Bickmore et al [33] reported no significant differences among the conditions in the International Physical Activity Questionnaire (P=.37).

Second, 20% (3/15) of studies [22,28,33] reported an improvement in diet. Stein and Brooks [28] reported that the percentage of healthy meals increased by 31% and the percentage of unhealthy meals decreased by 54%. Maher et al [22] reported an increase in the mean of Mediterranean diet (healthy meal) scores by 5.7 points (P<.001). Bickmore et al [33] reported that the group with only a diet-related intervention consumed significantly more fruits and vegetables than the groups that received only the physical activity intervention or both the physical activity and diet interventions (P=.005); however, there were no significant differences among the different groups for weight (P=.37). Third, Maher et al [22] assessed blood pressure level after the intervention as a secondary outcome; however, the mean improvements in systolic blood pressure (−0.2 mm Hg; P=.90) and diastolic blood pressure (−1.0 mm Hg; P=.54) were not significant. Fourth, only To et al [32] reported that the decrease in BMI was not significant (95% CI −0.37 to 0.11). In conclusion, there were significant differences in the primary outcomes of interest (physical activity level and healthy diet) in all studies aimed at improving healthy lifestyles.

Quantitative Studies: Smoking Cessation

Out of the 15 studies, 4 (27%) studies [23-25,27] assessed the efficacy of AI chatbots in smoking cessation. Perski et al [24] reported that the intervention group had 2.44 times greater odds of abstinence at the 1-month follow-up than the control group (P<.001). Masaki et al [25] reported that the overall continuous abstinence rate results (76%, 12 weeks; 64%, 24 weeks; and 58%, 52 weeks) were better than the results of the outpatient clinic (calculated through the national survey) and the varenicline (medication for smoking cessation) phase 3 trial in the United States and Japan. Masaki et al [25] also reported a decrease in social nicotine dependence (mean −6.7, SD 5.2), tobacco craving (mean −0.6, SD 1.5), and withdrawal symptoms (mean −6.4, SD 5.8), their secondary outcomes. Calvaresi et al [27] reported that 28.9% of the participants completed their smoking cessation goal 3 months after the last cigarette. This result was 10% higher than that of the previous edition of the smoking cessation program, which did not include chatbot support. Carrasco-Hernandez et al [23] reported that smoking abstinence (exhaled carbon monoxide and urine cotinine test) was 2.15 times (P=.02) higher in the intervention group than in the control group. However, none of the secondary clinical measures (health-related quality of life, healthy lifestyle, and physical activity) showed any differences between the groups. In conclusion, there was evidence indicating significant long-term and short-term effects of chatbot-based interventions on smoking cessation.

Quantitative Studies: Substance Misuse

Out of the 15 studies, only 1 (7%) study [31] aimed at reducing problematic substance use. Prochaska et al [31] reported a significant increase in the confidence to resist urges to use substances (mean score change +16.9, SD 21.4; P<.001) and a significant decrease in the following: substance use occasions (mean change −9.3, SD 14.1; P<.001) and the scores of the Alcohol Use Disorders Identification Test-Concise (mean change −1.3, SD 2.6; P<.001), the 10-item Drug Abuse Screening Test (mean change −1.2, SD 2.0; P<.001), the Patient Health Questionnaire-8 item (mean change 2.1, SD 5.2; P=.005), the Generalized Anxiety Disorder-7 (mean change 2.3, SD 4.7; P=.001), and a cravings scale (68.6% vs 47.1% moderate to extreme; P=.01).

Quantitative Studies: Treatment or Medication Adherence

Out of the 15 studies, 3 (20%) studies [6,26,30] targeted medication or treatment adherence, but only 2 (67%) of these studies [26,30] reported the efficacy of AI chatbots in increasing treatment or medication adherence through timely and personalized reminders. Brar Prayaga et al [30] reported that out of the total refill reminders (n=273,356), 17.4% (n=47,552) resulted in actual refill requests. Furthermore, 54.81% (26,062/47,552) of those requests resulted in medications being actually refilled within 2 hours. Chaix et al [26] reported that the average medication adherence rate improved by more than 20% in 4 weeks (P=.40) through the prescription reminder feature. In conclusion, there was evidence indicating a significant increase in the medication adherence rate through chatbot use; however, cultural differences were observed in chatbot use.

Qualitative Study: Healthy Lifestyle

Only one study, that by Galvão Gomes da Silva et al [5], conducted a qualitative analysis. This study reported that NAO, a social robot, enhanced immediate motivation toward activities such as meeting friends and families and increased willpower through mindfulness techniques. The participants also reported that they felt more self-aware and were open to sharing their goals with others. However, the study reported mixed results regarding the achievement of physical activity goals.

Feasibility

The outcomes of the selected studies are reported in Multimedia Appendix 4 [5,6,21-33]. Out of the 15 studies, 11 (73%) reported the feasibility of AI chatbots in terms of (1) safety [22] (ie, no adverse events were reported), (2) messages exchanged with the chatbot [6,26,29,31,32], (3) retention rate [22,26], and (4) duration of engagement. Only 7% (1/15) of studies [22] reported the chatbot's safety in terms of the absence of adverse events. Many studies reported the total number of messages exchanged with the chatbot (5/15, 33%) [6,26,29,31,32]; however, only
7% (1/15) of studies reported the exact proportion of user-initiated conversations (approximately 30%) [6], which depicted the participants' level of interest in having health-related conversations with the chatbot. Few studies [22,23,26] reported variability in engagement and retention rates across study durations. Overall, 7% (1/15) of studies reported a gradual decrease in the retention rate (users who sent at least 1 message per month for over 8 months)—from 72% (second month) to 31% (eighth month) [26]. Similarly, another study reported that engagement was highest in the first month and reduced gradually, becoming lowest at the 12th month [23]. Similarly, another study [22] reported a decrease in check-ins by 20% midprogram, followed by an increase to 70% in the final week. Overall, there was strong evidence of a decrease in engagement with the chatbot over time. It is important to note that there was inconsistency in engagement metrics across different studies. Overall, there was very little evidence on the safety of chatbots and some evidence on the feasibility of chatbots in terms of the total and mean number of health-related messages exchanged; however, there are no defined thresholds to determine whether the number of messages exchanged demonstrates feasibility. It was also interesting to note that in one of the studies (7%), the engagement rate decreased over time but increased at the end [22].

Acceptability

Out of the 15 studies, 7 (47%) reported the acceptability and engagement of AI chatbots in terms of (1) satisfaction and (2) provision of a nonjudgmental safe space. In the case of satisfaction, 7% (1/15) of studies reported that approximately one-quarter of the participants liked the messages [32], and another (7%) reported that the satisfaction of the participants with the web-based agent was above average [33]. In 7% (1/15) of studies, only one-third of the participants reported the desire to use the chatbot in the future [32], and in another study (7%), on average, the participants reported a below-average desire to continue with the agent in the future [33]. Similarly, another study (7%) reported that the participants liked the chatbot's advice one-third of the time [25]. Only 7% (1/15) of studies [26] reported high user satisfaction (93.95%). Overall, the proportion of satisfied participants or the overall satisfaction rate in terms of content likeability and the future use of chatbots was less than 50%. Overall, 20% (3/15) of studies reported that the AI platforms offered a nonjudgmental safe space for users to share detailed and sensitive information [5,26,29]. The participants reported that the chatbots provided a personal space and time to think and respond uninterruptedly [5]; to share personal and intimate information such as sexuality, which they could not share with their physician directly [26]; and to ask questions regarding sex, drugs, and alcohol, as they considered chatbots to be more anonymous and faster than information lines and search engines [29].

Usability

Out of the 15 studies, 11 (73%) reported the usability of AI chatbots in terms of (1) ease of using the chatbot, (2) outside-office support, (3) usability of the content, and (4) technical difficulties. Overall, the ease of using the chatbots was low to moderate. The ease of use was dependent on the participants' smartphone skills, the platform's user interface, and cultural sensitivity in the chatbot's design. One study reported that chatbots were used to offer outside-office support to the participants, demonstrating the potential of AI chatbots to offer sustainable and continuous support [6]. Most of the studies (9/15, 60%) demonstrated the usability of the content shared by the chatbot through self-report measures and the number of times chatbot services were used. Generally, the content was considered reliable, concise, of high quality, and easy to understand. Some studies reported high scores, on average, for personalized messages and diverse information. Some studies reported the need to remove ambiguity in the content. Masaki et al [25] reported the number of calls made to the AI nurse to seek assistance for smoking impulses or side effects (mean 1.7 times, SD 2.4), demonstrating the need for an AI chatbot at critical times. Overall, the quality of the recommendations provided by AI chatbots can be further improved to make them more feasible for the participants to implement, along with improvements in the design of the user interface. In 7% (1/15) of studies [32], most participants reported technical issues in using the chatbot (82.3%), one of the reasons being that they stopped receiving the chatbot messages during the study period (84.1%). In conclusion, although the chatbots effectively offered outside-office support, the ease of using the chatbot and the usability of the content need to be further improved by providing credible and doable recommendations in a user-friendly interface.

Chatbot Intervention Characteristics

Behavior Change Theories and Chatbot Functionality

The chatbot intervention characteristics are summarized in Multimedia Appendix 5 [5,6,21-33]. In more than half of the studies (9/15, 60%), the AI chatbots' content, features, and interface were designed based on a theory. Each study critically selected theories based on the intervention goals and target beneficiaries. Cognitive behavioral therapy (CBT) was used in Tess [6], Lark Health Coach (HCAI) [28], and Woebot [31] to devise strategies that enhance self-efficacy and sustain behavior change. In Tess [6], CBT was combined with the theory of emotionally focused therapy and motivational interviewing to assist the behavioral counseling of adolescent patients. Similarly, in Woebot [31], CBT was combined with motivational interviewing and dialectical behavior therapy to provide emotional support and personalized psychoeducation to resist substance misuse. The theory of motivational interviewing was also used to devise the interview questions addressed by NAO [5] (the social robot) and the motivation reinforcement messages provided by Bickmore et al's [33] Chat1. In HCAI [28], CBT was combined with the Diabetes Prevention Program's curriculum to develop content for conversations on weight loss.

The habit formation model, which explains the relationship among cues, behaviors, and rewards, was used to develop the reminder system in the Healthy Lifestyle Coaching Chatbot (HLCC). Mohr's Model of Supportive Accountability, which states that the inclusion of human support in digital interventions increases engagement, was used to mimic human support in the Smoke Free app (SFA) [24] to increase accountability and belongingness. Furthermore, SFA's [24] behavior change
techniques were coded against a 44-item taxonomy of behavior change techniques in individual behavioral support for smoking cessation. The transtheoretical model (TTM) of behavior change was used by Carrasco-Hernandez et al [23] to determine message frequency for the AI chatbot. Similarly, TTM was used in Bickmore et al's [33] Chat1 to design the behavioral monitoring process, which included reviewing progress, identifying barriers, and solving problems. The Capability, Opportunity, Motivation, Behavior model, the core of the Behavior Change Wheel, a behavioral system focusing on 3 components—capability, opportunity, and motivation—was used in To et al's [32] Ida to set goals, monitor behavior, and reinforce behavior change through motivational messages. Social cognitive theory was also used in Ida [32] to facilitate therapeutic dialog actions (ie, talk therapy) and homework sessions outside the agent counseling sessions. Apart from the use of theories, expert consultation and institutional assistance were adopted to develop the AI chatbots' content. Mental health experts were consulted to develop and deliver customized messages through Tess [6], whereas 2 national health promotion institutions in the Netherlands developed the content for Bzz [29]. In conclusion, most studies either adopted a set of critically selected behavior change theories or consulted domain experts (individuals or institutions) to develop behavior change strategies.

On the basis of the behavior change theories, the AI chatbots had multiple functionalities that contributed to efficacious outcomes. First, 53% (8/15) of studies targeted behavioral goal setting. These chatbots targeted healthy lifestyles (7/8, 88%; HLCC, Paola [22], SFA [24], NAO [5], HCAI [28], Ida [32], and Chat1 [33]) and the reduction of substance misuse (1/8, 12%; Woebot [31]). The chatbots with goals related to healthy lifestyles enabled users to set physical activity and dietary goals with push alarms to maintain daily routines and monitor weight. Second, 73% (11/15) of studies used behavioral monitoring. The chatbots that targeted healthy lifestyles (5/11, 45%; HLCC, Paola [22], HCAI [28], Ida [32], and Chat1 [33]) enabled behavioral monitoring by consistently providing feedback through performance content and pictures, weekly check-ins, and data-based inputs on performance. The chatbots that targeted smoking cessation (3/11, 27%; DigiQuit [23], SFA [24], and SMAG [27]) offered data-driven feedback on health indicators through web-based diaries and graphs. The chatbots that targeted medication or treatment adherence (2/11, 18%; Vik [26] and mPulse [30]) offered timely reminders to take medications or refill medicines. The chatbot that targeted the reduction in substance misuse performed mood tracking and regular check-ins to maintain accountability (1/11, 9%; Woebot [31]).

Third, 53% (8/15) of studies offered behavior-related information. The chatbots that targeted healthy lifestyles (3/8, 38%) offered educational sessions on the benefits of physical activity (Ida [32]) and healthy diet (Paola [22]) and information on sex, drugs, and alcohol (Bzz [29]). The chatbots that targeted smoking cessation (4/8, 50%; DigiQuit [23], SFA [24], CureApp Smoking Cessation [CASC] [25], and SMAG [27]) educated users on the benefits of being a nonsmoker, the implications of abrupt cessation, and alternatives to smoking. The chatbot that targeted medication or treatment adherence (1/8, 12%; Vik [26]) offered information on the health issue (breast cancer) for which the users were taking medication.

Fourth, 53% (8/15) of studies reported motivation reinforcement. The chatbots that targeted healthy lifestyles (3/8, 38%) offered feedback on behaviors (HLCC and Ida [32]) and reinforced optimism to change behaviors through planning and imagining change (NAO [5] and Ida [32]). The chatbots that targeted smoking cessation (4/8, 50%) reinforced motivation through personalized messages based on TTM (DigiQuit [23]), scoreboards and trackers of milestones (SFA [24]), and motivational messages (CASC [25] and SMAG [27]). The chatbot that targeted reduction in substance misuse focused on motivation and engagement through individualized weekly reports to foster reflection (Woebot [31]).

Fifth, 27% (4/15) of studies provided emotional support. Of them, 3 studies (75%) targeted healthy lifestyles, and 25% (1/4) targeted reduction in substance misuse. Among the interventions that targeted healthy lifestyles, Tess [6] offered empathetic counseling or compassionate care through ML-driven emotional algorithms; NAO [5], the social robot, expressed empathy through humanized robot interaction; and HCAI [28] mimicked health professionals' empathetic health counseling. The intervention that targeted reduction in substance misuse, Woebot [31], offered empathic responses by tailoring to users' stated moods.

Sixth, 7% (1/15) of studies (CASC [25]) delivered provider-recommendation system services; CASC [25] offered advice and counseling support to physicians. Seventh, 47% (7/15) of studies reported 24/7 availability of the AI chatbot. The chatbots that targeted healthy lifestyles (4/7, 57%; Paola [22], Tess [6], HCAI [28], and Bzz [29]) offered on-demand support, unlimited conversations, and answers to an unlimited number of questions. The chatbots that targeted smoking cessation (3/7, 43%) offered on-demand emergency support via an AI nurse (CASC [25]), support during periods of high cravings (SMAG [27]), and unlimited availability for conversations (SFA [24]). Eighth, 13% (2/15) of studies promoted activities beyond conversation with the chatbots. Chat1 [33] offered homework assignments, whereas Woebot [31] required mindfulness exercises, gratitude journaling, or reflecting upon patterns and lessons already covered. In conclusion, AI chatbots offered personalized, real-time feedback and on-demand support to users continuously and indefinitely.

Infrastructure of the Chatbots: AI Techniques

Most of the studies (10/15, 67%) deployed different AI techniques to deliver personalized interventions: NLP, ML, hybrid techniques (ML and NLP), a Hybrid Health Recommender System, face-tracking technology, and procedural and epistemological knowledge–based algorithms. ML-driven emotional algorithms were used in Tess [6] and HCAI [28] to provide empathetic counseling or compassionate care (emotion-based responses); the AI algorithm analyzed users' messages (voice or text based) to identify and categorize their emotions, and thereafter the chatbots provided both emotional and strategic support to the users. NLP and ML techniques were used in Paola [22], Vik [26], Ida [32], and Woebot [31] to identify and categorize user intents and entities by analyzing unstructured messages. Bickmore et al's [33] Chat1 used procedural and epistemological knowledge–based AI algorithms that facilitated therapeutic dialog actions (talk therapy). A hybrid technique combining NLP and conversational AI or ML was adopted by mPulse [30] to ensure smooth, continuous, and uninterrupted conversations. A Hybrid Health Recommender System was adopted by Carrasco-Hernandez et al's [23] AI chatbot to personalize messages based on user demographics, content (interest of the user), and utility (ratings on each message by the user). Face-tracking technology was integrated into NAO [5] (the social robot) to track participants' faces and humanize the interaction experience. The remaining chatbot studies (5/15, 33%) specified the use of AI to personalize the chatbot interaction but did not elaborate on the AI techniques adopted. In conclusion, most studies targeted personalized services through different AI techniques.

Infrastructure of the Chatbots: Logistics

The chatbots used multimodal channels of communication with the users. All chatbots except NAO [5] (14/15, 93%) used text-based communication with the users, among which 2 (14%; Tess [6] and Vik [26]) chatbots also used voice-based communication. NAO [5] used only voice-based communication, as it was deployed via a social robot. The AI chatbot–based interventions were implemented for different durations: 0 to 2 months (3/15, 20%), 2 to 5 months (7/15, 47%), 5 to 9 months (2/15, 13%), 9 to 12 months (2/15, 13%), and >12 months (1/15, 7%). Out of the 13 chatbots that reported the frequency of engagement, all chatbots, except NAO [5], interacted with the users daily. NAO [5] interacted only once because it was delivered in person through a social robot. The AI chatbots were either integrated into existing platforms or delivered independently. Vik [26], SMAG [27], Tess [6], and Ida [32] were integrated into Facebook (Meta Platforms, Inc) Messenger. Tess [6] was also available on WhatsApp (Meta Platforms, Inc), Amazon Alexa (Amazon.com, Inc), Google Home (Google LLC), and mobile SMS. HLCC [21] was integrated with KakaoTalk (Kakao Corp), a popular messenger app in South Korea, and mPulse [30] was integrated with mobile SMS. The remaining chatbots (8/15, 53%) were delivered independently. The chatbots were deployed using different devices. All chatbots except NAO [5], Ida [32], and Chat1 [33] (12/15, 80%) were deployed through smartphones, among which 3 (27%; Vik [26], SMAG [27], and Bzz [29]) chatbots were also deployed through computers. Chat1 [33] was deployed only through computers. NAO [5] was deployed through a social robot, and Ida [32] was deployed through Fitbit Flex 1 (Fitbit LLC). Three chatbots (HLCC, Paola [22], and Ida [32]) integrated an existing AI-driven conversational platform, that is, the Watson conversation tool (HLCC and Paola [22]) and Dialogflow, an advanced Google ML algorithm (Ida [32]). In conclusion, AI chatbots can be deployed through accessible devices and platforms, indicating their potential for reaching remote and large populations.

Infrastructure of the Chatbots: Input Data for Personalized Services

To deliver personalized services using AI chatbots, most chatbots or studies (9/15, 60%) required input data on the users' background, goals, and behavioral performance and on the chatbots' usability and evidence-based content. The users' background information or baseline characteristics were collected by 4 AI chatbots: Paola [22] measured the baseline level of physical activity and Mediterranean diet; SFA [24] measured time to first cigarette and cigarettes per day; CASC [25] measured demographics, motivation levels for smoking cessation, number of cigarettes smoked per day, and years of smoking; SMAG [27] measured demographics and type of smoking dependence; and Tess [6] used electronic health records. Information on the users' goals, that is, who, when, where, what, and how, was collected by 3 chatbots: HLCC [21] asked the users (office workers) to set realistic stair-climbing goals, Paola [22] enabled the users to set dietary goals and a daily step target every week based on the previous week's outcomes, and SFA [24] asked the users to set the target quit date for smoking. Real-time feedback on usability was collected by 3 chatbots: DigiQuit [23] collected feedback on the message content and timing, Tess [6] collected data on the usefulness of the messages, and Vik [26] collected data on the relevance of the reminders. Real-time feedback on the behavioral performance of the users was collected by 5 chatbots: HLCC [21] collected performance content and pictures; Paola [22] collected data on daily steps and dietary patterns; Vik [26] collected data on medication adherence levels; SMAG [27] monitored the users' smoking levels along with information on location, being alone or accompanied, ongoing activity, and mood to create smoking profiles for them; and HCAI [28] gathered data automatically through sensors on phones and integrated devices such as wearables, as well as self-reported information such as dietary consumption. Overall, 20% (3/15) of studies used evidence-based content apart from the user data: Tess [6] used clinical scripts targeted at behavior change, CASC [25] used national guidelines on counseling support, and HCAI [28] used content from the Diabetes Prevention Program's curriculum. In conclusion, most studies used diverse input data sets, indicating the need to collect comprehensive and essential input data for delivering personalized services.

Discussion

Principal Findings

Overview

The results of this review demonstrate the potential of AI chatbots to deliver efficacious, effective, and feasible health behavior interventions. However, the high risk to internal validity, the lack of sufficient description of AI techniques, and the lack of generalizability of the selected studies suggest the need for further research with robust methodologies to draw definitive conclusions. Regardless, the review identified practical and research implications of the intervention strengths and limitations of the existing studies, with potential future directions.

Primary Outcomes

This review found that AI chatbots were efficacious in promoting healthy lifestyles, including physical exercise and diet (6/15, 40%), smoking cessation (4/15, 27%), treatment or medication adherence (2/15, 13%), and reduction in substance misuse (1/15, 7%). These findings are consistent with previous
systematic reviews that reported the use of AI chatbots for improvement in physical activity levels and improvement in medication adherence [2,5], treatment adherence [14], adherence to self-management practices [1], smoking cessation [12], and reduction in substance abuse [12].

Secondary Outcomes

The review found that AI chatbots reported mixed results in terms of feasibility, acceptability, and usability. In the case of feasibility, evidence on the safety of chatbots was limited because only 7% (1/15) of studies reported safety [22]. As there were no predefined or standard thresholds on the number of message exchanges that demonstrate feasibility, it was difficult to interpret whether the AI chatbots were feasible. Some of the previous systematic reviews reported feasibility in the form of engagement with AI chatbots; however, the feasibility metrics differed across studies, and there was strong evidence regarding a decrease in engagement rates over time [11,13,14]. Similarly, AI chatbots had mixed results in terms of acceptability. Although AI platforms offered a nonjudgmental safe space for the users to share detailed and sensitive information anonymously, the proportion of satisfied participants or the overall satisfaction rate of chatbots was less than 50%. These findings are partially aligned with previous systematic reviews that reported acceptability [11,13,14] and on-demand availability, accessibility, and satisfaction [7]. Similarly, the results on usability were mixed. Some studies reported that AI chatbots were efficient in offering outside-office support and high reliability and understandability of the content, whereas other studies reported a lack of personal connection with chatbots, poor smartphone skills among the participants, impractical recommendations by the chatbot, and technical challenges such as those where the participants stopped receiving the chatbot messages during the study period. These findings are partially aligned with previous systematic reviews on AI chatbots that reported that chatbots provided helpful information and were easy to use [7]. Overall, our mixed results regarding feasibility, acceptability, and usability are partially aligned with the findings of the existing systematic reviews that reported heterogeneity in these secondary outcome measures and results across studies [1,2,7,9-11].

Implications of Intervention Characteristics

Theoretical Foundation for Behavior Change

The fundamental characteristics of the AI chatbots played a critical role in determining efficacious outcomes. First, the majority of the studies (9/15, 60%) used critically selected behavior change theories in the design and delivery of the AI chatbots. Our findings suggested that the integration of behavior change theories such as CBT, TTM, motivational interviewing, emotionally focused therapy, habit formation model, and Mohr's Model of Supportive Accountability resulted in the delivery of consistent motivational support to users through goal setting, monitoring or tracking behaviors, and reinforcement. These strategies not only contributed toward better primary and secondary outcomes but also solved several challenges in the traditional face-to-face intervention models from users' standpoint, such as limited connectivity with the expert, lack of consistent motivation, and lack of access to diverse information over time. Previous systematic reviews also reported that the use of CBT [2,11], habit formation model, emotionally focused therapy, and motivational interviewing [2] for designing behavior change strategies for AI chatbots contributed to better engagement, user motivation, and health behavior outcomes. More interdisciplinary collaboration between behavioral health experts and computer scientists is required to develop theory-based AI chatbots for behavior change interventions.
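As a minimal sketch of how goal setting, monitoring, and reinforcement might be operationalized in a chatbot's dialogue logic (the thresholds and message wording are illustrative assumptions, not the strategy of any reviewed study):

```python
# Minimal sketch of goal setting, monitoring, and reinforcement.
# All numbers and messages are illustrative assumptions only.

def choose_support_message(goal: int, logged: list[int]) -> str:
    """Compare monitored behavior (eg, daily exercise minutes) with the set goal
    and return a reinforcement or motivational message."""
    total = sum(logged)                          # monitoring: aggregate self-reports
    progress = total / goal if goal else 0.0
    if progress >= 1.0:
        return "Goal met - excellent consistency this week. Ready to set a new goal?"
    if progress >= 0.5:
        return f"You are {progress:.0%} of the way there. Keep the streak going."
    return "A small step counts: could you fit in 10 minutes today?"

print(choose_support_message(goal=150, logged=[30, 25, 20]))  # 50% -> encouragement
```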
Free-Flow Conversation

Second, in all studies, free-flow conversations rather than rule-based or constrained conversations with AI chatbots enhanced user experience through the personalization of services, delivery of diverse information, and choice of user-initiated conversation. By contrast, rule-based chatbots offer limited user experience through constraints on the input data, a finite set of conversations that are task oriented and straightforward, and a lack of user-initiated conversations. This finding is consistent with previous systematic reviews that reported the need for greater personalization in AI chatbots through feedback on user performance, accountability, encouragement, and deep interest in the user's situation [13]. Similarly, Milne-Ives et al [8] reported a need for greater interactivity or relational skills, empathetic conversations, and a sense of personal connection with the user through compassionate responses.
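The contrast between the two designs can be sketched as follows; the rule table and the stubbed intent classifier are purely illustrative and do not correspond to any system evaluated in the reviewed studies:

```python
RULES = {                         # rule-based: finite, task-oriented commands
    "log weight": "Please enter today's weight in kg.",
    "set goal": "What weekly activity goal (in minutes) would you like?",
}

def nlu_intent(utterance: str) -> str:
    """Stub for the ML/NLP intent classifier a free-flow chatbot would use.
    A real system would call a trained model or NLU service here."""
    return "open_question" if "?" in utterance else "small_talk"

def respond(utterance: str, free_flow: bool = True) -> str:
    key = utterance.strip().lower()
    if key in RULES:                       # both designs handle known commands
        return RULES[key]
    if not free_flow:                      # rule-based: reject anything else
        return "Sorry, I can only help with: " + ", ".join(RULES)
    intent = nlu_intent(utterance)         # free flow: user-initiated topics
    return f"(routing to dialogue policy for intent '{intent}')"

print(respond("set goal"))
print(respond("I felt too tired to walk today", free_flow=True))
```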
The need for greater interactivity can also be associated with the fluctuations in user engagement found in 13% (2/15) of studies [22,23]. In one of these studies, the engagement rates decreased gradually as the intervention progressed [23], and in the other study, the engagement rates decreased significantly by midprogram but increased to 70% in the final week, that is, the 12th week [22]. This is a novel finding that has not been reported in previous systematic reviews. As specified in the AI chatbot design literature, user engagement depends on the chatbot's ability to understand the user's background, build a relationship, be persuasive, and offer quick feedback [34]. Therefore, it is critical for AI chatbots to establish appropriate rapport or a relationship with the user through personalized and compassionate interactions for a sustained and engaging intervention. AI experts must establish pathways for comprehensive real-time data collection to produce accurate and personalized responses.
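Engagement trajectories of this kind can be tracked with a simple weekly metric, for example, the proportion of enrolled participants who sent at least one message in a given week. The sketch below uses made-up example data and an assumed definition of engagement, because the reviewed studies did not share a common metric:

```python
from collections import defaultdict

def weekly_engagement(logs: list[tuple[str, int]], enrolled: int) -> dict[int, float]:
    """logs: (user_id, week_number) pairs, one per message sent by a user.
    Returns the share of enrolled participants active in each week."""
    active = defaultdict(set)
    for user, week in logs:
        active[week].add(user)
    return {week: len(users) / enrolled for week, users in sorted(active.items())}

# Illustrative data: engagement dips mid-program and partly recovers by week 12.
logs = [("u1", 1), ("u2", 1), ("u3", 1), ("u1", 6), ("u1", 12), ("u2", 12)]
print(weekly_engagement(logs, enrolled=3))   # {1: 1.0, 6: 0.33..., 12: 0.66...}
```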
Nonjudgmental Space

Third, in 20% (3/15) of studies, the humanistic yet nonhumanistic construct of AI chatbots provided a safe space for the users to discuss, share, and ask for information on sensitive issues [5,22,23,35]. The ML-driven emotional algorithms offered the potential for perceiving and understanding human emotions [36], whereas the nonhuman interaction experience or the lack of interaction with a real human made it easier for the user to self-disclose sensitive information [37]. Thus, AI chatbots demonstrate their potential for intervening with vulnerable populations, especially in terms of stigmatized issues. For example, adolescence is characterized by high social anxiety; therefore, adolescents perceive stigma in seeking services on sensitive issues such as mental health disorders. In such scenarios, AI chatbots offer sufficient privacy and
anonymity for adolescents to express their thoughts and emotions freely. This finding is consistent with a previous systematic review that reported the use of anonymity for encouraging users to freely express their emotions [13].

Scalability

Fourth, most studies (8/15, 53%) reported that the AI chatbots have a low threshold for integration into existing services yet a high reward. Most of the traditional behavioral interventions require in-person service delivery; however, this approach has several limitations from the implementor's standpoint, such as a lack of consistent data collection, continuous monitoring, scalability, and sustainability of the intervention. AI chatbots have a low threshold for integration into these traditional services because they do not put a strain on existing resources such as experts, time, money, and effort. The chatbots can be freely deployed through daily use platforms and accessed at any time by the users. The use of chatbots can help integrate behavioral interventions into the daily clinical setting and avoid additional pressure faced by health care providers. For example, chatbots can independently offer low-intensity services such as information delivery to users. Furthermore, chatbots can offer provider-recommendation services, wherein, based on the analysis of real-time user data, the chatbots may offer suggestions to the health care providers to help them offer more effective services [27]. Therefore, public health professionals and health care providers can consider the integration of AI chatbots into existing services as a support tool, rather than a replacement [9].
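One way such provider-recommendation logic could work, shown here only as a hedged sketch (the adherence threshold and data fields are assumptions, not details of the SMAG system [27]), is a rule that flags users for clinician follow-up when real-time adherence data fall below a cutoff:

```python
def flag_for_provider(adherence_log: dict[str, list[bool]],
                      min_adherence: float = 0.6) -> list[str]:
    """Return user IDs whose recent medication adherence (True = dose taken)
    falls below an assumed threshold, so the chatbot can alert a provider."""
    flagged = []
    for user, doses in adherence_log.items():
        rate = sum(doses) / len(doses) if doses else 0.0
        if rate < min_adherence:
            flagged.append(user)
    return flagged

log = {"u1": [True, True, False, True], "u2": [False, False, True, False]}
print(flag_for_provider(log))   # ['u2'] -> suggest outreach to the care team
```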
Most of the studies (10/15, 67%) had a large and diverse sample population, demonstrating the potential for scaling up chatbot-based interventions. Almost half of the studies had >200 participants, with 27% (4/15) of studies consisting of a sample size ranging from approximately 920 to 991,217 participants. Similarly, the selected studies not only included samples with diverse health and behavioral conditions (13/15, 87%), such as breast cancer, smoking, obesity, unhealthy eating patterns, lack of physical exercise, conditions requiring medication, and substance misuse, but also samples with no preexisting conditions (2/15, 13%). This demonstrates the potential of AI chatbots to reach a large and diverse population in different settings. This is because AI chatbots have the potential to be integrated into extensively used existing platforms such as text SMS, Facebook Messenger, and WhatsApp and deployed through commonly used devices such as smartphones, computers, and Alexa, making it highly feasible to access a large and diverse population. This finding is consistent with the previous systematic reviews that reported the integration of AI chatbots into diverse platforms, such as Slack (Slack Technologies, LLC), Messenger, WhatsApp, and Telegram [1,2,14], and the use of a large sample size, such as >100 participants in 10 of 15 studies [21,23,24,26,27,29-33]. Thus, public health professionals can deploy AI chatbots for education, the promotion of behavior change, and the provision of health care services to prevent health issues that affect a large population.

Limitations of the Reviewed Studies and Future Research Directions

Nascent Application of AI Chatbots

Almost 75% (11/15) of the articles were published in the years 2019 and 2021, indicating that the use of AI-driven chatbot interventions for behavior changes is at a nascent stage. Most studies (9/15, 60%) adopted a pre-post study design with no control group, with only 27% (4/15) of studies using RCT models, underscoring the immaturity in establishing causal connections between AI-based conversational agents and health behavior outcomes. This finding is aligned with many previous systematic reviews, which reported that 4 of 9 studies were RCTs and the remaining were quasiexperimental, feasibility, or pilot RCT studies [9]; that 2 of 10 studies were RCTs and the majority were quasiexperimental [14]; and that 2 of 17 studies were RCTs and the majority were quasiexperimental [1]. Future studies need to adopt robust RCTs that can establish a causal relationship between AI chatbots and health outcomes.

Risk of Internal Validity

The outcome of this review should be interpreted with caution because of the moderate to high risk of internal validity within the selected studies. In the included studies, the risk of outcomes from unintended sources was high owing to the lack of information on the measures to avoid the influence of other interventions and the level of adherence to the intervention protocol. The risk of bias in the measurement of the outcomes was moderate to high owing to the lack of concealment of the assigned intervention from the evaluators and the lack of validated and reliable outcome measures. The risk of bias in the analysis was moderate to high owing to high dropout rates, the lack of power calculation to estimate sample size, and the lack of information on the use of intent-to-treat analysis. These findings are consistent with many previous systematic reviews that reported moderate risk of outcomes from unintended sources owing to confounding in all quasiexperimental studies [9]; high risk of outcome measurement because evaluators were aware of the assigned intervention [8,9] or nonvalidated instruments were used for outcome measurement [1,11]; and moderate risk of bias in analysis owing to high attrition rates, the lack of analysis methods for bias correction, the lack of power analysis, and small sample size at follow-up [2,9].
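For context, the kind of a priori power calculation that was often missing can be performed in a few lines; the sketch below uses the statsmodels package, and the effect size, significance level, and power are placeholder values rather than estimates from the reviewed trials:

```python
# A priori sample size estimate for a two-arm trial comparing a chatbot
# intervention with a control group. Requires the statsmodels package;
# all input values are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.3,   # assumed small-to-moderate standardized effect (Cohen d)
    alpha=0.05,        # two-sided significance level
    power=0.80,        # desired statistical power
    ratio=1.0,         # equal allocation to both arms
)
print(f"Approximately {n_per_group:.0f} participants per arm would be needed.")
```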
There was also inconsistency across studies in the measures of secondary outcomes, that is, feasibility, usability, acceptability, and engagement. This finding is consistent with most of the previous systematic reviews that reported mixed findings on secondary outcome measures [1,2,7,9-11]. First, this issue stems from the lack of a common operational definition for secondary outcomes in the context of chatbot-based interventions. Second, because the AI chatbot intervention domain is relatively new, there are very few measures of feasibility, usability, acceptability, and engagement with tested reliability and validity. Therefore, the researchers in the selected studies had to develop their own measures for assessing outcomes. This led to inconsistency in the measures and their operational definitions across the studies. Future studies should shape the development of common operational definitions for each of these outcomes to enable comparison and standardized reporting. Furthermore,
future AI-chatbot–based intervention studies should follow the National Institutes of Health's quality assessment criteria for controlled intervention studies [19] to assess their studies' internal validity.

Lack of Description of AI Algorithm

In this review, most studies (14/15, 93%) did not describe the characteristics and handling of the input data, along with other processes related to the AI algorithm. This finding is consistent with the previous systematic literature review that reported inconsistent use of AI-software taxonomy and lack of depth of reported AI techniques and systems [14]. In alignment with the CONSORT-AI extension [20], future studies need to elaborate on the following components related to the AI algorithm: (1) the process of supplying input data to the AI algorithm, including the user interface that enables data collection, inclusion or exclusion criteria of input data, handling of unavailable data, and establishing the credibility of the data collected (eg, specifying the source of input data); (2) the output of the AI algorithm and its relevance to the health-related goals; (3) the AI functioning, including the type of personalization algorithm (eg, ML or NLP), the version of the AI algorithm, and the accuracy level of the algorithm; (4) performance backlogs in the AI algorithm deployed, which would indicate the level of safety in using AI algorithms, especially with vulnerable populations; (5) the level and type of expertise required to integrate and successfully deploy the AI algorithm; and (6) the skills needed by the participants to use the AI chatbot, which would indicate the number of resources required and the feasibility of using AI algorithms.
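One lightweight way to standardize such reporting is to complete a structured record per trial. The field names below paraphrase the 6 components listed above and are illustrative labels only, not an official CONSORT-AI schema; the example values are hypothetical:

```python
from dataclasses import dataclass, asdict

@dataclass
class AIAlgorithmReport:
    """Illustrative reporting record covering the components discussed above."""
    input_data_handling: str      # (1) data sources, inclusion/exclusion, missing data
    output_and_relevance: str     # (2) what the algorithm outputs and why it matters
    algorithm_type_version: str   # (3) eg, personalization algorithm, version, accuracy
    known_performance_issues: str # (4) error modes relevant to participant safety
    deployment_expertise: str     # (5) expertise needed to integrate and run it
    participant_skills: str       # (6) skills participants need to use the chatbot

report = AIAlgorithmReport(
    input_data_handling="Self-reported diet logs; entries outside 0-5000 kcal excluded",
    output_and_relevance="Daily coaching message tied to the weight-loss goal",
    algorithm_type_version="Hypothetical NLP intent model, version 1.0, 87% accuracy",
    known_performance_issues="May misclassify ambiguous free-text food entries",
    deployment_expertise="One ML engineer plus a behavioral scientist",
    participant_skills="Basic smartphone messaging",
)
print(asdict(report))
```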
Lack of Generalizability

The selected studies were not representative of diverse geographies, cultures, and age groups, which exerted a strong bias on the generalizability of the studies. Of the 13 studies that reported the geographical locations, all (100%) were conducted in high-income countries; the majority of the studies (80%) were embedded in Western culture, apart from the studies in South Korea and Japan; and most of the studies (>80%) were implemented with adults (≥18 years). These findings are consistent with the previous systematic literature reviews that reported that all the chatbot intervention studies were conducted in high-income countries [2,10,11,14], most studies were conducted with adults [2,7], and most studies did not focus on racial or ethnic minorities [2,14].

To increase the generalizability of the efficacy and feasibility of AI chatbots, future studies need to test their use in low-income countries or low-resource settings and with children and adolescents. The increased mobile connectivity and internet use in low-income countries [38] offer the potential to implement AI chatbot–based health behavior interventions. The use of AI chatbots can tackle the challenges faced by the health systems in low-income countries, such as the lack of experts, limited health infrastructure in rural areas, and poor health access [39]. Similarly, with the rise in the use of smartphones and the latest digital technologies among adolescents [40], AI chatbots offer the opportunity to deliver engaging behavioral health interventions to them. The nonjudgmental and nonstigmatic attributes of AI chatbot–based interventions offer a solution to the challenges faced by adolescents in seeking behavioral health services, such as perceived and enacted stigma and lack of motivation [9,40].

Safety and Ethics

In this review, the evidence for patient safety was limited; however, the limited evidence stated that chatbots were safe for behavioral and mental health interventions. Only 7% (1/15) of studies, that is, the study by Maher et al [22], reported safety in terms of the absence of adverse events. This finding is consistent with the previous systematic literature reviews that reported that very few studies discussed participant safety or ethics in terms of adverse events [1,2,7,9] and data security or privacy [2,8]. The occurrence of flexible, real-time, and large numbers of conversations with AI chatbots increases the probability of error by the AI algorithm. This can lead to unintended adverse outcomes, especially in the case of sensitive topics. Therefore, in the context of the nascent use of AI technologies, future studies should assess and report AI performance from ethical and safety standpoints.

Limitations of This Review

This systematic literature review has several limitations. First, a meta-analysis was not conducted for the reviewed studies. Owing to heterogeneity in the research design, outcomes reported, and outcome measures, a meta-analysis was not perceived as feasible by the authors. Second, this review did not cover a comprehensive set of behavioral outcomes. The selected studies focused on only 3 behavioral outcomes: healthy lifestyle (physical activity and diet), smoking cessation, and treatment or medication adherence. However, this was also because the authors had adopted strict inclusion criteria for AI chatbots, and studies with rule-based chatbots were ruled out, restricting the number of behavioral outcomes covered. Third, the data matching for the tables was not quantified; therefore, intercoder reliability was not reported. However, data extraction and quality assessment were conducted by 2 authors independently, followed by a discussion among the authors to finalize the tables. Fourth, articles from outside the selected databases (eg, Google Scholar), unpublished work and conference articles, gray literature (eg, government reports), and articles in other languages were not included. Fifth, intervention studies that did not provide a clear description of AI chatbots or did not label AI chatbots as a keyword were excluded.

Conclusions

This review provides an evaluation of AI chatbots as a medium for behavior change interventions. On the basis of the outcomes of the selected studies (N=15), AI chatbots were efficacious in promoting healthy lifestyles (physical activity and diet), smoking cessation, and treatment or medication adherence. However, the studies had mixed results in terms of the feasibility, acceptability, and usability of AI chatbots in diverse settings with diverse populations. The efficacious outcomes of AI-driven chatbot interventions can be attributed to the fundamental characteristics of an AI chatbot: (1) personalized services, (2) nonjudgmental safe space to converse, (3) easy integration into existing services, (4) engaging experience, and (5) scalability
to a large and diverse population. However, the outcomes of this review need to be interpreted with caution because most of the included studies had a moderate risk of internal validity, given that the AI chatbot intervention domain is at a nascent stage. Future studies need to adopt robust RCTs and provide detailed descriptions of AI-related processes. Overall, AI chatbots have immense potential to be integrated into existing behavior change services owing to their (1) ease of integration; (2) potential for affordability, accessibility, scalability, and sustainability; (3) delivery of services to vulnerable populations on sensitive issues in a nonstigmatic and engaging manner; and (4) potential for consistent data collection to support health care providers' decisions.

Acknowledgments
XL and SQ conceived the research topics and questions. CCT and AA performed the literature search and screening. AA, CCT,
and SQ performed the data extraction and analysis. AA and SQ developed the first draft. DW reviewed the paper and provided
key feedback and edits. All the authors reviewed the final manuscript.
The research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases under award
R01AI127203-5S1. The content is solely the responsibility of the authors and does not necessarily represent the official views
of the National Institutes of Health.
The authors would also like to acknowledge the generous funding support from the University of South Carolina Big Data Health
Science Center, a University of South Carolina excellence initiative program (grants BDHSC-2021-14 and BDHSC-2021-11).

Conflicts of Interest
None declared.

Multimedia Appendix 1
Search string.
[DOCX File , 13 KB-Multimedia Appendix 1]

Multimedia Appendix 2
Methodology assessment based on the National Institutes of Health’s quality assessment tool for controlled intervention studies.
[DOCX File , 19 KB-Multimedia Appendix 2]

Multimedia Appendix 3
Quality assessment of chatbot interventions based on CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial
Intelligence extension).
[DOCX File , 16 KB-Multimedia Appendix 3]

Multimedia Appendix 4
Outcomes of the reviewed articles.
[DOCX File , 25 KB-Multimedia Appendix 4]

Multimedia Appendix 5
Features of the chatbots in the reviewed studies.
[DOCX File , 23 KB-Multimedia Appendix 5]

References
1. Laranjo L, Dunn AG, Tong HL, Kocaballi AB, Chen J, Bashir R, et al. Conversational agents in healthcare: a systematic
review. J Am Med Inform Assoc 2018 Sep 01;25(9):1248-1258 [FREE Full text] [doi: 10.1093/jamia/ocy072] [Medline:
30010941]
2. Oh YJ, Zhang J, Fang ML, Fukuoka Y. A systematic review of artificial intelligence chatbots for promoting physical
activity, healthy diet, and weight loss. Int J Behav Nutr Phys Act 2021 Dec 11;18(1):160 [FREE Full text] [doi:
10.1186/s12966-021-01224-6] [Medline: 34895247]
3. Duarte A, Walker S, Littlewood E, Brabyn S, Hewitt C, Gilbody S, et al. Cost-effectiveness of computerized
cognitive-behavioural therapy for the treatment of depression in primary care: findings from the Randomised Evaluation
of the Effectiveness and Acceptability of Computerised Therapy (REEACT) trial. Psychol Med 2017 Jul;47(10):1825-1835
[FREE Full text] [doi: 10.1017/S0033291717000289] [Medline: 28228182]


4. Gilbody S, Brabyn S, Lovell K, Kessler D, Devlin T, Smith L, REEACT collaborative. Telephone-supported computerised
cognitive-behavioural therapy: REEACT-2 large-scale pragmatic randomised controlled trial. Br J Psychiatry 2017
May;210(5):362-367 [FREE Full text] [doi: 10.1192/bjp.bp.116.192435] [Medline: 28254959]
5. Galvão Gomes da Silva J, Kavanagh DJ, Belpaeme T, Taylor L, Beeson K, Andrade J. Experiences of a motivational
interview delivered by a robot: qualitative study. J Med Internet Res 2018 May 03;20(5):e116 [FREE Full text] [doi:
10.2196/jmir.7737] [Medline: 29724701]
6. Stephens TN, Joerin A, Rauws M, Werk LN. Feasibility of pediatric obesity and prediabetes treatment support through
Tess, the AI behavioral coaching chatbot. Transl Behav Med 2019 May 16;9(3):440-447. [doi: 10.1093/tbm/ibz043]
[Medline: 31094445]
7. Milne-Ives M, de Cock C, Lim E, Shehadeh MH, de Pennington N, Mole G, et al. The effectiveness of artificial intelligence
conversational agents in health care: systematic review. J Med Internet Res 2020 Oct 22;22(10):e20346 [FREE Full text]
[doi: 10.2196/20346] [Medline: 33090118]
8. Gaffney H, Mansell W, Tai S. Conversational agents in the treatment of mental health problems: mixed-method systematic
review. JMIR Ment Health 2019 Oct 18;6(10):e14166 [FREE Full text] [doi: 10.2196/14166] [Medline: 31628789]
9. Abd-Alrazaq AA, Rababeh A, Alajlani M, Bewick BM, Househ M. Effectiveness and safety of using chatbots to improve
mental health: systematic review and meta-analysis. J Med Internet Res 2020 Jul 13;22(7):e16021 [FREE Full text] [doi:
10.2196/16021] [Medline: 32673216]
10. Vaidyam AN, Wisniewski H, Halamka JD, Kashavan MS, Torous JB. Chatbots and conversational agents in mental health:
a review of the psychiatric landscape. Can J Psychiatry 2019 Jul;64(7):456-464 [FREE Full text] [doi:
10.1177/0706743719828977] [Medline: 30897957]
11. Hoermann S, McCabe KL, Milne DN, Calvo RA. Application of synchronous text-based dialogue systems in mental health
interventions: systematic review. J Med Internet Res 2017 Jul 21;19(8):e267 [FREE Full text] [doi: 10.2196/jmir.7023]
[Medline: 28784594]
12. Provoost S, Lau HM, Ruwaard J, Riper H. Embodied conversational agents in clinical psychology: a scoping review. J
Med Internet Res 2017 May 09;19(5):e151 [FREE Full text] [doi: 10.2196/jmir.6553] [Medline: 28487267]
13. Scholten MR, Kelders SM, Van Gemert-Pijnen JE. Self-guided web-based interventions: scoping review on user needs and
the potential of embodied conversational agents to address them. J Med Internet Res 2017 Nov 16;19(11):e383 [FREE Full
text] [doi: 10.2196/jmir.7351] [Medline: 29146567]
14. Schachner T, Keller R, Wangenheim FV. Artificial intelligence-based conversational agents for chronic conditions: systematic
literature review. J Med Internet Res 2020 Sep 14;22(9):e20701 [FREE Full text] [doi: 10.2196/20701] [Medline: 32924957]
15. Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, PRISMA-P Group. Preferred reporting items for
systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev 2015 Jan 01;4(1):1 [FREE Full text]
[doi: 10.1186/2046-4053-4-1] [Medline: 25554246]
16. Bowen DJ, Kreuter M, Spring B, Cofta-Woerpel L, Linnan L, Weiner D, et al. How we design feasibility studies. Am J
Prev Med 2009 May;36(5):452-457 [FREE Full text] [doi: 10.1016/j.amepre.2009.02.002] [Medline: 19362699]
17. Nahum-Shani I, Shaw SD, Carpenter SM, Murphy SA, Yoon C. Engagement in digital interventions. Am Psychol 2022
Oct;77(7):836-852 [FREE Full text] [doi: 10.1037/amp0000983] [Medline: 35298199]
18. Lyon AR, Pullmann MD, Jacobson J, Osterhage K, Achkar MA, Renn BN, et al. Assessing the usability of complex
psychosocial interventions: the Intervention Usability Scale. Implement Res Pract 2021;2:10.1177/2633489520987828
[FREE Full text] [doi: 10.1177/2633489520987828] [Medline: 35601889]
19. Study Quality Assessment Tools. National Heart, Lung and Blood Institute, National Institutes of Health. 2021. URL:
https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools [accessed 2022-07-03]
20. Liu X, Rivera SC, Moher D, Calvert MJ, Denniston AK, SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines
for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. BMJ 2020 Sep
09;370:m3164 [FREE Full text] [doi: 10.1136/bmj.m3164] [Medline: 32909959]
21. Piao M, Ryu H, Lee H, Kim J. Use of the healthy lifestyle coaching chatbot app to promote stair-climbing habits among
office workers: exploratory randomized controlled trial. JMIR Mhealth Uhealth 2020 May 19;8(5):e15085 [FREE Full
text] [doi: 10.2196/15085] [Medline: 32427114]
22. Maher CA, Davis CR, Curtis RG, Short CE, Murphy KJ. A physical activity and diet program delivered by artificially
intelligent virtual health coach: proof-of-concept study. JMIR Mhealth Uhealth 2020 Jul 10;8(7):e17558 [FREE Full text]
[doi: 10.2196/17558] [Medline: 32673246]
23. Carrasco-Hernandez L, Jódar-Sánchez F, Núñez-Benjumea F, Moreno Conde J, Mesa González M, Civit-Balcells A, et al.
A mobile health solution complementing psychopharmacology-supported smoking cessation: randomized controlled trial.
JMIR Mhealth Uhealth 2020 Apr 27;8(4):e17530 [FREE Full text] [doi: 10.2196/17530] [Medline: 32338624]
24. Perski O, Crane D, Beard E, Brown J. Does the addition of a supportive chatbot promote user engagement with a smoking
cessation app? An experimental study. Digit Health 2019 Sep 30;5:2055207619880676 [FREE Full text] [doi:
10.1177/2055207619880676] [Medline: 31620306]


25. Masaki K, Tateno H, Kameyama N, Morino E, Watanabe R, Sekine K, et al. Impact of a novel smartphone app (CureApp
Smoking Cessation) on nicotine dependence: prospective single-arm interventional pilot study. JMIR Mhealth Uhealth
2019 Feb 19;7(2):e12694 [FREE Full text] [doi: 10.2196/12694] [Medline: 30777848]
26. Chaix B, Bibault JE, Pienkowski A, Delamon G, Guillemassé A, Nectoux P, et al. When chatbots meet patients: one-year
prospective study of conversations between patients with breast cancer and a chatbot. JMIR Cancer 2019 May 02;5(1):e12856
[FREE Full text] [doi: 10.2196/12856] [Medline: 31045505]
27. Calvaresi D, Calbimonte JP, Dubosson F, Najjar A, Schumacher M. Social network chatbots for smoking cessation: agent
and multi-agent frameworks. In: IEEE/WIC/ACM International Conference on Web Intelligence. 2019 Oct Presented at:
WI '19; October 14-17, 2019; Thessaloniki, Greece p. 286-292. [doi: 10.1145/3350546.3352532]
28. Stein N, Brooks K. A fully automated conversational artificial intelligence for weight loss: longitudinal observational study
among overweight and obese adults. JMIR Diabetes 2017 Nov 01;2(2):e28 [FREE Full text] [doi: 10.2196/diabetes.8590]
[Medline: 30291087]
29. Crutzen R, Peters GJ, Portugal SD, Fisser EM, Grolleman JJ. An artificially intelligent chat agent that answers adolescents'
questions related to sex, drugs, and alcohol: an exploratory study. J Adolesc Health 2011 May;48(5):514-519. [doi:
10.1016/j.jadohealth.2010.09.002] [Medline: 21501812]
30. Brar Prayaga R, Agrawal R, Nguyen B, Jeong EW, Noble HK, Paster A, et al. Impact of social determinants of health and
demographics on refill requests by Medicare patients using a conversational artificial intelligence text messaging solution:
cross-sectional study. JMIR Mhealth Uhealth 2019 Nov 18;7(11):e15771 [FREE Full text] [doi: 10.2196/15771] [Medline:
31738170]
31. Prochaska JJ, Vogel EA, Chieng A, Kendra M, Baiocchi M, Pajarito S, et al. A therapeutic relational agent for reducing
problematic substance use (Woebot): development and usability study. J Med Internet Res 2021 Mar 23;23(3):e24850
[FREE Full text] [doi: 10.2196/24850] [Medline: 33755028]
32. To QG, Green C, Vandelanotte C. Feasibility, usability, and effectiveness of a machine learning-based physical activity
chatbot: quasi-experimental study. JMIR Mhealth Uhealth 2021 Nov 26;9(11):e28577 [FREE Full text] [doi: 10.2196/28577]
[Medline: 34842552]
33. Bickmore TW, Schulman D, Sidner C. Automated interventions for multiple health behaviors using conversational agents.
Patient Educ Couns 2013 Aug;92(2):142-148 [FREE Full text] [doi: 10.1016/j.pec.2013.05.011] [Medline: 23763983]
34. Zhang J, Oh YJ, Lange P, Yu Z, Fukuoka Y. Artificial intelligence chatbot behavior change model for designing artificial
intelligence chatbots to promote physical activity and a healthy diet: viewpoint. J Med Internet Res 2020 Sep 30;22(9):e22845
[FREE Full text] [doi: 10.2196/22845] [Medline: 32996892]
35. Radziwill NM, Benton MC. Evaluating quality of chatbots and intelligent conversational agents. arXiv 2017 Apr 15.
36. Brandtzæg PB, Skjuve M, Dysthe KK, Følstad A. When the social becomes non-human: young people's perception of
social support in chatbots. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 2021
May Presented at: CHI '21; May 8-13, 2021; Yokohama, Japan p. 257. [doi: 10.1145/3411764.3445318]
37. Silver L, Smith A, Johnson C, Jiang J, Anderson M, Rainie L. Mobile connectivity in emerging economies. Pew Research
Center. 2019 Mar 7. URL: https://www.pewresearch.org/internet/2019/03/07/mobile-connectivity-in-emerging-economies/
[accessed 2022-04-08]
38. Daley K, Hungerbuehler I, Cavanagh K, Claro HG, Swinton PA, Kapps M. Preliminary evaluation of the engagement and
effectiveness of a mental health chatbot. Front Digit Health 2020 Nov 30;2:576361 [FREE Full text] [doi:
10.3389/fdgth.2020.576361] [Medline: 34713049]
39. Anderson M, Jiang J. Teens, social media and technology 2018. Pew Research Center. 2018 May 31. URL: https://www.
pewresearch.org/internet/2018/05/31/teens-social-media-technology-2018/ [accessed 2022-04-08]
40. Grové C. Co-developing a mental health and wellbeing chatbot with and for young people. Front Psychiatry 2021 Feb
1;11:606041 [FREE Full text] [doi: 10.3389/fpsyt.2020.606041] [Medline: 33597898]

Abbreviations
AI: artificial intelligence
CASC: CureApp Smoking Cessation
CBT: cognitive behavioral therapy
CONSORT-AI: Consolidated Standards of Reporting Trials–Artificial Intelligence
HCAI: Lark Health Coach
HLCC: Healthy Lifestyle Coaching Chatbot
ML: machine learning
NLP: natural language processing
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RCT: randomized controlled trial
SFA: Smoke Free app
TTM: transtheoretical model

Edited by G Eysenbach, A Mavragani; submitted 05.07.22; peer-reviewed by T Sathish, L Weinert, L He; comments to author 21.10.22;
revised version received 03.01.23; accepted 10.01.23; published 24.02.23
Please cite as:
Aggarwal A, Tam CC, Wu D, Li X, Qiao S
Artificial Intelligence–Based Chatbots for Promoting Health Behavioral Changes: Systematic Review
J Med Internet Res 2023;25:e40789
URL: https://www.jmir.org/2023/1/e40789
doi: 10.2196/40789
PMID:

©Abhishek Aggarwal, Cheuk Chi Tam, Dezhi Wu, Xiaoming Li, Shan Qiao. Originally published in the Journal of Medical
Internet Research (https://www.jmir.org), 24.02.2023. This is an open-access article distributed under the terms of the Creative
Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly
cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright
and license information must be included.
