
BJR, 2024, 97, 483–491

https://doi.org/10.1093/bjr/tqae002
Advance access publication: 4 January 2024
Review

AI in imaging: the regulatory landscape


Derek L.G. Hill, PhD
UCL, Gower Street, London, WC1E 6BT, United Kingdom
Corresponding author: Derek L.G. Hill, PhD, Medical Physics & Biomedical Engineering, Malet Place Engineering Building, UCL, London, WC1E 6BT,
United Kingdom ([email protected])

Abstract

Downloaded from https://academic.oup.com/bjr/article/97/1155/483/7510846 by guest on 07 March 2025


Artificial intelligence (AI) methods have been applied to medical imaging for several decades, but in the last few years the number of publications and the number of AI-enabled medical devices coming on the market have increased significantly. While some AI-enabled approaches are proving very valuable, systematic reviews of the AI imaging field identify serious weaknesses in a substantial proportion of the literature. Medical device regulators have recently become more proactive in publishing guidance documents and recognizing standards that will require the development and validation of AI-enabled medical devices to be more rigorous than is required for traditional "rule-based" software. In particular, developers are required to better identify and mitigate risks (such as bias) that arise in AI-enabled devices, and to ensure that the devices are validated in a realistic clinical setting so that their output is clinically meaningful. While this evolving regulatory landscape means that device developers will take longer to bring novel AI-based medical imaging devices to market, such additional rigour is necessary to address existing weaknesses in the field and to ensure that patients and healthcare professionals can trust AI-enabled devices. There would also be benefits in the academic community taking this regulatory framework into account, to improve the quality of the literature and make it easier for academically developed AI tools to make the transition to medical devices that impact healthcare.
Keywords: radiological; AI; machine learning; medical device; regulation; bias.

Introduction

Machine learning and artificial intelligence (AI) methods have been applied to medical imaging applications for several decades, with publications and dedicated conferences on the topic in the 1990s.1-3 In recent years, however, there has been a rapid acceleration in activity in this area, both in academic research and in the launch of commercial products. AI is achieving an ever-higher profile in the mass media, most recently with the high-profile launch of several generative AI tools. The general public is increasingly aware of the potential impact of AI and machine learning on their lives, and of the benefits and risks of AI, and this may be especially the case where AI impacts their health.

Because AI imaging tools have applications in the diagnosis and management of patients, they come under the definition of medical devices, and the medical device regulators are therefore key gatekeepers in the arrival of such AI tools on the market. AI applications in medical imaging are now entering the healthcare market in significant numbers. The US Food and Drug Administration (FDA), which currently has the most comprehensive database of medical devices, periodically publishes the number of AI-enabled medical devices that have received market authorizations (eg, 510(k) clearance, de novo). A review of devices cleared between 2019 and 2021 was published by Muehlematter et al.4 The most recent FDA publication, published October 19, 2023,5 reports that, up to the end of July 2023, a total of 692 AI-enabled medical devices had received marketing authorization, of which more than 75% are for radiology applications.

The regulators need to strike a balance between enabling innovation in this important area and ensuring that AI tools put on the market have a positive benefit:risk ratio for patients. This article looks at applications of AI in medical imaging, the evolving regulatory landscape for AI-enabled medical devices, and the implications of this for the developers and users of AI medical imaging applications, and for the medical imaging community more broadly.

State of the art of AI in medical imaging

Given this article focuses on the regulatory landscape for AI in medical imaging, it is appropriate to start with a medical device regulator's definition, recently published by the FDA.6 We will use this definition throughout the rest of this publication, and we will use "artificial intelligence" (AI) as shorthand for it:

Artificial Intelligence (AI) and Machine Learning (ML) can be described as a branch of computer science, statistics, and engineering that uses algorithms or models to perform tasks and exhibit behaviors such as learning, making decisions, and making predictions. ML is considered a subset of AI that allows models to be developed by training algorithms through analysis of data, without models being explicitly programmed.

Barragan-Montero et al, in a thorough technical review,7 emphasize that the most recent innovation in AI in medical imaging has been in that machine learning subset of AI. In machine learning, an algorithm learns from data without needing to be explicitly programmed with a set of rules. The types of algorithms we are considering in this article are those based on machine learning, as distinct from rule-based approaches.

In general, the medical images used for training AI models have been pre-labelled. This labelling is very often done by

Received: 11 October 2023; Revised: 3 December 2023; Accepted: 26 December 2023


© The Author(s) 2024. Published by Oxford University Press on behalf of the British Institute of Radiology.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

experts, who delineate image features by hand. For example, an algorithm to find the boundary of the left ventricle in a cardiac ultrasound scan may be trained with images that have been carefully delineated by a radiologist or ultrasound technician using a drawing tool on a workstation. However, labelling may also be done based on data that are not in the images; for example, an algorithm might be trained to tell the difference between patients with rapidly progressing or slowly progressing disease by training with longitudinal outcome data.

One particular type of machine learning, referred to as deep learning, has recently become extremely widespread in medical imaging applications. Deep learning describes methods in which more sophisticated AI models, typically multiple-layer neural networks, are trained to classify input data. They have been shown to provide better performance than more traditional data-driven machine learning methods. Barragan-Montero et al's review provides a helpful summary of the various deep learning approaches and highlights that convolutional neural networks are currently viewed as state of the art for many medical imaging applications.

Recently, generative AI has received a lot of publicity because of its use in natural language processing, and the launch of generative AI applications such as OpenAI's ChatGPT based on large language models. Such applications can generate content that may be indistinguishable from human-generated content, and have been shown to be able to pass third-year medical student exams.8

Gong et al9 have reviewed the application of generative AI in medical imaging, and its use to create models trained on existing medical image data to generate new images with similar underlying properties to real-world images. While made-up medical images might provide opportunities for fraud in medical research, it is not yet clear how these might be relevant to patient management. Gong et al, however, point out that they do provide a way of generating large volumes of training images for training traditional deep learning algorithms, from a smaller set of hand-annotated images. Generative AI may, therefore, be seen as a solution to the challenge of obtaining sufficient labelled images to train AI algorithms for medical applications. We will return to this topic later in the review, as it highlights the challenge of obtaining truly representative images for training and testing.

Regardless of the underlying algorithmic approach, a wide variety of applications of AI in medical imaging are proposed in the literature. This variety can be illustrated by dividing AI applications into the following five categories, based on the output they generate and the way this output is then used in a clinical setting:

1. Automatic delineation of a structure of interest in an image, where that structure is known to be present. This structure might be one or more chambers of the heart from MRI,10 a liver from abdominal CT,11 or the hippocampus from volumetric MRI.12
2. Detecting abnormalities in a medical image, for example, detecting lesions in mammograms,13 or glioma in MRI.14 Unlike the first category, this approach involves detecting whether an abnormality is present in the image, rather than measuring a structure that is known to be present.
3. Image enhancement, such as improving the resolution of reconstructed images, for example using deep learning super-resolution to reduce examination time while preserving image diagnostic quality.15
4. Identifying a disease-specific signature, learned from multiple image features, that could be used in the diagnosis of a disease, for example, an imaging signature of Alzheimer disease (AD) from MRI.16 Unlike categories 1 and 2, this application of AI goes beyond automating a task that could be done by a radiologist on a workstation. These approaches may use information from sources other than just the images to generate their output.
5. Predicting outcomes based on medical images, for example, predicting outcomes from ischaemic stroke,17 intracranial aneurysm rupture, in COVID-19,18 or in oncology.19 These approaches may also use information from sources other than just the images to generate their output.

These five categories of imaging AI could be considered to be in order of increasing potential to positively impact healthcare: the first two categories involve automating a task that might currently be performed by a radiologist. The third category uses AI to provide radiologists with higher-quality images than would otherwise have been available. The fourth and fifth categories, however, are more disruptive, as they provide information that could not be obtained from a traditional radiology read, and such approaches might be used to streamline clinical workflows, with AI review of images being used to directly impact patient management decisions without the input of a radiologist.

Within all these categories, it is possible to have AI algorithms that are trained once, validated, and then used ("train-then-validate"), or algorithms that, once they enter clinical use, re-train themselves using additional data received ("continuous learning"). The "train-then-validate" approach is more consistent with the traditional way that medical device software is written: the software is developed, "frozen", and then validated, with the validated device then submitted for regulatory review prior to being put on the market. Any significant subsequent update to the software would require a re-review of the technical file. The "continuous learning" approaches do not fit into this traditional medical device framework. While they provide a means to continuously improve the performance of AI tools once they enter clinical use, safeguards would be needed to ensure that the performance does not worsen with additional training, and in particular that the algorithms do not get better at generating incorrect output. As a consequence, continuously learning algorithms are at a much earlier stage in terms of impacting medical practice.

A fundamental concept in considering the application of AI in imaging is the assessment of the performance of the AI tool in a realistic clinical environment.

Evaluating the performance of AI-based medical imaging applications

There are increasing numbers of publications in the academic literature that assess the performance of AI methods in medical imaging, and report that AI methods can perform as well as or better than normal clinical practice. Examples include breast cancer detection in mammography20 and stroke.21 However, there are also publications that point out the limitations of AI methods and find that radiologists outperform

AI tools. A study comparing four commercially marketed AI tools for assessment of chest radiographs found concerning rates of false positives.22

During the COVID-19 pandemic, there was very rapid implementation and publication of AI-based methods to try to help manage patients, including through predicting outcomes from chest CT scans. Wynants et al's review of these publications18 reported that most were poorly reported and at high risk of bias, and therefore potentially of very limited clinical value.

The volume of academic literature on AI in medical imaging is vast, with tens of thousands of papers being published per year,7 but systematic reviews of these approaches often exclude the great majority for methodological reasons, suggesting only a small fraction of the work in this area may be close to clinical applicability.23 There are increasing numbers of papers that highlight the risk that many AI products may not be sufficiently safe and effective for clinical use, and encourage the development of suitable "comprehensive guidelines for their implementation".24

This demonstrates the importance of considering how the performance of AI imaging tools should be evaluated both before, and after, they are put on the market. It is quite possible that the performance an AI tool achieves while under development is not replicated when it enters clinical use.

The main challenge facing the field is often identified as the difficulty in obtaining sufficient high-quality labelled data to train the AI models. In breast imaging, for example, it has been shown that algorithms perform best when trained with large volumes of highly annotated data, with key image features all meticulously labelled by an expert annotator.20

There are two distinct phases in developing an AI tool: training and then testing. These tasks are distinct and make use of training data and test data, respectively. While there are huge numbers of medical images collected each day in clinical practice, the need for high-quality annotations means that image data for training and testing algorithms are often in short supply. As a result, developers frequently use the same dataset for both training and testing, using a method called "cross-validation". A common approach is to divide the dataset into k equal-sized subsets, then repeat the train-test cycle k times, in each case using a different subset for testing and the remaining k-1 subsets for training. Performance is assessed by averaging the performance over the k repeats. In this approach, the data used for training are different from the data used for testing, but they are not truly independent, as all the training and test data come from the same original dataset.

The alternative approach is to train the algorithm on one dataset, and test it on an independently acquired dataset, for example, collected from different hospitals or over a different time period. The systematic review by Borchert et al25 reported that "studies using an independent dataset for validation, as opposed to cross-validation, reported much lower accuracy particularly when community-based population was used". This has led many to question whether cross-validation provides a reliable measure of the performance of AI tools in clinical practice.

Medical device regulators, aware of this literature and of concerns around the performance of products on the market, have been busy over recent years responding to the rapid increase in AI applications by providing additional clarity on how medical device regulations apply to AI-enabled devices.

The AI regulatory landscape

Medical device regulations do not specifically deal with the use of AI in medical devices, often because the regulations were put in place before AI became widely used. The regulatory landscape is therefore defined by those regulations that apply to software, augmented by publications from regulators, such as guidance documents, discussion documents, and recognized standards that help manufacturers apply the medical device regulations to their products.

From a medical device regulatory point of view, the amount of performance data required to show that a medical device is safe and effective depends on what the manufacturer claims the device can do. These claims are captured not only in the device's specific regulatory documentation as intended use, intended purpose, or instructions for use, but also in associated marketing material, whether in hard copy, on the website, or in social media posts, all of which are considered by regulators to be part of the device "labelling". Earlier in this article, we illustrated five different categories of AI applications that illustrate the variety of intended uses: the type and amount of performance data required to show adequate performance will clearly vary. Rapid innovation in AI applications has resulted in increasing interest from medical device regulators, and in some cases, significant changes in the documentation required before AI imaging tools can be put on the market.

A particular feature of software medical devices, as distinct from traditional "hardware" medical devices such as a joint implant, is that the performance of the device can be significantly changed by upgrading the software, which can be undertaken much more rapidly than updating the design and performance of a hardware medical device, and with the upgrade deployed entirely remotely. And unlike traditional hardware devices, software upgrades could also change the intended use of the device, for example, from providing a radiologist with decision support through prompting for lesions in a radiograph, to finding the lesions automatically and generating a treatment plan, with no input from a radiologist.

Medical device regulators worked together under the auspices of the International Medical Device Regulators Forum (IMDRF) to publish guidance on the Clinical Evaluation of Software as a Medical Device (SaMD) that provided a risk framework and guidance on validation for SaMD.26 This introduces important concepts for manufacturers, including the "clinical association" between device output and the targeted clinical condition, as part of the clinical evaluation process required before putting the device on the market (reproduced in Table 1), and a two-dimensional risk-categorization framework that takes account both of the significance of the information provided by the software on the healthcare decision, and of the state of the healthcare situation or condition (reproduced in Table 2). While this guidance applies to machine learning software, it does not treat it in a fundamentally different way from other medical device software.

The rapid evolution of AI means that regulators are increasingly wanting to treat AI differently from traditional software, and we are therefore seeing regular publications from regulators focused on AI-enabled devices.

In a further example of collaboration between medical device regulators, the US FDA, UK MHRA, and Health Canada jointly published "Good Machine Learning Practice: guiding

Table 1. Clinical evaluation process.

Valid clinical association: Is there a valid clinical association between your SaMD output and your SaMD's targeted clinical condition?
Analytical validation: Does your SaMD correctly process input data to generate accurate, reliable, and precise output data?
Clinical validation: Does use of your SaMD's accurate, reliable, and precise output data achieve your intended purpose in your target population in the context of clinical care?

Table 2. SaMD risk categories: significance of information provided by SaMD to the healthcare decision (columns) vs state of healthcare situation or condition (rows).

State of healthcare situation or condition | Treat or diagnose | Drive clinical management | Inform clinical management
Critical | IV | III | II
Serious | III | II | I
Nonserious | II | I | I
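The two-dimensional categorization in Table 2 is simple enough to encode directly. The sketch below is purely illustrative: the function name and string keys are our own, not from the IMDRF guidance.

```python
# Illustrative encoding of the IMDRF SaMD risk categories in Table 2.
# Category IV carries the highest risk, category I the lowest.

SAMD_CATEGORY = {
    ("critical",   "treat or diagnose"):          "IV",
    ("critical",   "drive clinical management"):  "III",
    ("critical",   "inform clinical management"): "II",
    ("serious",    "treat or diagnose"):          "III",
    ("serious",    "drive clinical management"):  "II",
    ("serious",    "inform clinical management"): "I",
    ("nonserious", "treat or diagnose"):          "II",
    ("nonserious", "drive clinical management"):  "I",
    ("nonserious", "inform clinical management"): "I",
}

def samd_category(state_of_condition: str, significance: str) -> str:
    """Look up the SaMD risk category for a device's intended use."""
    return SAMD_CATEGORY[(state_of_condition.lower(), significance.lower())]
```

For example, software that drives clinical management of a critical condition falls into category III, while software that merely informs clinical management of a nonserious condition falls into category I.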

Table 3. Good machine learning practice for medical device development: guiding principles.

1. Multi-disciplinary expertise is leveraged throughout the total product life cycle
2. Good software engineering and security practices are implemented
3. Clinical study participants and datasets are representative of the intended patient population
4. Training datasets are independent of test sets
5. Selected reference datasets are based upon available methods
6. Model design is tailored to the available data and reflects the intended use of the device
7. Focus is placed on the performance of the Human-AI team
8. Testing demonstrates device performance during clinically relevant conditions
9. Users are provided clear and essential information
10. Deployed models are monitored for performance and re-training risks are managed
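The Table 3 principle that training datasets are independent of test sets connects directly to the cross-validation discussion earlier in this article. The sketch below illustrates both ideas in plain Python: a k-fold split of the kind described above, and a patient-level independence check. The record layout with a patient_id field is our own illustrative assumption, not something prescribed by the guidance.

```python
# Illustrative sketches of two ideas from the text: k-fold
# cross-validation splitting, and checking that training and test
# data are independent at the patient level.

def kfold_indices(n_samples, k):
    """Divide sample indices into k roughly equal folds; each fold is
    used once for testing while the remaining k-1 folds train."""
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)
    return folds

def patients_are_independent(train_records, test_records):
    """True only if no patient contributes data to both sets, a
    stronger requirement than the fold-level split above."""
    train_ids = {r["patient_id"] for r in train_records}
    test_ids = {r["patient_id"] for r in test_records}
    return train_ids.isdisjoint(test_ids)
```

Note that k-fold splitting alone does not guarantee patient-level independence: two images of the same patient can land in different folds, which is one reason cross-validated performance can overestimate performance on a truly independent dataset.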

principles" in October 2021.27 This is a short document that captures some aspects of good practice in the development of medical devices that incorporate machine learning. Table 3 reproduces these guiding principles.

While these guiding principles are helpful, for example in stating that independent test data (rather than cross-validated methods) should be used, it is not in all cases clear how to show compliance. In order to provide greater clarity to developers, the FDA recognized as a "consensus standard" a guidance document published by AAMI, CR34971:2022, for the application of the established medical device risk management standard, ISO 14971, to medical devices incorporating AI and machine learning. This document has subsequently been released by BSI as BS/AAMI 34971:2023, demonstrating its international impact. This publication starts with a cautionary note:

Despite the sophistication and complicated methodologies employed, machine learning systems can introduce risks to safety by learning incorrectly, making wrong inferences, and then recommending or initiating actions that, instead of better outcomes, can lead to harm.

The amplification of errors in an AI system has the potential to create large scale harm to patients.

With medical devices without AI, risk can be assessed from real-world experience with that technology. With AI-enabled medical devices, however, that experience is lacking … . it may be more complex to identify risks and bias since the algorithmic decision pathways may be challenging to interpret.

Risk management in medical devices is already focused on possible harm to patients and the hazardous situation that can give rise to that harm. This AAMI publication highlights the fact that AI introduces new possible hazards that are not properly covered by current product development methodology for "rule-based" algorithms, and provides a detailed recipe for how to handle risk in AI software. Table 4 gives the risks highlighted in this document.

The FDA is arguably the leading medical device regulator in providing guidance for device developers and manufacturers on their AI-enabled devices. While technically the FDA's jurisdiction is limited to the United States, several other jurisdictions provide fast-track means for FDA-cleared or approved devices to be put on the market in their own countries. Most recently, the UK MHRA has announced plans for such a recognition route to enable FDA-cleared and approved devices to be sold in the UK.

A paper authored by employees at the FDA was recently published, focusing specifically on regulatory concepts and challenges for AI-enabled medical imaging devices.28 This article emphasizes how radiology has been a pioneer in adopting AI-enabled medical devices in a clinical environment, but also highlights how these devices "come with unique challenges", including the need for large and representative datasets, dealing with bias, understanding impact on clinical workflows, and maintaining safety and efficacy over time.

One key innovation from the FDA is the concept of "Predetermined Change Control Plans for Artificial Intelligence/Machine Learning-enabled Medical Devices". This idea was proposed in the FDA "Artificial Intelligence/Machine Learning Software as a Medical Device Action Plan" in January 2021,29 and in 2023 a draft guidance was published30 that describes how this approach would be used to provide

Table 4. Risk categories for AI/ML medical devices, to be incorporated in ISO 14971 risk analysis.

Data quality: incorrect data; incomplete data; subjective data; underfitting; overfitting; proxy measure
Bias: selection bias; confounding variables; non-normality; proxy variables; implicit bias; group attribution bias; experimental bias
Data storage/security/privacy: privacy failures; bias due to privacy; inability to contact patient
Overtrust: overconfidence; perceived risk; user workload; self-confidence; variation in social trust
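One way a developer might operationalize Table 4 is to seed an ISO 14971-style risk register with one open entry per hazard, each awaiting analysis of harm, severity, and mitigation. The sketch below is our own illustration; the field names are not taken from the standard.

```python
# Sketch: seeding a risk register with the AI/ML hazard categories
# of Table 4. Field names are illustrative, not from ISO 14971.

AI_HAZARD_CATEGORIES = {
    "data quality": ["incorrect data", "incomplete data", "subjective data",
                     "underfitting", "overfitting", "proxy measure"],
    "bias": ["selection bias", "confounding variables", "non-normality",
             "proxy variables", "implicit bias", "group attribution bias",
             "experimental bias"],
    "data storage/security/privacy": ["privacy failures", "bias due to privacy",
                                      "inability to contact patient"],
    "overtrust": ["overconfidence", "perceived risk", "user workload",
                  "self-confidence", "variation in social trust"],
}

def seed_risk_register():
    """Create one open register entry per hazard; harm, severity, and
    mitigation are filled in during the subsequent risk analysis."""
    return [
        {"category": cat, "hazard": hz, "harm": None,
         "severity": None, "mitigation": None, "status": "open"}
        for cat, hazards in AI_HAZARD_CATEGORIES.items()
        for hz in hazards
    ]
```

Structuring the hazards this way makes it straightforward to verify that every hazard named in the guidance has a corresponding documented analysis before release.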

Table 5. Issues to be addressed in ensuring safe and effective use of In addition to taking account of publications from medical
AI tools. device regulators on AI-enabled devices, developers need to take
account of other relevant regulations such as data privacy and
Human-led governance, accountability, and transparency

Downloaded from https://academic.oup.com/bjr/article/97/1155/483/7510846 by guest on 07 March 2025


Ensure adherence to legal and ethical values, where the European Union Artificial Intelligence act, which rather like
accountability and transparency are essential for the the FDA discussion paper referred to above, puts in place
development of trustworthy AI requirements for transparency and governance around AI.
Quality, reliability, and representativeness of data
Bias
Privacy and security AI-enabled medical imaging devices on
Provenance of data the market
Relevance
Replicability As stated in the Introduction section, data published by the
Model development, performance, monitoring, and validation FDA5 show that more than three-quarters of AI-enabled medi-
In balancing performance and explainability, it may be cal devices that received marketing authorization up to the end
important to consider the complexity of the AI model of July 2023 are for applications in radiology, with cardiology
applications such as arrhythmia detection from ECG being sec-
ond largest application at 10% of devices. Two-thirds of the ra-
what the FDA describes as “a science-based approach to ensur- diology devices received their marketing authorization in the
ing that AI/ML-enabled devices can be safely, effectively, and three years between August 2020 and July 2023. A spreadsheet
rapidly modified, updated, and improved in response to new of these 350 AI-enabled radiology devices was downloaded
data”. Predetermined change control plans do not provide for from the FDA website, and sorted based on the FDA product
“continuous improvement” in the way many AI proponents argue for, but it does provide a means by which manufacturers of AI-enabled medical devices can optionally submit, with their 510(k) or De Novo submission, a document that describes the sorts of change that can be made to the device without re-review by the FDA, including how the associated risks are mitigated.

The FDA also published a “discussion document” in 2023 on the use of AI and machine learning in the development of drug and biological products.6 While this document is not focused on medical devices and, as a discussion document, is less formal than a guidance document, it does give further insights into thinking within the FDA on the role of regulators in the application of AI in medical applications, and it raises the issues that need to be addressed to ensure safe and effective use of AI tools. These are summarized in the categories listed in Table 5.

The UK MHRA, which is in the process of updating its medical device regulatory structure following the UK’s departure from the European Union, is also considering the implications of AI for medical device pathways. One possible route it is considering is the so-called “airlock process”,31 which provides a means to put some devices on the market with limited pre-market performance data:

   Some manufacturers of innovative products that meet a critical unmet clinical need may struggle to generate evidence in the premarket phase. Accordingly, this process will allow software to generate real world evidence for a limited period of time while being continuously monitored.

This proposal is not yet implemented, and, to be of real value to device developers, it will need to link in with other international regulatory approaches.

code. The FDA product classification database was then used to cross-reference product code against the type of device, to determine whether the devices are hardware based (eg, image acquisition devices) or software only, and whether these product codes are specific to AI-enabled products. The FDA 510(k) and 513(f) De Novo databases32,33 were then searched to find the indications for use of selected devices, to identify the intended radiological application and how the devices are intended to fit into clinical workflows.

In Tables 6 and 7, we summarize the results of this analysis. For each type of device, we tabulate the number of such devices cleared or approved by the FDA, and the associated product code. Table 6 lists the 124 AI-enabled hardware devices. None of these types of devices are AI specific, but some manufacturers are incorporating AI in these devices to provide features such as automatic delineation of image features.

Table 7 summarizes how the 226 AI-enabled software medical devices break down by type of device and product code. Most of these types of software device, unlike the hardware devices in Table 6, have an associated definition in the FDA product code database, which is summarized in the right-hand column of the table. For product code LLZ, which is older than the others, no such definition is provided in the FDA product code database, so we have provided one in italics for comparability. In addition, most of these product codes are specific to AI-enabled devices.

Tables 6 and 7 illustrate the wide variety of radiology hardware and software products that have recently been cleared or approved by the FDA. They cover a variety of imaging modalities and clinical applications, and some treatment devices.
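The cross-referencing of the FDA AI/ML-enabled device list against the product classification database described above can be sketched in a few lines of pandas. This is a minimal illustration of the method, not the author’s actual analysis pipeline: the submission numbers, rows, and column names below are invented for the example, and real exports of the two FDA databases would first need to be downloaded and parsed into data frames of this shape.

```python
import pandas as pd

# Hypothetical extract of the FDA AI/ML-enabled device list:
# one row per cleared or approved device, keyed by product code.
ai_devices = pd.DataFrame({
    "submission_number": ["K211234", "K215678", "K219876", "DEN210042"],
    "product_code": ["QIH", "QIH", "LLZ", "QAS"],
})

# Hypothetical extract of the FDA product classification database,
# mapping each product code to its device type.
classification = pd.DataFrame({
    "product_code": ["QIH", "LLZ", "QAS", "JAK"],
    "device_name": [
        "Automated radiological image processing software",
        "System, image processing, radiological",
        "Radiological computer-assisted triage and notification software",
        "Computed tomography x-ray system",
    ],
})

# Cross-reference each device's product code against the classification
# database, then count devices per product code (cf. Tables 6 and 7).
merged = ai_devices.merge(classification, on="product_code", how="left")
counts = (
    merged.groupby(["product_code", "device_name"])
    .size()
    .reset_index(name="number")
    .sort_values("number", ascending=False, ignore_index=True)
)
print(counts)
```

Using how="left" in the merge keeps any device whose product code is missing from the classification table visible as a row with an empty device name, rather than silently dropping it, which is the safer choice when auditing a regulatory dataset.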
488 BJR, 2024, Volume 97, Issue 1155
Table 6. AI-enabled hardware radiology devices cleared by FDA August 2021 to July 2024.

Type of device | Number | Product code
Ultrasonic pulsed Doppler imaging system | 28 | IYN
Ultrasonic pulsed echo imaging system | 1 | IYO
Mobile X-ray system | 1 | IZL
Computed tomography X-ray system | 38 | JAK
Emission computed tomography | 8 | KPS
Magnetic resonance diagnostic device | 26 | LNH
Stationary X-ray system | 3 | MQB
Densitometer, bone | 1 | KGI
Image-intensified fluoroscopic X-ray system | 2 | OWB
Optoacoustic imaging system | 1 | QNK
Medical charged-particle radiation therapy system | 15 | MUJ

Table 7. AI-enabled software radiology devices cleared by FDA August 2021 to July 2024.

Type of device | Number | Product code | AI specific | Definition summary
System, image processing, radiological | 56 | LLZ | N | Image visualization, enhancement, and segmentation
Lung CT computer-aided detection | 2 | OEB | N | To assist radiologists in the review of multi-slice CT of the chest and highlight potential nodules that the radiologist should review
Liver iron concentration imaging companion diagnostic for deferasirox | 1 | PCS | N | The determination of iron in the liver for any indication where an assessment of liver iron concentration is needed
Medical image analyser | 9 | MYN | N | Now mainly reclassified to Class II
Computer-assisted diagnostic software for lesions suspicious for cancer | 5 | POK | N | Assist users in characterizing lesions identified on acquired medical images
Radiological computer-assisted triage and notification software | 26 | QAS | Y | Aid in prioritization and triage of time-sensitive patient detection and diagnosis based on the analysis of medical images
Radiological computer-assisted prioritization software for lesions | 19 | QFM | Y | To aid in prioritization and triage of time-sensitive patient detection and diagnosis based on the analysis of medical images acquired from radiological signal acquisition systems
Radiological computer-assisted detection/diagnosis software for lesions suspicious for cancer | 10 | QDQ | Y | To aid in the detection, localization, and characterization of lesions suspicious for cancer on acquired medical images (eg, mammography, MR, CT, ultrasound, radiography)
Automated radiological image processing software | 68 | QIH | (Y) | Automated radiological image processing and analysis tools. Software implementing artificial intelligence including non-adaptive machine learning algorithms trained with clinical and/or artificial data
Radiological image processing software for radiation therapy | 23 | QKB | Y | Semi-automatic or fully automated radiological image processing and analysis tools for radiation therapy
Radiological computer-assisted detection/diagnosis software for fracture | 4 | QBS | Y | Aid in the detection, localization, and/or characterization of fracture on acquired medical images (eg, radiography, MR, CT)
Image acquisition and/or optimization guided by artificial intelligence | 2 | QJU | Y | Aid in the acquisition and/or optimization of images and/or diagnostic signals
Radiological machine learning-based quantitative imaging software with change control plan | 1 | QVD | Y | Software-only device which employs machine learning algorithms on radiological images to provide quantitative imaging outputs

The largest numbers of devices are for product codes QIH (Automated Radiological Image Processing Software) and LLZ (System, Image Processing, Radiological: the product code historically used for PACS workstations). These AI-enabled devices provide more sophisticated image segmentation, enhancement, manipulation, and visualization tools than traditional radiology PACS workstations.

The product code with the third greatest number of devices in Table 7 is QAS, Radiological Computer-Assisted Triage and Notification Software. Examination of the indications for use of these devices showed that 69% are for analysis of head CT for the triage of patients with acute brain injury (intracranial haemorrhage, stroke, brain trauma, large vessel occlusion), with most of the rest for use of chest CT for triage of patients with possible pulmonary embolism or aortic dissection. The indications for use of these devices, however, emphasize that they need to be used under the supervision of imaging experts, with statements such as “not intended to be used as a primary diagnostic device”, “notified clinicians are ultimately responsible for reviewing full images per the standard of care”, and to be used “in parallel with standard of care”. This indicates how, for AI-enabled medical imaging devices, the risks of inadequate performance currently need to be mitigated by ensuring significant clinical supervision.

While we are seeing increasing numbers of AI-enabled radiology devices coming onto the market, this analysis shows that the impact of these recently marketed devices on clinical practice is likely to be more incremental than disruptive: more an adjunct to current radiological workflows than a significant change to those workflows.

Discussion
There have been large numbers of publications on applications of AI and machine learning to medical imaging and radiology, and hundreds of medical devices based on machine learning and AI tools have been placed on the market. This rapid innovation, however, has highlighted some important challenges that the field needs to address in order for these innovative tools to be trusted by patients and healthcare professionals. In particular, there is increasing evidence that poorly implemented AI could lead to patient harm, and there is a need to identify and mitigate the underlying risks.

Two key challenges for the field are dealing with bias that might detrimentally impact real-world performance, and ensuring that the output is relevant to clinical care, that is, clinically meaningful. These challenges are illustrated by the role that publicly available datasets have played in catalysing innovation in AI algorithms. There is now a wide range of publicly available datasets that can be used to train machine learning image analysis algorithms, and here we will in particular consider the UK Biobank and the Alzheimer’s Disease Neuroimaging Initiative (ADNI). These datasets have driven a lot of high-quality science, but they do not include a representative sample of the general population, and they illustrate the problem of bias in the data used to train imaging AI models.

Bias
Petrick et al28 reported that a particular concern of regulators is that the studies used to evaluate performance are “often based on limited patient, group and site diversity”, and it is not clear how these results generalize to actual clinical practice.

Large publicly available datasets, such as the UK Biobank and the ADNI dataset, are skewed in terms of demographics. Fry et al reported that “UK Biobank participants were more likely to be older, to be female, and to live in less socioeconomically deprived areas than nonparticipants”.34 Borchert et al25 undertook a systematic review in which they considered the role of ADNI in published algorithms using AI for diagnosis and prognosis in neurodegenerative disease. They reported that 71% of these algorithms rely on the ADNI data, which introduces multiple sources of bias. They go on to argue, firstly, that there are “potential ethnic and socio-economic biases … that may hamper generalization”, and, in addition, that the image acquisition may be unrepresentative of current clinical data collection, introducing a bias related to the data. Similar issues with bias in training sets have been raised for AI-enabled computer-aided diagnosis (CAD) in mammography,35 including unrepresentative patient populations, image acquisition protocols, and vendors.

The skewed nature of these training datasets illustrates the importance of considering the sources of bias presented in Table 4 in the development and validation of AI-enabled medical devices, and the need to use independent training and test datasets, with the test datasets being representative of the intended population, to ensure that relevant performance data are obtained. While artificial data can be used as part of the assessment of AI-enabled device performance (eg, FDA product code QIH in Table 7), there is a risk that if generative AI is used to simulate large numbers of additional training or test data based on these biased datasets, this bias will be amplified.

Clinical meaningfulness
The widespread availability of well-curated public databases has catalysed the innovation of AI tools, but a perverse consequence is that they encourage algorithm developers to focus on problems implicit in the datasets, rather than on challenges in clinical care. For example, many authors developing algorithms trained on the ADNI dataset demonstrate that they can separate subjects who are “normal”, “mild cognitive impairment” (MCI), or “Alzheimer’s Disease”, or that they can accurately predict conversion of MCI to early AD. However, not only do the patients enrolled in ADNI not represent the typical patient population in a community memory clinic, but these sorts of classifiers may not be relevant to addressing a clinically meaningful question. For example, if a patient arrives in a memory clinic with impaired memory, the question is not likely to be “does this patient have MCI or AD”, but “what is the underlying pathology causing these symptoms”, as that can impact subsequent management. Borchert et al reported that in their systematic review “We found no studies that assessed the common clinical challenge of differential diagnosis from among multiple (>2) possible diagnoses”, which is quite a strong critique of the field.

The regulatory framework for AI-enabled medical devices described in this article has relevance to addressing these sorts of limitations in academic AI tool development. The Clinical Evaluation SaMD framework helps clearly define the need to evaluate performance in the context of clinical care; Good Machine Learning Practice makes clear the importance of independent datasets for testing and validation (a single dataset such as UK Biobank or ADNI should not be used for both training and testing); and the FDA-recognized consensus standard AAMI CR34971:2022 provides a detailed framework for identifying and mitigating risks such as bias in AI-enabled devices.

Conclusions
Artificial intelligence has already demonstrated that it has great potential to enable novel and valuable medical technologies, and the great majority of AI-enabled medical devices marketed are for medical imaging applications. However, as the examples given earlier in this article illustrate, the literature contains many papers that justify the medical device regulators’ position that these methodologies introduce risks that are different, and in many cases greater, than the risks present in traditional “rule-based” software medical devices. As a consequence, AI-enabled devices on the market mitigate these risks with indications for use that require them to be used under expert supervision, often in parallel with current clinical practice, reducing their likely impact on clinical practice. For AI to have a greater clinical impact, developers of AI-enabled medical imaging tools need to provide more rigorous risk analysis and performance assessment than the traditional software methods that are already on the market.

Radiologists and their professional bodies have a key role to play in helping imaging AI researchers and device developers to put in place more rigorous frameworks for developing medical imaging AI devices, and for monitoring their performance on the market in clinical practice. Radiologists and their radiographer and medical physics colleagues have a detailed understanding of the variation in patient presentation, the impact of artefacts, variability due to radiographic practice, and variability caused by different imaging device manufacturers and acquisition parameters, which is of great value in
helping identification and mitigation of risks in AI medical imaging tool development.

Also, as the technology evolves, the regulatory landscape is likely to continue to evolve; in particular, the ways in which AI software can be updated once on the market, and the ways in which the balance of pre-market and post-market performance data can be used to demonstrate safety, are likely to evolve in the near future.

The evolving regulatory landscape can be criticized for providing developers with a “moving target” by rapidly changing the documentation required before AI-enabled medical devices can be put on the market, thus providing a barrier to innovation. However, it is also arguable that regulators are being agile in providing developers with increasing clarity on how to manage risk and assess the performance of these evolving technologies, so as to enable safe and effective AI-enabled medical devices to reach patients. As always with the regulation of medical devices, there is a balance to be struck between enabling innovation and ensuring patient safety, and AI will continue to challenge the existing regulatory frameworks. Academic researchers developing AI-enabled devices should also familiarize themselves with this regulatory framework to improve the quality of their publications, and to facilitate the transition of research output into products that can positively impact patient management.

Funding
None declared.

Conflicts of interest
D.L.G.H. was co-founder and remains an advisor to IXICO plc. D.L.G.H. is CEO of Panoramic Digital Health SASU and Director of Panoramic Digital Health Ltd. D.L.G.H. undertakes consultancy for a number of pharmaceutical companies and medical device companies, including in the area of medical imaging AI.

References
1. Kahn CE. Artificial intelligence in radiology: decision support systems. Radiographics. 1994;14(4):849-861.
2. Scott R. Artificial intelligence: its use in medical diagnosis. J Nucl Med. 1993;34(3):510-514.
3. Woods W, Uckun S, Kohane I, et al. AAAI 1994 Spring Symposium Series Reports. AI Mag. 1994;15(3):22-22.
4. Muehlematter UJ, Bluethgen C, Vokinger KN. FDA-cleared artificial intelligence and machine learning-based medical devices and their 510(k) predicate networks. Lancet Digit Health. 2023;5(9):e618-e626.
5. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices [Internet]. FDA; 2023. Accessed November 24, 2023. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices
6. Using Artificial Intelligence & Machine Learning in the Development of Drug and Biological Products [Internet]. FDA. Accessed October 10, 2023. https://www.fda.gov/media/167973/download
7. Barragán-Montero A, Javaid U, Valdes G, et al. Artificial intelligence and machine learning for medical imaging: a technology review. Phys Med. 2021;83:242-256.
8. Gilson A, Safranek CW, Huang T, et al. How does ChatGPT perform on the United States Medical Licensing Examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312.
9. Gong C, Jing C, Chen X, et al. Generative AI for brain image computing and brain network computing: a review. Front Neurosci. 2023;17:1203104.
10. Avendi MR, Kheradvar A, Jafarkhani H. Automatic segmentation of the right ventricle from cardiac MRI using a learning-based approach. Magn Reson Med. 2017;78(6):2439-2448.
11. Perez AA, Noe-Kim V, Lubner MG, et al. Deep learning CT-based quantitative visualization tool for liver volume estimation: defining normal and hepatomegaly. Radiology. 2021;302(2):336-342.
12. Balboni E, Nocetti L, Carbone C, et al. The impact of transfer learning on 3D deep learning convolutional neural network segmentation of the hippocampus in mild cognitive impairment and Alzheimer disease subjects. Hum Brain Mapp. 2022;43(11):3427-3438.
13. Ranjbarzadeh R, Dorosti S, Jafarzadeh Ghoushchi S, et al. Breast tumor localization and segmentation using machine learning techniques: overview of datasets, findings, and methods. Comput Biol Med. 2023;152:106443.
14. van Kempen EJ, Post M, Mannil M, et al. Performance of machine learning algorithms for glioma segmentation of brain MRI: a systematic literature review and meta-analysis. Eur Radiol. 2021;31(12):9638-9653.
15. Bischoff LM, Peeters JM, Weinhold L, et al. Deep learning super-resolution reconstruction for fast and motion-robust T2-weighted prostate MRI. Radiology. 2023;308(3):e230427.
16. Popuri K, Ma D, Wang L, Beg MF. Using machine learning to quantify structural MRI neurodegeneration patterns of Alzheimer’s disease into dementia score: independent validation on 8,834 images from ADNI, AIBL, OASIS, and MIRIAD databases. Hum Brain Mapp. 2020;41(14):4127-4147.
17. Jabal MS, Joly O, Kallmes D, et al. Interpretable machine learning modeling for ischemic stroke outcome prediction. Front Neurol. 2022;13:884693.
18. Wynants L, Calster BV, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369:m1328.
19. Dhiman P, Ma J, Andaur Navarro CL, et al. Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review. BMC Med Res Methodol. 2022;22(1):101.
20. Lotter W, Diab AR, Haslam B, et al. Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nat Med. 2021;27(2):244-249.
21. Brugnara G, Baumgartner M, Scholze ED, et al. Deep-learning based detection of vessel occlusions on CT-angiography in patients with suspected acute ischemic stroke. Nat Commun. 2023;14(1):4938.
22. Lind Plesner L, Müller FC, Brejnebøl MW, et al. Commercially available chest radiograph AI tools for detecting airspace disease, pneumothorax, and pleural effusion. Radiology. 2023;308(3):e231236.
23. Kelly BS, Judge C, Bollard SM, et al. Radiology artificial intelligence: a systematic review and evaluation of methods (RAISE). Eur Radiol. 2022;32(11):7998-8007.
24. Grewal H, Dhillon G, Monga V, et al. Radiology gets chatty: the ChatGPT saga unfolds. Cureus. 2023;15(6):e40135.
25. Borchert RJ, Azevedo T, Badhwar A, et al.; Deep Dementia Phenotyping (DEMON) Network. Artificial intelligence for diagnostic and prognostic neuroimaging in dementia: a systematic review. Alzheimers Dement. 2023;19(12):5885-5904.
26. Software as a Medical Device (SaMD): Clinical Evaluation—Guidance for Industry and Food and Drug Administration Staff [Internet]. 2017. Accessed October 10, 2023. https://www.fda.gov/media/100714/download
27. Good Machine Learning Practice for Medical Device Development: Guiding Principles [Internet]. FDA; 2021. Accessed October 10, 2023. https://www.fda.gov/media/153486/download
28. Petrick N, Chen W, Delfino JG, et al. Regulatory considerations for medical imaging AI/ML devices in the United States: concepts and challenges. J Med Imaging (Bellingham). 2023;10(5):051804.
29. Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan [Internet]. FDA. Accessed October 10, 2023. https://www.fda.gov/media/145022/download
30. Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence/Machine Learning (AI/ML)-Enabled Device Software Functions [Internet]. Accessed October 10, 2023. https://www.fda.gov/media/166704/download
31. Software and AI as a Medical Device Change Programme—Roadmap [Internet]. MHRA; 2023. Accessed October 10, 2023. https://www.gov.uk/government/publications/software-and-ai-as-a-medical-device-change-programme/software-and-ai-as-a-medical-device-change-programme-roadmap
32. FDA 510(k) Premarket Notification [Internet]. Accessed November 24, 2023. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm
33. FDA Device Classification under Section 513(f)(2) (De Novo) [Internet]. Accessed November 24, 2023. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/denovo.cfm
34. Fry A, Littlejohns TJ, Sudlow C, et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am J Epidemiol. 2017;186(9):1026-1034.
35. Chan HP, Samala RK, Hadjiiski LM. CAD and AI for breast cancer-recent development and challenges. Br J Radiol. 2020;93(1108):20190580.
