The structural similarity index for IMRT quality assurance: radiomics-based error classification
Chaoqiong Ma*, Ruoxi Wang*, Shun Zhou, Meijiao Wang, and Haizhen Yue
Key laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology,
Peking University Cancer Hospital & Institute, Beijing 100142, China
Yibao Zhang and Hao Wu a)
Key laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology,
Peking University Cancer Hospital & Institute, Beijing 100142, China
Institute of Medical Technology, Peking University Health Science Center, Beijing 100191, China
(Received 30 June 2020; revised 3 September 2020; accepted for publication 15 October 2020;
published 27 November 2020)
Purpose: The application of radiomics and machine learning (ML) techniques to two-dimensional gamma maps has been demonstrated to be superior to conventional gamma analysis for error identification in intensity modulated radiotherapy (IMRT) quality assurance (QA). Recently, the Structural SIMilarity (SSIM) sub-index maps were shown to be able to reveal the error types of the dose distributions. In this study, we aimed to apply radiomics analysis to SSIM sub-index maps and develop ML models to classify delivery errors in patient-specific dynamic IMRT QA.
Methods: Twenty-one sliding-window IMRT plans comprising 180 beams for three treatment sites were involved in this study. Four types of machine-related errors of various magnitudes were simulated for each beam at each control point, including monitor unit (MU) variations, same-directional and opposite-directional shifts of the multileaf collimators (MLCs), and random mispositioning of the MLCs. In the QA process, a total of 1620 portal dose (PD) images were acquired for the beams with and without errors. The predicted PD images of the original beams were set as references. To quantify the agreement between a measured PD image and the corresponding predicted PD image, four difference maps, including three SSIM sub-index maps and one dose difference-derived map, were calculated. Then, radiomic features were extracted from the four difference maps of each measured PD image. We tested four typical classifiers, including a linear discriminant classifier (LDC), two support vector machine (SVM) classifiers, and a random forest (RF), for this multiclass classification task. A nested cross-validation scheme was used for model evaluation, where the SVM recursive feature elimination method was applied for feature selection. Finally, the performance of the ML model in identifying the error-free and the erroneous cases was compared to that of the conventional gamma analysis.
Results: The statistics of the selected features showed that all of the difference maps and the feature categories made balanced contributions to solving this classification task. The best performance was achieved by the Linear-SVM model, with an average overall classification accuracy of 0.86. Specifically, the average classification accuracies of the shift, opening, and random errors were around 0.9. Moreover, ~80% of error-free and MU errors were correctly classified. Using gamma analysis, the 3 mm/3% criterion was found insensitive to errors (sensitivity was only 0.33). Although the sensitivity to errors with the 2 mm/2% criterion increased to 0.79, it was still 8% worse than that of the ML model.
Conclusions: We proposed an ML-based method for machine-related error identification in patient-specific dynamic IMRT QA, where radiomic analysis of SSIM sub-index maps was used for feature extraction. With extensive validation to select the best features and classifiers, high accuracies in error classification were achieved. Compared with the conventional gamma threshold method, this approach has great potential in error identification for the patient-specific IMRT QA process. © 2020 American Association of Physicists in Medicine [https://doi.org/10.1002/mp.14559]

Key words: IMRT QA, machine learning, quality assurance, radiomics, SSIM analysis

1. INTRODUCTION

Intensity modulated radiation therapy (IMRT) is a widely used treatment modality to provide highly conformal dose distribution to the target while sparing surrounding healthy tissues. Promising performance of IMRT is achieved by employing asymmetric and irregular apertures defined by multileaf collimators (MLCs). To ensure safety in complex treatment deliveries, patient-specific quality assurance (QA) prior to the IMRT treatment is required to check consistency between the planned and the delivered dose distributions, and potential relevant errors within the actual delivery.1,2 In a regular clinical setting, the patient-specific IMRT QA procedure is performed by

80 Med Phys 48 (1), January 2021 0094-2405/2021/48(1)/80/14 © 2020 American Association of Physicists in Medicine 80
81 Ma et al.: SSIM analysis of errors in IMRT QA 81

measurements with film,3 detector array,4 or electronic portal imaging device (EPID).5

Gamma analysis has been widely accepted as a quantitative tool to evaluate the agreement between planned and measured dose distributions. In clinical applications, the gamma pass rate is calculated under a criterion that combines thresholds in dose difference (DD) and distance-to-agreement (DTA) to assess the clinical acceptability of IMRT plans.6 However, it has been reported that the gamma analysis is insensitive to dose errors and the results do not exhibit correlations with the clinical dose errors.7–9 Additionally, it is difficult to identify the root cause of an existing discrepancy between dose distributions, owing to the fact that the spatial information is discarded in the gamma pass rate.

To overcome the limitations of conventional gamma analysis, attempts have been made to establish the relationship between the observed spatial dose discrepancy and the underlying error types in IMRT QA. Radiomics analysis and convolutional neural networks (CNNs) were employed to extract features from the gamma maps between calculated planar dose maps and reconstructed planar dose maps based on EPID measurements.10,11 Based on the radiomic features extracted from gamma maps, machine learning (ML) models were developed to identify MLC leaf errors. The applications of radiomics and CNN to the analysis of gamma maps in IMRT QA have been demonstrated to provide complementary information to traditional gamma analysis. In addition to gamma maps, DD maps were created from volumetric modulated arc therapy (VMAT) QA and analyzed using CNN models.12 Higher accuracy was found in the classification of MLC positional errors using the DD maps compared with the gamma maps in VMAT QA.

Recently, the Structural SIMilarity (SSIM) index, an image-quality metric widely used as an objective perceptual measure in the image processing field, was introduced into the radiotherapy field.13 The SSIM index was developed to measure the similarity between two images, modeling any discrepancy as a combination of loss in correlation, luminance difference, and contrast difference. The capability of the SSIM analysis to reveal different types of errors between two dose distributions has been demonstrated by Peng et al.14 The sub-indices of SSIM (luminance, contrast, and structure) were shown to be capable of detecting absolute dose error, gradient discrepancy, and dose structure error, respectively. Most importantly, the SSIM sub-index maps could not only indicate the location of large discrepancies, but also demonstrate different types of error-related patterns, which makes the SSIM analysis a potential tool for error identification in IMRT QA. The only study on implementing the SSIM index for intra-fraction patient verification showed higher sensitivity to patient positional and anatomical variations than that of the conventional gamma analysis.15 However, there is a lack of feasibility studies on applying the SSIM sub-index maps to IMRT QA for error identification.

In this study, we aimed to develop a method for error detection and classification in patient-specific dynamic IMRT QA using image features extracted from the SSIM sub-index maps. Different from the gamma maps, the SSIM not only reflects perceptual image differences, but also possesses several independent components describing different local patterns of two images. We assert that including multidimensional difference measures from the SSIM maps would lead to substantial improvement in error classification based on the planar QA dose images. To verify this assumption, several classification models were trained based on the SSIM sub-index maps of EPID images with or without simulated errors. The performance of the models was assessed and compared with conventional gamma analysis in terms of error detection rate. To the authors' knowledge, this is the first work in patient-specific IMRT QA error classification combining the SSIM measures with the radiomics analysis.

2. MATERIALS AND METHODS

The main hypothesis of this study was that the machine-related errors in treatment plan delivery can be detected and distinguished by quantifying the agreement between the measured and the predicted portal dose (PD) images of a beam using a radiomics-based ML model in patient-specific QA. As is shown in Fig. 1, our data analysis pipeline is composed of four major steps: error simulation, image preprocessing, radiomic feature extraction, and model evaluation.

• Error simulation: Four types of machine-related errors were simulated for each beam of the selected IMRT plans. The PD images were acquired for the plans with and without introduced errors. The predicted PD images for the original plans were regarded as references.
• Image preprocessing: To quantify the effects of the simulated errors, the difference maps between the measured PD image and the predicted error-free PD image were calculated: three from the SSIM sub-indices and one from a DD-derived metric. The rationale for introducing an additional metric of a Gaussian-transformed DD was the reported insensitivity of the SSIM sub-indices to small absolute luminance changes.13,14 Thus, four difference maps were obtained for each measured PD image.
• Radiomic feature extraction: As the input of the classification model, radiomic features were extracted from each difference map. Since the radiomics analysis is typically performed within regions of interest (ROIs), the ROI in each difference map was the jaw-defined field projection on the EPID.
• Feature selection method and classifiers: A recursive feature elimination (RFE) method coupled with a support vector machine (SVM), SVM-RFE, was used for the feature selection. We tested four classifiers, including linear discriminant classifier (LDC), SVM with linear function kernel (Linear-SVM), SVM with radial basis function kernel (RBF-SVM), and random forest (RF), for this classification task.


• Model evaluation: To select the optimal model, the classification accuracy of each model was assessed based on a nested cross validation (CV) scheme.

In the following parts, each step of the pipeline is described in detail. In this work, we chose the EPID as the dose image acquisition modality in consideration of both efficiency and reproducibility. Since the measurement and the calculation of the PD images are essential in the process of dataset creation, the EPID calibration procedure and the PD image prediction algorithm are introduced first.

2.A. Portal dosimetry

In this study, all PD measurements were acquired with a Varian aS1200 EPID, mounted on a Varian VitalBeam LINAC equipped with a Millennium 120 leaf MLC (Varian Medical Systems, Palo Alto, CA). This imager had an active area of 40 × 40 cm² with 1190 × 1190 pixel arrays and 0.336 mm pixel pitch. The new backscatter shielding design of this imager has been demonstrated to reduce the backscatter artifacts from the robotic support arm efficiently compared to the previous models.16 In addition, good linearity of the EPID dose response (≤ 0.4%) was found for this imager as well. Therefore, this imager can provide more accurate measurements to verify the pretreatment delivery.

According to the manufacturer's recommendation,17 the EPID was first calibrated by acquisition of a dark field (DF) image and a flood field (FF) image, which were used to eliminate the background noise and correct the pixel-to-pixel response differences, respectively, for a raw image. Then, the dosimetric calibration of the EPID was performed to evaluate the acquired PD images in Calibrated Units (CUs), where 1 CU was defined as the central pixel response for a 10 × 10 cm² field at a source-to-imager distance (SID) of 100 cm to 100 MU. Finally, a two-dimensional beam profile correction was performed to bring back the expected nonuniformity of the treatment beam, which was divided out by the FF correction.

The predicted PD images, with which the acquired PD images were compared, were calculated using the portal dosimetry image prediction (PDIP) algorithm.18,19 In this study, the EPID-based dosimetry verifications, including acquisition and prediction of the PD images, were all performed at an SID of 100 cm.
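The calibration chain described in this subsection (DF subtraction, FF gain correction, scaling to CU, and restoring the beam profile) can be illustrated with a generic flat-field correction. This is a minimal sketch under stated assumptions, not the vendor's implementation: the function name `calibrate_epid`, its argument names, the assumption that the flood-field image still contains the dark offset, and the scalar `cu_per_count` factor are all hypothetical and introduced only for illustration.

```python
import numpy as np


def calibrate_epid(raw, dark_field, flood_field, cu_per_count=1.0, beam_profile=None):
    """Generic dark-field / flood-field correction followed by scaling to CU.

    Assumptions (not taken from vendor documentation): all images share one
    shape, the flood field still contains the dark offset, and cu_per_count
    is a scalar derived from the 10 x 10 cm2, 100 MU reference exposure.
    """
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark_field, dtype=float)
    flood = np.asarray(flood_field, dtype=float) - dark

    # Pixel-to-pixel gain map, normalised so that its mean is 1.
    gain = flood / flood.mean()
    corrected = (raw - dark) / gain   # remove offset, equalise pixel response
    dose = corrected * cu_per_count   # convert detector counts to CU
    if beam_profile is not None:
        # Restore the beam non-uniformity that the flood field divided out.
        dose = dose * np.asarray(beam_profile, dtype=float)
    return dose
```

With a uniform exposure, the corrected image comes out flat regardless of the per-pixel gain, which is the behaviour the FF correction is meant to provide.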

FIG 1. Data analysis pipeline. [Color figure can be viewed at wileyonlinelibrary.com]
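The model-fitting stages of the pipeline in Fig. 1 (SVM-RFE feature selection, classifier fitting, and nested CV) can be sketched with scikit-learn, the package the study reports using. The sketch below uses the settings given in Secs. 2.E and 2.F (top 20 features, C grid of 0.01 to 10, ninefold outer loop, fivefold inner loop, class weights inversely proportional to class frequency) for the Linear-SVM case only. It is an illustration under assumptions: the function name `nested_cv_accuracy` is hypothetical, the data are synthetic, placing RFE inside the inner-loop pipeline is one possible reading of the text, and a single repetition is run instead of the 120 repetitions reported.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def nested_cv_accuracy(X, y, n_features=20, outer_splits=9, inner_splits=5, seed=0):
    """Return the outer-loop accuracies of a Linear-SVM with SVM-RFE selection."""
    pipeline = Pipeline([
        # SVM-RFE: recursively drop the feature ranked lowest by a linear SVM.
        ("select", RFE(SVC(kernel="linear", class_weight="balanced"),
                       n_features_to_select=n_features)),
        # Final classifier, with penalties inversely proportional to class frequency.
        ("clf", SVC(kernel="linear", class_weight="balanced")),
    ])
    inner = StratifiedKFold(n_splits=inner_splits, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=outer_splits, shuffle=True, random_state=seed + 1)
    # Inner loop: grid search over the regularization parameter C.
    search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1, 10]}, cv=inner)
    # Outer loop: estimate accuracy on data never seen by the inner loop.
    return cross_val_score(search, X, y, cv=outer)


if __name__ == "__main__":
    # Synthetic stand-in for the radiomic feature table (five classes).
    X, y = make_classification(n_samples=270, n_features=40, n_informative=10,
                               n_classes=5, n_clusters_per_class=1, random_state=0)
    print(nested_cv_accuracy(X, y).mean())
```

Wrapping the selector and the classifier in one pipeline keeps feature selection inside each training fold, so the held-out fold never influences which features are kept.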


2.B. Error simulation

2.B.1. Treatment plans

To introduce treatment plans with different complexities, the EPID-based QA dosimetry was performed for plans of various tumor sites, resulting in 180 beams from 21 sliding-window IMRT plans: 101 beams from 11 head and neck plans, 29 beams from three pelvis plans, and 50 beams from seven thorax plans. The number of monitor units (MUs) for the beams ranged from 38 to 381 with a median of 124. The beam complexity was quantified using the edge metric (defined as the ratio of MLC side-length to aperture area),20 which ranged from 0.038 to 0.223 with a median of 0.093. In order to control model dependence on beam energies and dose rates, all the selected plans used 6 MV photon beams with a dose rate of 400 MU/min. In this study, the collimator and gantry angles of each beam in the IMRT QA process were kept identical to those in the original plans.

2.B.2. Simulation of machine-related errors

The possible machine-related errors that can occur in the dynamic delivery of treatment plans, including systematic and random leaf mispositioning and machine output (MU) variations, were considered in this study. The following four types of modifications were introduced in each beam of the selected plans:

1. Random error: at every control point, each open leaf position was modified by a pseudo-random number produced by a Gaussian distribution with a standard deviation, σ, of (a) 1 mm and (b) 2 mm. For the leaf positions that violated the MLC carriage and leaf gap pair constraints21 after the modification, new pseudo-random numbers were generated to ensure the deliverability of the leaf positions.
2. Shift error: same-directional shifting of the MLCs. At every control point, all open leaves in both banks were shifted in the same direction by (a) 2 mm and (b) 3 mm. In this scenario, the opening of each leaf pair was displaced toward one side without changing its size.
3. Opening error: opposite-directional shifting of the MLCs. At every control point, all open leaves were shifted in their retraction directions by (a) 1 mm and (b) 1.5 mm. Therefore, the opening of each leaf pair increased by 2 or 3 mm after the modification.
4. MU error: machine output variations. The MU value was scaled by (a) 5% and (b) −5%.

The chosen magnitudes of the shift and opening errors have been demonstrated to cause significant dosimetric changes that may compromise clinical outcome.22 We chose to simulate MU variations of 5% based on the work which found that dose variations as small as 5% may lead to variations in tumor response and risk of morbidity.23

The simulated random errors served to emphasize the effects that may be caused by performance differences of individual leaves. Though no significant dosimetric effects were found by introducing such small random errors,22 they were chosen to assess the error detection and classification abilities of the proposed method. To emulate large scale leaf dysfunctions or bank positioning errors, the studied systematic errors were limited to shift and opening errors, depending on whether the leaf gap was changed or not.

Once modified, each original plan had eight additional "erroneous" plans. The PD images were measured for all the plans and only predicted for the original error-free plans. Thus, 1620 measured PD images and 180 predicted PD images in total were obtained.

2.C. Image preprocessing

2.C.1. SSIM analysis

To assess the effects of the plan modifications, each measured PD image was compared to the predicted error-free PD image using SSIM analysis. Let $x$ and $y$ be the image patches from the same spatial location of a predicted error-free PD image and a corresponding measured PD image, respectively. The SSIM index, $\mathrm{SSIM}(x, y)$, of the two patches is defined as the weighted product of three relatively independent sub-indices:

$$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha}\,[c(x, y)]^{\beta}\,[s(x, y)]^{\gamma} \tag{1}$$

where $l(x, y)$, $c(x, y)$, and $s(x, y)$ refer to the luminance, contrast, and structure index, respectively. The parameters $\alpha$, $\beta$, and $\gamma$ control the relative importance of the three components. The three sub-indices of the two patches are defined, respectively, as:

$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \tag{2}$$

$$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{3}$$

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3} \tag{4}$$

where $\mu_x$ and $\mu_y$, $\sigma_x$ and $\sigma_y$ are the local means and standard deviations of $x$ and $y$, respectively, and $\sigma_{xy}$ is the cross-covariance of $x$ and $y$. The luminance and the contrast indices are bounded in the range [0, 1] and the structure index is bounded in the range [−1, 1]. The constants $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$, and $C_3 = C_2/2$ were included to avoid numerical instability when the denominator is close to 0; $L$ is the dynamic range of the pixel values (255 for 8-bit grayscale images), and $K_1$ and $K_2$ are small constants. All of the three sub-indices have a unique maximum, 1, if and only if the two images are identical. Otherwise at least one of them is smaller than 1. More details can be found in the original paper.13
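Equations (2) to (4) can be evaluated over a sliding local window to produce one map per sub-index. The following NumPy sketch is illustrative only: it uses the settings the study reports in Sec. 2.C.1 (11-pixel square window with stride one, zero-padded edges, L = 1, K1 = 0.01, K2 = 0.03), while the function names `local_mean` and `ssim_sub_indices` and the use of population (biased) local variances are assumptions made for this example rather than details taken from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean(img, win=11):
    """Mean over a win x win window (stride 1) with zero-padded edges."""
    pad = win // 2
    padded = np.pad(img, pad, mode="constant", constant_values=0.0)
    return sliding_window_view(padded, (win, win)).mean(axis=(-2, -1))


def ssim_sub_indices(x, y, win=11, L=1.0, K1=0.01, K2=0.03):
    """Return the luminance, contrast, and structure maps of Eqs. (2)-(4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2.0

    mu_x, mu_y = local_mean(x, win), local_mean(y, win)
    # Local variances and covariance; clip tiny negatives caused by rounding.
    var_x = np.clip(local_mean(x * x, win) - mu_x ** 2, 0.0, None)
    var_y = np.clip(local_mean(y * y, win) - mu_y ** 2, 0.0, None)
    cov_xy = local_mean(x * y, win) - mu_x * mu_y
    sd_x, sd_y = np.sqrt(var_x), np.sqrt(var_y)

    luminance = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    contrast = (2 * sd_x * sd_y + C2) / (var_x + var_y + C2)
    structure = (cov_xy + C3) / (sd_x * sd_y + C3)
    return luminance, contrast, structure
```

Each returned map has the same shape as the inputs. Multiplying the three maps pixelwise gives the SSIM index of Eq. (1) for the special case α = β = γ = 1.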


In this study, a square window with a side length of 11 pixels and a stride of one pixel was used as the local window, within which the local mean, standard deviation, and covariance were calculated. Thus, the local window size was about 3.7 × 3.7 mm² with a pixel size of 0.336 mm. To conserve the input size, zero-padding was used on the edges of the two PD images. $L = 1$ was used as a conventional maximum CU value of the PD images. As suggested by Peng et al.,14 $K_1 = 0.01$ and $K_2 = 0.03$ were used throughout this study, since relatively large K settings, such as $K_1 = 0.1$ and $K_2 = 0.3$, were found to be insensitive to small dose and gradient disagreements.

As demonstrated by Peng et al.,14 the luminance, contrast, and structure indices can be used to reflect the absolute intensity difference, intensity variance, and the correlation of the local pattern between a measured and a predicted PD image, respectively. Instead of only calculating one SSIM index map, the three sub-index maps were generated for each comparison (Fig. 2) in order to preserve all informative features for further radiomics analysis.

2.C.2. Dose difference analysis

One weakness of the SSIM analysis is that the luminance index is relatively insensitive to small luminance differences between two images.13,14 Therefore, SSIM analysis alone would not be sufficient to detect small absolute CU changes in the measured PD images. To mitigate this insensitivity, the pixelwise relative difference between the measured and predicted PD images was calculated as well, acting as a straightforward measure of the absolute CU changes. Assuming that the relative DD is distributed in a Gaussian form, a DD-derived metric, $d(x_i, y_i)$, is defined as:

$$d(x_i, y_i) = \exp\left[-\frac{\left(\mu - I_{y_i}/I_{x_i}\right)^2}{2\sigma_r^2}\right] \tag{5}$$

where $I_{x_i}$ and $I_{y_i}$ refer to pixel values at the same spatial location, $i$, of a predicted error-free PD image and a corresponding measured PD image, respectively, $\mu$ is the expected pixel value ratio, and $\sigma_r$ is the standard deviation of the Gaussian distribution. The value of $\mu$ was set to 1 for all analyzed pixels so that $d(x_i, y_i)$ is 1 if and only if the values of the two pixels are identical. Similar to the SSIM sub-indices, this metric has a unique maximum of 1. The sensitivity of this DD-derived metric to the pixel value changes can be adjusted by tuning $\sigma_r$. Throughout this study, the value of $\sigma_r$ was set to 0.08 to ensure the sensitivity of this metric to small absolute CU changes. To obtain sufficient informative features, this analysis was only applied to the pixels of CU values greater than 20% of the maximum CU value of the predicted PD image. Thus, the DD-derived metric for the pixels with CU values below this threshold was set to 1. The DD-derived maps generated for some measured PD images of a single beam are displayed in Fig. 2. Thus, a DD-derived map along with the three SSIM sub-index maps was obtained for each measured PD image. These four maps are collectively called the difference maps.

2.D. Radiomic feature extraction

The radiomic features were extracted from the ROIs of the difference maps of each measured PD image to form a discriminative description of the possible delivery error. The difference maps were discretized with fixed bin width in such a way that the intensity range was quantized into 64 bins for the luminance, contrast, and DD-derived maps and 128 bins for the structure maps in computing texture features. The distribution of intensities within a difference map was quantified by 17 first order statistical features and 52 texture features derived from the gray-level co-occurrence matrices (GLCMs, 22 features), which were determined considering two neighbors from each of four angles (0°, 45°, 90°, and 135°) in two dimensions, the gray-level dependence matrix (GLDM, 14 features), and the gray-level size-zone matrix (GLSZM, 16 features). A total of 276 feature values (69 features for each of the four difference maps) were calculated for each measured PD image. In this work, all statistical and texture features were extracted using the PyRadiomics open-source software (version 1.3.0).24 All of the extracted radiomic features are listed in Table I and their mathematical definitions can be found elsewhere.25

2.E. Feature selection method and classifiers

This study explored the aforementioned four typical classification methods, including LDC, RBF-SVM, Linear-SVM, and RF, in order to find the optimal classifier for this multiclass classification task.26 To eliminate the uncorrelated features and avoid overfitting, we used the SVM-RFE27 method coupled with the classification models to select the most discriminative features. The goal of SVM-RFE was to recursively prune the least important features, which were ranked by an SVM classifier, until the desired number of features to select was reached. Considering the limited size of the dataset, only the top 20 ranked features were preserved to put into the classification models.

The classification models and the feature reduction method were implemented with the scikit-learn package,28 which is a widely used and publicly available Python module. When performing LDC, the "eigen" solver combined with an automatically determined shrinkage parameter was used to improve the estimation of the covariance matrix.29 The SVM classifiers were fitted based on a CV grid search over the regularization parameter C values (C = 0.01, 0.1, 1, 10) and the kernel coefficient γ values (γ = 0.001, 0.01, 0.1, 1). Note that γ was only used in RBF-SVM. Similarly, the RF classifier was fitted based on a CV grid search over tree numbers from 50 to 200 with a step of 50 and maximum tree depths from 5 to 15 with a step of 5.

In this dataset, the error-free class had only half as many cases as any other class. Such imbalanced data may result in models that have poor predictive performance for the minority class. To avoid such bias, the misclassification penalty of each class was set to be inversely proportional to the class frequency in the input data. Thus, the misclassification


FIG 2. Structural SIMilarity sub-index maps (left three columns) and dose difference-derived maps (rightmost column) generated from measured portal dose (PD) images with and without errors of a single beam, where the reference is the corresponding predicted PD image. From top to bottom row, the difference maps correspond to PD images of error-free, random error of σ = 2 mm, shift error of 2 mm, opening error of 2 mm and MU error of +5%. Only the central patch of size 20 × 20 cm² of each map is displayed. [Color figure can be viewed at wileyonlinelibrary.com]
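The DD-derived maps in the rightmost column of Fig. 2 follow the Gaussian-transformed ratio of Eq. (5). A minimal NumPy sketch is given here under stated assumptions: it uses σr = 0.08, μ = 1, and the 20% threshold reported in Sec. 2.C.2, while the function name `dd_derived_map` and the choice to apply the threshold to the pixel values of the predicted image are illustrative assumptions.

```python
import numpy as np


def dd_derived_map(predicted, measured, sigma_r=0.08, mu=1.0, threshold=0.2):
    """Gaussian-transformed relative dose difference, following Eq. (5).

    Pixels whose predicted value is at or below threshold * max(predicted)
    are not analysed and keep the value 1 (assumption: the threshold is
    applied to the predicted image).
    """
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    d = np.ones_like(predicted)
    mask = predicted > threshold * predicted.max()
    ratio = measured[mask] / predicted[mask]
    d[mask] = np.exp(-((mu - ratio) ** 2) / (2.0 * sigma_r ** 2))
    return d
```

With these settings, a uniform +5% change in the measured image gives a value of about 0.82 in every analysed pixel, which shows how the choice of σr controls the sensitivity of the metric to small CU changes.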


penalty for the error-free class was set twice as large as that for the other classes, which resulted in a heavier cost of misclassification for error-free cases in the model fitting process. The same misclassification penalty settings were used in SVM-RFE for feature selection.

TABLE I. Radiomic features used in this study.

Categories | Features
First order statistics | 10Percentile, 90Percentile, energy, entropy, interquartile range, kurtosis, mean, mean absolute deviation, median, minimum, range, robust mean absolute deviation, root mean square, skewness, total energy, uniformity, variance
GLCMs | Autocorrelation, joint average, cluster prominence, cluster shade, cluster tendency, contrast, correlation, difference average, difference entropy, difference variance, joint energy, joint entropy, informational measure of correlation1, informational measure of correlation2, inverse difference moment, inverse difference moment normalized, inverse difference, inverse difference normalized, inverse variance, maximum probability, sum entropy, sum of squares
GLDM | Dependence entropy, dependence non-uniformity, dependence non-uniformity normalized, dependence variance, gray level non-uniformity, gray level variance, high gray level emphasis, large dependence emphasis, large dependence high gray level emphasis, large dependence low gray level emphasis, low gray level emphasis, small dependence emphasis, small dependence high gray level emphasis, small dependence low gray level emphasis
GLSZM | Gray level non-uniformity, gray level non-uniformity normalized, gray level variance, high gray level zone emphasis, large area emphasis, large area high gray level emphasis, large area low gray level emphasis, low gray level zone emphasis, size zone non-uniformity, size zone non-uniformity normalized, small area emphasis, small area high gray level emphasis, small area low gray level emphasis, zone entropy, zone percentage, zone variance

2.F. Model evaluation

The performance of each model was evaluated using the nested CV scheme, in which the outer loop was performed to estimate the classification performance while the inner loop was performed to obtain the optimal hyperparameters and fit the model.30,31 As is shown in Fig. 1, a ninefold CV was used in the outer loop of the nested CV scheme so that the total 1620 cases could be evenly partitioned into nine groups while the percentage of samples for each class was preserved. The inner loop contained a fivefold CV for model validation. In each iteration of the outer loop, a training set formed from a different combination of eight groups of data was used in the inner loop, in which the aforementioned grid search was conducted for hyperparameter optimization and model fitting. The remaining group of data was used as the testing set to evaluate the classification performance. Note that the feature selection was always performed prior to model fitting for each training set. To evaluate the variance of the model hyperparameters with the limited size of the dataset, the outer loop was performed 120 times so that the performance of each model was evaluated on 1080 different testing sets. For each model, the overall classification accuracy (OCA), defined as the ratio of the correctly classified cases over the total number of cases, and the normalized confusion matrix of each testing set were calculated. In the normalized confusion matrix, the element (i, j) represented the ratio of the cases in class i that were classified as being in class j over the total number of cases in class i. Accordingly, the diagonal elements represented the specific classification accuracies (SCAs) of the corresponding classes.

2.G. Comparison with gamma analysis

The models were compared to the gamma analysis.32 The gamma pass rate, defined as the percentage of dose points with a gamma value below 1, was calculated to assess the agreement between each measured PD image and the corresponding predicted error-free PD image. The global DD/DTA criteria were set to 3%/3 mm and 2%/2 mm. Note that 3%/3 mm is the most commonly used DD/DTA value for gamma criteria,33 and a stricter criterion, 2%/2 mm, was adopted in this study for the purpose of obtaining high sensitivity of the gamma analysis in detecting errors. The pixel values below 20% of the maximum pixel value of the predicted PD image were ignored in the analysis. As commonly used in clinical practice, a 95% gamma pass rate under each criterion was considered acceptable for an IMRT plan. Therefore, the measured PD images with gamma pass rates greater than the 95% threshold were classified as error-free; otherwise the images were classified as erroneous. For brevity, this classification method is called the gamma threshold method. To compare with the gamma threshold method, the cases of various error types classified by the former models were binned under one erroneous category.

3. RESULTS

3.A. Feature selection

We counted the frequency of occurrence of each feature in the feature selection procedure for the 1080 training sets. In order to see how the most discriminative features were related to the difference maps, the top 20 most frequently selected features were analyzed. Figure 3 illustrates the distribution of these features over the difference maps. Among the difference maps, the contrast index map contributed the most features (7, 35% of total) while the luminance index map contributed the fewest features (3, 15% of total) to these 20 features. The remaining 10 (50%) features were equally selected from the DD-derived and the structure index maps. It could be concluded that all four difference maps were necessary in solving this classification task.

The feature numbers of the involved feature categories of each difference map were annotated in Fig. 3 as well. The 20 most frequently selected features spanned all of the four

Medical Physics, 48 (1), January 2021

87 Ma et al.: SSIM analysis of errors in IMRT QA 87

feature categories. GLCM features in particular accounted for more than half of the 20 features (11, 55% of total) and covered every difference map. The other three feature categories, GLDM, first-order statistics, and GLSZM, contributed 4, 3, and 2 features, respectively. It is worth mentioning that 10 features were selected for every training set. As shown in Table II, these 10 features were distributed over every difference map and feature category. Therefore, all of the first-order statistics and texture feature categories involved in this study played important roles in establishing the classification models.

TABLE II. The 10 features that were selected for every training set. The names of these ten features, as well as the categories and the difference map that they belonged to, are listed.

Category      Difference map   Feature name
First order   Contrast         Uniformity
GLSZM         Luminance        Zone entropy
GLSZM         DD-derived       Small area high gray level emphasis
GLCMs         Contrast         Correlation
GLCMs         Contrast         Informational measure of correlation 1
GLCMs         Structure        Correlation
GLCMs         Luminance        Informational measure of correlation 1
GLCMs         DD-derived       Cluster prominence
GLDM          Structure        Dependence variance
GLDM          DD-derived       Large dependence high gray level emphasis

The impact of the number of selected features was investigated by varying the feature-set size from 10 to 50 with a step size of 10 in the feature selection procedure. For each of the 1080 training sets, the features were ranked and the classification accuracy of each feature subset was estimated by an SVM classifier using a fivefold CV. Figure 4 illustrates the distributions of the classification accuracy with respect to the number of selected features. The mean accuracy (marked as a white square for each distribution) increased from 0.803 to 0.836 as the number of selected features increased from 10 to 30, and then remained fairly stable for feature-set sizes greater than 30. Specifically, the mean accuracy at a feature-set size of 20 was 0.827, which was <1% lower than that at a feature-set size of 30 and >2% higher than that at a feature-set size of 10. In addition, the impact of the feature-set size was evaluated over the whole dataset, and the highest classification accuracy was achieved using the top 22 ranked features. Hence, the feature-set size was reduced to 20 to avoid overfitting while preserving high classification accuracy.

FIG 3. Stacked bar plot summarizing the distribution of the 20 most frequently selected features on the difference map, with data labels showing the number of features in each feature category. The total counts and percentage of the features for each difference map are labeled on the top of the corresponding bar.

FIG 4. Distributions of the classification accuracy with respect to the number of selected features.

3.B. Comparison of the classification models

The OCAs of LDC, Linear-SVM, RBF-SVM, and RF over 120 repetitions of the nested ninefold CV (1080 testing sets from the outer loop) were 0.83 ± 0.09, 0.86 ± 0.07, 0.86 ± 0.07, and 0.80 ± 0.12, respectively. Therefore, the two SVM models had the highest and comparable OCAs, and the LDC came next. RF exhibited the worst overall performance (lowest OCA).
To evaluate the performance of the models in predicting each error type, the normalized confusion matrix of each model was averaged over the 1080 testing sets (Fig. 5). For all of the models, higher mean SCAs were found for the shift, opening, and random errors (0.87 ± 0.04) than for the MU errors and error-free cases (0.70 ± 0.12). Therefore, all four models performed better at discriminating the shift, opening, and random errors than the MU errors and error-free cases. Specifically, comparable performance was found between the two SVM models in identifying the erroneous cases: the difference in the mean SCA on each error type between the two models was within 1%. However, the mean SCA on error-free cases of the Linear-SVM model was 6% higher than that of the RBF-SVM model. Moreover, the


two SVM models had the best performance in discriminating the shift, opening, and random errors among all of the models. The LDC followed, with mean SCAs on the opening, random, and MU errors 3–4% lower than those of the Linear-SVM model. Identical mean SCAs were found on the shift errors and error-free cases between the LDC and the Linear-SVM models. Though the RF model had the highest mean SCA on MU errors (0.82), it had the lowest mean SCAs on error-free cases and the other three error types. On error-free cases in particular, the mean SCA of the RF model was only 0.58, which was much lower than that of the other three models. Overall, the RF model had the worst classification performance and the best performance was achieved by the Linear-SVM model.

FIG 5. The normalized confusion matrixes of linear discriminant classifier (top left), Linear support vector machine (SVM) (top right), RBF-SVM (bottom left), and RF (bottom right) averaged over 1080 testing sets. [Color figure can be viewed at wileyonlinelibrary.com]

To obtain some insights into the relationships within the data, linear discriminant analysis (LDA) was applied to project the radiomic features onto the two most discriminative directions for visualization. As presented in Fig. 6, the data were projected down to a two-dimensional scatter plot. In general, errors of the same type were more likely to cluster together. Apart from a slight overlap in the area where the smaller errors gathered, the shift, opening, and random errors were well separated into distinct groups, which means that the extracted radiomic features of these three error types were highly distinguishable. However, a large portion of the group formed by MU errors overlapped with the error-free group. This could partially explain the confusion between the error-free and the MU error classes.

3.C. Influence of difference analysis and misclassification penalty

As mentioned in Section 2.B.4, the purpose of combining the DD analysis with the SSIM analysis in comparing the measured and the predicted PD images was to compensate for the insensitivity of the luminance index to absolute CU changes. To see how the DD-derived map affected the classification performance, the best performing model, Linear-SVM, was trained and evaluated on the dataset without the radiomic features from the DD-derived maps. The aforementioned nested CV scheme was used for model evaluation and the normalized confusion matrix is illustrated in Fig. 7. Over 48% of the error-free and MU error cases were misclassified as each other, which means that the model was not capable of distinguishing error-free cases from MU errors without the features from the DD-derived maps. Moreover, the


mean SCAs on the random, shift, and opening errors decreased by 0.07 ± 0.06. Therefore, the difference analysis played an essential role in identifying not only error-free cases and MU errors, but also the shift, opening, and random errors.
To examine the impact of the misclassification penalty applied to the imbalanced data, we also evaluated the performance of the Linear-SVM model trained without the misclassification penalty. The results showed that the mean SCA on the error-free class dropped drastically to 58%, while the mean SCA on the MU errors increased by only ~5% after removing the misclassification penalty from the model fitting process. The differences between the mean SCAs of the two models on the other three classes were within 1%. Therefore, the implementation of the misclassification penalty enhanced the ability of the model to distinguish the error-free cases from the cases of MU errors.

FIG 6. Results of the two-dimensional linear discriminant analysis. Error-free, random, shift, opening, and MU errors are displayed in red, orange, blue, black, and green, respectively. The two magnitudes of each error type are separated by using circles and crosses. [Color figure can be viewed at wileyonlinelibrary.com]

3.D. Comparison with gamma analysis

The two-class classification results of the gamma threshold method under two DD/DTA criteria and of one ML model, Linear-SVM (for simplicity), are listed in Table III. The results showed that most of the cases passed the gamma evaluation (95% gamma pass rate) with the 3%/3 mm criterion, which resulted in high specificity (0.99) but low sensitivity (0.33). Compared with the 3%/3 mm criterion, much higher sensitivity (0.79) but lower specificity (0.79) was achieved using the gamma threshold method under the 2%/2 mm criterion. From the perspective of clinical practice, the 2%/2 mm criterion would be preferred in the gamma threshold method since high sensitivity in detecting errors is more critical than high specificity. After the former Linear-SVM model was reduced to this two-class classification task by binning all error types under one erroneous category, it achieved higher specificity (0.80) and sensitivity (0.87) than the gamma threshold method under the 2%/2 mm criterion.
To assess the ability of the gamma threshold method to detect each error type, the distributions of the gamma pass rate calculated under the 2%/2 mm criterion with respect to the error type were analyzed. As shown in Fig. 8, the error detection rates of the opening and MU errors were 95% and 81%, respectively, which were 3–4% higher than those of the Linear-SVM model. However, the detection rates of the random and shift errors, 62% and 79%, respectively, were more than 10% lower than those of the Linear-SVM model. Moreover, the overlap of the gamma pass rate intervals of the error types, especially for the random, shift, and MU errors, prevented the gamma threshold method from separating these errors. Thus, it could be concluded that our ML model using radiomic features from the SSIM and DD analysis was superior to the gamma threshold method in identifying errors from PD images.
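The two-class figures quoted in this section can be re-derived from the row-normalized confusion matrices of Table III. The following is a minimal Python sketch of that arithmetic, not the authors' code: the helper names (`specificity_sensitivity`, `collapse_to_two_class`, `gamma_flags_error`) are hypothetical, and treating a case as erroneous when its gamma pass rate is strictly below 95% is an assumption about how the threshold was applied.

```python
import numpy as np

# Row-normalized confusion matrices transcribed from Table III.
# Rows: ground truth (error-free, any error).
# Columns: prediction (error-free, any error).
table_iii = {
    "gamma 3%/3 mm": np.array([[0.99, 0.01], [0.67, 0.33]]),
    "gamma 2%/2 mm": np.array([[0.79, 0.21], [0.21, 0.79]]),
    "Linear-SVM": np.array([[0.80, 0.20], [0.13, 0.87]]),
}


def specificity_sensitivity(cm):
    """Return (specificity, sensitivity), with 'any error' as the positive class."""
    specificity = cm[0, 0] / cm[0].sum()  # error-free cases predicted error-free
    sensitivity = cm[1, 1] / cm[1].sum()  # erroneous cases predicted erroneous
    return specificity, sensitivity


def collapse_to_two_class(labels, error_free_label="error-free"):
    """Bin every erroneous class under a single 'any error' category."""
    labels = np.asarray(labels)
    return np.where(labels == error_free_label, "error-free", "any error")


def gamma_flags_error(pass_rate_percent, threshold=95.0):
    """Assumed decision rule: flag a case when its pass rate is below the threshold."""
    return pass_rate_percent < threshold


results = {name: specificity_sensitivity(cm) for name, cm in table_iii.items()}
for name, (spec, sens) in results.items():
    print(f"{name}: specificity={spec:.2f}, sensitivity={sens:.2f}")
```

Reading the matrices this way makes the trade-off explicit: tightening the gamma criterion from 3%/3 mm to 2%/2 mm moves cases from the bottom-left cell (missed errors) to the bottom-right cell at the cost of more false alarms in the top row.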

FIG 7. The normalized confusion matrix of Linear support vector machine averaged over 1080 testing sets. The model was trained and evaluated without using the radiomic features from the dose difference-derived map. [Color figure can be viewed at wileyonlinelibrary.com]

TABLE III. Normalized confusion matrixes for the two-class (error-free and any error) classification task by gamma threshold method applied under the criteria 3%/3 mm and 2%/2 mm, and an ML model.

                           Gamma threshold          Gamma threshold          ML model
                           (3%/3 mm)                (2%/2 mm)                (Linear-SVM)
Ground truth \ predicted   Error-free   Any error   Error-free   Any error   Error-free   Any error
Error-free                 0.99         0.01        0.79         0.21        0.80         0.20
Any error                  0.67         0.33        0.21         0.79        0.13         0.87

4. DISCUSSION

In this study, we applied ML models to address the detection and classification problems of the machine-related errors


based on the radiomic features obtained from the SSIM and DD analysis of the EPID measurements in patient-specific dynamic IMRT QA. The results demonstrated that the proposed method was capable of classifying the measured PD images in terms of leaf positioning and machine output errors. Compared with the pilot studies of Wootton et al. and Nyflot et al.,10,11 this work incorporated additional error types, including leaf opening and machine output errors, and different error magnitudes for error identification. The best average OCA achieved in our work was 0.86, substantially higher than the previously reported average OCA (0.643) for multiclass error identification. The accuracy improvement may result from the feature engineering in this study. Compared with the DD-DTA-blended gamma maps, the four difference maps could separately reflect the different error patterns resulting from various beam delivery errors, hence more informative features could be preserved. For example, positioning errors (i.e., MLC errors, EPID misalignment) can be clearly reflected in contrast maps or DTA maps.34 In this work, the measured PD images were directly compared to the calculated PD images for difference map calculation, accounting for the existing discrepancy between the PDIP-modeled images and acquired PD images. The results obtained in the current work further proved the feasibility of such error detection methods in clinical environments. To our knowledge, this is the first application of ML models with SSIM analysis to classify patient-specific dynamic IMRT QA results.

FIG 8. Distributions of the gamma pass rate under the 2%/2 mm criterion with respect to the error type. The dashed line represents the 95% gamma pass rate.

The radiomic features, including the first-order statistics and the texture features, were captured from each of the difference maps. The first-order statistical features quantified the distribution of the pixel values, while the texture features calculated from the GLCMs, GLSZM, and GLDM quantified the inter-pixel relationships within each map. The results showed that all of the radiomic feature categories and the difference maps made important contributions to the classification of the delivery errors contained in the measured PD images. In particular, high accuracies (~0.9) were achieved on the classification of the random, shift, and opening errors. By the nature of the three error types, the shift and opening errors would be more likely to introduce strip-like dose changes perpendicular to the MLC motion directions, whereas the dose changes induced by random leaf errors tended to be small localized clusters.10 The resulting absolute dose and dose gradient variation could be reflected by the luminance and the contrast index maps (Fig. 2). Especially in areas of large dose gradient, these two difference maps could provide informative features for identifying the shift, opening, and random errors. In addition, we observed that the variance of the dose-changing trend brought by random leaf errors could be reflected on the structure index maps as well. Most notably, the ability of the model to distinguish error-free cases from MU errors was dramatically improved by employing additional features from the DD-derived map in the model training process. In the meantime, the information on the pixelwise dose difference provided by the DD-derived map further enhanced the performance of the model in identifying the three types of MLC positioning errors. Although a deep learning approach to EPID image classification has been shown to be feasible for extracting discriminative features,11 the lack of interpretability of the deep learning features makes them difficult to associate with the visual patterns in the images. Thus, the empirical feature approach employed in this study provided insights into the contributions of the radiomic features of the four difference maps to classification decisions. Furthermore, this empirical feature approach remained practical for relatively small datasets, since curating a large library of EPID QA images with annotated error types remains challenging.
Although a promising OCA of ~0.86 was achieved by the two SVM models, the mean SCAs of these models on the error-free and the MU error cases were relatively low (≤ 0.8) compared with those of the other three error types. Since verification of the machine output was performed before each measurement session and the error was found to be within 0.5%, the influence of the output variation on the measurements can be neglected. Moreover, the impact of the EPID response to MU changes can be ignored as well, since good linearity (≤ 0.4%) of the aS1200 EPID model, which was used in this study, has been demonstrated in a previous study.16 Therefore, the influence of measurement variation on discriminating error-free cases from MU errors can be ruled out. The main cause of this problem could be the accuracy of the PDIP algorithm, since the MLC transmitted radiation was not modeled accurately enough in this algorithm.18,35 A large relative difference between the measured and the predicted PD images of a dynamic IMRT field was found in low dose regions, owing to the large amount of radiation transmitted through the MLC leaves during beam delivery. The DD-derived map was initially designed to evaluate relative local differences, but it intensified the differences in the low dose regions, as shown in Fig. 2. Furthermore, the transmission radiation is not completely linear with the machine output, due to MLC leaf modulations. Thus, the radiomic features of the error-free images and the images of MU errors for some fields can be indistinguishable for the ML models.
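To make the low-dose argument above concrete, the following is a purely illustrative Python sketch with hypothetical numbers (the dose values, the 1 CU unmodeled transmission offset, and the 3% MU error are assumptions for illustration, not measurements from this study). It shows that a fixed absolute discrepancy produces a much larger relative local difference in a low-dose pixel than in a high-dose pixel, which can mask the uniform change that an MU error would introduce.

```python
import numpy as np

# Hypothetical predicted dose at three pixels (CU): low, medium, high dose.
predicted = np.array([5.0, 50.0, 100.0])

# Hypothetical unmodeled MLC transmission adding a fixed absolute offset (CU)
# to an otherwise error-free measurement.
transmission_offset = 1.0
measured_error_free = predicted + transmission_offset

# Relative local difference (percent) of the error-free measurement.
rel_diff_error_free = 100.0 * (measured_error_free - predicted) / predicted

# Hypothetical +3% MU error on a perfectly modeled field: a uniform relative change.
mu_error_fraction = 0.03
measured_mu_error = predicted * (1.0 + mu_error_fraction)
rel_diff_mu_error = 100.0 * (measured_mu_error - predicted) / predicted

print("error-free, relative difference (%):", np.round(rel_diff_error_free, 1))
print("MU error, relative difference (%):  ", np.round(rel_diff_mu_error, 1))

# In the low-dose pixel the modeling offset alone exceeds the MU-error signal.
low_dose_masked = rel_diff_error_free[0] > rel_diff_mu_error[0]
```

Under these assumed numbers the relative difference of the error-free case falls from roughly 20% in the low-dose pixel to roughly 1% in the high-dose pixel, whereas the MU error contributes about 3% everywhere.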


The aforementioned problem can be addressed from the following aspects to further enhance the classification performance of the ML models in future studies. First, multiple regions based on the relative dose level can be contoured, so that the features extracted from different regions would carry dose level information. Second, a correction factor can be used to account for the MLC transmission in order to improve the agreement between the measured and the predicted PD images.35 Additionally, tuning the standard deviation of the Gaussian distribution in the DD analysis may generate more discriminative patterns in the DD-derived maps for the error-free cases and the MU errors. Moreover, applying a CNN for feature extraction may capture more informative features from the difference maps.
Although the 3%/3 mm criterion is most commonly used in the standard gamma analysis for IMRT QA, the poor sensitivity of this criterion in detecting errors has been demonstrated in this study and by others.11 By contrast, the stricter 2%/2 mm criterion exhibited dramatically improved error detectability. However, the detected errors could not be distinguished, owing to the highly overlapped gamma pass rate intervals of the error types. Therefore, the proposed radiomics-based ML models have great potential to assist the gamma analysis for error identification in the IMRT QA procedure.
A major concern about the present work is the relatively small dataset. Models trained on a small number of observations tend to produce overfitted results. To mitigate the risk of overfitting, two strategies were adopted: one was limiting the feature-set size in model training and the other was using the nested CV scheme in model evaluation. We only selected the top 20 ranked features for our sample size of 180 beams, which obeyed the “one in ten” rule of thumb for the number of predictive variables estimated from data when doing regression analysis.36 Also, the feature-set size of 20 in this work was shown to be reasonable for achieving promising model performance (Fig. 4). Unlike the conventional K-fold CV, which uses the same data for hyperparameter tuning and model evaluation, the nested CV scheme used a series of splits of the whole dataset, which allowed one portion of each split for hyperparameter tuning and model training, and the remaining portion for model evaluation. Especially for a small dataset, the separation of the data for hyperparameter tuning and model evaluation in this validation strategy effectively avoids overfitting caused by information leakage.37
This study evaluated one feature selection strategy and four commonly used ML classifiers to find the optimal combination for the five-class classification task. The classification results showed that the Linear-SVM combined with the SVM-RFE feature selector had the best classification performance. However, further research can explore other feature selection methods and classifiers to achieve better performance. It is worth noting that the misclassification penalty we applied in the model fitting process effectively mitigated the bias resulting from uneven class distributions and helped produce high accuracy in distinguishing the minority class (class of no error) from the majority class (class of MU errors). Nevertheless, the use of inverse weighting from the dataset is not suited for every classifier, such as the RF classifier, where a large portion of the error-free cases were still misclassified as cases of MU errors. Thus, an adaptive weighting approach may help find a group of optimal weights for the classes and achieve better performance.38 Besides, approaches that resample the dataset, for example, undersampling of the majority classes or oversampling of the minority classes,39,40 can be used to cope with the class imbalance problem.
There are limitations of this study that should be noted. We only applied this method to detect the machine-related errors in patient-specific QA for dynamic IMRT plans. The feasibility of applying the present methodology to VMAT QA needs further exploration. We believe that similar strip-like and cluster-like patterns would be brought by the systematic and the random MLC errors, respectively, to the SSIM sub-index maps for VMAT QA results, which appear to be discriminative for these errors. Moreover, we only simulated four types of machine-related errors to investigate the feasibility of the present methodology. However, there are other error sources in beam delivery, such as collimator rotation, beam flatness, and symmetry, as well as gantry rotation in the delivery of VMAT plans. Including additional error types and expanding the proposed method to VMAT QA will be part of our future work.
The ML models were developed for relatively high-resolution EPID dosimetry in this study. However, the feasibility of applying the proposed method to detectors of lower resolution (e.g., diode arrays) is unclear. Especially for VMAT QA, dosimetry systems such as ArcCheck (Sun Nuclear, Melbourne, FL)41 and the Delta4 system (ScandiDos AB, Uppsala, Sweden)42 are often utilized to identify errors in the integral dose distributions. It has been reported that SSIM analysis could be applied to the dose maps measured by detectors of lower resolution by adjusting the local window size, which determines the area used to calculate the local statistics in SSIM analysis.43 However, the impact of the change of image resolution on the radiomic features is currently unknown. Further exploration is required to investigate the impact of the detector’s reduced resolution on the performance of this method.
In this study, we chose to demonstrate error classification performance on the integral PD images per fraction (i.e., inter-fraction QA). Another approach would be performing error classification on the image frames acquired during the delivery (i.e., intra-fraction), which would be advantageous for identifying certain types of errors, for example, MLC leaf errors and MU errors. However, the integral influence from errors of all image frames would pose a challenge. In addition, incorporating this method into the error classification of an in vivo QA process would be of great benefit, since treatment errors are often introduced by factors other than the machines (e.g., positioning errors, anatomical changes, etc.).
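The nested CV scheme and the inverse-frequency misclassification penalty discussed in this section can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic stand-ins rather than the study's radiomic features, `class_weight="balanced"` is used as an assumed equivalent of the inverse weighting, the feature scaling step, the RFE step size, and the grid of `C` values are illustrative choices, and a single repetition is run instead of the 120 repetitions reported.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the radiomic feature table (NOT the study data):
# five classes, with class 0 ("error-free") half the size of each error class.
y = np.repeat(np.arange(5), [20, 40, 40, 40, 40])
X = rng.normal(size=(y.size, 40))
X[np.arange(y.size), y] += 3.0  # one informative feature per class

# Feature selection sits inside the pipeline, so it is refitted on every
# training split and never sees the corresponding test fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(SVC(kernel="linear", class_weight="balanced"),
                   n_features_to_select=20, step=5)),
    ("clf", SVC(kernel="linear", class_weight="balanced")),
])

# Inner loop: fivefold grid search for the hyperparameter (assumed grid).
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=inner_cv)

# Outer loop: stratified ninefold CV estimating the classification accuracy.
outer_cv = StratifiedKFold(n_splits=9, shuffle=True, random_state=1)
scores = cross_val_score(search, X, y, cv=outer_cv)

print(f"outer-fold accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Because the grid search and the feature selection are both wrapped inside the estimator passed to `cross_val_score`, each outer test fold is used only for evaluation, which is the separation that guards against the information leakage mentioned above. Repeating the outer loop with different shuffles (for example with `RepeatedStratifiedKFold`) would mimic the repeated evaluation at a higher computational cost.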


5. CONCLUSIONS

We proposed an ML-based method for machine-related error identification in patient-specific dynamic IMRT QA, where radiomics analysis on SSIM sub-index maps was used for feature extraction. High error classification accuracies were achieved in IMRT QA using this method, and superior sensitivity in detecting errors was demonstrated compared with the traditional gamma threshold method. This method has great potential to assist the conventional gamma analysis for error identification in the IMRT QA process.

ACKNOWLEDGMENTS

This work was supported by the National Key R&D Program of China (2019YFF01014405), National Natural Science Foundation of China (No. 11505012), Beijing Municipal Administration of Hospitals Incubating Program (No. PX2019042), Ministry of Education Science and Technology Development Center (No. 2018A01019) and Natural Science Foundation of Beijing (No. 1202009).

CONFLICT OF INTEREST

The authors have no conflict to disclose.

† These authors contributed equally to this work.
a) Author to whom correspondence should be addressed. Electronic mail: [email protected].

REFERENCES

1. Ezzell GA, Galvin JM, Low D, et al. Guidance document on delivery, treatment planning, and clinical implementation of IMRT: report of the IMRT subcommittee of the AAPM radiation therapy committee. Med Phys. 2003;30:2089–2115.
2. Miften M, Olch A, Mihailidis D, et al. Tolerance limits and methodologies for IMRT measurement-based verification QA: recommendations of AAPM Task Group No. 218. Med Phys. 2018;45:e53–e83.
3. Zhu XR, Jursinic PA, Grimm DF, et al. Evaluation of kodak EDR2 film for dose verification of intensity modulated radiation therapy delivered by a static multileaf collimator. Med Phys. 2002;29:1687–1692.
4. Jursinic PA, Nelms BE. A 2-D diode array and analysis software for verification of intensity modulated radiation therapy delivery. Med Phys. 2003;30:870–879.
5. Wu C, Hosier KE, Beck KE, et al. On using 3D γ-analysis for IMRT and VMAT pretreatment plan QA. Med Phys. 2012;39:3051–3059.
6. Low DA, Moran JM, Dempsey JF, et al. Dosimetry tools and techniques for IMRT. Med Phys. 2011;38:1313–1338.
7. Nelms BE, Zhen H, Tomé WA. Per-beam, planar IMRT QA passing rates do not predict clinically relevant patient dose errors. Med Phys. 2011;38:1037–1044.
8. Kruse JJ. On the insensitivity of single field planar dosimetry to IMRT inaccuracies. Med Phys. 2010;37:2516–2524.
9. Kry SF, Molineu A, Kerns JR, et al. Institutional patient-specific IMRT QA does not predict unacceptable plan delivery. Int J Radiat Oncol Biol Phys. 2014;90:1195–1201.
10. Wootton LS, Nyflot MJ, Chaovalitwongse WA, et al. Error detection in intensity-modulated radiation therapy quality assurance using radiomic analysis of gamma distributions. Int J Radiat Oncol Biol Phys. 2018;102:219–228.
11. Nyflot MJ, Thammasorn P, Wootton LS, et al. Deep learning for patient-specific quality assurance: Identifying errors in radiotherapy delivery by radiomic analysis of gamma images with convolutional neural networks. Med Phys. 2019;46:456–464.
12. Kimura Y, Kadoya N, Tomori S, et al. Error detection using a convolutional neural network with dose difference maps in patient-specific quality assurance for volumetric modulated arc therapy. Phys Medica. 2020;73:57–64.
13. Wang Z, Bovik AC, Sheikh HR, et al. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13:600–612.
14. Peng J, Shi C, Laugeman E, et al. Implementation of the structural SIMilarity (SSIM) index as a quantitative evaluation tool for dose distribution error detection. Med Phys. 2020;47:1907–1919.
15. Bawazeer O, Sarasanandarajah S, Sisira Herath TK, et al. Sensitivity of electronic portal imaging device (EPID) based transit dosimetry to detect inter-fraction patient variations. Springer Singapore; 2019.
16. Miri N, Keller P, Zwan BJ, et al. EPID-based dosimetry to verify IMRT planar dose distribution for the aS1200 EPID and FFF beams. J Appl Clin Med Phys. 2016;17:292–304.
17. Varian Medical Systems. CTB PV: Installation and Verification of the Portal Dosimetry; 2012:1–40.
18. Van Esch A, Depuydt T, Huyskens DP. The use of an aSi-based EPID for routine absolute dosimetric pre-treatment verification of dynamic IMRT fields. Radiother Oncol. 2004;71:223–234.
19. Van Esch A, Huyskens DP, Hirschi L, et al. Optimized varian aSi portal dosimetry: development of datasets for collective use. J Appl Clin Med Phys. 2013;14:82–99.
20. Younge KC, Roberts D, Janes LA, et al. Predicting deliverability of volumetric-modulated arc therapy (VMAT) plans using aperture complexity analysis. J Appl Clin Med Phys. 2016;17:124–131.
21. Carlone M, Cruje C, Rangel A, et al. ROC analysis in patient specific quality assurance. Med Phys. 2013;40:1–7.
22. Rangel A, Dunscombe P. Tolerances on MLC leaf position accuracy for IMRT delivery with a dynamic MLC. Med Phys. 2009;36:3304–3309.
23. Dische S, Saunders MI, Williams C, et al. Precision in reporting the dose given in a course of radiotherapy. Radiother Oncol. 1993;29:287–293.
24. Van Griethuysen JJM, Fedorov A, Parmar C, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017;77:e104–e107.
25. Zwanenburg A, Vallières M, Abdalah MA, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295:328–338.
26. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. New York: Springer; 2009.
27. Guyon I, Weston J, Stephen B. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46:389–422.
28. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.
29. Ledoit O, Wolf M. Honey, I shrunk the sample covariance matrix. J Portf Manag. 2004;30:110–119.
30. Wainer J, Cawley G. Nested cross-validation when selecting classifiers is overzealous for most practical applications; 2018.
31. Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinform. 2006;7:91.
32. Low DA, Harms WB, Mutic S, et al. A technique for the quantitative evaluation of dose distributions. Med Phys. 1998;25:656–661.
33. Pan Y, Yang R, Zhang S, et al. National survey of patient specific IMRT quality assurance in China. Radiat Oncol. 2019;14:1–10.
34. Potter NJ, Mund K, Andreozzi JM, Li JG, Liu C, Yan G. Error detection and classification in patient-specific IMRT QA with dual neural networks. Med Phys. 2020;47:4711–4720.
35. Vial P, Greer PB, Hunt P, et al. The impact of MLC transmitted radiation on EPID dosimetry for dynamic MLC beams. Med Phys. 2008;35:1267–1277.
36. Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361–387.
37. Cawley GC, Talbot NLC. On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res. 2010;11:2079–2107.


38. Huang W, Song G, Li M, Hu W, Xie K. Adaptive weight optimization for classification of imbalanced data. Lect Notes Comput Sci. 2013;8261:546–553.
39. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–357.
40. Liu XY, Wu J, Zhou ZH. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern Part B Cybern. 2009;39:539–550.
41. Létourneau D, Publicover J, Kozelka J, et al. Novel dosimetric phantom for quality assurance of volumetric modulated arc therapy. Med Phys. 2009;36:1813–1821.
42. Bedford JL, Lee YK, Wai P, et al. Evaluation of the Delta4 phantom for IMRT and VMAT verification. Phys Med Biol. 2009;54:N167–N176.
43. Shi C, Lim S, Chan M. Evaluation of a Transmission Detector on IMRT QA Using Structure Similarity Index (SSIM), poster presented at: AAPM annual meeting 2018, ePoster ID: TU-C1030-GePD-F6-3.

