A Set Theory- And ROC-Based Patient-Specic QA Measure To Compare Planned And Delivered Isodose Distributions In Photon Radiotherapy

Background: The DVH-based and Gamma Index patient-specific QA measures commonly applied in radiotherapy planning are unable to simultaneously deliver detailed locations and magnitudes of discrepancy between isodoses of planned and delivered dose distributions. By exploiting the receiver operating characteristic (ROC) statistical classification tool, compliance between a planned and delivered isodose may be locally evaluated, both for OAR and PTV, at any given isodose level. Thus, a patient-specific QA tool may be developed to supplement those presently available in clinical radiotherapy. Materials and Methods: A method to locally establish and report dose delivery errors in 3-D isodoses of planned (reference) and delivered (evaluated) dose distributions as a function of both the dose level and spatial location was developed. At any given isodose level, the total volume of dose delivery containing the reference and the evaluated isodoses is locally decomposed into four subregions: true positive – subregions within both reference and evaluated isodoses, true negative – outside of both of these isodoses, false positive – inside the evaluated isodose occurs but not the reference one, and false negatives – inside the reference isodose but not the evaluated one, as established over the total volume of dose delivery. From this decomposition a confusion matrix is derived and various indices calculated to quantify the discrepancies between the given planned and delivered isodose distributions. Results: Examples of clinical photon radiotherapy plans underwent analysis using the method developed. At some isodose levels, at anatomically significant locations, dose delivery errors were found which would not be highlighted either by dose volume histogram (DVH)-based QA metrics or by gamma analysis. Conclusions: The proposed method which generalizes the DVH-based QA method approach and is able to spatially locate delivery errors at selected isodose levels may supplement the presently applied gamma analysis and DVH-based QA measures in patient-specific radiotherapy planning.

Measurement-based patient-specific QA procedures have recently been reviewed by AAPM Task Group No. 218 [15]. Patient-specific QA, including point-wise dose difference between planned and delivered spatial dose distributions, distance to agreement analysis (DTA) [16], the composite test [17], and gamma analysis [18] are used in clinical practice and often integrated within commercial radiotherapy systems. However, limitations of these methods are also recognized. For example, gamma analysis, even with stringent tolerance criteria, may not preclude the occurrence of clinically significant dose delivery errors [19,20]. In particular, apart from the passing rate, gamma index analysis returns the spatial locations of regions classified as containing dose delivery errors. However, in indicating such regions of dose delivery errors the global gamma index operates only marginally better than a random classifier [21].
Such drawbacks stimulate the search for alternative approaches to verify the distributions of dose delivered (as measured by the dosimetry system) against distributions of dose planned by the treatment planning system.
Following the introduction of measurement-guided dose reconstruction [22], application of dose volume histograms (DVHs) to compare measured and planned dose distributions became a subject of debate [23][24][25]. Currently, the proposed DVH-based procedure for patient-specific QA is performed individually for volumes of interest (VOI), e.g., for the planning target (PTV) or organ at risk (OAR) volumes, outlined in computed tomography images. For a given VOI, one DVH is constructed for the planned dose distribution, and another for the dose distribution reconstructed from measurements. Various quantities are then derived as quality indicators from these DVHs, such as D90 -doses delivered to 90% of the volumes of the compared PTVs. If differences between such quantities exceed some pre-determined thresholds, the presence of a clinically significant dose delivery error is reported. Notably, in its current form, such DVHbased verification of dose delivery is inconclusive because even if, for a given VOI, the compared DVHs closely coincide (e.g., for PTV, their D90 -dose values are similar), the spatial distributions of planned and delivered doses may still significantly differ, both quantitatively and qualitatively. Moreover, correspondence between errors established from DVH analysis and errors established from gamma analysis has always been observed to be mild [23 ,24], this raising some concern [25].
Clearly, there is still room for further developmnet of methods supporting patient-specific QA. Here, we demonstrate an application of measures derived from set theory and receiver operating characteristics (ROC) to quantitatively compare a reference (planned) against a measured (evaluated) dose distribution. The proposed approach is supplementary to existing QA methods like DVH or gamma analysis because, as demonstrated in this paper, errors which do not manifests themselves in DVH or gamma can be captured and evaluated with the proposed approach.

II.1.1. Illustrative 2D cases
The performance of the proposed measures if firstly demonstrated for several simplified examples relevant from the point of view of practical radiotherapy QA. The examples described in the current section were neither generated by a TPS nor they are measured dose distributions. They were generated with image processing tools with the intention to demonstrate the background of the proposed methods and to show how these methods highlight some aspects of the dose comparison task. Moreover, to demonstrate that the proposed methods are complementary to the current standard QA tools, the examples were selected so that the difference between the reference and evaluated dose distributions are reflected in neither corresponding DVHs nor gamma passing rates.
As a first example we consider a case of structure (which can be e.g. PTV) shown in Fig. 1a, a reference (planned) dose distribution shown in Fig. 1b and an evaluated (delivered) dose distribution shown in Fig. 1c. To obtain the distribution shown in Fig. 1b, the structure shown in Fig. 1a was blurred with an isotropic Gaussian filter. The size of this filter was equal to 5% of the width of the structure shown in Fig. 1a. In contrast, there is a clear anisotropy of blurring in the evaluated distribution shown in Fig. 1c: the horizontal size of the blurring kernel is only 3% and vertical size of the blurring kernel is 7% of the size of structure in Fig. 1a. The different amount of blurring in dose distributions may depend in real settings on, for example, the speed of MLC leaves moving from some initial to some terminating position or from amount of radiation leaking through the leaves.
In the next example of application of quantitative measures for isodose comparison it is assumed that dose must be uniformly delivered to a structure shown in Fig. 2a. The reference (planned) spatial dose distribution, shown in   Fig. 2b can be for example a plan generated by a TPS based on a prescription defined by the spatial distribution shown in Fig. 2a. The spatial dose distribution shown in Fig. 2c is larger by 4% and shifted horizontally and vertically by 4% of the size of the reference spatial distribution with respect to the spatial distribution shown in Fig. 2b. Also, the penumbra region in Fig. 2c is 40% more wide than in Fig. 2b. The distribution shown in Fig. 2c can be for example an erroneously delivered dose based on a plan shown in Fig. 2b. In clinical settings the difference between the sizes of the reference (Fig. 2b) and evaluated (Fig. 2c) dose distributions may result from MLC leafs mispositioning. The difference between the penumbra sizes of the reference (Fig. 2b) and evaluated (Fig. 2c) dose distributions may result from differences between planned and executed speeds of MLC leaves during dose delivery.
In real settings, differences between reference and evaluated dose distribution can also result from other factors like differences in the performance of the real accelerator compared to the planning system (TPS), differences due to the accuracy of measurements and due to accuracy of measurement device settings, differences due to the accuracy of the TPS calculations, and differences in the geometry of a real multi-leaf collimator (MLC) compared to the TPS definition of MLC.

II.1.2. Clinical data
To illustrate the application of the proposed measures for comparison of isodoses, two radiotherapy plans: of prostate and brain tumours, were analysed. Both plans were prepared for a 6 MV photon beam of a Clinac 2300C/D linac. The analysed data sets consisted of CT scans, outlines of OARs and PTV, a radiotherapy plan (prepared using the Varian Eclipse 11.00 TPS and AAA algorithm), and a TPS-calculated dose distribution (planned doses).
In the prostate tumour plan only the prostate gland was irradiated, without lymph nodes. The total prescribed dose to the PTV was 63 Gy (split into 21 fractions). A five-field 6 MV 2300 C/D photon beam IMRT plan was prepared.
The OARs were the rectum, bladder, and femoral heads. For the brain tumour plan the 6 MV 2300 C/D photon beam VMAT plan consisted of two partial-arcs. The dose to the PTV was 81 Gy (40.5 Gy per each arc) in 15 fractions. The OARs were the optic nerves, optic chiasm, and brain stem, all contained within the target volume. The planning constraints were to reduce the dose within the OARs to the necessary tolerance values while assuring that the therapeutic goal would be met.
The respective distributions of the delivered doses were calculated by Monte Carlo (MC) simulations, using the PRIMO software (www.primoproject.net) [26]. This software is based on the general-purpose PENELOPE 2011 MC engine [27]. The initial settings of the simulator were selected to reproduce real dose profiles measured in a water phantom. The simulator dose units were converted to Gy units to match planned and computed average doses to the PTV.

II.2.1. Pairwise comparison of isodoses
Pairwise comparison of isodoses is based on calculating the so-called confusion matrix which is widely used to summarize results of classifier testing in predictive analysis and in machine learning. Numbers of false positives, false negatives, true positives, and true negatives are reported in cells of the confusion matrix -which can be further used to compute other quantities such as sensitivity or specificity, along principles of receiver operating characteristic (ROC) statistics.
The proposed quantitative method compares regions enclosed by isodoses of two spatial dose distributions: the reference dose distribution, DR , and the evaluated dose distribution, DE . Both these spatial dose distributions are defined over some volume of interest V which is formed by a grid of spatial locations xi,j,k such that: where Δx, Δy, and Δz are grid spacings along x-, y-, and z-axes, respectively. We denote by DR(i,j,k) the reference dose at To evaluate conformance between DR and DE with respect to some dose value Th, we define the reference region DR,Th and the evaluated region DE,Th : , ℎ = { , , ∈ : ( , , ) ≥ ℎ}, , ℎ = { , , ∈ : ( , , ) ≥ ℎ}, where |S| denotes the number of elements in the set S , and S C denotes the complement of set S. The numbers TPTh, TNTh, FPTh and FNTh together form a 2x2 confusion matrix, MTh: Various overlap measures can be derived from MTh. In the present study we focus on sensitivity, SENTh, specificity, SPETh, and Dice coefficient, DCTh, but other measures may also prove to be useful in clinical applications. We define our overlap measures for the isodose Th as follows [28]: where FPRTh, and FNRTh are, respectively, the false positive rate and the false negative rate for dose Th. Note that FNRTh > 0 indicates the presence of cold regions, i.e. regions where the planned doses are higher than Th and delivered doses are lower than Th. For example, FNRTh of 2.5% for a certain isodose level Th means that over 2.5% of the volume within that isodose, the delivered dose is lower than that Th value-it is a cold region within the isodose Th.
Similarly, FPRTh > 0 indicates the presence of hot regions, i.e. regions where planned doses are lower than Th , while delivered doses are higher than Th. An FPRTh of 0.1% for the isodose Th means that over 0.1% of the volume exterior to that isodose Th, the dose delivered is greater than Th -it is then a hot region exterior to isodose Th.
Finally, Dice's coefficient for a certain isodose Th, DCTh, is the ratio of the intersection volume of the regions covered by the planned and realized isodose Th to the arithmetic mean of the volumes of two regions: one covered by the planned isodose Th, and the other one covered by the delivered isodose Th.
Note that the DVH for some region can be considered to be equivalent to SEN (Eq. 6) if an artificial reference dose distribution is appropriately constructed from the CT-based outlines of this region (i.e., while constructing this reference distribution it is sufficient to assume dose to be zero outside the outlines and equal to the maximum value of delivered dose inside these outlines -then all isodoses of this reference distribution coincide with the region outlines).
Calculation of overlap measures as a function of dose Th concludes our evaluation of pair-wise overlap between isodoses of the reference dose distribution DR and the evaluated dose distribution DE.

II.2.2. Evaluating the spatial shift between two isodoses
Overlap measures alone are not sufficient to assess differences between two isodoses. Fig. 4a shows two cases of dose distributions which are characterized by exactly the same dose volume histrograms but are otherwise very different. It follows from the definition of generalized Hausforff distance that replacing the two quantiles in Eq. (10) with, respectively, maximum and minimum operators, one computes the largest distance between a point in one set and the nearest point in the other set, i.e. the ordinary Hausdorff distance. While being sensitive to outliers, the ordinary Hausdorff distance offers the highest QA safety margin.

III.1. 2D illustrative cases
In Fig. 5a we show reference and evaluated DVHs, both calculated for, respectively, reference and evaluated dose distributions shown in Fig. 1b-c with respect to the structure shown in Fig. 1a. Clearly, the substantial difference between the two dose distribution is not reflected in the difference between the two DVHs. It is also not reflected in gamma (3%/3mm) passing rates, which are equal to 99.7% for the whole image and 97.4% for the structure shown in Fig. 1a. In contrast, the proposed indices for isodose comparison, plotted in Fig. 5b Fig. 5c demonstrates the difference between the two distributions shown in Fig. 1b and Fig. 1c. In Fig. 5d the results of gamma (3%/3mm) analysis are shown. To calculate gamma index, the image shown in Fig. 1b was used as the reference distribution and the image shown in Fig. 1c was used as the evaluated distribution. White regions of this image correspond to gamma index values larger than 1. Note that the presence of hot (red) and cols (blue) regions is clear in the difference image (Fig. 5c). The proposed indices help in determining how these cold and hot regions are associated with dose levels which association does not follow directly from either the difference image or gamma index analysis. In Fig. 6a we show reference and evaluated DVHs, both calculated for, respectively, reference and evaluated dose distributions shown in Fig. 2b-c with respect to the structure shown in Fig. 2a. The difference between the two distribution shown in Fig. 2b and Fig2c is even more prominent than for the case depicted in Fig. 1 but again this difference is reflected in neither the difference between the two DVHs nor in gamma (3%/3mm) passing rates (equal to 99.1% for the whole image and 99.4% for the structure shown in Fig. 2a). As for the previous case, the proposed indices for isodose comparison demonstrate that there is discrepancy between the two distributions. Hausdorff distance is relatively big which means that, besides some distortion, there is a relative shift between the two distributions. Fig. 6c demonstrates the difference between the two distributions shown in Fig. 2b and Fig. 2c. In Fig. 6d the results of gamma (3%/3mm) analysis are shown. As in the previous case the presence of hot (red) and cols (blue) regions is clear in the difference image (Fig. 6c). However, looking at Fig. 6b it is clear that cold regions are associated with high doses while hot regions are associated with low doses which is not evident from either Fig. 6c or Fig. 6d.
In the next section we demonstrate the application of the proposed methods to clinical cases. In contrast to the cases discussed above the interpretation of both dose difference and gamma index images is, due to noise, no longer straightforward while the proposed methods provide direct insight into the nature of the difference between a reference and an evaluated dose distributions.

III.2. Clinical data
To illustrate the clinical application of the proposed method we first consider the PTV region of the IMRT prostate plan.
As seen in Gy (which is the total dose to the PTV; this dose level is marked by vertical dashed lines in Fig. 7b). The Hausdorff distance between planned and delivered isodoses can be as large as 2 cm, at the 63 Gy isodose. The volume of the cold region for the 63 Gy isodose is about 11% of the PTV volume. The volume of the hot region for isodose 63 Gy is also about 11% of the PTV volume. In Fig. 8 we show samples of CT slices where the PTV is marked as green, cold regions marked as blue, and hot regions marked as red (the same colour convention is applied throughout). The hot and cold regions were always determined for the 63 Gy isodose -for which we observe a substantial discrepancy between the planned and delivered dose distributions, as demonstrated in Fig. 7b. To assess the significance of the observed cold and hot regions, we calculated histograms of differences between planned and delivered dose distributions over these two regions for the 63 Gy isodose, shown in Fig. 9 . 10a). In Fig. 10b we show our measures of discrepancy between isodoses versus isodose level for dose values observed within the PTV volume. Noting large discrepancies between isodoses above and including 81 Gy (which is the total planned dose to the PTV; this dose level is marked by vertical dashed lines in Fig. 10b), with FPR increasing to over In our third clinical example, we consider one of the OAR regions (the brain) of the VMAT brain plan. Doses to 5% of the OAR are 58.3 Gy and 58.6 Gy for the planned and evaluated dose distributions, respectively. The gamma passing rate for this OAR is 99.5%. DVHs for reference and evaluated dose distributions are also very similar (Fig. 13a) In Fig. 13b we show our measures of discrepancy between planned and measured isodoses. The hot and cold regions were determined for the 40 Gy isodose (denoted by vertical dashed lines in Fig. 13b). The relative volumes of the cold and hot regions for this 40 Gy isodose are about 3% and 1.7% of the OAR volume. In Fig. 14 we show samples of CT slices where the OAR, cold and hot regions with respect to the 40 Gy isodose are appropriately marked. Histograms of the dose differences for cold and hot regions for the 40 Gy isodose are shown in Fig. 15. While the hot region of the 40 Gy isodose is relatively small (1.7% of OAR volume), substantial parts of this hot region receive doses higher than those planned, by more than 5% of the total planned PTV dose.

IV. Discussion
We introduce a novel quantitative method of comparing two dose distributions, encoded as collections of regions delimited by isodoses specified within the range of doses encountered in the compared dose distributions. We have demonstrated our method by analysing illustrative 2D cases as well as examples of clinical plans: a prostate IMRT, and a brain VMAT plan. While, for clinical cases, the Monte Carlo-calculated dose distribution was compared here against one calculated by the TPS to represent a true composite dose measurement, the proposed method can be applied without modification to any other dose measurement protocols, such as perpendicular composite or perpendicular field-by-field.
Notably, DVH analysis is a special case of our proposed framework, corresponding to sensitivity, calculated when comparing the evaluated dose distribution (either planned or measured) with an artificial distribution constructed from the CT outlines of the analysed irradiated region. The framework we propose extends this analysis by several other options, such as the Dice coefficient, Hausdorff distance etc. Its main advantage over DVH analysis is in presenting dose delivery errors as a function of the investigated dose level. In contrast to DVH, the proposed approach analyses dose delivery errors from a perspective considerably broader than only via true positive rate. The proposed features Another important advantage of our approach over DVH is that it enables measured and delivered dose distributions to be compared directly and comprehensively -unlike in common DVH-based analysis where comparisons between CToutlined structures of interest are made separately. This particular advantage over common DVH analysis is especially important. We have demonstrated clinically relevant cases where no differences between measured and planned dose distribution being stated by DVH analysis, substantial discrepancies could nevertheless be recognised using our approach.
Analysis of the dependence of our proposed indices on the isodose level delivers information about discrepancies between planned and delivered isodoses. Because each isodose set is associated with a collection of spatial locations, one may readily identify spatial locations where discrepancies between the reference and evaluated Our clinically relevant therapy plans underwent isodose comparison methods based on measures of intersection between two regions enclosed by isodoses, and based on the distance between such regions. These two approaches deliver complementary information. There is close analogy between simultaneous analysis of intersection volume-based indices and Hausdorff distance, and the common composite analysis which considers both dose differences and DTA when comparing two dose distributions.
Notably, the proposed method of comparison of planned and delivered dose distributions highlights local discrepancies which in the case of other methods, such as gamma analysis, may not be apparent. As such, it can be a supplementary method to the existing QA approaches. As seen in Fig. 16, the differences between isodoses are not merely limited to isolated voxels -attributable to noise -but may be structured over larger volumes -due to other factors, perhaps multileaf sliding window (Fig. 16, top) or arc misalignment (Fig. 16, bottom). The close vicinity of tissues of widely differing densities may also present as hot and cold regions in        Fig. 2b and Fig. 2c with respect to structure shown in Fig. 2a. (b) The proposed indices for pair-wise comparison of the distributions shown in Fig. 2b and Fig. 2c. (c) The difference between the two distributions shown in Fig. 2b and Fig. 2c.
(d) The results of gamma analysis -in white regions gamma index, computed for the two distributions shown in Fig. 2b and Fig. 2c is larger than 1.