In this study, the GPR of prostate VMAT QA with ArcCHECK was predicted via deep learning using a cylindrical dose distribution derived from the calculated 3D dose distribution in the ArcCHECK phantom. A moderate to strong correlation was observed between pGPR and mGPR, which suggests that the features of the dose distribution on the cylindrical detector plane are directly related to mGPR.
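As an illustration of how a cylindrical dose map can be extracted from a 3D dose grid, the following is a minimal sketch and not the implementation used in this study. It assumes an isotropic grid indexed as [z, y, x], a cylinder axis running along z through the grid center, and nearest-neighbor sampling; the function name `unwrap_cylinder` and the parameters `spacing_mm`, `radius_mm`, and `n_angles` are hypothetical.

```python
import numpy as np


def unwrap_cylinder(dose, spacing_mm, radius_mm, n_angles=360):
    """Sample a 3D dose grid on a cylinder and return a 2D (z, angle) map.

    Assumptions (illustrative only): `dose` is indexed [z, y, x] with
    isotropic in-plane spacing, and the cylinder axis runs along z
    through the centre of each axial slice.
    """
    dose = np.asarray(dose, dtype=float)
    nz, ny, nx = dose.shape
    cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0

    # Angular positions around the cylinder, one column per angle.
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)

    # Nearest voxel indices on the circle of the requested radius.
    iy = np.rint(cy + radius_mm * np.sin(theta) / spacing_mm).astype(int)
    ix = np.rint(cx + radius_mm * np.cos(theta) / spacing_mm).astype(int)

    if iy.min() < 0 or iy.max() >= ny or ix.min() < 0 or ix.max() >= nx:
        raise ValueError("cylinder does not fit inside the dose grid")

    # Paired index arrays pick one voxel per angle in every slice.
    return dose[:, iy, ix]  # shape: (nz, n_angles)


# Hypothetical usage: a synthetic grid whose value equals the distance
# from the axis, so the unwrapped map should be close to the radius.
yy, xx = np.mgrid[0:101, 0:101]
radial = np.sqrt((yy - 50.0) ** 2 + (xx - 50.0) ** 2)
synthetic_dose = np.repeat(radial[np.newaxis, :, :], 4, axis=0)
cyl_map = unwrap_cylinder(synthetic_dose, spacing_mm=1.0, radius_mm=30.0)
```

The resulting 2D array is the kind of input that a CNN could consume; an interpolating sampler would be a reasonable refinement of the nearest-neighbor lookup used here.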
Several groups have reported methods for predicting the GPR in IMRT QA. Ono et al. proposed a machine learning-based method that predicts the GPR measured with ArcCHECK from 28 complexity metrics [27]. The CC values and SD achieved using this method were 0.57 and 2.1–2.4% at 3%/3 mm and 0.55 and 5.4–5.8% at 2%/2 mm. In our study, the CC values and SD were 0.67 and 2.2% at 3%/3 mm and 0.73 and 3.3% at 2%/2 mm, respectively. Because our CNN model achieved these results using the cylindrical dose distribution, we consider the dose distribution on the detector to be an appropriate input for predicting the GPR. Most of the complexity metrics are values integrated over all control points and may therefore lose some plan features. Because the 3D dose distribution at the detector is potentially more directly related to the GPR value, we recommend it as input data for predicting the GPR value.
Another point of comparison is a deep-learning-based method. Tomori et al. proposed a deep-learning-based method using the 2D dose distribution on a Gafchromic film [17]. They obtained RMSEs of 1.1%, 1.5%, 1.5%, and 2.2% for the 3%/3-mm, 3%/2-mm, 2%/3-mm, and 2%/2-mm tolerances, respectively. These values are smaller than ours (2.4%, 3.0%, 2.6%, and 3.4%). Because our results were obtained with an ArcCHECK system with a 1-mm pitch detector element and the mGPR values were significantly different, a direct comparison of these values does not provide an accurate benchmark. The distribution of mGPR in this study differs from that in the previous study. Tomori et al. had few cases with mGPR lower than 95%, and the SD of mGPR was 0.59% for the 3%/3-mm tolerance. Our study had 100 cases (74% of the total 135 cases) with mGPR values lower than 95%, and the SD of mGPR was 3.0%. Our coefficient of variation (CV) values of mGPR were 0.03, 0.05, 0.04, and 0.06 for the 3%/3-mm, 3%/2-mm, 2%/3-mm, and 2%/2-mm tolerances, respectively. These values were significantly different from those of Tomori et al. (0.01, 0.01, 0.01, and 0.02). The distribution of mGPR is considered to affect the accuracy of the prediction [28]. In both the previous studies and our study, the prediction accuracy was worse for the 2%/2-mm tolerance than for the 3%/3-mm tolerance [17, 28]. Tighter tolerances yield smaller GPR values with a larger variation in mGPR, which makes accurate prediction more difficult for the CNN model.
For the correlation between pGPR and mGPR, we achieved CC values of 0.67, 0.70, 0.66, and 0.73 for the 3%/3-mm, 3%/2-mm, 2%/3-mm, and 2%/2-mm tolerances, respectively, despite the larger variation in mGPR. This result also demonstrates that the GPR can be predicted by combining the CNN model with the cylindrical dose distribution. The ArcCHECK dose distribution consists of the entry and exit doses measured with diodes located 2.9 cm below the phantom surface. These doses could retain more features than a planar dose at the center because they are less composite. Thus, the deep learning-based prediction method using the dose distribution on the detector may be more suitable for the GPR of 3D VMAT measurement. Applying the CNN model to the same dataset of GPR measured with multiple devices (e.g., Gafchromic film, 2D or 3D detector arrays) may provide useful insight into which features are suitable for deep learning.
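For readers who wish to reproduce the summary statistics compared above, the following is a minimal sketch of how the CC, RMSE, SD, and CV could be computed from paired pGPR and mGPR values. It assumes that the CC is the Pearson correlation coefficient, that the SD refers to the sample standard deviation of the prediction error (pGPR minus mGPR), and that the CV is the sample SD of mGPR divided by its mean. The function name `agreement_metrics` and the numerical values are hypothetical and are not data from this study.

```python
import numpy as np


def agreement_metrics(pgpr, mgpr):
    """Summarise agreement between predicted and measured GPR (in %).

    Assumed definitions (illustrative only): Pearson CC, RMSE and sample
    SD of the error pGPR - mGPR, and CV = SD(mGPR) / mean(mGPR).
    """
    pgpr = np.asarray(pgpr, dtype=float)
    mgpr = np.asarray(mgpr, dtype=float)
    error = pgpr - mgpr
    return {
        "cc": float(np.corrcoef(pgpr, mgpr)[0, 1]),
        "rmse": float(np.sqrt(np.mean(error ** 2))),
        "sd_error": float(np.std(error, ddof=1)),
        "cv_mgpr": float(np.std(mgpr, ddof=1) / np.mean(mgpr)),
    }


# Hypothetical GPR values for six plans, not taken from the study.
mgpr_example = [98.5, 96.0, 93.2, 90.1, 94.8, 88.7]
pgpr_example = [97.0, 95.5, 94.0, 91.5, 93.0, 90.2]
metrics = agreement_metrics(pgpr_example, mgpr_example)
```

Because the CV normalizes the spread of mGPR by its mean, it allows datasets with different GPR distributions to be compared, which is the reason it is reported alongside the RMSE here.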
In this study, underestimation errors were more probable than overestimation errors. This bias is attributed to the over-representation of cases with a GPR value considerably lower than the mean value. Because the range of the predicted GPR values is close to the upper limit (100%), there are no cases with considerably higher GPR values. The median value is lower than the mean value of the GPR in the modeling set. Only cases with a considerably lower GPR value may have contributed to the learning. Therefore, the CNN model could be biased toward low predictions, with less restriction on the GPR values, and the proportion of underestimated cases may have increased. When introducing a CNN model into clinical practice, it is essential to pay attention to the error characteristics of the prediction model. Setting separate tolerances for underestimation and overestimation errors is recommended because the prediction error may not follow a normal distribution.
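One possible way to set separate tolerances for underestimation and overestimation errors is to take empirical percentiles of the prediction error rather than a symmetric multiple of the SD. The following is a minimal sketch under that assumption; it is not a procedure used in this study. The error is defined here as pGPR minus mGPR, so negative values represent underestimation, and the names `asymmetric_tolerances`, `within_tolerance`, and `coverage` are hypothetical.

```python
import numpy as np


def asymmetric_tolerances(pgpr, mgpr, coverage=0.95):
    """Return (lower, upper) error limits from empirical percentiles.

    The error is pGPR - mGPR, so `lower` bounds underestimation and
    `upper` bounds overestimation. No normality is assumed.
    """
    error = np.asarray(pgpr, dtype=float) - np.asarray(mgpr, dtype=float)
    tail = (1.0 - coverage) / 2.0 * 100.0  # percent left in each tail
    lower, upper = np.percentile(error, [tail, 100.0 - tail])
    return float(lower), float(upper)


def within_tolerance(pgpr_value, mgpr_value, lower, upper):
    """Check whether a single prediction error lies inside the limits."""
    return lower <= (pgpr_value - mgpr_value) <= upper


# Hypothetical, left-skewed errors (more and larger underestimates).
errors_example = np.array([-8.0, -4.0, -2.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0])
mgpr_example = np.full(errors_example.shape, 95.0)
pgpr_example = mgpr_example + errors_example
limits = asymmetric_tolerances(pgpr_example, mgpr_example, coverage=1.0)
```

With a skewed error distribution, the two limits differ in magnitude, which a symmetric tolerance derived from the SD alone would not capture.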
This study has some limitations. The prediction was limited to a single treatment site, with 135 prostate plans. To apply our method to clinical practice and simplify the QA process for other treatment cases, the range of treatment sites must be broadened. Additionally, the expected clinical advantage was not described. The practical advantage of the CNN model is important; however, the results of this study are based on a limited number of cases (135), and some of the mGPR values were lower than the acceptance criteria recommended by TG-218. Thus, we consider that clinical feedback needs to be discussed carefully. Finally, this study used only the dose distribution to develop the CNN model. To improve the prediction accuracy, further studies may be needed that use other components related to dosimetric accuracy, such as the dose uncertainty potential and complexity metrics of the treatment plan, as additional input data. Further studies will be performed to improve the CNN model.