With the increasing subspecialization of medical fields, the demand for more accurate and informative image reports is booming, challenging radiologists and medical imaging specialists to master every examination type and anatomical region (28). Image examination today aims not only at qualitative diagnosis but also at obtaining rich quantitative information, such as disease severity, prognosis, and the therapeutic effect of drugs (29, 30), areas in which artificial intelligence can make an important difference. The pathological processes of tumor occurrence, growth, and invasion are shaped by gene regulation and the tumor microenvironment and produce corresponding manifestations in medical images (30). The "common" manifestations of some tumors can be identified by the naked eye and empirically summarized as the image characteristics of specific tumors. However, more "hidden" information of a personal nature, such as individual prognosis and response to a specific drug treatment, cannot be recognized visually. With the development of algorithms, the efficacy and efficiency of information extraction from images have improved significantly, enabling researchers to predict prognosis more accurately and greatly benefiting the clinical management of cancer (31). DL has been reported to match and even surpass human performance in task-specific applications (32, 33).
Most previous studies on tumor prognosis prediction were developed using the radiomics approach, whose classic steps generally include image acquisition, manual tumor segmentation, feature extraction, feature filtering, and classification (34). Although several tumor segmentation methods exist, segmenting along the edge of the primary tumor is preferred. However, the criteria for T staging of NPC, which combine evidence-based findings with empirical knowledge, are defined by the relationships between the tumor and the surrounding tissues and organs (35); these relationships are discarded when only the primary tumor is segmented. The characteristics of these relationships undoubtedly contain much valuable prognostic information, as the C-index of T staging alone can reach approximately 60%-70% (20, 36). However, analysis based on the whole MR image is an indispensable step toward realizing the clinical practical value of such predictive models. With this goal in mind, we established a model based on the whole MR image for prediction in the pre-experiment stage of our study. Although we obtained an accuracy of nearly 70%, the heatmaps generated by the model indicated unreasonable feature extraction: structures such as the cerebrospinal fluid, cerebellum, orbit, and parotid gland were treated as prognosis-related in most cases, even when the tumor was far away from them. Excessive image noise and the small sample size were considered the main reasons; therefore, analysis based on the whole MR image was abandoned. Instead, a rectangular ROI comprising the tumor and the surrounding tissues and organs was used in our study. The heatmaps generated by our DL models indicated that tumor peripheral signals contained very important prognostic information.
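The classic radiomics steps listed above (feature extraction from a segmented ROI, feature filtering, then classification) can be illustrated with a minimal numpy sketch. This is not the pipeline or feature set used in this study; the first-order features, the variance filter, and the synthetic images are hypothetical stand-ins chosen only to make the workflow concrete.

```python
import numpy as np

def first_order_features(image, mask):
    """Extract a few simple first-order features from voxels inside a tumor
    mask (hypothetical stand-ins for a full radiomics feature set)."""
    roi = image[mask > 0]
    return {
        "mean": float(roi.mean()),
        "std": float(roi.std()),
        "p10": float(np.percentile(roi, 10)),
        "p90": float(np.percentile(roi, 90)),
        "energy": float((roi ** 2).sum()),
    }

def filter_features(feature_matrix, names, min_variance=1e-6):
    """Feature filtering step: drop near-constant features across the cohort
    (a simple variance filter as an illustrative criterion)."""
    keep = feature_matrix.var(axis=0) > min_variance
    return feature_matrix[:, keep], [n for n, k in zip(names, keep) if k]

# Toy cohort: two synthetic "MR slices" with the same rectangular tumor mask.
rng = np.random.default_rng(0)
images = [rng.normal(loc=m, scale=1.0, size=(64, 64)) for m in (5.0, 8.0)]
mask = np.zeros((64, 64))
mask[20:40, 20:40] = 1

rows = [first_order_features(img, mask) for img in images]
names = list(rows[0])
X = np.array([[r[n] for n in names] for r in rows])
X_kept, kept_names = filter_features(X, names)
print(kept_names)  # features surviving the filter, ready for a classifier
```

The surviving feature matrix `X_kept` would then be passed to a classifier (e.g., logistic regression or a random forest) in the final step of the pipeline.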
The purpose of tumor risk assessment is to guide the development of an appropriate treatment plan. We expect tumors to respond to treatment; however, it is not uncommon for the “best treatment plan” to yield poor results. Radiotherapy resistance, recurrence, distant metastasis, and complications caused by radiotherapy are considered in many studies as the main causes of treatment failure and patient death (37–40). Oncologists could adjust the predetermined treatment plan according to the obtained evidence so that it matches the estimated tumor risk. Because the common RECIST method evaluates only changes in tumor extent, AI-based analysis is valuable: in the “eye” of AI, medical images are not merely pictures, and many more prognostic features, not limited to the tumor extent, can be extracted. In our study, we included MR images of NPC before and after a course of treatment for prognosis prediction; the AUCs of the pre-model, post-model, and ensemble model were 0.745, 0.820, and 0.841, respectively. The post-model predicted better than the pre-model, even though post-treatment MR images have rarely been used for AI-based prognosis prediction. For advanced NPC or other advanced malignant tumors that require multiple courses of treatment, including post-treatment medical images in AI-assisted prognosis assessment is therefore worth recommending. Indeed, once the technique matures, the ideal scenario would be AI-based assessment of images after each treatment course to evaluate real-time risk and guide optimization of the treatment plan (Fig. 8).
Figure 8: Cyclical post-treatment assessment of cancer risk guiding treatment plan adjustment for the best outcome
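To make the ensemble evaluation concrete, the sketch below averages the predicted probabilities of a pre-treatment and a post-treatment model and scores each with the AUC, computed as the Mann-Whitney statistic. The patient labels and probabilities are hypothetical toy values, not data from this study, and probability averaging is only one simple ensembling choice.

```python
import numpy as np

def auc(y_true, scores):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive case is scored higher than a randomly chosen negative
    case (ties count as half a win)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical predicted event probabilities from a pre-treatment and a
# post-treatment model for the same five patients (1 = event occurred).
y = [1, 0, 1, 0, 1]
pre_probs  = np.array([0.70, 0.40, 0.55, 0.60, 0.80])
post_probs = np.array([0.85, 0.30, 0.65, 0.45, 0.75])

# A simple ensemble: average the two models' probabilities per patient.
ensemble_probs = (pre_probs + post_probs) / 2
print(auc(y, pre_probs), auc(y, post_probs), auc(y, ensemble_probs))
```

In this toy example the post-treatment probabilities separate the classes better than the pre-treatment ones, mirroring (qualitatively only) the pre-model/post-model ordering reported above.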
Regarding the prediction of NPC prognosis from imaging data, Table 3 lists for comparison the C-indexes and AUCs, the commonly used evaluation indicators, of several reports. Because it is impossible to analyze all the variables that affect survival, the accuracy of prediction models inevitably has an upper limit, regardless of whether they are based on medical imaging data, clinical data, or both. The C-indexes or AUCs of the prediction models established in previous studies ranged from 0.694 to 0.863, exceeding the predictive ability of TNM staging.
Table 3: Studies predicting the prognosis of nasopharyngeal carcinoma based on medical imaging
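For survival outcomes with censoring, the C-index generalizes the AUC by scoring only comparable patient pairs. A minimal sketch of Harrell's concordance index follows; the follow-up times, event flags, and risk scores are hypothetical illustrative values, not data from the studies in Table 3.

```python
def c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.
    A pair (i, j) is comparable if the patient with the shorter follow-up
    time had an event; the pair is concordant if that patient also received
    the higher risk score (ties in risk count as half)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical follow-up data: time in months, event = 1 (death/progression)
# or 0 (censored at that time), and model-assigned risk scores.
times  = [10, 24, 36, 50]
events = [1,  1,  0,  0]
risks  = [0.9, 0.6, 0.4, 0.2]
print(c_index(times, events, risks))  # perfectly concordant here: 1.0
```

Censored patients (event = 0) contribute only as the longer-surviving member of a pair, which is what distinguishes the C-index from a plain AUC on binary labels.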
There are several shortcomings in our study. First, the number of cases was limited. Dataset size has a complex impact on the performance of DL models based on convolutional neural networks. Although transfer learning offers a good solution for small datasets, large samples are still desirable, especially for MRI-related tasks. However, to ensure the quality of the dataset, only 206 of the 1034 patients on the initial list remained after filtering. Second, our study included no external validation dataset. Differences among hospitals affect tumor outcomes, and this variability cannot be reflected in a single-center dataset. Testing on multicenter data would provide a better understanding of the generalization ability of the established DL models.