Using 18F-FDG PET/CT to Predict Esophageal Cancer Survival: A Meta-Analysis

Purpose: This study aimed to explore whether metabolic responses on 18F-fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) measured before, during, or after treatment can predict the long-term survival of patients with esophageal cancer. Patients and Methods: We searched English and Chinese literature databases for articles reporting the following indices: the maximum standard uptake value (SUVmax), mean standard uptake value (SUVmean), metabolic tumor volume (MTV), and total lesion glycolysis (TLG). Patients whose values exceeded the thresholds were defined as responders; those whose values did not were defined as non-responders. We then performed a meta-analysis by extracting the hazard ratio (HR) and 95% confidence interval (95% CI) from each report to assess whether responder status had an impact on prognosis. Results: We identified 34 articles with a combined sample size of 2794 patients. The pooled HRs (95% CIs) were as follows. Before treatment: SUVmax = 1.15 (0.98-1.35), MTV = 3.45 (0.78-15.25), TLG = 1.04 (1.02-1.07), and SUVmean = 1.85 (1.33-2.57). During treatment: ΔSUVmax = 1.22 (1.06-1.39), ΔMTV = 1.07 (0.54-2.15), and ΔTLG = 1.09 (0.59-2.02). After treatment: SUVmax = 1.13 (1.05-1.22) and TLG = 1.05 (1.02-1.09). Overall survival was higher in patients with low SUV, MTV, and TLG values than in patients with high values; the difference was statistically significant for the indices whose 95% CI excluded 1. Conclusions: This meta-analysis shows that the prognoses of patients with PET metabolic responses are significantly better than those of non-responders. Our findings may help inform the clinical treatment and prediction of the prognoses of patients with esophageal cancer.


Introduction
Likely due to differences in economic development and living habits, the incidence of upper gastrointestinal cancer is high in economically underdeveloped areas, especially in East Asia and East Africa [1]. The annual incidence of upper gastrointestinal cancer in China, for example, accounts for 44.6% of the global incidence of the disease, with a crude mortality rate of 13.68 per 100,000 [2]. Esophageal cancer is one of the most common tumors of the upper digestive system. It is principally treated with surgery combined with neoadjuvant or definitive radiotherapy and chemotherapy. While this multimodal treatment has greatly reduced mortality and improved the disease-free survival rate of patients with esophageal cancer, the accurate prediction of the prognoses of patients following

Research Article
and after treatment was greater than the standard percentage (e.g., ΔSUVmax > 23%). The values of PET parameters used as response thresholds differ greatly and are primarily based on experience. Because the reported thresholds differ, we have not listed the values here.
Because the literature offers no standardized guidelines, the changes in PET parameters across treatment that are taken to indicate prognosis vary between studies. Further, whether PET can predict the mortality and disease-free survival rate of patients remains controversial. To help resolve this controversy and provide a reference for clinical practice, the present meta-analysis of all relevant and available literature aimed to conduct a systematic, objective analysis of the PET factors predictive of survival in esophageal cancer.

Literature search
We searched the Cochrane Library, MEDLINE, EMBASE, and China National Knowledge Infrastructure for documents published in Chinese or English in any year. The following search query was used: "esophageal cancer" OR "carcinoma of esophagus" OR "esophageal carcinoma" OR "esophagus cancer" AND "positron emission tomography" OR "PET" AND "18F-FDG" OR "fluorodeoxyglucose" AND "prognosis" OR "outcome" OR "prognostic" OR "existence" OR "survival" OR "predict" (Figure 1).
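The query above lists its terms without explicit grouping, so how the AND and OR operators bind depends on the database. As a minimal sketch, assuming the intended reading is four concept groups joined by AND (an assumption, not something the text states), the query could be assembled as follows. The names `CONCEPTS` and `build_query` are hypothetical.

```python
# Sketch only: the parenthesized grouping is an assumed reading of the query,
# and the tracer-term spelling "18F-FDG" is normalized from the extracted text.
CONCEPTS = [
    ["esophageal cancer", "carcinoma of esophagus",
     "esophageal carcinoma", "esophagus cancer"],
    ["positron emission tomography", "PET"],
    ["18F-FDG", "fluorodeoxyglucose"],
    ["prognosis", "outcome", "prognostic", "existence", "survival", "predict"],
]


def build_query(concepts):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    groups = []
    for terms in concepts:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


query = build_query(CONCEPTS)
print(query)
```

Explicit parentheses make the search reproducible across databases that apply different operator precedence.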

Selection of studies
The selected articles were independently evaluated by four researchers (three clinical doctors and one professor of statistics) who did not communicate with one another. Scores were tallied out of 36 points. Clear mention of indices in the article earned 2 points, unclear mention of indices earned 1 point, and no mention of indices earned 0 points (or based on the explanation in the comments). The average of the four scores awarded by the researchers was used as the final score. Disagreements were settled through discussion (Table 1). Further details regarding the method used to score each article are described in the Appendix.
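The scoring rule described above can be sketched in a few lines. This is an illustration under stated assumptions: a 36-point maximum at 2 points per index implies 18 checklist items, which is an inference rather than a figure given in the text, and the ratings below are invented. The function names are hypothetical.

```python
from statistics import mean

MAX_SCORE = 36  # stated in the text
POINTS = {"clear": 2, "unclear": 1, "absent": 0}  # points per index, as described


def reviewer_score(ratings):
    """Sum the points one reviewer awards across all checklist items."""
    total = sum(POINTS[rating] for rating in ratings)
    if total > MAX_SCORE:
        raise ValueError("score exceeds the 36-point maximum")
    return total


def final_score(ratings_by_reviewer):
    """Average the independent scores of all reviewers."""
    return mean([reviewer_score(r) for r in ratings_by_reviewer])


# Invented ratings from four reviewers for one article (18 items each, assumed)
example = [
    ["clear"] * 10 + ["unclear"] * 5 + ["absent"] * 3,
    ["clear"] * 12 + ["unclear"] * 4 + ["absent"] * 2,
    ["clear"] * 11 + ["unclear"] * 4 + ["absent"] * 3,
    ["clear"] * 9 + ["unclear"] * 7 + ["absent"] * 2,
]
print(final_score(example))
```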

Statistical methods
Four indices were extracted from each report to distinguish responders from non-responders, with thresholds that depended on each author's experience or practical results: the maximum standard uptake value (SUVmax), mean standard uptake value (SUVmean), metabolic tumor volume (MTV), and total lesion glycolysis (TLG). When merging statistical results, it was necessary to perform a heterogeneity test to judge whether the statistics were heterogeneous. P-values of ≤0.100 were considered to indicate heterogeneous statistical results.
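A heterogeneity check of this kind can be sketched as follows. This is a minimal illustration, not the software used in the study: it assumes each report supplies an HR with a symmetric 95% CI on the log scale, uses inverse-variance weights, and computes Cochran's Q together with the I² statistic. The function names and the example studies are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def log_hr_and_se(hr, lower, upper):
    """Convert a reported HR and its 95% CI to a log HR and standard error."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return math.log(hr), se


def cochran_q_and_i2(studies):
    """Return (Q, I2 in percent) for a list of (HR, lower, upper) tuples."""
    effects, weights = [], []
    for hr, lower, upper in studies:
        y, se = log_hr_and_se(hr, lower, upper)
        effects.append(y)
        weights.append(1.0 / se ** 2)  # inverse-variance weight
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(studies) - 1
    # I2 is truncated at zero when Q falls below its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical studies, not taken from the meta-analysis
example = [(1.0, 0.8, 1.25), (3.0, 2.4, 3.75)]
q, i2 = cochran_q_and_i2(example)
print(f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

The P-value of the test comes from comparing Q with a chi-square distribution on k - 1 degrees of freedom, for example with `scipy.stats.chi2.sf(q, df)` if SciPy is available.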
In RevMan software, I² describes the percentage of the total heterogeneity that is caused by differences between studies rather than by sampling error. I² is calculated as follows:

$$I^2 = \frac{Q - (k - 1)}{Q} \times 100\%$$

where Q represents the chi-square value (χ²) of the heterogeneity test, and k represents the number of included studies. I² values of ≤50% were considered to indicate acceptable heterogeneity, in which case a fixed-effect model was used; values of >50% were considered to indicate substantial heterogeneity. The values of the four indicators of the survival rate selected in these papers were generated by the comparison of the overall survival (OS) rate, as calculated from the hazard ratio (HR) and 95% confidence

If the HR and variance (V) were reported in the original text, they were applied directly in the meta-analysis. Otherwise, the method of Jayne et al. [6] was used to calculate the HR and 95% CI from the K-M curve and P-value. First, the approximate value of each point on the curve is obtained by using Engauge Digitizer, and the approximate value of HR is calculated from the Excel

Quality assessment
The lowest quality score of the 34 selected articles was 39, and the highest was 84. The scoring system adopted by the reviewers was relatively strict, and the document quality was relatively high. If an article lacked necessary information, the corresponding author of the article was contacted.

Meta-analysis before treatment
A meta-analysis of the four indicators (SUVmax, SUVmean, MTV, and TLG) before treatment was performed for OS. Twenty-four articles included the SUVmax. Because I² = 82% (>50%), these articles were analyzed with the QE model (HR = 1.15, 95% CI = 0.98-1.35). The pooled estimate pointed to higher OS in patients with low SUVmax than in patients with high SUVmax, although the 95% CI included 1, so the difference did not reach statistical significance (Figure 2a-2c).
The asymmetry of the funnel plot suggested publication bias.

The Begg and Egger tests in Stata, used to detect publication bias, gave contradictory results. For small samples, the Egger test (Figure 3a) is more sensitive than the Begg test (Figure 3b). The result of P < 0.001 indicated that the selected articles were subject to publication bias.
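For readers unfamiliar with the Egger approach, the core of the test is a simple regression: each study's standardized effect (log HR divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero signals funnel-plot asymmetry. The sketch below estimates only the intercept and slope by ordinary least squares; the significance test on the intercept, which Stata reports, is omitted. The function name and data are hypothetical.

```python
def egger_regression(effects, standard_errors):
    """Return (intercept, slope) of Egger's regression.

    effects: log hazard ratios; standard_errors: their standard errors.
    The intercept measures small-study asymmetry; the slope estimates
    the underlying effect.
    """
    x = [1.0 / se for se in standard_errors]                   # precision
    z = [y / se for y, se in zip(effects, standard_errors)]    # standardized effect
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = s_xz / s_xx
    intercept = z_bar - slope * x_bar
    return intercept, slope


# Hypothetical data in which smaller studies (larger SE) show larger effects
ses = [0.1, 0.2, 0.4, 0.5]
effects = [0.2 + 1.5 * se for se in ses]
intercept, slope = egger_regression(effects, ses)
print(intercept, slope)
```

In this constructed example the effect grows with the standard error, so the intercept recovers the asymmetry term and the slope recovers the underlying effect.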
The heterogeneity of the 34 articles selected after manual review did not change greatly, indicating that the results are relatively robust; therefore, we performed subgroup analyses. The patients were categorized according to the following pathological types (articles that did not mention pathological types were excluded): squamous cell carcinoma, adenocarcinoma, and unsegmented. The HR and 95% CI of each subgroup
Nine articles included in our analysis considered MTV. Because I² = 100% (>50%), these articles were analyzed with the QE model (HR = 3.45, 95% CI = 0.78-15.25). The pooled estimate pointed to higher OS in patients with low MTV values than in patients with high MTV values, although the 95% CI included 1, so the difference did not reach statistical significance.
Seven articles included in our analysis considered TLG. Because I² = 81% (>50%), these articles were analyzed with the QE model (HR = 1.04, 95% CI = 1.02-1.07). The results showed that the OS of the patients with low TLG values was significantly higher than that of the patients with high TLG values.
Three articles included in our analysis considered the SUVmean. Because I² = 48% (<50%), these articles were analyzed with the fixed-effect model (HR = 1.85, 95% CI = 1.33-2.57). The results showed that the OS of the patients with low SUVmean values was significantly higher than that of the patients with high SUVmean values.
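The fixed-effect pooling used when heterogeneity is low can be sketched as an inverse-variance average on the log HR scale. This is an illustration under assumptions, not the RevMan implementation: standard errors are recovered from the reported 95% CIs by assuming symmetry on the log scale, and the function names are hypothetical. A pooled estimate is treated as significant at the 5% level when its 95% CI excludes 1.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def fixed_effect_pool(studies):
    """Pool (HR, lower, upper) tuples by inverse-variance weighting.

    Returns the pooled HR with its 95% CI as (hr, lower, upper).
    """
    sum_w = 0.0
    sum_wy = 0.0
    for hr, lower, upper in studies:
        se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
        w = 1.0 / se ** 2
        sum_w += w
        sum_wy += w * math.log(hr)
    pooled = sum_wy / sum_w
    pooled_se = math.sqrt(1.0 / sum_w)
    return (math.exp(pooled),
            math.exp(pooled - Z_95 * pooled_se),
            math.exp(pooled + Z_95 * pooled_se))


def is_significant(lower, upper):
    """A ratio estimate is significant at 5% when its 95% CI excludes 1."""
    return lower > 1.0 or upper < 1.0


# A single study pools to (approximately) itself
hr, lo, hi = fixed_effect_pool([(1.85, 1.33, 2.57)])
print(round(hr, 2), is_significant(lo, hi))
```

Applied to the pooled intervals reported in this paper, `is_significant(1.33, 2.57)` holds for pre-treatment SUVmean, whereas `is_significant(0.98, 1.35)` does not hold for pre-treatment SUVmax.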

Meta-analysis during treatment
Meta-analysis of the three indicators (ΔSUVmax, ΔMTV, and ΔTLG) measured during treatment was performed. Ten articles included in our analysis considered the ΔSUVmax. Because the I² =
Four articles included in our analysis considered the ΔMTV. Because I² = 90% (>50%), these articles were analyzed with the QE model (HR = 1.07, 95% CI = 0.54-2.15). The pooled estimate suggested higher OS in patients with high absolute values of ΔMTV than in patients with low absolute values of ΔMTV, but the 95% CI included 1 and the difference was not statistically significant.
Five articles included in our analysis considered the ΔTLG. Because I² = 87% (>50%), these articles were analyzed with the QE model (HR = 1.09, 95% CI = 0.59-2.02). The pooled estimate suggested higher OS in patients with high absolute values of ΔTLG than in patients with low absolute values of ΔTLG, but the 95% CI included 1 and the difference was not statistically significant.

Meta-analysis after treatment
Meta-analysis of the two indicators (SUVmax and TLG) measured after treatment was performed. Six articles included in our analysis considered the SUVmax. Because I² = 58% (>50%), these articles were analyzed with the QE model (HR = 1.13, 95% CI = 1.05-1.22). The results showed that the OS of the patients with low SUVmax values was significantly higher than that of the patients with high SUVmax values.
Three articles included in our analysis considered TLG. Because I² = 91% (>50%), these articles were analyzed with the QE model (HR = 1.05, 95% CI = 1.02-1.09). The results showed that the OS of the patients with low TLG values was significantly higher than that of the patients with high TLG values.

Discussion
The sixth leading cause of cancer-related death and the eighth most common cancer in the world, esophageal cancer is associated with a 5-year survival rate of less than 25% [42]. While endoscopy, CT, and MRI have conventionally been used to examine patients with esophageal cancer, the relatively new technique of PET has been increasingly used for the diagnosis, differential diagnosis, and clinical staging of these patients. Imaging also helps to identify patients with significant complications who may respond to and benefit from more conservative treatment (i.e., without esophagectomy) after chemoradiotherapy (CRT) is demonstrated to be fully or partially effective. Finally, PET/CT has demonstrated value as a follow-up tool for the timely detection of tumor recurrence after surgical treatment [43]. Because 18F-FDG PET informs the metabolic diagnosis of esophageal cancer, it can compensate for the shortcomings of traditional methods and, when combined with CT to construct a clear anatomical image, help predict the prognosis of patients. A study found 18F-FDG PET/CT to be a powerful prognostic tool for evaluating OS in patients with esophageal cancer before, during, or after CRT. PET parameters (TLG = 50) can guide future treatment strategies by stratifying stage II/III patients who will receive CRT according to their predicted OS [44]. Another study showed that PET could reflect the response of esophageal cancer to neoadjuvant chemotherapy: the SUV values of the PET responders were significantly higher than those of the PET non-responders [45]. However, there are no large-sample clinical studies on the relationship between PET/CT metabolic response and prognosis to guide clinical treatment.
Submit your Manuscript | www.austinpublishinggroup.com
The articles selected in this meta-analysis featured considerable heterogeneity. The traditional random-effects (RE) model measures the differences between studies with the square of tau (τ²). Small studies have large within-study variances and would therefore receive small weights; however, because the same τ² value is added to the denominator of every study's weight, small studies end up contributing a disproportionately large weight, while the weight of large studies is reduced. The QE model was used to address this drawback of the RE model.
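The weighting effect described here can be made concrete with a small sketch. It assumes the DerSimonian-Laird estimator for τ², a common choice that the text does not name, and compares each study's share of the total weight with and without τ². It does not implement the QE model. All names and numbers are hypothetical.

```python
def dersimonian_laird_tau2(effects, variances):
    """Estimate the between-study variance tau^2 (DerSimonian-Laird, assumed)."""
    if len(effects) < 2:
        return 0.0
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    return max(0.0, (q - df) / c)


def relative_weights(variances, tau2=0.0):
    """Share of total weight per study; tau2 = 0 gives fixed-effect weights."""
    raw = [1.0 / (v + tau2) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical log HRs: two large studies (variance 0.01), one small (0.25)
effects = [0.0, 1.0, 0.5]
variances = [0.01, 0.01, 0.25]

tau2 = dersimonian_laird_tau2(effects, variances)
fe_share = relative_weights(variances)[2]          # small study, fixed effect
re_share = relative_weights(variances, tau2)[2]    # small study, random effects
print(f"tau^2 = {tau2:.3f}, small-study weight: "
      f"{fe_share:.1%} (FE) vs {re_share:.1%} (RE)")
```

Because the same τ² is added to every study's variance, the weights become more equal as τ² grows, which is why the small study's share rises under the RE model.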
For cases with large heterogeneity, subgroup analysis was used to identify the source of heterogeneity. For studies providing the SUVmax before treatment, the possible causes of heterogeneity include sex, age, treatment plan, clinical stage, pathological type, sample size, and article quality scores. However, as most articles did not make a clear distinction between sex and age, the present meta-analysis considered the patient's treatment plan, clinical stage, and pathological type as sources of heterogeneity.
When the patients were divided according to pathological type, SUVmax could predict the OS of patients with squamous cell carcinoma and undifferentiated pathologies but not of those with adenocarcinoma. The difference between the three groups was statistically significant, indicating that the relationships between pathological type, SUVmax, and OS are unclear and that adenocarcinoma cells take up 18F-FDG less avidly than squamous cells (low or no uptake can be seen in 10% to 15% of undifferentiated adenocarcinomas). Hence, caution should be exercised when using SUVmax to predict the OS of patients with esophageal adenocarcinoma.
When subgroups were divided according to stage, we found no significant difference between patients with cancer before or at stage I and those with cancer before or at stage II. However, it is possible that SUV max is more effective as a predictor of esophageal cancer in the early and middle stages of cancer because the group of patients with cancer before or at stage IV includes patients with cancer before or at stage IV. More experiments are needed to confirm this hypothesis.
When the patients were sorted according to treatment, we found no significant difference between the four groups. While the methods of radiotherapy and chemotherapy, drug use, radiation dose, target delineation, and even surgical methods differed among the reviewed studies, the analyses of each subgroup confirmed that SUVmax could still be used to predict OS.
The overall analysis revealed that regardless of whether the indices were measured before or after treatment, SUV max , MTV, TLG, and SUV mean could perform well in predicting the OS of patients; the value of MTV is related to the size of the solid tumor, while the values of SUV max and TLG are related to the pathological response. Hence, SUV max and TLG can directly predict the efficacy of radiotherapy, chemotherapy, and surgery.
The results of this paper have important guiding significance for clinical work. However, due to the large heterogeneity in the articles included in this study, the prognostic value of PET/CT for the clinical response or choice of treatment should be interpreted with caution. Further multi-center clinical studies with large sample sizes are needed for verification.
Due to the high cost of PET/CT, many medical institutions do not perform PET/CT routinely in pre-treatment examinations in order to minimize the financial burden on patients. However, PET/CT improves the accuracy of tumor staging and target delineation as compared with CT alone. According to this paper, the response of PET/CT parameters also plays a positive role in predicting the prognosis. Especially for patients with locally advanced disease, continuing neoadjuvant chemotherapy may be beneficial if they respond well; however, if a patient responds poorly or weakly to neoadjuvant radiotherapy and chemotherapy, that treatment should be stopped as soon as possible [46][47]. This is of great value to therapeutic economics. Angela and colleagues have compiled extensive long-term statistics on the costs incurred by patients with esophageal cancer under different treatment methods. For example, the average cost of radiotherapy for stage III patients is $7530, and the average cost of chemoradiotherapy is $11460 [48]. If we can predict how well a patient will respond to treatment, individuals and Medicare could save a substantial amount of money.
At present, there are a variety of histopathological methods to evaluate the response of esophageal cancer to neoadjuvant radiotherapy and chemotherapy; however, there is no unified standard. As these methods are based on invasive procedures, they are not conducive to clinical application [49]. In contrast, the response measured by 18F-FDG PET/CT after neoadjuvant radiotherapy and chemotherapy is related to histopathological tumor regression and can reflect the prognosis of patients to some extent. Based on this meta-analysis, we believe that PET/CT should be one of the routine tests performed before neoadjuvant radiotherapy, chemotherapy, or surgery. If a patient responds well on PET/CT, treatment should proceed as planned; if the patient is a non-responder, treatments other than neoadjuvant chemotherapy should be considered.
In clinical work, SUVmax is the most widely used parameter. As many radiologists ignore the significance of other parameters such as SUVmean, MTV, and TLG, there are relatively few clinical studies with those data. In our study, parameters such as MTV and TLG may also be predictive of prognosis and, to a certain extent, may be more sensitive than SUVmax. In particular, when SUVmax is near the critical value, other parameters can be used as reference factors. Because these parameters are not difficult to obtain, we suggest that they be used as common predictive parameters in the clinic in order to provide more support for the prognosis of esophageal cancer. In this paper, the critical value of SUVmax varies widely among the articles analyzed. While this is related, in part, to the different instruments and image processing methods used, it also highlights the lack of a unified standard for distinguishing between PET/CT responders and non-responders. Currently, SUVmax thresholds are typically set between 4 and 10, but further research is needed to establish a unified standard.
This report is subject to several limitations. First, many of the included articles did not directly report HR values, which were instead extracted from the K-M curves. This method inevitably introduces error. Second, the funnel plot of the reports collected from the literature indicated publication bias, likely resulting in overestimation of the predictive effect of the indices identified here. Finally, all of the reports sourced from the literature are case-control or cohort studies, highlighting the need for large randomized controlled trials of the potential of PET/CT for predicting the prognoses of patients with esophageal cancer.

Conclusion
Our study demonstrates that the prognoses of patients who respond on PET/CT are significantly better than those of non-responders; however, the clinical courses for patients with esophageal cancer still need to be determined through a variety of examinations. Our study therefore supports the view that 18F-FDG PET/CT is helpful in predicting the prognosis of patients with esophageal cancer, thus guiding their treatment to a certain extent.