Baseline characteristics and flow chart
The baseline characteristics of the cohort are shown in Table 1. The study flow chart is summarized in Figure 1.
Validation of the seven-gene mutation panel
The seven-gene mutation panel achieved a sensitivity of 0.67 and a specificity of 0.90 in the expanded cohort, yielding an AUC of 0.81. In subgroup analysis, the sensitivity of the panel was 0.71 for UBC and 0.56 for UTUC, and the specificity was 0.90 for benign controls and 0.91 for malignant controls. For NMIBC and MIBC, the sensitivity of the panel was 0.73 and 0.64, respectively (Figure 2).
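As a reminder of how the figures above are defined, the following minimal sketch computes sensitivity and specificity from confusion-matrix counts. The counts are purely illustrative assumptions and are not the study's data; the function names are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cases (e.g. UC patients) that the panel calls positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of controls that the panel calls negative."""
    return tn / (tn + fp)


# Illustrative counts only (assumed, NOT taken from the study).
example_sensitivity = sensitivity(tp=40, fn=10)
example_specificity = specificity(tn=45, fp=5)
print(example_sensitivity, example_specificity)
```

Subgroup values such as those for UBC and UTUC follow from the same formulas applied to the cases in each subgroup.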
Optimization of panel by incorporating methylation biomarkers
Discovery of DNA methylation biomarkers
We analyzed DNA methylation data from the TCGA or GEO databases in four comparisons: 21 pairs of UBC and adjacent tissues; 412 UBC tissues versus 656 normal blood samples; 412 UBC tissues versus 533 kidney renal clear cell carcinoma (KIRC) tissues; and 412 UBC tissues versus 498 prostate carcinoma (PRAC) tissues (Supplementary Figure 1 A~D). Through differential methylation analysis and a series of statistical filters to reduce the number of markers, we identified the 9 most powerful markers: cg13974773, cg16966315, cg17945976, cg21472506, cg23229261, cg24720571, cg25510609, cg25947619 and cg27404023 (Table S1).
Verification of putative methylation biomarkers
We then recruited an independent set of 71 UC patients (38 UBC and 33 UTUC) and 70 controls (31 benign controls and 39 malignant controls) to verify the methylation status of the 9 biomarkers using MS-PCR (Table 2, Figure 3, Supplementary Figures 2 and 3). Based on a comprehensive analysis of AUC, cut-off value, sensitivity and specificity, cg16966315, cg17945976 and cg24720571 were selected for optimization of the panel. The cut-off values for cg16966315, cg17945976 and cg24720571 were set at 4.50, 9.19 and 9.73, respectively. The 3 biomarkers were then examined in the Hunan multicenter cohort using MS-PCR. Any ΔCt equal to or below the corresponding cut-off value was considered positive (+); otherwise the result was considered negative (-).
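The calling rule described above can be sketched as follows. The cut-off values are those reported in the text; the function name and the dictionary layout are hypothetical and serve only as an illustration.

```python
# Cut-off values as reported in the text (ΔCt units).
CUTOFFS = {
    "cg16966315": 4.50,
    "cg17945976": 9.19,
    "cg24720571": 9.73,
}


def call_methylation(marker: str, delta_ct: float) -> str:
    """Return '+' if ΔCt is equal to or below the marker's cut-off, else '-'."""
    return "+" if delta_ct <= CUTOFFS[marker] else "-"


# Illustrative ΔCt values (assumed, not measured data).
print(call_methylation("cg16966315", 4.50))  # at the cut-off, so positive
print(call_methylation("cg24720571", 10.0))  # above the cut-off, so negative
```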
Construction of the novel panel
A mutation frequency of > 0.5% was set as the abnormal cut-off value, and the status of each of the 33 point mutations was converted to ‘+’ or ‘-’. The Boruta feature selection algorithm was used to rank the 33 point mutation biomarkers and 3 methylation biomarkers by importance and to highlight the most powerful biomarker combinations for distinguishing UC from controls. The biomarkers selected were cg24720571, TERT 228(G_A), FGFR3 568(C_T), TERT 250(G_A), FGFR3 099(A_G), PIK3CA 091(G_A), PIK3CA 085(A_G), PIK3CA 082 (G_A) and HRAS 874(T_C) (Figure 4). These 9 biomarkers were used to construct a novel panel with the random forest algorithm. The 333 participants were randomly divided into training and test sets at a ratio of 7:3. The novel panel performed well, with an AUC of 0.95 in the test set. For the total cohort, the model achieved a sensitivity of 0.86 and a specificity of 0.84 (Figure 5 A~E).
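The two data-preparation steps described above, binarizing mutation frequency at the 0.5% cut-off and randomly splitting 333 participants 7:3, can be sketched with the Python standard library. This is a minimal illustration under assumptions: the function names, the fixed seed and the rounding of the training-set size are hypothetical choices not stated in the text, and the Boruta and random forest steps are not reproduced here.

```python
import random


def call_mutation(freq_percent: float, cutoff: float = 0.5) -> str:
    """Return '+' if the mutation frequency (in %) exceeds the cut-off, else '-'."""
    return "+" if freq_percent > cutoff else "-"


def split_train_test(ids, train_fraction: float = 0.7, seed: int = 0):
    """Shuffle participant IDs and split them into training and test sets."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Participant IDs 0..332 stand in for the 333 participants.
train, test = split_train_test(range(333))
print(len(train), len(test))  # 233 100
```

With this rounding, a 7:3 split of 333 participants gives 233 training and 100 test participants; the exact set sizes used in the study are not given in the text.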
External validation of the optimal panel
To further test the robustness of the novel panel, an additional set of 89 participants was obtained from BJH for external validation. The optimal model achieved excellent discrimination in the external cohort, with an overall AUC of 0.98, a sensitivity of 0.93 and a specificity of 0.93 (Figure 5 J&H).
Integrated analysis of two cohorts
When data from the two cohorts were combined, giving a total of 422 participants (236 UC participants and 186 controls), the novel panel showed an overall sensitivity of 0.88 and a specificity of 0.86 (Figure 6 A&B). In subgroup analysis, the sensitivity of the optimal panel was 0.91 for UBC and 0.74 for UTUC, a significant difference. The specificity of the panel was 0.89 for benign controls and 0.81 for malignant controls. To better understand how each biomarker contributes to the panel, we calculated the percentage of cases carrying each variant in each subgroup. TERT 250(G_A) and cg24720571 were significantly less frequent in NMIBC than in MIBC and UTUC (Figure 6 C&D).
In further analysis of sensitivity and specificity across clinical variables, no obvious difference was observed by gender or smoking status. Likewise, no significant difference in gene mutation frequency or methylation level was found by gender or smoking history, except for TERT 228 (Figure S4).
Application of the novel panel for early detection of UC
We then assessed the ability of the model to detect tumors of different grades and stages. The model achieved sensitivities of 0.50 (1/2), 0.90 and 0.89 in PUNLMP, low-grade and less than T2 tumors, respectively (Figure 7 A&D). In more detailed analysis, the sensitivity was 0.75, 0.76 and 0.91 in carcinoma in situ (CIS), Ta and T1 tumors, respectively, with no significant difference (Figure 7 G). In addition, FGFR3 568(C_G) and PIK3CA 082 (G_A) occurred at significantly higher frequency in low-grade tumors, whereas TERT 250(G_A) and cg24720571 occurred at significantly lower frequency. Similarly, FGFR3 568(C_G) occurred at significantly higher frequency in less than T2 tumors, whereas cg24720571 occurred at significantly lower frequency (Figure 7 B&C&E&F).
Novel panel and cytology comparison
In the present study, urine cytology results were available for only 210 UC patients and 119 controls (Table S2). Compared with cytology, the novel panel showed significantly higher sensitivity (0.88 vs. 0.39, P < 0.001) and comparable specificity (0.86 vs. 0.91, P > 0.05). In tumor subgroup analysis, the novel panel was markedly more sensitive than cytology for the detection of NMIBC (0.91 vs. 0.41, P < 0.001), MIBC (0.90 vs. 0.44, P < 0.001) and UTUC (0.74 vs. 0.23, P < 0.001). Further analysis showed that the novel panel outperformed cytology in low-grade (0.90 vs. 0.21, P < 0.001) and high-grade (0.89 vs. 0.48, P < 0.001) tumors, and in < T2 (0.89 vs. 0.39, P < 0.001) and > T2 (0.91 vs. 0.42, P < 0.001) tumors (Figure 8).
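The text does not state which statistical test produced these P values. When two tests are applied to the same patients, one common choice for comparing their sensitivities is the exact McNemar test on the discordant pairs, and the sketch below assumes that choice for illustration only. The function name and the counts are hypothetical and are not the study's data.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value.

    b: patients positive by test A only; c: patients positive by test B only.
    Under the null hypothesis, discordant pairs fall on either side with
    probability 0.5, so the P value is a two-sided binomial tail.
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative discordant-pair counts (assumed, NOT from the study):
# 10 patients detected by the panel only, 0 by cytology only.
print(mcnemar_exact(10, 0))
```

Only discordant pairs enter the test; patients for whom both tests agree do not affect the P value.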