Urine Cellular DNA Point Mutation and Methylation as Potential Biomarkers for Early Detection of Urothelial Carcinoma

Background: Previously, our team identified a seven-gene mutation panel in urine sediment to discriminate UBC from benign urological diseases. In the present study, we aimed to validate the panel in an expanded hematuria cohort that is closer to the natural population. We also tried to optimize the panel by incorporating methylation biomarkers, and performed external validation to investigate the robustness and stability of the novel panel.
Methods: Patients with urothelial carcinomas and controls were prospectively recruited in clinical trial ChiCTR2000029980. The mutation panel was validated in the expanded cohort (n=333) from multiple centers in Hunan. Several UBC-specific methylation biomarkers were identified by comprehensive analyses of a series of TCGA, GEO and independent cohorts, and examined in the expanded cohort. The Random Forest algorithm was used to construct an optimal panel. External validation of the optimal panel was carried out in a single-center cohort from Beijing (n=89). NGS was used to analyze the DNA point mutations and MS-PCR to analyze methylation.
Results: The AUC, sensitivity and specificity of the mutation panel in the expanded cohort were 0.81, 0.67 and 0.90, respectively. After screening, only cg16966315, cg17945976 and cg24720571 were left for further analysis. The optimal panel consisted of cg24720571 and 8 point mutations, including TERT 228(G_A), FGFR3 568(C_T), TERT 250(G_A), FGFR3 099(A_G), PIK3CA 091(G_A), PIK3CA 085(A_G), PIK3CA 082(G_A) and HRAS 874(T_C). The AUC, sensitivity and specificity of the optimal panel were 0.89, 0.84 and 0.79, respectively, in the training group, and 0.95, 0.91 and 0.95, respectively, in the test group. In the external validation, the AUC, sensitivity and specificity were 0.98, 0.93 and 0.93, respectively.


Introduction
Urothelial bladder carcinoma (UBC) is the most common malignancy of the urinary tract, with an estimated 550,000 new cases and 200,000 deaths per year worldwide 1. The majority of newly diagnosed cases are non-muscle invasive bladder cancer (NMIBC). Nearly 70% of these patients will experience recurrence, and 10-30% will progress to muscle-invasive bladder cancer (MIBC) 2,3.
Typical diagnosis and surveillance of UBC involves the use of cystoscopy, cytology, fluorescence in situ hybridization (FISH) and computed tomography (CT) 4,5. Cystoscopy is regarded as the gold standard for the detection of UBC; it exhibits relatively high clinical sensitivity but low patient acceptance owing to its invasive nature 6. In contrast, urine cytology and FISH are noninvasive and specific but lack sensitivity, especially in low-grade tumors.
CT is a useful tool but exposes patients to radiation. These facts, together with the high cost and follow-up biopsy procedures, have led to many attempts to develop alternative noninvasive methods to detect UBC.
Given the special anatomical characteristics of the urinary tract, noninvasive strategies to identify UBC mainly include urine-based genetic, epigenetic and protein assays. Currently, the FDA has approved several tests for UBC diagnosis and surveillance, including NMP22 and UroVysion, with sensitivity ranging from 30% to 100% and specificity from 55% to 98% 7,8. However, due to inconsistent assay performance, the technical expertise required and high cost, such assays have not yet been integrated into routine clinical practice. In addition, none of these assays has been validated for the detection of upper tract urothelial carcinoma (UTUC), which accounts for 5% of urothelial carcinoma (UC).
Previously, we identified a seven-gene mutation panel in urine sediment to discriminate UBC from benign urological diseases in hematuria patients 9. The score of the panel's model is substituted into a sigmoid function to get a fit value, f(x). Malignancy is considered when f(x) > 0.4491049, while the benign state is indicated otherwise. In the present study, we aim to validate the panel in an expanded cohort that is closer to the natural population. In addition, we try to optimize the panel by incorporating methylation biomarkers. We also performed external validation to investigate the utility and stability of the novel panel.
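The decision rule of the seven-gene panel can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the intercept and coefficients are copied from the model equation reported in this paper, while the function names and the assumption that each gene input is a mutation frequency expressed as a fraction are ours.

```python
import math

# Intercept and coefficients copied from the model equation reported in the paper.
INTERCEPT = -0.6685
COEFFICIENTS = {
    "TERT": 416.2208,
    "FGFR3": 16.3065,
    "TP53": 21.6375,
    "HRAS": 1030.8943,
    "KRAS": 269.6423,
    "PIK3CA": -6.6597,
    "ERBB2": 1365.2377,
}
CUTOFF = 0.4491049  # f(x) above this value is read as malignant


def fit_value(gene_values):
    """Return f(x), the sigmoid of the linear score X.

    gene_values maps gene name -> per-gene input. Assumption: each input is a
    mutation frequency given as a fraction (0.01 means 1%). Genes that are
    missing from the mapping count as 0.
    """
    x = INTERCEPT + sum(
        coef * gene_values.get(gene, 0.0) for gene, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-x))


def is_malignant(gene_values):
    """Apply the published cut-off to the fit value."""
    return fit_value(gene_values) > CUTOFF
```

Under these assumptions, a sample with no detected mutations stays below the cut-off, whereas a hypothetical 1% TERT mutation frequency alone pushes the fit value well above it.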

Materials And Methods
Patients' characteristics and ethics statement
Participants were prospectively recruited (Chinese Clinical Trial Registry, ChiCTR2000029980) with the approval of the Ethics Committees of Xiangya Hospital (XYH), The Second Xiangya Hospital (SXYH), Hunan Provincial People's Hospital (HPPH), Hunan Cancer Hospital (HCH) and Beijing Hospital (BJH), after written informed consent was obtained. Studies were conducted in accordance with the ethical principles of the Declaration of Helsinki.

Study design
In the prospective, multicenter expanded cohort, a total of 385 individuals with macroscopic or microscopic hematuria were initially enrolled from XYH (n=143), SXYH (n=87), HPPH (n=82) and HCH (n=73) (Hunan, China) between August 2019 and January 2020, of whom 333 were eligible for inclusion (Figure 1). The previous seven-gene panel was validated in this cohort to explore the possibility of discriminating UC from other urological diseases.
In the panel optimization stage, we identified several UBC-specific methylation biomarkers by comprehensive analyses of a series of TCGA and GEO datasets and an independent cohort from the Hunan centers.
Candidate methylation biomarkers were examined in the 333 participants. We established important predictor features using the Boruta feature selection algorithm based on the analysis of DNA mutations and methylation, and constructed a novel panel using the Random Forest algorithm.
In the external validation stage, 99 participants with hematuria were recruited from BJH (Beijing, China) from May 2020 to August 2020, of whom 89 were eligible, to evaluate the stability and reproducibility of the optimal panel.

Sample collection and DNA Isolation
For all participants, each urine sample (at least 30 mL) was collected from the first micturition in the morning. The urine samples were centrifuged at 1,600 g for 10 min at 4 °C, the supernatant was discarded and the pellet was carefully transferred into new empty 2 mL tubes. The same procedure was then repeated at 12,000 g for 10 min at 25 °C. Then 200 μL of 1× PBS was added to each tube to resuspend the cells. DNA isolation was performed using the Tissue Genomic DNA Extraction Kit (cat DP304, Tiangen Biotechnology, China) according to the manufacturer's instructions. The modified DNA was stored at -80 °C for further processing.

Library Preparation and Sequencing (Mutation)
Genomic DNA (50 ng) from each sample was fragmented and tailed using the TIANSeq Fragment/Repair/Tailing Module (TIANGEN, Cat: NG301), and then ligated to forward oligos containing a UMI. After two rounds of purification with 1.2× AMPure XP beads (Beckman), the ligated product was PCR-amplified using specific backward primers and universal primers. After another round of purification using 1× AMPure beads, the final library pool was quantified on an ABI 7500 Fast Real-Time PCR system (Applied Biosystems) and sequenced on a NextSeq 500 system (Illumina, USA) to obtain paired-end 150 bp reads.
All reads were quality trimmed and adapter sequences were removed. The index sequences and UMI were appended to the read identifier for the next analysis. Sequence reads were mapped to the human genome (hg19) using the Burrows-Wheeler aligner (BWA-MEM). Reads with the same UID were clustered to obtain the final consensus sequence. Variant calling was performed using the Genome Analysis Toolkit (GATK v3.8) and the resulting variants were annotated using ANNOVAR.
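The consensus step can be illustrated with a small Python sketch. The paper does not describe how the consensus was built, so the per-position majority vote used here, the function name `umi_consensus`, and the requirement that reads sharing a UMI are already aligned and of equal length are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads that share a UMI into one consensus sequence.

    reads: iterable of (umi, sequence) pairs.
    Assumption: sequences sharing a UMI are already aligned and equally long.
    Returns {umi: consensus}, built by a per-position majority vote.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        columns = zip(*seqs)  # one tuple of bases per read position
        consensus[umi] = "".join(
            Counter(col).most_common(1)[0][0] for col in columns
        )
    return consensus
```

A single discordant base within a UMI family is outvoted by the other reads, which is the reason UMI clustering suppresses PCR and sequencing errors before variant calling.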

Methylation-specific PCR (MS-PCR)
Sodium bisulfite conversion and purification of 100 ng genomic DNA were performed using the EZ DNA Methylation-Lightning™ Kit (Zymo Research Corporation, Irvine, California, USA), according to the manufacturer's protocol. GAPDH was set as the internal reference. Ct values for the CpG markers and the internal reference gene (GAPDH) were measured from the FAM and VIC signals, respectively, and represented their relative quantities. The delta ct (Δct) values were calculated as the methylation score.
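As a brief illustration of the methylation score, the sketch below computes Δct for one marker. The subtraction order (marker Ct minus GAPDH Ct) is an assumption: the paper does not state it, although it is consistent with lower Δct values being called positive later in the text. The function and parameter names are hypothetical.

```python
def delta_ct(ct_marker_fam, ct_gapdh_vic):
    """Methylation score for one CpG marker.

    ct_marker_fam: Ct of the CpG marker (FAM channel).
    ct_gapdh_vic: Ct of the GAPDH internal reference (VIC channel).
    Assumption: delta ct = marker Ct - reference Ct, so a smaller value
    means the methylated target amplified earlier relative to GAPDH.
    """
    return ct_marker_fam - ct_gapdh_vic
```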

Statistical analysis
Model performance was evaluated by the area under the receiver operating characteristic curve (AUC).
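For readers less familiar with the AUC, it equals the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The following sketch computes it directly from that definition; it is an illustration with hypothetical names and toy scores, not the R code used in the study.

```python
def auc(scores_cases, scores_controls):
    """AUC as the fraction of case/control pairs ranked correctly.

    A pair counts 1 when the case scores higher than the control and
    0.5 when the two scores are tied.
    """
    wins = 0.0
    for case in scores_cases:
        for control in scores_controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


# Toy scores, for illustration only (not study data).
example = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

In the toy example, eight of the nine case/control pairs are ordered correctly, so `example` is 8/9.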

Results
Baseline characteristics and flow chart
The baseline characteristics of the cohort are shown in Table 1. The flow chart is summarized in Figure 1.

Validation of the seven-gene mutation panel
The seven-gene mutation panel achieved a sensitivity of 0.67 and a specificity of 0.90 in the expanded cohort, yielding an AUC of 0.81. In the subgroup analysis, the sensitivity of the panel was 0.71 for UBC and 0.56 for UTUC, and the specificity was 0.90 for benign controls and 0.91 for malignant controls. In further analysis, the sensitivity of the panel was 0.73 for NMIBC and 0.64 for MIBC (Figure 2).
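The performance measures reported here follow the standard definitions against pathology as the reference. A minimal sketch is given below; the function name is hypothetical and the counts in the example are invented purely for illustration.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts.

    tp/fn: pathology-confirmed UC called positive/negative by the panel.
    tn/fp: controls called negative/positive by the panel.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Invented counts, for illustration only (not study data).
metrics = diagnostic_metrics(tp=8, fp=1, tn=9, fn=2)
```

With these invented counts the sensitivity is 0.8 and the specificity is 0.9.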

Discovery of DNA methylation biomarkers
We analyzed DNA methylation data from the TCGA or GEO databases in four comparisons: 21 pairs of UBC and adjacent tissue, 412 UBC tissues versus 656 normal blood samples, 412 UBC tissues versus 533 kidney renal clear cell carcinoma (KIRC) tissues, and 412 UBC tissues versus 498 prostate carcinoma (PRAC) tissues (Supplementary Figure 1 A~D). Through differential methylation analysis and a series of statistical filters to reduce the number of markers, we finally identified the 9 most powerful markers: cg13974773, cg16966315, cg17945976, cg21472506, cg23229261, cg24720571, cg25510609, cg25947619 and cg27404023.

Verification of putative methylation biomarkers
We then recruited an independent set of 71 UC patients (38 UBC and 33 UTUC) and 70 controls (31 benign controls and 39 malignant controls) to verify the methylation status of the 9 biomarkers using MS-PCR (Table 2, Figure 3 and Supplementary Figures 2 and 3). Based on a comprehensive analysis of AUC, cut-off value, sensitivity and specificity, cg16966315, cg17945976 and cg24720571 were selected for optimization of the panel. The cut-off values were set at 4.50, 9.19 and 9.73 for cg16966315, cg17945976 and cg24720571, respectively. The 3 biomarkers were then examined in the Hunan multicenter cohort using MS-PCR. Any Δct equal to or below the determined cut-off value was considered positive (+); otherwise it was considered negative (-).
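The calling rule for the three methylation markers can be written as a short lookup. The cut-off values are those stated above; the function name and the dictionary layout are illustrative assumptions.

```python
# Cut-off values for delta ct as stated in the text.
CUTOFFS = {
    "cg16966315": 4.50,
    "cg17945976": 9.19,
    "cg24720571": 9.73,
}


def methylation_status(marker, delta_ct_value):
    """Return '+' when delta ct is at or below the marker's cut-off, else '-'."""
    return "+" if delta_ct_value <= CUTOFFS[marker] else "-"
```

For example, a Δct of exactly 9.73 for cg24720571 is called positive, because the rule includes the cut-off value itself.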

Construction of the novel panel
A mutation frequency > 0.5% was set as the abnormal cut-off, and the status of each of the 33 point mutations was converted to '+' or '-'. The Boruta feature selection algorithm was used to rank the 33 point mutation biomarkers and 3 methylation biomarkers by their importance, and to highlight the most powerful biomarker combination for distinguishing UC from controls. Cg24720571, TERT 228(G_A), FGFR3 568(C_T), TERT 250(G_A), FGFR3 568(C_G), FGFR3 099(A_G), PIK3CA 091(G_A), PIK3CA 085(A_G), PIK3CA 082(G_A) and HRAS 874(T_C) were selected (Figure 4). The 9 biomarkers were used to construct a novel panel using the random forest algorithm. The 333 participants were then randomly divided into training and test sets at a ratio of 7:3. The novel panel performed well, with an AUC of 0.95 in the test group. For the total cohort, the model gave a diagnosis with a sensitivity of 0.86 and a specificity of 0.84 (Figure 5 A~E).
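Two of the preprocessing steps described here, converting mutation frequencies to '+'/'-' and splitting participants 7:3, can be sketched with the Python standard library. This is an illustration only: the study used R, and the function names, the fixed random seed and the rounding rule for the training-set size are assumptions. The Boruta and random forest steps are not reproduced.

```python
import random

FREQ_CUTOFF = 0.005  # 0.5%, expressed as a fraction


def mutation_status(frequency):
    """Return '+' when the mutation frequency exceeds 0.5%, else '-'."""
    return "+" if frequency > FREQ_CUTOFF else "-"


def split_7_3(participant_ids, seed=0):
    """Randomly split participants into training and test sets at 7:3.

    Assumptions: the seed is fixed only for reproducibility, and the
    training-set size is the total multiplied by 0.7, rounded to the
    nearest integer.
    """
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * 0.7)
    return ids[:n_train], ids[n_train:]
```

Every participant lands in exactly one of the two sets, so the test set can be used to estimate performance on data the model has not seen.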

External validation of the optimal panel
To further test the robustness of the novel panel, an additional set of 89 participants was obtained from BJH to carry out external validation. The optimal model achieved excellent discrimination in the external cohort, with an overall AUC of 0.98, a sensitivity of 0.93 and a specificity of 0.93 (Figure 5 J&H).

Integrated analysis of two cohorts
When data from the two cohorts were combined, giving a total of 422 participants (236 UC participants and 186 controls), the novel panel showed an overall sensitivity of 0.88 and specificity of 0.86 (Figure 6 A&B).
In subgroup analysis, the sensitivity of the optimal panel reached 0.91 for UBC and 0.74 for UTUC, a significant difference. In addition, the specificity of the panel was 0.89 for benign controls and 0.81 for malignant controls (Figure 6 A&B). To better understand how each biomarker contributes to the panel, we calculated the percentage of cases in each variant subgroup. A significantly lower frequency of TERT 250(G_A) and cg24720571 was observed in NMIBC than in MIBC and UTUC (Figure 6 C&D).
In further analysis of sensitivity and specificity by clinical variables, no obvious difference was observed by gender or smoking status. In addition, no significant difference in gene mutation frequency or degree of methylation was found by gender or smoking history, except for TERT 228 (Supplementary Figure 4).

Application of the novel panel for early detection of UC
We then assessed the model in tumors of different grades and stages. The model achieved sensitivities of 0.50 (1/2), 0.90 and 0.89 in papillary urothelial neoplasm of low malignant potential (PUNLMP), low-grade and < T2 tumors, respectively (Figure 7 A&D). In more detailed analysis, the model showed sensitivities of 0.75, 0.76 and 0.91 in carcinoma in situ (CIS), Ta and T1 tumors, respectively, with no significant difference (Figure 7 G). In addition, FGFR3 568(C_G) and PIK3CA 082(G_A) occurred at a higher frequency in low-grade tumors, while TERT 250(G_A) and cg24720571 occurred at a lower frequency in low-grade tumors, all with significant differences. FGFR3 568(C_G) occurred at a higher frequency in < T2 tumors, while cg24720571 occurred at a lower frequency in < T2 tumors, both with significant differences (Figure 7 B&C&E&F).

Discussion
Compared with the previous mutation panel, the focus shifted from genes to point biomarkers. The number of point mutations was reduced from 33 to 8, which significantly improved detection efficiency.
The novel panel was also more precise and specific. In addition, the cohort was expanded and is closer to the natural hematuria population. This suggests that the novel panel could be applied to screening of the hematuria population in the future.
Genetic mutations are often the subject of investigation and play basic roles in the malignant transformation of urothelial cells 10. However, not all UC harbor mutations in the most commonly altered oncogenes. The mutation panel produced sensitivities of only 0.70 and 0.67 in the previous validation group and the present expanded cohort, respectively. Abnormal DNA methylation is also an important mark in the development of UC, and could be the first detectable neoplastic change associated with tumorigenesis 5,11. The novel panel, consisting of 8 point mutations and 1 methylation biomarker, showed a significant improvement in sensitivity. Epigenetic and genetic biomarkers can therefore complement and reinforce each other, resulting in a more stable diagnostic performance 12-14.
Cytology is highly specific, and in expert hands a positive result nearly always indicates the presence of urothelial malignancy. It is noninvasive, inexpensive, simple, and valuable for high-grade and flat lesions 15,16. However, cytology is not particularly sensitive, especially for low-grade and early-stage tumors. In the present study, cytology achieved a sensitivity of only 0.21 and 0.39 in low-grade and < T2 tumors, respectively. The novel panel significantly outperformed cytology in nearly all aspects and exhibited comparable specificity. In addition, the novel panel correctly identified 53 cases of low-grade UC, none of which were detected by cytology. This suggests that the novel panel might replace cytology for early detection of UC.
UTUC is an uncommon disease, accounting for only 5-10% of UC 17. Currently, most UC biomarkers focus on UBC, and UTUC-associated biomarkers are relatively rare. Noninvasive and sensitive methods to screen at-risk individuals for UTUC are clearly desirable. Xu et al. constructed a diagnostic panel for UTUC detection, with a sensitivity of 0.94 and a specificity of 0.93 12. Zeng et al. developed a panel based on the analysis of copy number variants, with a sensitivity of 1.0 and 0.64 in the training and validation cohorts, respectively 18. In our study, the sensitivity of the optimal panel reached 0.74 for UTUC, demonstrating its potential for UTUC screening. However, the value of the novel panel in detecting UTUC still needs to be validated, given the small number of patients evaluated.
Neuritin 1 (NRN1) is associated with mental illnesses such as schizophrenia, bipolar disorder and depression 20,21. Recently, aberrant methylation of the NRN1 gene promoter region has been associated with tumor development, for example in gastric cancer and melanoma 22,23. In our study, the CpG site cg24720571, located in the promoter region of the NRN1 gene, was identified for the first time as a useful biomarker to detect UC in urine. However, the biological function and methylation mechanism of NRN1 remain largely unknown, and further clarification is needed.

Conclusions
In summary, we developed an optimized model consisting of 1 methylation biomarker and 8 point mutation biomarkers for UC detection, which showed highly specific and robust performance. It may serve as an alternative approach for early detection of UC, reducing the need for extensive examinations in patients at low risk.

Figures

Figure 1. Flow chart.
Figure 5. ROC curves.
The seven genes include 33 different types of point mutation. The model was X = -0.6685 + 416.2208*TERT + 16.3065*FGFR3 + 21.6375*TP53 + 1030.8943*HRAS + 269.6423*KRAS + (-6.6597)*PIK3CA + 1365.2377*ERBB2. The value X was substituted into the sigmoid function to obtain the fit value f(x).
The sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) of the panel and of cytology in detecting UC were obtained by comparison with pathology and presented as univariate values in bar graphs. The percentage of cases in each variant subgroup by clinical characteristic was also presented as univariate values in bar graphs. The Δct distributions were presented as boxplots marked with the median and interquartile range (IQR). Random forest analysis was applied to highlight the most powerful combination of mutation and methylation biomarkers for distinguishing UC from controls. The chi-square test was used for categorical variables, and the t-test was used for continuous variables. All statistical analyses and data visualizations were carried out in R (version 3.4.3) and GraphPad Prism 8 (version 8.0.2). Adobe Illustrator (CC 2017) was used for image processing. All hypothesis tests were two-sided, with a p value < 0.05 considered statistically significant.