Study design and participant characteristics
The workflow for developing a blood-based methylation test for CRC detection consisted of three phases (Figure 1). In the first phase, candidate markers were identified using the TCGA CRC dataset and validated in the GSE48684 dataset, following previous studies (25, 26). Briefly, candidate CpG sites were required to show significantly higher methylation levels in cancer samples than in normal samples (adjusted P < 0.05), a Δβ value between the two groups greater than 0.3, and an average β value in normal samples below 0.1. To obtain markers applicable to plasma samples, we used the GSE40279 dataset to exclude CpGs whose average β values exceeded 0.1 in WBC samples from 656 healthy individuals. The methylation status of the candidate CpGs was then validated in CRC plasma samples and confirmed by Sanger sequencing. In the second phase, an MSP system was established to detect the methylation signals of the candidate markers. This phase included designing appropriate primers, optimizing the qPCR amplification system, and evaluating the assay’s technical parameters. In the third phase, the performance of the developed assay was assessed in training and validation sets, with sensitivity, specificity, and AUC as the primary indicators.
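The selection criteria in the first phase can be sketched as a simple filter. This is a minimal illustration only: the `ProbeStats` record, its field names, and the use of group means to compute Δβ are assumptions made here, not details given in the text; the thresholds are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class ProbeStats:
    """Hypothetical per-probe summary; field names are illustrative."""
    probe_id: str
    adj_p: float             # adjusted P value, cancer vs normal tissue
    mean_beta_cancer: float  # average beta in cancer samples
    mean_beta_normal: float  # average beta in normal samples
    mean_beta_wbc: float     # average beta in healthy WBC samples


def is_candidate(p: ProbeStats,
                 max_adj_p: float = 0.05,
                 min_delta_beta: float = 0.3,
                 max_normal_beta: float = 0.1,
                 max_wbc_beta: float = 0.1) -> bool:
    """Apply the four stated criteria to one probe."""
    # Assumption: delta-beta is taken as the difference of group averages.
    delta_beta = p.mean_beta_cancer - p.mean_beta_normal
    return (p.adj_p < max_adj_p
            and delta_beta > min_delta_beta
            and p.mean_beta_normal < max_normal_beta
            and p.mean_beta_wbc <= max_wbc_beta)  # exclude only if > 0.1


def select_candidates(probes):
    """Return the IDs of probes that pass every filter."""
    return [p.probe_id for p in probes if is_candidate(p)]
```

A probe that is strongly hypermethylated in tumours but also methylated in WBCs is rejected by the last condition, which is the step intended to keep the markers usable in plasma.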
Identification of candidate markers
Two candidate CpG sites, cg14015706 and cg08247376, were identified within the first exon of NTMT1 and within 200 bp upstream of the MAP3K14-AS1 transcription start site, respectively (Supplemental table 7). In the TCGA dataset, both probes were frequently hypermethylated in cancer samples compared with normal samples (median β = 0.62 [1st-3rd quartile: 0.55-0.69] vs 0.037 [1st-3rd quartile: 0.031-0.042] for cg14015706 and 0.59 [1st-3rd quartile: 0.48-0.68] vs 0.067 [1st-3rd quartile: 0.057-0.081] for cg08247376) (Figure 2A). Their methylation levels were not strongly correlated with patient age in the TCGA CRC cohort (Pearson's correlation coefficient = 0.13 for cg14015706 and 0.061 for cg08247376) (Supplemental figure 1). In the GSE48684 dataset, both probes also showed significantly higher methylation levels in cancer samples than in normal samples (Figure 2B). Similarly, they showed low methylation levels in 710 adjacent normal samples, with median β values of 0.028 [1st-3rd quartile: 0.033-0.041] for cg14015706 and 0.053 [1st-3rd quartile: 0.081-0.18] for cg08247376 (Figure 2C). Similar results were observed in 656 healthy WBC samples (Figure 2D), where the median β values of cg14015706 and cg08247376 were 0.058 [1st-3rd quartile: 0.051-0.065] and 0.073 [1st-3rd quartile: 0.067-0.079], respectively. The GSE122126 dataset consisted of three CRC and 13 healthy plasma samples. Both probes were hypermethylated in CRC plasma and hypomethylated in healthy plasma (Figure 2E), indicating high consistency of methylation status between tissue and plasma.
Performance of the dual-strand and dual-MGB probe technique
For NTMT1, two pairs of methylation primers targeting the sense and antisense strands, together with their corresponding MGB probes, were designed (Figure 3A). In the two-fold serial dilution experiment, the copy numbers detected by the sense- and antisense-strand assays were slightly less than half of the theoretical copy numbers, with slopes of 0.39 and 0.31, respectively. The copy numbers detected by the dual-strand assay were about 0.73 times the theoretical copy numbers (Figure 3B), an approximately two-fold increase over either single-strand assay (average fold change of 2.35 across all dilutions, Figure 3C), as expected. Meanwhile, the Ct value of the dual-strand assay was about one cycle earlier than that of either single-strand assay (average ΔCt = 1.21, Figure 3D), in line with the theoretical ΔCt of one cycle.
For MAP3K14-AS1, MGB-probe1 and MGB-probe2 were designed according to the antisense strand sequence and its reverse complement, respectively (Figure 3E). Theoretically, the copy number detected by either single MGB-probe assay should be half of the diluted copy number, and that detected by the dual-MGB probe assay should equal the diluted copy number. We observed slopes of 0.48, 0.43 and 1.01 for the MGB-probe1, MGB-probe2 and dual-MGB probe assays, respectively (Figure 3F). Similarly, the dual-MGB probe assay detected about 2.06-fold higher copy numbers than the single MGB-probe assays (Figure 3G), and its Ct values were 0.99 cycles earlier (Figure 3H).
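The link between a fold change in detectable template and the expected Ct shift can be checked with a short calculation. The sketch below assumes ideal exponential amplification, in which the product multiplies by (1 + efficiency) each cycle; the function name and the `efficiency` parameter are illustrative and not taken from the text.

```python
import math


def expected_delta_ct(fold_change: float, efficiency: float = 1.0) -> float:
    """Cycles gained when the starting template is multiplied by fold_change.

    Assumes product grows by a factor of (1 + efficiency) per cycle,
    so efficiency=1.0 corresponds to perfect doubling.
    """
    return math.log(fold_change) / math.log(1.0 + efficiency)


# A two-fold gain in template at perfect efficiency shifts Ct by one cycle.
print(expected_delta_ct(2.0))
# Shifts implied by the average fold changes reported for the two assays.
print(round(expected_delta_ct(2.35), 2), round(expected_delta_ct(2.06), 2))
```

Under this idealized model, fold changes slightly above two correspond to shifts slightly above one cycle, which is the same order as the observed ΔCt values of 1.21 and 0.99.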
Additionally, in the serial dilution experiment, no amplification curves were observed for either target when a large amount of unmethylated DNA containing no methylated copies was used as template, and accordingly no Ct values were obtained. These results suggest that the developed assays specifically target methylated DNA even against a background of abundant unmethylated DNA. We also found that the dual-strand assay robustly detected methylated DNA at a concentration of 10 copies/μl, and the dual-MGB probe assay at 5 copies/μl (Supplemental figure 2), indicating technical sensitivities as low as 10 and 5 copies per microliter, respectively. In addition, a standard curve experiment was carried out to evaluate the amplification efficiencies of the two assays. When incorporated into a single multiplex MSP system, the estimated amplification efficiencies of NTMT1 and MAP3K14-AS1 were 98.14% and 110.08%, respectively (Supplemental figure 3). The standard curve experiment also revealed no apparent interference between the two markers in the multiplex MSP system.
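Amplification efficiency is commonly derived from the slope of a standard curve (Ct plotted against log10 of the input copy number) as E = 10^(-1/slope) - 1. The text does not state which formula was used, so the sketch below assumes this conventional relationship; the function names and the example dilution series are hypothetical.

```python
import math


def fit_slope(log10_copies, ct_values):
    """Ordinary least-squares slope of Ct versus log10(copy number)."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    return sxy / sxx


def amplification_efficiency(slope: float) -> float:
    """Conventional qPCR efficiency; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series with ideal doubling:
# each ten-fold drop in input delays Ct by log2(10) cycles.
log10_copies = [5, 4, 3, 2, 1]
ct_values = [40 - math.log2(10) * x for x in log10_copies]
slope = fit_slope(log10_copies, ct_values)
print(round(slope, 3), round(amplification_efficiency(slope) * 100, 1))
```

With this formula, a slope steeper than about -3.32 gives an efficiency below 100% and a shallower slope gives one above 100%.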
Validation of the methylation status of NTMT1 and MAP3K14-AS1 by Sanger sequencing
Sanger sequencing was performed for 30 CRC tissues and 30 normal controls to confirm the methylation status of the target regions of NTMT1 and MAP3K14-AS1. The NTMT1 marker contained two amplification regions, one on the sense strand and one on the antisense strand, covering 10 and 3 key CpG sites, respectively. The MAP3K14-AS1 marker contained one amplification region on the antisense strand, covering six key CpG sites. We successfully obtained sequencing results for the amplified products of the NTMT1 antisense-strand assay in 25 normal and 29 cancer tissues. Overall, these CpG sites were widely methylated in cancer samples but unmethylated in normal samples (Figure 4A, Supplemental table 7). Although the methylation status of several CpG sites on the NTMT1 sense strand was missing owing to sequencing failure, we still observed frequent methylation in cancer samples (Figure 4B, Supplemental table 7). Hypermethylation was also found in the amplified products of the MAP3K14-AS1 antisense-strand assay in cancer samples but not in normal samples (Figure 4C, Supplemental table 7). Moreover, each CpG site in the three target regions showed a significantly higher methylation frequency in cancer samples than in normal samples (Supplemental table 8).
Performance of the test in training set
According to the standard curves of NTMT1 and MAP3K14-AS1 (Supplemental figure 4A-C), we first estimated the cfDNA input of these two genes in plasma samples in the training set. The median copy numbers of NTMT1 and MAP3K14-AS1 were 71.76 [1st-3rd quartile: 11.29-177.54] and 37.17 [1st-3rd quartile: 1.637-73.096] in CRC samples, both significantly higher than those in healthy and other non-CRC samples (Figure 5A). In addition, both copy numbers showed an increasing trend in CRC samples from stage I to IV (Supplemental figure 5A&B). Meanwhile, CRC samples showed much lower Ct values than non-CRC and healthy samples (Figure 5B). ROC curve analysis was conducted to assess the ability of the two genes to discriminate CRC samples from non-CRC samples, with Ct value as the predictor and disease status as the response. We obtained AUC values of 0.87 (95% CI: 0.81-0.93) and 0.77 (95% CI: 0.70-0.82) for NTMT1 and MAP3K14-AS1, respectively (Figure 5C&D). Because a single gene provides limited methylation information, we then combined the two markers; the combined assay was named the MethyDT test.
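For readers unfamiliar with how an AUC is obtained from Ct values, one equivalent formulation is the probability that a randomly chosen CRC sample has a lower Ct than a randomly chosen non-CRC sample, with ties counted as one half. The sketch below uses that rank-based definition; it is not the software the analysis was run with, and the function name and example values are hypothetical.

```python
def auc_from_ct(ct_values, is_crc):
    """AUC for separating CRC from non-CRC when a lower Ct indicates CRC.

    Computed as the fraction of (CRC, non-CRC) pairs in which the CRC
    sample has the lower Ct; tied pairs contribute 0.5.
    """
    crc = [ct for ct, y in zip(ct_values, is_crc) if y]
    non_crc = [ct for ct, y in zip(ct_values, is_crc) if not y]
    wins = 0.0
    for a in crc:
        for b in non_crc:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(crc) * len(non_crc))


# Hypothetical Ct values: two CRC samples and two non-CRC samples.
print(auc_from_ct([30.0, 40.0, 35.0, 45.0], [True, True, False, False]))
```

Samples without amplification have no Ct value; how they are encoded (for example, as a value above the last cycle) has to be decided before applying a function like this.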
Two strategies were evaluated to find an appropriate combination algorithm for the MethyDT test. Strategy 1 was to construct a logistic regression model, which gave an estimated AUC of 0.89 (95% CI: 0.85-0.94) with optimal sensitivity and specificity of 83.10% and 89.23%, respectively (Figure 5E). Strategy 2 was the 1/2 algorithm, in which a measurement was called positive when the Ct of either marker was below its corresponding threshold. The optimal Ct cutoff values, at which Youden’s index reached its maximum, were 49.73 and 48.36 for NTMT1 and MAP3K14-AS1, respectively (Supplemental table 9&10). At these thresholds, the sensitivities of the two targets were 78.87% and 54.93%, with specificities of 91.54% and 97.69% (Table 3). Interestingly, strategy 2 achieved the same sensitivity and specificity as strategy 1 (Table 3). Because strategy 2 makes the test results much simpler for examining physicians to interpret in clinical practice, we adopted the 1/2 algorithm as the combination algorithm for the MethyDT test. With the algorithm fixed, the MethyDT test achieved sensitivities of 79.17% and 91.30% for early- (I-II) and late- (III-IV) stage CRCs, respectively (Table 4). Additionally, no significant differences in MethyDT test sensitivity were observed among CRC patients of different ages and sexes (Table 4).
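The two pieces of strategy 2, choosing a per-marker cutoff by Youden's index and then applying the 1/2 rule, can be sketched as follows. The default cutoffs are the values reported above; the midpoint search over observed Ct values, the function names, and the use of `None` for samples with no amplification are assumptions for illustration, since the text does not describe these details.

```python
def youden_optimal_cutoff(ct_values, is_crc):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its Ct is below the cutoff.
    Candidate cutoffs are midpoints between consecutive observed Ct values.
    """
    crc = [ct for ct, y in zip(ct_values, is_crc) if y]
    non_crc = [ct for ct, y in zip(ct_values, is_crc) if not y]
    unique = sorted(set(ct_values))
    candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    best_cutoff, best_j = None, float("-inf")
    for cutoff in candidates:
        sensitivity = sum(ct < cutoff for ct in crc) / len(crc)
        specificity = sum(ct >= cutoff for ct in non_crc) / len(non_crc)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def one_of_two_call(ct_ntmt1, ct_map3k14as1,
                    cutoff_ntmt1=49.73, cutoff_map3k14as1=48.36):
    """1/2 rule: positive if either marker's Ct is below its cutoff.

    None stands for 'no amplification' (assumed encoding).
    """
    def below(ct, cutoff):
        return ct is not None and ct < cutoff

    return below(ct_ntmt1, cutoff_ntmt1) or below(ct_map3k14as1,
                                                  cutoff_map3k14as1)


# Hypothetical sample: NTMT1 undetected, MAP3K14-AS1 amplifies early.
print(one_of_two_call(None, 41.2))
```

Because the rule is a logical OR, the combined sensitivity can only be equal to or higher than that of either marker alone, and the combined specificity can only be equal to or lower.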
Performance of the test in validation set
Preliminary results from the training set suggested that the MethyDT test had improved sensitivity for CRC detection compared with either single target alone. We then assessed the test performance in an independent validation set. The estimated copy numbers of NTMT1 and MAP3K14-AS1 in plasma samples in the validation set were highest in CRC samples (Supplemental figure 6A&B) and did not vary significantly across stages (Supplemental figure 6C&D). Overall, when all non-CRC samples (healthy donors, interfering diseases, polyps, adenomas, and intestinal diseases) were used as controls, the sensitivity and specificity of NTMT1 for CRC detection were 75.83% and 92.07%, while those of MAP3K14-AS1 were 64.17% and 96.16% (Table 5). With the algorithm fixed in the training set, the sensitivity and specificity of the MethyDT test were 85.83% and 90.28%, with higher sensitivity than either single target (Table 5). When interfering diseases and healthy donors were used as controls, the test specificities were 87.96% and 95.73%, respectively (Table 5). The positive predictive value of the MethyDT test was 73.05% (95% CI: 65.73%-80.37%) when all non-CRC samples were used as controls, and increased to 75.74% (95% CI: 68.53%-82.94%) and 95.37% (95% CI: 91.41%-99.33%) when interfering diseases and healthy donors, respectively, were used as controls (Table 5). The negative predictive values for non-CRCs, interfering diseases and healthy donors were 95.41% (95% CI: 93.27%-97.54%), 93.41% (95% CI: 90.38%-96.44%) and 86.82% (95% CI: 80.98%-92.66%), respectively. For early- and late-stage CRCs, the sensitivities were 82.61% and 88.64%, respectively (Table 6). For adenomas and polyps, the MethyDT test gave positive detection rates of 30.00% (12/40) and 10.00% (2/20), respectively (Supplemental table 11). Meanwhile, the MethyDT test did not show significantly different sensitivities among CRC patients of different ages and sexes in the validation set (Table 6).
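The performance measures in this paragraph all derive from a 2x2 table of test result against disease status. The sketch below computes them together with a symmetric normal-approximation (Wald) 95% interval. The interval method is an assumption, since the text does not say how the intervals were calculated, and the counts in the example are hypothetical values chosen for illustration, not counts reported in the study.

```python
import math


def proportion_with_wald_ci(successes: int, total: int, z: float = 1.96):
    """Proportion with a normal-approximation interval clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_summary(tp: int, fp: int, tn: int, fn: int):
    """Sensitivity, specificity, PPV and NPV, each as (estimate, low, high)."""
    return {
        "sensitivity": proportion_with_wald_ci(tp, tp + fn),
        "specificity": proportion_with_wald_ci(tn, tn + fp),
        "ppv": proportion_with_wald_ci(tp, tp + fp),
        "npv": proportion_with_wald_ci(tn, tn + fn),
    }


# Hypothetical counts (not taken from the source).
summary = diagnostic_summary(tp=103, fp=38, tn=353, fn=17)
for name, (estimate, low, high) in summary.items():
    print(f"{name}: {estimate:.2%} ({low:.2%} to {high:.2%})")
```

Unlike sensitivity and specificity, the predictive values depend on the case-to-control ratio of the sample set, which is why they change when a different control group is used.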