It is generally accepted that CRC detected early can be treated more straightforwardly and has a better prognosis. Several stool DNA-based tests have been developed and show excellent performance in detecting early-stage CRC [18–21]. Blood sampling is more acceptable than stool sampling, but blood-based tests are less frequently reported and usually exhibit lower sensitivities than stool DNA tests, ranging from 47% to 87% [21]. This study presents a systematic pipeline for methylation marker discovery, test development, and evaluation in training and validation sets. The MethyDT test uses a sense-antisense and dual-MGB probe (SADMP) technique, which showed an enhanced ability to detect methylation signals in plasma samples. After a comprehensive evaluation, the test achieved an overall sensitivity of 85.83% and a specificity of 90.28% for CRC detection using 10 ml of blood (2–3 ml of plasma).
Colon lesions (adenoma and CRC) display lower methylation levels overall, except in regulatory regions, as shown by the fact that tumors and adenomas had more DMCs in promoters than normal tissue, consistent with a previous study [22]. Previous studies have focused on the CpG Island Methylator Phenotype (CIMP) found in CRC. In this study, adenomas appeared to fall into two subclasses, methy-H and methy-L, based on global methylation levels, with methylation profiles similar to those of CRC and normal tissue, respectively. Moreover, tubular adenomas were common in the methy-L subclass, whereas villous adenomas were more frequent in the methy-H subclass. A few studies have suggested that CIMP is rare in tubular adenomas but frequent in tubulovillous and villous adenomas [23], which is consistent with our findings. We also found a large proportion of overlapping DMCs between the cancer-versus-normal and adenoma-versus-normal comparisons, indicating that many CpGs have already undergone aberrant methylation at the adenoma (precancerous) stage of the normal-adenoma-CRC sequence. This overlap provides robust evidence for discovering methylation markers for early CRC detection.
One of the challenges for blood-based tests is accurately detecting target DNA fragments derived from the intended tumor tissue. The origins of cfDNA in blood are still poorly understood, although cfDNA has been used in many areas, including guiding drug therapy [24], recurrence monitoring [25], and cancer diagnosis [26]. cfDNA in the bloodstream is generally thought to be released by apoptotic or necrotic cells, or actively secreted by some activated cells [27]. This complex origin makes blood-based tests more susceptible to interfering diseases, which can lead to a high false-positive rate. Specificity is therefore a critical indicator for a blood-based test, since high specificity reduces the false positives caused by non-CRC diseases. In the marker discovery step, adjacent normal samples from the corresponding 32 cancer types in the TCGA database were used to ensure that candidate markers had low methylation levels in other tissues, which attenuated interference from unintended cfDNA derived from other tissues or organs. In assay development, we designed highly selective MSP primers that did not produce normal amplification curves even when unmethylated DNA was used as template at 10^7 copies. In assay assessment, the MethyDT test achieved a specificity of 90.28% when individuals with interfering diseases and healthy individuals were grouped as normal controls. For interfering cases, the positive detection rate was below 10%. These results suggest that the MethyDT test discriminates CRC from other diseases well. In addition, the methylation levels of candidate markers in whole blood cells were restricted to no more than 0.1, which ensured low background methylation noise.
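The background filters described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function name `passes_background_filter`, the dictionary layout, the example beta values, and the `max_normal_tissue_beta` cutoff are all assumptions introduced here. Only the whole-blood ceiling of 0.1 comes from the text.

```python
from typing import Dict

BLOOD_BETA_MAX = 0.1  # whole-blood methylation ceiling stated in the text


def passes_background_filter(
    normal_tissue_beta: Dict[str, float],
    blood_beta: float,
    max_normal_tissue_beta: float = 0.2,  # hypothetical cutoff, not from the study
) -> bool:
    """Return True if a candidate CpG is lowly methylated in blood cells and
    in every adjacent-normal tissue type supplied."""
    if blood_beta > BLOOD_BETA_MAX:
        return False
    return all(
        beta <= max_normal_tissue_beta for beta in normal_tissue_beta.values()
    )


# Made-up beta values for two hypothetical candidates:
# name -> (adjacent-normal tissue betas, whole-blood beta)
candidates = {
    "cg_example_A": ({"lung": 0.05, "liver": 0.08, "breast": 0.03}, 0.02),
    "cg_example_B": ({"lung": 0.04, "liver": 0.45, "breast": 0.06}, 0.03),
}

kept = [
    name
    for name, (tissue_beta, blood_beta) in candidates.items()
    if passes_background_filter(tissue_beta, blood_beta)
]
print(kept)  # ['cg_example_A']
```

In this toy input, the second candidate is rejected because one normal tissue exceeds the assumed tissue cutoff, which is the kind of off-target signal the filter is meant to exclude.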
Test sensitivity is associated with the amount of DNA input. In theory, methylated signals are more likely to be detected in a larger blood volume because more cfDNA templates are available. However, the blood volume that can be collected is often limited in clinical practice by participants' physical condition or other factors. In this study, 10 ml of blood (approximately 2–3 ml of plasma) was drawn from each participant for the MethyDT test. The estimated average median copy numbers in CRC samples across the training and validation sets were 92.44 [1st–3rd quartile: 12.45–186.99] for NTMT1 and 46.62 [1st–3rd quartile: 1.64–85.97] for MAP3K14-AS1. Because the detection limits of the two markers were 10 and 5 copies/µl, respectively, which are lower than their estimated input copies, we consider the current blood volume sufficient for the MethyDT test to detect CRC samples without the risk of missed detections due to insufficient DNA input.
The SADMP technique also contributed to the improved sensitivity of the MethyDT test. The dual-strand technique was first used for ctDNA methylation detection and has been shown to enhance marker performance in previous studies [11, 12], as was also observed here. Simultaneously detecting the methylation signals of the NTMT1 sense and antisense strands shifted the MSP Ct value of the dual-strand assay one cycle earlier than that of the single-strand assay. As a result, the detection limit of the NTMT1 assay reached ten copies, lower than that of either single-strand assay. In addition, two MGB probes, located downstream of the forward and reverse primers of MAP3K14-AS1, respectively, were designed in the current study. During strand extension, the polymerase cleaved the probes at their 5′ ends and released two fluorescent groups. When both probes share the same channel, the dual-MGB probe technique should theoretically double the fluorescent signal, leading to an earlier Ct value, similar to the effect of the dual-strand technique. Serial dilution experiments confirmed the superiority of dual MGB probes over a single MGB probe. These results suggest that the SADMP technique is a feasible strategy for enhancing the detection sensitivity of candidate markers.
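The Ct shift of one described above can be rationalized with the standard exponential model of PCR amplification. The derivation below is a sketch under idealized assumptions: a constant per-cycle efficiency, a fixed fluorescence threshold, and signal strictly proportional to amplicon number. The symbols $N_0$ (starting template copies), $E$ (efficiency), $k$ (signal per amplicon), and $F_T$ (threshold fluorescence) are introduced here for illustration and do not appear in the original analysis.

```latex
\begin{align*}
  F(n) &= k \, N_0 \, (1+E)^{n}
    && \text{fluorescence after } n \text{ cycles} \\
  C_t &= \frac{\log\!\bigl(F_T / (k N_0)\bigr)}{\log(1+E)}
    && \text{from } F(C_t) = F_T \\
  \Delta C_t &= C_t^{\text{single}} - C_t^{\text{dual}}
    = \frac{\log 2}{\log(1+E)}
    && \text{when } N_0 \to 2N_0 \text{ or } k \to 2k \\
  \Delta C_t &= 1
    && \text{at perfect efficiency, } E = 1
\end{align*}
```

Under this model, effectively doubling the template (both strands targeted) and doubling the signal per amplicon (two probes in one channel) are equivalent: each moves the threshold crossing about one cycle earlier, and slightly more than one cycle when $E < 1$.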
Two combination algorithms were used to evaluate MethyDT test performance in the training set, and both yielded greater AUC values and higher sensitivities for the combined markers than for either single marker. However, the MethyDT test showed lower specificity than either single marker (from 91.54% and 97.69% to 89.23%), as also observed in other studies [28, 29]. In the validation set, using the locked algorithm, the test achieved an overall sensitivity of 85.36% and a specificity of 90.28%. Specificity improved to 95.73% when only healthy individuals were used as controls, comparable to that of SEPT9 [2]. These data demonstrate the robust performance of the MethyDT test for CRC detection.
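The loss of specificity follows from how an either-marker-positive rule (such as the 1/2 algorithm) pools false positives. As a worked check, assuming the combined call is positive whenever at least one marker is positive, the combined specificity is bounded by the single-marker specificities. The symbols $Sp_1$, $Sp_2$, and $Sp_{\mathrm{comb}}$ are introduced here for illustration.

```latex
\begin{align*}
  \max\bigl(0,\; Sp_1 + Sp_2 - 1\bigr)
    \;\le\; Sp_{\mathrm{comb}}
    \;\le\; \min\bigl(Sp_1,\, Sp_2\bigr) \\
  0.9154 + 0.9769 - 1 = 0.8923
    \;\le\; Sp_{\mathrm{comb}}
    \;\le\; 0.9154
\end{align*}
```

The reported training-set value of 89.23% sits at the lower bound, which would be consistent with the two markers producing false positives in different control samples. This is an inference from the reported percentages, not a result stated in the study.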
The current test uses a 1/2 algorithm rather than a logistic regression model for several reasons, although the two showed the same sensitivity and specificity in the training set. First, in clinical practice, the 1/2 algorithm lets laboratory staff call results directly from the Ct values reported by the instrument, which simplifies interpretation. Second, the 1/2 algorithm avoids the ambiguous results that a logistic regression model produces near its probability threshold. Because the logistic regression model predicts the probability that each sample is CRC, the probability cutoff is a critical parameter, and samples whose predicted probabilities lie close to the cutoff can receive opposite calls under slightly different cutoff values. Third, the 1/2 algorithm provides redundancy, because approximately 65% of CRC cases were detected as positive by both markers (Supplemental Tables 14 and 15).
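The decision rule can be sketched in a few lines. This is a minimal illustration that reads the 1/2 algorithm as "positive if at least one of the two markers is positive". The Ct cutoffs of 38.0, the "at or below the cutoff" convention, the use of `None` for no amplification, and the function names are assumptions introduced here, since the locked cutoffs are not given in this section.

```python
from typing import Dict, Optional

# Hypothetical Ct cutoffs; the study's locked values are not stated here.
CT_CUTOFFS: Dict[str, float] = {"NTMT1": 38.0, "MAP3K14-AS1": 38.0}


def marker_positive(ct: Optional[float], cutoff: float) -> bool:
    """A marker is positive if it amplified (Ct is not None) and its Ct is
    at or below the cutoff (assumed convention)."""
    return ct is not None and ct <= cutoff


def one_of_two_call(ct_values: Dict[str, Optional[float]]) -> str:
    """Call a sample positive if at least one of the two markers is positive."""
    n_positive = sum(
        marker_positive(ct_values.get(marker), cutoff)
        for marker, cutoff in CT_CUTOFFS.items()
    )
    return "positive" if n_positive >= 1 else "negative"


print(one_of_two_call({"NTMT1": 35.2, "MAP3K14-AS1": None}))  # positive
print(one_of_two_call({"NTMT1": None, "MAP3K14-AS1": None}))  # negative
print(one_of_two_call({"NTMT1": 39.5, "MAP3K14-AS1": 36.8}))  # positive
```

Each marker is judged against its own cutoff, so the call can be read directly from the instrument's Ct report without computing a model score.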
Early diagnosis and screening are essential for improving patient survival when curative treatments are available. Studies have shown that the 5-year survival rate of early-detected CRC is almost 90%, whereas it is only 20% for advanced CRC [30]. In the validation set, the sensitivity of the MethyDT test was 82.61% for early-stage CRC, slightly lower than for late-stage CRC (stage III-IV, 88.64%), but the difference was not significant. Notably, the MethyDT test had a positive detection rate of 30.00% (12/40) for advanced adenomas, significantly higher than for polyps and other interfering diseases, implying an ability to detect precancerous CRC lesions. Although detected adenomas raise the false-positive rate, they are clinically meaningful because they provide a risk warning before the adenomas progress to CRC, and such individuals should undergo ongoing follow-up.
The current study has some limitations that may affect the interpretation of these results. 1) The training and validation sets were retrospective cohorts, and most CRC patients exhibited symptoms. In asymptomatic subjects, the higher proportion of early-stage CRCs and precancerous lesions may result in a lower sensitivity than reported here. In addition, the age of the CRC patients differed from that of the healthy individuals, which may affect test accuracy. 2) Participants were enrolled from a single center and therefore represent only a subset of CRC patients, which may bias the results. 3) The two markers were identified from the methylation profiles of CRC tissue samples, which do not represent the methylation characteristics of cfDNA; some eligible cfDNA methylation markers may therefore have been missed. 4) The MethyDT test showed relatively low sensitivity for early-stage CRC, and especially for precancerous adenomas; further improvement is needed. 5) The dual-strand and dual-MGB probe techniques can enhance the sensitivity of the MethyDT test for detecting methylation signals, but they are not applicable to all candidate markers. The dual-strand technique can be attempted when both the sense and antisense strands are suitable for designing MSP primers, whereas the multiple-MGB-probe technique is limited by amplicon length, which is usually less than 100 bp.