Biomarkers related to eRNAs predict early diagnosis and prognosis of colon cancer patients

doi:10.21203/rs.3.rs-1622293/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

We aimed to explore the potential roles of enhancer RNAs (eRNAs) and their target enhancer-regulated genes (ERGs) in the diagnosis and prognosis of colon cancer. Selected colorectal cancer cases (stage I-III) from TCGA were used as the training cohort, and cases from GEO were used as the validation cohort. In the first part, ERGs related to diagnosis and TNM classification were identified by limma and univariate analyses. A diagnosis score model was then established with logistic regression. In the second part, the final ERGs were screened in three steps: potential, candidate, and prognostic ERGs. Multivariate Cox analysis was used to identify independent prognostic factors. The biological significance and clinical applications of the resulting genes were investigated from several perspectives. Two ERGs (AXIN2 and SLC12A2) were identified. The AUC of the diagnostic model was 0.94 in the training set and 0.89 in the validation set. After multiple tiers of strict screening, 11 ERGs were obtained and combined into a signature. The resulting model reliably distinguished risk in patients with stage I-III colon cancer. Its time-dependent AUCs were 0.78 and 0.70 at 5 and 7 years in the training set and 0.73 and 0.70 in the validation set, and it also showed good calibration. This study describes the usefulness and specificity of ERG markers in diagnosis and prognosis, which should be considered key features in the early examination, warning, and clinical guidance of colon cancer. We conclude that enhancers and ERGs have important implications and may become an emerging hallmark in the diagnosis and prognosis of cancer.

enhancer RNA

super-enhancer RNA

colon cancer

diagnosis

prognosis

Colorectal cancer (CRC) is the third most commonly diagnosed cancer and the fourth most common cause of cancer-related death worldwide¹. In 2020, there were an estimated 147,950 new cases in the US, accounting for 44% of all digestive system tumors². Colon cancer (CC) is the most commonly diagnosed form of CRC and the leading cause of CRC-related death, accounting for 63%³. Furthermore, patients with CC usually have no obvious symptoms until the cancer reaches an advanced stage. Improved colonoscopy and early screening have helped reduce the incidence and mortality of colon cancer. However, limited screening coverage still leaves a high number of cases⁴. Age is a known risk factor, and the incidence of CC increases greatly after the age of 50⁵. As populations age worldwide, cancer occurrence also depends on a range of unfavorable risk factors, such as heredity, sex, and social and lifestyle factors. Previous studies have shown that demographic information (age, sex, and race) and clinical information (tumor, node, metastasis (TNM) classification and tumor grade) can serve as predictive factors for prognostic tools. Such tools help estimate the probability of survival and choose optimal therapies^6,7. The TNM classification, based on histopathological evaluation, is one of the most routinely and widely used prognostic tools for cancer survival. However, it provides little guidance for personalized clinical strategies, such as the choice of therapeutic methods and drugs. Therefore, more valuable tools or biomarkers are still needed to predict patient status, direct clinical practice, and reduce overtreatment.

Enhancers are noncoding regulatory elements that can activate transcription by forming chromatin loops⁸. According to the Encyclopedia of DNA Elements (ENCODE), there are up to four million enhancers in the human genome. Some of them can recruit transcription factors (TFs) and act on distal genes independently of sequence orientation⁹. Enhancers are generally recognized to affect transcription, the synthesis of enhancer-derived RNAs (eRNAs), and cellular responses to disease and environmental stimuli. Upon activation, enhancers physically approach promoters to form chromatin loops, and RNA polymerase II (RNAPII) transcription at the enhancer then produces eRNAs¹⁰. Since the discovery of eRNAs, numerous studies have shown that they can control the expression of oncogenes and tumor suppressor genes¹¹. Moreover, eRNAs are cell-type-specific and widely expressed in cancer tissues¹², indicating their potential value as biomarkers. Super-enhancers (SEs) are clusters of enhancers in close genomic proximity that regulate the same enhancer-regulated genes (ERGs). Compared with typical enhancers, they drive much stronger transcriptional activation¹³. SE regions are densely enriched in enhancer marks such as H3K27ac and H3K4me1, which allows SEs to define and control cell-type-specific genes.

Several studies have explored the associations between eRNAs and cancer outcomes^14,15. However, eRNAs and their ERGs have not yet been established as biomarkers for the diagnosis and prognosis of colon cancer. To assess the potential value of eRNAs and their target ERGs, integrated models were built that combine diagnostic or prognostic ERG expression profiles with clinical information from patients with stage I-III colon cancer. These models are intended to serve as predictive tools for colon cancer treatment. The concordance index (C-index), calibration curves, and ROC analyses were used to evaluate the accuracy and robustness of the models. To further evaluate the findings, the related biological pathways and potential clinical applications were also explored.

Data Acquisition and Selection Criteria

In this study, five datasets related to colon cancer were obtained from different sources. First, enhancer expression data and the corresponding target genes (ERGs) were acquired from the Enhancer RNA in Cancer (eRic) database (https://hanlab.uth.edu/eRic/) to identify eRNAs and their ERGs. Second, ERG expression data and the corresponding clinical information of 512 patients were downloaded from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/). These data were used to construct the diagnostic and prognostic models and to assess their predictive performance internally. Third, the GSE39582 and GSE17536 gene expression datasets were obtained from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) as external validation series. Fourth, canonical pathway gene sets derived from the KEGG pathway database were obtained from the Molecular Signatures Database (https://software.broadinstitute.org/gsea/downloads.jsp) to identify differential pathways. Additionally, a set of immune cell marker genes from the study by Bindea et al. was used to explore the relationship between the genes and the immune microenvironment¹⁶. Finally, infiltration data for six immune cell types came from the Tumor Immune Estimation Resource (TIMER) (http://cistrome.org/TIMER): B cells, CD4+ T cells, CD8+ T cells, dendritic cells, macrophages, and neutrophils. These data were used to investigate the potential immune mechanisms in detail.

The selection criteria at each stage were as follows: (1) complete basic information on age, sex, TNM stage, overall survival (OS) interval, and OS status; (2) an OS time of more than 30 days; and (3) stage I-III colon cancer.
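The three selection criteria can be expressed as a simple filter. The following is a minimal Python sketch. It assumes a hypothetical `Patient` record whose field names (`age`, `sex`, `tnm_stage`, `os_days`, `os_status`) are illustrative rather than actual TCGA or GEO column names, and it assumes the OS interval is recorded in days.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    patient_id: str
    age: Optional[float]
    sex: Optional[str]
    tnm_stage: Optional[str]   # "I", "II", "III" or "IV"
    os_days: Optional[float]   # overall survival interval, assumed in days
    os_status: Optional[int]   # 1 = death observed, 0 = censored


ELIGIBLE_STAGES = {"I", "II", "III"}


def meets_criteria(p: Patient) -> bool:
    # (1) complete age, sex, TNM stage, OS interval and OS status
    fields = (p.age, p.sex, p.tnm_stage, p.os_days, p.os_status)
    if any(v is None for v in fields):
        return False
    # (2) OS time of more than 30 days and (3) stage I-III disease
    return p.os_days > 30 and p.tnm_stage in ELIGIBLE_STAGES


def select_cohort(patients: List[Patient]) -> List[Patient]:
    return [p for p in patients if meets_criteria(p)]
```

`select_cohort` keeps only patients who meet all three criteria. For example, a stage IV patient, a patient with 20 days of follow-up, or a record missing age would each be excluded.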

Diagnostic ERGs and Prediction Model

Differential expression analysis between tumor and non-tumor tissues was performed with the limma package in R. The filter criteria were FDR < 0.05 and log2 fold change > 1.5, and ERGs meeting them were considered significant. The expression levels of the differentially expressed ERGs were then compared among stages I-III using t-tests or Wilcoxon rank-sum tests to identify early diagnostic markers. Based on these genes, a diagnosis score and diagnostic model were constructed using logistic regression. ROC analysis was performed, and the area under the curve (AUC) of the diagnosis score was calculated to assess its ability to discriminate between non-cancer and cancer samples. The diagnosis score was further evaluated in the validation set.
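To show how a two-gene diagnosis score and its AUC relate, here is a minimal, dependency-free Python sketch. It assumes the score is the linear predictor (log-odds) of an unpenalized logistic regression fitted by plain gradient descent. This is not necessarily the authors' R implementation. Because the AUC depends only on how the scores are ranked, using the fitted probability instead would give the same AUC. The toy expression values in the check are invented for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def diagnosis_score(w, x):
    """Linear predictor (log-odds of tumor) for one sample; w[0] is the intercept."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Unpenalized logistic regression fitted by batch gradient descent.
    X: rows of gene expression (e.g. [AXIN2, SLC12A2]); y: 1 = tumor, 0 = non-tumor."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(diagnosis_score(w, xi)) - yi  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def roc_auc(scores, labels):
    """AUC = probability that a random tumor sample outscores a random
    non-tumor sample (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In practice a statistics package (for example, R's `glm` with `family = binomial`) would be used. The hand-written optimizer only makes the mechanics visible.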

Identification of Potential, Candidate, and Prognostic ERGs

This procedure was divided into three steps. First, genes regulated by eRNAs in the first dataset were defined as potential ERGs. Next, univariate Cox proportional hazards regression was performed with ERG expression as the independent variable and survival information (status and time) as the dependent variable. Potential ERGs were filtered at a threshold of p < 0.05. In parallel, to account for the correlation between ERGs and eRNAs, Spearman's correlation test (r > 0.4, p < 0.05) was used to select candidate ERGs that may relate to the survival of colon cancer patients. Finally, LASSO regression in the "glmnet" R package was used to narrow the candidate ERGs down to the prognostic ERGs. The optimal lambda value was selected by 10-fold cross-validation.
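The univariate Cox screening step fits one model per gene. The Python sketch below assumes right-censored data and the Breslow handling of tied event times. It fits a single-covariate Cox model by Newton-Raphson and reports a two-sided Wald p-value. It omits the step-halving safeguards used by production software such as R's `survival::coxph`. The helper `screen_genes` and its dictionary layout are hypothetical.

```python
import math


def univariate_cox(times, events, x, n_iter=50, tol=1e-9):
    """One-covariate Cox model fitted by Newton-Raphson on the Breslow partial
    likelihood. Returns (beta, hazard_ratio, two_sided_wald_p)."""
    n = len(times)
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored subjects contribute only through risk sets
            # Risk set: subjects still under observation at this event time
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            mean = s1 / s0
            score += x[i] - mean           # gradient of the log partial likelihood
            info += s2 / s0 - mean ** 2    # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)             # beta / standard error
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p from the standard normal
    return beta, math.exp(beta), p


def screen_genes(expr_by_gene, times, events, alpha=0.05):
    """Hypothetical helper: keep genes whose univariate Cox p-value is < alpha."""
    kept = {}
    for gene, x in expr_by_gene.items():
        beta, hr, p = univariate_cox(times, events, x)
        if p < alpha:
            kept[gene] = hr
    return kept
```

A hazard ratio above 1 means higher expression is associated with shorter survival, which matches the sign convention of the positive risk-score coefficients reported later.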

Prognostic Score Calculation, Prognostic Model Establishment and Evaluation

A risk score was calculated for each patient as the sum of each gene's expression level weighted by its LASSO Cox regression coefficient. In the validation set, the risk score was calculated in the same way from the same genes. In ROC analysis, the point with the largest Youden index was chosen as the risk score cutoff for stratifying patients into high-risk and low-risk groups. The prognostic difference between the two groups was assessed by Kaplan-Meier survival analysis and the log-rank test.
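The cutoff selection and group comparison can be sketched in Python as follows. Two assumptions are made. First, the Youden index is computed against a binary outcome (for example, death during follow-up), and scores at or above the threshold are called high risk. Second, the log-rank statistic uses the standard hypergeometric variance, and its p-value comes from a chi-square distribution with one degree of freedom. The R packages mentioned in the text may handle ties and censoring details differently.

```python
import math


def youden_cutoff(scores, labels):
    """Threshold maximizing sensitivity + specificity - 1.
    labels: 1 = event, 0 = no event; score >= threshold is called high risk."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_j, best_t = j, t
    return best_t, best_j


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups coded 0/1). Returns (chi2, p), 1 df."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n          # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

In this toy check, when every patient in group 1 dies before anyone in group 0, the statistic is about 7.3 and the p-value is below 0.01.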

The risk score and clinical parameters, including age, sex, and TNM classification, were entered into a multivariate Cox proportional hazards regression analysis. Variables that were statistically significant (p < 0.05) in the training set were then used to build a prognostic nomogram for OS with the "rms" R package.

The C-index was calculated to measure the discrimination performance of the nomogram. A C-index of 0.5 indicates discrimination no better than chance, and values closer to 1 indicate better discriminative ability. Time-dependent ROC curves were plotted with the survivalROC package to evaluate the prognostic value of the variables and the sensitivity and specificity of the prognostic model. Calibration curves, estimated with 1,000 bootstrap resamples, compared predicted and observed probabilities to assess predictive accuracy.

The same approach was used in the validation set to evaluate the model from an external perspective.
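Harrell's C-index, used to evaluate the nomogram, can be computed directly from its definition. This is a minimal Python sketch. It assumes right-censored survival data and that a higher risk score predicts earlier death. Pairs with tied survival times are simply skipped, a simplification that dedicated survival packages handle more carefully.

```python
def harrell_c_index(times, events, risk):
    """Fraction of comparable pairs in which the higher-risk patient failed first.
    A pair (i, j) is comparable when i had an observed event and j outlived i.
    Ties in risk count 0.5; pairs with tied times are skipped."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored time cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 1.0 means the risk ordering perfectly matches the order of deaths, and 0.5 corresponds to random ordering. This is why values closer to 1 indicate better discrimination.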

Matchup Analysis between ERGs and eRNAs and Gene Set Variation Analysis

We analyzed the correlation between the target ERGs and their regulating eRNAs to explore their relationship. Then, based on eRNA locations, super-enhancer RNAs (seRNAs) were tentatively identified as eRNAs that regulate the same genes and lie in the same region.
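The tentative seRNA calling, based on shared target genes, shared genomic location, and co-expression, can be sketched as follows. This is an illustrative Python sketch only, and several of its parameters are assumptions not stated by the study. The eRNA record layout is hypothetical. The 12.5 kb distance limit is borrowed from the stitching distance commonly used in super-enhancer calling, and the 0.95 correlation threshold is also an assumption. Spearman's rho is computed as the Pearson correlation of average ranks.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of average ranks (undefined for constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def putative_se_clusters(ernas, expr, max_gap=12_500, min_rho=0.95):
    """Chain eRNAs that share chromosome and target gene, start within max_gap bp
    of the previous eRNA, and are co-expressed with it (rho >= min_rho)."""
    groups = {}
    for e in ernas:
        groups.setdefault((e["chrom"], e["target"]), []).append(e)
    clusters = []
    for members in groups.values():
        members.sort(key=lambda e: e["start"])
        current = [members[0]]
        for prev, e in zip(members, members[1:]):
            close = e["start"] - prev["start"] <= max_gap
            if close and spearman_rho(expr[prev["id"]], expr[e["id"]]) >= min_rho:
                current.append(e)
            else:
                if len(current) > 1:
                    clusters.append([m["id"] for m in current])
                current = [e]
        if len(current) > 1:
            clusters.append([m["id"] for m in current])
    return clusters
```

The chaining design compares each eRNA only with its nearest upstream neighbor. A stricter variant could require high correlation between all pairs in a cluster.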

Next, to further characterize the biological pathways associated with the signature, gene set variation analysis (GSVA) was performed on gene expression data from the high- and low-risk groups using the GSVA package in R. Statistically significant pathways (p < 0.05) were identified from the canonical pathway gene sets downloaded from the Molecular Signatures Database.

Exploration of the Immune Microenvironment

The immune response and immune microenvironment are linked to cancer progression and outcome. To examine the association between the signature and cancer prognosis from an immunological perspective, the immune microenvironment was analyzed next. For each patient, enrichment scores reflecting the infiltration levels, immune-related pathways, and functions of 23 immune cell types were calculated with the ssGSEA method in the GSVA R package. The ESTIMATE algorithm was used to predict and compare the ESTIMATE score, stromal score, immune cell content, and tumor purity of colon cancer tissues between the high- and low-risk groups. The correlation between the risk score and the infiltration levels of the six TIMER immune cell types was assessed with Spearman's correlation test.

Evaluation of Drug Sensitivity and Immunotherapy

Traditional chemotherapy and immunotherapy are commonly used to treat cancer, but their benefits are limited by drug resistance and tumor heterogeneity. We explored the relationship between risk stratification and treatment efficacy and inferred how colon cancer patients may respond to these regimens. To do so, drug half-maximal inhibitory concentrations (IC50) and the expression of immune checkpoint genes were compared between the two risk groups. Drugs associated with the eRNAs that regulate the prognostic ERGs were retrieved from the first dataset. Immunotherapeutic response was inferred from immune checkpoint expression, and drug IC50 values were estimated to quantify sensitivity and resistance.
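Comparing predicted IC50 values between risk groups is commonly done with a rank-based test. The text does not name the test used, so the Python sketch of the Wilcoxon rank-sum (Mann-Whitney U) test below is an assumption about how such a comparison could be run. It uses a normal approximation without tie or continuity corrections, and the IC50 values in the check are invented.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Wilcoxon rank-sum test of group a versus group b.
    Returns (U for group a, z, two-sided p) from the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # Mann-Whitney U for group a
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p
```

A positive z means group `a` tends to have larger values. If `a` holds low-risk IC50s, this is the pattern the study interprets as lower drug sensitivity in the low-risk group.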

In Vitro Experimental Validation of Genes Regulated by eRNAs

The CRC cell line DLD1, purchased from the American Type Culture Collection (ATCC), was used for the following experiments. The predesigned siRNA sequence against AXIN2 was 5′-GCAGAGGGACAGGAATCAT-3′ (Invitrogen, Carlsbad, CA, USA). The ABCG1 siRNA sequences were: forward, 5′-GAGUCUUUCUUCGGGAACATT-3′; reverse, 5′-UGUUCCCGAAGAAAGACUCTT-3′ (Shanghai GenePharma Co., Ltd). DLD1 cells were transfected using Lipofectamine® 2000 and harvested 48 hours post-transfection.

To explore the effects of AXIN2 and ABCG1 knockdown in DLD1 cells, samples were subjected to Western blot. After AXIN2 and ABCG1 knockdown, EdU staining was performed in each group to assess cell proliferation. Cell viability was then assessed with the CCK-8 assay. Transwell migration and invasion assays were performed in a 24-well Transwell system, with images captured at ×20 original magnification (scale bar, 75 μm). Apoptosis was analyzed by flow cytometry, and representative images are shown.

Basic Data and Patient Selection

The detailed flowchart of this study is shown in Figure 1. Expression profiles and corresponding demographic information of 30 non-cancer individuals and 365 patients with stage I-III colon cancer who met the selection criteria were included as the training cohort. Meanwhile, 585 cases from GSE39582 and 138 cases from GSE17536 were selected as the validation cohorts for the diagnostic and prognostic models, respectively. The characteristics of the cases are listed in Supplemental Table 1.

Diagnosis Score and Potential Value

Limma analysis identified 36 differentially expressed ERGs (Supplemental Table 2). To compare their expression across cancer stages, analysis of variance (ANOVA) was performed for intergroup comparison. The Venn diagram shows three overlapping areas, "a," "b," and "c," which reflect distinct demarcations during stage progression (Figure 2A). Area "a" contains genes that differed among all three stages. Area "b" contains genes whose expression in stage I differed significantly from that in stages II and III. Area "c" contains genes expressed significantly more highly in stage III than in stages I and II. No ERG fell into area "a," whereas AXIN2 and SLC12A2 fell into areas "b" and "c," respectively. These genes can therefore be considered candidate biomarkers of early-stage disease.

Combining AXIN2 and SLC12A2, a diagnosis score model distinguishing cancer from non-cancer cases was constructed (Table 1), and its heatmap is shown in Figure 2B. The AUC was 0.94, indicating good agreement between the predicted and actual diagnoses (Figure 2C1). AXIN2 and SLC12A2 expression was higher in the cancer group than in the non-cancer group, and the comparisons among stages were in line with the Venn diagram (Figure 2C2). The discriminative power of the diagnosis score was further validated, with an AUC of 0.89 in the ROC analysis (Figure 2D1). AXIN2 and SLC12A2 expression was also significantly higher in cancer patients than in non-cancer individuals in the validation cohort (Figure 2D2).

ERGs Related to Colon Cancer Patient Survival and Prognostic Score

In the first dataset, the target genes of 478 eRNAs were evaluated, and 1322 genes were identified as potential ERGs. The training and validation datasets shared 675 of these target genes. From them, 11 candidates were retained in the training set by Spearman's correlation test and univariate Cox analysis (r > 0.40, p < 0.05; Supplemental Tables 3-4). After LASSO Cox regression analysis, 11 genes (ZNF160, ZNF467, DCST2, ATP2A1, ABCG1, ZMIZ2, SMARCD3, PLCB4, PCCA, DNAJC15, and EPPK1) were retained to calculate each patient's risk score (Figure S1). In the risk score formula, each gene symbol denotes that gene's expression level: Risk score = 0.34848396 × ZNF467 + 0.71141290 × ZNF160 + 0.13741571 × ZMIZ2 + 0.01644179 × SMARCD3 - 0.06930923 × PLCB4 - 0.13361941 × PCCA - 0.53763715 × EPPK1 - 0.16501684 × DNAJC15 + 0.62206947 × DCST2 + 0.50923217 × ATP2A1 + 0.50157821 × ABCG1. Most of the genes (ZNF160, ZNF467, DCST2, ATP2A1, ABCG1, ZMIZ2, and SMARCD3) had positive coefficients, meaning that their higher expression was associated with a worse outcome. Conversely, the negative coefficients of PLCB4, PCCA, DNAJC15, and EPPK1 meant that their higher expression was associated with longer OS. The density distribution of the risk score is presented in Figure 3A. The risk score cutoff (2.551676) was determined as the point with the largest Youden index using the survminer package in R. This cutoff divided patients into low-risk (n1 = 310) and high-risk (n2 = 55) groups (Figure 3B1). The clinical characteristics of the two groups were also compared.
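The published risk score formula and cutoff can be applied to a new expression profile as in the Python sketch below. The coefficients and the cutoff of 2.551676 are copied from the text. The expression scale and normalization expected by the formula are not stated. Whether a score exactly equal to the cutoff counts as high or low risk is an assumption (here it counts as low risk).

```python
# LASSO Cox coefficients and cutoff as reported in the text.
# The expected expression scale (e.g. log2-transformed) is not specified.
COEFFICIENTS = {
    "ZNF467": 0.34848396,
    "ZNF160": 0.71141290,
    "ZMIZ2": 0.13741571,
    "SMARCD3": 0.01644179,
    "PLCB4": -0.06930923,
    "PCCA": -0.13361941,
    "EPPK1": -0.53763715,
    "DNAJC15": -0.16501684,
    "DCST2": 0.62206947,
    "ATP2A1": 0.50923217,
    "ABCG1": 0.50157821,
}
CUTOFF = 2.551676


def risk_score(expression):
    """Coefficient-weighted sum of the 11 genes' expression levels."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing genes: {sorted(missing)}")
    return sum(coef * expression[g] for g, coef in COEFFICIENTS.items())


def risk_group(expression, cutoff=CUTOFF):
    # Assumption: scores strictly above the cutoff are high risk.
    return "high" if risk_score(expression) > cutoff else "low"
```

For instance, a profile with every gene at 1.0 scores about 1.941 and falls in the low-risk group. A profile with every gene at 2.0 scores about 3.882, which is above the cutoff.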

The heatmap of risk stratification is displayed in Figure 3B2. Kaplan-Meier survival curves compared with the log-rank test showed a significant difference between the two groups, and patients with lower risk scores had better outcomes (p < 0.001, Figure 3C). The expression of each ERG also differed significantly between the low- and high-risk groups (Figure 3D).

Prognostic Model and Performance Estimation

Taking clinical features and the risk score into account, univariate and multivariate Cox analyses were performed to identify prognostic variables. After stepwise selection, age, TNM classification, and risk score were retained in the model as variables associated with OS in CRC patients. The Cox regression model satisfied the proportional hazards (PH) assumption (Table 2; p = 0.097, Figure S2). A nomogram integrating age, TNM classification, and the risk score was established as a predictive tool (Figure 4A). The C-index of the nomogram was 0.76 (95% confidence interval (CI): 0.69-0.83), suggesting good discrimination. Calibration curves showed that the outcomes predicted by the nomogram were close to the observed outcomes (Figure 4B1), indicating good accuracy. As shown in Figure 4B2, the areas under the time-dependent ROC curves (AUCs) were 0.78 and 0.70 at 5 and 7 years, respectively. These values support the predictive accuracy of the integrated nomogram.

The nomogram was validated in the third dataset to further assess its stability. Consistent with the training set, the calibration plot showed substantial agreement between the predicted and reference lines (Figure 4C1). The model also showed a strong ability to predict survival, with an AUC above 0.7 in the independent validation (Figure 4C2).

Correlation between ERGs and eRNAs and Related Biological Pathways

Matchup analysis of the ERGs and eRNAs identified five eRNAs in two clusters: ENSR00000061966, ENSR00000061967, and ENSR00000061968; and ENSR00000317100 and ENSR00000317101. These eRNAs regulate the same diagnostic ERG (SLC12A2) and were highly correlated with one another, with all within-cluster correlation coefficients above 0.95 (Figure 5A1). These eRNAs were also correlated with SLC12A2. Notably, eRNAs of the same cluster were located adjacent to each other.

The ZNF160 gene was regulated by three eRNAs (ENSR00000111307, 19:52622405-52628405, and 19:52623389-52629389), and the correlation coefficients among them were close to 1 (Figure 5A2). Similar results were observed for ZMIZ2, PLCB4, and DNAJC15 and their corresponding eRNAs. The correlation coefficient matrix is listed in Supplemental Table 5. Given their shared regulatory targets and close proximity, these six clusters of eRNAs are likely to be seRNAs.

GSVA-based functional enrichment identified the activated pathways. The pathways upregulated in the high-risk group were primarily aminoacyl-tRNA biosynthesis; valine, leucine, and isoleucine degradation; butanoate metabolism; and the citrate cycle (TCA cycle) (Figure 5B).

Differences in the Tumor Immune Landscape between Risk Groups

Because seven of the eleven prognostic ERGs were immune-related, the tumor immune landscape was further investigated. Comparisons of the 23 immune cell types revealed significant differences in infiltration levels between the two risk groups (Figure 6A). Immune cells such as central memory CD4+ T cells, immature B cells, and natural killer T cells were more abundant in the high-risk group than in the low-risk group (p < 0.05). As presented in Figure 6B, the ESTIMATE score and stromal score were also notably higher in the high-risk group (p < 0.05), whereas tumor purity was notably lower (p < 0.05). The infiltration of five immune cell types (B cells, CD4+ T cells, dendritic cells, macrophages, and neutrophils) was positively correlated with the risk score in colon cancer patients (Figure 6C).

Relationships between Gene Signature, Chemotherapy, and Immunotherapy

We selected three immune checkpoints targeted in immunotherapy: CTLA-4, CD40, and TIGIT. Their expression differed significantly between the risk groups (p < 0.01, Figure 7A). The predicted IC50 values of three drugs (TW37, bicalutamide, and cisplatin) were higher in the low-risk group than in the high-risk group. This indicates that patients with higher risk scores were more sensitive to these agents (p < 0.01, Figure 7B).

Knockdown of AXIN2 and ABCG1 and Their eRNAs

As shown in Figure 8A, both siRNAs efficiently decreased AXIN2 and ABCG1 expression and protein levels. EdU staining showed that AXIN2 and ABCG1 silencing markedly reduced the number of EdU-stained cells compared with the control (Figure 8B). CCK-8 growth curves revealed that AXIN2 and ABCG1 knockdown reduced cell viability (Figure 8C). Transwell migration and invasion assays further showed that AXIN2 and ABCG1 knockdown significantly restrained both the migration and invasion of CRC cells (Figure 8D). Cell apoptosis was quantified by flow cytometry (Figure 8E).

The field has made progress in exploring the relationships between various biomarkers and diagnosis or prognosis. However, little attention has been paid to the relationship between aberrant eRNAs and patient status, and early warning systems that aid clinical practice are still lacking. As a hallmark of active enhancers, eRNAs are an indispensable part of the genome and regulate gene expression and cellular processes¹⁶. Moreover, because enhancer clusters and super-enhancers further affect disease progression, eRNAs may be critical factors and provide novel tools for predicting patient status. In this study, we combined eRNAs and their target genes with patient status to build diagnostic and prognostic models that predict patient condition and guide clinical practice in CC. The two ERGs identified in the diagnostic procedure were differentially expressed not only between cancer and non-cancer samples but also among stages I-III. Thus, these ERGs may serve as early diagnostic biomarkers and contribute to CC screening. Additionally, the eleven ERGs obtained in the prognostic evaluation were significant features for identifying patients at risk. Incorporating gene expression scores and clinical risk factors into a clear prognostic nomogram facilitates preoperative individualized prediction for CC patients. Given the highly tissue-specific expression of eRNAs, models that include genes regulated by eRNAs or seRNAs are promising diagnostic and prognostic tools.

Early detection of CC is of great importance for treatment. Direct visualization by colonoscopy is currently considered the gold standard for screening. However, its drawbacks, such as high cost, invasiveness, and bowel preparation, limit the number of people who undergo it¹⁷. Our model is an accurate diagnostic test with fewer limitations than colonoscopy, and it has promising implications for clinical practice. In the first part of this study, AXIN2 and SLC12A2 were identified, and the diagnostic model was constructed. The AUC was 0.94 and 0.89 in the training and validation cohorts, respectively, indicating satisfactory discrimination. Compared with a previously reported diagnostic model, our model showed improved predictive performance (AUC 0.94 versus 0.82)¹⁸, suggesting greater effectiveness in clinical practice. Hence, the diagnosis score can be considered an early warning sign of colon cancer and an aid to diagnosis in routine physical examinations. AXIN2 encodes axis inhibition protein 2, which mainly regulates the Wnt pathway. This pathway is associated with the proliferation and recurrence of some malignancies and is highly activated in colorectal cancer¹⁹. CDX2 can transcriptionally activate AXIN2 expression by binding its upstream enhancer, thereby suppressing Wnt signaling and inhibiting cell proliferation and cancer progression³. The SLC12A2 gene encodes Na-K-2Cl cotransporter 1 (NKCC1), which acts primarily in the transport and reabsorption of sodium and chloride. Variants of the two SLC12A2 isoforms have been observed in precancerous and cancerous colorectal lesions²⁰, which may alter its expression and relate to disease state. SLC12A2 has also been combined with other genes in a signature that predicts survival in patients with pancreatic adenocarcinoma²¹.

In the second part of our study, 11 predictors were filtered from the potential ERGs to narrow the range of eRNAs and genes. A prognostic risk score was built from this final gene panel, and a prediction model combining the signature, age, and TNM stage was then established. Predicted survival decreased with increasing age, TNM stage, and risk score. Previous studies have shown that age and TNM classification are prognostic factors for the survival of colon cancer patients²². For this reason, age was still included in the model even though it was not statistically significant in the stepwise regression (p = 0.12). Our model showed effectiveness similar to that of another predictive model (C-index, 0.79)²³ with fewer observations. Therefore, we provide an accurate nomogram combining age, TNM classification, and the eRNA-derived signature. In addition, gene expression varied greatly between the risk groups. ZNF160, which has the largest coefficient, and ZNF467 belong to the zinc finger protein (ZNF) family, the largest transcription factor family in the human genome. Dysregulated transcription factors may cause aberrant gene expression closely related to cell proliferation and cancer progression. Moreover, PCCA has been identified as a potential prognostic predictor for gastric cancer²⁴. ZMIZ2 enhances USP7 activity and promotes CRC by stabilizing β-catenin²⁵. Together, this evidence supports the relevance of the prognostic ERG signature for distinguishing the two risk groups.

The number of enhancers regulating the same gene ranged from one to five in our study. The eRNAs ENSR00000211831 and ENSR00000211832 form a cluster in the same chromosomal region and simultaneously regulate ZMIZ2. The Spearman correlation coefficient between them was 0.98, so they are likely seRNAs. The target gene of ENSR00000065185 is PCCA, and their correlation coefficient was 0.74. Moreover, ENSR00000065185 is detected only in colorectal cancer, indicating the tissue specificity of the eRNAs and their target genes. These findings shed light on potential therapeutic targets among these genes.

Seven of the eleven prognostic ERGs were immune-related, so the immune microenvironment was further investigated. Tumorigenesis and tumor development are closely related to the microenvironment, which involves interactions among lymphatic cells, immune cells, and other components. Many studies have examined the association between the immune microenvironment and the prognosis of patients with primary colorectal cancer. They have shown that tumor-infiltrating lymphocytes are associated with improved prognosis²⁶. We explored the immune microenvironment with ssGSEA, ESTIMATE, and TIMER. The high-risk group had higher stromal and ESTIMATE scores, consistent with the worse prognosis of high-risk patients. Moreover, tumor purity represents the proportion of tumor cells in the tumor tissue. Inflammatory responses often occur in tumors with low purity and can promote tumor mutation, leading to poor prognosis and affecting the effectiveness of immunotherapy²⁷. Existing studies have reported that tumor purity substantially affects the prognosis of various cancers, including colorectal cancer²⁸ and glioma²⁹. In colorectal cancer, lower tumor purity is associated with worse prognosis. Similarly, high-risk patients in our study had lower tumor purity and shorter survival. Thus, the signature is closely related to the immune microenvironment, which provides indirect evidence that the ERG-derived signature is associated with prognosis.

In addition, the expression of immune checkpoint genes was compared between the two groups. Immune checkpoints, including CTLA-4, CD40, and TIGIT, are important predictors of response to immunotherapy and can inform treatment strategies^30,31. Immune checkpoint expression was elevated in the high-risk group compared with the low-risk group. This suggests that patients with higher risk scores may be more likely to respond to immunotherapy. Moreover, sensitivity to traditional drugs such as TW37, bicalutamide, and cisplatin was estimated. Lower IC50 values were found in the high-risk group, suggesting a better response to these agents.

This study has several strengths. First, to our knowledge, it is the first study to evaluate the ability of eRNAs and their target genes to predict the individual status of colon cancer patients based on the regulation of gene expression, combining data exploration with experimental validation. Second, potential genes for early diagnosis and prognosis were identified, some of which were found to be regulated by specific eRNAs or seRNAs. Third, the prognostic power of eRNAs and their target genes may not be restricted to colon cancer and could be extended to other cancers. Nevertheless, several limitations should be noted. First, the eRNA expression data were limited: only 477 eRNAs, downloaded from the Enhancer RNA in Cancer database, were included in our study, although more eRNAs are likely to be discovered as technology improves. Second, seRNAs were inferred only from shared chromosomal regions and regulated genes, an approach that is not fully adequate or rigorous. Future experiments are needed to confirm the direct regulatory relationships between eRNAs, including specific seRNAs, and their ERGs.
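The seRNA inference named as a limitation (grouping eRNAs by shared chromosomal region and regulated gene) can be sketched as follows. The sketch assumes that eRNAs on the same chromosome that share a target gene and lie within a stitching distance of 12.5 kb (the default used by the ROSE super-enhancer caller) form a candidate cluster. The data class, function, coordinates, and gene names are all hypothetical. The sketch also shows the limitation itself: proximity and a shared target alone do not demonstrate super-enhancer activity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ERNA:
    name: str
    chrom: str
    start: int  # genomic start coordinate
    end: int    # genomic end coordinate
    target_gene: str


def candidate_se_clusters(ernas, stitch_bp: int = 12_500, min_members: int = 2):
    """Group eRNAs sharing chromosome and target gene whose gaps are <= stitch_bp."""
    buckets = defaultdict(list)
    for e in ernas:
        buckets[(e.chrom, e.target_gene)].append(e)

    clusters = []
    for (chrom, gene), members in buckets.items():
        members.sort(key=lambda e: e.start)
        current, current_end = [members[0]], members[0].end
        for e in members[1:]:
            if e.start - current_end <= stitch_bp:
                # Close enough to the running cluster: stitch it in
                current.append(e)
                current_end = max(current_end, e.end)
            else:
                if len(current) >= min_members:
                    clusters.append((chrom, gene, [x.name for x in current]))
                current, current_end = [e], e.end
        if len(current) >= min_members:
            clusters.append((chrom, gene, [x.name for x in current]))
    return clusters


# Hypothetical eRNAs, coordinates, and genes (illustration only)
ernas = [
    ERNA("eRNA_1", "chr17", 100_000, 101_000, "GENE_X"),
    ERNA("eRNA_2", "chr17", 108_000, 109_500, "GENE_X"),
    ERNA("eRNA_3", "chr17", 150_000, 151_000, "GENE_X"),
    ERNA("eRNA_4", "chr5", 200_000, 201_000, "GENE_Y"),
]
print(candidate_se_clusters(ernas))
```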

In conclusion, eRNAs and their target ERGs appear to play important roles in the diagnosis and prognosis of colon cancer and may benefit patients by enabling more personalized treatment. For clinical practice, this study presents a diagnostic model and a prognostic model to evaluate the risk of colon cancer, which could serve as an early signal for therapeutic intervention. Given their effectiveness and specificity, we conclude that eRNAs and seRNAs are promising tools for the clinical guidance of colon cancer patients.

Acknowledgments: The authors sincerely thank the TCGA and GEO databases for making the gene expression profiles available.

Author Contributions: All authors participated in the conception and design of the report and are responsible for the integrity of the work.

Source of Funding: This work was supported by a grant from the National Natural Science Foundation of China [Grant number: 82073666 to Qiuju Zhang] and the National Science Foundation for Young Scientists of China [Grant number: 82003554 to Huixun Jia].

Conflict of Interest: None.


No competing interests reported.

supplementmaterial.docx
