Human Cytomegalovirus Induced Immune Regulation Is Correlated With Poor Prognosis in Colorectal Cancer Patients

Available evidence suggests that human cytomegalovirus (HCMV) infection may be implicated in the progression of colorectal cancer (CRC). However, the correlation between HCMV infection and survival outcomes in CRC patients is unclear. Here, we constructed a ow algorithm to identify HCMV sequences based on the RNA-seq data of CRC patients derived from Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO). The patients' clinical information matrix was used to calculate Euclidean distance to lter out suitable patients not infected with HCMV, combining with the patient's survival outcome to reveal how HCMV infection is involved in CRC progression. HCMV infection is widespread in CRC patients., The prevalence of HCMV infection is ranging from 10% to 36% in 4 independent CRC datasets with infection being concentrated in carcinoma tissue rather than in normal tissue. In addition, HCMV positive patients had a poor survival prognosis, with three of the HCMV genes associated with poor patient survival outcomes, UL82, UL42 and UL117. Most importantly, we suppose that the regulation of immune function by HCMV may be the key to the poor prognosis of CRC patients. We found that HCMV infection was associated with poor prognosis in CRC patients and identied three prognosis associated HCMV genes. The regulation of immune function caused by HCMV infection was the key factor while HCMV positive CRC patients mostly presented a state of immunosuppression. This may provide new ideas for personalized treatment of CRC patients, especially in immunotherapy.


Introduction
As one of the most prevalent cancers, colorectal cancer (CRC) is responsible for overwhelming cancerrelated death worldwide [1]. The underlying cause of the tumorigenesis of CRC remains unclear. Thanks to advanced screening methods and therapies, the mortality rates of CRC have generally decreased [1,2].
HCMV is under the continuous control of the host immune system and is considered harmless in healthy individuals. In the process of in ammation, latent HCMV infection is reactivated and these in ammatory cells can spread the virus to surrounding organs [6]. HCMV can infect various organs such as the gastrointestinal tract, liver, lung, retina and brain and multiple cell types, including broblasts, epithelial, endothelial, granulocytes, monocytes, macrophages and smooth muscle cells [7].
HCMV infection may lead to chronic in ammation in the host, which is may be associated with proliferative diseases such as cancers [8,9]. Nevertheless, the precise correlation between HCMV and cancer remains to be clari ed. It has been proved that HCMV plays a role in malignancies including ovarian cancer, cervical cancer, breast cancer, brain tumors and CRC [10][11][12][13][14]. Studies have demonstrated that HCMV infection was associated with the risk of cancer-related mortality [15]. However, the relationship between HCMV infection and CRC remains a matter of debate.
In this study, we aimed to investigate the possible interactions between HCMV infection and CRC by analyzing the prevalence of HCMV in CRC and the changes in the immune status of CRC patients with HCMV. Our data indicate a signi cant role of HCMV in immune in ltration and regulatory effect on the prognosis of CRC.

Data Source
The raw data les and related clinical information of CRC were downloaded from the TCGA ( https://portal.gdc.cancer.gov/ ) and the GEO ( http://www.ncbi.nlm.nih.gov/geo ) database (GSE158559, n = 53; GSE107422, n = 110; PRJNA387172, n = 208). The RNA sequencing (RNA seq) gene expression level-3 data and clinical data of CRC patients were retained for further analysis. We downloaded 1793 immune-related genes from the ImmPort database ( https://immport.niaid.nih.gov ) along with the functional annotation les for these genes [16].

HCMV Gene Expression Count Generation
The raw fastq data were adaptor-trimmed by fastp [17]. Then, we mapped to hg38 human reference genome and rRNA reference genome using the Bowtie2. Unaligned (non-human) reads were extracted using the kneaddata pipeline with the default parameters [18]. Next, unmapped sequences were aligned to the viral genomes from NCBI. Alignment les were sorted and indexed with SAMtools [19]. Aligned reads were counted using FeatureCounts [20]. If the sample matches on 10 viral sequences, we consider it a positive sample.
Based on the genome le of the Merlin strain of HCMV (NC_006273.2), we produced a gene transfer format (GTF) le for reading analysis of the coding sequences. Normalization of virus coding sequences using rpkm function.

PCR and nested RT-PCR analysis
The total genomic DNA of 20 paired colorectal cancer tumors and adjacent normal tissues was extracted using the TIANamp Genomic DNA kit (Tiangen Biotechnology Co., Ltd., Beijing, China) according to the manufacturer's protocol. The PCR products were observed after staining with GelGreen (Solarbio, Beijing, China) on 2% agarose gels. The sequences of the ve HCMV gene-speci c primers used for PCR are listed in Table 1. Total RNA was isolated from the tissue using the RNAiso Plus (Takara,Japan), and cDNAs were synthesized from total RNAs with PrimeScript RT reagent Kit (Takara,Japan),according to the manufacturer's protocols. For the quanti cation of the relative mRNA expressions of each group, real-time PCR was carried out using CFX96 real-time PCR detection system (Bio-Rad, USA) as follows. cDNAs were ampli ed using TB Green Premix Ex Taq II (Takara,Japan)and the speci c primers ( Table 2). Output data were obtained as Ct values and the differential mRNA expression of each gene among samples was calculated using the 2-ΔΔCt method.

Sample Selection
We constructed a matrix based on all clinical data of CRC in TCGA, which included gender, TNM stage, histological type, cancer status, Consensus molecular subtype and age at initial pathologic diagnosis. All values were vectorized except age at rst diagnosis, and missing values were replaced by NA. A distance matrix was constructed by using the R function "dist()". The distance between two samples was calculated by Euclidean distance.

Differential Expression Analysis
DESeq2 software was used to identify genes that were signi cantly differentially expressed in the HCMV and non-HCMV groups [21]. Transcripts with |logFC| >1.0 and FDR < 0.05 were considered as differentially expressed genes (DEGs). GO and KEGG enrichment analysis and GSEA enrichment analysis were performed using the cluster pro le package (Version 3.14.0) [22]. Identi cation of colorectal cancer subtypes by CMScaller machine learning model [23].

Analysis of Immune Genes and Immune In ltration Levels
Annotation of immune-related genes obtained from ImmPort. The GSCALite ( http://bioinfo.life.hust.edu.cn/web/GSCALite/ ) website was used for the analysis of immune-related differential gene sets associated with viral infections, and the site provides pathway enrichment pathways with active/inhibitory/non-signi cant effects [24], Immune cells were predicted and visualized using ImmuCellAI analysis [25].

Statistical Analysis
Statistical analysis, as well as graphics, were performed using the freely available software R (R CoreTeam 2020, Version 4.0.2, https://www.r-project.org/ ). Survival data was analyzed using "Kaplan-Meier survival estimates" feature of Origin Pro (OriginLab Corporation, USA).

Results
The Prevalence of HCMV in CRC  S2a). There was no signi cant difference in the proportion of TNM stages between the HCMV positives and control groups. However, there are differences in infection between subtypes, with the CMS4 subtype having a higher rate of infection. (Fig. 1c and Fig. S2b).
Moreover, the data of PRJNA387172 cohort demonstrated that HCMV was presented in the tumor tissues and not in adjacent normal tissues (Fig. 1d). We designed primers using ve HCMV genes with high positivity in colorectal cancer gene expression pro le data and analyzed positivity in tumor and peritumor (at least 5 cm from the tumor site) tissue samples from 20 patients. HCMV was found positive in 3 patients. and none of the genes for HCMV were detected in the corresponding paracancerous tissues ( Fig. 1e and TableS2). Then, we assessed whether the presence of HCMV would have an effect on the clinical prognosis for CRC patients. Interestingly, our results indicate that lower rates of overall survival (P = 0.015) in the CRC with HCMV group than the non-HCMV group (Fig. 1f).
The contribution of gene expression of HCMV to the survival outcome of CRC One step further, we attempted to investigate the expression of HCMV genes in CRC. Using the criterion of more than 5 viral gene expressions in a single sample, 51 cases of 55 HCMV positive patients in the TCGA datasets were included in statistics. Following this, 108 out of 173 HCMV genes remained after screening based on the condition that a single HCMV gene was positive in more than 5 samples. Among these genes, RNA2.7, RL5A, RNA4.9, UL82 and UL22A were the most abundantly expressed ( Fig. 2a and Fig. S2c).
In addition, univariate survival analysis of these 108 HCMV genes showed that three viral genes, which were UL82, UL42, and UL117, were associated with poor patient prognosis (Fig. 2b-d). Among them, UL82 was not only signi cantly correlated with poor prognosis in CRC patients but also had a high positive rate in both datasets. P values for the complete univariate survival analysis of the 108 HCMV genes are provided in Table S3.
The correlation of HCMV induced gene expression to the survival outcome of CRC In order to further observed the relationship between CRC and HCMV, we constructed distance matrix analysis based on all clinical data of CRC samples in TCGA and singled out 52 CRC samples in 544 CRC samples without HCMV for subsequent studies (Fig. S1b and TableS4). Differentially expressed genes (DEGs) in CRC with HCMV compared with non-HCMV were identi ed by DESeq2, where a total of 538 DEGs were recognized. The DEGs comprised 522 upregulated genes and 16 down-regulated genes (Fig. 3a). A list of all differential genes is presented in Table S5. To identify the potential function of the DEGs, function enrichment analyses were performed with clusterPro ler. KEGG enrichment analysis revealed a strong relationship with several pathways involved in cytokine-cytokine receptor interaction, Tuberculosis and Human cytomegalovirus infection ( Fig. S3a and TableS6). GO enrichment analysis showed that the DEGs were enriched in granulocyte chemotaxis, neutrophil chemotaxis, lymphocyte migration, etc ( Fig. 3b and TableS7). Among the top 10 enriched functions 7 are related to the immune system. These enrichment results demonstrate that HCMV infection causes alterations in the immune status of CRC patients.
We then investigated the potential mechanism of the HCMV in the pathogenesis of CRC in terms of immune genes. 97 immune genes were signi cantly upregulated compared to non-HCMV patients (TableS8). In addition, functional annotation indicates that most of they are referred to antimicrobials, cytokines and cytokine receptors (Fig. 3c).
The protein-protein interaction network of 97 immune-related DEGs was constructed based on the STRING database. An interconnected network of these proteins was discovered. Then, the signi cant modules were identi ed via the MCODE plugin. As illustrated in Fig. 3C, the top functional clusters of modules were selected (MCODE score = 21.524). 22 hub genes (CCL2, PTGER3, C3AR1, C5AR1, IL6, CXCL10, CXCL6, CCL13, CXCL9, CCR3, C3, PPBP, CXCR2, FPR1, CXCL8, CCR1, CCL4L1, CXCL5, FPR2, CXCR1, CXCL11, CCL21) were identi ed in the top functional clusters (Fig. 3d). Strikingly, the 22 hub genes act a role in activating the apoptosis pathway, EMT pathway. However, they inhibited DNA Damage Response (Fig. S3b). Collectively, these results indicated that the potential role of the HCMV in promoting the development of CRC. In addition, Kaplan-Meier survival analysis indicates that among the 22 hub genes, 3 hub genes were associated with the OS of CRC patients. CRC patients with higher expression of C3(P = 0.022), CCL21(P = 0.038) and CCL2(P = 0.049) and exhibited worse overall survival compared with those with lower expression. We ltered negative patients with clinical information similar to that of HCMV-positive patients based on the clinical information of 20 patients, among the HCMV positive samples, C3,CCL21 and CCL2 genes showed high expression ( Fig. 3e-g).
The correlation of HCMV induced immune regulation to the survival outcome of CRC With the approval of immune checkpoint inhibitors as routine drugs, the possibility of immunotherapy waiting for further investigations. Gene set enrichment analysis (GSEA) revealed signi cant enriched gene sets cancer immunotherapy by PD-1 (Programmed death-1 receptor) blockade and cancer immunotherapy by CTLA4 (Cytotoxic T-lymphocyte antigen 4) blockade (Fig. 4a). We compared the differences in the expression of seven immunosuppressive checkpoints, PDCD1LG2, CD274 and HAVCR being the most pronounced ( Fig. 4b and Fig. S3c). Subsequently, we similarly validated the expression of three immune checkpoint genes in HCMV positive tissues (Fig. 4c).
Then we explored the correlation between HCMV and immune cell levels in CRC visualized by ImmuCellAI. The In ltrationScore was signi cantly higher in CRC patients with HCMV than non-HCMV. Moreover, the analysis revealed that CRC patients with HCMV were more enriched with immune cells such as Tr1 cells, DC cells, monocyte cells and macrophage cells (Fig. 4d). Strikingly, groups with high expression of Tr1, and macrophage cells showed poorer overall survival rates (Fig. 4e). In summary, these results indicate that HCMV may play a signi cant role in the poor prognosis of CRC. HCMV infection induces an immunosuppressed state in CRC patients. In addition, altered immune cell in ltration, especially high in ltration of Tr1 and macrophages, was associated with malignant progression of CRC.

Discussion
In this study, we demonstrated that HCMV are common in CRC and mainly presented in the tumor tissues and not in adjacent normal tissues. Furthermore, CRC patients with HCMV have lower rates of overall survival than non-HCMV patients. Chen et al. detected HCMV DNA in 42.3% of the tumor specimens by PCR, while only 5.6% of the adjacent non-neoplastic tissues were positive for HCMV [27]. We suppose that HCMV preferentially infects tumor tissue, but the exact reason remains unknown. In addition, Chen et al. found that in CRC, the presence of HCMV in the tumor was associated with poor prognosis in older patients [28]. However, it remains unclear how HCMV infection is involved in the malignant progression of CRC and what are the key genes that lead to poor patient prognosis. In this study, we found that HCMV infection caused alterations in immune genes, immune in ltration levels, and immune checkpoints in CRC patients. The altered immune status may be a critical contribution to poor prognosis. In ammation and immune system relate to cancer pathogenesis [29]. In addition, the imbalances between tumor progression and the host immune response will lead to the tumor progression [30]. There is a complex biological process between immune cells and tumor. It is well acknowledged that HCMV regulates the secretion of numerous cellular chemokines, cytokines, and growth factors and encode cytokine, chemokines and receptors homologs at different stages of infection. And HCMV take part in controlling immune activity of the host [31][32][33]. CCL21 targets binding to CCR7 ligand and participate in CRC metastasis by regulating the production of MMP-9 [34]. CCL2 can act directly on the CCR2 receptor on the vascular endothelial cell membrane to chemotactic endothelial directed cell motility. What's more, CCL2 promotes tumor cell growth and angiogenesis by recruiting tumor-associated macrophages and secreting cytokines such as VEGF, TGF-β and TNF-α. It also secretes matrix metalloproteinases MMP-2 and MMP-9, which are involved in extracellular matrix destruction and remodeling and promote tumor cell invasion and metastasis [35,36]. Recent work shows that tumor-derived C3 is involved in suppressing antitumor immunity by regulating tumor-associated macrophages [37].
Immune cells in ltration was associated with tumorigenesis and the growth and metastasis of tumor cells [38]. Study has showed that high expression of HCMV gene changes T-cell in ltration and is associated with better gastric cancer patient's survival [39]. Studies in CRC and breast tumors showed that quantify immune cell abundance prognostic predictive value [40][41][42][43]. We show that macrophages emerge as pro-tumor players and more enriched in CRC patients infected with HCMV. Tr1 cells derived from naïve CD4 + T cells, regulate immune responses and play important roles in adaptive immune response [44]. Tr1 cells involved in tumor were perceived as having anti-tumor response [44,45]. However, some studies demonstrate that tumor associated Tr1 cells were tumorigenic because they secreted immunosuppressive cytokines and associated with poor prognosis [46]. In this study, CRC patients with high expression of Tr1 showed poorer overall survival rates and CRC patients with HCMV were more enriched with Tr1 cells. Macrophages, as phagocytes, were the most frequently found in ltrated immune cells in the tumor and which not only can act as tumor promoter but also being antitumor [47]. In the large majority of cases, macrophages synthesize in ammatory cytokines, tumor angiogenesis and stimulate tumor initiation, progression, metastasis [48]. Immune checkpoint proteins play a crucial role in the regulation of cellular immunity, and the expression of immunosuppressive checkpoints impairs immune function and consequently inhibits antitumor responses. ICPs (Immune checkpoint inhibitors) targeting CTLA-4 and PD-1 have been developed as antitumor drugs. Nevertheless, the sensitivity of different CRC patients against ICPs therapy is not similar [49]. CRC patients infected with HCMV may present a suppressive tumor microenvironment. These patients appear to be better suited for drugs therapy targeting ICPs.
During HCMV infection, UL117 is mainly localized to the nucleus of the cell, which is vitally important for the maintenance of broblast infection. UL117 acts to promote the development of nuclear replication compartments to facilitate viral growth [50]. UL42 and UL82 can inhibit cGAS-STING mediated signaling to evade antiviral immunity [51,52]. UL82 encodes a protein located in the outer tegument protein of viral particles. UL82 promotes cell cycle progression from G1 to S phase through ubiquitin-dependent degradation of Rb proteins [53]. Expression of UL82 suppresses the level of miR21, which promotes cell cycle progression by downregulating tumor suppressor proteins [54]. UL82 involved in cancer progression by regulating cell cycle processes, and has a high positive rate in CRC patients and is highly relevant with the patient's prognosis. Whether UL82 can be used as a biomarker to predict the prognostic level of HCMV-infected CRC patients remains to be further investigated.
In summary, we revealed the prevalence of HCMV in CRC and their associations with clinical prognosis. HCMV in CRC may stimulate the expression of immune-related genes and immune associated cells in ltration then lead to poor prognosis.

Declarations
Author Data Availability All data generated or analyzed during this study are included in this published article.
Con ict of interest The authors declare no potential con cts of interest.
Ethical approval Each patient provided informed written consent and the study was performed after the approval of the Human Research Ethics Committee at the Second A liated Hospital of Wenzhou Medical University in accordance with the Declaration of Helsinki.
Consent to participate Informed consent was obtained from all subjects involved in the study.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.