The IRF family can inuence tumor immunity and the prognosis of patients with colorectal cancer

Background: The roles of interferon-regulatory factors (IRFs) in colorectal cancer (CRC) have not been studied through bioinformatics analysis. Methods: We used gene- and microRNA-expression data for patients with somatic mutations and colon adenocarcinoma/rectum adenocarcinoma from The Cancer Genome Atlas Genomic Data Commons as a training dataset. Gene-expression data (accessions GSE17536 and GSE39582) were downloaded from the Gene Expression Omnibus database as the validation dataset. Expressional differences, clinical correlations, disease prognosis, functional enrichment, and immune correlations of IRF genes were analyzed. The results were validated via immunohistochemistry. Results: The mRNA-expression levels of IRF3 and IRF7 differed between tumor and normal tissues and were correlated with patient prognosis. The IRF score was an independent risk factor for overall survival. IRFs recruited inammatory cells; however, the immune and stromal scores showed inammatory-cell recruitment only in the tumor stroma; therefore, they did not help eliminate tumor cells. Functional-enrichment analysis and pan-cancer expression analysis revealed that IRFs were differentially expressed in tumor tissues and associated with patient prognosis. Conclusions: IRFs were differentially expressed in tumor tissues and were associated with prognosis in CRC patients. Although IRFs can promote the inltration of immune cells, the immune and stromal score showed that the inltrated immune cells mostly stayed in the tumor stroma and cannot directly eliminate the tumor. Our ndings can help to improve CRC prognosis and treatment strategies. Differentially expressed gene (DEG) and functional-enrichment analysis between high- and low-IRF groups. (A, B) Volcano and heat maps showing DEGs between high- and low-IRF groups. (C) GO analysis suggested that DEGs were closely correlated to the terms gas transport, antimicrobial humoral response, humoral immune response, and sensory organ morphogenesis. (D) Kyoto Encyclopedia of Genes and Genomes analysis showed that these DEGs were enriched for terms such as nitrogen metabolism, the JAK-STAT signaling pathway, Staphylococcus aureus infection, and cytokine-cytokine receptor interaction. (E) Volcanic maps showing gene-set enrichment analysis (GSEA) results for upregulated and downregulated pathways. (F) Patients in the high-IRF group showed correlations with the terms ribosome and cardiac muscle contraction pathways, whereas the terms hematopoietic cell lineage, intestinal immune network for IgA production, and chemokine signaling pathway (among other pathways) were signicantly underrepresented for patients in the high-IRF group.

theoretical basis for their mechanistic roles in tumor development and tumor immunity and for choosing drug therapies.
The roles of IRFs in CRC have not been investigated through bioinformatics analysis. Here, we used public databases to analyze IRF-expression levels and mutations in patients with CRC to determine distinct prognostic values, study tumor-immunity regulation, and identify potential functions of IRFs in CRC. We veri ed these results via immunohistochemistry (IHC) analysis with our own cohort of patients with CRC.

Data Acquisition
Data regarding fragments per kilobase million (FPKM) values and microRNA (miRNA)-expression levels for patients with colon adenocarcinoma/rectum adenocarcinoma (COAD/READ) were downloaded from The Cancer Genome Atlas (TCGA) Genomic Data Commons website (https://portal.gdc.cancer.gov/) and used as the training dataset. FPKM values were converted to transcripts per million values and divided into mRNA-and long non-coding RNA (lncRNA)-expression groups. "Masked Somatic Mutation" data were downloaded for patients with COAD/READ, pre-processed using VarScan software, and visualized using the R software package, maftools [7]. The clinicopathological features and prognoses of patients with COAD/READ, such as gender, age, and stage, were downloaded from the UCSC Xena website (http://xena.ucsc.edu/). After removing samples with missing clinical information, 597 tumor samples and 51 normal tissue samples were obtained. Table 1 shows clinical information for patients with COAD/READ. The likelihood of each response to immunotherapy was predicted using the Tracking of Indels by DEcomposition (TIDE) algorithm (http://tide.dfci.harvard.edu) [8]. Gene-expression data from different organizations and in different cell lines were downloaded from TCGA and the Cell Line Cancer Encyclopedia (CCLE) databases (https://portals.broadinstitute.org/ccle/about) to compare IRFexpression levels between tumor and normal tissues.
Gene-expression data in GSE17536 [9] and GSE39582 [10] and clinicopathological patient characteristics were downloaded as validation datasets from the Gene Expression Omnibus (GEO) database. The data were downloaded from Homo Sapiens; this platform is based on the GPL570 [HG-U133_PLus_2] Affymetrix Human Genome U133 Plus 2.0 Array. GSE17536 included 177 COAD tissue samples, and GSE39582 included 566 COAD tissue samples and 19 colon non-tumor tissue samples.

Genetic Characteristics of the IRF Family and Validation with Clinical Prediction Models
We incorporated the expression levels of IRF-family genes into our model. The random forest package of R [11] was used to develop an IRF-based risk-assessment model for patients with COAD/READ. Patients were divided into high-and low-IRF risk groups, based on the median value.
To assess patient prognosis by combining the IRF risk score with clinicopathological features, univariate and multivariate Cox proportional-hazards analysis were used to analyze the independent predictive power of risk scores for overall survival (OS) and disease-free survival (DFS). Subsequently, a survivalprediction nomogram was constructed for patients in the TCGA dataset and was validated for patients in the GEO dataset. To quantify differentiation performance, Harrell's consistency index (C-index) was measured. A calibration curve was generated to evaluate the performance of the line map by comparing the predicted value of the line map with the observed OS rate.

Differentially Expressed Genes (DEGs) and Clinical Correlation Analysis
Data for patients with COAD/READ were downloaded from TCGA and the GEO databases, and the patients were divided into high-and low-expression groups, according to the IRF score. The DESeq2 package of R [12] was used to analyze DEGs in both groups, where a log fold-change (logFC)>1.0 and P<0.05 were set as threshold values for DEGs.
We compared the expression levels of IRF-family genes at different TNM stages. The Human Protein Atlas (HPA, https://www.proteinatlas.org) provides immunohistochemical expression data for nearly 20 different cancers [13] and enables identi cation of tumor type-speci c, differential protein-expression patterns, where protein-expression levels of all IRF-family genes were compared between normal and CRC tissues.

Functional-Enrichment Analysis and Gene-Set Enrichment Analysis (GSEA)
Gene Ontology (GO) analysis is commonly used for large-scale functional-enrichment research of biological processes (BPs), molecular functions, and cellular components. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a widely used database containing information regarding genomes, biological pathways, diseases, and drugs. GO and KEGG pathway-enrichment analyses were performed with signature genes using the clusterPro ler R package [14]. A false-discovery rate of <0.05 was considered statistically signi cant.
To investigate differences in BPs among different subgroups, GSEA was performed using the geneexpression pro les of patients with COAD/READ. GSEA can be used to identify statistical differences between two groups in a gene set and estimate changes in pathways and BP activities [15]. The gene set "C2.CP.kegg. V6.2.-symbols" [15] was downloaded from the Molecular Signatures Database for GSEA analysis. An adjusted P value of <0.05 was considered statistically signi cant.

Constructing a Protein-Protein-Interaction (PPI) network and Screening Hub Genes
We used the Search Tool for Retrieving Interacting Genes (STRING) database [16], which predicts PPIs, to construct PPI networks for the selected genes. Genes with scores of >0.4 were selected to construct a network model, which was visualized with Cytoscape V3.7.2 [17]. In the co-expression network, the maximum clique centrality (MCC) algorithm most effectively located the node in a set. The MCC of each node was calculated using CytoHubba plugins [18] in Cytoscape, and genes with the highest eight MCC values were selected as hub genes.
Constructing a Competing-Endogenous RNA (ceRNA) Network Based on miRNA-mRNA and miRNA-lncRNA interactions LncRNA-miRNA-interaction data were downloaded from the miRcode database. and miRNA-mRNAinteraction data were downloaded from the miRTarBase, miRDB, and TargetScan databases. The DESeq2 packet of R [12] was used to analyze miRNA-and lncRNA-expression differences between the high-IRF and low-IRF risk groups. LogFC>1.0 and P<0.05 were set as criteria for a statistically signi cant difference. Cytoscape (V3.7.2) was used to construct a ceRNA network by analyzing correlations between lncRNA-and mRNA-regulated miRNAs simultaneously.
Tumor Immune Estimation Resource (TIMER)-Database Analysis and Comparing Immune-Correlation

Scores Between Both Groups
The TIMER database (https://cistrome.shinyapps.io/timer/) enables users to estimate B-cell, CD4+ T-cell, CD8+ T-cell, macrophage, neutrophil, and dendritic-cell in ltration into different tumor types [19]. We used the TIMER database to analyze correlations between the expression levels of different IRF genes and immune cell in ltrations in COAD/READ samples.
The R estimate package [20] quanti es immune cell-in ltration levels in tumor samples, based on geneexpression pro les, and was used to assess the immune-activity and stromal score of each tumor sample. Immune cell-in ltration levels between both groups were compared using the Mann-Whitney U test.

Analysis of Anticancer Therapy-Sensitivity
The Genomics of Drug Sensitivity in Cancer (GDSC) database (https://www.cancerrxgene.org/) enables exploration of gene mutations and targeted cancer therapies. We downloaded gene-expression data from cell lines and IC 50 values to analyze correlations between differentially expressed-IRF genes and anticancer drug sensitivities.

Calculating Tumor-Mutation Load Fractions and Analyzing Genetic Variation of IRF Family Members in CRC
The tumor burden (TMB) of each tumor sample was de ned as the number of somatic cell mutations identi ed, excluding silent mutations. Patients with COAD/READ were divided into high-TMB and low-TMB groups according to the median TMB value. The Wilcoxon rank-sum test was used to compare risk scores of IRF-family genes between both groups.

Patients and Specimens in the Validation Cohort
Tumor specimens were obtained from 114 patients with CRC who underwent treatment at Zhongshan Hospital (Fudan University) between 2008 and 2016. The inclusion criteria were as follows: (a) a clear pathological diagnosis of CRC, (b) complete follow-up data until December 2019, (c) suitable formalinxed and para n-embedded tissues, and (d) agreeing to participate in the study and provide signed informed consent. CRC diagnosis was based on the World Health Organization criteria, and tumor stages were classi ed according to the 7 th edition of TNM classi cation of International Union Contra Cancrum. Ethical approval was obtained from the research ethics committee of Zhongshan Hospital. The clinical characteristics of the 102 patients with follow-up data are presented in Supplemental Table 1.

IHC-staining Evaluation
Cancer and adjacent normal tissues were formalin-xed, para n-embedded, and prepared as tissue microarrays (TMAs) after hematoxylin and eosin staining and histopathology-guided location. Fivemicron-thick TMA sections were depara nized and rehydrated in 0.1 M citrate buffer (pH 6.0), followed by high-temperature antigen retrieval in a microwave for 15 min. The sections were incubated overnight at 4°C with primary antibodies against IRF3 and IRF7 (Abcam, Cambridge, UK), CD4 (Servicebio Technology, Wuhan, China), CD8 (Servicebio Technology), CD19 (Servicebio Technology), and CD68 (Servicebio Technology). The sections were incubated for 30 min with a secondary antibody at room temperature and immunostained based on avidin-biotin complex formation, using 3,3¢diaminobenzidine. Hematoxylin was used as a counterstain.
Antigen-antibody complexes in whole samples were detected using a panoramic slice scanner (3DHISTECH, Hungary) and viewed with CaseViewer 2.2 (3DHISTECH). H-scores were calculated to evaluate gene-expression levels using Quant Center 2.1 (3DHISTECH): H-score=Σ (PI×I)=(% of weakly stained cells×1)+(% of moderately stained cells×2)+(% of strongly stained cells×3), where PI is the proportion of the positive area, and I is the staining intensity.

Statistical Analysis
The data were analyzed with R software (V4.0.2). The independent Student t test was used to estimate the statistical signi cance of normally distributed variables, and the Mann-Whitney U test was used to analyze differences in non-normally distributed variables between two groups of continuous variables. The chi-squared test or Fisher exact test was used to analyze statistical signi cance between two groups of categorical variables. Correlation coe cients between different genes were calculated by Pearson correlation analysis. The survival package of R was used for survival analysis, Kaplan-Meier survival curves were used to determine survival differences, and the log-rank test was used to evaluate signi cant differences in survival times between two groups. Univariate and multivariate Cox analyses were used to determine independent prognostic factors. The pROC package of R [21] was used to draw receiver operating-characteristic (ROC) curves, and area under the curve (AUC) values were calculated to assess the accuracy of risk scores for prognosis estimations. All statistical P values were bilateral, and P<0.05 was considered statistically signi cant.
Analyzing protein-expression levels in CRC and normal tissues with the HPA database ( Figure 1D, 1F; Supplementary Figure 1) revealed that IRF3 and IRF7 were upregulated in cancer tissues ( Figure 1D, 1F). IHC con rmed these results and also that the IRF3 protein was more highly expressed in cancer tissues than that in normal tissues ( Figure 1E, 1G).

An IRF Risk Model Predicted OS and DFS in Patients with COAD/READ
We compared IRF-expression levels with tumor stages in patients with COAD/READ. IRF1 and IRF6 expression signi cantly varied (IRF1: P<0.001, Figure 3A; IRF6: P=0.041, Figure 3B), whereas IRF2-5 and IRF7-9 expression did not. IRF6 expression was positively correlated with TNM staging, whereas IRF1 expression was not completely positively correlated. A random-forest model was applied to the GEO dataset, and patients in the TCGA and GEO datasets were divided into high-and low-IRF score groups, based on the median risk score ( Figure 3C). Patients in the low-IRF score group showed a better prognosis (TCGA: log-rank P<0.001, Figure 3D; GEO: log-rank P=0.045, Figure 3E).
Univariate and multivariate Cox analysis showed that IRF risk score was an independent risk factor for OS and DFS (Table 2, 3; Figure 3F, 3G). IRF risk scores and clinicopathological features were used to construct a nomogram to predict OS and DFS ( Figure 3H

Relationship Between IRF Scores and Gene-Expression Pro les
Analysis of data for patients in the high and low IRF-score groups identi ed 126 DEGs (|logFC|>1.0 and P<0.05; Figure 4A, 4B).
GO analysis showed that the DEGs were closely related to BP terms such as gas transport, antimicrobial response, humoral immune response, and sensory organ morphogenesis ( Figure 4C, Supplemental Table  2). Differentially expressed IRF genes were associated with enriched KEGG terms such as nitrogen metabolism, JAK-STAT signaling pathway, Staphylococcus aureus infection, and cytokine receptor interaction pathways ( Figure 4D; Supplemental Table 3).
GSEA showed that the ribosome and cardiac muscle contraction terms were signi cantly enriched for patients with high IRF scores ( Figure 4E; Supplemental Table 4), whereas the terms hematopoietic cell, intestinal immune network for IgA production, and chemokine signaling pathway were signi cantly underrepresented for patients with high IRF scores ( Figure 4E). Figure 4F shows enrichments for the related pathways.

IRF-Expression Levels Corresponded with Immune Cell In ltration
In patients with COAD/READ, IRF mRNA-expression levels correlated positively, in most cases, with the in ltration levels of different immune cells ( Figure 5A, B; Supplemental Figure 2). We also observed positive correlations between IRF3 and IRF7 protein-expression levels and tumor-in ltrating immune cell markers via IHC in 102 patients with CRC. IRF3 expression was positively correlated with CD4 expression, suggesting a correlation with CD4+ T cell-in ltration, whereas IRF7 expression was positively correlated with CD4 and CD68 expression, suggesting correlations with T-cell and macrophage in ltration ( Figure  5C, D).

Correlations Between IRF Gene-Expression Levels and the Biological Characteristics of Patients with COAD/READ
Analysis of TCGA and GEO datasets showed that patients in the high-IRF risk group had lower immune and stromal related scores than those in the low-IRF risk group (P<0.05; Figure 6A, 6B). Signi cant differences in IRF scores were found between patients that bene tted from immune therapy and those that did not (P=0.025, Figure 6C), based on TIDE scores. We analyzed the effects of IRF gene-expression levels on sensitivities to different chemotherapeutic drugs or small-molecule inhibitors. In Figure 6D, red font indicates increased drug sensitivity with increased IRF gene-expression levels, and green font indicates negative correlations between drug sensitivity and gene-expression levels. Signi cant differences in IRF scores were also found between patients with high and low TMBs ( Figure 6E). Analysis of TCGA data for IRF gene mutations in patients with COAD/READ showed that the IRF2 gene has the highest mutation rate ( Figure 6F).
The STRING database was used to build a PPI network for the DEGs identi ed in this study ( Figure 6G), which was imported into Cytoscape software ( Figure 6H). The top eight DEGs were selected from the PPI network as hub genes with CytoHubba plug-ins, using the MCC algorithm ( Figure 6I). We also constructed a ceRNA network based on differentially expressed mRNAs, miRNAs, and lncRNAs in patients with COAD/READ ( Figure 6J).

Analysis of IRF Gene-Expression Levels in Different Tumors
The CCLE and TCGA databases were used to analyze mRNA-expression levels of IRFs in different tumor cells (Supplemental Figure 3 and 4, respectively).

Discussion
Differential expression of IRF genes has been reported in many cancers [6], and IRFs play important roles in CRC tumorigenesis and prognosis. However, this study is the rst to explore IRF-expression levels, at both the mRNA and protein levels, and the prognostic value, effects on immune cells, and potential molecular pathways of IRFs in CRC.
IRF3 and IRF7 are closely related, and unlike other IRFs, they are considered key for evading innate immune responses to virulence factors [22]; thus, they may play crucial roles in anticancer immunity. IRF3 plays important roles in DNA-damage responses (DDRs) in cancer [23]. During chemotherapy with DDR agents and immunotherapy involving checkpoint blockade, IRF3 expression is upregulated via cGAS-STING pathway activation [24,25]. IRF3 activation in response to DDR promotes its role in upregulating RAE1 [26], which is the tumor-cell ligand for NKG2D on NK cells. Together, RAE1 and NKG2D stimulate NK cell-effector function. IRF3 overexpression inhibits tumor-cell growth by increasing p53 activity in vitro [27]. Additionally, IRF3 may be involved in STING activity [28]. Increased PD-L1 expression following treatment with DDR inhibitors is mainly IRF3-dependent [25], and tumor-growth inhibition and immunecheckpoint blockade with DDR inhibitors is completely dependent on the cGAS-STING-IRF3 axis. Our current ndings further suggest an additional bene t of cGAS-STING-IRF3 axis activation due to increased expression of the CXCL10 and CCL5 chemokines, leading to T-cell tumor in ltration. Previously, we found that IRF3 and IRF7 could mediate the acquisition of new anti-tumor effector functions in macrophages [29]. In the present study, we observed that high IRF3 and IRF7 expression was related to CD4+ T-cell, CD8+ T-cell, B-cell, and macrophage activation, indicating that IRF3 and IRF7 could promote the anticancer effect of immune cells.
Interestingly, among all IRF factors, the mRNA-and protein-expression levels of IRF3 and IRF7 were signi cantly upregulated in tumor tissues and associated with poor OS in patients with CRC. As IRFs are transcription factors, they may also in uence tumor-cell development by regulating the transcription of other oncogenes, although the related mechanisms require further investigation. We further assessed the relationship between IRF risk scores and immune and stromal scores in tumor patients to examine why increased IRF3 and IRF7 expression can promote immune cell recruitment without killing tumors. Using TCGA and GEO datasets, patients in the high-IRF risk group showed lower immune and stromal scores than those in the low-IRF risk group, indicating that the immune cells clustered in the tumor stroma without reaching the tumor cells. This may also explain why CRC is not sensitive to tumor immunotherapy.
In conclusion, we investigated the mRNA and protein levels of IRFs in patients with COAD/READ and their effects on prognosis. IRF3 and IRF7 were signi cantly upregulated in tumor tissues and associated with poor OS in patients with CRC. Although IRFs can promote immune cell in ltration, the immune and stromal scores showed that in ltrating immune cells mostly stayed in the tumor stroma and did not directly kill the tumor. We also investigated pathways potentially related to IRFs; however, these results require further study and con rmation.

Competing interests
The authors declare that there are no con icts of interest.

Ethics approval and consent to participate
Ethical approval was obtained from the research ethics committee of Zhongshan Hospital (SK2020-104).

Author Contributions
YJC conceived and designed the experiments; YJC and SNL performed the bioinformatics analysis; LD and TTL performed the immunohistochemical analysis and patient data compilation; XZS followed up the patients; YJC, NPZ and LL wrote and edited the manuscript.  IHC analysis of cancer and paracancerous tissues in 12 patients with CRC con rmed the IRF3 and IRF7 protein levels in CRC tissues, revealing that IRF3 was upregulated in CRC tissues, whereas the IRF7 level showed no signi cant difference between cancer and normal tissues.    The Search Tool for Retrieving Interacting Genes (STRING) database was used to analyze a PPI network of DEGs. (H) The STRING results were imported into Cytoscape software. Red text represents upregulated genes and blue text represents downregulated genes, where the color intensity was positively correlated with logFC. (I) The maximum clique centrality algorithm was used to identify core genes in the PPI network, and the red and yellow nodes represented the top eight hub genes. (J) A ceRNA network was constructed based on differentially expressed mRNAs, miRNAs, and lncRNAs, where the yellow diamonds represent lncRNAs, the green triangles represent miRNAs, and the red ovals represent mRNAs.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.