A Comprehensive Bioinformatic Analysis of RFC1/2/3/4/5 as Significant Clinical Indicators in the Diagnosis and Prognosis of Low-Grade Glioma


Background: Replication factor C (RFC) proteins play a very important role in nuclear DNA replication, mismatch repair and cell cycle checkpoint pathways. However, the relationship between RFCs and brain tumors is still unclear, especially the diagnostic and prognostic significance of RFCs in low-grade glioma.
Methods: We used the Oncomine, GEPIA, Human Protein Atlas, cBioPortal, STRING, LinkedOmics and Tumor Immune Estimation Resource (TIMER) databases to analyze the transcriptional and survival data of RFCs in brain and central nervous system (CNS) cancer, especially in low-grade glioma.
Results: RFCs were highly expressed in brain and CNS cancer, including low-grade glioma. The transcriptional levels of RFCs were also associated with tumor stage. Moreover, survival analysis of RFCs in low-grade glioma patients revealed that higher expression of RFCs was associated with poorer prognosis. RFCs and their 50 frequently altered neighbor genes were enriched in certain pathways. We also identified the kinase targets, transcription factor targets and miRNA targets of RFCs in low-grade glioma. RFCs also showed positive correlations with certain infiltrating immune cells in low-grade glioma.
Conclusions: These findings imply that RFCs are possible biomarkers for the diagnosis and prognosis of low-grade glioma.

unique functions in normal cell processes. RFC1 contains the main DNA-binding region and participates directly in DNA replication and repair (13). RFC2 is involved in the cell cycle checkpoint signaling pathway and also plays a part in PCNA-related mismatch repair after DNA damage (14). RFC4 functions in DNA damage checkpoint pathways, while RFC5 is needed for opening the PCNA clamp during DNA replication (15,16). Moreover, the interaction between RFC1 and the RFC complex, which consists of RFC1, RFC2 and RFC5, requires the participation of RFC3 (17).
RFCs also play a very important role in cancer proliferation, migration and invasion (11). Many previous studies have reported RFC subunits as markedly differentially expressed genes between tumor and normal tissues in many cancer types, such as breast cancer, nasopharyngeal carcinoma, ovarian tumor and acute myeloid leukemia (18,19,20,21,22,23,24,25). Some RFC subunits, including RFC2, RFC4 and RFC5, have been reported in glioma and glioblastoma; however, these studies are not sufficient (26,27,28,29,30). Studies of RFC1 and RFC3 in brain and central nervous system cancer are rare. With the development of sequencing technology and bioinformatic analysis, we wondered whether RFCs could be clinical biomarkers in the diagnosis and prognosis of LGG patients. In our study, we completed a comprehensive bioinformatic analysis of the expression of

ONCOMINE database
ONCOMINE (www.oncomine.org) is a comprehensive web-based cancer microarray database with precise data-mining functions (31). We used ONCOMINE to examine the expression patterns of RFCs in different kinds of cancer, with a p-value < 0.05, a fold change > 1.5 and a gene rank in the top 10% set as the screening thresholds. We verified the significance of the data with Student's t-test. We also used ONCOMINE to examine the expression levels of RFCs in different grades of LGG and brain cancer.
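The screening thresholds above can be written as a simple filter. The following is a minimal sketch and not ONCOMINE's own implementation: the record fields, the gene labels and all values are hypothetical, and the filter assumes that only over-expression (tumor versus normal) is of interest.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One tumor-vs-normal comparison (hypothetical record layout)."""
    gene: str
    p_value: float          # Student's t-test p-value
    fold_change: float      # tumor / normal expression ratio
    rank_percentile: float  # 4.0 means the gene ranks in the top 4%


def passes_screen(c, p_max=0.05, fc_min=1.5, rank_max=10.0):
    """Apply the three thresholds: p < 0.05, fold change > 1.5, top 10% rank."""
    return (c.p_value < p_max
            and c.fold_change > fc_min
            and c.rank_percentile <= rank_max)


# Hypothetical values, for illustration only.
records = [
    Comparison("GENE_A", 0.001, 2.3, 4.0),   # passes all three thresholds
    Comparison("GENE_B", 0.200, 1.8, 6.0),   # fails the p-value threshold
    Comparison("GENE_C", 0.010, 1.2, 3.0),   # fails the fold-change threshold
    Comparison("GENE_D", 0.030, 1.6, 25.0),  # fails the gene-rank threshold
]

selected = [c.gene for c in records if passes_screen(c)]
print(selected)
```

A comparison is kept only when all three conditions hold at once, which is why the gene rank threshold can exclude a gene that is otherwise significant.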
GEPIA database
GEPIA (http://gepia.cancer-pku.cn/index.html) is an online data-mining tool based on RNA sequencing data from TCGA and GTEx, developed by Tang et al. (32). We evaluated the expression profiles of RFCs in patients with LGG using GEPIA and assessed statistical significance with a log2 fold change cutoff of 1 and a p-value cutoff of 0.01. We also used GEPIA to generate survival plots of RFCs in LGG patients, with the 75% expression cutoff defining the high-expression group and the 25% expression cutoff defining the low-expression group. In addition, we evaluated the pairwise expression correlations among RFCs in GEPIA with the Pearson method and took p-value < 0.05 as the criterion of significance. The GEPIA2 database was used to identify the 50 frequently altered neighbor genes of RFCs.
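The quartile-based grouping and the Pearson correlation described above can be illustrated as follows. This is a sketch under stated assumptions rather than GEPIA's code: the patient identifiers and expression values are hypothetical, the percentile interpolation follows Python's `statistics.quantiles` with the inclusive method, and patients exactly at a cutoff are left out of both groups.

```python
import math
import statistics


def split_by_quartile(expression):
    """Return (low_ids, high_ids) using the 25th and 75th percentiles as cutoffs."""
    q1, _median, q3 = statistics.quantiles(
        expression.values(), n=4, method="inclusive"
    )
    low = sorted(pid for pid, value in expression.items() if value < q1)
    high = sorted(pid for pid, value in expression.items() if value > q3)
    return low, high


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression values for eight patients.
expression = {f"P{i}": float(i) for i in range(1, 9)}
low_group, high_group = split_by_quartile(expression)
print(low_group, high_group)

# Hypothetical paired expression values of two genes.
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(r)
```

Patients between the two cutoffs belong to neither group, so a 75%/25% split compares only the upper and lower quarters of the cohort.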

Human Protein Atlas database
The Human Protein Atlas database (https://www.proteinatlas.org) contains immunohistochemistry (IHC) and immunofluorescence (IF) expression profiles of normal and cancer tissues (33). In the current study, the Human Protein Atlas provided the immunohistochemistry staining images of RFCs in normal brain tissues and brain glioma tissues.

cBioPortal database
The cBioPortal database (www.cbioportal.org) is an online resource storing readily understandable genetic, epigenetic, gene expression and proteomic events. It also provides a platform for the analysis of gene mutation patterns in different cancers (34). We used cBioPortal for RFCs mutation analysis in LGG patients based on TCGA, with an mRNA expression z-score (RNAseq V2 RSEM) threshold of ± 2.0 relative to diploid samples. The survival analysis for RFCs mutations was also carried out in cBioPortal, and the overall survival (OS) and disease-free survival (DFS) curves were considered significant when the p-value was < 0.05.

STRING database and Cytoscape
The STRING database (https://string-db.org/) provides free public information on protein-protein associations and their connectivity networks, including computational predictions (12). Cytoscape is free software for constructing and displaying biomolecular interaction networks from molecular interaction data (35). We predicted the protein-protein interaction (PPI) network of the RFCs and their 50 frequently altered neighbor genes with the STRING database and obtained the node information of the network. We then imported this information into Cytoscape and constructed our PPI network.
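The mRNA expression z-score threshold of ± 2.0 used in the cBioPortal analysis can be sketched as below. This is an illustration under assumptions, not cBioPortal's implementation: the expression values and diploid flags are hypothetical, and the reference distribution is taken to be the diploid samples with a sample standard deviation.

```python
import statistics


def expression_z_scores(values, is_diploid):
    """Z-score of each sample relative to the diploid reference samples."""
    reference = [v for v, diploid in zip(values, is_diploid) if diploid]
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]


def classify(z, threshold=2.0):
    """Label a sample as altered when |z| exceeds the threshold."""
    if z > threshold:
        return "high"
    if z < -threshold:
        return "low"
    return "unaltered"


# Hypothetical expression values; the first five samples are diploid.
values = [10.0, 11.0, 9.0, 10.0, 10.0, 20.0, 2.0]
is_diploid = [True, True, True, True, True, False, False]

labels = [classify(z) for z in expression_z_scores(values, is_diploid)]
print(labels)
```

Using the diploid samples as the reference means that a sample is called altered only when its expression departs strongly from that of tumors without copy-number change in the gene.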

R packages for GO and KEGG enrichment analysis
To perform Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of RFCs and their 50 frequently altered neighbor genes, we applied the "clusterProfiler" R package and set the p-value threshold at 0.05 for statistical significance (36). The GO enrichment plots were then drawn with the "GOplot" R package (37) and the KEGG enrichment plots with the "pathview" R package (38).
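Over-representation analysis of this kind is commonly based on a hypergeometric test. The sketch below shows that calculation in Python for a single term; it is not the clusterProfiler code, and the background size, the number of annotated genes and the overlap are hypothetical numbers chosen only for illustration (the query size of 55 corresponds to 5 RFCs plus 50 neighbor genes).

```python
from math import comb


def enrichment_p_value(overlap, background, annotated, query):
    """P(X >= overlap) for a hypergeometric draw.

    background: number of genes in the background universe
    annotated:  number of background genes annotated to the term
    query:      number of genes in the query list
    overlap:    number of query genes annotated to the term
    """
    upper = min(annotated, query)
    numerator = sum(
        comb(annotated, i) * comb(background - annotated, query - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(background, query)


# Hypothetical numbers: 20000 background genes, 150 annotated to one term,
# 55 query genes, 12 of which carry the annotation.
p = enrichment_p_value(overlap=12, background=20000, annotated=150, query=55)
print(p, p < 0.05)
```

In practice the p-values of all tested terms would also be adjusted for multiple testing before applying the 0.05 threshold; that step is omitted here.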

LinkedOmics database
LinkedOmics (http://www.linkedomics.org/) is an open resource of multi-omics and clinical data for 32 cancer types based on TCGA, and provides comprehensive analysis through the LinkedOmics module, the LinkFinder module and the LinkInterpreter module (39). We predicted the kinase targets, miRNA targets and transcription factor targets of RFCs in patients with LGG on LinkedOmics by Gene Set Enrichment Analysis (GSEA). A minimum gene set size of 3 and 500 simulations were set. The results were considered significant at a threshold of < 0.05.
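The idea behind GSEA with a fixed number of simulations can be sketched as a running-sum enrichment score compared against randomly drawn gene sets. The code below is a simplified illustration, not the LinkedOmics implementation: the gene names and scores are hypothetical, the weighting exponent is assumed to be 1, and random gene sets of the same size stand in for the simulation step.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set):
    """Weighted running-sum enrichment score over a ranked gene list.

    Assumes the gene set contains at least one, but not all, ranked genes.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    hit_total = sum(abs(s) for s, hit in zip(scores, hits) if hit)
    miss_step = 1.0 / (len(ranked_genes) - sum(hits))
    running, best = 0.0, 0.0
    for s, hit in zip(scores, hits):
        running += abs(s) / hit_total if hit else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


def simulated_p_value(ranked_genes, scores, gene_set, n_sim=500, seed=0):
    """Fraction of random same-size gene sets with |ES| >= the observed |ES|."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    extreme = 0
    for _ in range(n_sim):
        random_set = set(rng.sample(ranked_genes, len(gene_set)))
        simulated = enrichment_score(ranked_genes, scores, random_set)
        if abs(simulated) >= abs(observed):
            extreme += 1
    return observed, extreme / n_sim


# Hypothetical ranked list of ten genes with descending scores.
genes = [f"g{i}" for i in range(1, 11)]
scores = [float(s) for s in range(10, 0, -1)]
target_set = {"g1", "g2", "g3"}  # meets the minimum gene set size of 3

es, p_value = simulated_p_value(genes, scores, target_set, n_sim=500)
print(es, p_value)
```

A gene set whose members cluster at the top of the ranked list reaches a score near 1, and few random sets of the same size match it, which is what yields a small p-value.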

TIMER database
Tumor Immune Estimation Resource (TIMER; https://cistrome.shinyapps.io/timer/) supports investigation of the correlation between molecular characterization and tumor-immune interactions. It provides the levels of six tumor-infiltrating immune subsets across 32 cancer types (40). The TIMER database was used to evaluate the correlation between RFCs expression and the infiltration of the six kinds of immune cells in LGG patients. The survival analysis in relation to immune cell infiltration in patients with LGG was also carried out in TIMER, with the p-value threshold set at 0.05. Moreover, the Cox proportional hazards model for LGG patients was also constructed and calculated in TIMER.

Results
Over-expression of RFCs family was found in Brain and Central Nervous System (CNS) cancer patients.
At the beginning of our study, we examined the differences in mRNA expression of RFCs between brain and CNS cancer tissues and normal tissues via the ONCOMINE database. A comprehensive view of RFCs expression in different types of cancers was shown in Fig
The mRNA expression of RFCs was elevated in low-grade glioma (LGG) patients.
To further explore the expression of RFCs in low-grade glioma (LGG) and normal tissues, we conducted an analysis via GEPIA, which draws on TCGA and GTEx data and is therefore independent of Oncomine. In the expression profile shown in Fig. 2, RFC1 and RFC3 showed significantly higher expression in LGG than in normal tissues (p < 0.05). Although RFC2, RFC4 and RFC5 had higher average expression in LGG tissues, the differences lacked statistical significance (p > 0.05).
Then, the protein expression patterns of RFCs were analyzed in the Human Protein Atlas database. As the immunohistochemical images in Fig. 3 show, higher expression of RFC2, RFC4 and RFC5 was detected in brain glioma than in normal brain tissues, while RFC1 and RFC3 were highly expressed in both brain glioma tissues and normal brain tissues.
In conclusion, the mRNA expression and protein expression of certain RFCs were enhanced in LGG patients.
Correlation between expression profiles of RFCs and the pathological grades in low-grade glioma.
After demonstrating the enhanced expression of RFCs in LGG, we wondered whether there was a correlation between RFCs expression and higher pathological grade in LGG. To answer this question, we analyzed the data in Oncomine and GEPIA. As shown in Fig. 4A, RFC1, RFC2 and RFC3 were significantly more highly expressed in patients with grade 3 LGG than in patients with grade 2 LGG, while the results for RFC4 and RFC5 lacked statistical significance. Moreover, in the Sun Brain dataset (43), higher mRNA levels of RFC2, RFC3, RFC4 and RFC5 were detected in patients with grade 3 or grade 4 tumors than in those with grade 2 tumors (Fig. 4B). In short, the expression profiles of RFCs showed a significant correlation with the pathological grades of LGG patients.
The prognostic value of RFCs expression in LGG patients.
To find out whether higher RFCs expression was significantly correlated with shorter overall survival (OS) and disease-free survival (DFS), we analyzed the expression profiles and clinical data of LGG patients in GEPIA. As shown in Fig. 5, the cutoff for the high-expression group was 75% and the cutoff for the low-expression group was 25%. Except for the correlation between DFS and RFC4 expression, which lacked statistical significance (p > 0.05), the expression of RFCs showed a negative association with OS and DFS in patients with LGG. In other words, all five RFCs were associated with poorer prognosis: higher expression of RFCs correlated with shorter OS and, except for RFC4, with shorter DFS.
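Survival curves of the kind compared here are usually Kaplan-Meier estimates. The following minimal sketch computes such an estimate for one group; it is not GEPIA's code, the follow-up times and event indicators are hypothetical, and the significance test between the high- and low-expression groups is not included.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up time of each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical high-expression group: four patients, one censored at time 2.
high_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(high_curve)
```

Censored patients leave the risk set without lowering the curve, which is why the step at time 3 is computed from two patients at risk rather than three.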
Prognostic significance of RFCs genetic mutations and the correlation between the expression of RFCs in LGG patients.
Next, we assessed the mutations of RFCs in LGG patients by the online tool cBioPortal for low-grade glioma based on TCGA. As Fig. 6A and 6B showed, RFCs were altered in 41.79% of patients with Anaplastic Astrocytoma, 32.26% of patients with Astrocytoma, 16.22% of patients with Oligoastrocytoma, 15.56% of patients with Anaplastic Oligoastrocytoma and in 9.09% of patients with Oligodendroglioma.
Next, we analyzed the relationship between RFCs mutations and prognosis in LGG patients. As shown in Fig. 6C, we found an evident correlation between RFCs mutations and shorter overall survival in patients with LGG (p = 0.0241), while the correlation between RFCs mutations and disease-free survival lacked statistical significance. We then evaluated the pairwise correlations among RFCs in GEPIA. The results showed significantly positive associations between RFC1 and RFC2, RFC1 and RFC3, RFC1 and RFC4, RFC1 and RFC5, RFC2 and RFC3, RFC2 and RFC4, RFC2 and RFC5, RFC3 and RFC4, RFC3 and RFC5, and RFC4 and RFC5 (p < 0.05). The correlation coefficients are shown in Fig. 6D.
Predicted functions and pathways of the mutations in RFCs and their 50 frequently altered neighbor genes in LGG patients.
To investigate the correlations and functions of RFCs, we first analyzed the protein-protein interaction (PPI) network of the differentially expressed RFCs in STRING (Fig. 7A). Then, we identified the 50 neighbor genes most frequently associated with RFCs mutations using GEPIA2 and constructed the PPI network of RFCs and their 50 neighbor genes with STRING and Cytoscape (Fig. 7B). Next, we performed GO and KEGG enrichment analysis of RFCs and their 50 frequently altered neighbor genes with the R packages "clusterProfiler", "GOplot" and "pathview". As shown in
Kinase targets, miRNA targets and transcription factor targets of RFCs in patients with LGG.
To explore further molecules and functions correlated with RFCs, we predicted their kinase targets, miRNA targets and transcription factor targets through the LinkedOmics database. The results are shown in Table 2. ATM and CDK1 were the top two kinases related to RFC1; ATR and PLK1 were associated with RFC2; PLK1 and CDK1 were the two kinase targets influenced by RFC3; PLK1 and ATR were evidently correlated with RFC4; and ATM and ATR were the top two kinases targeted by RFC5. As for miRNA targets, only RFC1 had more than one closely correlated miRNA target, with MIR-144 and MIR-381 as the top two. Only MIR-144 was significantly associated with RFC5, while the other predicted miRNA targets lacked statistical significance. The enriched transcription factor targets of RFCs were mainly related to the E2F family, including V$E2F_Q4, V$E2F_Q6, V$E2F_Q3, V$E2F1_Q6 and V$E2F4DP1_01.
The relation of Immune Cell Infiltration and the RFCs gene family in patients with LGG.
To assess RFCs-related immune cell infiltration, we analyzed the data with TIMER, an online tool based on TCGA. The results are shown in Fig. 9A. RFC1

Discussion
RFCs are significant molecules involved in several physiological cell processes, including DNA replication, DNA damage and mismatch repair, and the cell cycle checkpoint pathway. However, abnormal expression of RFCs can disrupt this order and trigger oncogenic changes in human cells. The oncogenic role of RFCs has been reported in many cancer types. In brain and CNS cancers, some RFCs have been reported as markedly overexpressed genes in glioma and glioblastoma. For example, Ho et al. demonstrated that patients with higher-grade glioblastoma tend to express more RFC2, resulting in resistance to temozolomide, which could be suppressed by miR-4749-5p (27). Peng et al. reported that overexpression of RFC5 could lead to temozolomide resistance in human glioma cells (47). Several bioinformatic analyses of differentially expressed genes between glioblastoma tissues and normal tissues have shown that the expression of some RFCs is higher in cancer tissues than in normal tissues (28,29,30). However, functional research specific to RFCs in brain and CNS cancers is still insufficient. In our study, we examined the expression profile of RFCs in low-grade glioma patients and explored the clinical significance of RFCs in the diagnosis and prognosis of LGG through in-depth and comprehensive bioinformatic analysis.
Among the RFCs, RFC1 is located on human chromosomal segment 4p13-p14, contains the DNA-binding region and participates in mismatch repair after DNA damage (48). Although Moggs et al. (18) reported that genes related to cell proliferation, such as RFC1, were downregulated by estrogen (E2), inhibiting the growth of MDA-MB-231 breast cancer cells, studies of the relationship between RFC1 and cancer are inadequate, and research on RFC1 and LGG is rarer still. In our study, by analyzing the ONCOMINE datasets and the GEPIA database, we found that the expression of RFC1 was higher in LGG tumor tissues than in normal tissues. Moreover, the expression level of RFC1 was related to tumor grade in the ONCOMINE datasets. The survival analysis of RFC1 in patients with LGG showed that higher expression of RFC1 was associated with poorer OS and DFS. In addition, the expression of RFC1 was positively related to the infiltration of certain immune cells, which may alter the immune status of LGG patients. ATM and CDK1 were the top two kinase targets influenced by RFC1 in LGG; they have been reported to induce resistance to radiation therapy and stimulate the progression of gliomas (49,50,51). Certain miRNA targets and transcription factor targets were also predicted in order to provide additional information for further confirmatory research on RFC1 and LGG.
RFC2, a subunit of the RFC complex located on human chromosomal segment 7q11.23, plays a part in DNA replication and repair and in the cell cycle checkpoint pathway (14,48). Many studies have examined RFC2 in certain cancers, most of which stated that RFC2 was clearly upregulated in cancer tissues (19,28,52,53,54). Ho et al. (27) demonstrated that impairing the activation of RFC2 signaling could enhance the cytotoxicity of temozolomide to glioblastoma. In our study, we reconfirmed with the Human Protein Atlas database and the ONCOMINE datasets that the expression of RFC2 was elevated in brain tumor tissues. Data from LGG patients in TCGA also revealed that the expression of RFC2 was markedly related to the grade of LGG. In the GEPIA analysis, higher RFC2 expression was also associated with worse clinical outcomes in LGG patients, namely shorter OS and DFS. The immune analysis also showed that RFC2 was related to immune cell infiltration, suggesting immune effects in LGG patients. ATR and PLK1 were the predicted kinase targets of RFC2 in LGG patients. ATR can activate the G2/M cell cycle checkpoint, and ATR inhibitors have therefore been developed as antitumor agents (55,56). Shi et al. showed that combining temozolomide with siPLK1 enhanced the anti-tumor effect of temozolomide in glioma patients. E2Fs, the predicted transcription factor targets of RFC2, might have a synergistic effect with RFC2 in LGG patients, which needs further confirmation.
RFC3 is located on human chromosomal segment 13q12.3-q13 and acts as a mediator that links RFC1 with the RFC complex (RFC1, RFC2, RFC5) (17,48). RFC3 has been reported to be highly expressed in certain types of cancer tissue (20,21,22). Gong et al. (57) demonstrated that RFC3 could induce epithelial-mesenchymal transition in lung adenocarcinoma cells through the Wnt/β-catenin pathway. However, research on RFC3 in glioma remains to be developed. In our study, we found through analysis of the GEPIA database that the expression of RFC3 was higher in LGG tissues. The correlation between RFC3 and LGG grade was also clarified. Enhanced expression of RFC3 was associated with shorter OS and DFS, indicating a poorer prognosis for LGG patients. RFC3 was positively correlated with several infiltrating immune cells, except for CD4+ T cells. PLK1 and CDK1 were predicted as the kinase targets of RFC3, while E2Fs were predicted as its transcription factor targets in LGG. Chae et al. (58) demonstrated an indirect relation between E2F and RFC3 that stimulates progression in the KG-1 acute myeloid leukemia cell line; however, a direct interaction between E2F and RFC3 in LGG development needs further experimental evidence.
RFC4 plays an important role in the DNA damage checkpoint pathway and has been reported to enhance the anti-tumor effect of chemotherapies based on DNA damage (15,16). RFC4 is located on human chromosomal segment 3q27 (48). RFC4 has been widely reported to be overexpressed in several cancer types and to induce tumor progression (23,24,59). In brain cancer, Tang et al. reported increased expression of RFC4 in human glioblastoma tissues. However, other reports on RFC4 and brain cancers are lacking. In our bioinformatic analysis, samples in the ONCOMINE datasets and the Human Protein Atlas revealed enhanced expression of RFC4 in glioma tissues. Moreover, in both the TCGA and ONCOMINE datasets, RFC4 expression was clearly associated with tumor grade in LGG patients. The OS of LGG patients was also shorter when RFC4 was upregulated. The correlation between RFC4 and immune cell infiltration was not as strong as for the other RFCs. PLK1 and ATR were predicted as the kinase targets of RFC4, and may induce the progression of LGG. The E2F family was also predicted to be correlated with RFC4. However, predictions from bioinformatic analysis must be verified by reliable experiments.
RFC5 is located on human chromosomal segment 12q24.2-q24.3 and plays a very important part in opening the PCNA clamp during cell proliferation (11,48). Its oncogenic function has been reported in other cancer types, such as oropharyngeal squamous cell carcinoma, hepatocellular carcinoma and lung cancer (25,60,61). Moreover, Yang et al. reported elevated expression of RFC5 in glioblastoma patient tissues. Other studies of RFC5 and LGG are rare. In our analysis of the ONCOMINE database and the Human Protein Atlas database, RFC5 was highly expressed in glioma and other brain and CNS cancers. In the ONCOMINE datasets, RFC5 was also related to brain tumor grade. The survival analysis conducted with GEPIA showed that enhanced transcription of RFC5 was associated with shorter OS and DFS in LGG patients. This suggests that RFC5 could be a significant biomarker in the diagnosis and prognosis of LGG patients. Like RFC1 and RFC2, RFC5 was strongly associated with the infiltration of different immune cells in LGG, which may modulate the immune environment of LGG. ATM and ATR were the predicted downstream kinase targets of RFC5, which requires further evidence.
Our research still has some limitations. Given the limits of bioinformatic analysis, larger sample sizes of LGG patients are needed to reconfirm our conclusions. Moreover, the correlations between RFCs and other molecules should be verified by more experimental evidence. Nevertheless, we believe that our study of RFCs and LGG can provide hints for future research.
In conclusion, our study conducted a comprehensive bioinformatic analysis of RFCs in LGG patients. The expression of RFCs was enhanced in brain and CNS cancers, and RFC1 and RFC3 were significantly elevated in LGG tissues. Moreover, the RFCs were also associated with tumor grade. All the RFCs showed clinical significance in the diagnosis and prognosis of LGG, which suggests that RFCs may be new biomarkers in LGG.
Availability of data and materials: The datasets generated and/or analysed during the current study are available in the public databases mentioned in Methods.

Abbreviations
Competing interests: The authors declare that they have no competing interests.

Figure 1 Transcription expression of RFCs in different types of cancers. The statistical method was Student's t-test. Cut-off of p value was 0.05, fold change was 1.5, gene rank was 10%, data type was mRNA.

The prognostic value of mRNA expression of RFCs in low-grade glioma. Higher levels of RFCs were significantly associated with shorter OS of low-grade glioma patients. Higher levels of RFC1/2/3/5 were significantly associated with shorter DFS of low-grade glioma patients. The p value was set at 0.05.