Integrative bioinformatics analysis for identication of hub genes and pathways responsible for early pathogenesis of retinoblastoma

Background Retinoblastoma (Rb) is the most common childhood malignancy in which intra-ocular tumors developed at a very young age. It is very crucial to detect Rb at an earlier stage and start appropriate therapy to prevent further metastasis. With the recent advancement of multi-omics analysis of microarray data and sophisticated bioinformatics tools, we could possibly identify potential early diagnostic biomarkers and novel therapeutic targets for retinoblastoma. Methods The PPIs co-expression network of DEGs were created by using STRING ( Search Tools For Retrieval of Interacting Genes/Proteins) database (https://string-db.org/). The database currently contains 24,584,628 proteins from 5090 organisms. The network with a condence score ≥ 0.7 was selected for PPI network construction, and the network was visualized and merged by using the Cytoscape (http://cytoscape.org/) software. In Cytoscape, each gene is represented as “node” and the strength of the co-expression relationship between genes is represented as “edge” and the number of interactions between the specic gene and other genes is represented as “degree”. Node with a degree > 10 was selected as hub nodes. There are several plugins in Cytoscape which are used to analyze network and nd the hub genes such as Network analyzer which is used for determining the topological parameters of the networks and cytoHubba plugins were used for identifying the hub genes and bottleneck genes. which ultimately improve the clinical outcomes. In this study, we identied hub and common candidate genes for retinoblastoma by integrative bioinformatics analysis these genes could be further served as potential therapeutic targets. We compared the gene expression proles of DNA methylation, miRNA, and mRNA for both healthy and tumor samples using individual microarray datasets available in NCBI-GEO. Overall 3 common key genes and 10 hub genes were identied for all the three datasets by comparing the DEGs for all the datasets taken into account.


Conclusion
This study suggested that these hub genes possibly will play vital roles in the onset and progression of retinoblastoma and could be serve as potential biomarkers to facilitate the diagnosis and treatment of this disease in the future. Background Retinoblastoma (Rb), a rare form of childhood malignancy, is the most common ocular malignancy found in children under age ve which causes a great impact on the overall life of the affected child. Both, heritable and sporadic forms of retinoblastoma were observed. Retinoblastoma usually initiates with the biallelic mutation in the retinoblastoma gene (RB1) leading to RB1 −/− , but this mutation alone does not result in retinoblastoma. The mutation initially forms benign retinoma and then further genetic or epigenetic modi cations are required for the development of retinoblastoma (Dimaras et al. 2015). Globally, the incidence of retinoblastoma is 1 per 16,000-18,000 live births with the prediction of about 8,000 cases per year. About 69% of all those cases are reported in middle-income countries, 20% in low-income, and 11% in high-income countries (Dimaras et al. 2015). There is a vast gap in the retinoblastoma treatment facilities among these countries. Consequently, the high-income countries have a patient survival rate of > 95% whereas it is only about 30% in low-income countries (Dimaras et al. 2015). The major reasons behind the poor outcome of retinoblastoma cases are the late diagnosis, recurrence, and metastasis, lack of availability of resources, refusal to perform enucleation.
The pRB protein (encoded by RB1 gene) is found to interact with atleast 110 cellular proteins (Morris and Dyson 2001) but the exact molecular mechanism by which it normally suppresses retinoblastoma is still elusive. Studies have shown that RB can act in E2F dependent as well as E2F independent manner (Dick et al. 2013). In E2F dependent manner, RB negatively regulates E2F transcription factor thus suppressing cell cycle progression, but RB closely related genes p107 and p130 also suppress E2F, though to a lower extent, and it not mutated in cancers thus contradicting the above perception to some extent and indicating the differences in their biological processes (Classon & Harlow, 2002;Dick et al., 2013). E2F independent action of RB is achieved through upregulation of p27 which results in suppression of cell cycle and apoptosis of cell, but the decrease in expression of p27 with an increase in expression of pRB during maturation of cone precursor cell, cut out any consideration of p27 mediated suppression of retinoblastoma (Dimaras et al. 2015). However, the pathway of p53 is directly linked with the RB pathway through the p14/Arf tumor suppressor gene (Mendoza and Grossniklaus 2015) and can be responsible for RB1 mediated retinoblastoma suppression. it is observed that large deletions including RB1 in children result in fewer tumors than that of common null mutation indicating the role of adjacent unknown deleted genes in survival of tumor cells (Dimaras et al. 2012 with each other to initiate and progress the disease. To understand the molecular mechanism of retinoblastoma and pRB action, it is essential to a nd out the key genes playing a critical role in retinoblastoma development, functional pathways and biological processes by systamitc analysis of multipal microarray data. The main purpose of this study is the identi cation of hub genes and molecular pathways involved in the onset and progression of retinoblastoma by employing potential integrative bioinformatics analysis.  Data Processing GEO2R online software (https://www.ncbi.nlm.nih.gov/geo/geo2r) is used to analyze the raw micro-array data for quality check and identify the differentially expressed genes (DEGs). A quality check of raw reads is necessary before going to the expression analysis and network construction. In GEO2R, the value distribution tab is used to assess the distributions of values across Samples (median-centered) which generally indicate that the data are normalized and each dataset was separately normalized by median normalization and the quality of the data accessed through boxplot. A brief outline of the work ow representing the datasets and methods used to conduct this study has been given in Fig. 1.

Identi cation Of Differentially Expressed Genes
GEO2R is a web tool that allows differentiation among groups of samples which are differentially expressed across experimental conditions. The genes which are differentially expressed between retinoblastoma samples and control samples were identi ed by the LIMMA package (Linear Models for Microarray data) using GEO2R. An adjusted pvalue (p.adj ) ≤ 0.05 and logarithmic fold change (log 2 FC) ≥ 2 were used as the threshold to nd DEGs and control the statistical signi cance of the expressed genes.

Pathway Enrichment Analysis And Gene Ontology Of DEGs
Gene ontology is used for an integrated research annotation and visualization which allows a complete set of annotation methods understanding the cellular component, molecular function, and biological process concept behind these genes. In this study PANTHER database (Protein Analysis Through Evolutionary Relationships) (http://www.pantherdb.org/tools/)was used to perform functional annotation and ShinyGO v0.61 (http://bioinformatics.sdstate.edu/go/) for KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis of differentially expressed genes.

PPI Network Construction And Hub Genes Identi cation
The PPIs co-expression network of DEGs were created by using STRING ( Search Tools For Retrieval of Interacting Genes/Proteins) database (https://string-db.org/). The database currently contains 24,584,628 proteins from 5090 organisms. The network with a con dence score ≥ 0.7 was selected for PPI network construction, and the network was visualized and merged by using the Cytoscape (http://cytoscape.org/) software. In Cytoscape, each gene is represented as "node" and the strength of the co-expression relationship between genes is represented as "edge" and the number of interactions between the speci c gene and other genes is represented as "degree". Node with a degree > 10 was selected as hub nodes. There are several plugins in Cytoscape which are used to analyze network and nd the hub genes such as Network analyzer which is used for determining the topological parameters of the networks and cytoHubba plugins were used for identifying the hub genes and bottleneck genes.

Mining And Analysis Of Module In PPIs Network
MCODE (Molecular Complex Detection) plugin in Cytoscape was used to extract and analyze modules/clusters from the merge network. The signi cant cluster with MCODE score ≥ 6 and node density cutoff = 1, node score cutoff = 0.2, k-core = 2 used as statistical signi cance.
Validation Of Core Genes By TCGA GEPIA (http://gepia.cancer-pku.cn/index.html) (Gene Expression Pro ling Interactive Analysis) is a web server for expression pro ling of the candidate genes in retinoblastoma and their respective control sample. A statistical signi cance threshold log 2 FC ≥ 2, P-value ≤ 0.05, and FDR < 1% is used for further validation. The survival analysis was executed to calculate the expression levels of hub genes and their survival time of patients with retinoblastoma. The hazard ratio and the P-value were calculated using log-rank tests. GEPIA provides tumor and normal differential expression analysis by using RNA sequencing expression data of 9,736 tumors and 8,587 normal samples from the TCGA.

Identi cation of differentially expressed genes (DEGs)
We identi ed DEGs between normal and tumor samples individually for each dataset by LIMMA package using GEO2R. Signi cantly differentially expressed genes were selected by qualifying the criteria P.value ≤ 0.05 and [log 2 fc] ≥ 2. We henceforth found a total of 267 Differentially Methylated Genes (DMGs) in GSE57362, 265 Differentially Expressed miRNAs (DE-miRNAs) targets in GSE7072 and 770 DEGs in GSE110811. The total numbers of upregulated and downregulated genes in each dataset are summarized in Table 1. All three datasets were normalized and shown in Fig. 2(A-C). Volcano plots are drawn for DEGs in each of the three datasets shown in Fig. 3(A-C). Our computational analysis unveiled that 3 common genes found in the upregulated-downregulated miRNA dataset, upregulateddownregulated mRNA dataset and upregulated-downregulated DNA methylation dataset. Although no common gene is present among the downregulated miRNA dataset, upregulated mRNA dataset and hypo-DNA methylated dataset.
We also drew Venn diagrams comprising genes of mRNA and miRNA (both upregulated and downregulated) datasets as well as mRNA and DNA methylation (both upregulated and downregulated) datasets but found no common genes in the overall aspect shown in Fig. 4(A-F).

Gene Ontology And Pathway Enrichment Analysis
The PANTHER and ShinyGO v0.61 were utilized to unwrap the biological activities and key regulatory pathways associated with the DEGs in DNA methylation (267 genes), miRNA (265 target genes) and mRNA (770 genes). Signi cantly enriched KEGG pathways and GO terms were identi ed based on P.value ≤ 0.05. The top 20 signi cant biological processes, molecular functions, and cellular components associated with the DEGs in the three datasets are shown in Fig. 5 (A-I). Table 2 shows hub genes involved in the top 17 signi cant molecular pathways in DNA methylation, miRNA and mRNA datasets of Retinoblastoma. The result showed that the p53 pathway, cell cycle and apoptosis pathway are the common pathways.

Co-expression Network Construction
A single gene does not always contribute to disease progression by simply altering its' expression level rather multiple genes interact with each other in a complex manner to cause the progression of any disease. Expression of a particular gene is the main criterion for the identi cation of DEGs and the interaction between multiple genes is not usually taken into consideration for that (Singh and Som 2019). The co-expression association between several genes is widely studied by constructing gene co-expression networks. We have found that the DNA methylation coexpression network has 63 nodes and 83 interactions, the miRNA co-expression network has 129 nodes and 424 interactions and mRNA co-expression network has 243 nodes and 471 interactions as described in Table 3 and shown in Fig. 6 (A-C). After constructing PPI networks on Cytoscape, we found 5 hub genes and 3 bottleneck genes in each of the three datasets as shown in Table 4. All three datasets have some correlated gene pairs that were represented by an intersection network formed by combining all the three individual networks (Fig. 7). From this study, we further stated that individual genes were differentially expressed rather than being co-expressed. Thus we found that the progression of retinoblastoma might be caused by two factors 1) coordinated expression of the genes and 2) the interactions among the genes.
After that, we used the MCODE plugin in Cytoscape which is used as MCODE algorithm to mine and analyze modules/clusters from the co-expression network. Firstly we set the parameter of MCODE plugin such as node density cut off 1, node score cut off to 0.2, k-core 2 and the maximum depth to 100. Three modules/clusters were identi ed and shown in Fig. 8. Further analysis of module we found that 9 hub genes in cluster 1 (MCODE Score-10.607) (CDK1, CDK2, CCNB1, CUL1, RB1, TP53, HDAC1, JUN, MAPK1) and 2 common genes (E2F3, ESR1). No hub genes were present in clusters 2 and 3. A total of 6 bottleneck genes (genes with high betweenness centrality) were also found in the co-expression network (shown in Table 5). The expression level of the 12 critical genes obtained from DNA methylation, miRNA, and mRNA datasets of both normal and tumor samples is checked by using the GEPIA tool. After performing the whole analysis, we observed that there is signi cant differential expression of these genes in malignant conditions in comparison to normal conditions by considering the level of signi cance of P-value ≤ 0.05, log 2 FC > 2 and FDR < 1% as shown in Fig. 9. We also performed stage and grade boxplots to validate the 10 hub genes and 3 common genes based on the three datasets (DNA methylation-GSE57362, miRNA-GSE7072, & mRNA-GSE110811) [ Fig. 10 (A-M)].
Survival Analysis: To validate the hub genes of Retinoblastoma, a total of 512 cases of LGG (Lower Grade Glioma) patients, 166 cases of GBM (Glioblastoma Multiforme) patients, and 1092 cases of BRCA patients were registered in this learning as unavailability of Rb samples. The patients were grouped into two categories: high expression groups and low expression groups on the basis of gene expression pro le, and expression levels higher than the median. Overall survival analysis was determined on the basis of gene expression. Later on, survival analysis was executed to ascertain the correlation between gene expression level and the overall survival of the patient. We further analyzed the data to validate our ndings in the GEO group. Kaplan-Meier and long-rank test analysis revealed that six hub genes were signi cant in the survival analysis from 10 hub genes and 3 common genes, which were linked with patient prognosis (Fig.). To summarize, high expression of CCNB1, CDK1, TP53, RB1, JUN and low expression of HDAC1 was shown to predict a worse prognosis in patients with retinoblastoma. Survival analysis graphs for each gene were shown in Fig. 11 (A-M).

Discussion
Retinoblastoma (Rb) is the most common ocular pediatric malignancy caused by the inactivation of the RB genes. The early diagnosis of Rb will provide the possibility to control the primary tumor with effective therapies which will ultimately improve the clinical outcomes. In this study, we identi ed hub and common candidate genes for retinoblastoma by integrative bioinformatics analysis and these genes could be further served as potential therapeutic targets. We compared the gene expression pro les of DNA methylation, miRNA, and mRNA for both healthy and tumor samples using individual microarray datasets available in NCBI-GEO. Overall 3 common key genes and 10 hub genes were identi ed for all the three datasets by comparing the DEGs for all the datasets taken into account.
In the present study, a total no. of 10 hub genes was found after merging the co-expression networks. In this intersection network, connections were established among these 10 hub genes shown to be over/under-expressed.
This represented a conserved co-expression association that was established across the three datasets. The 10 hub genes are CDK1, CDK2, CCNB1, CUL1, RB1, TP53, HDAC1, JUN, PIK3CA and MAPK1 (shown in Table 5). Further, we have also identi ed 3 potential common genes (E2F3, ESR1 and UNC5D) (also shown in Table 5). We have further validated these key genes involved in retinoblastoma by GEPIA tool and survival analysis and found six potential candidate genes(CCNB1, CDK1, TP53, RB1, JUN and HDAC1).
The previous studies were found that those 10 real hub genes were involved in the process of cell proliferation and tumorigenesis. CDK1 gene encodes a protein that belongs to the Ser/Thr protein kinase family and serves as a catalytic subunit of M-phase promoting factor (MPF) which plays a major role in both G1/S and G2/M phase transitions of the eukaryotic cell cycle. CDK1 was identi ed as a hub gene responsible for the prognosis and progression of bladder cancer (Yan et al. 2019). The protein encoded by the CDK2 gene is also a member of a serine/threonine-protein kinase family that participates in the regulation of cell cycle and activity of this protein is critically essential during the G1 to S phase transition of the eukaryotic cell cycle (Stelzer et al. 2016). The cyclin E-Cdk2 may have the ability to block the function of the C-terminal region of pRb expressed in trans through a mechanism different from that of cyclin D-Cdk4/6 which does not depend on C-terminal phosphorylation of pRb (Harbour et al. 1999).
CCNB1 encodes a protein that is required for the proper regulation of the G2/M transition phase of the cell cycle (Stelzer et al. 2016). This protein forms complexes with p34 (cdc2) to produce the MPF. The CCNB1 gene was shown to be upregulated in gastric cancer (GC) ( In this study we are reporting three common genes were found among DNA methylation, miRNA and mRNA dataset. E2F3 gene encodes a transcription factor speci cally binding to RB1 and it occurs in a cell-cycle dependent manner. Progression of cell-cycle from G1 to S phase is controlled by the proper functioning of the DRTF1/E2F complex (Stelzer et al. 2016). According to our study, E2F3 might be playing a crucial role in the progression of retinoblastoma and can be used as a potential therapeutic target. E2F3 along with KIF14 may represent important pro-oncogenes that have been seen to be dysregulated in most Rb tumors (Madhavan et al. 2009). Another study revealed that UNC5D in co-operation with E2F3 and p53 is involved in the apoptosis caused due to de ciency in Nerve Growth Factor (NGF) in netrin-1 de cient human neuroblastoma cells (Zhu et al. 2013).
The most of the available studies were reports conculation of either methylation or gene expression or miRNA microarray data. Therefore we attempted to report the more accurate and reliable retinoblastoma candidate genes by complying multiple datasets. The existing evidence strongly supports that the identi ed candidate and hub genes in this study were involved in numerous ways to the onset and progression of various cancers. Although, there are some limitations in this study that we conducted so far. Considering the data, we could conclude that all these 10 hub genes were signi cantly differentially expressed in retinoblastoma but we further need to conduct more molecular biological experiments using preclinical models to validate the roles of these genes in the onset and progression of retinoblastoma.

Conclusion
This integrative bioinformatics approach has made it possible for us to identify 10 hub genes in three GEO datasets i.e. DNA methylation, miRNA, and mRNA whose co-expression might cause tumor formation and metastasis in children affected with Retinoblastoma. The biological functions and molecular pathways of these 10 predicted hub genes (CDK1, CDK2, CCNB1, CUL1, RB1, TP53, HDAC1, PIK3CA, JUN, and MAPK1) which have been shown to be differentially expressed in Rb and detailed and further studies need to be conducted to validate their roles in Rb progression as well as diagnosis. These genes are known to regulate the functions of several other downstream genes and in the near future exploring these elements could help the researchers to interpret the possible reasons for dysregulation of these hub genes in this cancer. These genes could also be served as potential therapeutic markers to develop promising medications to treat retinoblastoma in the forthcoming years. Declarations