DEGs and Biological Process Profiling to Screen Novel Biomarkers Associated with Both Uterine Leiomyomas and Uterine Leiomyosarcomas


Background

Uterine leiomyomas (ULM), or uterine fibroids, are benign lesions of unspecified aetiology, and there is still a dearth of biomarkers for their diagnosis and prognosis. The aim of the present study is to explore novel biomarkers associated with uterine leiomyomas (ULM) and uterine leiomyosarcomas (ULMS) that may be responsible for their pathogenicity.
Methods

The microarray dataset GSE64763 was retrieved from the Gene Expression Omnibus (GEO) database. Data preprocessing and differential gene expression analysis were performed. Principal Component Analysis (PCA) plots and heat maps were constructed for the differentially expressed genes (DEGs) of ULM and ULMS. The DEGs were then intersected to find the DEGs common to ULM and ULMS. A protein-protein interaction network was constructed with STRING v10.5. Gene Ontology (GO) and KEGG pathway enrichment analyses were also performed to dissect the possible functions and pathways.
Results

A total of 50 significant DEGs were identified for ULM and 321 for ULMS, each with an official gene symbol. In total, 14 DEGs were common to ULM and ULMS, of which 8 were up-regulated and 6 were down-regulated. Comparison of the DEG list with annotated gene lists obtained from OMIM and GeneCards identified only 3 known disease genes (RAD51B, ESR1 and PDGFRA), while SHOX2, TNN and COL11A1 were found to be novel biomarkers in both ULM and ULMS. GO and KEGG pathway enrichment analysis of the common novel and known candidate genes identified several important processes and pathways, such as ECM-receptor interaction and focal adhesion.
Conclusions

SHOX2, TNN and COL11A1 are novel biomarkers related to both ULM and ULMS, are associated with pathways such as ECM-receptor interaction and focal adhesion, and hence can serve as novel diagnostic as well as therapeutic targets.


Background
The uterus, derived from the paramesonephric ducts, is an essential supportive organ for prenatal growth and development in eutherians. Histologically, the uterus has an inner mucosal layer (endometrium) and an outer muscular layer (myometrium) [1]. The myometrium is composed of highly vascularized smooth muscle cells that generate contractions during childbirth [2]. Uterine leiomyomas (ULM), or uterine fibroids, are benign lesions of unspecified aetiology that mainly arise from the myometrium [3]. These lesions, composed of smooth muscle and extracellular matrix, are commonly found in the pelvic area of women of reproductive age [4]. Uterine leiomyosarcoma (ULMS) is a rare malignant tumor known for hematogenous spread, leading to recurrence both locally and at distant sites [5].
According to NIH India and several case studies, about 25% of Indian women suffer from ULM [6]. Nearly 25% of cases present with symptoms such as excessive bleeding, pelvic pain, pregnancy-related complications and menstrual cramps [7].
For ULMS, 25-36% of cases were reported in women aged around 50-60 years [8]. Various predisposing factors are associated with leiomyoma occurrence, including obesity, stress, smoking, age, race (with high prevalence among African-American women) and hormonal (e.g., estrogen) imbalance. Some genetic factors have also been found to be associated with these diseases [9].
To date, these tumors have been found to be resistant to various chemotherapeutic agents, and adjuvant therapy still does not hold a promising role in their treatment [10]. To understand the pathobiology of these lesions, they have been studied in comparison with normal myometrium. Moreover, very few genes have been found to be associated with the pathogenicity of uterine fibroids and ULMS. Therefore, to search for more candidate genes and to disclose their mechanisms at the molecular level, an in silico approach using different bioinformatics tools was applied. The purpose of the present study is to screen potential biomarkers of ULM and ULMS. The study analyses the gene expression dataset GSE64763 to identify differentially expressed genes (DEGs). Protein-protein interaction (PPI) networks were constructed based on the combined score, and enrichment and functional analyses of the DEGs were also performed.

Methods
Retrieval of microarray gene expression profile.
The raw gene expression profile GSE64763 [12] was retrieved from the NCBI Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) [11]. The dataset contains ULMS, ULM and normal myometrium (NM) tissue specimens. RNA was hybridized to the Affymetrix Human Genome U133A 2.0 Array (HG-U133A_2) on platform GPL571. Different bioinformatics tools were used to study differential gene expression in the ULMS, NM and ULM samples.
Preprocessing of the dataset and screening of DEGs.
The retrieved raw datasets were preprocessed. Expression values of probes mapping to the same gene were averaged, and up-regulated and down-regulated genes were then selected using BiGGEsTs software [13]. The GEO2R tool (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to convert probe-level identifiers into gene symbols. DEGs were selected with an adjusted p-value < 0.05 and a logFC threshold > 0.1 for up-regulated and < -0.1 for down-regulated genes.
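The selection step above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the BiGGEsTs or GEO2R implementation: GEO2R itself computes the statistics with limma, and, as we understand its documentation, its default p-value adjustment is Benjamini & Hochberg, which is reproduced here for illustration. The function names `collapse_probes`, `benjamini_hochberg` and `classify_degs` are hypothetical, and the thresholds are the ones stated in the text.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_rows):
    """Average the expression values of all probes mapped to the same gene.

    probe_rows: iterable of (gene_symbol, [expression value per sample]).
    Returns {gene_symbol: [mean expression per sample]}.
    """
    by_gene = defaultdict(list)
    for gene, values in probe_rows:
        by_gene[gene].append(values)
    # zip(*rows) groups the values of every probe for one sample together.
    return {gene: [mean(sample) for sample in zip(*rows)]
            for gene, rows in by_gene.items()}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_degs(genes, log_fc, adj_p, alpha=0.05, fc_cut=0.1):
    """Split genes into up- and down-regulated lists (adj. p < alpha, |logFC| > fc_cut)."""
    up, down = [], []
    for gene, fc, p in zip(genes, log_fc, adj_p):
        if p >= alpha:
            continue  # not significant after adjustment
        if fc > fc_cut:
            up.append(gene)
        elif fc < -fc_cut:
            down.append(gene)
    return up, down
```

Averaging probes before testing is one of several possible collapsing rules (others keep the probe with the highest mean or variance); the text specifies averaging, so that is what the sketch does.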

Generation of principal component analysis and heatmap plots
Heatmaps and principal component analysis (PCA) plots of the DEGs were generated with the online tool ClustVis [14]. Because this tool supports input files of at most 2 MB, a PCA plot could not be generated for the complete gene expression dataset.

PPI network and subnetwork construction
The online tool STRING v10.5 [15] (https://www.stringdb.org/) was used to predict functional interactions among proteins. This tool provides combined scores for protein-protein interactions between gene pairs. In the present study, the identified DEGs were uploaded to this database, and a combined score > 0.4 was set as the threshold for analysis. Cytoscape v3.2.1 (http://www.cytoscape.org) [16], an in silico software package, was then used to construct the networks and subnetworks. Degree and edge betweenness criteria were employed in constructing the networks.
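The score filter and the degree criterion described above amount to a simple edge filter followed by counting. The sketch below is illustrative only: it assumes interactions have been exported as (protein A, protein B, combined score) triples on the 0-1 scale used in the text (raw STRING download files may use a 0-1000 scale, in which case divide by 1000), and the function names and protein identifiers are hypothetical. Edge betweenness, also mentioned in the text, is left to Cytoscape and not reproduced here.

```python
from collections import Counter


def filter_interactions(edges, min_score=0.4):
    """Keep protein pairs whose combined score is above min_score (0-1 scale).

    Assumes each undirected pair appears only once in `edges`.
    """
    return [(a, b, s) for a, b, s in edges if s > min_score]


def node_degrees(edges):
    """Degree of each protein = number of retained interactions it takes part in."""
    degree = Counter()
    for a, b, _score in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def top_hubs(edges, n=5):
    """Proteins with the highest degree, a common way to rank hub genes."""
    return [gene for gene, _ in node_degrees(edges).most_common(n)]
```

For example, `top_hubs(filter_interactions(edges), n=10)` would list the ten best-connected proteins among interactions with a combined score above 0.4.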
DEGs functional analysis
DAVID (Database for Annotation, Visualization and Integrated Discovery) [17] (https://david.abcc.ncifcrf.gov) integrates an extensive set of functional annotations for large gene lists. Gene Ontology (GO) enrichment analysis, covering molecular function (MF), cellular component (CC) and biological process (BP), was performed using DAVID v6.8 and STRING v10.5. DAVID identifies groups of genes with similar or closely associated functions that are over-represented in the input list, assessing significance with a test based on the hypergeometric distribution.
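The hypergeometric test behind this kind of over-representation analysis can be written directly. The sketch below computes the plain one-sided hypergeometric tail (equivalent to a one-sided Fisher's exact test) and a fold-enrichment ratio; it is an illustration under assumptions, not DAVID's code. DAVID by default reports the EASE score, a more conservative modification of Fisher's exact test, which is not reproduced here. The function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (population)
    K: background genes annotated to the GO term or pathway
    n: genes in the submitted DEG list
    k: DEGs annotated to the term
    """
    total = comb(N, n)
    # Sum the probabilities of observing k, k + 1, ... annotated genes in the list.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def fold_enrichment(k, n, K, N):
    """Ratio of the term's frequency in the DEG list to its frequency in the background."""
    return (k / n) / (K / N)
```

A term is then called enriched when its p-value falls below the chosen cut-off (0.05 in this study); with many GO terms tested, a multiple-testing correction is usually applied on top.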

Results
Selection of DEGs for ULM and ULMS
Microarray data of the ULM, ULMS and control specimens were normalized using GEO2R (Fig. 1). A total of 50 significant DEGs were identified for ULM and 321 for ULMS, each with an official gene symbol. In ULM, 29 DEGs were up-regulated and 21 down-regulated, while in ULMS, 154 were up-regulated and 167 down-regulated (Fig. 2A and 2C) (Supplementary file 1). Among all DEGs, 8 up-regulated and 6 down-regulated DEGs were common to ULM and ULMS (Fig. 2C). A p-value < 0.05 and |log2FC| > 0.1 were used as selection criteria, and DEGs were selected on the basis of average gene expression values. Further, 2 DEGs in ULMS and 1 DEG in ULM were also found in OMIM and GeneCards.
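Finding the common DEGs is a set intersection performed separately for each direction of change. The sketch below assumes that "common" means regulated in the same direction in both diseases, and it additionally reports genes that are DEGs in both but change in opposite directions, which a plain intersection of the full lists would hide. The function name and the gene identifiers in any example are hypothetical placeholders, not data from this study.

```python
def compare_deg_lists(ulm_up, ulm_down, ulms_up, ulms_down):
    """Intersect ULM and ULMS DEG lists, keeping the direction of change."""
    ulm_up, ulm_down = set(ulm_up), set(ulm_down)
    ulms_up, ulms_down = set(ulms_up), set(ulms_down)
    return {
        "common_up": sorted(ulm_up & ulms_up),
        "common_down": sorted(ulm_down & ulms_down),
        # DEGs in both diseases but changing in opposite directions.
        "discordant": sorted((ulm_up & ulms_down) | (ulm_down & ulms_up)),
    }
```

The same set operations, applied to the DEG list and the OMIM and GeneCards gene lists, give the overlap with known disease genes.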

Principal component and hierarchical clustering analysis of DEGs
Principal component analysis of ULM and ULMS yielded scatter plots in which principal component 1 (x-axis) explains 50.6% and 44.9% of the total variance, and principal component 2 (y-axis) explains 7.3% and 7.4%, respectively (Fig. 3A and 3B). A heat map displays a data matrix in which coloring gives an overview of numeric differences. Separate heat maps were constructed for the DEGs of ULM and ULMS (Fig. 4A and 4B).
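The percentages on the PCA axes are explained-variance ratios. Assuming a samples-by-genes expression matrix that has been centred (ClustVis may also scale each gene; the exact preprocessing used here is not stated, so this is an assumption), with covariance matrix $\Sigma$ and eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$, the share of total variance captured by component $k$ is:

```latex
\mathrm{EV}_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j} \times 100\%,
\qquad
\sum_{j=1}^{p} \lambda_j = \operatorname{tr}(\Sigma)
```

For the ULM plot, $\mathrm{EV}_1 = 50.6\%$ and $\mathrm{EV}_2 = 7.3\%$, so the first two components together account for 57.9% of the total variance; for ULMS they account for $44.9\% + 7.4\% = 52.3\%$.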

The Protein-Protein Interaction Network
For the protein-protein interaction network, all DEGs with a combined score > 0.4 (283 gene pairs out of 371 DEGs) were used, yielding one main network with 266 nodes and 883 edges (Fig. 5). A separate network of DEGs with a combined score > 0.9 was also extracted; it includes 110 DEGs (red nodes: up-regulated; blue nodes: down-regulated) (Fig. 6).
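Extracting the main network and the high-confidence subnetwork, and counting their nodes and edges, can be sketched with a breadth-first search over an adjacency map. This is a minimal illustration assuming scored (protein A, protein B, score) triples; in the study these steps were done in STRING and Cytoscape, and the function names and identifiers below are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges, min_score):
    """Undirected adjacency sets for interactions scoring above min_score."""
    adj = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Breadth-first search; components are returned largest first."""
    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def network_size(adj, nodes):
    """Number of nodes and undirected edges within a node set."""
    nodes = set(nodes)
    edge_ends = sum(1 for n in nodes for nb in adj[n] if nb in nodes)
    return len(nodes), edge_ends // 2  # each edge is seen from both ends
```

Calling `build_graph(edges, 0.4)` and `build_graph(edges, 0.9)` on the same interaction list gives the two networks; the first element of `connected_components` is the main network.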

Known disease genes and candidate genes for ULM and ULMS
Comparison of the DEGs of ULM and ULMS revealed 14 common DEGs, of which 8 were up-regulated and 6 down-regulated. However, only 10 of these common DEGs had a combined score > 0.4 and were therefore included in the interaction network (Fig. 5). We were further interested in the genes that have already been validated. For this, we compared our DEG list with annotated gene lists obtained from OMIM and GeneCards (Fig. 7A), which identified only 3 known disease genes (2 for ULMS and 1 for ULM; represented as red and blue triangles in the network, Fig. 6). Direct neighbours of these three known genes were considered candidate genes related to ULM and ULMS, yielding a total of 4 candidate genes (Fig. 6). All the common and known candidate genes related to both ULM and ULMS (
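The known-gene and candidate-gene selection described above can be sketched as a set intersection followed by a one-step neighbour lookup in the interaction network. This is a minimal sketch: it assumes the network is available as a mapping from each gene to the set of its interaction partners, and the function name and gene identifiers are hypothetical placeholders.

```python
def known_and_candidate_genes(degs, annotated, adj):
    """Known genes: DEGs that also appear in the annotated (OMIM/GeneCards) list.

    Candidates: direct network neighbours of known genes, excluding the known genes.
    adj maps each gene to the set of genes it interacts with.
    """
    known = set(degs) & set(annotated)
    candidates = set()
    for gene in known:
        candidates |= set(adj.get(gene, ()))
    return sorted(known), sorted(candidates - known)
```

Because every node in the interaction network is itself a DEG, the candidates returned this way are also DEGs.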

Functional enrichment analysis
Gene ontology enrichment analysis was performed for the DEGs of ULM and ULMS, and significantly enriched functions, processes and cellular components (p-value < 0.05) are listed in Table 1 (ULM DEGs) and Table 2 (ULMS DEGs). The major significant processes (p-value < 0.05) enriched for ULM were regulation of cell death, regulation of apoptosis, cell-cell adhesion and cell morphogenesis (Fig. 8A), while for ULMS they were extracellular matrix organization, response to steroid hormone stimulus, regulation of cell proliferation, blood vessel morphogenesis, cell motility and cell cycle phase (Fig. 8B).

Co-enrichment analysis of the common and known candidate genes related to both ULM and ULMS identified several important processes, and a separate biological process network was created for these genes (Fig. 9). Up-regulated genes such as KIF5C, ZNF365, EPYC, COL11A1, SHOX2, MMP13, TNN, RNF128 and RAD51B were found to be involved in regulation of cell proliferation, cell adhesion and response to estrogen stimulus (Fig. 8A). The major processes regulated by the down-regulated genes (GATA2, GPM6A, ESR1 and PDGFRA) were regulation of transcription, cell morphogenesis, cell differentiation, cell projection and extracellular matrix organization (Fig. 8B) (Fig. 10).

Discussion
Uterine leiomyoma, or uterine fibroid (ULM), is a benign lesion that commonly arises in the muscular layer of the uterine wall [18]. Uterine leiomyosarcoma (ULMS) is a smooth muscle malignancy that arises in the smooth muscle of the uterus [19]. Approximately one to five of every 1000 women with fibroids were also found to have ULMS. The prevalence of these tumors is increasing, and to date no effective treatments have been found [20].
The ZNF365 gene has been found to be involved in maintaining a stable genome and repairing damaged DNA. ZNF365 also promotes the recovery of stalled replication forks to provide genomic stability, as observed in both hereditary and sporadic cancer types [21]. Variations in the ZNF365 gene may increase the risk of breast cancer by affecting the proportion of dense tissue in the breast [22]. According to Y. Zhang et al., loss of ZNF365 delays mitotic progression and causes mitotic exit under replication stress, which increases aneuploidy, centrosome reduplication and disruption of cytokinesis [21]. However, the mechanism of this gene in ULM and ULMS has not yet been identified; since ZNF365 is closely related to DNA repair and genome stability, we speculate that it could be a potential target for ULM and ULMS treatment.
The KIF5C gene encodes a motor protein of the kinesin superfamily involved in eukaryotic cell motility [23]. Tsibris et al. (2003) found KIF5C to be one of the up-regulated genes in uterine fibroids [24]. The SHOX2 gene regulates transcription, and its DNA methylation has been found to be a biomarker of lung cancer [29]. In breast cancer, S. Hong et al. showed that SHOX2 overexpression induces EMT [30]. B. Schmidt et al. also identified SHOX2 DNA methylation as a biomarker for lung cancer [31]. Fubiao Ye et al. reported roles of SHOX2 in cell apoptosis, cell proliferation and extracellular matrix formation in nucleus pulposus cells [32]. However, a role of SHOX2 in ULMS and ULM has not been reported so far. Given its involvement in processes such as cell apoptosis, cell proliferation and extracellular matrix formation in various other conditions, including carcinomas, SHOX2 may provide a biomarker for the treatment of both ULMS and ULM.
The TNN gene encodes a protein involved in cell migration [33]. In tumors, it stimulates angiogenesis of endothelial cells, and it has also been identified as one of the biomarkers for breast cancer [34]. According to Leif E. Peterson et al., TNN is involved in cell-matrix adhesion in lung adenocarcinoma [35]. Baolin Liu et al. found that cancer genes such as TNN are involved in pathways such as extracellular matrix interactions [36]. However, TNN has not been identified in ULM and ULMS; since it contributes to cell-matrix adhesion, it may serve as a promising target in both diseases.
MMP13 encodes a protein produced by stromal fibroblasts that degrades various ECM components and induces angiogenesis by increasing VEGF and VEGFR2 protein levels [37]. Sunil K. Halder et al. detected high expression of MMP13 in uterine leiomyoma pathogenesis [38], although it has not been reported in ULMS. According to Guillaume E. Courtoy et al., MMP13-encoded proteins are involved in apoptosis and cell proliferation in myoma [39]. This may provide a potential lead for the treatment of ULMS as well.
The GPM6A gene encodes a protein involved in neuronal differentiation and development that helps neuronal stem cells migrate [40]. GPM6A was found to be a novel target gene involved in proliferation and in promoting tumor survival and development in thyroid carcinomas [41]. These features suggest GPM6A as a novel candidate for both ULM and ULMS.
According to Xuhui Liu et al., COL11A1 was identified as a marker for uterine fibroids through gene expression analysis. The protein encoded by COL11A1 participates in focal adhesion and ECM-receptor interaction, which suggests its involvement as a biomarker in leiomyoma [42]. However, it has not been found to be involved in leiomyosarcoma. Nevertheless, the roles of this gene in focal adhesion and ECM-receptor interaction may help to establish it as a potential marker for ULMS.
RNF128 (also called GRAIL) is an E3 ubiquitin ligase that plays a vital role in cytokine production [43]. Yi-Ying Lee et al. suggested that RNF128 downregulation is involved in urothelial cancer [44]. Miika Mehine et al., through integrated analysis of ULM datasets, identified RNF128 as one of the markers for ULM [45]. Although no study has yet revealed its connection with ULMS, RNF128 is a p53-interacting glycoprotein that becomes crucial for p53-induced apoptosis under stress conditions, which may help to establish it as a key biomarker for ULMS.
GATA2 has been found to be involved in cell proliferation and cell cycle regulation [57]. Its overexpression may lead to uterine leiomyoma. Juhasz-Böss also reviewed the correlation of PDGFRA with ULMS. Thus, PDGFRA might provide a biomarker for the treatment of both ULM and ULMS.
In this microarray analysis, NM, ULM and ULMS tissues were used, which provides an integrated approach to studying the synergistic effect of differential gene expression on several biological processes and pathways and to revealing their mechanisms at the molecular level.

Conclusion
We conclude that RAD51B, ESR1 and PDGFRA are previously reported biomarkers common to ULM and ULMS; our study shows that they participate in several pathways and are associated with pathways such as ECM-receptor interaction and focal adhesion. Additionally, SHOX2, TNN and COL11A1 might be novel biomarkers related to both ULM and ULMS, and our findings indicate that they are also associated with ECM-receptor interaction and focal adhesion pathways.

Figure 1
The lines in the boxes are coincident, indicating that the chips have been well normalized.

Figure 4
Heat maps of DEGs between A. uterine leiomyomas (ULM) and normal myometrium (NM) and B. uterine leiomyosarcomas (ULMS) and normal myometrium (NM). The blue-to-orange gradation represents gene expression values from low to high. The heat maps were drawn with the ClustVis tool.

Figure 5
Venn diagram showing the common DEGs and the known disease genes of uterine fibroids. The red rectangle highlights the genes common to ULM and ULMS as well as UF-related candidate genes. The Venn diagram was drawn with Venny v2.1.0.