Comprehensive analysis of pseudogene LDHAP5 expression level and its potential pathogenesis in ovarian serous cystadenocarcinoma

Background: We aimed to identify differentially expressed pseudogenes and explore their potential functions in four common gynecologic malignancies (cervical squamous cell carcinoma, ovarian serous cystadenocarcinoma, uterine corpus endometrial carcinoma and uterine carcinosarcoma) using bioinformatic methods. Materials & methods: We identified up-regulated and down-regulated pseudogenes and built a pseudogene-miRNA-mRNA regulatory network from public datasets to explore their potential functions in carcinogenesis and cancer prognosis. Results: LDHAP5 was selected as the most promising candidate among 63 up-regulated pseudogenes because it was significantly associated with poor overall survival in ovarian serous cystadenocarcinoma. KEGG pathway analysis revealed that the predicted targets of LDHAP5 were most enriched in microRNAs in cancer, pathways in cancer and the PI3K-AKT signaling pathway. Further analysis revealed that EGFR was the potential target mRNA of LDHAP5 and may play an important role in ovarian serous cystadenocarcinoma. Conclusion: LDHAP5 was found for the first time to be associated with the occurrence and prognosis of ovarian serous cystadenocarcinoma, and it may serve as a novel, specific therapeutic target against ovarian serous cystadenocarcinoma.


Background
Gynecological malignancies account for a large share of tumors in women and seriously endanger women's health. An estimated 13,800 new cases of uterine cervical cancer, 65,620 cases of uterine corpus cancer, and 21,750 cases of ovarian cancer were projected for the United States in 2020, with 4,290, 12,590 and 13,940 cancer deaths, respectively.(1) Advanced gynecological malignancies have a poor prognosis because effective treatments to control distant metastasis are lacking.(2) Moreover, most current clinical drugs are non-specific, and their therapeutic effects are limited.(3) Therefore, novel biomarkers for gynecological tumors are urgently needed to improve drug efficacy and prolong survival.
The pseudogene was first discovered and named by Jacq et al. in 1977.(4) Pseudogenes usually originate from paralogous functional genes ("parent genes"), but they have lost the capacity to encode functional proteins through accumulated mutations (frameshift mutations, premature or delayed stop codons, etc.). (5) Pseudogenes received little attention until PTEN pseudogene 1 (PTENP1) was found to share the same microRNA response elements (MREs) with its homologous functional parent gene, PTEN. (6) With the advancement of next-generation sequencing (NGS), approximately 20,000 pseudogenes have been discovered in the human genome, and the role of pseudogenes as long non-coding RNAs (lncRNAs) in the development of disease has been further revealed. (7)(8)(9) Current research shows that pseudogenes mainly regulate gene expression at the post-transcriptional level through two pathways.(10) In the first, pseudogenes act as competitive endogenous RNAs (ceRNAs) that compete with coding genes for miRNA binding, thereby positively regulating gene expression. (11)(12)(13) For example, PTENP1 can competitively bind miRNA-17, miRNA-21, miRNA-19 and other miRNAs through the ceRNA mechanism, thereby preventing its parent gene PTEN from being degraded by miRNAs and increasing the expression of PTEN.(6) In the second pathway, pseudogenes play a negative role: they compete with their parent genes for destabilizing RNA-binding proteins (RBPs), resulting in decreased expression of the parent genes. (14) In our study, we attempted to identify differentially expressed pseudogenes in four gynecological malignancies through the pseudogene database dreamBase, and then used a pseudogene-miRNA-mRNA regulatory network to further explore their potential function and mechanism in gynecological malignancies.

Prognostic analysis of upregulated pseudogenes.
Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn/) was used to evaluate the prognostic value (overall survival) of upregulated pseudogenes in 32 types of common human cancers. (16) The group thresholds were as follows: the group cut-off was 'Median', the 'cutoff-high' and 'cutoff-low' were 50%, axis units were 'Months', and a p value < 0.05 was considered statistically significant.
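The median-split survival comparison described above can be illustrated with a minimal sketch. It assumes per-sample expression values, follow-up times in months and event indicators are already available as plain lists; GEPIA performs this analysis server-side, so the function names and the hand-written log-rank test below are illustrative assumptions, not GEPIA's implementation.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square, p value), 1 degree of freedom.

    Assumes both groups are at risk at some event time, so the variance is non-zero.
    """
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == "high")
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(
            1 for u, e, g in zip(times, events, groups)
            if u == t and e and g == "high"
        )
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical, not taken from the study).
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
months = [30, 40, 50, 5, 10, 15]
died = [1, 1, 1, 1, 1, 1]
groups = split_by_median(expression)
chi2, p_value = logrank_test(months, died, groups)
```

A p value below 0.05 from `logrank_test` would correspond to the significance threshold stated above.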

Screening for pseudogene-regulated miRNAs and miRNA-target mRNAs.
The public online datasets starBase v2.0 and miRTarBase were used to identify pseudogene-binding miRNAs and miRNA-target mRNAs, respectively. (17,18) The pseudogene-miRNA-mRNA network was constructed using Cytoscape v_3.
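As a minimal sketch of how the two lookups could be chained into a three-layer network, the example below joins pseudogene-miRNA pairs with miRNA-mRNA pairs and writes the edges in Cytoscape's simple interaction format (SIF). The input pairs, the placeholder names and the relationship labels "binds" and "targets" are hypothetical; exporting real pairs from starBase and miRTarBase is assumed to have been done beforehand.

```python
from collections import defaultdict


def build_cerna_network(pseudogene_mirna, mirna_mrna):
    """Chain pseudogene->miRNA and miRNA->mRNA pairs into three-node paths."""
    targets = defaultdict(set)
    for mirna, mrna in mirna_mrna:
        targets[mirna].add(mrna)
    paths = []
    for pseudogene, mirna in pseudogene_mirna:
        # miRNAs without any listed target contribute no path.
        for mrna in sorted(targets.get(mirna, ())):
            paths.append((pseudogene, mirna, mrna))
    return paths


def to_sif(paths):
    """Return unique tab-separated 'source, relationship, target' lines."""
    edges = set()
    for pseudogene, mirna, mrna in paths:
        edges.add((pseudogene, "binds", mirna))
        edges.add((mirna, "targets", mrna))
    return ["\t".join(edge) for edge in sorted(edges)]


# Placeholder identifiers only; these are not results from the study.
pseudogene_mirna = [("PSEUDO1", "miR-A"), ("PSEUDO1", "miR-B")]
mirna_mrna = [("miR-A", "GENE2"), ("miR-A", "GENE1"), ("miR-C", "GENE3")]
paths = build_cerna_network(pseudogene_mirna, mirna_mrna)
sif_lines = to_sif(paths)
```

The resulting lines could be saved to a `.sif` file and imported into Cytoscape for visualization.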

KEGG pathways and Gene Ontology (GO) enrichment analysis of target mRNAs.
The list of miRNA-target genes was imported into STRING v_11.0, and the five most significant GO terms and KEGG pathways were selected according to their false discovery rate (FDR) and then visualized with GraphPad PRISM Version 6.02.(20)
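The selection step above can be sketched as follows. STRING reports an FDR for each term directly, so the Benjamini-Hochberg function is included only to illustrate how such values are commonly derived from raw p values; whether it matches STRING's exact procedure is an assumption, and the term names are placeholders.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def top_terms(terms, n=5):
    """Return the n (term, FDR) pairs with the smallest FDR."""
    return sorted(terms, key=lambda term: term[1])[:n]


# Placeholder terms and p values (hypothetical).
names = ["term_a", "term_b", "term_c", "term_d"]
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
best = top_terms(list(zip(names, fdr)), n=2)
```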

Construction of protein-protein interaction network and screening for hub genes.
STRING v_11.0 was used to construct the protein-protein interaction network, which was then visualized with the Centiscape plugin of Cytoscape v_3.7.2. (19)(20)(21) The top ten hub genes were identified according to their Degree unDir values.
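Ranking hub genes by undirected degree amounts to counting, for each node, the number of distinct interaction partners. The sketch below shows that calculation on a plain edge list; it is a simplified stand-in for the Centiscape computation, and the node names are placeholders.

```python
from collections import Counter


def top_hubs(edges, n=10):
    """Return the n nodes with the highest undirected degree.

    Duplicate edges (in either direction) and self-loops are ignored.
    Ties are broken alphabetically by node name.
    """
    degree = Counter()
    seen = set()
    for a, b in edges:
        key = frozenset((a, b))
        if a == b or key in seen:
            continue
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:n]


# Placeholder interaction list (hypothetical).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "B"), ("D", "D")]
hubs = top_hubs(edges, n=2)
```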

Hub gene expression and mutation analysis.
The expression and mutations of the hub genes in ovarian serous cystadenocarcinoma were analyzed using the online database cBioPortal.

Identification of potential target gene of LDHAP5.
Pearson correlation analysis between the expression of LDHAP5 and the top ten hub genes in ovarian serous cystadenocarcinoma was performed using GEPIA. (16) Kaplan-Meier overall survival of the target genes was analyzed with Kaplan-Meier Plotter. (23) The mRNA expression levels of the ten hub genes in TCGA patients were further measured using the Oncomine Main dataset. (24)

3. Results

3.1 Identification of dysregulated pseudogenes in four common gynecological malignancies.
According to epidemiological statistics, cervical squamous cell carcinoma, ovarian serous cystadenocarcinoma, uterine corpus endometrial carcinoma and uterine carcinosarcoma are still lethal diseases in women.(1) To explore the potential role of pseudogenes in the carcinogenesis and prognosis of these four gynecological malignancies, we used the public database dreamBase to identify differentially expressed pseudogenes. As shown in Fig. 1A and Table 1, we identified 63 pseudogenes that were up-regulated and 0 that were down-regulated simultaneously in all four gynecological malignancies after preliminary screening. We further measured the expression levels of the 63 upregulated pseudogenes in 32 types of human cancers (Fig. 1B). Finally, 40 pseudogenes were thought to play potential roles in gynecological malignancies after removing the pseudogenes that were less highly expressed in the 32 types of human cancers.

Table 1 Numbers of downregulated pseudogenes among the four types of common gynecological malignancies from dreamBase.

3.3 Investigation of pseudogenes-miRNA-mRNA regulatory network.
By searching the starBase v2.0 database, we found that only LDHAP5 had corresponding miRNAs.

KEGG pathways and
3.5 EGFR was identified as the target mRNA of LDHAP5 in ovarian serous cystadenocarcinoma.
We used the Centiscape plugin of Cytoscape v_3.7.2 to visualize the protein-protein interaction network constructed by STRING v_11.0 (Fig. 4). Then the top ten hub genes (TP53, MYC (Table 3). We found that only EGFR (fold change = 1.192, P = 0.001), PTEN (fold change = 1.214, P = 0.007) and CREB1 (fold change = 1.723, P = 1.66E-04) mRNA were more highly expressed in TCGA ovarian patients (n = 594) than in normal controls (n = 8) using the Oncomine Main database (Fig. 6A). We then analyzed the prognostic value (overall survival) of five hub genes in ovarian serous cystadenocarcinoma using Kaplan-Meier Plotter (Table 4, Fig. 6B). Only EGFR was significantly correlated with poor outcome (HR = 1.51, 95%CI = 1.15-2, P = 0.0033) in ovarian serous cystadenocarcinoma, while SIRT1 predicted good outcome (HR = 0.75, 95%CI = 0.57-1, P = 0.047). According to the pseudogene-miRNA-mRNA regulatory mechanism, we concluded that LDHAP5 may play a potential role in ovarian serous cystadenocarcinoma by targeting EGFR.

Table 3 Pearson correlation analysis between LDHAP5 and ten hub genes expression in ovarian serous cystadenocarcinoma using GEPIA.

Based on the ceRNA hypothesis, our research focused on pseudogenes that can be transcribed into mRNA. We then used the pseudogene-miRNA-mRNA regulatory network to identify pseudogenes that may play potential roles in common gynecological malignancies and to explore their mechanism.
The initial goal of our study was to find pseudogenes that were differentially expressed simultaneously in four common gynecological malignancies. However, after Kaplan-Meier survival analysis we found only three significantly up-regulated pseudogenes (KRT8P3, KRT8P45 and LDHAP5) that predicted poor prognosis in ovarian serous cystadenocarcinoma. Of these, LDHAP5 was selected as the candidate pseudogene because it has corresponding miRNAs. Two reasons may account for this small number of candidates. The first is that many pseudogenes remain unidentified so far. After all, pseudogenes were at first considered "junk" or "fossil" DNA, and many methods had been invented to avoid detecting pseudogenes. (32)(33)(34)(35)(36) The second is that the current ceRNA hypothesis is not yet complete and needs further validation to build a more comprehensive regulatory network. (37)
In our study, 148 potential target mRNAs were identified. Functional enrichment analysis showed that which has been confirmed in many cancers. (40,41)
The shortcoming of our research is that our conclusion is mainly based on the analysis of existing databases. To further confirm the role of pseudogene LDHAP5, we need to construct ovarian cancer cell lines that differentially express LDHAP5. We will then test our theoretical results in vivo and in vitro, and further confirm them using clinical pathological specimens from ovarian cancer patients. EGFR antagonists (gefitinib, lapatinib, erlotinib, etc.) have been used in a variety of cancers, such as pancreatic cancer, non-small cell lung cancer and colorectal cancer. (42-44) Once our findings are validated, such agents may be applied to ovarian cancer in the future.
As research progresses, more functions of pseudogenes and their corresponding mechanisms will be revealed, which will contribute to identifying more biomarkers, designing specific drugs, and adopting personalized treatment in the future.

Conclusion
To summarize, our study systematically elucidated, for the first time, the high expression of pseudogene LDHAP5 in ovarian serous cystadenocarcinoma, which may lead to poor prognosis through targeting EGFR. LDHAP5 may serve as a new therapeutic target, thereby improving the prognosis of patients with ovarian cancer in the future.

Availability of data and materials
Not applicable.

Conflict of interest statement
The author(s) declare no competing interests.

Consent for publication
Not applicable.

Authors' contribution
Peng Wu was responsible for the study concept and design; Shitong Lin, Canhui Cao, Ping Wu, Peipei Gao, Wenhua Zhi and Ting Peng were involved in data collection, data screening and statistical analysis; Shitong Lin wrote the manuscript, and Yifan Meng supervised the manuscript. The final manuscript was approved by all the authors above.

Prognostic values of 41 upregulated pseudogenes in 32 kinds of human cancers using GEPIA. The group thresholds were as follows: the group cut-off was 'Median', the 'cutoff-high' and 'cutoff-low' were 50%, axis units were 'Months', and a p value < 0.05 was considered statistically significant. Red represents poor outcome, green represents good outcome, yellow represents neutral outcome (hazard ratio = 1), and light blue means that "The group thresholds you set are too strict. The sample size is insufficient at your custom thresholds". The values in the boxes represent the P values. GEPIA: gene expression profiling interactive analysis.

Identification of potential target genes of LDHAP5. The protein-protein interaction network of 148 genes was constructed using STRING v_11.0.

Supplementary Files
Supplementary files associated with this preprint: Table S1.docx, Table S2.docx