Identification of lncRNA FAM99A gene as a prognostic biomarker of hepatocellular carcinoma

Background The complicated pathogenesis of hepatic cancer involves multiple clinical prognosis-associated oncogenes. Methods We used a bioinformatics approach to analyze the data from hepatic cancer cases collected in the TCGA repository. Results We found, for the first time, that the FAM99A (Family With Sequence Similarity 99 Member A) gene, a long non-coding RNA (lncRNA), is lowly expressed in hepatocellular carcinoma and closely related to clinical prognosis. We further analyzed the underlying molecular mechanism from the perspectives of copy number variation (CNV), DNA methylation, immune cell infiltration, and related cellular pathways. Although we did not observe a strong correlation between FAM99A expression and CNV or immune cell infiltration, the methylation levels of five probe sites (cg24218935, cg01745044, cg04353359, cg04938738, cg25356611) were negatively correlated with the expression level of FAM99A. In addition, we performed enrichment analysis to identify a group of FAM99A-correlated genes and molecular pathways, such as the complement cascade, RNA metabolism, drug metabolic process, PPAR signaling pathway, and cell cycle. Conclusions

TCGA (The Cancer Genome Atlas) archives multi-omics data from more than thirteen types of cancer, including expression level, mutation, copy number variation (CNV), genome methylation of lncRNA genes, and clinical information [5,6]. It helps to identify prognosis-associated lncRNA oncogenes. Herein, we aimed to analyze, for the first time, the potential role of the lncRNA FAM99A gene in the pathogenesis and prognosis of hepatic cancer.
In this study, based on TCGA data, we report for the first time that the lncRNA FAM99A is primarily expressed in liver cancer. We also explored the possible molecular mechanisms of lncRNA FAM99A in hepatic carcinogenesis from the perspectives of gene expression, copy number variation (CNV), DNA methylation, immune cell infiltration, and enrichment analysis of FAM99A-correlated genes.

Expression analysis
We analyzed the expression profile of the FAM99A gene in different cancer tissues and the corresponding control tissues in the TCGA project using GEPIA 2 (http://gepia2.cancer-pku.cn/#analysis) [10]. Boxplots and the expression levels of the FAM99A gene by pathological stage are provided for the TCGA-LIHC (Liver hepatocellular carcinoma) and TCGA-CHOL (Cholangiocarcinoma) cohorts, respectively.

Survival curve analysis
We used the Kaplan-Meier plotter (http://kmplot.com/analysis/index.php?p=service&cancer=liver_rnaseq) to perform overall survival (OS), relapse-free survival (RFS), progression-free survival (PFS), and disease-specific survival (DSS) analyses by the expression level of the FAM99A gene in the hepatic cancer cases [11]. The "Auto select best cutoff" option was enabled. Clinical factors, including pathologic stage, grade, AJCC_T, gender, vascular invasion, race, sorafenib treatment, alcohol consumption, and hepatitis virus, were also considered.
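To make the survival comparison concrete, the following is a minimal sketch of the Kaplan-Meier product-limit estimator that underlies such survival curves. It is not the code of the Kaplan-Meier plotter; the function name `kaplan_meier` and the follow-up values are hypothetical and serve only as an illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up data (months, event flag); NOT taken from TCGA.
high = kaplan_meier([6, 12, 20, 30, 36], [0, 1, 0, 1, 0])
low = kaplan_meier([3, 5, 8, 12, 15], [1, 1, 0, 1, 1])
```

In practice the samples would first be split into high and low expression groups at the chosen cutoff, and the two curves would then be compared, for example with a log-rank test.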

Copy number variation analysis
Based on the GSCALite (http://bioinfo.life.hust.edu.cn/web/GSCALite/) [12], we performed the copy number variation (CNV) analysis of lncRNA FAM99A in the hepatic cancer cases of TCGA-LIHC cohort. The CNV pie distribution, CNV profile (homozygous amplification, homozygous deletion, heterozygous amplification, heterozygous deletion), and the Pearson correlation between CNV and expression level were provided, respectively.
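The Pearson correlation between CNV and expression can be illustrated with a short sketch. This is not the GSCALite implementation; the function name `pearson_r` and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample values; NOT taken from TCGA-LIHC.
cnv = [-1, 0, 0, 0, 1, 0, 0, -1]                      # copy number change
expression = [2.1, 3.4, 2.8, 3.9, 3.0, 2.5, 3.6, 2.9]  # log2(TPM + 1)
r = pearson_r(cnv, expression)
```

A value of r close to 0 indicates that copy number changes explain little of the variation in expression, which is the situation described as the absence of a strong correlation.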

DNA methylation analysis
We analyzed the DNA methylation status of lncRNA FAM99A in the hepatic cancer cases of the TCGA-LIHC cohort using MEXPRESS [13,14]. The correlation between DNA methylation and the expression level of the FAM99A gene was analyzed by Pearson's test. The correlation coefficients (r) and Benjamini-Hochberg-adjusted P values for the different methylation probes, including cg24218935, cg01745044, cg04353359, cg04938738, and cg25356611, are provided.
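The Benjamini-Hochberg adjustment applied to the per-probe P values can be sketched as follows. This is an illustrative implementation of the standard step-up procedure, not the MEXPRESS code; the function name `benjamini_hochberg` and the raw P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg-adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for five probes; NOT results from this study.
raw_p = [0.01, 0.04, 0.03, 0.005, 0.2]
adj_p = benjamini_hochberg(raw_p)
```

Each adjusted value is the smallest false discovery rate at which the corresponding probe would still be called significant.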

Immune cell infiltration analysis
We used GEPIA 2 to conduct pair-wise gene correlation analysis between lncRNA FAM99A expression and the signatures of the following immune cells: central memory T cell; effector memory T cell; effector T cell; effector Treg T cell; exhausted T cell; naive T cell; Th1-like cell; resting Treg T cell.

Enrichment analysis of FAM99A-correlated genes
We performed cluster analysis of the genes significantly correlated with lncRNA FAM99A through LinkedOmics (http://www.linkedomics.org/) [15]. A heat map of the significant genes positively or negatively correlated with FAM99A and the GSEA (Gene Set Enrichment Analysis) profiles for the Reactome pathway enrichment category are provided. In addition, we performed GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analyses. Weighted set cover was used for redundancy reduction. The data were visualized as a bar chart, DAG (Directed Acyclic Graph), or volcano plot.
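The idea behind GSEA can be illustrated with a minimal sketch of the running-sum enrichment score computed over a gene list ranked by correlation with FAM99A. This is a simplified illustration of the standard statistic, not the LinkedOmics implementation, and it omits the permutation-based significance test; the function name `enrichment_score`, the gene labels, and the correlation values are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score of gene_set in a ranked gene list.

    ranked_genes -- genes sorted by decreasing correlation
    scores       -- correlation value of each gene, in the same order
    gene_set     -- set of genes belonging to the pathway of interest
    p            -- weighting exponent applied to |score| for hits
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up at pathway genes (weighted), step down at all other genes.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list; NOT results from this study.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [0.9, 0.7, 0.4, -0.2, -0.5, -0.8]
es_top = enrichment_score(genes, corr, {"g1", "g2"})
es_bottom = enrichment_score(genes, corr, {"g5", "g6"})
```

A positive score indicates that the pathway genes cluster among the positively correlated genes, and a negative score indicates clustering among the negatively correlated genes.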

Result
Expression analysis data
tissues, compared with the control tissues. Moreover, we observed a correlation between FAM99A expression and the pathological stages of liver hepatocellular carcinoma cases (Fig.1D), but not of cholangiocarcinoma cases (Fig.1E). These results suggest a potential role of the lncRNA FAM99A gene in the etiology of hepatic cancer or cholangiocarcinoma.

Survival curve analysis data
Next, we analyzed the association between FAM99A expression status and clinical prognosis for hepatocellular carcinoma and cholangiocarcinoma. Due to the lack of survival data for the cholangiocarcinoma cases, we focused only on hepatocellular carcinoma. We observed lower rates of overall survival (Fig.2A, HR=0.56, P=0.0014), relapse-free survival (Fig.2B, HR=0.63, P=0.011), progression-free survival (Fig.2C, HR=0.62, P=0.0035), and disease-specific survival (Fig.2D, HR=0.56, P=0.015) in the FAM99A low expression group, compared with the high expression group. In addition, we considered the effect of different clinical factors, such as pathologic stage, grade, vascular invasion, and sorafenib treatment, on the above correlation by performing survival curve analysis after grouping the samples by each factor. As shown in Table 1 and Table S1-S3, we observed a relationship between FAM99A low expression and worse survival in the subgroups of "pathologic stage 3", "grade 3", "AJCC_T3", and "male" (all HR<1, P<0.05), but not in the female subgroup (all P>0.05). These results provide evidence of an association between FAM99A low expression and poor clinical outcomes of hepatocellular carcinoma, which warrants a more in-depth investigation of the molecular mechanism.

CNV analysis data
Herein, we analyzed the CNV status of the lncRNA FAM99A gene. lncRNA FAM99B was also examined. As shown in Fig.3A-B, we did not observe copy number variations in the majority of hepatic cancer cases, and heterozygous amplification or heterozygous deletion was found in only a limited number of cases. Furthermore, we did not detect a strong correlation between CNV and expression of the lncRNA FAM99A gene (Fig.3C). Thus, copy number variations of the FAM99A gene may not play an essential role in hepatic tumorigenesis.

DNA methylation analysis data
We attempted to explore the potential molecular mechanism from the point of
Furthermore, GO analysis data (Fig.7) presented a series of FAM99A-correlated terms for biological process (e.g., protein activation cascade, drug metabolic process, etc.), cellular component (e.g., extracellular organelle, mitochondrion, etc.), and molecular function (e.g., oxidoreductase activity, RNA binding, etc.). KEGG analysis (Fig.8) further showed the enriched pathways, such as metabolic pathways, PPAR signaling pathway, and cell cycle.

Discussion
Based on the available data sets of hepatic cancer cases collected by TCGA, we discovered, for the first time, that lncRNA FAM99A is mainly expressed in liver-related tumors, namely hepatocellular carcinoma and cholangiocarcinoma. Compared with the adjacent controls, lncRNA FAM99A is lowly expressed in hepatocellular carcinoma and cholangiocarcinoma, suggesting that FAM99A may be a liver-specific tumor suppressor gene.
Nevertheless, there are only 36 cholangiocarcinoma tissues and 5 adjacent control tissues in the TCGA-CHOL project, and we did not obtain a positive result in the clinical prognostic analysis. Therefore, in this study, we focused only on the correlation between lncRNA FAM99A and hepatocellular carcinoma. Even so, we do not rule out a potential regulatory role of lncRNA FAM99A in the initiation and progression of cholangiocarcinoma, considering the link between noncoding RNAs (ncRNAs) and cholangiocarcinoma [16]. Larger sample sizes and more clinical and basic experimental data are needed for an in-depth investigation.
With regard to hepatocellular carcinoma, we report a statistical correlation between low expression of the FAM99A gene and poorer overall survival, relapse-free survival, progression-free survival, and disease-specific survival. FAM99A expression also differed significantly among the pathological stages (stage I-IV). When hepatic cancer samples were grouped according to clinical information, the association between low FAM99A expression and poor survival outcomes was present in the subgroups of "pathologic stage 3", "grade 3", and "AJCC_T3". It is also important to note that we observed a correlation between FAM99A gene expression and clinical prognosis in male hepatic cancer cases, but not in female cases. These findings suggest that the prognostic value of low FAM99A expression may increase with the tumor differentiation process or pathological state in male patients with hepatic cancer.
LncRNA FAM99A rs1489945 was reported to be linked to maternal mean arterial blood pressure in a Cambridge birth cohort [7]. Therefore, we also explored the mutation and CNV status of the FAM99A gene in cancers. Our findings showed a very low genetic mutation frequency of FAM99A in cancers, which was not significantly correlated with gene expression or clinical prognosis (data not shown). We also observed neither a high frequency of CNV nor a strong correlation between FAM99A expression and CNV. In addition, considering the links between cellular immune responses and hepatocellular carcinoma [17], we analyzed the correlation between lncRNA FAM99A expression and the signatures of the following immune cells: central memory T cell; effector memory T cell; effector T cell; effector Treg T cell; exhausted T cell; naive T cell; Th1-like cell; resting Treg T cell. However, we again did not observe a strong correlation.
DNA methylation status is closely related to gene expression and the carcinogenesis of hepatic cancer [18,19]. Eukaryotic lncRNAs also take part in the metastasis and prognosis of hepatocellular carcinoma by regulating chromatin remodeling and methylation [20,21]. The methylation levels of five probe sites (cg24218935, cg01745044, cg04353359, cg04938738, cg25356611) were negatively correlated with the expression level of FAM99A. We also found that the cg24218935 and cg04353359 sites are located in the promoter region, while cg01745044, cg04938738, and cg25356611 are in the non-promoter region. It is worthwhile to further explore the synergistic role of the different FAM99A methylation sites in the expression level and survival prognosis of hepatic cancer cases.
As a downregulated gene in preeclampsia, FAM99A takes part in the regulation of invasion, migration and apoptosis of trophoblast cells [8]. We analyzed a series of genes related to FAM99A expression. Among them, we observed a high degree of expression consistency between FAM99A and FAM99B (Family With Sequence Similarity 99 Member B).
Regarding FAM99B, our literature search identified only one article, which reported that FAM99B is also a liver-specific lncRNA that can inhibit the proliferation, migration, and invasion of cells [22]. Such a cellular function may also be involved in the role of FAM99A in hepatic tumorigenesis and progression. In addition, we performed a series of enrichment analyses based on FAM99A expression-related genes. The FAM99A gene is related to numerous biological events, such as the complement cascade, fatty acid metabolism, metabolism of RNA, drug metabolic process, oxidoreductase activity, and RNA binding, which provide possible directions for in-depth molecular research. The molecular mechanism by which DNA methylation or ceRNA (competing endogenous RNA) networks of FAM99A act in the above biological activities merits further experiments.

Conclusion
Based on the liver cancer cases within TCGA-LIHC cohorts, we first identified the lowly

Figure 7
The GO analysis of the lncRNA FAM99A correlated significant genes. The DAG data for the biological process (A), cellular component (B), and molecular function (C) were provided, respectively.

Figure 8
The KEGG analysis of the lncRNA FAM99A correlated significant genes. Volcano plot was provided. FDR, false discovery rate.
