Identification of dysregulated long noncoding RNA and associated mechanism in gastric cancer

Backgrounds


Gastric cancer (GC) is one of the most common cancers worldwide [1,2] and has become the third leading cause of cancer-related death. GC incidence varies significantly, from the highest rates in Eastern Europe, South America and East Asia to the lowest rates in North America, and the risk factors for GC (e.g. diet, lifestyle [3], and chronic H. pylori infection [4]) differ between regions. Because two-thirds of newly diagnosed patients have either locally advanced or metastatic disease, the 5-year overall survival of GC patients remains 20-40% worldwide [5], and the median survival time after surgery is only 9 to 10 months [6] for patients with metastatic disease. Improved health care and screening programs in Japan have shown that over 70% of patients with early-stage GC survive for more than 5 years [7], indicating a decisive role of early diagnosis and treatment in the survival of GC patients. Endoscopy and pathological examination, the most common diagnostic methods, are the 'gold standard' for GC diagnosis [8]. Unfortunately, they cannot be used for screening in many countries because of the cost of the procedure and the potential risk of injury to the patient [9], and the existing common serum markers (CEA, CA19-9, and CA72-4 [10]) lack the sensitivity and specificity essential for early cancer screening.
Long noncoding RNAs (lncRNAs) are non-protein-coding transcripts longer than 200 nucleotides that regulate gene expression and messenger RNA (mRNA) splicing in the nucleus. Over time, a growing number of studies have explored the role of lncRNAs in the regulation of different physiological and pathological functions, e.g. cell differentiation and proliferation [11], carcinogenesis [12], and metastasis [13]. Several lncRNAs have been identified as oncogenes or tumor suppressors: for example, up-regulation of HOTAIR drives proliferation, migration, and invasion of GC cells [14], and H19 has oncogenic activity in GC and colon cancer [15], while CASC2 suppresses proliferation of GC cells through the MAPK signaling pathway [16]. Recent evidence indicates that lncRNAs can also modulate and be regulated by the cancer immune microenvironment [17], making lncRNAs potential biomarkers and therapeutic targets that could improve the management and treatment of GC.
In this study, we performed a meta-analysis of lncRNAs to assess their overall accuracy for the diagnosis of GC.
Using Gene Expression Profiling Interactive Analysis (GEPIA), we compared expression patterns in GC and normal tissue and found eight lncRNAs with marked differences in expression. We also identified two genes that had different levels of expression in normal tissue and GC and could interact with these lncRNAs. Taken together, our results suggest a connection between lncRNAs and prognosis in GC patients.

Methods

Search strategy and eligibility criteria
Publicly available databases (PubMed: https://www.ncbi.nlm.nih.gov/pubmed/; and EMBASE: https://www.embase.com) were comprehensively searched to identify relevant English-language articles reporting microarray data for human lncRNAs in GC patients and published up to the end of 2018. The following keywords and phrases were used: (lncRNA OR long noncoding RNA) AND ((gastric cancer) OR GC OR stomach neoplasms OR (stomach AND neoplasms) OR (gastric AND cancer)). Duplicate articles were manually removed using reference management software (EndNote X7; Thomson Reuters, New York, NY, USA). To determine eligible studies, the titles, abstracts, and full texts were evaluated independently by two investigators. Another investigator extracted data from the identified papers, and the reference lists of eligible articles were reviewed to obtain associated studies. All disagreements were resolved by an independent investigator. The criteria for inclusion were: 1) studies with a confirmed diagnosis of gastric cancer; 2) studies with lncRNA microarray analysis and reports on altered lncRNAs; 3) studies on the diagnostic value of lncRNAs in tissue, serum, plasma, peripheral blood, or gastric juice (if the published data were sufficient to allow meta-analysis); 4) original articles published in English with full text available. Articles that did not satisfy these criteria were excluded. General information from the eligible studies was arranged in tables, and data on lncRNAs were pooled into forest plots.

mRNA microarray data information and processing DEGs
We used NCBI-GEO, an online public microarray database, to acquire gene expression profiles for GC and normal stomach tissues from the GSE54129, GSE19826 and GSE79973 datasets, all of which had been produced on the GPL570 platform ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array). These datasets contained 111 human GC tissues and 21 noncancerous tissues, 12 adjacent normal/tumor-paired gastric tissues, and 10 pairs of GC tissue and adjacent non-tumor mucosa, respectively. Differentially expressed genes (DEGs) between tumor and normal tissues were identified with the GEO2R online tool [6]. DEGs with log2FC < 0 were considered down-regulated, and DEGs with log2FC > 0 up-regulated. The three microarray datasets were then compared: up-regulated and down-regulated genes with |log2FC| > 2 and an adjusted p-value below 0.05 were exported in TXT format and submitted to an online Venn diagram tool to identify the DEGs common to all three datasets.
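The filtering and overlap step described above can be illustrated with a minimal Python sketch. It is not the GEO2R or Venn tool workflow itself: the function names, the row layout (gene symbol, log2FC, adjusted p-value) and the toy gene names and values are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, Set, Tuple

# One row per gene: (gene symbol, log2 fold change, adjusted p-value).
# This layout is an assumption; real GEO2R exports carry more columns.
Row = Tuple[str, float, float]


def select_degs(rows: Iterable[Row],
                lfc_cutoff: float = 2.0,
                padj_cutoff: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Split one dataset into up- and down-regulated gene sets.

    A gene is kept only if |log2FC| > lfc_cutoff and adjusted p < padj_cutoff;
    the sign of log2FC then decides the direction.
    """
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, log2fc, padj in rows:
        if padj >= padj_cutoff or abs(log2fc) <= lfc_cutoff:
            continue
        (up if log2fc > 0 else down).add(gene)
    return up, down


def common_degs(datasets: Dict[str, Iterable[Row]]) -> Tuple[Set[str], Set[str]]:
    """Intersect the up and down sets across all datasets (the Venn overlap)."""
    ups, downs = zip(*(select_degs(rows) for rows in datasets.values()))
    return set.intersection(*ups), set.intersection(*downs)


# Hypothetical toy data, not values from the GEO datasets.
toy = {
    "dataset_A": [("GENE1", 3.1, 0.001), ("GENE2", -2.6, 0.01), ("GENE3", 2.4, 0.20)],
    "dataset_B": [("GENE1", 2.2, 0.03), ("GENE2", -3.0, 0.004), ("GENE3", 2.9, 0.01)],
    "dataset_C": [("GENE1", 4.0, 0.0005), ("GENE2", -2.1, 0.04), ("GENE4", 1.5, 0.001)],
}
common_up, common_down = common_degs(toy)
print(common_up, common_down)
```

Intersecting the up-regulated and down-regulated sets separately keeps a gene out of the common list if its direction of change disagrees between datasets.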

Bioinformatics analysis
Bioinformatics analysis was performed to determine relationships between altered lncRNAs and DEGs of overlapping mRNA microarrays. Briefly, lncRNAs from selected studies were uploaded into GEPIA to validate their expression in GC tissues. Then lncRNAs with altered expression were imported into Qiagen's IPA system and overlaid with a global network of interactions in gastrointestinal disease. Next, we overlaid the identified DEGs and the genes related to the altered lncRNAs to find potential interactions. Finally, we used GEPIA and the Kaplan Meier plotter online database to validate the expression levels of the identified genes and lncRNAs and their effects on survival time.

Statistical methods
The confidence interval (CI) of diagnostic value was calculated using Meta-DiSc (version 1.4; Ramón y Cajal Hospital, Madrid, Spain), and results were considered significant for a two-sided p-value less than 0.05. Heterogeneity was assessed by calculating the inconsistency index (I²); heterogeneity was considered substantial for I² values above 50%, and the results were pooled using a random-effects model.
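As a brief illustration of the inconsistency index mentioned above, the sketch below uses the standard definition I² = max(0, (Q - df)/Q) x 100%, where Q is Cochran's heterogeneity statistic and df is the number of pooled estimates minus one. The function name and the example numbers are hypothetical and are not taken from the meta-analysis.

```python
def i_squared(q: float, k: int) -> float:
    """Return I^2 in percent from Cochran's Q and the number of pooled estimates k."""
    df = k - 1  # degrees of freedom of Q
    if q <= 0 or df <= 0:
        return 0.0
    # Share of total variation attributed to between-study heterogeneity,
    # truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical example: Q = 40 from 11 pooled estimates (df = 10).
example = i_squared(40.0, 11)
print(f"I^2 = {example:.1f}%")  # I^2 = 75.0%, above the 50% threshold
```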
Potential reasons for heterogeneity were analyzed by regression analysis. The sensitivity and specificity of potential biomarkers were evaluated using the summary receiver operating characteristic (sROC) curve and the area under the curve (AUC). In addition, the positive likelihood ratio (LR+), negative likelihood ratio (LR-), and diagnostic odds ratio (DOR) were calculated. Differential expression of molecules between GC tissues and normal tissues was assessed by one-way ANOVA, as implemented on the GEPIA website. The overall survival analysis for targeted molecules in GC was generated with the Kaplan Meier plotter website.
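The per-study accuracy measures listed above all derive from a 2x2 table of true positives, false positives, false negatives and true negatives. The following sketch shows the textbook definitions only; it does not reproduce Meta-DiSc's pooling, the class and function names are hypothetical, and tables containing zero cells (which would need a continuity correction) are not handled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # diseased, test positive
    fp: int  # healthy, test positive
    fn: int  # diseased, test negative
    tn: int  # healthy, test negative


def diagnostic_metrics(t: TwoByTwo) -> Dict[str, float]:
    """Compute sensitivity, specificity, LR+, LR- and DOR for one study."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = (t.tp * t.tn) / (t.fp * t.fn)  # algebraically equal to LR+ / LR-
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "LR+": lr_pos,
        "LR-": lr_neg,
        "DOR": dor,
    }


# Hypothetical counts, not data from any included study.
metrics = diagnostic_metrics(TwoByTwo(tp=80, fp=30, fn=20, tn=70))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A DOR of 1 means the test does not discriminate between patients and controls; larger values indicate better discrimination.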

Results

lncRNA expression levels altered in GC
The 13 studies reporting changes in lncRNA expression during GC tumorigenesis and development are summarized in Table 1. Different types of samples were used: Zhang et al. compared lncRNA expression between GC and non-GC patients using tissue and plasma [18]; gastric mucosae were used in one study [19]; all other studies analyzed GC tissue and adjacent healthy tissue. Among all identified studies, differences in expression levels were detected for at least 75 lncRNAs [18]. Because of the re-annotation of the published microarray database, two studies did not provide fold changes of dysregulated lncRNAs [18,20,21]. The study by Hu et al. used a 1.5-fold change as the selection criterion [22], while a 2-fold change was used in the other studies [18,23-30]. The median age of GC patients enrolled in the analysis was at least 57.8 years, except in one study [29] that did not provide detailed information. Gender distribution and histopathological information of GC patients are shown in Table 1.
In summary, the analysis of the identified microarray data showed substantial lncRNA alteration in GC patients. However, data reported by different research groups were extremely variable. Thus, we concentrated on the potential of lncRNAs for GC diagnosis.

Meta-analysis of differentially expressed lncRNAs in GC patients
To further investigate the diagnostic value of lncRNAs in GC, all articles exploring lncRNAs as novel biomarkers for GC patients were collected using the search strategy indicated in the flowchart in Fig. 1. Twenty-three studies were included and pooled into the meta-analysis.
The number of patients in each study ranged from 30 to 132, and tissue, plasma, serum, or gastric juice samples were used. Besides β-actin or GAPDH, U6 [31,32] and 18S rRNA [33] were used as endogenous standards for diagnostic evaluation. Quantitative methods and cut-off values also differed between studies, and both individual lncRNAs and panels [33,34] were selected as novel diagnostic biomarkers for GC. Additionally, classic GC biomarkers (e.g. CEA and CA19-9) were compared with novel GC biomarkers [34], and the lncRNA panel showed a markedly higher AUC value for discriminating GC patients from controls. Considering this evidence, we performed a meta-analysis with Meta-DiSc software version 1.4. We pooled data from various specimens and generated the forest plots shown in Fig. 2. The pooled sensitivity was 0.76 (95% CI: 0.74-0.77; Q = 195.59, p < 0.0001, I² = 85.2%) and the pooled specificity was 0.66 (95% CI: 0.64-0.68; Q = 208.98, p < 0.0001, I² = 86.1%), which indicated the presence of substantial heterogeneity. A random-effects model was then used to re-analyze the pooled data and to test for a diagnostic threshold effect. The Spearman correlation coefficient was 0.238 (p = 0.214, data not shown), suggesting no evidence of a threshold effect. Afterward, forest plots of DOR were generated, which revealed that substantial heterogeneity was still present. This might result from differences in the studied populations, endogenous references, or specimen types. Meta-regression analysis of these possible factors indicated that specimen type was probably the reason for the heterogeneity. Thus, the results (e.g. sensitivity) extracted from the identified studies could not simply be pooled and were only suitable for subgroup analyses. Filtering studies by specimen type reduced heterogeneity; however, it remained higher than acceptable levels.
On the sROC curve of plasma samples, which included 16 lncRNAs, the maximum joint sensitivity and specificity (Q value) was 0.7443, and the area under the curve was 0.8096, indicating a moderate level of overall accuracy. The combined sensitivity, specificity, LR+, LR-, and DOR in plasma were 0.84 ( Fig. S2). The data showed a lower pooled sensitivity for tissue.

Validation of lncRNAs expression by GEPIA
Since heterogeneity was not reduced to an acceptable level through subgroup analysis, we used GEPIA [35], a web-based tool that delivers fast and customizable functionalities based on TCGA [36] and GTEx data [37], to validate the expression of those lncRNAs in primary GC tissues and normal gastric tissues. We observed increased expression of six lncRNAs (ABHD11-AS1, H19, PVT1, UCA, HOTTIP, and SUMO1) and decreased expression of two lncRNAs (FER1L4 and LINC00982) in GC tissues compared to normal stomach tissue. The detailed expression data for those lncRNAs are shown in Fig. 3.

Identification of DEGs in GC and investigation of their correlation with modulated lncRNAs by IPA
To investigate the underlying mechanism related to lncRNAs, we extracted 3944, 629 and 1406 DEGs from GSE54129, GSE19826 [38] and GSE79973 [39], respectively, via the GEO2R online tool. The gastric samples used for those arrays were collected during surgery. Healthy gastric mucosa (GSE54129) or adjacent normal gastric tissue (GSE19826, GSE79973) was used as control. Subsequently, Venn diagram software [40] was used to identify common DEGs in these datasets. A total of 226 common DEGs were identified, including 142 up-regulated genes (adjusted p < 0.05 and log2FC > 2) and 84 down-regulated genes (adjusted p < 0.05 and log2FC < -2) in the GC tissues (Fig. 4). Meanwhile, IPA analysis identified molecules that interacted with the altered lncRNAs in gastrointestinal diseases (Table 2). We then pooled these molecules and the DEGs in GC into the Venn diagram and identified two genes (IGF2BP3 and FOLR1) that probably interacted with altered lncRNAs in GC.

Validation of genes interacting with lncRNAs in GC
To investigate the potential role of these genes in GC, we further validated the expression of IGF2BP3 and FOLR1. The GEPIA website [35] and the Kaplan Meier plotter (http://kmplot.com/analysis), a website established on TCGA [36] and GTEx data [37], were used to assess the correlation between the expression of those genes and the prognosis of GC patients. We found dramatically increased expression of IGF2BP3 and significantly reduced expression of FOLR1 in GC patients compared to healthy controls. The altered expression of these two genes was reported to be correlated with poor overall survival of GC patients, especially the altered expression of FOLR1 (P<0.01, shown in Fig. 5).

Discussion
Early diagnosis and treatment are recognised as effective ways to improve the survival of GC patients. Thus, useful biomarkers for early diagnosis and active management, grounded in the mechanisms of GC development, are required. To date, several biomarkers, such as CA19-9, CA72-4, and CEA, are in use. However, the sensitivity and specificity of those biomarkers are limited. Since the first study of lncRNAs in GC was reported in 1997 [41], more research has focused on the clinical value of lncRNAs in GC diagnosis. Exploring dysregulated lncRNAs as biomarkers for GC diagnosis has several advantages: 1) lncRNAs can be detected in body fluids, where they resist ribonuclease degradation [42]; 2) expression of lncRNAs has temporal and tissue specificity [43]; 3) ectopic expression of lncRNAs is responsible for tumorigenesis [44,45]. Therefore, investigation of lncRNAs might produce novel diagnostic and prognostic biomarkers for GC and help us understand the molecular mechanisms of GC development and progression.
To explore the potential role of lncRNAs in GC, the present study reviewed and analyzed published studies that used microarray analysis to report differentially expressed lncRNAs between GC and normal tissue. Because the reported data varied substantially, we retrieved articles reporting on the diagnostic value of lncRNAs for the meta-analysis (Fig. 1). However, the data pooled from all studies showed marked heterogeneity (Fig. 2) that was most likely associated with specimen type, as evidenced by meta-regression analysis. Although the subgroup meta-analysis (Figs. S1 and S2) suggested that individual lncRNAs or specific lncRNA combinations could potentially serve as novel biomarkers for the diagnosis of GC, the heterogeneity was still too high. We also found that the data from different research groups differed significantly in quality. Therefore, we used the GEPIA website for validation and found eight lncRNAs with significant differences in expression between GC and normal tissue (Fig. 3). To further investigate the potential mechanism underlying the function of these lncRNAs in GC, we used IPA analysis to identify molecules interacting with these lncRNAs in gastrointestinal diseases. Subsequently, bioinformatics methods were used to identify DEGs in GC based on three datasets (GSE54129, GSE19826, and GSE79973, shown in Fig. 4). The results for lncRNAs and DEGs were then pooled into the BioVenn diagram to identify two genes (IGF2BP3 and FOLR1, shown in Fig. 4) that might be regulated by altered lncRNAs in GC samples. Finally, we used GEPIA and Kaplan Meier plotter analysis to verify that the expression of both IGF2BP3 and FOLR1 changed significantly and, moreover, correlated with worse survival in GC patients (Fig. 5).
Insulin-like growth factor-2 mRNA-binding protein 3 (IGF2BP3) was significantly elevated in GC patients and markedly correlated with GC prognosis. IGF2BP3, also known as IMP3, belongs to a conserved family of IGF2 mRNA-binding proteins. It was first recognised in 1997 because of its high expression in pancreatic carcinoma [46]. Subsequently, IGF2BP3 was found to be overexpressed in various tumors [47-50]. Moreover, it has been demonstrated to modulate tumor cell fate by promoting tumor growth [51], cell proliferation [2], drug resistance [52], and invasiveness [53]. The expression of IGF2BP3 has also been shown to correlate with prognosis and metastasis of human cancer. H19, PEG10, and IGF2BP3 have been reported to promote the expression of each other, and the suppression of those genes can decrease cell proliferation, anchorage-independent growth, invasion, and chemoresistance in GC [54]. Although IGF2BP3 has been confirmed to be an embryonic regulator, research on IGF2BP3 in humans is still very limited. IGF2BP3 is expressed at a high level in progenitor cells but has also been observed in mature tissues, e.g. placenta, lymph nodes, tonsils and testes. Moreover, IGF2BP3 transgenic mice showed increased biogenesis of the endocrine pancreas, resulting in a pancreas that resembled embryonic tissue and in intestinal cells with an arrested capacity for differentiation. These fetal-like phenotypes regulated by IGF2BP3 suggest a potential diagnostic role of IGF2BP3 in tumorigenesis. However, little evidence is currently available on the expression and function of IGF2BP3 in GC progression. Our study is the first to put forward a higher expression of IGF2BP3 in GC tissues and a significant correlation with the prognosis of stomach cancer patients (shown in Fig. 5). Based on the bioinformatics data, and given that IGF2BP3 is an RNA-binding protein, we also provide a putative mechanism for its association with poor prognosis in GC patients.
Folate receptor 1 (FOLR1) is a membrane-bound protein with a high affinity for folate that binds folate and transports it into cells at physiological levels. Folate, one of the crucial components of cell metabolism and of DNA synthesis and repair, is required for the rapid division of cancer cells [55]. Higher expression of FOLR1 has been found in specific epithelial-derived malignant tumors [56] and in solid tumors such as breast [57], lung [58] and ovarian [59] cancer, and has been shown to correlate positively with tumor grade and stage [55]. During early carcinogenesis, FOLR1 promotes folate uptake and DNA damage repair in cells [60]. Overexpression of FOLR1 gives tumor cells a growth advantage through a possible mechanism involving folate uptake [61] and translocation to the nucleus, where it regulates key developmental genes in cancer cells [62]. Therefore, increased expression of FOLR1 suggests higher activity of cancer cells and might predict a poor prognosis for cancer patients. Recently, FOLR1 has been confirmed as a potential target for immunotherapy with chimeric antigen receptor (CAR) T cells in GC [63]. In line with the previous data, the present study also found a higher expression of FOLR1 in GC patients, which was correlated with poor prognosis of GC patients. However, no study has so far focused on the implication of FOLR1 in the prognostic prediction of GC, and our analysis offers a novel insight into the potential of FOLR1 in the development and prognostic prediction of GC.
This study provides substantial evidence to support the potential prognostic role of IGF2BP3 and FOLR1 in the progression of GC and their close relationship with lncRNAs. Nevertheless, more research is still required to investigate the mechanisms underlying these observations, which currently rest on theoretical assumptions derived from bioinformatics. The present study also encourages the use of informatics analysis as a supplementary tool to investigate the potential mechanisms of lncRNA alteration in the progression of GC.

Conclusion
In conclusion, our study revealed that two axes, H19-IGF2BP3 and PVT1-FOLR1, might correlate with GC development, and IGF2BP3 and FOLR1 were further identified as potential markers for diagnostic prediction in GC patients. Therefore, the present study provides novel insights into the role of IGF2BP3 and FOLR1 in stomach cancer and new potential candidates for predicting GC prognosis.

Declarations
Ethics approval and consent to participate: Not applicable.

Consent for publication:
Not applicable.
Availability of data and material: The datasets used and/or analyzed during the current study are available from the GEO datasets.

Competing interests:
The authors declare that they have no competing interests.

Funding:
Funding was not received.
Acknowledgements: Not applicable.
Authors' contributions: LS and YM analyzed and interpreted the data from eligible publications. LS analyzed the GEO datasets and identified the potential interaction between selected lncRNAs and DEGs in GC patients. ZZ and YM were major contributors to the conclusions and to writing the manuscript. All authors read and approved the final manuscript.

Eight lncRNAs with significant differences in GC specimens compared to normal specimens. To further verify the reported lncRNA expression levels in GC patients and healthy people, all genes were further analysed with the GEPIA website. Eight lncRNAs showed significantly different expression levels in GC patients compared to healthy people. (Differential analysis used one-way ANOVA, with stomach tissue and normal tissue as the variable for calculating differential expression. Matched normal data were taken from the TCGA normal and GTEx normal tissue datasets. Genes with higher |log2FC| values and lower q values than the pre-set thresholds were considered differentially expressed. *P < 0.01; tumour: red; normal: grey.)

Prognostic relationship of FOLR1 in GC patients. (Differential analysis used one-way ANOVA, with stomach cancer tissue and normal tissue as the variable for calculating differential expression. Matched normal data were taken from the TCGA normal and GTEx normal tissue datasets. Genes with higher |log2FC| values and lower q values than the pre-set thresholds were considered differentially expressed. *P < 0.01; tumour: red; normal: grey.)
