3.1. Data quality assessment and preprocessing
Regression analysis of the raw data was performed using the affyPLM R package. The relative log expression (RLE) plot revealed that the gene expression levels in GSE74530 were consistent with the median approaching 0 (Fig. 1A), indicating that the quality of the expression data was reliable. As for the RNA degradation plot, it showed that the RNA integrity has a good quality (Fig. 1B), and all the 12 samples can be used for further analysis. A total of 16,434 mRNAs and 1,210 lncRNAs were identified in the microarray data using the human comprehensive gene annotation from GENCODE. As shown in Fig. 1C, there were no obvious outliers in the spread and location in boxplot, and small discrepancies can be sufficiently removed by normalization. The median values of 12 samples were almost at the same level after normalization (Fig. 1D), which effectively corrected the systematic differences between the chips.
3.2. The screening results of DELs and DEMs
We identified 34 DELs and 1,641 DEMs by comparing the tumor groups with the adjacent normal tissue group using the limma package. The expression of these DELs and all the DEGs are visualized in a heatmap (Fig. 2) and a volcano plot (Fig .3), respectively. The miRNA targets of the lncRNAs were predicted using the miRcode database in R. The target mRNAs of these miRNAs were obtained using the three highly reliable microRNA target prediction databases of miRTarBase, TargetScan and miRDB, and the result was intersected with the above-mentioned DEMs.
As a result, we got a total of 2,137 reliable miRNA-mRNA pairs and 162 predicted lncRNA-miRNA pairs (including 11 lncRNAs, 51 miRNAs and 851 mRNAs), which were used for further analysis.
3.3. Construction of the ceRNA network in oral cancer
In order to reduce the false positives, we calculated the PCC between the 34 DELs and 1,641 DEMs, and the co-expressed pairs with the top 5% PCC value (PCC > 0.886) were defined as the significant co-dysregulated competing pairs. Next, the results were merged with the 11 lncRNAs, 51 miRNAs and 851 mRNAs from the previous screening step. The intersection resulted in the selection of a total of 238 potential co-dysregulated competing triples, the full list is shown in Additional file 1.
To perform a deeper functional study of the lncRNAs that act as a miRNA sponge in oral cancer, we built a ceRNA network and applied the Cytoscape software to perform visualization (Fig. 4A). The network included 10 lncRNA nodes, 41 miRNA nodes, 122 mRNA nodes and 238 edges. In addition, several miRNAs in the network had been identified in oral cancer as listed in Table1.
Table 1 The effect characteristic of miRNAs in oral squamous cell carcinoma (OSCC).
miRNA
|
Targets
|
Effect characteristic
|
Reference
|
hsa-miR-139-5p
|
AKT
|
Inhibits oral cancer cell proliferation and induces cell apoptosis
|
(36)
|
hsa-miR-17-5p
|
Casp-2/-7/-8, Bcl-2 and DIALO
|
Inhibits hypoxia-induced apoptosis
|
(37)
|
hsa-miR-107
|
p53
|
p53 upregulates miR-107 in normal condition
|
(38)
|
hsa-miR-375
|
Sp1 and cyclin D1
|
Inhibits cell growth and its expression is correlated with prognosis of TSCC
|
(39)
|
hsa-miR-455-5p
|
UBE2B
|
Binds SMAD3-specific promoter regions, leads to UBE2B downregulation, and contributes to oral cancer tumorigenesis
|
(40)
|
hsa-miR-137
|
CDK6
|
Arrests cell cycle at the G1-S checkpoint
|
(41)
|
hsa-miR-140-5p
|
ADAM10
|
Inhibits the invasion and migration of TSCC cells
|
(42)
|
hsa-miR-125b-5p
|
BAK1
|
Boosts the level of BAK1 that is controlling the apoptotic pathway
|
(38)
|
3.4. Functional enrichment analysis
Speculations on the possible functions of the lncRNAs were made through the functional enrichment analysis of their linked mRNAs. GO analysis was performed to analyze the functions of the mRNA nodes, and the GO interaction network was constructed using the BiNGO tool (Additional file 2). As a result, 18 GO terms were found to be significantly enriched (Fig. 4B and Additional file 3). Among these terms, the top three enriched ones were collagen binding, extracellular matrix binding and cell adhesion molecule binding, all of which belonged to the molecular function (MF) terms. Interestingly, extracellular matrix (ECM) binding was related to the proliferation of OSCC cells (43) and played an important role in the growth and survival of oral cancer cells (44). It was demonstrated that cell adhesion molecules, together with tumor-associated matrix molecules, are functionally involved in the progression of oral cancer (45). What’s more, discoidin domain receptor-1 (DDR1) could be activated by the specific binding with collagens (II,III) (46), and the activation of DDR1 has been reported in oral cancer (47). The KEGG pathway analysis resulted in 10 enriched pathway terms, shown in Fig. 4C, including the terms of ECM-receptor interaction, small cell lung cancer, PI3K-Akt signaling pathway, focal adhesion, TGF-β signaling pathway and more (Additional file 4). Among these pathways, ECM-receptor interaction (48), PI3K-Akt signaling pathway (49), TNF signaling pathway (50) and TGF-β signaling pathway (51) were OSCC-related pathways.
3.5. Topological analysis of the ceRNA network
In order to identify the hub genes in the lncRNA-miRNA-mRNA network that are related to oral cancer, we computed the node degrees. In the study of Han et al. (33), the nodes with a degree greater than 5 were defined as hubs. Based on this research, a total of 42 nodes could be chosen as hubs, including 10 lncRNAs, 28 miRNAs and 4 mRNAs (Table 2 and Additional file 5). In addition, BC was also calculated as a measure to select the hubs (52) (Table 3). Higher BC values of the nodes indicated an increased important of these nodes in the regulatory network (53).
We found that the three lncRNAs of HCP5, AGAP11 and HCG22 had higher node degrees along with greater BC values, suggesting that they may be potential key regulators controlling the oral cancer related ceRNA network.
Table 3 List of the 15 genes with the top betweenness centrality.
Number
|
Type
|
name
|
Betweenness Centrality
|
1
|
lncRNA
|
HCP5
|
0.26
|
2
|
lncRNA
|
AGAP11
|
0.16
|
3
|
miRNA
|
hsa-miR-17-5p
|
0.14
|
4
|
miRNA
|
hsa-miR-24-3p
|
0.13
|
5
|
miRNA
|
hsa-miR-27a-3p
|
0.10
|
6
|
miRNA
|
hsa-miR-1297
|
0.10
|
7
|
miRNA
|
hsa-miR-22-3p
|
0.09
|
8
|
lncRNA
|
CRNDE
|
0.09
|
9
|
miRNA
|
hsa-miR-140-5p
|
0.09
|
10
|
miRNA
|
hsa-miR-129-5p
|
0.08
|
11
|
lncRNA
|
HCG22
|
0.08
|
12
|
miRNA
|
hsa-miR-216b-5p
|
0.07
|
13
|
miRNA
|
hsa-miR-507
|
0.07
|
14
|
miRNA
|
hsa-miR-23b-3p
|
0.06
|
15
|
miRNA
|
hsa-miR-107
|
0.06
|
3.6. Key lncRNA-miRNA-mRNA subnetwork
Based on the above-conducted analysis, we obtained three key lncRNAs: HCP5, AGAP11 and HCG22, which may play a role in oral cancer. These hub lncRNAs were used with their linked miRNAs and mRNAs to construct three more specific functional lncRNA-associated subnetworks. The AGAP11-associated subnetwork included 1 lncRNA, 19 miRNAs, 21 mRNAs and 48 edges (Fig. 5). As shown in Fig. 6, the subnetwork of HCP5 consisted of 1 lncRNA, 23 miRNAs, 53 mRNAs and 110 edges. As for HCG22, it interacted with 14 miRNAs and 34 mRNAs (Fig. 7).
To further understand the biological functions of the three hub lncRNAs, we performed GO functional enrichment analysis and KEGG pathway analysis for each hub-associated subnetwork. The results of the functional enrichment analysis revealed 14 enriched GO terms and 11 enriched pathway terms in the AGAP11-associated subnetwork (Fig.8A), while there were 15 enriched GO terms and 6 enriched KEGG terms in the HCP5-associated subnetwork (Fig. 8B). Regarding the HCG22-associated subnetwork, there were 10 differentially enriched GO terms and 10 enriched KEGG pathways (Fig. 8C).
3.7. Differentially expressed RNAs in TCGA data
A total of 236 oral cancer samples and 44 normal samples were obtained after data preprocessing. To enhance the data reliability, genes with high expression values (data values were more than the third quartile) were filtered for further analysis. However, lncRNA AGAP11 was excluded. We identified 193 differentially expressed RNAs: 99 genes were downregulated and 94 were upregulated. The full list of differentially expressed RNAs was shown in Additional file 6.
3.8. Functional enrichment analysis
GO analysis of the up-regulated genes revealed that 17 enriched clusters were associated with biological processes (BP), 6 with cellular components (CC), and 9 with molecular function (MF) (Additional file 7). Among these terms, the top three enriched biological process were cell-cell adhesion, extracellular region and water transporter activity. Functional enrichment analysis of the down-regulated genes showed that 84 enriched clusters were associated with BP, 5 with CC, and 9 with MF (Additional file 8). Among them, 26 genes were enriched in sequence-specific DNA binding and represented the lowest FDR.
3.9. Survival analysis
Survival analysis was estimated based on Kaplan-Meier curve analysis. The difference was statistically significant with log-rank P<0.05. As a result, HCG22 was significantly correlated with reduced survival time in patients with oral cancer (Fig.9B), while HCP5 showed no significant correlation with overall survival in oral cancer(Fig.9A).