3.1 Laryngoscopy and contrast-enhanced CT
Laryngoscopy and enhanced CT showed the typical clinical features of 6 LSCC patients (Fig. 2).
3.2 Aberrant expression profiles of plasma exosomal miRNAs in LSCC patients
The next-generation sequencing technology revealed 136 DEMs between plasma exosomes from LSCC and control patients. Of those, 93 were upregulated and 43 were downregulated (|log2(foldchange) | >1, P<0.05) (Fig. 3a, b).
3.3 Identification of critical DEMs combined with TCGA database
A total of 150 LSCC samples and 14 normal samples were obtained from TCGA database, and 246 DEMs were obtained through differential analysis. After intersecting with plasma exosomal DEMs, 38 DEMs were identified (Fig. 3c, Supplementary Table 2), and their expression levels in plasma exosomal samples were shown in Fig. 3d.
3.4 Random Forest and ANN-based establishment of the diagnostic panel
Referring to previous literatures, we selected 8 miRNAs with consistent trend for subsequent analysis. To find the optimal parameters, the circular random forest classification of the 8 miRNAs was performed using the random forest algorithm, and the average error rate was calculated. Then, the panel with the best stability was selected for subsequent analysis. And 5 miRNAs with significance greater than 2 according to the Gini coefficient were identified as candidates (Fig. 3e). Of the 5 genes, hsa-miR-139-3p, hsa-miR-486-5p, hsa-miR-944 were most important, followed by hsa-miR-320b and hsa-miR-455-5p (Fig. 3f). In the heatmap (Fig. 3g), 5 candidate miRNAs could be used to distinguish LSCC from normal samples, among which, hsa-miR-139-3p and hsa-miR-486-5p were lowly expressed in LSCC and highly expressed in normal tissues, hsa-miR-944, hsa-miR-320b and hsa-miR-455-5p were highly expressed in LSCC tissues.
Then, the ANN panel was constructed based on candidate DEMs from random forest screening (Fig. 3h). The expression data of the 5 DEMs were first converted into a “gene score”, and then a neural network algorithm was used to calculate the weight value of each DEMs, and the panel classification score was calculated by “gene score” and “gene weight”, with the following formula: LSCC ANN panel score = ∑ (gene expression level × Gene weight). Next, the panel score of 154 samples was set as the predictive value, and the LSCC and normal samples were set as the outcome value, the classification ability of the panel was validated using the “pROC” package, and it was found that the AUC of the model was 0.782, suggesting that the model had an accurate predictive effect (Fig. 4a).
3.5 Validation of the diagnostic panel
To further verify the panel accuracy, the plasma exosomal samples and another dataset (38 LSCC samples and 5 normal samples from GSE79493, GSE106280 and GSE137308) were used to verify whether the panel had the same predictive ability in the independent dataset. Using the same method, “gene score” and “gene weight” of the 5 DEMs were calculated, and then the scores were calculated by ANN panel and verified by ROC curve. It was found that the AUC in plasma exosomal samples was 1.000 (Fig. 4b), the AUC in the dataset from GEO database was 0.716 (Fig. 4c), which further proved the validity and stability of the panel. In addition, we also verified the diagnostic value of individual miRNAs from the diagnostic panel, using data from TCGA as the train group, sequencing data from plasma exosomal samples as the test group and PCR results as the secondary validation. It was found that the AUC of hsa-miR-139-3p was 0.957 in the train group, 0.944 in the test group and 0.944 in the validation group (Fig. 4d), that of hsa-miR-486-5p was 0.853 in the train group, 1.000 in the test group and 1.000 in the validation group (Fig. 4e), that of hsa-miR-455-5p was 0.816 in the train group, 0.889 in the test group and 0.944 in the validation group (Fig. 4f), that of hsa-miR-944 was 0.713 in the train group, 0.750 in the test group and 0.889 in the validation group (Fig. 4g), and that of hsa-miR-320b was 0.847 in the train group, 0.667 in the test group and 0.667 in the validation group (Fig. 4h). In addition, we verified the expression of miRNAs through the CancerMIRNome database, and the results showed that the 5 miRNAs were differentially expressed in LSCC and normal tissues, which were consistent with the sequencing results (Fig. 4i-m).
3.6 Functional enrichment analysis of the DEMs
The DEMs were separately input into the online miRNA target gene prediction websites TargetScan, miRWalk, miRTarBase and miRDB for target gene prediction, and the candidate target genes were considered those predicted by three or more databases that intersected (207 target genes of hsa-miR-139-3p, 134 target genes of hsa-miR-320b, 389 target genes of hsa-miR-455-5p, 324 target genes of hsa-miR-486-5p and 1359 target genes of hsa-miR-944. Fig. 5a-e). In total, 2207 target genes were obtained after removing 206 duplicate genes jointly regulated by multiple miRNAs. The intersection of target mRNAs for upregulated miRNAs (hsa-miR-455-5p, hsa-miR-320b, hsa-miR-944) and downregulated mRNAs, and target mRNAs for down-regulated miRNAs (hsa-miR-139-3p, hsa-miR-486-5p) and upregulated mRNAs were taken. The results were performed on a total of 61 genes including 4 up-regulated genes, 57 down-regulated genes, respectively. The miRNA-mRNA regulatory network between the 5 miRNAs and their 61 target genes was shown in Fig. 5f.
To explore the function and pathway of target genes, we performed GO and KEGG pathway analyses. GO analysis showed that target genes were mainly enriched in negative regulation of protein phosphorylation, negative regulation of protein kinase activity, cell leading edge, distal axon, cadherin binding, enzyme inhibitor activity (Fig. 5g). KEGG analysis showed that target genes were mainly enriched in Complement and coagulation cascades, Pancreatic secretion, Peroxisome, Diabetic cardiomyopathy (Fig. 5h). In addition, PPI network demonstrated that ACTR3, ACTR2, YWHAZ, EIF3A, SYNCRIP were the core genes in the network (Fig. 5i). In addition, we found significantly differences in immune cell infiltration in patients with LSCC and healthy controls. Patients with LSCC had lower proportions of NK cells resting (P<0.001), Monocytes (P<0.001), Mast cells resting (P<0.001), and Eosinophils (P=0.007), and higher proportions of Plasma cells (P=0.039), T cells CD8 (P=0.02), T cells follicular helper (P=0.011), Macrophages M1 (P=0.014), Mast cells activated (P=0.015), and Neutrophils (P=0.029) than those in healthy controls (Fig. 5j).
3.7 Prognostic analysis of the DEMs
We examined the prognostic predictive value of individual miRNAs in the diagnostic panel for patients with LSCC, although capable of predicting, the results were less favorable (Supplementary Fig. 1). Then, we tried to explore the prognostic prediction ability by miRNA joint analysis. In the TCGA database, samples with unknown survival time were excluded, and the remaining 150 LSCC samples were randomly divided into training and test cohorts on average. We found no significant difference between the train cohort and the test cohort (Supplementary Table 2). Then, the LASSO Cox regression analysis was performed to construct the DEMs-based risk score model, and cross-validation was performed for the avoidance of overfitting to construct a 5-miRNA prognostic model based on the optimum λ value (Fig. 6a, b). The risk score was calculated with the established formula: risk score = (0.1528×hsa-miR-139-3p exp.) + (0.1133×hsa-miR-486-5p exp.) + (0.5263×hsa-miR-455-5p exp.) + (-0.4418×hsa-miR-320b exp.) + (0.2768×hsa-miR-944 exp.). The median risk score divided the 75 LSCC patients in the training cohort equally into high- and low- risk groups (Fig. 6c). Kaplan-Meier survival curves revealed that the low-risk group had a significantly higher survival probability than the high-risk group (P<0.001, Fig. 6d). Furthermore, ROC curve was used to check the accuracy of the model, and the AUC was calculated as 0.674 in 1 year, 0.792 in 3 years, and 0.813 in 5 years (Fig, 6e), indicating the reliability and feasibility of prognostic model. Patients in the high-risk group had a greater probability of death and a shorter lifespan than those in the low-risk group (Fig. 6f). PCA and t-SNE analysis confirmed that patients with different risks were sorted well into different categories (Fig. 6g, h).
The robustness of the prognostic model was performed by a test cohort containing 75 LSCC samples, which was divided into high- and low-risk groups according to the median risk score calculated by an established formula constructed from the train cohort, with 47 patients in the high-risk group and 28 patients in the low-risk group (Fig. 6i). The Kaplan-Meier survival curve showed the same trends (P=0.043. Fig. 6j), suggesting that the model had the same prognostic risk assessment ability for patients with LSCC. Furthermore, the AUC was calculated of 0.591 for 1 year, 0.667 for 3 years, and 0.603 for 5 years respectively (Fig. 6k). As the risk score increased, the survival time gradually decreased (Fig. 6l), and PCA and T-SNE analyses also obtained the same trend with the train cohort (Fig. 6m, n).
After that, the univariate and multivariate Cox regression analysis was also conducted to identify the independent factors associated with the survival status. The univariate Cox regression analysis showed that risk score was an independent prognostic factor for the survival of patients in the training cohort (P=0.002, HR=4.179, 95% CI: 1.666−10.478, Fig. 6o) and test cohort (P=0.042, HR=1.768, 95% CI: 1.020−3.065, Fig. 6p). The multivariate Cox regression analysis also indicated that risk score was a prognostic factor independent on other clinical factors for LSCC patients in the training cohort (P=0.003, HR=4.644, 95% CI: 1.692-12.748, Fig. 6q) and test cohort (P=0.014, HR=1.896, 95% CI: 1.140-3.152, Fig. 6r). Moreover, the univariate and multivariate Cox regression analyses revealed that gender and N staging was an independent prognostic factor of poor survival in patients with LSCC in the test cohort. (Fig. 6q, r).
3.8 Validation of 5 DEMs expression in an independent clinical cohort by qRT-PCR
An independent clinical cohort consisting of 48 LSCC and 36 NC was applied to examine the expression status of the screened DEMs by qRT-PCR. As expected, hsa-miR-139-3p and hsa-miR-486-5p were down-regulated in LSCC patients. By contrast, hsa-miR-320b, hsa-miR-455-5p and hsa-miR-944 were up-regulated in LSCC patients (Fig. 7a-e). To understand if plasma miRNAs from independent clinical cohort can give the same effect, we evaluated the diagnostic efficacy of the above selected 5 DEMs. Consistent with previous results, individual miRNA did not provide accurate diagnostic efficiency (Fig. 4d-h), and the ANN panel could increase the AUC to 0.875 (Fig. 7v), further demonstrating the stability of the ANN panel. After that, we tried to construct the logistic model to further test the diagnostic efficacy of the 5 DEMs (Fig. 7f-u), the AUC could reach 0.906 at the maximum when two miRNAs were combined (Fig. 7l, has-miR-139-3p + has-miR-486-5p). The AUC was further improved when increasing the number of miRNAs from the logistic model, and ranged from 0.923 to 0.949 for the logistic model containing four miRNAs (Fig. 7p-t), then the AUC reached 0.959 when the number of miRNAs increased to five (Fig. 7u). In summary, our diagnostic panel built with ANN was validated with logistic model and still achieved good diagnostic efficiency, all these 5 miRNAs provided promising AUC values for distinguishing LSCC patients from controls.