Data source and clinical information
We downloaded EA-related miRNA-seq, mRNA-seq expression profiles and clinical information from the TCGA database (https://cancergenome.nih.gov/). miRNA data were extracted from 87 tumor tissues and 13 normal esophageal tissues. We also downloaded the mRNA-seq expression profiles of 54 extra normal esophageal tissues as controls from the GTEx database (https://www.gtexportal.org/). Finally, the expression data of 19835 genes in 87 tumor samples and 67 normal samples were summarized as mRNA data.
Differential expression analysis and target gene prediction
Linear models were used to identify differentially expressed miRNAs/mRNAs in tumor/normal groups using the limma R package. [12] A false discovery rate (FDR)-adjusted P value of <0.05 combined with a simultaneous absolute value of >1 for logFC was set as the threshold for differentially expressed miRNA/mRNA identification.
The correlation between miRNAs and the overall survival (OS) of the patient was further estimated by a Cox regression analysis. The diagnostic and prognostic capability of significant miRNAs was measured by the area under the curve (AUC) of receiver operating characteristic (ROC) analysis.
Subsequently, 12 databases (Microt4, miRWalk, mir-bridge, miRanda, miRDB, miRMap, Pictar2, PITA, MiRNAMap, RNAhybrid, RNA22 and TargetScan) in miRWalk2.0 (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/) were employed to determine the target genes of miR-3648. The genes that could be found in no less than 3 databases were identified as the target genes of miR-3648.
Then, the cross-linked genes between the target genes of miR-3648 and the differentially expressed genes were determined as the differentially expressed target genes (DETGs) of miR-3648.
Tumor-infiltrating immune cell analysis
CIBERSORT (https://cibersort.stanford.edu/) and leucocyte signature matrix 22 (LM22) were used to quantify the proportions of different immune cell types in the ESCC samples from the TCGA database. [13] Normalized gene expression data were analyzed using the CIBERSORT algorithm by running 1000 permutations. The CIBERSORT p value reflects the statistical significance of the results, and a threshold less than 0.05 is recommended.
Weighted gene coexpression network analysis
The R package “weighted gene coexpression network analysis” (WGCNA) [14] was utilized to identify modules of highly correlated DETGs as previously described. [15] Briefly, the DETG expression matrix was used to define gene coexpression similarity based on Pearson’s correlation coefficient of paired genes, followed by an adjacency matrix transformed from coexpression similarity using the power function. Here, with a soft threshold β = 4 and with R2 > 0.9, the coexpression network distribution exhibited an essentially scale-free topology. Next, a topological overlap matrix (TOM) is built based on the adjacency matrix A. The TOM-based dissimilarity subsequently led to distinct modules defined as clusters of densely interconnected genes. A dynamic hybrid tree-cutting algorithm is further used to build a hierarchical clustering tree to divide modules.
The infiltration fractions of immune cells in tumor tissues and normal tissues were employed as clinical traits. Pearson’s correlation test was then used to evaluate the module correlations with clinical traits. Among these clinical traits, the genes of the most significant module were determined to be immune-related genes (IRGs) for subsequent analysis.
Identification of Key Prognostic Genes
To identify key prognostic genes, we conducted a Cox regression analysis to estimate the association between the expression of the genes of the most significant module and overall survival time of patients. Through this method, 4 genes for which P was <0.05 were identified as candidate prognostic genes.
Construction of prognostic signature
We selected genes with excellent prognostic performance and diagnostic capability to construct a prognostic signature. The risk score (RS) for each tumor sample was calculated using the following formula.
In the above formula, n represents the number of genes included in the prognostic signature. Expi represents the expression level of gene i. Coefi represents the Cox regression coefficient of gene i. The prognostic independence of the risk score under the interference of multiple clinical characteristics was analyzed through univariate and multivariate Cox regression analysis.
Establishment of nomogram
Then, the independent prognostic characteristics were included in the construction of the nomogram. Meanwhile, the consistency index of actual survival and predicted survival was employed to assess the survival prediction capability of the nomogram. The prediction results and observation results in the calibration curve were visualized to measure the prediction performance of the nomogram.
Statistical analysis
The Wilcoxon test was applied to estimate the difference between groups. ROC analysis and Kaplan–Meier (K-M) curve analysis were performed to evaluate the prognostic performance and diagnostic capacity. The construction of the nomogram employed the R package “rms v6.0-2”. All R program packages for statistical analysis were implemented on the “R v4.0.4” platform. P < 0.05 was considered statistically significant.