Constructing a co-expression network to identify key modules
We firstly included 33 ACC samples to perform WGCNA analysis after weeding out the outlier samples (Figure S1). According to the result, we regarded the beta (β) = 5 (scale free R2 = 0.89) as soft-thresholding for further adjacencies calculation (Figure S2). Eventually, totally 8 modules were identified after classifying genes into gene modules and merging modules as shown in Figure S3. Genes not including in any other significant modules were included in the grey module, which were removed from subsequent analysis. Among the 8 modules, the blue module was the most associated module with survival status positively (P = 5e-05, R2 = 0.65), meanwhile the red module was the most associated module negatively (P = 4e-06, R2 = -0.71) (Fig. 2A). As shown in Fig. 2B, the MS of the two modules were also higher than MS of any other module. The relationships between MM and GS in the blue module (cor = 0.64) and red module (cor = 0.74) were also significant suggested by Fig. 2C-D. Thus, we considered blue and red modules to be key modules in this study. As shown in Figure S4A-B, we also created a network heatmap and a classical MDS plot.
Genes in key modules associated biological pathways
According to GO biological processes analysis, genes in red module were enriched in 4 BPs including leukocyte migration, positive regulation of cell migration, regulation of G-protein coupled receptor protein signaling pathway, and chemokine-mediated signaling pathway (Fig. 3A). As for genes in blue module, totally they were enriched in 197 BPs (Table S1). The top 10 were organelle fission, nuclear division, chromosome segregation, nuclear chromosome segregation, mitotic nuclear division, sister chromatid segregation, mitotic sister chromatid segregation, microtubule cytoskeleton organization involved in mitosis, regulation of chromosome segregation, and mitotic spindle organization (Fig. 3B). When talking about KEGG pathway analysis, genes in blue module were enriched in only 3 KEGG pathways including cell cycle, oocyte meiosis, and p53 signaling pathway (Fig. 3C).
Survival associated gene identification
With the cut-off criterion (|cor.geneModuleMembership| > 0.8 and |cor.geneTraitSignificance| > 0.2), 107 genes (12 genes in red module and 95 genes in blue module) were identified in co-expression network in total. Then we performed OS and DFS analysis for these 107 genes by using GEPIA. According to the results, high expression levels of 95 genes were obviously associated with poor OS for ACCs. Meanwhile, 100 genes showed significant P value in DFS analysis. 93 genes were common among them, which were regarded as survival associated genes (Table S2).
Establishment of two multi-gene signatures for predicting OS
We identified 93 biomarkers significantly related to survival of ACCs by using WGCNA analysis. With the aim of setting up a multiple-gene signature to predict the prognosis of patients with ACC, we calculated the relative regression coefficient of each genes by using the Ridge, ELASTIC-NET, and LASSO methods. The most powerful prognostic markers of the 93 biomarkers were screened out in each model as shown in Fig. 4. After that, 11 prognostic models were constructed and the relative regression coefficients of the most powerful prognostic markers were shown in Table S3. Combine the relative expression levels of the mRNA in the classifier and corresponding LASSO coefficients, we first calculated the risk score of each sample in the internal validation set. Furthermore, we worked out the AUC based on the risk score (Table 2). Two models contained model 1 (AUC = 0.676; α = 0.1) and model 2 (AUC = 0.676; α = 0.6) were finally identified, which were thought as candidate best models.
Table 1
Clinical characteristics of patients with ACC in each study.
Characteristics | Training dataset | Internal validation dataset | Entire TCGA dataset | GSE76019 | GSE19750 | GSE76021 |
Number of patients | 52 | 27 | 79 | 34 | 44 | 29 |
Age(median, IQR) | 48.5(31, 58) | 52(36, 62) | 50(35, 60) | 0 | 52.5(41,58.5) | 0 |
≥ 65 | 7 | 4 | 11 | 0 | 6 | 0 |
< 65 | 45 | 23 | 68 | 0 | 34 | 0 |
NA | 0 | 0 | 1 | 34 | 4 | 29 |
Gender | | | | | | |
Male | 19 | 12 | 31 | 0 | 17 | 0 |
Female | 33 | 15 | 48 | 0 | 26 | 0 |
NA | 0 | 0 | 0 | 34 | 1 | 29 |
Stage | | | | | | |
I | 5 | 4 | 9 | 10 | 7 | 12 |
II | 23 | 14 | 37 | 7 | 9 | 5 |
III | 9 | 7 | 16 | 10 | 4 | 10 |
IV | 14 | 1 | 15 | 7 | 8 | 2 |
NA | 1 | 1 | 2 | 0 | 16 | 0 |
Grade | | | | | | |
1 | 0 | 0 | 0 | 0 | 7 | 0 |
2 | 0 | 0 | 0 | 0 | 9 | 0 |
3 | 0 | 0 | 0 | 0 | 4 | 0 |
4 | 0 | 0 | 0 | 0 | 8 | 0 |
NA | 52 | 27 | 79 | 34 | 16 | 29 |
ACC: adrenocortical carcinoma; IQR: interquartile range; NA: not available. |
Table 2
AUC of risk scores calculated by the Ridge (α = 0), ELASTIC-NET (withαvarying from 0.1 to 0.9), and least absolute, shrinkage, and selection operator (LASSO;α = 1) methods by using internal validation set.
Models | Alive vs Dead |
AUC | 95% CI |
Ridge (α = 0) | 0.670 | 0.445–0.896 |
ELASTIC-NET (α = 0.1) | 0.676 | 0.447–0.905 |
ELASTIC-NET (α = 0.2) | 0.670 | 0.440–0.901 |
ELASTIC-NET (α = 0.3) | 0.636 | 0.404–0.869 |
ELASTIC-NET (α = 0.4) | 0.574 | 0.347–0.801 |
ELASTIC-NET (α = 0.5) | 0.574 | 0.349–0.798 |
ELASTIC-NET (α = 0.6) | 0.676 | 0.457–0.895 |
ELASTIC-NET (α = 0.7) | 0.563 | 0.333–0.792 |
ELASTIC-NET (α = 0.8) | 0.551 | 0.323–0.780 |
ELASTIC-NET (α = 0.9) | 0.551 | 0.326–0.776 |
LASSO (α = 1) | 0.534 | 0.308–0.760 |
In model 1 and model 2, ACCs in training set were further split into two groups (low-risk, and high-risk) according to median-risk score (5.0406 in model 1 and 16.5329 in model 2). In both of the two models, high-risk group was associated with poor OS of patients with ACC (Model 1: P < 0.0001, Fig. 5A; model 2: P < 0.0001, Fig. 5D). We also performed time-dependent ROC analysis. The results suggested that the AUC values of model 1 in the training set were 0.95 at 1 year, 0.98 at 3 years and 0.96 at 5 years (Fig. 5A). Meanwhile the AUC values of model 2 in the training set were 0.92 at 1 year, 0.91 at 3 years and 0.95 at 5 years (Fig. 5D).
Validation of model 1 and model 2
In the present study, we used 3 validation sets (inter validation set, GSE19750, and GSE76021) to validate the results we got from training set. With the same method we mentioned in training set, we calculated the risk score for each sample in these three validation sets based on model 1 and model 2. Samples in internal validation set, GSE19750, and GSE76021 were split into two groups relying on the median-risk score in each set. In model 1, according to the survival analysis, the high-risk group had lower survival rate compared with the low-risk group in all the validation sets (internal validation set: P = 3E-04; GSE19750: P = 0.011; GSE76021: P = 0.012) as the training set suggested (Fig. 5B, Fig. 6A-B). As for the results of time-dependent ROC analysis, the prognostic accuracy of model 1 in the internal validation set was 0.73 at 1 year, 0.82 at 3 years and 0.77 at 5 years (Fig. 5B); the prognostic accuracy of model 1 in GSE19750 was 0.61 at 1 year, 0.88 at 3 years and 0.91 at 5 years (Fig. 5A); the prognostic accuracy of model 1 in GSE76021 was 0.84 at 1 year, 0.78 at 3 years and 0.76 at 5 years (Fig. 6B). As for model 2, the high-risk group was associated with poorer survival compared with low-risk group (internal validation set: P = 0.018; GSE19750: P = 0.0035; GSE76021: P = 0.012; Fig. 5E, Fig. 6C-D). Moreover, the prognostic accuracy of model 2 in the internal validation set was 0.78 at 1 year, 0.74 at 3 years and 0.76 at 5 years (Fig. 5E); the prognostic accuracy of model 2 in GSE19750 was 0.60 at 1 year, 0.89 at 3 years and 0.89 at 5 years (Fig. 5C); the prognostic accuracy of model 2 in GSE76021 was 0.90 at 1 year, 0.78 at 3 years and 0.77 at 5 years (Fig. 6D). Both of the two models showed great prognostic values of patients with ACC.
Prognostic value of model 1 and model 2
Because we had identified two best models before, we respectively included the risk score of model 1 (or model 2) and important factors we mentioned before for the cox regression analysis. Risk score of model 1 (Hazard Ratio = 4.822, 95%CI of ratio: 2.636–8.821, P < 0.001), risk score of model 2 (Hazard Ratio = 4.465, 95%CI of ratio: 2.449–8.143, P < 0.001), tumor stage (Hazard Ratio = 6.097, 95%CI of ratio: 2.433–15.282, P < 0.001), and M stage (Hazard Ratio = 5.806, 95%CI of ratio: 2.471–13.640, P < 0.001) were influence features of OS as suggested by univariate Cox analysis (Fig. 7A-C). In model 1, the results of multivariate Cox analysis suggested that even being adjusted by other features, risk scores of model 1 (Hazard Ratio = 4.663, 95%CI of ratio: 2.238–9.713, P < 0.001) was still relevant to OS among patients with ACC as shown in Fig. 7B. In model 2, the results of multivariate Cox analysis suggested that even being adjusted by other features, risk scores of model 2 (Hazard Ratio = 5.603, 95%CI of ratio: 2.410-13.024, P < 0.001) and tumor stage (Hazard Ratio = 3.803, 95%CI of ratio: 1.223–11.822, P = 0.021) were still relevant to OS among patients with ACC (Fig. 7D).
Identification of model 1 and model 2 associated biological pathways
With the cut-off criteria we set before, only one risk score-related KEGG signaling pathways were enriched in model 1 including cell cycle (nominal P < 0.001, |ES| = 0.738, gene size = 123 and FDR = 0.903%) as shown in Table 3. In model 2, two risk score-related KEGG signaling pathways were significantly enriched including cell cycle (nominal P < 0.001, |ES| = 0.722, gene size = 123 and FDR = 0.884%) and oocyte meiosis (nominal P < 0.001, |ES| = 0.604, gene size = 111 and FDR = 0.913%; Table 3). Interestingly, these two pathways were consistent with the results of KEGG analysis of genes in blue module.
Table 3
Gene set enrichment analysis of model 1 and model 2 by using GSE76019.
Category | Term | Count | ES | P-Value | FDR |
Model 1 | Cell cycle | 123 | 0.738 | < 0.001 | 0.009025 |
Model 2 | Cell cycle | 123 | 0.722 | < 0.001 | 0.008841 |
Oocyte meiosis | 111 | 0.604 | < 0.001 | 0.009127 |
KEGG: Kyoto Encyclopedia of Genes and Genomes; ES: Enrichment score; FDR: False Discovery Rate |
Clinical utility of model 1 and model 2 and nomogram building
Both of the two candidate models showed great prognostic values of patients with ACC according to pervious analyses. In order to distinguish the best prognostic model, we further constructed two nomograms separately according to risk scores of model 1 (Fig. 8A) and model 2 (Fig. 9A) and other significant features in multivariate Cox analysis. Both of the two models showed good potential for clinical application. The model 1 related signature showed high potential except Threshold Probability (Pt) was approximately between 0.75 and 0.82 (Fig. 8E). Signature of model 2 also showed good value for clinical application when 0.00 < Pt < 0.72 or 0.90 < Pt < 1(Fig. 9E). But as for the result of calibration curves (Fig. 8B-D; Fig. 9B-D), the nomogram of model 2 showed better value compared with ideal model and model 1. Especially for nomogram’s 3-year and 5-year OS, the nomogram showed great performance almost as the ideal model did (Fig. 9C-D). Thus, we regarded model 2 (α = 0.6) as the best prognostic model in the present study. Model 2 contained 14 most powerful prognostic markers (ASPM (abnormal spindle microtubule assembly), AURKA (aurora kinase A), CCNB2 (cyclin B2), CDC20 (cell division cycle 20), CENPA (centromere protein A), EXO1 (exonuclease 1), FBXO5 (F-box protein 5), HJURP (Holliday junction recognition protein), KIF2C (kinesin family member 2C), MKI67 (marker of proliferation Ki-67), NUF2 (NUF2 component of NDC80 kinetochore complex), PARPBP (PARP1 binding protein), TACC3 (transforming acidic coiled-coil containing protein 3), and TROAP (trophinin associated protein)), which were selected for further analysis.
A summary of mutations and CNVs of genes in the best model
Among the 92 ACC patients with sequencing data, only ten independent samples existed mutations of genes in the best model (Table 4). As for genetical alteration of the 14 genes, MKI67 altered most (12%) and the main type was mRNA High (Fig. 10). Combined with the relative mRNA expression values of these genes, these genes seemed to highly express when there existed alteration of them (Fig. 10). Among the 90 ACC samples with CNV data, CNVs of the fourteen genes were constantly existed (Fig. 11A, Table 5). Among the fourteen genes, PARPBP was the most associated gene, which had the highest frequency of CNV events (77.78%, 70/90) (Fig. 11A, Table 5). Moreover, gain of copy number was the most common CNV event (62.32%, 344/552, Fig. 11B).
Table 4
Mutations of genes in model 2 in ACC patients from TCGA database.
ACC Sample ID | ASPM | AURKA | CDC20 | EXO1 | FBXO5 | HJURP | KIF2C | MKI67 | TACC3 |
TCGA-PK-A5HB | I1717* | | | | | | | | |
TCGA-OU-A5PI | Q2415H | | | | | | | | |
TCGA-OR-A5K4 | R3198C | | V361I | | | | | | |
TCGA-OR-A5JA | | V27M | | C695F | | | | D2208Y | D709Y |
TCGA-OR-A5JC | | | | | L127Ifs*6 | | | | |
TCGA-OR-A5L3 | | | | | | P459L | | | |
TCGA-OR-A5K9 | | | | | | | T575I | | |
TCGA-OR-A5JP | | | | | | | | R2786Q | |
TCGA-OR-A5LP | | | | | | | | T284N | |
TCGA-OR-A5JG | | | | | | | | P3145H | |
Note: ACC: adrenocortical carcinoma. |
Table 5
Different CNV patterns occur in ACC samples from TCGA database.
| Diploid | Deep deletion | Shallow deletion | Copy number gain | Amplification | CNV sum | Percentage |
ASPM | 63 | 0 | 13 | 13 | 1 | 27 | 30.00% |
AURKA | 37 | 0 | 3 | 49 | 1 | 53 | 58.89% |
CCNB2 | 56 | 0 | 22 | 11 | 1 | 34 | 37.78% |
CDC20 | 55 | 0 | 32 | 2 | 1 | 35 | 38.89% |
CENPA | 58 | 0 | 18 | 14 | 0 | 32 | 35.56% |
EXO1 | 59 | 0 | 15 | 16 | 0 | 31 | 34.44% |
FBXO5 | 54 | 0 | 19 | 16 | 1 | 36 | 40.00% |
HJURP | 59 | 2 | 16 | 13 | 0 | 31 | 34.44% |
KIF2C | 54 | 0 | 32 | 3 | 1 | 36 | 40.00% |
MKI67 | 52 | 0 | 10 | 28 | 0 | 38 | 42.22% |
NUF2 | 63 | 0 | 11 | 14 | 2 | 27 | 30.00% |
PARPBP | 20 | 1 | 6 | 63 | 0 | 70 | 77.78% |
TACC3 | 41 | 0 | 10 | 37 | 2 | 49 | 54.44% |
TROAP | 21 | 0 | 1 | 65 | 3 | 69 | 76.67% |
Relationship between alterations of genes in the best model and the clinical features
Based on the data from TCGA database, six clinical features (gender, age, pathologic stage, T stage, N stage, and M stage) were collected for this analysis. As the result suggested (Table 6), alterations of genes were significantly associated with pathologic stage (Chi-square = 10.644, P = 0.014), T stage (Chi-square = 11.008, P = 0.012), and M stage (Chi-square = 10.687, P = 0.001).
Table 6
Clinical pathological parameters of ACC patients with or without mutation/CNV of genes in model 2.
| | With mutation and/or CNV | Without mutation and CNV | Total number | Chi-square | P-value |
Age | ≥ 65 | 6 | 8 | 14 | 0.200 | 0.655 |
| < 65 | 24 | 50 | 74 | | |
Gender | Female | 19 | 40 | 59 | 0.284 | 0.594 |
| Male | 11 | 18 | 29 | | |
Pathologic stage | I | 3 | 6 | 9 | 10.644 | 0.014 |
| II | 10 | 33 | 43 | | |
| III | 5 | 13 | 18 | | |
| IV | 12 | 6 | 18 | | |
T stage | T1 | 3 | 6 | 9 | 11.008 | 0.012 |
| T2 | 11 | 37 | 48 | | |
| T3 | 3 | 8 | 11 | | |
| T4 | 13 | 7 | 20 | | |
N stage | N0 | 24 | 54 | 78 | 2.195 | 0.138 |
| N1 | 6 | 4 | 10 | | |
| Nx | 0 | 0 | 0 | | |
M stage | M0 | 18 | 52 | 70 | 10.687 | 0.001 |
| M1 | 12 | 6 | 18 | | |
| Mx | 0 | 0 | 0 | | |
Note: With mutation or CNV: Cases have mutant or CNV or mutant + CNV, confirmed through TCGA database. Without mutation and CNV: Cases with neither mutant nor CNV, confirmed through TCGA database. |
The effects of alterations in the fourteen genes on the relative mRNA expression levels
According to the result, copy number gains of ASPM (P < 0.0001), CCNB2 (P = 0.0437), CENPA (P < 0.0001), EXO1 (P = 0.0292), HJURP (P = 0.007), MKI67 (P = 0.0054), NUF2 (P = 0.0004), PARPBP (P = 0.0037), TACC3 (P < 0.0001), and TROAP (P = 0.0024) were significantly associated with higher mRNA expression comparing with these in copy number shallow deletions (or deep deletions, Fig. 12). As for AURKA (P = 0.0768) and FBXO5 (P = 0.1043), gains of copy number led to the trend of higher mRNA expression. There was no copy number gain in CDC20, but samples with shallow deletion were significantly lower expressed compared with the diploid ones (P = 0.0451). As for KIF2C, only one sample existed copy number gain, which perhaps meant that the result might not be convincible.
Correlation between CNVs of the fourteen genes and survival of patients with ACC
In this step, survival analysis was performed to explore the prognostic value of CNVs in the fourteen genes. According to the result, ACC patients with alterations in the fourteen genes had poor OS (P = 7.286E-4, Fig. 13A) and DFS (P = 0.0131, Fig. 13B). Unfortunately, there was no association between patients with CNV or without CNV and OS (P = 0.58, Fig. 13C). Then we separately explored the association between OS in each gene and CNVs. The result demonstrated that shallow deletions in ASPM (P = 0.0170, Fig. 13D), CENPA (P = 0.0071, Fig. 13H), EXO1 (P = 0.0069, Fig. 13I), HJURP (P = 0.0200, Fig. 13K), and NUF2 (P = 0.0450, Fig. 13N) led to better OS of patients with ACC compared these affected by copy number gains. While patients with shallow deletions in AURKA (P = 0.0150, Fig. 13E), MKI67 (P = 0.0039, Fig. 13M), PARPBP (P < 0.0001, Fig. 13O), and TACC3 (P = 0.0016, Fig. 13P) had poorer OS. As for MKI67, patients affected by CNVs (shallow deletion or gain) had poorer OS comparing with these affected by diploid (P = 0.0039, Fig. 13M).
Validation of the fourteen genes in the best model
According to the result, all the classifiers constructed and verified by TCGA-ACC data showed good performance because the accuracy of classification was more than 0.70 (Table 7). Especially the polynomial-SVM-based classifier, its accuracy was 0.9870. And we validated all the classifiers by using GSE19750, GSE76019, and GSE76021. All the classifiers showed good performance in GSE76019 and GSE76021 (accuracy of classification ≥ 0.50, Table 7). But in GSE19750, KNN-based classifier and all the SVM-based classifiers did not play well as we expected (Table 7). Perhaps because only 22 ACC samples from GSE19750 were included for validation. All in all, all the classifiers constructed based on the fourteen genes showed great performance to distinguish dead ACC samples from alive ACC samples, which meant all the fourteen genes were significantly associated with prognosis of patients with ACC.
Table 7
The accuracy of classification of LDA-based classifier, KNN-based classifier, linear-SVM-based classifier, polynomial-SVM-based classifier, RBF-SVM-based classifier, and sigmoid-kernel-SVM based classifier.
| | LDA | KNN | Kappa | Linear-SVM | Kappa | Polynomial-SVM | Kappa | RBF-SVM | Kappa | Sigmoid-kernel SVM | Kappa |
TCGA-ACC | Training set | 0.8701 | 0.8961 | 0.7463 | 0.8831 | 0.7111 | 0.9870 | 0.9694 | 0.9351 | 0.8469 | 0.8182 | 0.5664 |
GSE19750 | Testing set 1 | 0.6364 | 0.2727 | -0.0353 | 0.4091 | 0.0403 | 0.1818 | 0 | 0.2727 | 0.0435 | 0.4545 | 0.0704 |
GSE76019 | Testing set 2 | 0.7647 | 0.7059 | 0.2056 | 0.7059 | 0.3561 | 0.6471 | 0 | 0.7353 | 0.3319 | 0.7647 | 0.5211 |
GSE76021 | Testing set 3 | 0.6207 | 0.7241 | 0.7241 | 0.7931 | 0.5272 | 0.6897 | 0.2162 | 0.5862 | -0.0235 | 0.7931 | 0.5272 |
Note: LDA: Linear discriminant analysis; KNN: K-Nearest Neighbor; RBF: radial basis function; SVM: Support Vector Machine. |
Hub genes in the best model
All the fourteen genes from the best model were included in this PPI network (Figure S5A). As shown in Figure S5B, the degrees of six genes (ASPM, AURKA, CCNB2, CDC20, KIF2C, and NUF2) were higher than any other of the remaining nine genes (Degree = 13). Thus, the five genes were considered as hub genes in the best model. Expression of ASPM was significantly related to CCNB2, CDC20, FBXO5, HJURP, MKI67, NUF2, PARPBP, TACC3, and TROAP (Figure S5C). Among them, MKI67 was the most related gene (Figure S5D). Expression of AURKA was significantly associated with FBXO5, HJURP, MKI67, PARPBP, TACC3, and TROAP (Figure S5C). Moreover, PARPBP was associated with AURKA best (Figure S5D). Expression of CCNB2 was relevant to ASPM, CDC20, CENPA, FBXO5, HJURP, PARPBP (Figure S5C) and PARPBP was the most associated gene with CCNB2 (Figure S5D). Expression of CDC20 was associated with CCNB2, ASPM, FBXO5, HJURP, KIF2C, MKI67, PARPBP, TROAP (Figure S5C) and the most related gene was HJURP (Figure S5D). The expression of NUF2 was significantly related to HJURP, FBXO5, CENPA, CDC20, AURKA, and ASPM (Figure S5C). Among them, ASPM was the most associated gene (Figure S5D).