Co-expression gene modules form WGCNA
R package of “WGCNA” was used to construct co-expression network. All genes were included in cluster analysis to generate gene modules. We chose the power of β =6 (scale free R2 =0.92) as the soft thresholding to guarantee scale-free network (Fig.1A). The result of hierarchical clustering analysis is presented in Fig.1B. We merged some modules whose height is lower than 0.25, because these modules are not considered to be capable of acting as a standalone module. Finally, we identified 25 modules ranging from 845 genes (Turquoise) to 40 genes (White) in size (Fig.2A). Meanwhile, genes not classified into any one module were included in an insignificant module (Grey), and it not belongs to any other 25 modules. Interactions of the 25 co-expression modules were analyzed (Fig.2B), light color represents low overlap while darker red color indicates higher overlap.
Gene co-expression modules associated with clinic traits
Download clinic traits dataset from TCGA database. Associations between clinic traits we were interested in and modules were identity based on correlation between modules eigengene and the clinic traits (Fig.2C). We investigated correlation coefficient and P value between each module and each clinical feature to select modules we are interested in. As we chose P<0.01 as a significant threshold, results showed that there are 3 modules (pink, purple, turquoise) were positively correlated to both survival time and survival state at the same time. Furthermore, the dendrogram and heatmap was used to quantify groups of correlated eigengenes(Fig.3).
Functional enrichment analysis
GO enrichment analysis was conducted in 3 positively correlated modules (Fig.4). Genes in pink module mainly enriched in regulation of mRNA metabolic process, RNA splicing, nuclear speck, spliceosomal complex, catalytic step 2 spliceosome and pre-mRNA binding. Genes in purple module mainly enriched in nuclear chromosome segregation, DNA replication, chromosome segregation, mitotic nuclear division, condensed chromosome, chromosomal region, chromosome, centromeric region and DNA-dependent ATPase activity. Genes in turquoise module mainly enriched in membrane docking, organelle localization by membrane tethering, RNA splicing, golgi vesicle transport, nuclear speck, SNARE complex, SNAP receptor activity and spliceosomal complex.
Univariate Cox proportional hazards regression and LASSO regression analysis
For next step to establish a prognosis model to predict the outcome of ccRCC patients, the genes inside these 3 modules were included in univariate Cox proportional hazards regression. In order to improve the robustness and eliminate collinearity between genes as much as possible, we select LASSO regression method to process OS correlated genes data of three modules. When the partial likelihood deviance reached its minimum, remove the gene if its coefficient is 0. In pink module, the log (Lambda) reached its minimum of -3.45 (Fig.5A). In purple module, the log (Lambda) reached its minimum of -3.05 (Fig.5B). In turquoise module, the log (Lambda) reached its minimum of -3.175 (Fig.5C).
Construction of signature
The application of LASSO reduces the dimension of multivariable Cox regression analysis to an acceptable level. We merged all the correlated genes of 3 modules, and put them into multivariable Cox regression analysis. The regression coefficients from the multivariable Cox analysis were used to construct ScoreW and develop a prognostic model to predict OS. 8 genes (CAPRIN2, coefficient =1.16E-05; IFI44, coefficient = 1.33E-06; LTV1, coefficient =3.10E-06; ZNF320, coefficient =-3.85E-06; MTHFR, coefficient =-5.75E-06; XPOT, coefficient =2.02E-06; BCL3, coefficient =8.62E-07; PAX2, coefficient =7.89E-07) were included in the signature. The information of 8 genes was showed in Table 1. The K-M survival curves of 8 genes were plotted in Fig.6, the high expression of 6 genes (CAPRIN2, IFI44, LTV1, MTHFR, XPOT and BCL3) can significantly influence OS of ccRCC patients. We used the median score value as the cutoff value to classify patients into low or high risk. The ScoreW distribution, gene expression, survival status of each patient was ranked by the ScoreW values of prognostic signature (Fig.7A). K-M curve was plotted to indicate the correlation between ScoreW and survival time of patients with ccRCC(P<0.001, Fig.7B). The time-dependent ROC curve suggest a considerable effective performance of the signature for OS prediction (P<0.05, AUC=0.796, Fig.7C). Correlation between ScoreW and clinical traits was showed in Fig.7D. Results indicated that higher ScoreW often appeared in the advanced cancer stage, higher ScoreW means the worse status of patients.
We bring ScoreW as one element which was similar to other clinic traits (age, gender, histological grade, and pathological stage) in univariable and multivariable Cox analysis. The results showed that predictive ability of ScoreW has significance in univariable and multivariable Cox analysis for OS of ccRCC patients(Table 2). The results of univariable analysis indicate ScoreW was significantly associated with OS[HR=1.089(1.060-1.118), P<0.001, Fig.7E]. And the results of multivariable analysis indicate that the ScorW can be an independent predicting factor from other clinical elements[HR=1.054(1.032-1.076), P<0.001, Fig.7F]. The details of each element by time-dependent ROC analysis were presented in Fig.7G.
Only 6 genes (CAPRIN2, IFI44, LTV1, MTHFR, XPOT and BCL3) were significant with clinical pathological parameters (Fig.8). Results indicated CAPRIN2, IFI44, LTV1, XPOT and BCL3 were the risky genes for procession of ccRCC while MTHFR was a protective gene.