The construction of gene co-expression networks
The co-expression network built by the WGCNA package was defined as undirected and weighted gene networks. In this network, the node was associated with gene expression and the edge was related to pairwise correlations of gene expression. Moreover, the patient clinical information also corresponded to the network. After removing the obvious outliers, the samples from 2 microarrays were clustered based on the first 5000 genes with the smallest deviation and both results are shown in Figure S1. In order to emphasize high correlations and understate low correlations, the absolute value of the correlation to a power (soft thresholding) was raised. The soft threshold selected with the recommended index (R2 > 0.9) was 3 and 4, respectively, in the LSC and LAD dataset (Figure S2A&C). Figure S2B&D revealed the mean connectivity under different soft thresholds. For the LSC dataset, the mean connectivity was 34.6, and for the LAD dataset, the mean connectivity was 19.3. After determining the connectivity, the selected soft threshold was checked and the R2 was 0.92 and 0.97, respectively, which indicated that the selected soft threshold was available (Figure S3). Then, the expression profiles of selected genes from the LSC and LAD dataset were re-calculated and defined be module eigengenes of various modules. The parallel module was merged with 75% similarity. However, in our study, there was no module being merged. Figure 1A&B shows a cluster dendrogram and the merged module, in which each black line presents a gene and each color represents a module. The similarity of the 2 nodes in the commonality of the connected nodes is reflected by their topological overlap (Figure 1C&D).
Identification of clinical trait related modules
One of the advantages of WGCNA is that the classified module can be linked to the clinical trait. Clinical traits such as age, sex, smoking history, survival and tumor stage, were introduced in this investigation. The correlation coefficient of clinical traits and modules was shown in Figure 2. The smoking related module was regarded as the main research object in the next study. The turquoise module (correlation coefficient = 0.32, P = 7e-04) and purple module (correlation coefficient = -0.26, P = 0.008) from LAD were positively and negatively correlated, respectively, with smoking history, while in the LSC dataset, the purple module (correlation coefficient = 0.17, P = 0.048) and salmon module (correlation coefficient = -0.37, P = 1e-05) were positively and negatively correlated, respectively, with smoking history. In Figure 3, the module that had a positive or negative correlation with smoking as well as module genes are shown, where nodes with a deeper red color and a larger size indicate that the gene had more related genes, and edges with a deeper red color as well as a thicker line represents that the gene pairwise had a stronger correlation. The external cycle demonstrated that 20 genes which had more related genes and the more edges with other genes. Gang Liu et al. found that cigarette smoke could reduce the expression of immune regarding genes [20]. Hence, the association of smoking-related module with immune gene set was observed, which discovered that only the purple module in LSC had significantly correlation with the immune gene set (P = 0.0001). Nine immune genes (IGHD, IGHM, IGKC, IGLL3P, GUSBP11, MZB1, CD27, CD38 and CD79A) in purple module had been pointed out with red arrows. The large majority of nine immune genes from smoking module (purple module) are involved in the regulatory of B cells memory (7/9) and B cell naïve (6/9) (Table S1). Moreover, the modules that had a positive correlation with age, sex, survival and tumor stage and module genes are also shown (Figure S4).
KEGG pathways and GO enrichment analysis
In order to know more information about each module and to contribute to screen hub genes, KEGG pathways and GO enrichment analysis were performed. Figure 4&S5 exhibit the pathways and GO function enriched by the smoking corresponding modules of LSC and LAD, in which the same color indicates pathways or functions involved in the same group, and the size of the nodes reflects the enrichment significance of the terms. Most of immune gene (IGHD, IGHM, IGKC, MZB1, CD27, CD38 and CD79A) mentioned above had been pointed out by a red arrow in the network. And they were all involved in positive regulation of B cell activation function (Figure 4B). Besides, pathways and functions enriched in LSC and LAD could also confirm the result mentioned above. In the LSC samples, there were functions and pathways regarding B cell being enriched, such as positive regulation of B cell activation and B cell receptor signaling pathway (Figure 4B&C), while in LAD, no immune related pathways and functions were presented (Figure S5). Moreover, the enriched pathways and functions of other clinical trait related modules are shown in Figure S6-S7.
Prognostic analysis of the immune genes
The correlation coefficient of nine immune gene expressions was calculated and the result showed that the expression of them had high association with the others (Figure 5). The prognosis value of nine immune genes corresponding to smoking (CD27, CD38, CD79A, MZB1, IGHM, IGKC, IGLL3P, GUSBP11, and IGHD) was estimated, which revealed that high expression of them had better overall survival rate (Figure 6, P = 0.0076, 0.015, 0.0059, 0.042, 0.013, 0.04, 0.019, 0.0099, and 0.0056, respectively). The confidence interval of each gene was shown in Table 1. To confirm the result, overall survival analysis of nine genes in another microarray was performed. High expressions of two third genes (CD27, CD38, GUSBP11, IGHD, IGKC, and IGLL3P) were still present significant correlation with overall survival with P value as 0.041, 0.04, 0.032, 0.043, 0.047 and 0.048, respectively (Figure 7) and the survival curves of the remaining genes with no significant P-value were shown in Figure S8.
The prognosis related miRNA-mRNA pair
To explain the mechanism of the immune genes involved in prognostic outcomes, the miRNA cluster related smoking exposure was obtained. The result of WGNCA analysis showed that miRNAs in turquoise module was highly associated with tobacco smoking (P = 0.012, correlation coefficient = 0.46, Figure 8A). The targeted miRNAs of nine prognosis related immune gene was predicted on miRanda and shown in Figure 8B. The binding site between miR-29a and IGHD was presented in Figure 8C. The patients with low expression of miR-29a had better OS (Figure 8D, logrank P = 0.018), while based on TCGA dataset, people with miR-29a-5p low expression also present better OS (Figure 8G, logrank P = 0.022, HR = 0.67). Given that miRNA and mRNA had inhibitory regulatory relationship, the targeted miRNA-mRNA pairs had opposite expression in tumor. After observing, only IGHD and miR-29a-5p in SCLC tumor had inverse expression (Figure 8D-8E). Further, IGHD related plasma cell was also associated with OS, which indicated low infiltration of plasma predicted better OS (Figure 8H, logrank P = 0.03, HR = 0.73).