Identification of the signature for distinguishing ADC from SCC
Figure 1 describes the flowchart of this study. First, from the 20,283 genes detected in the GSE30219 dataset (Table 1), we extracted 10,474 DE genes between the 85 pADC samples and the 14 normal controls, and 14,533 DE genes between the 61 pSCC samples and the 14 normal controls (SAM, FDR < 0.05). Interestingly, we found 295 genes that were DE genes in both the pADC and pSCC samples but with opposite dysregulated directions in the two types of samples when compared with the normal controls, and defined them as the subtype-opposite genes. Similarly, from the 20,283 genes detected in the GSE18842 dataset (Table 1), we extracted 9,281 DE genes for the 14 pADC samples and 13,141 DE genes for the 31 pSCC samples when compared to the 45 normal controls (SAM, FDR < 0.05). And, 481 subtype-opposite genes were identified in this dataset. Notably, all the 148 overlapped subtype-opposite genes between the two datasets had consistently dysregulated directions in both pADC and pSCC samples, compared with the normal controls, respectively. Given that a dataset may usually capture only a part of all DE genes due to insufficient statistical power [27, 28], we integrated together the subtype-opposite genes extracted from the two datasets, excluding the 133 genes that were subtype-opposite genes in one dataset but had inconsistent dysregulation directions (without statistical control) in the other dataset. Finally, we obtained 495 subtype-opposite genes to develop the qualitative transcriptional signature for distinguishing ADC from SCC. Then, we utilized the subtype-opposite genes to develop a qualitative transcriptional signature for distinguishing ADC from SCC. In the training data integrated from two microarray datasets (GSE30219 and GSE18842), including 99 pADC samples and 92 pSCC samples, from 122,265 gene pairs consisting of the subtype-opposite genes, we extracted 61,602 gene pairs with potentially subtype-opposite REO patterns (Ea > Eb in pSCC or equally Eb > Ea in pADC) occurring significantly more frequently in pSCC samples than in pADC samples (Fisher’s exact test, FDR < 0.05). Next, for each subtype-opposite gene pair, we calculated the apparent accuracy of the gene pair for distinguishing ADC from SCC in the training data, as the pathological assessments are not 100% reliable [29]. Finally, using each of the top 50 subtype-opposite gene pairs (Table S2) as a seed, we performed a forward selection procedure and obtained 50 optimal sets of gene pairs (see Methods), among which two sets reached the highest apparent accuracy (98.43%). One set contained only one gene pair, KRT5 and AGR2, as the addition of any other gene pair did not increase the apparent accuracy. The other set, consisting of two gene pairs, also contained the gene pair (KRT5 and AGR2), indicating that this gene pair had the optimal performance. Therefore, the gene pair, KRT5 and AGR2, was selected as the signature for distinguishing ADC from SCC. The classification rule of the signature is that a sample was classified as SCC if the mRNA expression level of KRT5 was higher than that of AGR2; otherwise ADC. According to the classification rule, two of the 61 pSCC samples in the GSE30219 dataset and one of the 31 pSCC samples in the GSE18842 dataset were reclassified as ADC and all the 99 pADC samples in the two datasets were confirmed by the signature.
Krt5 and Agr2 proteins immunostaining in pADC and pSCC
Immunohistochemical analysis of the Krt5 and Agr2 proteins was performed for 96 pADC samples and 80 pSCC samples, derived from Anenabio, Xi'an, China. The IHC results for Krt5 and Agr2 proteins are shown in Figure 2A. For the 96 pADC samples, Agr2 protein was highly expressed in 63 (65.63%) samples, while Krt5 protein was only highly expressed in 7 (7.29%) samples (Fig. 2B). In contrary, for the 80 pSCC samples, Krt5 protein was highly expressed in 43 (53.75%) samples, while Agr2 protein was only highly expressed in 8 (10.00%) samples (Fig. 2C). The results suggested that Krt5 protein was mainly expressed in pSCC samples, while Agr2 protein was mainly expressed in pADC samples. The representative IHC staining of Krt5 and Agr2 proteins in pADC and pSCC samples are represented in Figure 2D and 2E, respectively. The results provided the biological evidences of the signature in distinguishing ADC from SCC. However, the IHC analysis also showed that 6 (6.25%) pADC samples and 2 (2.50%) pSCC were highly expressed of both Krt5 and Agr2 proteins, and 12 (12.50%) pADC and 13 (16.25%) pSCC samples were low expressed of both Krt5 and Agr2 proteins, suggesting the limitation of IHC of immunomarkers in distinguishing ADC from SCC.
Validation of the signature
First, we tested the signature on two datasets (GSE19188 and E-MTAB-2435) with relative unambiguous NSCLC types, which were concordantly determined by two independent routine pathologists. In the GSE19188 dataset with 45 pADC and 27 pSCC samples, the apparent accuracy of the signature for pADC (sensitivity) was 93.33%, the apparent accuracy for pSCC (specificity) was 96.30%, and the overall apparent accuracy was 94.44% (Table 2). Similar, in the E-MTAB-2435 dataset, the apparent accuracy of the signature for 63 pSCC samples (specificity) was 98.41% (Table 2). Additionally, in the two test datasets, we also compared our signature with the other 49 optimal sets of gene pairs obtained from the training data, and found our signature (KRT5 and AGR2) had the optimal performance (Table S2), suggesting the robustness of our signature in distinguishing ADC from SCC.
Since the histological classification of NSCLC in the other test datasets were not mentioned whether they were confirmed by independent pathologists or performed additional detection, we calculated the apparent accuracy of the signature and performed several biological analyses to indirectly support the reclassification of our signature. Firstly, based on the knowledge that SCC patients suffer poorer prognoses than ADC patients [30], we evaluated the correctness of the reclassification by our signature through survival analyses. For this purpose, we integrated 7 datasets recording survival information of patients treated with curative surgery resection only, including 805 pADC samples and 125 pSCC samples. In the integrated dataset, the apparent sensitivity (pADC prediction) of the signature was 95.78% and the apparent specificity (pSCC prediction) was 88.00% (Table 2). Notably, the signature reclassified a total 34 (4.22%) pADC samples as SCC and a total 15 (12.00%) pSCC samples as ADC. The survival analyses showed that the 34 pADC patients reclassified as SCC had significantly shorter OS than the remained 771 signature-confirmed pADC patients (log-rank p = 0.0123, HR = 1.89, 95% CI = 1.14-3.14, Fig. 3A), whereas the 15 pSCC patients reclassified as ADC showed longer OS than the 110 signature-confirmed pSCC patients but without significantly difference (log-rank p = 0.5538, HR = 1.32, 95% CI = 0.52-3.34, Fig. 3B). Multivariate Cox analysis showed that the pADC patients reclassified as SCC also had significantly shorter OS than the signature-confirmed pADC patients (p = 0.0458, HR = 1.72, 95% CI = 1.01-2.93, Table 3), after adjusting for data centers and clinical parameters, including stage, age and gender. The multivariate results for data centers and clinical parameters are displayed in Table 3. Notably, the 144 SCC patients classified by our signature had significantly shorter OS than the 786 ADC patients classified by the signature (log-rank p= 0.0012, HR = 1.60, 95% CI = 1.20-2.12, Fig. 3C), which was more significant than the OS difference between the original pSCC and pADC groups (log-rank p= 0.0249, HR = 1.42, 95% CI = 1.04-1.93, Fig. 3D). The OS between the two histological subtypes classified by our signature remained significantly different (p = 0.0500, HR = 1.36, 95% CI = 1.00-1.85, Table 4) after adjusting for data centers and clinical parameters. Furthermore, in order to reduce the potential bias due to integration and truncation of survival time, we removed one dataset from the integrated data in turn and performed the survival analyses for each new integrated data. All the results showed that the OS differences between the two histological groups classified by our signature were more significant than that between the original histological groups (Fig. S1). The above results suggested that the signature could rectify some misclassifications by routine pathological assessment which confused the survival difference between the two histological subtypes. Besides, in the GSE50081 dataset with the highest reclassification rate (12.35%) in the integrated dataset, we analyzed the proliferative activities of the reclassified samples by calculating their proliferation scores. The results showed that the 15 pADC samples reclassified as SCC had significantly higher proliferation scores than the signature-confirmed pADC samples (Wilcoxon rank sum test, p = 0.0085, Fig. 3E), indicating that the pADC samples reclassified as SCC are more proliferative than the signature-confirmed pADC samples which may cause worse prognoses. While the 6 pSCC samples reclassified as ADC had lower proliferation scores than the signature-confirmed pSCC samples though the difference was not significant possibly due to the small sample size (Wilcoxon rank sum test, p = 0.1298, Fig. 3E). Next, we also performed differential expression analyses for the subtype-specific marker genes using the RankProd (RP) algorithm. The result showed that the 15 pADC samples reclassified as SCC had significantly increased mRNA expression of the two SCC marker genes (RP algorithm, KRT5: p < 0.0001; TP63: p < 0.0001, Fig. 3F), and decreased mRNA expressions of an ADC marker gene (RP algorithm, NAPSA: p < 0.0001, Fig. 3F) than the signature-confirmed pADC samples. In contrast, the 6 pSCC samples reclassified as ADC had significantly increased mRNA expression of an ADC marker gene (RP algorithm, NAPSA: p < 0.0001, Fig. 3F), and decreased mRNA expressions of the two SCC marker genes (RP algorithm, KRT5: p < 0.0001; TP63: p < 0.0001, Fig. 3F), respectively, when compared with the signature-confirmed pSCC samples. The mRNA expressions of three neuroendocrine marker genes were in very low level in all the samples. Finally, based on the mRNA expression measurements of top 1000 most variable genes in the GSE50081 dataset, the samples were optimally classified into two subgroups (k=2) using consensus clustering (Fig. S2). The clustering result showed that 10 of the 15 pADC samples reclassified as SCC were clustered with the signature-confirmed SCC samples and 5 of the 6 pSCC samples reclassified as ADC were clustered with the signature-confirmed ADC samples (Fig. S2). Similar clustering results were observed for top 2000 and 3000 most variable genes (Fig. S3, Fig. S4).
Our previous study has demonstrated that REOs of gene pairs were highly stable in FFPE specimens with partial RNA degradation [18]. Here, we applied the signature to the GSE44170 dataset derived from FFPE specimens, and found the apparent accuracy of the signature for 38 pSCC samples (specificity) was 92.11% (Table 2). The 3 (7.89%) reclassified as ADC samples had lower proliferation scores than the signature-confirmed pSCC samples though the difference was not significant possibly due to the small sample size (Wilcoxon rank sum test, p = 0.3193, Fig. 4A). Moreover, the 3 reclassified samples had (marginally) significantly decreased mRNA expression of KRT5 (RP algorithm, p = 0.0765, Fig. 4B) and TP63 (RP algorithm, p = 0.0033, Fig. 4B), respectively, than the signature-confirmed pSCC samples. Consensus clustering was not performed as the dataset contains one subtype.
In another research, we also demonstrated that the REOs of gene pairs were robust against tumor cell variations in mixed tumor specimens [19]. Thus, we applied the signature to mixed tumor samples with 10% ~ 100% tumor cells in the TCGA datasets. In the TCGA-ADC dataset, the apparent accuracy of the signature for 498 pADC samples (sensitivity) was 97.59% (Table 2), which were not significantly related with tumor cell proportions (Spearman's rank correlation, p = 0.9331). Here, the signature reclassified 12 (2.41%) pADC samples as SCC, whose proliferation scores were significantly higher than that of the signature-confirmed pADC samples (Wilcoxon rank sum test, p = 0.0211, Fig. 4C). And, these reclassified samples had significantly increased mRNA expressions of KRT5 (RP algorithm, p < 0.0001, Fig. 4D) and TP63 (RP algorithm, p < 0.0001, Fig. 4D), and decreased mRNA expression of NAPSA (RP algorithm, p = 0.0126, Fig. 4D), respectively, when compared with the signature-confirmed pADC samples. Similarly, in the TCGA-SCC dataset, the apparent accuracy of the signature for the 499 pSCC samples (specificity) was 83.57%, which were also not significantly related with tumor cell proportions (Spearman's rank correlation, p = 0.8886). The 82 pSCC samples were reclassified as ADC by the signature and their proliferation scores were significantly lower than the signature-confirmed pSCC samples (Wilcoxon rank sum test, p < 0.0001, Fig. 4E). Comparing with the signature-confirmed pSCC samples, the 82 reclassified samples had significantly increased mRNA expression of NAPSA (RP algorithm, p < 0.0001, Fig. 4F) and decreased mRNA expressions of KRT5 (RP algorithm, p < 0.0001, Fig. 4F) and TP63 (RP algorithm, p < 0.0001, Fig. 4F), respectively.
Previously, we have reported that the REOs of gene pairs were also robust to low-input RNA specimens. Thus, we next tested the performance of the signature in the GSE58661 dataset for small biopsy specimens, including 42 pADC and 36 pSCC samples. The results that the apparent sensitivity (pADC prediction) and specificity (pSCC prediction) were 95.24% and 88.89%, respectively (Table 2). The 4 (11.11%) pSCC samples reclassified as ADC had marginally lower proliferation scores than the signature-confirmed pSCC (Wilcoxon rank sum test, p = 0.0501, Fig. 5A). Although the 2 pADC samples reclassified as SCC had lower proliferation scores, they had significantly increased mRNA expression of KRT5 (RP algorithm, p < 0.0001, Fig. 5B) and TP63 (RP algorithm, p < 0.0001, Fig. 5B) than the signature-confirmed pADC samples, indicating the correctness of the reclassification for the 2 samples. In contrast, the 4 pSCC samples reclassified as ADC had significantly increased mRNA expression of NAPSA (RP algorithm, p < 0.0001, Fig. 5B), and decreased mRNA expressions of KRT5 (RP algorithm, p < 0.0001, Fig. 5B) and TP63 (RP algorithm, p < 0.0001, Fig. 5B), than the signature-confirmed pSCC samples. Additionally, consensus clustering based on the top 1000 most variable genes showed that all the 4 pSCC samples reclassified as ADC by the signature were clustered with the signature-confirmed ADC samples (Fig. S5). Similar clustering results were observed for top 2000 and 3000 most variable genes (Fig. S6, Fig. S7).
Finally, we also evaluated the performance of the signature for poorly differentiated specimens. In the GSE94601 dataset, the 19 and 4 poorly differentially samples initially assigned to the LCC subtype were reclassified to pADC and pSCC with the positive expressions of ADC markers (NAPSA/TTF1) and SCC markers (Krt5/P40), respectively. The apparent sensitivity (pADC prediction) and specificity (pSCC prediction) of the signature for poorly differentially samples were 100% and 50%, respectively (Table 2). The proliferation scores of the two pSCC reclassified as ADC samples were 8.82 and 8.83, respectively, which were similar with the median proliferation scores (8.89) of the 19 signature-confirmed pADC samples but lower than that (9.67 and 9.06) of the two signature-confirmed pSCC samples though the difference was not significant possibly due to the small sample size (Wilcoxon rank sum test, p=0.3333, Fig. 5C). Additionally, we also performed differential expression analysis for the 44 proliferation-related genes [31] and found 20 genes that were differentially expressed in the 2 reclassified samples (RP algorithm, p < 0.05), and all the 20 genes were down-regulated (Fig. 5D), when compared with the two signature-confirmed pSCC samples. The results provide tentative evidences for the correctness of the reclassification by our signature. Consensus clustering was not performed as the small sample size of pSCC.
Taken together, the above results indicated that the signature could accurately distinguish ADC from SCC, especially for the clinical challenging cases.