Differential analysis of gene expression data
After removing batch effects and normalizing the data, differential expression analysis of COPD vs. healthy samples identified 918 probes from SAE, 1942 probes from lung tissue, 134 probes from blood, 1074 probes from alveolar macrophages, and 5768 probes from sputum samples as differentially expressed (adjusted p-value<0.0001 or |fold change|>2) (Table 1).
Table 1. The summarized data indicating the primary, qualified, and differentially expressed probes in each biological sample.
| Biological samples | GEO_ID | Platform | GOLD Stage (Number) | Samples | N. probes | N. differential probes |
|---|---|---|---|---|---|---|
| Lung tissue | GSE47460 | GPL6480, GPL14550 | Healthy (108), II (100), III (37), IV (83) | 328 | 42,545 | 1,942 |
| Small Airway Epithelial cells (SAE) | GSE20257 | GPL570 | Healthy (112), I (9), II (12), III (2) | 135 | 54,675 | 932 |
| Blood samples | GSE54837 | GPL570 | I (88), II (70), III (55), IV (13) | 226 | 54,675 | 134 |
| Macrophages | GSE13896 | GPL6947 | Healthy (54), I (8), II (3) | 65 | 54,675 | 1,074 |
| Sputum | GSE22148 | GPL570 | II (71), III (59), IV (13) | 143 | 54,675 | 5,768 |
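The selection rule stated above (adjusted p-value<0.0001 or |fold change|>2) can be sketched as a small filter. This is only an illustration: the text does not say which multiple-testing correction was used, so the Benjamini-Hochberg adjustment here is an assumption, and the function names and the `(probe_id, raw_p, fold_change)` tuple layout are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the source does not name its adjustment method.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_probes(probes, p_cut=1e-4, fc_cut=2.0):
    """Keep probes with adjusted p < p_cut OR |fold change| > fc_cut.

    probes: list of (probe_id, raw_p, fold_change) tuples (hypothetical layout).
    """
    adjusted = benjamini_hochberg([p for _, p, _ in probes])
    return [
        probe_id
        for (probe_id, _, fold_change), q in zip(probes, adjusted)
        if q < p_cut or abs(fold_change) > fc_cut
    ]
```

Because the two criteria are joined by "or", a probe with a large fold change is retained even when its adjusted p-value is unremarkable, which is worth keeping in mind when comparing probe counts across samples.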
Gene selection and prediction
Using elastic-net penalized logistic regression, a total of 33 genes were associated with COPD progression; the AUC, sensitivity, and specificity for each biological sample are reported in Table 2. Statistical comparison of the AUCs showed that the genes selected in SAE cells and in macrophages predicted disease progression significantly better than those selected in the other biological samples. The AUC of the candidate genes in SAE samples, however, did not differ significantly from that in macrophages (p-value=0.478) (Figure 1).
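For readers unfamiliar with the method, the following is a minimal sketch of elastic-net penalized logistic regression for a binary outcome, fitted by proximal gradient descent. It is not the authors' implementation: the solver, the penalty parameterization (`lam`, `alpha`), the learning rate, and all function names are assumptions chosen for illustration, and a real analysis would normally use an established library with cross-validated tuning.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty (shrinks toward zero)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, lr=0.1, n_iter=2000):
    """Minimise mean log-loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).

    X: list of feature rows, y: list of 0/1 labels (hypothetical inputs).
    Returns (weights, intercept); weights shrunk exactly to 0 are "not selected".
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b  # the intercept is not penalised
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term) ...
            step = w[j] - lr * (grad_w[j] + lam * (1 - alpha) * w[j])
            # ... followed by soft-thresholding for the lasso term.
            w[j] = soft_threshold(step, lr * lam * alpha)
    return w, b
```

The lasso part of the penalty is what performs gene selection: coefficients of uninformative features are driven exactly to zero, while the ridge part stabilizes the fit when features are correlated, as expression probes typically are.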
Functional enrichment classified the novel genes into five groups, including "Regulation of CoA-transferase activity", "Vacuole organization", "Dendritic spine organization", and "Cell adhesion molecules". The expression levels of the candidate genes in healthy subjects and at each COPD stage were plotted for all biological samples (Lung Tissue, SAE, Blood, Macrophage, and Sputum) (Figure 2).
Table 2. Probes and the corresponding 33 candidate genes selected by the elastic-net penalized logistic regression model for association with COPD progression.
| Sample type | Gene Symbol | Probe ID | Up/Down regulated | Tissue | Epithelium | Blood | Macrophage | Sputum |
|---|---|---|---|---|---|---|---|---|
| Tissue | CCR4 | A_23_P72989 | Up | 100% | - | - | - | - |
| Tissue | ITK | A_23_P354151 | Up | 84% | - | - | - | - |
| Tissue | RPUSD2 | A_23_P309850 | Down | 82% | - | - | - | - |
| Tissue | RAB11B | A_23_P67748 | Down | 78% | - | - | - | - |
| Tissue | OXNAD1 | A_24_P927189 | Up | 68% | - | - | - | - |
| Tissue | GPR171 | A_23_P253317 | Up | 62% | - | - | - | - |
| Epithelium | BTBD19 | 1557049_at | Up | - | 100% | - | - | - |
| Epithelium | THSD4 | 222835_at | Down | - | 96% | - | - | - |
| Epithelium | PPP4R4 | 233002_at | Down | - | 95% | - | - | - |
| Epithelium | NRG1 | 206343_s_at | Up | - | 90% | - | - | - |
| Epithelium | DNM3 | 209839_at | Up | - | 89% | - | - | - |
| Epithelium | ITGA6 | 201656_at | Down | - | 84% | - | - | - |
| Epithelium | CD109 | 226545_at | Down | - | 77% | - | - | - |
| Epithelium | UHRF1 | 225655_at | Down | - | 76% | - | - | - |
| Epithelium | CST6 | 206595_at | Down | - | 75% | - | - | - |
| Epithelium | EPHB2 | 209589_s_at | Up | - | 70% | - | - | - |
| Epithelium | CDKN2A | 207039_at | Up | - | 70% | - | - | - |
| Epithelium | KIAA1199 | 212942_s_at | Up | - | 67% | - | - | - |
| Epithelium | RGS20 | 210138_at | Down | - | 63% | - | - | - |
| Epithelium | SH3RF2 | 243582_at | Down | - | 62% | - | - | - |
| Blood | MTHFSD | 244734_at | Up | - | - | 100% | - | - |
| Blood | CLEC7A | 1554406_a_at | Up | - | - | 89% | - | - |
| Blood | VCAN | 211571_s_at | Up | - | - | 66% | - | - |
| Macrophage | PTPN4 | 236935_at | Down | - | - | - | 100% | - |
| Macrophage | CCDC37 | 242615_at | Up | - | - | - | 77% | - |
| Macrophage | GABARAPL1 | 208869_s_at | Up | - | - | - | 67% | - |
| Macrophage | ADAMTSL1 | 1552808_at | Down | - | - | - | 64% | - |
| Macrophage | ATOH8 | 1558706_a_at | Down | - | - | - | 64% | - |
| Macrophage | SSBP1 | 202591_s_at | Up | - | - | - | 62% | - |
| Macrophage | SRPX | 204955_at | Up | - | - | - | 61% | - |
| Sputum | CHRFAM7A/CHRNA7 | 210123_s_at | Up | - | - | - | - | 100% |
| Sputum | HSPA4 | 208814_at | Down | - | - | - | - | 63% |
| Sputum | CADM1 | 232767_at | Up | - | - | - | - | 62% |
| AUC (SD) | | | | 0.92 (0.035) | 0.97 (0.039) | 0.73 (0.051) | 0.96 (0.052) | 0.82 (0.064) |
| Sensitivity (SD) | | | | 0.86 (0.073) | 0.89 (0.138) | 0.46 (0.079) | 0.88 (0.222) | 0.62 (0.122) |
| Specificity (SD) | | | | 0.80 (0.097) | 0.93 (0.057) | 0.82 (0.023) | 0.95 (0.064) | 0.82 (0.044) |
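The performance measures reported at the foot of Table 2 can be illustrated with a short sketch for a binary outcome. This is an assumption-laden simplification: the source does not describe how multi-stage progression was scored or which classification threshold was used, so the 0.5 cut-off and the function names below are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0);
    ties count as one half.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity at a fixed threshold (assumed, not from the source)."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 1)
    fn = sum(1 for p, l in zip(predicted, labels) if p == 0 and l == 1)
    tn = sum(1 for p, l in zip(predicted, labels) if p == 0 and l == 0)
    fp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 0)
    return tp / (tp + fn), tn / (tn + fp)
```

The AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is one reason a gene set can show a reasonable AUC together with a low sensitivity.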
To visualize the co-expression patterns of the selected genes among the patients, a heatmap with agglomerative hierarchical clustering was plotted (Figure 3). The co-expression pattern of the selected genes yielded four major clusters in the COPD patients: (OXNAD1, CCR4, ITK, and GPR171), (ADAMTSL1, THSD4, PPP4R4, and ITGA6), (BTBD19, EPHB2, CHRFAM7A, SSBP1, GABARAPL1, ATOH8, PTPN4, MTHFSD, CCDC37, NRG1, CADM1, CLEC7A, and VCAN), and (KIAA1199, DNM3, SRPX, CDKN2A, RPUSD2, RAB11B, HSPA4, RGS20, SH3RF2, CST6, CD109, and UHRF1) (Figure 4). Of these 33 genes, 24 have previously been reported in the literature to be associated with COPD or other lung diseases (Table 3). Among these 24 genes, THSD4, PPP4R4, CDKN2A, CADM1, and NRG1 have previously been identified in GWAS of single nucleotide polymorphisms (SNPs) in COPD and asthma (https://www.ebi.ac.uk/gwas/home) (38-40). The remaining nine genes have not previously been reported in COPD or other lung diseases: RPUSD2, RAB11B, BTBD19, DNM3, SH3RF2, MTHFSD, ATOH8, SRPX, and HSPA4 (Table 3). These genes may represent novel potential biomarkers for the diagnosis and prognosis of COPD. The functional protein interaction network for the selected genes, based on the STRING database, is illustrated in Figure 4.
Table 3. Literature support, based on a PubMed review, for the association of the 33 selected genes with COPD and/or lung function.
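The clustering step behind such a heatmap can be sketched as follows. The source does not state the distance measure or linkage rule, so the correlation distance (1 minus Pearson correlation) and average linkage used here are assumptions, and the function names and input layout are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering of genes; returns sorted lists of gene names.

    profiles: {gene_name: [expression value per patient]} (hypothetical input).
    Distance between genes is 1 - Pearson correlation (assumed).
    """
    genes = list(profiles)
    dist = {
        frozenset((g, h)): 1.0 - pearson(profiles[g], profiles[h])
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }
    clusters = [[g] for g in genes]  # start with every gene on its own

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        total = sum(dist[frozenset((g, h))] for g in c1 for h in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)
```

Stopping the merging at a fixed number of clusters corresponds to cutting the dendrogram at one level; a different linkage or distance choice can change which genes group together.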
| Gene Symbol | Probe ID | Number of studies | Associated diseases | References (PMIDs) |
|---|---|---|---|---|
| CCR4 | A_23_P72989 | 4 | Idiopathic pulmonary fibrosis, lung cancer, lung metastasis in breast cancer | 11590382, 16095529, 17168792, 23915095 |
| ITK | A_23_P354151 | 7 | Idiopathic pulmonary fibrosis, sarcoidosis, allergic lung disease | 16630934, 15323564, 12734350, 1646075, 26628680, 25512530, 24089408 |
| RPUSD2 | A_23_P309850 | 0 | - | - |
| RAB11B | A_23_P67748 | 0 | - | - |
| OXNAD1 | A_24_P927189 | 1 | Lung cancer | 24040438 |
| GPR171 | A_23_P253317 | 1 | Lung cancer | 26760963 |
| BTBD19 | 1557049_at | 0 | - | - |
| THSD4 | 222835_at | 8 | COPD, airway diseases, asthma, pulmonary fibrosis | 27564456, 24286382, 23932459, 23541324, 23409998, 22461431, 21965014, 20010834 |
| PPP4R4 | 233002_at | 1 | COPD | 28170284 |
| NRG1 | 206343_s_at | 23 | Lung cancer, COPD | 31382039, 30988082, 30694715, 30568455, 30268483, 30069312, 29959202, etc. |
| DNM3 | 209839_at | 0 | - | - |
| ITGA6 | 201656_at | 4 | Lung fibrosis, idiopathic pulmonary fibrosis, lung cancer, prostate, head and neck cancers | 31396340, 30936924, 28701000, 27143927 |
| CD109 | 226545_at | 5 | Lung cancer, prostate and breast carcinoma, squamous epithelium, tumour cells | 29113239, 28191885, 24667143, 17922683, 15116102 |
| UHRF1 | 225655_at | 8 | Lung cancer, human ovarian cancer tissues, gastric and breast cancers, metastasis in hepatocellular carcinoma, renal cell carcinoma | 30528265, 30008828, 29516630, 28849055, 27437769, 26695082, 21351083, 20517312 |
| CST6 | 206595_at | 1 | Lung cancer | 24398667 |
| EPHB2 | 209589_s_at | 2 | Allergic rhinitis and asthma, lung disease | 28231727, 10037197 |
| CDKN2A | 207039_at | 28 | Lung cancer, COPD, head and neck cancer, malignant mesothelioma | 30178167, 28487787, 27987577, etc. |
| KIAA1199 | 212942_s_at | 2 | Non-small-cell lung cancer, cancer cells | 30478628, 28901311 |
| RGS20 | 210138_at | 1 | Lung cancer | 29872324 |
| SH3RF2 | 243582_at | 0 | - | - |
| MTHFSD | 244734_at | 0 | - | - |
| CLEC7A | 1554406_a_at | 2 | Pulmonary fibrosis, lung inflammatory response | 27852745, 27473664 |
| VCAN | 211571_s_at | 7 | Lung cancer, breast cancer, lung metastasis, asthma | 27895126, 27581786, 27513329, 25044411, 21742797, 22392539, 23202429 |
| PTPN4 | 236935_at | 1 | Non-small cell lung cancer (NSCLC) | 26951513 |
| CCDC37 | 242615_at | 2 | | 26200272, 22011669 |
| GABARAPL1 | 208869_s_at | 1 | Non-small cell lung cancer (NSCLC) | 26356813 |
| ADAMTSL1 | 1552808_at | 1 | Lung cancer | 29207642 |
| ATOH8 | 1558706_a_at | 0 | - | - |
| SSBP1 | 202591_s_at | 1 | Lung cancer | 28638454 |
| SRPX | 204955_at | 0 | - | - |
| CHRFAM7A/CHRNA7 | 210123_s_at | 5 | Non-small cell lung cancer (NSCLC), lung tumor, smoking-related lung cancers | 25407004, 28283678, 28978081, 31096457, 30282908 |
| HSPA4 | 208814_at | 0 | - | - |
| CADM1 | 232767_at | 18 | Lung cancer, lung tumor, lung epithelial cell apoptosis, lung fibroblasts, lung tumorigenesis | 31069869, 23620770, 22429880, etc. |
Causal pathway of selected candidate genes
In the fitted path diagram of the selected genes (Figure 5), the genes in SAE, lung tissue, and sputum had the most significant direct effects on COPD progression, in that order. In contrast, the genes identified in blood samples had less significant direct and indirect effects on COPD progression. Based on the magnitude of the indirect path coefficients, the novel genes in macrophages, lung tissue, SAE cells, and sputum had significantly larger indirect effects on COPD progression than those in blood samples (Table 4). The goodness-of-fit indices indicated that the model has an acceptable fit (RMSEA=0.059, P-value<0.05; SRMR=0.051).
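As a reminder of how direct, indirect, and total effects relate in a path diagram, the sketch below decomposes effects from a set of path coefficients: the total effect is the sum, over all directed paths from source to outcome, of the product of the coefficients along each path, and the indirect effect is the total minus the direct coefficient. The node names, coefficient values in any usage, and the function name are hypothetical and are not taken from the fitted model.

```python
def path_effects(coef, source, target):
    """Decompose the effect of `source` on `target` in an acyclic path diagram.

    coef: {(from_node, to_node): path coefficient} (hypothetical input).
    Returns a dict with the direct, indirect, and total effects.
    """
    direct = coef.get((source, target), 0.0)

    def total_from(node):
        # Sum of coefficient products over every directed path node -> target.
        if node == target:
            return 1.0
        return sum(
            weight * total_from(nxt)
            for (frm, nxt), weight in coef.items()
            if frm == node
        )

    total = total_from(source)
    return {"direct": direct, "indirect": total - direct, "total": total}
```

For example, with a hypothetical diagram in which a gene set affects progression directly (0.3) and through one mediator (0.5 into the mediator, 0.4 out of it), the indirect effect is 0.5 × 0.4 = 0.2 and the total effect is 0.5.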
Table 4. Direct, indirect, and total effects of selected genes in studied biological samples on COPD progression.