Differential expression analysis of PAX gene family
Transcriptome data for 496 LUSC and 49 adjacent normal tissue samples were obtained from the TCGA database. Because 8 patients lacked prognostic data, 488 cases were included in the analysis. Two GEO datasets (GSE21933 and GSE33479) containing LUSC and adjacent normal tissues were also downloaded.
Compared with normal lung tissue, all members of the PAX gene family except PAX4 and PAX5 showed significantly different expression in LUSC in the TCGA dataset (Fig. 1). In the GSE21933 dataset, PAX2, PAX6 and PAX9 were differentially expressed between LUSC and normal lung tissue (Fig. 2). In the GSE33479 dataset, PAX1, PAX2 and PAX9 were differentially expressed between LUSC and normal lung tissue (Fig. 3). Taken together, these preliminary results indicate that PAX2 and PAX9 showed a consistent expression trend across the TCGA, GSE21933 and GSE33479 datasets.
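The cross-dataset comparison described above amounts to intersecting the per-dataset lists of differentially expressed genes. The following is a minimal Python sketch of that step; the gene lists are transcribed from the paragraph above, while the variable names are illustrative and not taken from the authors' pipeline.

```python
# Genes reported as differentially expressed (LUSC vs. normal) in each dataset,
# transcribed from the text. Variable names are illustrative only.
pax_family = {f"PAX{i}" for i in range(1, 10)}

significant = {
    "TCGA": pax_family - {"PAX4", "PAX5"},  # all members except PAX4 and PAX5
    "GSE21933": {"PAX2", "PAX6", "PAX9"},
    "GSE33479": {"PAX1", "PAX2", "PAX9"},
}

# Keep only the genes that are significant in every dataset.
consistent = set.intersection(*significant.values())
print(sorted(consistent))
```

The intersection contains only PAX2 and PAX9, in line with the conclusion stated in the text.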
Diagnostic efficiency of PAX gene family in LUSC
ROC curve analysis (Fig. 4) showed that PAX3 (AUC = 0.818, P < 0.001) and PAX9 (AUC = 0.900, P < 0.001) had relatively high diagnostic efficiency in the TCGA dataset. In the GSE21933 dataset, PAX6 (AUC = 0.940, P < 0.001) and PAX9 (AUC = 0.960, P < 0.001) showed superior diagnostic accuracy (Fig. 5). We further evaluated the diagnostic value of the PAX gene family in LUSC using the GSE33479 dataset, in which only PAX9 (AUC = 0.833, P < 0.001) showed strong diagnostic ability. PAX9 therefore showed good diagnostic performance consistently across all three datasets.
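For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample. The sketch below computes it that way in Python; the expression values are hypothetical and are not drawn from TCGA or GEO, and the function name is an assumption made for illustration.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, which makes this equal to the
    Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical expression values for illustration only.
tumor = [5.1, 6.3, 7.0, 4.2]
normal = [3.9, 4.5, 2.8]
auc = roc_auc(tumor, normal)
print(round(auc, 3))
```

If a gene is lower in tumor than in normal tissue, the same calculation yields a value below 0.5, and the diagnostic AUC is then usually reported as one minus that value.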
Correlation with clinicopathological factors
To explore the correlation between PAX9 expression and clinicopathological factors, the GSE73403 dataset, comprising 69 LUSC cases, was also obtained from the GEO database. In the TCGA dataset, PAX9 expression was not significantly associated with age, gender, smoking degree, residual resection, TNM stage or pathologic stage (Table 1). In the GSE73403 dataset, only TNM stage was significantly associated with PAX9 expression (P = 0.001, Table 2).
Table 1. Correlation between PAX9 expression and clinicopathological features in the TCGA dataset.
| Variables | Number of cases | PAX9 low | PAX9 high | χ² | P value |
|---|---|---|---|---|---|
| Age (years) | | | | | |
| <70 | 268 | 128 | 140 | | |
| ≥70 | 215 | 114 | 101 | 1.321 | 0.250 |
| Missing | 5 | | | | |
| Gender | | | | | |
| Female | 126 | 63 | 63 | | |
| Male | 362 | 181 | 181 | 0.000 | 1.000 |
| Missing | 0 | | | | |
| Smoking degree | | | | | |
| Low | 231 | 118 | 113 | | |
| High | 245 | 118 | 127 | 0.405 | 0.524 |
| Missing | 12 | | | | |
| Residual resection | | | | | |
| R0 | 390 | 203 | 187 | | |
| R1+R2 | 37 | 15 | 22 | 1.792 | 0.181 |
| Missing | 61 | | | | |
| TNM stage | | | | | |
| Ⅰ+Ⅱ | 327 | 167 | 160 | | |
| Ⅲ+Ⅳ | 88 | 42 | 46 | 0.310 | 0.578 |
| Missing | 73 | | | | |
| Pathologic stage | | | | | |
| 1+2 | 394 | 201 | 193 | | |
| 3+4 | 90 | 41 | 49 | 0.874 | 0.350 |
| Missing | 4 | | | | |
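The text does not name the test behind the χ² column of Table 1. The tabulated values appear consistent with a Pearson chi-square test, without continuity correction, on each 2×2 table of PAX9 low/high counts; that reading is an inference, not a statement from the authors. The Python sketch below applies the formula to the age row and can be used to check the other rows in the same way. The function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Age row of Table 1: <70 has 128 low / 140 high; >=70 has 114 low / 101 high.
chi2, p = chi_square_2x2(128, 140, 114, 101)
print(round(chi2, 3), round(p, 3))
```

For the age row this gives χ² ≈ 1.321 and P ≈ 0.250, matching the tabulated values; the smoking row (118, 113, 118, 127) likewise gives χ² ≈ 0.405.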
Table 2. Correlation between PAX9 expression and clinicopathological features in the GSE73403 dataset.
| Variables | Number of cases | PAX9 low | PAX9 high | χ² | P value |
|---|---|---|---|---|---|
| Age | | | | | |
| <60 | 35 | 20 | 15 | | |
| ≥60 | 34 | 14 | 20 | 1.759 | 0.185 |
| Gender | | | | | |
| Female | 4 | 1 | 3 | | |
| Male | 65 | 33 | 32 | 1.001 | 0.317 |
| Pathologic stage | | | | | |
| Low | 46 | 25 | 21 | | |
| High | 23 | 9 | 14 | 1.421 | 0.233 |
| Smoking | | | | | |
| No | 39 | 19 | 20 | | |
| Yes | 30 | 15 | 15 | 0.011 | 0.916 |
| TNM stage | | | | | |
| I/II | 43 | 14 | 29 | | |
| III/IV | 26 | 20 | 6 | 11.457 | 0.001 |
Survival analysis of PAX gene family in LUSC
In Cox regression of the 488 LUSC patients in the TCGA dataset, with basic clinical characteristics as variables, level of smoking, residual resection, TNM stage (Ⅲ+Ⅳ) and pathologic stage (Ⅲ+Ⅳ) were associated with overall survival (OS) (Table 3). These clinicopathological factors were then included in the Cox proportional hazards model to adjust the association between PAX gene expression and prognosis. As shown in Fig. 7 and Table 4, PAX9 expression was associated with OS in LUSC both before and after adjustment (crude P = 0.007, adjusted P = 0.002). PAX5 was associated with OS only after adjustment (crude P = 0.157, adjusted P = 0.023), whereas PAX4 was no longer significant after adjustment (crude P = 0.017, adjusted P = 0.080). In addition, combined-effect survival analysis of PAX5 and PAX9 expression showed a statistically significant difference between the high-risk and low-risk groups (crude P = 0.007, Fig. 7J; adjusted P = 0.001, Table 5).
Table 3. Basic characteristics of 488 LUSC patients
| Variables | Patients (n=488) | No. of events | MST (days) | HR (95% CI) | P |
|---|---|---|---|---|---|
| Age (years) | | | | | |
| <70 | 268 | 107 | 1655 | 1 | |
| ≥70 | 215 | 103 | 1645 | 1.168 (0.847-1.610) | 0.350 |
| Missing | 5 | | | | |
| Gender | | | | | |
| Female | 126 | 49 | 1856 | 1 | |
| Male | 362 | 161 | 1640 | 1.138 (0.867-1.493) | 0.343 |
| Missing | 0 | | | | |
| Level of smoking | | | | | |
| Low | 231 | 107 | 1874 | 1 | |
| High | 245 | 100 | 1189 | 1.369 (1.040-1.802) | 0.024 |
| Missing | 12 | | | | |
| Residual resection | | | | | |
| R0 | 390 | 169 | 1695 | 1 | |
| R1+R2 | 37 | 16 | 1075 | 1.879 (1.118-3.160) | 0.016 |
| Missing | 61 | | | | |
| TNM stage | | | | | |
| Ⅰ+Ⅱ | 327 | 135 | 1933 | 1 | |
| Ⅲ+Ⅳ | 88 | 47 | 951 | 1.669 (1.196-2.330) | 0.002 |
| Missing | 73 | | | | |
| Pathologic stage | | | | | |
| Ⅰ+Ⅱ | 394 | 159 | 1841 | 1 | |
| Ⅲ+Ⅳ | 90 | 49 | 965 | 1.559 (1.131-2.149) | 0.006 |
| Missing | 4 | | | | |
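The MST column reports the median survival time of each subgroup. The text does not say how it was estimated; a common choice is the Kaplan-Meier estimator, which the Python sketch below assumes. The follow-up times and event flags are hypothetical, and the function name is illustrative.

```python
def km_median(times, events):
    """Kaplan-Meier median survival time.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns the first time at which the estimated survival falls to 0.5
    or below, or None if the curve never gets that low.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            if surv <= 0.5:
                return t
        at_risk -= leaving
    return None


# Hypothetical follow-up data in days, for illustration only.
mst = km_median([100, 200, 300, 400, 500], [1, 0, 1, 1, 0])
print(mst)
```

When more than half of a subgroup is still alive at the end of follow-up, the estimated curve never reaches 0.5 and the function returns `None`, which is the situation in which a median is typically reported as not available.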
Table 4. Prognostic significance of PAX gene expression in LUSC in the TCGA dataset.
| Gene expression | Patients (n=488) | No. of events | MST (days) | Crude HR (95% CI) | Crude P | Adjusted HR (95% CI) | Adjusted P § |
|---|---|---|---|---|---|---|---|
| PAX1 | | | | | | | |
| Low | 244 | 104 | 1655 | | | | |
| High | 244 | 106 | 1679 | 0.977 (0.744-1.282) | 0.867 | 1.003 (0.733-1.373) | 0.985 |
| PAX2 | | | | | | | |
| Low | 244 | 108 | 1485 | | | | |
| High | 244 | 102 | 1713 | 0.932 (0.710-1.223) | 0.611 | 1.057 (0.733-1.445) | 0.730 |
| PAX3 | | | | | | | |
| Low | 244 | 112 | 1485 | | | | |
| High | 244 | 98 | 1841 | 0.886 (0.675-1.163) | 0.383 | 0.869 (0.656-1.223) | 0.488 |
| PAX4 | | | | | | | |
| Low | 244 | 122 | 1154 | | | | |
| High | 244 | 88 | 1975 | 0.716 (0.544-0.942) | 0.017 | 0.755 (0.551-1.034) | 0.080 |
| PAX5 | | | | | | | |
| Low | 244 | 101 | 1984 | | | | |
| High | 244 | 109 | 1470 | 1.217 (0.927-1.598) | 0.157 | 1.444 (1.052-1.981) | 0.023 |
| PAX6 | | | | | | | |
| Low | 244 | 102 | 1470 | | | | |
| High | 244 | 108 | 1713 | 0.965 (0.736-1.267) | 0.799 | 1.056 (0.771-1.446) | 0.735 |
| PAX7 | | | | | | | |
| Low | 244 | 97 | 1915 | | | | |
| High | 244 | 113 | 1426 | 1.197 (0.911-1.571) | 0.196 | 1.229 (0.900-1.680) | 0.195 |
| PAX8 | | | | | | | |
| Low | 244 | 109 | 1841 | | | | |
| High | 244 | 101 | 1640 | 1.147 (0.874-1.506) | 0.322 | 1.107 (0.809-1.514) | 0.525 |
| PAX9 | | | | | | | |
| Low | 244 | 119 | 1315 | | | | |
| High | 244 | 91 | 2160 | 0.686 (0.521-0.902) | 0.007 | 0.606 (0.440-0.835) | 0.002 |
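The hazard ratios, confidence intervals and P values in Table 4 can be cross-checked against one another. Assuming the intervals are Wald intervals computed on the log-hazard scale (the text does not state this), the standard error can be recovered from the interval width and turned into a two-sided P value. The Python sketch below does this for the PAX9 rows; the function name is illustrative.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the Wald z statistic and two-sided P value from an HR and its 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# PAX9, high vs. low expression, values taken from Table 4.
z_crude, p_crude = wald_p_from_ci(0.686, 0.521, 0.902)
z_adj, p_adj = wald_p_from_ci(0.606, 0.440, 0.835)
print(round(p_crude, 3), round(p_adj, 3))
```

Both recovered values agree with the tabulated P values (0.007 and 0.002) to three decimal places, which suggests the PAX9 rows are internally consistent under this assumption.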
Table 5. Combined effect analysis of PAX5 and PAX9 expression on OS in LUSC patients.
| Group | PAX5 | PAX9 | Patients | No. of events | MST (days) | Crude HR (95% CI) | Crude P | Adjusted HR (95% CI) | Adjusted P § |
|---|---|---|---|---|---|---|---|---|---|
| A | Low | High | 137 | 54 | 2803 | 1 | | 1 | |
| B | Low | Low | 107 | 47 | 1426 | 1.239 (0.835-1.840) | 0.288 | 1.405 (0.905-2.182) | 0.13 |
| C | High | High | 107 | 37 | 1841 | 0.990 (0.650-1.506) | 0.962 | 1.139 (0.717-1.809) | 0.581 |
| D | High | Low | 137 | 72 | 1656 | 1.631 (1.144-2.326) | 0.007 | 1.985 (1.340-2.940) | 0.001 |
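The four groups in Table 5 combine the high/low status of PAX5 and PAX9. The Python sketch below shows one way to assign them. The equal group sizes in Table 4 suggest a median cutoff, but the cutoff is an assumption here, and the expression values are hypothetical.

```python
from statistics import median


def combined_group(pax5_high, pax9_high):
    """Map PAX5/PAX9 high-low status to the group labels of Table 5."""
    if not pax5_high and pax9_high:
        return "A"  # PAX5 low, PAX9 high (reference group)
    if not pax5_high and not pax9_high:
        return "B"  # PAX5 low, PAX9 low
    if pax5_high and pax9_high:
        return "C"  # PAX5 high, PAX9 high
    return "D"      # PAX5 high, PAX9 low


# Hypothetical expression values for four patients, for illustration only.
pax5 = [1.0, 2.0, 3.0, 4.0]
pax9 = [4.0, 1.0, 3.5, 0.5]

# Assumed cutoff: the median of each gene across patients.
cut5, cut9 = median(pax5), median(pax9)
groups = [combined_group(a > cut5, b > cut9) for a, b in zip(pax5, pax9)]
print(groups)
```

Group A serves as the reference in Table 5, so the hazard ratios for groups B to D are all expressed relative to patients with low PAX5 and high PAX9 expression.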
Further validation was performed using the GSE73403 dataset (PAX2 data were unavailable). High pathologic stage and smoking were associated with OS in LUSC (Table 6). These clinicopathological factors were then included in the Cox proportional hazards model to adjust the association between PAX9 expression and prognosis. PAX9 expression was associated with OS in LUSC (crude P = 0.011, Fig. 8I; adjusted P = 0.001, Table 7).
Table 6. Basic characteristics of 69 LUSC patients in GSE73403.
| Variables | Patients (n=69) | No. of events | MST (years) | HR (95% CI) | P |
|---|---|---|---|---|---|
| Age | | | | | |
| <60 | 35 | 14 | NA | 1 | |
| ≥60 | 34 | 14 | 3.830 | 0.902 (0.429-1.895) | 0.783 |
| Gender | | | | | |
| Female | 4 | 0 | NA | 1 | |
| Male | 65 | 28 | NA | 22.017 (0.010-42.839) | 0.225 |
| Pathologic stage | | | | | |
| Low | 46 | 15 | 5.080 | 1 | |
| High | 23 | 13 | 2.420 | 2.386 (1.114-5.111) | 0.020 |
| Smoking | | | | | |
| No | 39 | 11 | NA | 1 | |
| Yes | 30 | 17 | 2.420 | 2.463 (1.152-5.265) | 0.015 |
| TNM stage | | | | | |
| Low | 43 | 17 | 3.830 | 1 | |
| High | 26 | 11 | NA | 1.179 (0.550-2.528) | 0.675 |
Table 7. Prognostic significance of PAX gene expression in LUSC in the GSE73403 dataset.
| Gene expression | Patients (n=69) | No. of events | MST (years) | Crude HR (95% CI) | Crude P | Adjusted HR (95% CI) | Adjusted P § |
|---|---|---|---|---|---|---|---|
| PAX1 | | | | | | | |
| Low | 34 | 15 | 5.080 | 1.000 | | | |
| High | 35 | 13 | 3.500 | 0.970 (0.461-2.040) | 0.935 | 1.068 (0.490-2.328) | 0.868 |
| PAX3 | | | | | | | |
| Low | 34 | 14 | 3.830 | 1.000 | | | |
| High | 35 | 14 | 3.500 | 1.481 (0.698-3.141) | 0.299 | 2.081 (0.918-4.715) | 0.079 |
| PAX4 | | | | | | | |
| Low | 34 | 17 | 5.080 | 1.000 | | | |
| High | 35 | 11 | 3.830 | 0.980 (0.453-2.121) | 0.959 | 0.812 (0.354-1.863) | 0.623 |
| PAX5 | | | | | | | |
| Low | 34 | 17 | 3.170 | 1.000 | | | |
| High | 35 | 11 | NA | 0.603 (0.280-1.299) | 0.188 | 0.485 (0.213-1.104) | 0.085 |
| PAX6 | | | | | | | |
| Low | 34 | 15 | 5.080 | 1.000 | | | |
| High | 35 | 13 | 3.830 | 1.216 (0.576-2.567) | 0.605 | 1.270 (0.599-2.690) | 0.533 |
| PAX7 | | | | | | | |
| Low | 34 | 17 | 3.830 | 1.000 | | | |
| High | 35 | 11 | 5.080 | 0.976 (0.454-2.097) | 0.949 | 0.916 (0.421-1.993) | 0.825 |
| PAX8 | | | | | | | |
| Low | 34 | 14 | 3.830 | 1.000 | | | |
| High | 35 | 14 | 3.500 | 1.416 (0.673-2.977) | 0.352 | 1.486 (0.697-3.168) | 0.305 |
| PAX9 | | | | | | | |
| Low | 34 | 19 | 2.580 | 1.000 | | | |
| High | 35 | 9 | NA | 0.370 (0.166-0.824) | 0.011 | 0.240 (0.102-0.565) | 0.001 |
Nomogram
Nomograms were plotted to further evaluate the predictive significance of each variable, including PAX5 and PAX9 expression, smoking degree, residual resection, pathologic stage and TNM stage, in the TCGA dataset (Fig. 9A). The accuracy of the nomograms was evaluated using calibration curves (Fig. 9B-D). Based on their risk scores, patients were divided into high-risk and low-risk groups using an R package (Fig. 9E-F). The heat map of PAX5 and PAX9 expression in the TCGA LUSC dataset is shown in Fig. 9G. According to Fig. 9G, the time-dependent survival curve showed significant prognostic ability (P < 0.001). ROC curves for 1-, 2-, 3-, 4- and 5-year survival in the prognostic model are shown in Fig. 9I.
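A risk score of the kind used to split patients is commonly the linear predictor of a Cox model, that is, a weighted sum of the covariates. The Python sketch below illustrates the idea with two binary covariates only. The weights are the logarithms of the adjusted hazard ratios from Table 4; whether the published model uses these weights, and whether the cutoff is the median, are both assumptions. The clinical covariates are omitted for brevity and the patients are hypothetical.

```python
import math
from statistics import median

# Illustrative weights: log of the adjusted HRs for high vs. low expression (Table 4).
# These are assumptions, not the coefficients of the published nomogram.
coef = {"PAX5_high": math.log(1.444), "PAX9_high": math.log(0.606)}


def risk_score(patient):
    """Linear predictor: sum of weight * covariate, with covariates coded 0/1."""
    return sum(coef[name] * patient.get(name, 0) for name in coef)


# Hypothetical patients, for illustration only.
patients = [
    {"PAX5_high": 1, "PAX9_high": 0},
    {"PAX5_high": 0, "PAX9_high": 1},
    {"PAX5_high": 1, "PAX9_high": 1},
    {"PAX5_high": 0, "PAX9_high": 0},
]

scores = [risk_score(p) for p in patients]
cutoff = median(scores)  # assumed cutoff
labels = ["high" if s > cutoff else "low" for s in scores]
print(labels)
```

Under these weights, high PAX5 with low PAX9 gives the largest score, which is in line with group D carrying the highest hazard ratio in Table 5.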
GO and KEGG analysis of PAX gene family
We next explored the role of the PAX gene family in LUSC. Details of the GO terms and KEGG pathways enriched for the PAX gene family are given in Table S1. The results indicated that the PAX gene family might play a crucial role in transcription and sequence-specific DNA binding, among other functions. The correspondence between biological functions and PAX family members was visualized in a chord diagram (Figure 10A), and the log(P value) of each biological function is shown in a bubble diagram (Figure 10B).
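Enrichment P values of this kind are typically obtained from an over-representation test. The text does not name the tool or test used, so the Python sketch below shows the plain hypergeometric upper-tail test as an assumption; some tools use a more conservative variant. The counts are hypothetical and the function name is illustrative.

```python
from math import comb, log10


def enrichment_p(k, n, K, N):
    """P(X >= k) when X follows a hypergeometric distribution.

    N : genes in the background
    K : background genes annotated to the term
    n : genes in the query list
    k : query genes annotated to the term
    """
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical counts, for illustration only.
p = enrichment_p(k=2, n=4, K=5, N=20)
neg_log10_p = -log10(p)  # a common transformation for bubble plots
print(round(p, 4), round(neg_log10_p, 3))
```

A smaller P value means that the overlap between the query genes and the term is less likely to arise by chance, which is why bubble diagrams usually plot a logarithmic transformation of P.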
Genome‐wide and co‐expression analysis
Genome-wide co-expression analysis was conducted to further investigate the function of PAX9. Genes associated with PAX9 expression were screened using bioinformatics tools. The expression profiles of these genes together with PAX9 are shown in a heat map (Figure 11A); they include KRT13, TMPRSS4, CYP2S1, SOX2, P2RY1, RGMA, GBP6, PITX1, FOXA1 and FBXO34, among others. Genes are marked as positively (red) or negatively (blue) correlated with PAX9 expression, and the absolute value of every correlation coefficient is greater than 0.40. GO terms and KEGG pathways enriched for PAX9 and its co-expressed genes are shown in a chord diagram and a bubble diagram (Figure 11B, C); they include transcription factor activity, sequence-specific DNA binding and positive regulation of transcription from RNA polymerase II promoter. The detailed list is given in Table S2. Gene-gene interactions among PAX9 and related genes are displayed in a network diagram (Figure 11D).
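The screening step can be pictured as computing a correlation coefficient between PAX9 and every other gene and keeping those whose absolute value exceeds 0.40. The text does not say which coefficient was used; the Python sketch below assumes Pearson correlation. The expression vectors and the gene names `GENE_UP`, `GENE_DOWN` and `GENE_FLAT` are hypothetical placeholders.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression values across five samples, for illustration only.
pax9 = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "GENE_UP": [2.0, 4.1, 5.9, 8.2, 9.8],
    "GENE_DOWN": [5.0, 4.2, 2.9, 2.1, 1.0],
    "GENE_FLAT": [3.0, 1.0, 3.0, 1.0, 3.0],
}

# Keep genes whose correlation with PAX9 exceeds 0.40 in absolute value.
selected = {}
for gene, values in candidates.items():
    r = pearson_r(pax9, values)
    if abs(r) > 0.40:
        selected[gene] = r

print(sorted(selected))
```

The sign of each retained coefficient gives the direction of the association, corresponding to the red (positive) and blue (negative) marking described for the heat map.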
GSEA
A comprehensive analysis of genome-wide characteristics was carried out using the GSEA computing platform. To evaluate PAX9-related biological processes, patients were divided into high and low groups according to PAX9 expression in LUSC tissues. The results indicated that PAX9 was mainly associated with the MYC pathway, EIF pathway, MCM pathway, p53 regulation pathway, Wnt canonical pathway, IGF1-mTOR pathway, mTOR pathway, PITX2 pathway, SMAD2/3 pathway, ERK pathway, E2F pathway, FOXO pathway, DNA repair, base excision repair, DNA replication and mismatch repair, among others. Representative GSEA results are listed in the supplementary material (Table S3).
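The core quantity in GSEA is an enrichment score obtained by walking down a ranked gene list and tracking a running sum that rises at genes in the set and falls otherwise. The Python sketch below implements the simplest, unweighted form of that running sum. It is a simplification: the GSEA software by default weights hits by their ranking metric and assesses significance by permutation, neither of which is shown here. The gene identifiers are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes : genes ordered from most associated with the 'high' group
                   to most associated with the 'low' group
    gene_set     : set of genes belonging to the pathway
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up at a pathway gene, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list and gene sets, for illustration only.
ranked = ["g1", "g2", "g3", "g4"]
es_top = enrichment_score(ranked, {"g1", "g2"})
es_bottom = enrichment_score(ranked, {"g3", "g4"})
print(es_top, es_bottom)
```

A positive score indicates that the pathway genes cluster near the top of the ranking, and a negative score that they cluster near the bottom.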