Sample and clinical data
The study included omics data and clinical information of 175 patients in total, including mutation results and CNV results of 175 samples, methylation expression profile data including 175 tumor samples and 10 control samples, and mRNA expression profile data including 175 of the tumor samples and 4 control samples, the miRNA expression profile data included 175 tumor samples and 4 control samples. The analysis process of our study is shown in Fig. 1.
Table1 TCGA pancreatic cancer patient characteristics
characteristics
|
Number(n)
|
Age(years)
<65
|
79
|
>=65
|
96
|
Gender
|
|
Male
|
96
|
Female
|
79
|
Histological type
|
|
Pancreas-Adenocarcinoma-Other Subtype
|
25
|
Pancreas-Adenocarcinoma Ductal Type
|
144
|
Pancreas-Colloid (mucinous non-cystic) Carcinoma
|
4
|
Pancreas-Undifferentiated Carcinoma
|
1
|
Discrepancy
|
1
|
Status
|
|
Alive
|
84
|
Dead
|
91
|
T
|
|
T1
|
7
|
T2
|
24
|
T3
|
139
|
T4
|
3
|
TX
|
1
|
Not Available
|
1
|
N
|
|
N0
|
49
|
N1
|
117
|
N1b
|
4
|
NX
|
4
|
Not Available
|
1
|
M
|
|
M0
|
78
|
M1
|
4
|
MX
|
93
|
Stage
|
|
Stage I
|
1
|
Stage IA
|
5
|
Stage IB
|
15
|
Stage IIA
|
28
|
Stage IIB
|
116
|
Stage III
|
3
|
Stage IV
|
4
|
Not Available
|
3
|
Grade
|
|
G1
|
30
|
G2
|
94
|
G3
|
47
|
G4
|
2
|
GX
|
2
|
Mutant genes and mutation characteristics
Many somatic mutations occur during the development of cancer. A few gene mutations directly promote tumorigenesis and development. These genes were called driver genes. MutSigCV was used to predict mutations in 175 tumor samples. When the significance threshold P<0.05, a total of 991 candidate driver genes were screened, including SMAD4, TP53 and other genes related to cancer development.
Table 2 Results of correlation between gene module characteristics and phenotype
Module
|
Cor
|
P value
|
turquoise
|
-0.292345297
|
7.16E-05
|
black
|
0.226804486
|
0.002265162
|
green
|
0.127502617
|
0.088967995
|
magenta
|
0.095603005
|
0.203004528
|
DNA methylation
The limma package was used to screen the methylation chip data of 175 cancer samples and 10 control samples for differential methylation sites. When the significance threshold was 0.05 and |logFC|>0.3, a total of 1106 differential methylations were screened. The methylation level of 351 methylation sites decreased, and the methylation level of 755 methylation sites increased.
Analysis of mRNA differentially expressed genes and mutations
Differential gene analysis was performed on the mRNA expression profiles of 175 cancer samples and 4 control samples. When the significance threshold was 0.05 and | logFC|>1, a total of 246 differentially expressed genes were screened. The effect of each mutation site on genes was different. We counted the effects of mutations on genes in all samples of each gene. These effects could be divided into the following categories, including frame_shift_del, frame_shift_ins, in_frame_del, missense_mutation, nonsense_mutation, nonstop_mutation, splice_site, translation_start_site. statistical analysis by t test found that the distribution of mutation types was significantly different in frame_shift_del (P=1.56e-12), in_frame_ins (P=3.95e-11), nonstop_mutation (P=6.22e-05) and splice_site (P =0.005), which indicated that SNV has a certain effect on gene expression. Further enrichment analysis was performed on the differential genes and the significant mutation gene set analyzed by MutSigCV. The hypergeometric distribution test showed that the differential genes were significantly enriched into the significant mutation gene set analyzed by MutSigCV and p value was 0.037.
Candidate mRNA gene set and construction of weighted co-expression networks for candidate gene sets
The differentially expressed genes were searched in the pancreatic cancer-related genes searched in the GENE database, the OMIM database, and the KEGG database. The relationship between the genes and differential genes in the database was shown in the Fig. 2a, of which 216 gene were identified in the differential analysis, and these genes might be related to the onset of PC. In order to further expand the scope of the investigation, the 501 genes annotated in the differential methylation site analysis and the 991 significant mutant genes obtained by MutSigCV analysis were added to finally obtain a candidate gene set with a total of 3284 genes. The PC sample expression data of 3284 genes in TCGA were selected for the next analysis of weighted co-expression networks.
The WGCNA software package of R was used to construct a weighted co-expression network for candidate gene sets. To ensure that the network was scale-free, we choosed the optimal β=5 (Fig. 2b). We used average-linkage hierarchical clustering method to cluster genes, and obtained 13 modules in total.
In order to determine the correlation between the genetic module and the disease, calculate the Pearson correlation coefficient of each module and the sample characteristics (cancer or normal) (the higher the module was more important) and the significance P value of the corresponding correlation. The results of the correlation coefficient between gene module characteristics and phenotypes were shown in Table 1. Subsequently, the gene significance (GS) value of each gene module was calculated (Fig. 2c, d). The larger the GS value, the more relevant the module was to the disease. Through correlation analysis and GS value calculation, two modules related to PC were finally selected. These two modules were turquoise and black, respectively.
Based on the expression relationship of the genes in the two co-expression modules analyzed above, we construct a co-expression network for each module separately. In the end, we selected the top10 gene as the key gene in the co-expression network, that was, the key gene related to PC.
Survival analysis
In order to determine whether the 20 key genes associated with PC obtained in the previous analysis were significantly correlated with prognosis. We performed Kaplan-Meier survival analysis based on the clinical data of these 175 PC samples in TCGA and the expression of these genes in the samples. When the significance threshold was set to 0.05, a total of 9 module core genes, including MST1R(P=0.047), TMPRSS4(P=0.0077), PTK6(P=0.0027), KLF5 (P=0.00019), CGN(P=0.014), ABHD17C(P=0.026), MUC1(P=0.039), CAPN8 (P=0.024), B3GNT3 (P=4e-04) were related to prognosis (Fig. 3).
Further multi-factor Cox regression analysis was performed on these 9 genes, and a regression model Score=-0.0877*MST1R-0.0325*TMPRSS4+0.0693*PTK6 +0.1893*KLF5-0.0634*CGN-0.1143*ABHD17C+0.1589*MUC1-0.2122*CAPN8+ 0.3480*B3GNT3. According to the R package maxstat, then determined the classification's best score threshold point was 2.110527 and divided the sample into high score and low score groups according to 2.110527 (Fig. 4). Then Kaplan-Meier analysis was performed based on the survival time and status of the samples, and the analysis results showed that there were significant differences in the prognosis of the samples with high and low groups (P=0.00035).
Driver genes mediate molecular typing of early pancreatic cancer
We selected 150 early PC samples (Stage I, Stage II) from TCGA, and used the expression of the 9 prognostic-related genes (MST1R, TMPRSS4, PTK6, KLF5, CGN, ABHD17C, MUC1, CAPN8, B3GNT3) in the early cancer samples analyzed in the previous step to perform unsupervised clustering analysis (Fig. 5a). As shown in the Fig. 5b, all PC samples could be divided into two categories, and the two types of samples had significantly different prognosis.
In order to further analyze the differences of these key genes in different subgroups, the average expression value of each gene in each type of sample was used as the expression value of the gene in that category. After t-test analysis, it was found that these 9 genes related to prognosis genes were differentially expressed in two subclasses (Fig. 6).
Functional differences in early pancreatic cancer subgroups
The limma package in R was used for differential gene analysis of the two types of PC samples obtained in the previous step. When the significance threshold was 0.05 and |logFC|>1, a total of 563 differentially expressed mRNAs were screened (Fig. 7). Perform functional and pathway enrichment analysis of these genes (Fig. 8). These differential gene results show that these genes mainly were related to various digestion and absorption and cell adhesion functions. The KEGG pathway shown that these genes were related to pancreatic secretory function. It was inferred that the process of PC was related to various digestive disorders and changes in extracellular cell connections.
The analysis of these genes combined with clinical characteristics and the proportion of tumor infiltrating leukocytes (TILs)
These genes of expression level in PC tissues were higher in those of paracancerous tissues (P <0.05) (Fig. 9a). These genes showed high expression levels in stage III/IV of PC, but it is not statistically significant (Fig. 9b). It may be because of the sample size was not large enough to pick up a statistical difference. In order to further confirm the correlation between the expression of these nine genes and the immune microenvironment, CIBERSORT algorithm was used to analyze the correlation of 22 kinds of TILs proportion with these 9 genes expression. Among them, B-cell naive, CD8+ T cells, and Macrophages M0 cells were correlated with these 9 genes expression (Fig. 10). These results support that the expression levels of these nine genes affect the immune activity of tumor microenvironment. Later, according to the drug-target interactions recorded in the drug-bank database, MUC1, PTK6 were the target genes of 2 drugs (Potassium nitrate, Vandetanib).
MST1R, PTK6, ABHD17C and CGN is highly expressed in PC cells
We next investigated the expression level of four of these genes in cancer cell lines and pancreatic epithelial HPDE lines, MST1R, PTK6, ABHD17C and CGN were expressed relatively high in BxPC-3, Panc-1 and PATU-8988(Fig. 11, Table 3).
Table 3 Primers designed for the qRT-PCR validation of candidate gene mRNA levels and β-actin
Symbol
|
Forward primer (5 ‘to 3ʹ)
|
Reverse primer (5 ‘to 3ʹ)
|
MST1R
|
CATACCGCCACATTGACCCT
|
CTGCATCACTTGGTACAGAGAAT
|
PTK6
|
GCTATGTGCCCCACAACTACC
|
CCTGCAGAGCGTGAACTCC
|
CGN
|
CGAGGAAAAAGTCTCACGGCT
|
GTCTTCAGGTCCTTGTTCTGTCTCT
|
ABHD17C
|
GCCGTCGAGGTCTTCTTCTC
|
GACTCACGCCATACCGGG
|
β-actin
|
CCTGGCACCCAGCACAAT
|
GCTGATCCACATCTGCTGGAA
|