Microarray data information
181 data serials in total were collected from GEO, among which 8 data serials (Supplementary table 1) included samples of both primary tumor and adjacent normal tissue, and 2 data serials (GSE42952 and GSE71729) included samples of both hepatic metastasis and primary tumor (table 1).
Data serials GSE42952 and GSE71729 were selected for following analysis. GSE42952 including 10 primary tumor samples and 7 hepatic metastasis samples from surgical resections, was processed on array platform GPL570. GSE71729 including 145 primary tumor samples and 25 hepatic metastasis samples from surgical resections, was processed on array platform GPL20769.
DEGs in hepatic metastasis of pancreatic cancer
Data serials GSE42952 and GSE71729 were analyzed through GEO2R and volcano plots of DEGs in each data serial were showed (figure 1A and 1B). 1490 DEGs were identified between group of hepatic metastasis and group of primary tumor in GSE42952, including 588 up-regulated and 902 down-regulated DEGs. 1053 DEGs were identified between group of hepatic metastasis and group of primary tumor in GSE71729, including 445 up-regulated and 608 down-regulated DEGs. The overlapped H-DEGs were obtained, including 217 up-regulated H-DEGs and 257 down-regulated H-DEGs (figure 1C and 1D).
Verifications of DEGs in primary tumor and cell lines of pancreatic cancer
Data of TCGA and GTEx was used to generate a DEGs list between group of primary tumor and group of adjacent normal tissue, and it was designated as primary tumor related DEGs (P-DEGs). Through a comparison between H-DEGs and P-DEGs, 40 out of 217 up-regulated H-DEGs were also found up-regulated in primary tumor compared to adjacent normal tissue, meanwhile 11 out of 257 down-regulated H-DEGs were also found down-regulated in primary tumor compared to adjacent normal tissue (figure 2).
As mesenchymal components of the samples would confuse the results of gene expressions in tumor cells, a verification step in cell lines of pancreatic cancer was carried out. 27 out of 40 up-regulated H-DEGs and 2 out of 11 down-regulated H-DEGs in previous step were verified (figure 3A and 3B). The finally generated H-DEGs were presented by heat plot (figure 3C). The expressing profiles of all cell lines were available in Supplementary table 2.
GO and KEGG pathway enrichment of DEGs
For GO analysis (table 2), negative regulation of blood coagulation, negative regulation of hemostasis, alcohol metabolic process and so on were the enrichment results of BP (figure 4A). Collagen-containing extracellular matrix, endoplasmic reticulum lumen, chylomicron and so on were the enrichment results of CC (figure 4B). Carboxylic acid binding, sulfur compound binding, vitamin binding and so on were the enrichment results of MF (figure 4C).
Complement and coagulation cascades, Arginine and proline metabolism, Steroid biosynthesis, and alanine, aspartate and glutamate metabolism were the enrichment results of KEGG pathway (table 4). The corresponding p values and H-DEGs in each KEGG pathway were presented (figure 4D and 4E).
Prognostic DEGs
For cox proportional hazards regression analysis, genes with HR>1 were considered as factors promoting tumor progression, while genes with HR<1 were considered as factors preventing tumor progression. 18 out of 27 up-regulated H-DEGs were found HR>1, while none of 2 down-regulated H-DEGs was found HR<1. Over-expressions of MARCO and PCSK6 among the 18 H-DEGs (HR>1) were both significantly associated with decreased overall survivals (p=0.046 and P=0.013). Further analysis showed that MARCO and PCSK6 were both significantly correlated with TNM stages (figure 5).
PPI networks and orthotopic expressions of candidate DEGs
MARCO and PCSK6 were chosen as candidate DEGs to generate PPI networks through String and visualized by cytoscape (figure 6A). The PPI network of MARCO consisted of 11 nodes and 38 edges and the PPI network of PCSK6 consisted of 11 nodes and 16 edges. Each node represented a protein that would interact with MARCO or PCSK6.
The corresponding protein of MARCO and PCSK6 were examined by immunohistochemistry (figure 6B). The representative stainings of PCSK6 in pancreatic cancer, normal pancreas tissue and normal liver tissue were showed, while the stainings of MARCO were negative in current samples of pancreatic cancer, normal pancreas tissue and normal liver tissue in database of The Human Protein Atlas.
Expression of MARCO and TILs in pancreatic cancer
As the expression of MARCO might be related to TILs, correlation between mRNA expression of MARCO and 8 types of TILs including Neutrophil, Macrophage, natural killer cell (NK), myeloid derived suppressor cell (MDSC), regulatory T cell (Treg), activated B cell (Act_B), activated CD4 T cell (Act_CD4) and activated CD8 T cell (Act_CD8) in pancreatic cancer were analyzed (figure 6C). 5 types of TILs including Macrophage, MDSC, NK, Treg and Act_CD4 were positively correlated to expression of MARCO (p value <0.01 and rho>0). Treg, MDSC, Macrophage, NK and Act_CD4 were ranked in decreasing order of rho coefficients.
Functional gene sets correlated to expression of PCSK6
Totally 172 samples of pancreatic adenocarcinoma from TCGA were separated into group PCSK6_high (n=86) and group PCSK6_low (n=86). The enrichment list of gene sets (top 20) was ranked by NES (table 3). There were 9 significant gene sets (normalized P<0.05) included Notch signaling, Mitotic spindle, TGF beta signaling, Glycolysis, Estrogen response early, Peroxisome, Estrogen response late, Apoptosis and P53 pathway. The enrichment plots of significant gene sets were showed (figure 7A). Top 50 genes in each group were plotted by heatmap (figure 7B). The correlations between NES and p value of gene sets were visualized (figure 7C).