Data sources and processing
To identify prognosis-related genes, RNA-sequencing and microarray data for pancreatic cancer were retrieved from the TCGA, ICGC, and GEO cohorts on September 1, 2020. FPKM values for the TCGA cohort, normalized counts for the ICGC cohort, and RMA values for the GEO cohorts were extracted and used for further analysis. Survival data were available for the TCGA cohort, the ICGC cohort, and one GEO cohort (GSE57495). To investigate the expression levels of BCE genes in pancreatic cancer, three GEO datasets that included both tumor and normal control samples (GSE15471, GSE16515, and GSE28735) were analyzed. In total, 48 BCE genes were retrieved from the literature and the KEGG database (Supplemental Table 1).(10) Because this is a secondary analysis of publicly available online databases, the requirement for informed consent was waived.
Survival signature development and validation
In this study, the TCGA cohort was used as the training cohort, and the ICGC and GSE57495 cohorts were used as test cohorts. All analyses were performed in R (RStudio) with the packages indicated below. All BCE genes were treated as continuous variables in signature development. First, differential expression of all BCE genes between tumor and normal control samples was verified in the three GEO datasets by t-test. Second, BCE genes in the TCGA cohort were screened by univariate Cox regression analysis of overall survival (OS) using the survival package. Finally, the candidate genes were entered into a LASSO-penalized regression analysis for model construction using the glmnet and survival packages. After the prognostic genes with altered expression in tumors were identified, the risk score was calculated with the formula $\text{Risk score} = \sum_{i=1}^{n} \mathrm{exp}(G_i) \times \beta_i$, where n is the number of selected genes, exp(G_i) is the expression level of gene i, and β_i is the coefficient for gene i. Using the mean risk score as the cut-off, patients in the TCGA and ICGC cohorts were stratified into high-risk and low-risk groups. For survival analysis, Kaplan–Meier OS curves for the risk groups were generated with the survminer package, and statistical significance was assessed with the log-rank test. Forest plots were drawn to show the hazard ratios (HRs) of the selected prognostic genes.
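The risk-score, mean-split, Kaplan–Meier, and log-rank steps described above can be illustrated with a minimal Python sketch. It is not the authors' R code (which used the survival, glmnet, and survminer packages) and does not reproduce the LASSO fit; the gene coefficients are assumed to be already estimated, and all function names are hypothetical. It also assumes events are coded 1 = death and 0 = censored, and that scores exactly equal to the mean fall into the low-risk group, since the text does not state how ties are handled.

```python
import math
from typing import Dict, List, Sequence, Tuple


def risk_score(expr: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Risk score = sum over selected genes of coefficient x expression."""
    return sum(beta * expr[gene] for gene, beta in coefs.items())


def stratify_by_mean(scores: Sequence[float]) -> List[str]:
    """Split a cohort at its mean risk score.

    Assumption: scores strictly above the mean are "high"; ties go to "low".
    """
    cutoff = sum(scores) / len(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; returns (time, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, n_at_t = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t  # deaths and censorings at t both leave the risk set
    return curve


def log_rank(times: Sequence[float], events: Sequence[int],
             groups: Sequence[str]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p value), 1 df."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("log_rank expects exactly two groups")
    ref = labels[0]
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = sum(ti >= t for ti in times)
        n1 = sum(ti >= t and gi == ref for ti, gi in zip(times, groups))
        d = sum(ti == t and ei == 1 for ti, ei in zip(times, events))
        d1 = sum(ti == t and ei == 1 and gi == ref
                 for ti, ei, gi in zip(times, events, groups))
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths in ref group
        if n > 1:
            # Hypergeometric variance of d1 at this event time
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In use, each patient's expression profile would be scored with `risk_score`, the cohort split with `stratify_by_mean`, and the resulting labels passed to `log_rank` together with OS times and event indicators; `kaplan_meier` gives the step-curve points that a plotting package would draw for each group.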
Protein–protein interaction (PPI) analysis
The PPI analysis of the prognostic genes was performed using the STRING database (http://string-db.org), which provides critical assessment and integration of protein interactions.
Immune infiltration analysis
Immune infiltration scores of immune cells in the high-risk and low-risk groups were calculated by single-sample gene set enrichment analysis (ssGSEA) using the GSVA package.
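To make the ssGSEA step concrete, the sketch below computes a single-sample enrichment score for one sample and one gene set. It assumes the rank-weighted random-walk formulation of ssGSEA with a weight exponent alpha = 0.25; it is an illustration, not a reproduction of the GSVA package, and the immune-cell gene sets used in the study are not specified here, so any gene set passed in is hypothetical.

```python
from typing import Dict, Set


def ssgsea_score(expr: Dict[str, float], gene_set: Set[str], alpha: float = 0.25) -> float:
    """Single-sample enrichment score for one sample and one gene set (sketch)."""
    # Rank genes within the sample: rank 1 = lowest expression, rank p = highest.
    # Ties are broken by dictionary order (a simplifying assumption).
    ascending = sorted(expr, key=lambda g: expr[g])
    rank = {g: r for r, g in enumerate(ascending, start=1)}
    walk_order = ascending[::-1]  # walk from the most to the least expressed gene
    hits = [g in gene_set for g in walk_order]
    n_out = len(walk_order) - sum(hits)
    if not any(hits) or n_out == 0:
        raise ValueError("gene set must overlap the profile without covering it")
    # In-set steps are weighted by rank**alpha; out-of-set steps are uniform.
    w_total = sum(rank[g] ** alpha for g, hit in zip(walk_order, hits) if hit)
    cdf_in = cdf_out = score = 0.0
    for g, hit in zip(walk_order, hits):
        if hit:
            cdf_in += rank[g] ** alpha / w_total
        else:
            cdf_out += 1 / n_out
        score += cdf_in - cdf_out  # ssGSEA sums the walk instead of taking its maximum
    return score
```

A gene set concentrated among a sample's most highly expressed genes yields a positive score, and one concentrated among the least expressed genes yields a negative score. As we understand it, GSVA additionally rescales ssGSEA scores by the range observed across all samples by default; that step is omitted here. Scores for each immune cell type would then be compared between the high-risk and low-risk groups.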
Cell lines
The human pancreatic cancer cell lines (CFPAC-1, CAPAN-1, MIA PaCa-2, and PANC-1), the hepatocellular carcinoma cell line HuH7, the colorectal cancer cell lines HCT116 and SW480, and the lung cancer cell line A549 were purchased from the Shanghai Institutes for Biological Sciences (Shanghai, China). Human primary fibroblasts (N16) were provided by the Translational Medicine Research Center, Shanghai East Hospital. All cells were cultured in modified Eagle's medium (Gibco, Carlsbad, USA) supplemented with 10% fetal bovine serum, 1% penicillin, and 1% streptomycin at 37 °C with 5% CO₂.
Real-time PCR
Total RNA was isolated using TRIzol reagent (Invitrogen, Life Technologies) and reverse-transcribed into cDNA using the PrimeScript™ RT Reagent Kit (Takara, Japan). mRNA expression levels were measured by real-time PCR (Applied Biosystems 7500, USA) using SYBR Premix Ex Taq™ II (Takara, Japan) and normalized to actin mRNA. Primer sequences are listed in Supplementary Table 2.
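The text states only that mRNA levels were normalized to actin. One common way to do this is the comparative Ct (2^-ΔΔCt) method; the sketch below assumes that method, which the source does not confirm, and assumes a hypothetical calibrator sample (for example, a control cell line) against which fold changes are expressed. It also assumes roughly 100% amplification efficiency for both the target and actin.

```python
def relative_expression(ct_target: float, ct_actin: float,
                        ct_target_ref: float, ct_actin_ref: float) -> float:
    """Fold change of a target gene vs. a calibrator sample, normalized to actin.

    Assumes the comparative Ct (2^-ddCt) method with ~100% PCR efficiency.
    """
    delta_ct = ct_target - ct_actin              # normalize the sample to actin
    delta_ct_ref = ct_target_ref - ct_actin_ref  # same for the calibrator sample
    return 2.0 ** -(delta_ct - delta_ct_ref)
```

For example, a target Ct of 24 with an actin Ct of 18, compared with a calibrator whose target and actin Ct values are 26 and 18, corresponds to a 4-fold higher expression in the sample.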
Immunohistochemical staining (IHC) with TMA
A tissue microarray (TMA; Shanghai Outdo Biotech Co., Ltd., Shanghai, China) containing 90 pancreatic carcinoma samples and paired adjacent samples was used for IHC. Rabbit anti-human polyclonal ABAT antibody (Sigma-Aldrich, Cat# HPA041690, USA) and BCAT2 antibody (Sigma-Aldrich, Cat# HPA054091, USA) were used at a 1:500 dilution. Mouse anti-human CD68 antibody (Servicebio, Cat# GB14043, China) was used at a 1:200 dilution. Staining was quantified with the histochemistry score (Hscore), which combines the staining intensity and the proportion of labeled tumor cells, as previously described.(12) The Hscore was calculated as $\text{Hscore} = \sum_{i=0}^{3} i \times p_i$, where i is the staining intensity grade (0, no staining; 1, weak; 2, moderate; 3, strong) and p_i is the percentage of tumor cells stained at intensity i. To quantify immune cell infiltration, positive cells in the tumoral and stromal compartments were counted separately and normalized to area as cells/mm².(13)
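The Hscore formula and the area normalization above can be written as two small functions. This is a sketch with hypothetical function names; it assumes the percentages for grades 0 to 3 cover all tumor cells and therefore sum to 100, which gives an Hscore range of 0 to 300.

```python
from typing import Dict


def h_score(percent_by_intensity: Dict[int, float]) -> float:
    """Hscore = sum of intensity grade i (0-3) x percentage of tumor cells at grade i."""
    if not set(percent_by_intensity) <= {0, 1, 2, 3}:
        raise ValueError("intensity grades must be 0, 1, 2, or 3")
    # Assumption: the four grades together account for all tumor cells.
    if abs(sum(percent_by_intensity.values()) - 100.0) > 1e-6:
        raise ValueError("percentages should sum to 100")
    return sum(i * p for i, p in percent_by_intensity.items())


def cell_density(positive_cells: int, area_mm2: float) -> float:
    """Positive cells per mm^2 within one compartment (tumoral or stromal)."""
    if area_mm2 <= 0:
        raise ValueError("area must be positive")
    return positive_cells / area_mm2
```

For instance, a core with 10% unstained, 20% weak, 30% moderate, and 40% strong tumor cells has an Hscore of 0×10 + 1×20 + 2×30 + 3×40 = 200, and 150 CD68-positive cells counted in 0.5 mm² of stroma correspond to 300 cells/mm².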
Statistics
For all analyses, a two-sided p value < 0.05 was considered statistically significant. Data are expressed as mean ± standard deviation (SD).
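As an illustration of the summary statistics and of the tumor-versus-normal t-tests described in the signature-development section, the sketch below formats values as mean ± SD and computes Welch's t statistic with Welch–Satterthwaite degrees of freedom, which is the default form of R's `t.test` (`var.equal = FALSE`). It assumes the sample (n − 1) SD was reported; the source does not say. The two-sided p value would additionally require a t-distribution CDF (for example, `scipy.stats.t.sf`), which is left out to keep the sketch dependency-free.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sd(values: Sequence[float]) -> str:
    """Format as 'mean ± SD' using the sample (n - 1) standard deviation."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

The two-sided p value is then twice the upper-tail probability of |t| under a t distribution with `df` degrees of freedom, compared against the 0.05 threshold.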