2.1 Data collection
Single cell sequencing data of pancreatic cancer were downloaded from Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/, ID: GSE154778 and GSE154778) database. GSE156405 contained 5 primary tumor (PT) and 1 liver metastatic (LM) pancreatic cancer samples, and GSE154778 included 10 PT samples and 5 LM samples (Table S1). The transcriptome data of 183 pancreatic cancer samples were retrieved from The Cancer Genome Atlas (TCGA, https://tcga-data.nci.nih.gov/tcga/) database. Among which, totally 126 samples had complete survival information (Table S2). Furthermore, spatial transcriptome data of PT samples (ST_PDAC-1 (GSM6132061)) by high throughput sequencing were extracted from GSE202740 dataset.
2.2 Single cell data quality control and cell annotation
Cell Ranger (v6.1.2) was utilized to align the data to the human genome (GRCh37). Seurat v4.1.1 was used to process single-cell data, and cells with mitochondrial content higher than 20%, hemoglobin content higher than 5%, and expression genes less than 200 were filtered. Seurat was used for data normalization, cell clustering and dimension reduction. The "FindVariableFeatures" function was used to select 2000 highly variable genes from the corrected expression matrix, and then the "RunPCA" function was used for principal component analysis, retaining the top 20 principal components for further analysis. Batch effects were corrected by "RunHarmony" of R package harmony. The "FindClusters" function was used for cell clustering (resolution 0.6), and the "RunUMAP" function was used for nonlinear dimensionality reduction. Cell grouping was annotated based on cellmark2.0 database and common mark genes of cells.
2.3 Differential gene and functional enrichment analyses
The “Findmarkers” function from the Seurat package was utilized to performed differential gene analysis. The differentially expressed genes (DEGs) among between two groups were screened by p.adj < 0.05. The R package "ClusterProfiler" was used for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genome (KEGG) enrichment analysis. The significantly enriched pathways were screened using p < 0.05.
2.4 Gene set variation analysis (GSVA)
The Hallmark gene set was downloaded for GSVA from the MSigDB(v2023.1) (https://www.gsea-msigdb.org) database to investigate the difference of the biological function between two groups using R package. The difference analysis among groups were conducted using the "limma" package. The differentially enriched pathways between two groups were identified based on |t| > 2 and p < 0.05.
2.5 Copy number variation (CNV) analysis
Based on single-cell gene expression and chromosome sequencing data, the inferCNV (v1.14.0) package in R was used to distinguish malignant epithelial cells from non-malignant epithelial cells. The inferCNV analysis was performed with the following settings: cutoff=0.1, cluster-by-groups=TRUE, tumor subcluster-partition-method = "random-trees", and hidden markov model (HMM) = TRUE. To reduce false positive calls in CNV inference, the default Bayesian latent mixture model was applied to determine the posterior probability of changes in each cell, with a default threshold of 0.5. Furthermore, all gene CNV scores of ECs and reference cells (T cells) were hierarchically clustered using the k-means algorithm. The malignant epithelial cells were identified according to the CNV score. If the score was greater than the 95th percentile of the reference cell, it is malignant epithelial cells. The others were non-malignant epithelial cells.
To illustrate tumor clonality and evolution, the "sub cluster" pattern was further applied to divide malignant cells into eight clusters, and HMM generated different CNV patterns. Each CNV was annotated as a gain or loss at the p or q arm level according to UCSC chromosome band information. Subclones containing the same arm level CNVs were collapsed to construct an evolutionary tree. The phylogenetic tree was drawn using Uphyloplot2 software to represent the subclonal CNV structure.
2.6 Inference of developmental trajectory
The Monocle (v2.28.0) was used to construct pesudotime trajectories based on gene expression profiles of malignant epithelial cells. After dimensionality reduction and cell sorting, all malignant epithelial cells were projected and sorted into trajectories with different branches, and cells within the same branch were considered to have the same cell state. Branch expression analysis modeling (BEAM) was further performed to identify genes with branch dependent expression patterns. These branch dependent genes can help us to explore the mechanism of cell fate determination.
2.7 Cell communication analysis
Cellchat (v1.5.0) package was used to predict and visualize biologically meaningful intercellular communication. For each individual dataset, first extracted the expression matrix and metadata from the Seurat object, and then used the "createCellChat" function to generate a CellChat object. After calculating the highly variable genes and pathways, the "computeCommunProb" function was used to infer the intercellular communication probability. The results were shown using a series of visualization functions provided by CellChat, such as "netVisual-bubble" presented the dot diagram of signaling pathways emitted from cells. The functions "compareInteractions", "netVisual-diffinteraction", "netVisual-heatmap" of the CellChat package were used to compare the relative number or interaction strength of various cell subsets between two groups.
2.8 LM-index and overall survival analysis
Based on the risk genes, the gelnet (v1.2.1) package in R was applied to construct a LM-index by one class logistic regression (OCLR) algorithm to represent the risk of LM pancreatic cancer samples. The patients were divided into high and low LM-index groups using the R package "survminer". Kaplan Meier (K-M) survival analysis was used to evaluate the overall survival (OS) of patients with high and low LM-index, and two-sided log rank test was used for comparison.
2.9 Spatial data analysis
Raw data was quality controlled using fastp, and then data alignment and count were performed using SpaceRanger (v2.1.0). Seurat package (v4.4.0) was used for data filtering (spot with more than 200 retained genes, less than 20% mitochondria, and less than 5% hemoglobin), spot normalization and regression. Finally, Seurat's FindTransferAnchors and transferdata functions were employed to map single-cell data and achieve annotation of spatial data.
2.10 Immunofluorescence staining
Pancreatic cancer tissue microarray samples were purchased from bioaitech (Xi’an,China). The use of the human tissue microarrays was approved by the Ethics Committee of Union Hospital of Tongji Medical College of Huazhong University of Science and Technology. The in-formation of patients was shown in Table S3. ITGAV (Proteintech, #27096-1-ap) and postn (Proteintech, #66491-1-Ig) were used as primary antibodies.
Immunofluorescence staining was performed according to the instructions of alphatsa multiplex IHC Kit (AXT37100031, Alphaxbio). In brief, the tissue chip was dewaxed and hydrated through a series of xylene and alcohol washes, and then antigen repair and sealing were performed. The sections were blocked and incubated with primary antibody and secondary antibody, and fluorescent staining was performed. Finally, the nuclei were counterstained with DAPI for 5 min and enclosed in Mounting Medium. ZEN (v3.1) software was used for film reading.
2.11 Cell collection and culture
Human pancreatic cancer cell line PANC-1 and human normal pancreatic ductal epithelial cell line HPDE6-C7 were purchased from BeNa culture collection (Beijing, China). All cells were cultured in RPMI-1640 complete medium (Gibco) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin. The cells were placed in a humidified incubator maintained at 37 °C with 5% CO2.
2.12 qRT-PCR
Total of RNA from cells was extracted using TRIzol (Invitrogen, Carlsbad, CA, USA), and the quality and concentration of RNA were evaluated using UV spectrophotometer, and then reverse transcription was performed using Transcriptor First Strand cDNA Synthesis Kit (GenStar, Beijing, China). Furthermore, qPCR detection was performed using the LightCycler 480 fluorescence quantification system (Roche, Basel, Switzerland). The reference gene was GADPH, and the primer sequences were listed in Table S4. The mRNA expression levels were calculated using the 2-ΔΔCT method (three repeats).
2.13 Cell transfection
The small interfering RNA (siRNA) was used to knock down ITGVA in PANC-1 cells. The ITGAV siRNA sequence was presented in Table S5. The cells were seeded onto 6-well plates and transfected once they reached approximately 50% confluence. The transfection involved using a negative control (NC) and si-ITGAV to transfect the cells, respectively. All transfections were carried out with Lipofectamine 3000 (Invitrogen, Carlsbad, CA, USA). The knockdown efficiency of ITGAV was detected by qRT-PCR.
2.14 Cell scratch assay
A total of 7×105 cells were seeded in each well of a 6-well and cultured for 24 hours. A line was drawn in the center of the well using a 10 μL pipette tip. After two washes with PBS, the cells were cultured in a 37 °C incubator for 24 hours. Subsequently, the wounds were photographed using a microscope at various time intervals. The distances of the wounds were measured using Fiji (ImageJ).
2.15 Invasion and migration assays
The migration and invasion abilities of cells were analyzed using a polycarbonate membrane with an 8 μm pore size in a 24-well Transwell chamber (Coring, NY, USA). The upper chamber added 1×104 cells in serum-free medium containing 0.1% BSA. Moreover, the lower chamber was supplemented with medium containing 0.1% BSA and EGF (50 ng/ml, MCE, NJ, USA). After incubation for 24 h, the cells in the upper chamber were completely transferred to the lower membrane. The polycarbonate membrane was fixed and stained with Giemsa solution (Solarbio, Beijing, China), and photographed with a microscope.