Gene mapping and protein structure analysis
Genome location information for Sp1 was obtained from the UCSC Genome Browser using the human Dec. 2013 (GRCh38/hg38) assembly (http://genome.ucsc.edu/). We also applied the "HomoloGene" function of the NCBI (National Center for Biotechnology Information) to analyze the conserved functional domains of Sp1 across species. Additionally, we obtained the phylogenetic tree of Sp1 across species using COBALT, the constraint-based multiple alignment online tool of the NCBI (https://www.ncbi.nlm.nih.gov/tools/cobalt/).
Gene expression analysis
We first queried the online HPA (Human Protein Atlas) database and obtained Sp1 expression data for different normal tissues, cancerous tissues, and blood cells. “Low specificity” was defined as “NX (Normalized expression) ≥ 1 in at least one tissue/region/cell type but not elevated in any tissue/region/cell type”.
We used the TIMER2 (Tumor Immune Estimation Resource, version 2) website (http://timer.cistrome.org/) to investigate differences in Sp1 expression between cancerous and adjacent normal tissues across the tumor types of the TCGA project. We also used the “Box Plots” module of the GEPIA2 (Gene Expression Profiling Interactive Analysis, version 2) website (http://gepia2.cancer-pku.cn/#analysis) to obtain box plots of the difference in Sp1 expression between tumor tissues and the corresponding normal tissues of the GTEx (Genotype-Tissue Expression) database. In addition, violin plots of Sp1 expression across the TNM stages of all TCGA tumors were generated with GEPIA2.
Furthermore, we explored the expression levels of total Sp1 protein and Sp1 phosphoprotein in cancerous and adjacent normal tissues via the UALCAN portal (http://ualcan.path.uab.edu/analysis-prot.html). The available CPTAC (Clinical Proteomic Tumor Analysis Consortium) datasets in the UALCAN portal cover six tumor types: breast cancer (BRCA), ovarian cancer, colon cancer, renal cell carcinoma (RCC), uterine corpus endometrial carcinoma (UCEC), and lung adenocarcinoma (LUAD).
Patients and specimens
From January 2018 to December 2022, 26 patients undergoing gastrectomy for gastric cancer (GC) and 27 patients with advanced GC receiving immune checkpoint inhibitors (ICIs) and chemotherapy were enrolled at Changzhou No.2 People’s Hospital. Cancerous and adjacent normal tissues were collected during surgery or puncture biopsy, histopathologically confirmed, and staged according to the Union for International Cancer Control criteria. Written informed consent from the patients and approval from the Ethics Committee of Changzhou No.2 People’s Hospital were obtained for the use of these clinical materials.
Immunohistochemistry (IHC)
Tissue sections were incubated in an oven at 55°C for 20 min, dewaxed by three 5-min washes in xylene, and rehydrated by 5-min washes in 100%, 95%, and 80% ethanol followed by distilled water. For antigen retrieval, samples were heated at 95°C for 30 min in 10 mmol/L sodium citrate (pH 6.0). Endogenous peroxidase activity was blocked by incubation in 3% H2O2 for 30 min. After blocking for 30 min with universal blocking serum (Dako Diagnostics, Carpinteria, CA), the sections were incubated with anti-Sp1 antibody at 4°C overnight and washed three times with PBS at room temperature. A secondary antibody (Dako Diagnostics) was then applied for 30 min. The samples were washed three times with PBS, developed using DAB, and counterstained with hematoxylin. The sections were dehydrated following a standard procedure and sealed with coverslips. Slides were scanned with a digital pathology slide scanner (KF-BIO, China).
Sp1 immunostaining signals were evaluated and scored by two researchers who were blinded to the clinical information. Brown cytoplasmic staining for Sp1 was considered positive. The percentage of Sp1-positive cells was scored in four categories: 1 (< 25%), 2 (25–50%), 3 (50–75%), and 4 (> 75%). The staining intensity of positive cells was scored as 0 (absent), 1 (weak staining), 2 (moderate staining), and 3 (strong staining). The final score was the sum of the intensity score and the percentage score.
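The scoring rule above can be illustrated with a minimal sketch. The function names are hypothetical and not part of the original protocol. The text does not say which category a value of exactly 50% belongs to, so the sketch assumes it goes to the higher category; values of exactly 25% and 75% follow from the stated "< 25%" and "> 75%" bounds.

```python
def percentage_score(percent_positive: float) -> int:
    """Map the percentage of Sp1-positive cells to the 1-4 category."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive < 25:
        return 1
    if percent_positive < 50:   # assumption: exactly 50% is scored as 3, not 2
        return 2
    if percent_positive <= 75:  # category 4 is strictly above 75%
        return 3
    return 4


def ihc_final_score(percent_positive: float, intensity: int) -> int:
    """Final score = intensity score (0-3) + percentage score (1-4)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return intensity + percentage_score(percent_positive)


if __name__ == "__main__":
    # A section with 60% positive cells and moderate staining.
    print(ihc_final_score(60, 2))
```

Under these assumptions the final score ranges from 1 to 7.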
Survival analysis
The “Survival Map” module of GEPIA2 was used for the survival analysis of Sp1 across all TCGA tumors. Cutoff-high (50%) and cutoff-low (50%) values were used as the expression thresholds for splitting samples into high-expression and low-expression groups for the analysis of OS (overall survival) and DFS (disease-free survival).
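With both cutoffs set to 50%, the grouping amounts to a median split. The sketch below illustrates that idea only; it is not GEPIA2's code. The function name and sample identifiers are hypothetical, and assigning samples that equal the median to the low-expression group is an assumption.

```python
from statistics import median


def median_split(expression):
    """Split samples into high and low groups at the median expression.

    expression: dict mapping sample ID -> expression value.
    Assumption: samples equal to the median are placed in the low group.
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return cutoff, high, low


if __name__ == "__main__":
    # Hypothetical expression values for four samples.
    toy = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0}
    print(median_split(toy))
```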
Genetic alteration analysis
We investigated the genetic alteration characteristics of Sp1 with the cBioPortal website (https://www.cbioportal.org/). The alteration frequency, mutation type, and copy number alteration (CNA) results were obtained from the “Cancer Types Summary” module. We also used the “Comparison” module to obtain data on differences in OS, progression-free survival (PFS), and DFS between TCGA cancer cases with and without Sp1 genetic alteration.
Analysis of tumor behavior states, immune infiltrates, and immune biomarkers
The online tool Sangerbox (http://sangerbox.com/index.html) was used to investigate the correlations of Sp1 expression with tumor mutational burden (TMB) and microsatellite instability (MSI) in all cancer types in TCGA. The correlations between Sp1 expression and a variety of genes involved in immune checkpoint signaling, such as CTLA4, were also evaluated with Sangerbox. Spearman’s correlation analysis was performed, and the P-value and partial correlation (cor) value were obtained.
We used the TIMER2 online tool to explore the correlations between Sp1 expression and several types of immune cells, including B cells, CD4+ T cells, CD8+ T cells, dendritic cells, macrophages, and neutrophils, in all types of tumors. The TIMER, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, XCELL, MCPCOUNTER, and EPIC algorithms were applied for immune infiltration estimation, especially for CD8+ T cells. The P-values and correlation values were obtained via the purity-adjusted Spearman’s rank correlation test. The data were visualized as a heatmap and a scatter plot.
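To make the purity adjustment concrete, the following sketch computes a partial Spearman correlation between two variables while controlling for a third (here, tumor purity). It uses the standard first-order partial correlation formula on rank-based coefficients. It illustrates the statistic only and is not the code used by TIMER2; all function names and toy values are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y after adjusting for z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Hypothetical values: gene expression, immune cell estimate, tumor purity.
    expression = [1.0, 2.0, 3.0, 4.0, 5.0]
    infiltrate = [2.0, 4.0, 6.0, 8.0, 10.0]
    purity = [3.0, 1.0, 4.0, 2.0, 5.0]
    print(partial_spearman(expression, infiltrate, purity))
```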
DNA methylation analysis
We also used the Sangerbox tool to investigate the correlations between Sp1 expression and four classical DNA methyltransferases (DNMT1, DNMT2, DNMT3A, and DNMT3B) in all types of cancer. The MEXPRESS website (https://mexpress.ugent.be/) was used to analyze the DNA methylation level of Sp1 at multiple probes in different cancers of the TCGA database. The beta value of methylation, the Benjamini-Hochberg-adjusted P-value, and the Pearson correlation coefficient of each sample were obtained. The promoter region probes were highlighted.
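For readers unfamiliar with the Benjamini-Hochberg adjustment, a minimal sketch of the standard procedure is given below. It illustrates the method itself, not the MEXPRESS implementation; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg-adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P-values for four probes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```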
Phosphorylation analysis
We used the iPTMnet database (http://proteininformationresource.org/iPTMnet) to analyze the predicted phosphorylation features of the S7, T42, S59, S101, T278, T453, S641, T668, S698, and S702 sites of Sp1. We also investigated the differences in Sp1 phosphorylation levels between normal tissues and primary tumors, including BRCA, ovarian cancer, colon cancer, RCC, and UCEC, using the CPTAC analysis.
Multiplex Immunofluorescence (mIF)
Multiplex staining was performed using a TSA 6-color kit (H-D110061, yuanxibio). The primary antibody panel included anti-CD8 (#BX50036, Biolynx), anti-CD68 (#BX50031-C3, Biolynx), anti-HLA-DR (#ab92511, Abcam), and anti-PanCK (#GM351507, Gene Tech). Primary antibodies were applied sequentially, each followed by incubation with a horseradish peroxidase-conjugated secondary antibody (Cat# DS9800, Leica Biosystems) and tyramide signal amplification (TSA). The slides were washed with TBST buffer and heat-treated by microwave after each TSA step. After all the antigens above had been labeled, nuclei were stained with DAPI (D1306, ThermoFisher), and the slides were washed in distilled water and manually coverslipped. The stained slides were scanned to obtain multispectral images using the Pannoramic MIDI imaging system (3DHISTECH). Images were analyzed using Indica Halo software.
Enrichment analysis of Sp1-related genes
The STRING online tool (https://string-db.org/) was used to retrieve the top 50 experimentally determined Sp1-binding proteins. The main parameters were set as follows: minimum required interaction score [“Low confidence (0.150)”], meaning of network edges (“evidence”), max number of interactors to show (“no more than 50 interactors” in 1st shell), and active interaction sources (“experiments”). GEPIA2 was used to determine the top 100 Sp1-correlated genes based on the TCGA datasets. Furthermore, we used the “Gene_Corr” module of TIMER2 to generate the heatmap data of the selected genes, which contain the correlation and P-value of the Spearman’s rank correlation test. The log2 TPM was applied for the dot plot. The P-value and the correlation coefficient (R) were indicated. Venny 2.1.0 (https://bioinfogp.cnb.csic.es/tools/venny/index.html) was used to conduct an intersection analysis comparing the Sp1-binding and Sp1-correlated genes. Then, these two sets of genes were combined and submitted to DAVID for functional annotation, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. For GO, we focused on three aspects: biological processes (BP), cellular components (CC), and molecular functions (MF). In addition, we used KEGG analysis to investigate the pathways in which the Sp1-binding and Sp1-correlated genes were involved.
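The intersection and combination steps can be sketched with plain set operations. This is an illustration only: the function name is hypothetical, the gene symbols are placeholders rather than results, and normalizing symbols to upper case is an assumption.

```python
def compare_gene_sets(binding, correlated):
    """Compare two gene lists and return shared, unique, and combined sets."""
    # Assumption: symbols are compared case-insensitively after trimming.
    b = {g.strip().upper() for g in binding}
    c = {g.strip().upper() for g in correlated}
    return {
        "shared": sorted(b & c),           # overlap shown in the Venn diagram
        "binding_only": sorted(b - c),
        "correlated_only": sorted(c - b),
        "combined": sorted(b | c),         # union submitted for annotation
    }


if __name__ == "__main__":
    # Placeholder symbols, not genes reported in this study.
    binding = ["GENE_A", "GENE_B", "GENE_C"]
    correlated = ["gene_b", "GENE_D"]
    print(compare_gene_sets(binding, correlated))
```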
Gene Set Enrichment Analysis (GSEA)
GSEA was used to explore the up- and downregulated pathways associated with Sp1 in stomach adenocarcinoma (STAD). The functional gene set was c2.cp.kegg.v7.4.symbols.gmt, the collapse parameter was set to "No collapse", the number of permutations was set to "1000", the permutation type was set to "Phenotype", and the analysis was run with GSEA software (version 3.0). GSEA was used to identify Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways significantly associated with high and low Sp1 expression, and the top five pathways were plotted. P-value < 0.05 and FDR < 0.25 were considered statistically significant.
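The significance thresholds can be applied to a GSEA result table as in the sketch below. The record fields (`name`, `nes`, `p_value`, `fdr`) and the pathway names are hypothetical, and ranking by absolute normalized enrichment score (NES) is an assumption, since the text does not state how the top five pathways were chosen.

```python
def top_significant_pathways(results, p_cutoff=0.05, fdr_cutoff=0.25, top_n=5):
    """Keep pathways with P < p_cutoff and FDR < fdr_cutoff, then rank them.

    results: list of dicts with keys "name", "nes", "p_value", "fdr".
    Assumption: pathways are ranked by absolute NES, largest first.
    """
    significant = [
        r for r in results
        if r["p_value"] < p_cutoff and r["fdr"] < fdr_cutoff
    ]
    significant.sort(key=lambda r: abs(r["nes"]), reverse=True)
    return significant[:top_n]


if __name__ == "__main__":
    # Placeholder records, not results from this study.
    toy = [
        {"name": "PATHWAY_A", "nes": 1.8, "p_value": 0.010, "fdr": 0.10},
        {"name": "PATHWAY_B", "nes": -2.1, "p_value": 0.001, "fdr": 0.05},
        {"name": "PATHWAY_C", "nes": 2.5, "p_value": 0.200, "fdr": 0.40},
    ]
    for row in top_significant_pathways(toy):
        print(row["name"], row["nes"])
```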
The scRNA-seq data analysis
The GC scRNA-seq data (GSE163558) were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and comprised 10 fresh human tissue samples from six patients, including primary tumors, adjacent non-tumoral samples, and six metastases from various organs or tissues (liver, peritoneum, ovary, lymph node). Data filtering and preprocessing were conducted using the R package “Seurat”. The initial screening criteria were as follows: genes had to be expressed in at least three cells, and each cell had to express at least 250 genes. The PercentageFeatureSet function was used to calculate the percentages of mitochondrial genes and rRNA, and cells were retained if they expressed more than 200 and fewer than 5000 genes and had a mitochondrial gene percentage below 15%. Following data filtering, samples were merged for further analysis. To address batch effects and integrate the different single-cell transcriptome samples, the FindIntegrationAnchors and IntegrateData functions in the Seurat package were employed, with 4000 highly variable genes identified using the FindVariableFeatures function. Then, principal component analysis (PCA) was performed using the RunPCA function. Cell clustering was carried out with the FindNeighbors and FindClusters functions (resolution = 0.1, dim = 50). Dimensionality reduction was achieved using the UMAP method. Marker genes for each cluster were identified using the FindAllMarkers function (logFC = 0.75, min.pct = 0.25, p-adj < 0.05).
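The cell-level quality-control thresholds can be illustrated with a small, language-neutral sketch. The analysis itself was run in R with Seurat; the Python below is not that code. The function names, the toy count data, and the identification of mitochondrial genes by the "MT-" prefix are assumptions.

```python
def qc_metrics(counts):
    """Compute per-cell QC metrics from raw counts.

    counts: dict mapping cell ID -> dict of gene symbol -> raw count.
    Assumption: mitochondrial genes are those whose symbol starts with "MT-".
    """
    metrics = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        n_genes = sum(1 for c in genes.values() if c > 0)
        mt = sum(c for g, c in genes.items() if g.startswith("MT-"))
        percent_mt = 100.0 * mt / total if total else 0.0
        metrics[cell] = {"n_genes": n_genes, "percent_mt": percent_mt}
    return metrics


def passing_cells(counts, min_genes=200, max_genes=5000, max_percent_mt=15.0):
    """Return IDs of cells with min_genes < detected genes < max_genes
    and a mitochondrial percentage below max_percent_mt."""
    metrics = qc_metrics(counts)
    return sorted(
        cell for cell, m in metrics.items()
        if min_genes < m["n_genes"] < max_genes
        and m["percent_mt"] < max_percent_mt
    )


if __name__ == "__main__":
    # Toy data with tiny thresholds so the example stays readable.
    toy = {
        "cellA": {"MT-CO1": 1, "GAPDH": 9, "ACTB": 10},
        "cellB": {"MT-CO1": 5, "GAPDH": 5},
        "cellC": {"GAPDH": 3},
    }
    print(passing_cells(toy, min_genes=1, max_genes=4))
```

The default arguments mirror the thresholds stated above (more than 200 and fewer than 5000 genes, mitochondrial percentage below 15%).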