Data download and preprocessing
TCGA pan-cancer data were downloaded from the UCSC Genome Browser (https://genome.ucsc.edu), including batch effects normalized gene expression transcription data, clinical data, single-sample gene set enrichment analysis (ssGSEA) score data, drug-target data, homologous recombination deficiency (HRD)score and genome-wide DNA damage data, immune signature scores data and RNA based stemness scores data. All data processing is described on the official website. The pan-cancer study combined with clinical data included a total of 1368 SCC samples, including 252 cervical squamous cell carcinomas (CESCs), 95 esophagi squamous cell carcinomas (ESCAs), 520 head & neck squamous cell carcinomas (HNSCs), 501 lung squamous cell carcinomas (LUSCs).
Angiogenesis Subtypes
First of all, we obtained 507 angiogenesis genes from the AmiGO2 website (http://amigo.geneontology.org/amigo). Combined with gene expression data, 474 angiogenesis genes were finally obtained for analysis. Univariable Cox analysis was used to filter the angiogenesis genes which had the prognostic value for SCC patients (P<0.05). Based on the prognostic angiogenesis-related genes, ConsensusClusterPlus R-package was used to identify subtypes in SCC tumor samples using 1000 iterations, 80% sample resampling from 2 to 7 clusters (k2 to k7) using kmdist with average linkage algorithm and correlation as the similarity metric.
Gene set enrichment analysis and pathway
To study the changes in gene sets, Gene Set Enrichment Analysis (GSEA) was performed on all [11]. We analyzed the correlations between angiogenesis subtypes with cancer hallmark pathways and the Encyclopedia of Genes and Genomes (KEGG) pathway in each tumor sample. GSEA can highlight genes associated with the subtypes through pathway analysis, even if this link is weak. After differential Expressed Genes (DEGs) analysis, the Gene Ontology (GO) analysis and KEGG pathway enrichment analysis were also performed.
Genomic correlations with angiogenesis subtypes
Aneuploidy and LOH Scores and ABSOLUTE purity/ploidy file were obtained from research by Thorsson, V et al. [12]. All purity, ploidy, LOH, and CNV invocation used to create the DNA damage scores utilized in this study as well as those summarized below were derived by the TCGA Aneuploidy AWG using ABSOLUTE[13], respectively. Moreover, HRD and HRD-loss of heterozygosity (HRD-LOH) score from the UCSC genome browser. The copy number burden fraction change and the number of segments represent the base fraction deviating from the baseline multiplicity and the total number of segments in each sample's copy number profile, respectively. Each fragment was designated as amplification, deletion, or neutral based on its number of copies relative to the circular ploidy of the sample. All data have calculated the correlations of angiogenesis subtypes (subtype1 and subtype2) by t-test. In addition, we calculated oncoplot, mutation panorama and OncogenicPathways based on TCGAmutation and maftools R packages.
Differentially expressed genes and regulation associated with angiogenesis
For the analysis of differentially expressed genes and miRNA, the Mann-Whitney U test was performed to derive the differential genes between subtype1 and subtype2 (FDR <0.05, absolute logFC > 1). Abnormal vascular networks due to tumor cells can secrete a large number of pro-angiogenesis factors, which is characterized by vascular disease, immaturity, and permeability [14]. To clarify the regulation of angiogenesis subtypes, we performed a computational analysis to identify two "master regulators": transcription factors (TF) and miRNA. For TF, we first downloaded 318 TFs from the Cistrome Bowser (http://cistrome.org/). Correlation test of 163 angiogenesis genes obtained from batch survival analysis with TF (person correlation: R2 = 0.4, P < 0.05). For miRNA, use t-test to calculate the differential miRNA, (FDR=0.05, log FC > 1). And the resulting five miRNAs to the site of the target gene prediction TargetScan (http://www.targetscan.org/vert_72/). Each of the predicted miRNA target genes is the result of the correlation test which intersected angiogenesis genes. By TransmiR v2.0 database (http://www.cuilab.cn/transmir) of predicted miRNA each TF, and then intersected with the TF of the correlation test. Finally, the TF-miRNA-target regulatory network was constructed using Cytoscape (https://cytoscape.org/) software.
The microenvironment in the angiogenesis subtypes
Tumor infiltrating leukocytes (TILs) have been shown to be associated with tumor prognosis and treatment response and are a crucial component of the tumor microenvironment [15]. CIBERSORT [16], a common computational method for quantification of cellular components by gene expression profiling (GEP) from bulk tissues. Therefore, we upload the gene expression data to the CIBERSORT website, which can be obtained free of charge (https://cibersort.stanford.edu/). we used a leukocyte gene signature matrix, termed LM22 and 1000 permutations. LM22 contains 547 genes, can be distinguished 22 kinds of human hematopoietic cell subtypes, including 7 types of T cells, naive and memory B cells, plasma cells, NK cells, and myeloid subpopulations[15]. We used the ssGSEA [17] which ranks the gene expression values of a given sample and then uses the empirical cumulative distribution function to calculate the enrichment score (ES) [18] method to calculate the enrichment score of angiogenesis genes in each sample. In the ssGSEA analysis, we used only genes associated with angiogenesis for our calculations. R packages have used GSVA, limma, and GSEABase. Application of the Estimating Stromal Cells and Immune Cells in Malignant Tumor Tissues Using Expression Data (ESTIMATE)[19] Algorithm to Calculate Stromal Cells, Immune Cells, and Estimated Scores.
The clinical implication of angiogenesis subtypes
The survival outcome of these patients with different subtypes was calculated by Log-rank test. To investigate whether there is a significant correlation between clinicopathological characteristics and SCC subtypes. We further studied the relationship between angiogenesis subtypes and gender, clinical stage (I~IV), tumor status, and histological grade (1~3). According to the grouping established by angiogenesis classification (K=2), subtype1 has 957 samples and subtype2 has 417 samples and analyzes the prognostic difference between the two groups of samples.
Statistical analysis
Pearson was performed to evaluate the immune signatures level for each sample. The enrichment levels of 68 immune signatures and angiogenesis were quantified by the heatmap R package. We compared the differences in drug target types between subtype1 and subtype2 and visualized the differences using the "pheatmap" R software package. A two-sided P value less than 0.05 were set as a statistical significance threshold. All the Analysis were performed based on R version 3.4.2 (2020-04, https://www.r-project.org/). The Student’s t-test was used to compare subtype1 and subtype2 with differential expression genes.