Genomic analysis has greatly expanded people's knowledge in the genetic composition of cancer mutations and in many cases helped design of new treatments. In the hope of using genomic information to guide the diagnosis and treatment of STS, we recruited thirty-two patients, twenty with LMS and twelve with DDLPS, at Zhongshan Hospital of Fudan University, Shanghai, China. The patient characteristics are indicated in Table 1.
Table 1 Summary of Clinical information and WES and RNA-seq analysis
|
|
|
All
|
LMS
|
DDLPS
|
Sample size
|
|
|
|
WES
|
32
|
20
|
12
|
RNA-seq
|
16
|
8
|
8
|
Age (years)
|
|
|
|
Median
|
51
|
49.5
|
54
|
Range
|
16-70
|
16-65
|
37-70
|
Gender (%)
|
|
|
|
Male
|
12 (37.5%)
|
3 (15%)
|
9 (75%)
|
Female
|
20 (62.5%)
|
17 (85%)
|
3 (25%)
|
WES
|
|
|
|
TMB (counts/Mb)
|
|
|
|
Median
|
2.27
|
2.61
|
1.97
|
Range
|
1.20-16.1
|
1.57-16.1
|
1.20-4.51
|
MSI score (%)
|
|
|
|
High
|
1 (3.1%)
|
1 (5%)
|
0 (0%)
|
Low
|
31 (96.9%)
|
19 (95%)
|
12 (100%)
|
SNP+Indel
|
2887
|
2110
|
1088
|
CNA gain
|
|
113
|
1089
|
RNA-seq
|
|
|
|
Up-regulated
|
|
2404
|
2364
|
Gene fusion
|
40
|
4
|
36
|
RT-PCR verified
|
8/12 (66.7%)
|
1/4 (25%)
|
7/8 (87.5%)
|
* TMB is reported as counts/Mb. It is calculated by all nonsynonymous mutations detected in WES divided by 35Mb. 35Mb is the sum of exome probes in WES.
WES: whole exome sequencing. TMB: tumor mutation burden. MSI: microsatellite instability
|
Somatic mutation profile
Whole exome sequencing was run on both the tumor and paired peripheral blood (normal) samples from the same patient and compared to give somatic mutations in the tumor for each patient. Overall, the tumor mutational burden (TMB) of STS with a median of 2.27 counts/Mb is relatively low in comparison with other cancers (4). There is no significant difference of TMB between LMS and DDLPS (median 2.61 and 1.97 counts/Mb respectively, p = 0.105) (Table 1). A large number of somatic mutations led by MUC16 (66%) and PABPC3 (56%) were detected (Supplementary Table S2). In order to focus on the genes with a known function in cancer or reported to related to cancer, we filtered the mutations against a list of 1190 member cancer related genes compiled from literature and OncoKB (http:\\www.oncokb.org)(26). There is a distinct distribution of somatic mutations between LMS and DDLPS among the top 30 most mutated cancer-related genes (Figure 1). Eleven out of the 20 (55%) LMS patients had mutation in TP53. In contrary, none of the DDLPS patient showed TP53 mutation. Similarly, FLG is mutated in 9 (45%) LMS patients but only in 1 (8.3%) DDLPS patient. ANHAK, ATRX, CSPG4 and PCMTD1 each was mutated in 5 (25%) LMS patients and none DDLPS patients. HERC2 was mutated in 5/8 (41.7%) DDLPS patients, C12orf55, DNAJC16, PTPRQ, and TIAM1 each was mutated in 3 (25%) DDLPS patients. None of the above five genes had mutation in LMS patients (Figure 1). As a comparison, the mutation frequency of HERC2 in TCGA-SARC dataset was 11/58 (19.0%) for DDLPS and 18/102 (17.6%) for LMS cases. The mutation frequency of TP53 in TCGA-SARC dataset was 10/58 (17.2%) for DDLPS and 80/102 (78.4%) for LMS cases. Therefore HERC2 mutation is imbalanced and TP53 mutation is generally lower in both DDLPS and LMS groups in our patients as comparing to that of TCGA-SARC dataset (4). This may arise from the difference of patient composition. All 32 cases in our group are ethnic Chinese but only 2/160 (1.25%) are Asian and 140/160 (87.5%) are white in the LMS and DDLPS cases of TCGA-SARC dataset. The number of cases is small in both groups though and that prevented us to draw a definitive answer about the distribution difference. The absence of HERC2 mutation and biased presence of TP53 mutation in the LMS group is an interesting observation. HERC2 belongs to E3 ubiquitin protein ligases, and can modulate p53 activity through regulating p53 oligomerization independent of MDM2 (27). Probably because the mutations that affect p53 tetramerization disrupt the HERC2-p53 interaction, therefore HERC2 mutations are redundant in LMS with mutant TP53.
SCNA in LMS and DDLPS
In addition to SNV and indel, gene amplification and deletion are also important contributors to carcinogenesis. To further reveal the somatic copy number alterations (SCNA) in Chinese sarcoma patients, we used GISTIC2.0 to detect SCNA (Supplementary Tables S3-5). We found significant chromosomal loss in LMS peaked at cytobands 10q22.1, 13q34, and 17p13.1, where PTEN, RB1 and TP53 are located respectively (Figure 2A). All these genes are important tumor suppressors. This suggested that chromosomal loss affecting tumor suppressor genes is a hallmark of LMS. On the other hand, we found a focal amplification peaked at 12q14.1, included in its wide peak are MDM2 (12q15), CDK4 (12q14.1) and HMGA2 (12q14.3) (Figure 2B). We further verified the prediction of GISTIC by GSEA analysis under the positional mode. We used chromosomal neighbors as the input gene set instead of signaling pathway members in GSEA and found 12q14-12q15 regions are indeed enriched in DDLPS (EnrichmentScore ES = 0.662 for 12q14 and ES = 0.784 for 12q15) (Figure 2C and data not shown). MDM2 and CDK4 are both important cell cycle regulating genes. HMGA2 is a chromosomal structure organization protein and may also play a role as a transcription factor. It has been reported that alteration of HMGA2 is associated with myxoid liposarcoma and takes part in adipogenesis and mesenchymal differentiation (28, 29). Amplification of 12q14 and 12q15 regions likely upregulated the above genes important in regulation of cell cycle and transformation of adipocytic tissue, therefore drove DDLPS. At the gene level, we detected co-amplification of MDM2 and CDK4 in more than 90% DDLPS cases. Such co-amplification combined with TP53 inactivation would result in cell proliferation, and is very likely the initiating events to drive fat tumorigenesis in DDLPS (30).
Additional genes detected by GISTIC 2.0 with amplification and deletion are listed in supplement Table S2-S5. In LMS, 96 genes are significantly amplified, while as many as 4532 genes are significantly deleted (confidence level 0.95), indicating that deletion is much more frequent than amplification. In DDLPS 1089 genes are amplified and no genes are significantly deleted. These observations are consistent with previous publications as reviewed in (3). In DDLPS amplification of oncogenes such as MDM2, CDK4, HMGA2 and JUN are common, while in LMS deletion of tumor suppressor such as TP53, RB1, and PTEN are preferred, indicating different patterns of CNA in DDLPS and LMS.
Concurrent and exclusive mutations
Given the possibility that synergistic or anti-synergistic interactions between genes may contribute to the origin or progression of LMS and DDLPS, we tested interactions among the somatic mutations called in WES analysis using the somaticInteractions function in MAFtools. (Figure 3, and the full list in Supplementary Table S2. TP53 is mutually exclusive with BRD9 (p<0.1) but co-occurs with Filaggrin (FLG) (p<0.1). BRD9 is a subunit of the human BAF (SWI/SNF) nucleosome remodeling complex and has emerged as an attractive therapeutic target in cancer (31). It has a bromodomain highly homologous to the bromodomain of BRD7, which is reported to interact with p53 and required for p53 function (32). Whether mutation of BRD9 promotes LMS through impairment of the p53 pathway is to be determined. FLG is a highly mutated cancer driver gene. Its mutation is also found in several other cancer types such as non-melanoma skin cancer, head and neck cancer, lung cancer, colorectal cancer, uterine cancer, and prostate cancer (33). There has no reported synergic interaction between FLG and TP53 mutations in cancer yet, including STS. The above somatic interaction analysis may provide hints for exploring genes with unspecified functions in STS.
RNA-seq revealed genes with differential expression
Eight each of the LMS and DDLPS samples that had sufficient RNA quality were further subjected to RNA-Seq analysis. Although fresh or snap-frozen tissue are generally preferred over FFPE samples for RNA extraction and sequencing analysis, FFPE samples are by far the most accessible tissue samples. Dedicated works have been done both by academic labs and commercial manufacturers to develop special protocols to perform RNA-seq from FFPE samples. They have succeeded in using exome capturing for partially degraded RNA. We used a New England Biolabs protocol and companion kits in sample preparation for RNA-seq (see Methods) which has been proven functional and are widely used in NGS work with clinical samples (16, 17).
In total we identified 2396 genes expressed with significantly different levels in the LMS and DDLPS (Figure 4 and Supplementary Table S6). Unsupervised clustering was performed to examine the discriminant effect of these genes. We found that all the LMS naturally clustered together, and so did the DDLPS samples. Ranked the highest in differential expression is MDM2, with logFC = 4.12 (DDLPS over LMS) and adjusted P-value = 1.73e-52, consistent with previous studies (4, 34). MDM2 is a proto-oncogene. It encodes a nuclear-localized E3 ubiquitin ligase and plays an important role in cell cycle regulation. Its copy number gain has been implicated in cancers including DDLPS (35, 36). We also observed the gene copy number of MDM2 is correlated with its mRNA in DDLPS (Figure 5A). Similarly, another key cell cycle regulator CDK4 is both highly upregulated in expression (logFC = 3.61, adjusted p-value = 1.09e-20) and amplified in gene copy number (Figure 5A). These results suggest that hyper activation of cell cycle is highly correlated with and may possibly a driving force underlying DDLPS. This notion is supported by a recent study in human mesenchymal stem cell model in which co-expression of MDM2 and CDK4 produced DDLPS-like morphology (37). Additionally, we found JUN upregulation (logFC= 1.45, adjusted P-value = 0.0396) in DDLPS (Supplementary Table S6). JUN is a known player in blocking adipocytic differentiation so may have assisted tumorigenesis in adipose tissue.
Since we detected both CNA in WES and differential gene expression in RNA-seq, it would be of interest to see whether CNA is correlated with gene expression. Although the top varied genes such as MDM2 and CDK2 indeed have a good correlation between their DNA and mRNA level (Figure 5A), the bulk of genes do not show this relationship, especially in the case of LMS (Figure 5B-C). This may due to several factors. Gene expression is a tightly regulated cellular process. The copy number of available template DNA, especially for those which only have a moderate increase as the bulk of genes are, is only a small contributor of gene expression activity. The more important regulation can be attributed to the accessibility of the chromosomal region, and activities of promoters, specific transcription factors, and DNA dependent RNA polymerase machinery, and mRNA stability.
Pathway analysis of genes with differential expression
To investigate the pathways affected in LMS and DDLPS, we performed Gene Set Enrichment Analysis (GSEA) (Table 2 and Supplementary Table S7)(12, 21). We found several pathways are enriched in DDLPS (ES > 0) while depleted in LMS (ES < 0). The pathways enriched in DDLPS include "Ubiquitin mediated proteolysis" (ES = 0.472, adjusted p-value = 0.002) which has MDM2 as a member. The pathways apparently enriched in LMS such as “Calcium signaling pathway", "Vascular smooth muscle contraction", and “Linoleic acid metabolism”(ES = -0.37, -0.44, and -0.60 respectively, and adjusted P-values = 0.002 for all) are probably due to the difference of the tissue origins rather than tumorigenesis (38, 39). Another notable pathway is spliceosome (ES =0.53, adjusted p-value =0.003) because alteration in transcript splicing may result in different sets of antigens differentially recognizable by immune cells in different tumor environments.
Table 2. Pathways enriched in GSEA (LMS in relative to DDLPS)
Name
|
Total
|
Hits
|
EnrichmentScore
|
p-val
|
p-adj
|
Calcium signaling pathway
|
68
|
58
|
-0.37
|
0.00017
|
0.0028
|
Vascular smooth muscle contraction
|
30
|
22
|
-0.44
|
0.000017
|
0.0028
|
Linoleic acid metabolism
|
30
|
26
|
-0.60
|
0.00018
|
0.0028
|
Ascorbate and aldarate metabolism
|
34
|
28
|
-0.67
|
0.00018
|
0.0028
|
Chronic myeloid leukemia
|
33
|
27
|
0.53
|
0.000022
|
0.0028
|
p53 signaling pathway
|
31
|
27
|
0.50
|
0.000022
|
0.0028
|
Spliceosome
|
27
|
22
|
0.52
|
0.000022
|
0.0028
|
AGE-RAGE signaling pathway in diabetic complications
|
18
|
16
|
0.47
|
0.000023
|
0.0028
|
Ubiquitin mediated proteolysis
|
27
|
25
|
0.47
|
0.000023
|
0.0028
|
Osteoclast differentiation
|
44
|
39
|
0.50
|
0.000023
|
0.0028
|
Distinct gene fusion patterns between DDLPS and LMS
In the RNA-seq analysis, we identified in 3 (out of 8) DDLPS patients 4 potential gene fusion transcripts and in all 8 LMS patients 36 fusion transcripts in total (Figure 6 A-C and Supplementary Table S1). The distribution of fusion events is biased towards DDLPS over LMS (p = 0.0195, Figure 6C). There is no recurrent fusion transcript identified probably due to the small sample size. Fusion transcripts involving chromosome 12 are only found in DDLPS, including both inter- and intra-chromosomal rearrangements (Figure 6B). MDM2 and RAB3IP are the most common fusion partners, and both are located in chromosome 12. There is also significant correlation between MDM2/CDK4 amplification and chromosome 12 rearrangement (P < 0.001, Figure 7B). Gene fusions can occur as a result of chromosomal rearrangements such as translocation, interstitial deletion, or inversion during DNA replication, which are more common in DDLPS than in LMS (Figure6 A-B). Therefore, it is not a surprise to see gene fusions more frequently in DDPLS, especially in chromosome 12 where ring or giant marker chromosomes often occur (4, 40). Peptides generated from the identified fusion transcripts may be a potential source of tumor neoantigens that can be targeted to produce safer and patient specific CAR-T cells for immunotherapy (41). Although RNA-seq by NGS method is good at high throughput survey of all possible gene fusion events in the sample, it relies on bioinformatic models to predict the fusion. This may potentially introduce errors. We selected all 4 predicted gene fusion transcripts in LMS and 8 out of 36 in DDLPS to verify with RT-PCR in tandem with Sanger sequencing. We were able to verify only 1 event in LMS but 7 out of 8 events in DDLPS (Figure 6 D and Supplementary Table S1). This confirmed that gene fusions are relatively more frequent in DDLPS but scarce in LMS and further highlighted that neoantigen based immune therapy may have a higher success rate in DDLPS than in LMS.
LMS and DDLPS have different Profiles of Tumor Infiltrating Immune Cells
The status of tumor microenvironment (TME) is an important factor when immune therapy is considered. Previous studies revealed that STS immune subtypes are associated with response rate to PD1 blockade. To access the TME in LMS and DDLPS and compare the difference between them, we used MCP-counter to profile tumor infiltrating immune cells in the two subtypes following a recent study in STS (Figure 7) (24). Among the eight immune populations (T cells, CD8+ T cells, cytotoxic lymphocytes, natural killer cells, B cell lineage, monocytic lineage, myeloid dendritic cells and neutrophils) and two stromal populations (endothelial cells and fibroblasts) used by MCP-counter to categorize sarcomas immune classes (SIC), DDLPS has a higher signature score than LMS in both the non-immune cell populations (p= 0.002 for fibroblasts and p = 0.047 for endothelial cells). Increased endothelial cell signature score has been shown to associate with a high density of CD34+ endothelial cells and enhancer free endothelial-driven angiogenesis in STS (4). The high fibroblasts signature score of DDLPS is consistent with its mesenchymal origin. The previous TCGA study suggested that CD8+ T cells are higher in DDLPS than in LMS (P <0.01), but no such significance was observed in the current study (P = 0.76) (4).
Unsupervised clustering of the 16 DDLPS/LMS RNA samples according to their MCP-counter Z-scores revealed a bipartite pattern (Figure 7A). All the LMS samples were classified to SIC A as defined by Petitprez et al., while most DDLPS samples except one were classified to SIC B (24). The SIC B cluster can be further divided to two subclasses (B1 and B2) with the B2 subclass has a higher immune cell content in general. We also looked the expression of individual immune checkpoint molecules in the sample. It showed a higher representation of PD-L1 in LMS and higher TIM3, PD1 and CTLA4 in DDLPS in some individual samples. Overall, the immune checkpoint point molecules are not highly active consistent to the observed ICI treatment efficiency in STS. However, the expression profiles of individual patients are worth to check when an ICI prescription is considered.