Patients and pathological review
This study was conducted in accordance with the Declaration of Helsinki and received approval from the ethics committee of the Chiba Cancer Center (approval number: M04-001). Patients with high-grade EC who underwent surgical resection at Juntendo University from 2006 to 2022 (14 patients, approval numbers: M09-0551, M20-0007), The University of Tokyo Hospital from 2010 to 2022 (51 patients, including 8 patients with grade 1–2 endometrioid carcinomas, approval numbers: G0683, 2022083Ge), or National Cancer Center Hospital from 2012 to 2015 (24 patients, approval number: 2022-003) were enrolled in this study. Written informed consent for research use of the samples and clinical information was obtained from all participants. Tissue samples from the patients at the National Cancer Center Hospital were provided by the National Cancer Center Biobank, Japan.
Pathological diagnoses were initially made according to the 2020 WHO classification20 (https://tumourclassification.iarc.who.int/welcome/) by at least two pathologists at each institution. Subsequently, a gynecologic pathologist (H.Y.) conducted central review to confirm histological type. Only tumors diagnosed as high-grade EC such as serous carcinoma, clear cell carcinoma, EMG3, undifferentiated/dedifferentiated carcinoma, carcinosarcoma, or small cell carcinoma were included in the study. Representative histological images are shown in Supplementary Figure S6. All pathologists were blinded to genomic analysis results and prognostic information. All specimens resected during surgery were appropriately fixed in 10% neutral buffered formalin for 12–48 hours immediately after the collection of frozen tissue samples. Frozen tumor samples were used for genomic analysis. Formalin-fixed paraffin-embedded sections, including representative tumor areas, were provided for IHC analysis. Clinicopathological data for each patient were retrospectively obtained from medical records.
Whole-exome sequencing
Tumor tissues were collected immediately after surgery, cut into small pieces, and stored at −80°C until use. Genomic DNA was extracted from surgically resected specimens using the QIAamp Fast DNA Tissue Kit (Qiagen, Venlo, Limburg, Netherlands). Peripheral blood samples were stored at −30°C. Genomic DNA was extracted from peripheral blood samples using the QIAamp DNA Blood Midi Kit (Qiagen). Whole-exome sequencing (WES) libraries were generated using the Twist Library Preparation EF system (Twist Bioscience, South San Francisco, CA, USA) with enzymatic fragmentation, the Twist Universal Adaptor system, and the Twist Fast Hybridization Target Enrichment system. Briefly, 50 ng of genomic DNA was enzymatically fragmented to 200–300 bp, followed by end repair, A-tailing, and paired-end index adaptor ligation. Pre-capture amplification was conducted with KAPA HiFi DNA polymerase (KAPA Biosystems, Roche Diagnostics, Indianapolis, IN, USA). Exonic fragments from approximately 750 ng of amplified products were enriched using the Twist Comprehensive Exome Panel as a probe. Massively parallel, paired-end sequencing of sample libraries was performed with a NovaSeq6000 sequencer (Illumina, San Diego, CA, USA).
Analysis of sequence data
Reads from paired-end WES were aligned to the human reference genome (hg38) using the Burrows-Wheeler Aligner (http://bio-bwa.sourceforge.net/) and Bowtie 2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml). Somatic (synonymous and non-synonymous) mutations were called using an in-house caller and publicly available mutation callers: Genome Analysis Toolkit (https://gatk.broadinstitute.org/hc/en-us), MuTect2 (https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2), and VarScan 2 (http://varscan.sourceforge.net/). Mutations were discarded if any of the following criteria were met: total read number < 20, variant allele frequency (VAF) in tumor samples < 0.05, mutant read number in germline control samples > 2, mutation in only one genome strand, or variant present in normal human genomes of the 1,000 Genomes Project dataset (https://www.internationalgenome.org/) or the in-house database. Gene mutations were annotated using SnpEff (http://snpeff.sourceforge.net). CN was analyzed using an in-house pipeline that determined the log R ratio (LRR) as follows: (I) homozygous (VAF ≤ 0.05 or ≥ 0.95) or heterozygous (VAF 0.4–0.6) single-nucleotide polymorphisms (SNPs) were selected from the genomes of the related normal samples in the 1,000 Genomes Project database; (II) normal and tumor read depths for the selected SNPs were adjusted according to the G+C percentage of a 100-bp
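The discard criteria for somatic mutations listed in this section can be expressed as a single predicate. The following is a minimal sketch, not the in-house caller: the `Variant` record and its field names are hypothetical, the total read number is assumed to refer to depth in the tumor sample, and "mutation in only one genome strand" is interpreted as mutant reads being observed on only one of the two strands.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary; field names are illustrative only."""
    total_reads: int       # total read depth at the site (assumed: tumor sample)
    tumor_vaf: float       # variant allele frequency in the tumor
    normal_alt_reads: int  # mutant reads in the matched germline control
    fwd_alt_reads: int     # mutant reads on the forward strand
    rev_alt_reads: int     # mutant reads on the reverse strand
    in_normal_db: bool     # found in 1,000 Genomes or the in-house database


def passes_filter(v: Variant) -> bool:
    """Return True only if none of the discard criteria in the text apply."""
    discard = (
        v.total_reads < 20
        or v.tumor_vaf < 0.05
        or v.normal_alt_reads > 2
        or v.fwd_alt_reads == 0   # support on one strand only
        or v.rev_alt_reads == 0
        or v.in_normal_db
    )
    return not discard
```

For example, `passes_filter(Variant(50, 0.20, 0, 5, 5, False))` keeps the variant, whereas the same variant with a depth of 19 reads would be discarded.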
Analysis of mutational signatures
Mutations with VAF ≥ 0.1 that passed the quality filter were subjected to mutational signature analysis with SigProfilerExtractor39 (https://github.com/AlexandrovLab/SigProfilerExtractor). To summarize results, we aggregated single base substitution (SBS) categories into broader categories. The age category was derived by summing the values from SBS1 (Age) and SBS5 (Age). The APOBEC category combined values from SBS2 (APOBEC) and SBS13 (APOBEC). Similarly, the POLE category was created by summing values from SBS10a (POLE), SBS10b (POLE), SBS14 (POLE), and SBS20 (POLD1). The MMRd category was based on values in SBS15 (MMRd). SBS87 was used as is. According to the description provided on the Catalogue Of Somatic Mutations In Cancer (COSMIC) website (https://cancer.sanger.ac.uk/signatures/), we also aggregated CN signature categories into broader categories. The HRD-TandemDup category was derived by summing the values from CN17 (HRD), CN18 (unknown), CN19 (unknown), and CN20 (unknown).
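The aggregation of signatures into broader categories amounts to summing the per-sample activities of the member signatures. A minimal sketch follows; the grouping tables mirror the text above, while the function name and the input format (a mapping from signature name to activity for one sample) are assumptions made for illustration and do not reflect the output format of SigProfilerExtractor.

```python
# Category membership as described in the text.
SBS_CATEGORIES = {
    "Age": ["SBS1", "SBS5"],
    "APOBEC": ["SBS2", "SBS13"],
    "POLE": ["SBS10a", "SBS10b", "SBS14", "SBS20"],
    "MMRd": ["SBS15"],
    "SBS87": ["SBS87"],  # used as is
}
CN_CATEGORIES = {
    "HRD-TandemDup": ["CN17", "CN18", "CN19", "CN20"],
}


def aggregate_signatures(activities, categories):
    """Sum the activities of member signatures for each category.

    `activities` maps a signature name to its value in one sample;
    signatures absent from the sample are treated as 0 (an assumption).
    """
    return {
        category: sum(activities.get(name, 0.0) for name in members)
        for category, members in categories.items()
    }


sample = {"SBS1": 10, "SBS5": 5, "SBS2": 3, "SBS10a": 2, "SBS20": 1}
summary = aggregate_signatures(sample, SBS_CATEGORIES)
```

In this illustrative sample, the Age category is the sum of SBS1 and SBS5, and the POLE category is the sum of SBS10a and SBS20 because SBS10b and SBS14 are absent.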
Transcriptome sequencing
Ribosomal RNA was depleted from 500 ng of RNA extracted from clinical samples using the NEBNext® rRNA Depletion Kit v2 (New England Biolabs, Ipswich, MA, USA). Sequencing libraries were prepared using the NEBNext® Ultra™ II RNA Library Prep Kit for Illumina® (New England Biolabs) and sequenced as 150-bp paired-end reads on the NovaSeq6000 sequencer (Illumina). The expression level of each gene was calculated using DESeq2 (https://bioconductor.org/packages/release/bioc/html/DESeq2.html). Gene set enrichment analysis (GSEA) and single-sample GSEA were performed with the GSEApy package, version 1.0.6 40 (https://gseapy.readthedocs.io/en/latest/gseapy_example.html#Prerank-example).
Microsatellite instability testing and detection of MLH1 promoter methylation
Microsatellite instability was analyzed by polymerase chain reaction (PCR) with labeled primers at five microsatellite loci: BAT25, BAT26, NR-21, NR-24, and MONO-27. MLH1 promoter methylation was assessed with PCR after digestion of genomic DNA with a methylation-sensitive restriction enzyme as previously described41, with slight modifications. Genomic DNA (200 ng) was digested with HinP1I (Nippon Genetics, Tokyo, Japan) in 20 µL of reaction buffer, followed by heat inactivation of the restriction enzyme. Digested DNA (20 ng) was subjected to 25 cycles of multiplex PCR in a total volume of 10 µL of reaction buffer using the Type-it Microsatellite PCR Kit (Qiagen). PCR products were electrophoresed on a 3730xl DNA Analyzer (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA) and analyzed using Peak Scanner 2 software (Applied Biosystems).
TCGA data analysis
We obtained batch effect normalized RNA-seq data for uterine corpus endometrial carcinoma pan-cancer and uterine carcinosarcoma pan-cancer from UCSC Xena24. Metadata, such as histological type and subtype information, were acquired from both UCSC Xena and cBioPortal (https://www.cbioportal.org/). The analysis was conducted using 595 samples (including adjacent normal tissues) and 17,507 genes (with no missing values) for which gene expression data and metadata were available.
Immunohistochemical analysis and interpretation
All IHC tests, including those for p53 (DO7, prediluted, Dako, Glostrup, Denmark), ARID1A (HPA005456, rabbit polyclonal, 1:2000, Sigma-Aldrich, St. Louis, MO, USA), PMS2 (EP51, prediluted, Dako/Agilent), MSH6 (EP49, prediluted, Dako/Agilent), estrogen receptor (ER) (SP1, prediluted, Ventana Medical Systems, Tucson, AZ, USA), progesterone receptor (PR) (1E2, prediluted, Ventana Medical Systems), CD8 (4B11, 1:200, Dako), and PD-L1 (E1L3N, 1:400, Cell Signaling Technology, Danvers, MA, USA), were performed as described in our previous studies23,42. IHC testing for p53, ARID1A, PMS2, MSH6, CD8, and PD-L1 was performed with the Link 48 autostainer (Dako/Agilent). IHC testing for ER and PR was performed using the BenchMark ULTRA autostainer (Ventana/Roche Diagnostics). After deparaffinization, tissue sections were stained using the antibodies mentioned above and then counterstained with hematoxylin.
Pathologists evaluated all IHC slides according to the following criteria. For p53 staining, tumors with either strong and diffuse nuclear staining (> 80% of carcinoma cells) or a complete absence of staining in carcinoma cells (null pattern) were considered to have an aberrant p53 staining pattern indicating TP53 alterations. Staining of the adjacent nontumor cells served as an internal positive control. Tumors with weak or heterogeneous staining patterns were considered to have wild-type TP53. In addition, subclonal mutant p53 immunostaining was defined as a combination of wild-type patterns and one or more mutant patterns, with each present in at least 5% of tumor cells43. The loss of ARID1A nuclear expression in tumor cells was classified as either homogeneous (negative in almost all tumor cells) or focal (regionally negative)42. Because IHC for PMS2 and MSH6 can reportedly be used instead of the four-antibody panel (MLH1, MSH2, MSH6, and PMS2) for MMRd screening5, MMRd status was defined in this study as the complete loss of nuclear staining for PMS2 or MSH6. Internal positive controls with intact nuclear staining included the adjacent normal mucosa, stromal cells, and inflammatory cells. For ER and PR expression, the proportion of tumor cells with nuclear staining, regardless of intensity, was measured. CD8+ T cells were analyzed using HALO software (Indica Labs, Corrales, NM, USA). Specifically, the tumor area was manually annotated by a pathologist (S.S.), the number of CD8+ T cells and their area within that region were measured, and the number of CD8+ T cells per unit area was calculated. PD-L1 expression was assessed by calculating the combined positive score (CPS)44. CPS was calculated by dividing the number of PD-L1–positive cells (including viable tumor cells, lymphocytes, and macrophages) by the total number of viable tumor cells and multiplying by 100.
Representative IHC staining results are shown in Supplementary Figure S7.
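The two quantitative IHC readouts described above, CPS and CD8-positive T-cell density, are simple ratios. The sketch below illustrates the arithmetic only; the function names are hypothetical, the area is assumed to be that of the annotated tumor region, and the unit of area (mm² here) is an assumption because the text specifies only "per unit area".

```python
def combined_positive_score(pd_l1_positive_cells: int, viable_tumor_cells: int) -> float:
    """CPS = PD-L1-positive cells (tumor cells, lymphocytes, macrophages)
    divided by the total number of viable tumor cells, multiplied by 100."""
    if viable_tumor_cells <= 0:
        raise ValueError("viable_tumor_cells must be positive")
    return 100 * pd_l1_positive_cells / viable_tumor_cells


def cd8_density(cd8_cell_count: int, tumor_area_mm2: float) -> float:
    """CD8-positive T cells per unit of annotated tumor area (assumed mm^2)."""
    if tumor_area_mm2 <= 0:
        raise ValueError("tumor_area_mm2 must be positive")
    return cd8_cell_count / tumor_area_mm2
```

For instance, 30 PD-L1–positive cells among 200 viable tumor cells corresponds to a CPS of 15, and 450 CD8-positive T cells in an annotated area of 3 mm² corresponds to 150 cells per mm².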
Statistical analyses
Python (version 3.11.4) and SciPy (version 1.11.1) were used for statistical analyses. Comparisons of quantitative variables between groups were conducted using the nonparametric Wilcoxon signed-rank test or analysis of variance (ANOVA) followed by Tukey’s post hoc test. Correlations were assessed with Pearson’s correlation coefficients. P values < 0.05 were considered statistically significant.
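The tests named above correspond to functions in `scipy.stats`. The following sketch shows one way these calls might be arranged; the wrapper names are hypothetical and the study's actual scripts are not reproduced here. Note that `scipy.stats.wilcoxon` implements the signed-rank test and therefore expects paired observations of equal length.

```python
from scipy import stats

ALPHA = 0.05  # significance threshold stated in the text


def paired_comparison(x, y):
    """Wilcoxon signed-rank test on paired measurements; returns the P value."""
    return stats.wilcoxon(x, y).pvalue


def multi_group_comparison(*groups):
    """One-way ANOVA followed by Tukey's post hoc test.

    Returns the ANOVA P value and the matrix of pairwise Tukey P values,
    where entry [i][j] compares group i with group j.
    """
    anova_p = stats.f_oneway(*groups).pvalue
    tukey_p = stats.tukey_hsd(*groups).pvalue
    return anova_p, tukey_p


def pearson(x, y):
    """Pearson's correlation coefficient and its P value."""
    r, p = stats.pearsonr(x, y)
    return r, p
```

A result would be reported as significant when the returned P value is below `ALPHA`; for three or more groups, the pairwise Tukey matrix would typically be examined only when the overall ANOVA is significant.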