Low-Coverage Whole Genome Sequencing Diagnoses Endometrial Carcinoma from Tampon DNA of Uterine Bleeding Patients

Endometrial carcinoma (EC) is a disease predominantly affecting postmenopausal women. It accounts for about 5% of abnormal uterine bleeding cases. Diagnosing cancer in patients with uterine bleeding remains challenging. Chromosome aberrations have previously been found to be frequent in EC. Here we employed low-coverage whole genome sequencing to investigate chromosome aberrations in tampon-collected DNA from patients with suspected EC.

, which account for 80-90% of all EC but only 40% of EC-related deaths [3][4][5]. By contrast, to date, no precancerous conditions have been identified for type II tumors, which are believed to be non-estrogen-related and to lack typical symptoms such as abnormal uterine bleeding (AUB) and postmenopausal bleeding (PMB). Remarkably, type II EC is more invasive, and its 5-year survival is much lower than that of type I [3,4,6].
For most if not all cancers, population-based screening and detection of precancerous and early-stage disease could help considerably to decrease mortality and relieve the socio-economic burden [7]. Taking cervical cancer as an example, widely practiced human papillomavirus and cytology screening of high-risk subjects over the last decades has shown good efficacy in detecting cervical precancerous lesions and has eventually decreased the incidence of invasive cervical cancer [8]. Unfortunately, the screening and early detection of EC remain a huge challenge. The current clinical evaluation for EC hinges on women presenting with symptoms such as AUB or PMB, or on asymptomatic women with a thickened endometrium or a uterine cavity mass indicated by ultrasonography or other imaging tests [9]. Following a suspicious finding, more invasive procedures, such as dilation and curettage (D&C) and hysteroscopy, are performed to obtain pathologic evidence. Not only does this place more burden on the individual patient, it also introduces increased costs and additional risks of anaesthesia- or procedure-related complications [10]. More importantly, considering that only 5-10% of AUB or PMB cases are diagnosed as EC or atypical hyperplasia, most patients are believed to undergo an unnecessary but intensive diagnostic work-up for benign diseases [10].
EC is a disease predominantly of postmenopausal women, with over 80% of cases occurring in women over 50 years old. However, epidemiologic data show that the increasing rate of obesity may lead to a rise in the proportion of pre-menopausal EC cases [11,12]. The high proportion of ECs diagnosed at an early stage is largely due to abnormal bleeding, which is present in 94% of such cases [13]. In postmenopausal women, the presence of PMB equates to a 9% risk of EC. In contrast, the risk of EC in pre-menopausal women with AUB is only 0.33% (95% confidence interval (CI) 0.23-0.48).
According to the findings of The Cancer Genome Atlas (TCGA) project, a large portion of EC patients present significant copy number variations (CNVs), which might be used for the diagnosis and treatment of EC [14]. In this single-blinded pilot study, we prospectively collected exfoliated cells using a vaginal tampon and developed a bioinformatics protocol named "Uterine exfoliated cell Chromosomal Aneuploidy Detector" (UterCAD) to detect CNVs. The sensitivity and specificity of UterCAD for early EC diagnosis were investigated in patients presenting with AUB or PMB.

Statistical Analysis
At least 10 M paired reads were collected for each sample. The reads were mapped to the human reference genome hg19. Genomic coverage was counted using SAMtools mpileup. We then calculated the average coverage for each 200-kb bin. The coverage of each bin was then normalized to a Z-score using Formula 1.
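Formula 1 is not reproduced in this text, so the following Python sketch only illustrates the binning and normalization steps under a stated assumption: that the Z-score is the conventional one, (bin coverage minus the mean of that bin across reference samples) divided by the standard deviation across those reference samples. The function names, the input layout and the use of a reference panel are hypothetical and are not taken from the UterCAD implementation.

```python
from statistics import mean, stdev

BIN_SIZE = 200_000  # 200-kb bins, as stated in the text


def bin_coverage(positions, depths, bin_size=BIN_SIZE):
    """Average depth per fixed-size bin on one chromosome.

    positions: 0-based coordinates (e.g. parsed from pileup output)
    depths:    read depth observed at each position
    Returns {bin_index: mean depth over the positions seen in that bin}.
    """
    sums, counts = {}, {}
    for pos, depth in zip(positions, depths):
        idx = pos // bin_size
        sums[idx] = sums.get(idx, 0.0) + depth
        counts[idx] = counts.get(idx, 0) + 1
    return {idx: sums[idx] / counts[idx] for idx in sums}


def bin_z_scores(sample_bins, reference_bins):
    """Z-score of each bin against a reference panel (ASSUMED formula).

    sample_bins:    coverage of each bin in the test sample
    reference_bins: reference_bins[i] lists the coverage of bin i across
                    reference samples (at least two values per bin)
    """
    z_scores = []
    for value, ref in zip(sample_bins, reference_bins):
        mu, sd = mean(ref), stdev(ref)
        # A bin with no spread in the reference panel carries no signal here.
        z_scores.append((value - mu) / sd if sd > 0 else 0.0)
    return z_scores
```

In practice the per-bin coverage would usually be scaled by each sample's total coverage before comparison; that step is omitted here because the text does not describe it.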
The circular binary segmentation algorithm was then used to identify significant genomic breakpoints and copy-number-changed genomic segments, using the R package 'DNAcopy'.
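To give an intuition for what circular binary segmentation searches for, the Python sketch below finds the single contiguous run of bins whose mean differs most from the mean of the remaining bins, using a two-sample t-like statistic. It is a simplified illustration only: it is not the 'DNAcopy' implementation, it omits the permutation-based significance test and the recursive splitting of that algorithm, and the function name is hypothetical.

```python
from math import sqrt


def best_circular_split(values):
    """Return (i, j, statistic) for the run values[i:j] whose mean differs
    most from the mean of all other bins (simplified CBS-style search)."""
    n = len(values)
    total = sum(values)
    # prefix[k] holds the sum of values[:k], so any run sum is O(1).
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    grand_mean = total / n
    variance = sum((v - grand_mean) ** 2 for v in values) / (n - 1)
    sd = sqrt(variance) if variance > 0 else 1.0

    best = (0, 0, 0.0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the run must leave a non-empty complement
            inside = prefix[j] - prefix[i]
            outside = total - inside
            diff = inside / k - outside / (n - k)
            stat = abs(diff) / (sd * sqrt(1 / k + 1 / (n - k)))
            if stat > best[2]:
                best = (i, j, stat)
    return best
```

Applied to a vector of per-bin Z-scores for one chromosome, the returned indices mark a candidate copy-number-changed segment; whether that segment is significant would still have to be tested, as the full algorithm does.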
Categorical variables were reported as frequencies and percentages, and continuous variables were described as mean and standard deviation (SD), or median with interquartile range (IQR), as appropriate.
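As a minimal sketch of this reporting convention, the Python helpers below format a categorical variable as a frequency with a percentage and summarize a continuous variable by mean, SD, median and IQR. The helper names are hypothetical, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the software used in the study.

```python
from statistics import mean, median, quantiles, stdev


def describe_categorical(count, total):
    """Frequency and percentage, e.g. '25/30 (83.3%)'."""
    return f"{count}/{total} ({100 * count / total:.1f}%)"


def describe_continuous(values):
    """Mean, SD, median and interquartile range of a numeric sample."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "iqr": (q1, q3),
    }
```

For instance, `describe_categorical(25, 30)` reproduces the way the postmenopausal rate of the EC group is reported in the results.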

Patient characteristics
From March 2020 to January 2021, 51 consecutive patients presenting with AUB or PMB were admitted to our department. After tampon use, one patient withdrew for personal reasons. After DNA extraction and surgery, 5 patients diagnosed with EC on the basis of a previous D&C biopsy were found to have no cancer cells after hysterectomy ± lymphadenectomy, and 1 further patient was flagged for low DNA quality. Thus, these 7 patients were excluded from further UterCAD analysis. The STARD flow diagram is shown in Fig. 1. The baseline characteristics of the patients are shown in Table 1. The median ages were 57 years in the EC group (range 43 to 83 years) and 59 years in the benign group (range 45 to 75 years). The postmenopausal rates were 83.3% (25/30) and 85.7% (12/14), respectively (P = 0.608). According to the final pathological reports, 26 cases were diagnosed as EEC and the other 4 cases as USC. The benign group consisted of focal atypical hyperplasia (n = 1), endometrial polyps (n = 5), non-atypical hyperplasia (n = 4), submucosal leiomyoma (n = 2) and endometritis (n = 2).
and 10q gain (n = 2) in EEC (Fig. 2A), and multiple chromosome changes in USC (Fig. 2B), while none of these CNVs were found in benign lesions (Fig. 2C).

Z-scores Between Malignant and Benign Lesions
We further explored the value of the Z-score of each chromosome arm in differentiating endometrial malignancies from benign diseases. The area under the receiver operating characteristic curve (AUC) ranged from 0.428 to 0.749 (median AUC = 0.549, Table 2). Chromosome arms 8q and 10q showed relatively high diagnostic accuracy, with AUCs of 0.749 and 0.612, respectively (Table 2). We combined the information from all chromosomes to build a diagnostic model for EC. The optimal Z-score cutoff, |Z| ≥ 2.40, was determined by the Youden index. At this cutoff, the UterCAD test showed a sensitivity of 83.3% and a specificity of 92.9% (Table 3). The AUC was 0.91 (0.83-0.99), which was better than that of any single chromosome arm. A higher cutoff (|Z| = 3) showed better specificity (100.0%) at the cost of lower sensitivity (47.1%). The diagnostic model identified 100% (4/4) of USC and 80.8% (21/26) of EEC (Table 3). As shown in Fig. 3, the AUC was 1.00, 0.933 and 0.942 for diagnosing USC, EEC, and EC overall, respectively. Tampon DNA positive rate correlates with menopause.
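The cutoff selection can be sketched in Python as follows. The Youden index is J = sensitivity + specificity - 1, and the optimal cutoff is the one that maximizes J. The sketch assumes each sample has been reduced to a single score (for example its largest |Z| across chromosome arms); the text does not specify how the chromosome-level information was combined, so that reduction and the function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive.

    labels: True for EC, False for benign.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, labels):
    """Cutoff among the observed scores that maximizes the Youden index."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

With 30 EC and 14 benign cases, the reported sensitivity of 83.3% corresponds to 25 of 30 EC cases called positive (4 USC plus 21 EEC, as in Table 3), and a specificity of 92.9% corresponds to 13 of 14 benign cases called negative.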

Discussion
As a lesion located in the uterine cavity, endometrial cancer usually causes abnormal bleeding, which is much more prevalent in women who are older than 45 or postmenopausal [3]. Over the last decades, the mortality and morbidity of EC have kept rising, even surpassing those of cervical cancer in certain areas and significantly threatening women's health [2,15].
Thus, an effective approach for the screening and early detection of EC is urgently needed.
According to previous studies, human cancers (including EC) are commonly characterized by rapid growth and an increased need for energy and material supplies. Owing to the lack of energy and oxygen, necrosis is quite common within the tumor mass and causes cancer cells to shed into nearby cavities such as the oesophagus, stomach, urinary tract and uterine/vaginal tract [16][17][18][19]. This offers an opportunity to collect exfoliated cells from EC tumors using a vaginal tampon, which is much less invasive than the previously reported uterine cavity brush or D&C [20,21]. In this research, we proposed a non-invasive method, UterCAD, which relies on tampon-based whole genome DNA sequencing.
In this study, we used a tampon to collect exfoliated cells from the upper genital tract and investigated their CNVs for the early diagnosis of EC. As the data show, UterCAD achieved a sensitivity of 83.3% and a high specificity of 96.2%, suggesting that it may be a powerful non-invasive method for the early detection of EC. Interestingly, 5 EEC patients (diagnosed in other hospitals via D&C) had no cancer cells after hysterectomy in our hospital, indicating that these small lesions might have been confined to the endometrium. Consistently, UterCAD detected no CNVs in any of these 5 cases, implying the high specificity of this technology.
Similarly, several previous studies have reported that the Pap test can collect samples from the vagina, cervical surface, cervical canal, uterine cavity, and even fallopian tubes, which might make it an attractive approach for the non-invasive diagnosis of EC. Kinde et al. and Wang et al. detected EC-derived DNA in 41% and 29% of associated Pap samples in two independent patient groups. The low detection rate might be caused by the short sample-collection time of a Pap smear (just a few seconds), which significantly limits the quantity of tumor cells collected [17,18]. In the current study, we improved cell collection by using a vaginal tampon (6-hour protocol) and demonstrated that it could harvest many more cells for the UterCAD test.
In fact, the presence of chromosomal CNVs may be a stronger indication of an underlying carcinogenic event, since not all mutations lead to gene dysfunction and cancer. In line with TCGA data, the application of UterCAD to early EC diagnosis is supported by the fact that this malignancy is particularly rich in CNVs, especially in high-grade tumors such as USC and clear cell carcinoma [22]. In our cohort, UterCAD detected significant CNVs in all 4 USCs, which is consistent with previous studies [23]. In addition, women with positive UterCAD findings might be at risk of the more aggressive USC, which warrants immediate action.
As this was a pilot study of DNA copy numbers, validation in larger cohorts should be performed in future work. Some previous studies have also shown that a small proportion of endometrial tumors are characterized by a very high number of single-nucleotide mutations but negligible copy number variations [24]. The Cancer Genome Atlas provides an overview of endometrial cancer and its over-represented point mutations. A panel of high-frequency mutations may therefore be needed to improve the overall diagnostic sensitivity, especially for certain subtypes of EC [14].
Collectively, we investigated for the first time the efficiency of UterCAD, a genome sequencing method based on tampon-collected DNA, in a group of patients with suspected EC. Our results suggest that UterCAD is particularly effective for the early detection of ECs (especially type II tumors, which have more frequent CNVs). Its high sensitivity and specificity support the use of UterCAD as a non-invasive procedure before endometrial biopsy.