Vaginal Microbiota Characteristics of Pregnancy of Unknown Location Women in China (VMCPULW): Study Protocol for a Prospective Cohort

Background: Pregnancy of unknown location (PUL) is a pregnancy with a positive beta-human chorionic gonadotropin (β-hCG) test result whose location cannot be determined by ultrasound. Early determination of the pregnancy location is important for subsequent treatment. However, no study has characterized the vaginal microbiota of women with PUL, and no model predicts pregnancy location from vaginal microbiota combined with clinical indicators. We therefore designed this study to compare the vaginal microbiota of women with PUL who are later diagnosed with intrauterine or ectopic pregnancy, and to establish a prediction model for pregnancy location in the PUL population that combines microbiota with clinical indicators. Methods: This is a prospective, multicenter cohort study. A total of 576 eligible participants will be included. Vaginal microbiota samples will be collected from all participants at inclusion, and color Doppler ultrasound will be performed weekly. Once the pregnancy location is determined, participants with intrauterine pregnancy will be followed up to their early pregnancy outcome, and participants with ectopic pregnancy will be followed up until a non-pregnant level of β-hCG is confirmed. Discussion: Pregnancy location is routinely judged by color Doppler ultrasound and β-hCG testing. Through this study, we hope to provide an earlier clinical method for predicting pregnancy location in women with PUL.
Collected data items include menstruation (regular or not), symptoms and signs (vaginal bleeding, abdominal pain), pregnancy-related laboratory test results (the first β-hCG, the first estradiol and the first progesterone results), results of sex hormone tests, serological indicators (hemoglobin, total white blood cell count and platelet count), and ultrasonography results (size of gestational sac, size of ectopic pregnancy, volume of pelvic fluid).


Introduction
Pregnancy of unknown location (PUL) is a pregnancy with a positive beta-human chorionic gonadotropin (β-hCG) test result whose location cannot be determined by ultrasound. The outcome of PUL can be a failed PUL (FPUL), intrauterine pregnancy (IUP), ectopic pregnancy (EP) or persistent PUL (PPUL). (1) The location of a pregnancy cannot be easily detected before 5-6 weeks' gestation. (2) This leaves an interval between confirmation that a pregnancy is ongoing and determination of its location, during which the location is difficult to establish clinically. (3) However, PUL may be a transient state of EP, which is a main cause of maternal morbidity and mortality, with a reported rate of 6%-20%. (4)(5)(6) Clinically, early diagnosis of EP is fundamental for safe and effective treatment. Whether and when to intervene in PUL is a dilemma.
Human microbiota refers to the totality of microorganisms that exist inside and on the human body. (7) The human vaginal microbiota is unique in being dominated by Lactobacillus species. (8) An imbalance of this normal flora could therefore increase the risk of infectious disease and adverse pregnancy outcomes. (9)(10)(11) Rapid advances in bacterial deoxyribonucleic acid (DNA) extraction, on-board sequencing and reference databases have brought breakthroughs in the study of microbiota. Among these, 16S rRNA gene sequencing has been the most widely used technology in recent years, and it can identify microbiota at the level of "genus" or even "species". (12) At present, most studies on the relationship between microbiota and pregnancy focus on pregnancy outcomes (such as preterm birth), (13,14) and none has addressed prediction of pregnancy location in PUL. We hypothesize that, among women with PUL, the structure and quantity of the vaginal microbiota differ significantly between those who will later be diagnosed with ectopic pregnancy and those who will later be diagnosed with intrauterine pregnancy.
Therefore, we decided to analyze the differences in vaginal microbiota between pregnant women with unknown and definite pregnancy locations, and to try to find markers that predict the pregnancy location in advance.

Study setting
This is a prospective, multicenter cohort study conducted from August 2020 to December 2021 in 4 centers: the Women and Children's Center of the First Affiliated Hospital of Guangzhou University of Chinese Medicine, Shenzhen Maternity & Child Healthcare Hospital, He Xian Memorial Affiliated Hospital of Southern Medical University, and the outpatient department of Dongzhimen Affiliated Hospital of Beijing University of Chinese Medicine. Based on 16S rRNA second-generation high-throughput sequencing, the structure and characteristics of the vaginal microbiota in women with early pregnancy, and their relationships with early pregnancy outcomes (normal pregnancy, tubal pregnancy), will be studied. Participants will register at these four hospitals in mainland China. The Department of Gynecology of the Women and Children's Center of the First Affiliated Hospital of Guangzhou University of Chinese Medicine, the main research center, is a national clinical key specialty in China and a key specialty of the State Administration of Traditional Chinese Medicine, with nationwide academic influence in the field. This study has been approved by the Ethics Committee of the hospital where the main research site is located. Before any study procedure begins, each participant will sign an informed consent form. This study is registered on the Chinese Clinical Trial Register Network (identifier: ChiCTR2000035378).

Sample size
The sample size estimate is based on a previous study. Among the 100 cases of early pregnancy included in that study, 28 had a non-lactobacilli-predominant vaginal bacterial type (NLDM), with an actual incidence of extrauterine pregnancy of 67.9%, and 72 had a lactobacilli-predominant vaginal bacterial type (LDM), with an actual incidence of extrauterine pregnancy of 25%.
Because the subjects of those first 100 cases were women hospitalized for vaginal bleeding symptoms, whereas the present study is a prospective cohort study, the sample size is estimated according to the incidence of vaginal bleeding in the population (24.2%) and the proportion of affected women who saw a doctor for vaginal bleeding (62.6%). (15) Taking a 15% loss to follow-up rate into account, the estimated sample size is 416 cases (63/24.2%/62.6%).
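The arithmetic in parentheses above can be checked with a short sketch. The function name and argument names are hypothetical, and the base figure of 63 is taken from the text as given, since its derivation is not stated here.

```python
import math


def scale_sample_size(base_cases, bleeding_incidence, consult_rate):
    """Divide a base case count by two population proportions and round up."""
    return math.ceil(base_cases / bleeding_incidence / consult_rate)


# Values quoted in the sample size paragraph: 63 / 24.2% / 62.6%
estimated = scale_sample_size(63, 0.242, 0.626)
print(estimated)  # 416
```

Rounding up rather than to the nearest integer is an assumption of this sketch; it is the usual convention for sample sizes.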

Study timeline
This study protocol consists of three main phases. For patients diagnosed with pregnancy whose pregnancy location is unknown, the follow-up time nodes are: (1) At enrollment: 4-8 weeks of pregnancy, when the location of pregnancy has not been identified. Preparation before sample collection: all sampling personnel will be uniformly trained in the same method and can take up their posts only after passing the assessment. All personnel will wear doctor's coats, disposable medical hats, masks and latex gloves when sampling. Sampling will be performed according to standard operating procedures to ensure consistency among different collection personnel. The observation item at each follow-up time node is vaginal secretion.
Methods of Collecting Vaginal Secretion: (16) The vaginal walls will be separated with a sterile blade, and three samples of vaginal lateral wall secretion will be taken with sterile swabs from the upper half of the vagina and the cervical fornices, avoiding the cervix. One of the samples will immediately undergo vaginal discharge examination: levels of pH, hydrogen peroxide (H2O2), leukocyte esterase (LEU), sialidase (BV), β-glucuronidase (β-GD) and acetylglucosaminidase (AG) will be detected by the dry chemical enzymatic method.
Quality assurance of sample separation, transportation and storage: All samples are collected according to the collection SOP, which ensures that they are transported on dry ice to the -80℃ freezer in the laboratory for frozen storage within 4 hours. A dedicated person will register the number of samples sent and received and the transportation process. Each sample will be carefully checked for consistency with the investigator's records.

Observation of Adverse Events
For early pregnancies whose location remains unknown, rupture of a tubal pregnancy and miscarriage are the most serious adverse events.
(1) Patients with ruptured tubal pregnancy require immediate surgical treatment; (2) patients with threatened abortion should be hospitalized immediately for tocolysis, and patients whose threatened abortion develops into inevitable abortion or missed abortion should be treated immediately with uterine curettage.
Specific process of sample DNA extraction, library construction and on-board sequencing: For bacterial diversity analysis, the V3-V4 (or V4-V5) variable regions of 16S rRNA genes were amplified with universal primers 343F and 798R (or 515F and 907R for the V4-V5 regions).
(2) Library construction: Amplicon quality was visualized using gel electrophoresis. The amplicons were purified with AMPure XP beads (Agencourt) and amplified in another round of PCR. After being purified with AMPure XP beads again, the final amplicon was quantified using the Qubit dsDNA assay kit. Equal amounts of purified amplicon were pooled for subsequent sequencing.

Data management and monitoring
(1) Training of field researchers: Uniform training will ensure consistency among different researchers. (2) Training of data entry personnel: Double entry is performed in the EDC database. Inconsistent entries and extreme values will be confirmed by a third person against the original data.
Subject compliance: Each survey generally takes no more than 5 minutes. If a survey takes too long, respondents are prone to fatigue and answer less accurately, which ultimately affects the results.
Monitoring of study: An overall responsibility system will be implemented, in which the project manager acts under the leadership of the project leading group, together with a three-level responsibility system for the leading group and a site supervisor responsibility system for each project site. The training requirements of this cohort study will be strictly followed. A system of regular reporting and inspection will be established so that problems are found and corrected promptly.
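The double-entry check described above can be illustrated with a minimal sketch. It does not reflect the EDC system actually used; the field names, example values and plausibility limits are all hypothetical.

```python
def find_discrepancies(entry_a, entry_b):
    """Return the fields whose values differ between two independent entries."""
    fields = sorted(set(entry_a) | set(entry_b))
    return {
        field: (entry_a.get(field), entry_b.get(field))
        for field in fields
        if entry_a.get(field) != entry_b.get(field)
    }


def find_extreme_values(entry, limits):
    """Return the fields whose numeric value lies outside its allowed range."""
    flagged = []
    for field, (low, high) in limits.items():
        value = entry.get(field)
        if value is not None and not (low <= value <= high):
            flagged.append(field)
    return flagged


# Hypothetical records typed in by two data entry staff for one participant
first_entry = {"age": 29, "first_hcg": 1520.0, "vaginal_bleeding": True}
second_entry = {"age": 29, "first_hcg": 1250.0, "vaginal_bleeding": True}

# Hypothetical plausibility limits, for illustration only
limits = {"age": (18, 50)}

mismatches = find_discrepancies(first_entry, second_entry)
extremes = find_extreme_values({"age": 92}, limits)
print(mismatches)  # {'first_hcg': (1520.0, 1250.0)}
print(extremes)    # ['age']
```

Any field reported by either function would then go to the third person for confirmation against the original data.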

Questionnaire development
We will use two data collection tools to collect patient information: interviewer-administered questionnaires and patient follow-up medical record data filling tools. These tools are first developed in English, then translated into Chinese, and finally translated back into English. The interviews will be conducted in Chinese. The interviewer-administered questionnaire will be divided into two parts. The first part covers the participants' sociodemographic characteristics, menstrual condition, maternal history and laboratory tests. The second part covers the follow-up content after the patient's pregnancy location has been determined. The questionnaire will be completed face-to-face by a trained interviewer at the time of participant recruitment. We also conducted a pilot test of the questionnaire to ensure the clarity of each question.

Statistical analysis
1. Bioinformatic analysis:
Raw sequencing data were in FASTQ format. Paired-end reads were preprocessed using Trimmomatic software (17) to detect and cut off ambiguous bases (N) and, using a sliding window trimming approach, to cut off low-quality sequences with an average quality score below 20. After trimming, paired-end reads were assembled using FLASH software (18). The assembly parameters were: minimum overlap of 10 bp, maximum overlap of 200 bp and maximum mismatch rate of 20%.
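To make the sliding window trimming idea concrete, the following is a simplified sketch, not Trimmomatic itself. The read is scanned from the 5' end and cut at the first window whose mean quality falls below 20. The window size of 4 is an assumption (the protocol does not state it), and the function name and example read are hypothetical.

```python
def sliding_window_trim(seq, quals, window=4, min_avg=20.0):
    """Cut a read at the first window whose mean quality is below min_avg."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    cut = len(seq)
    last_start = max(len(seq) - window, 0)
    for start in range(last_start + 1):
        scores = quals[start:start + window]
        if scores and sum(scores) / len(scores) < min_avg:
            cut = start  # everything from this window onward is dropped
            break
    return seq[:cut], quals[:cut]


# Hypothetical read whose quality decays toward the 3' end
read = "ACGTACGTAC"
phred = [38, 37, 36, 35, 30, 28, 12, 10, 8, 5]

trimmed_seq, trimmed_quals = sliding_window_trim(read, phred)
print(trimmed_seq)  # ACGTA
```

In the example, the window starting at position 4 has a mean of exactly 20 and is kept, while the window starting at position 5 has a mean of 14.5, so the read is cut there.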
The sequences were further denoised as follows: reads with ambiguous or homologous sequences, or shorter than 200 bp, were discarded. Reads with at least 75% of bases above Q20 were retained. Reads with chimeras were then detected and removed. These two steps were performed using QIIME software (19) (version 1.8.0).
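The length and quality criteria above can be expressed as a small filter. This is an illustrative sketch rather than the QIIME implementation: it covers only the ambiguous-base, 200 bp and 75%-at-Q20 rules, leaves out homologous-sequence and chimera detection, and treats "above Q20" as a score of 20 or more, which is an assumption. The function name is hypothetical.

```python
def passes_quality_filter(seq, quals, min_length=200, min_q=20, min_fraction=0.75):
    """Apply the ambiguous-base, minimum-length and Q20-fraction criteria to one read."""
    if len(seq) < min_length:
        return False  # too short
    if "N" in seq.upper():
        return False  # contains an ambiguous base
    # Assumption: a base counts as "above Q20" when its score is >= 20
    high_quality = sum(1 for q in quals if q >= min_q)
    return high_quality / len(quals) >= min_fraction


# Hypothetical reads of exactly 200 bp
good = passes_quality_filter("A" * 200, [30] * 150 + [10] * 50)  # 75% at Q20 or above
bad = passes_quality_filter("A" * 200, [30] * 149 + [10] * 51)   # 74.5% at Q20 or above
print(good, bad)  # True False
```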
Clean reads were subjected to primer sequence removal and clustered into operational taxonomic units (OTUs) using Vsearch software (20) with a 97% similarity cutoff. The representative read of each OTU was selected using the QIIME package. All representative reads were annotated and blasted against the Silva database version 123 (or Greengenes) (16S rDNA) using the RDP classifier (21) (confidence threshold 70%).
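The idea behind clustering at a 97% similarity cutoff can be shown with a toy greedy centroid procedure. This is a deliberately simplified stand-in and not the Vsearch algorithm: identity is computed position by position on sequences assumed to be already aligned to equal length, whereas real tools use pairwise alignment and typically process reads in abundance or length order. All names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.97):
    """Assign each read to the first centroid it matches, else start a new OTU."""
    clusters = {}  # centroid sequence -> member reads
    for read in reads:
        for centroid in clusters:
            if identity(read, centroid) >= threshold:
                clusters[centroid].append(read)
                break
        else:
            clusters[read] = [read]  # read becomes the centroid of a new OTU
    return clusters


# Hypothetical 100 bp sequences
base = "ACGT" * 25
one_mismatch = "T" + base[1:]        # 99% identical to base
four_mismatches = "TTTTT" + base[5:]  # 96% identical to base

otus = greedy_cluster([base, one_mismatch, four_mismatches])
print(len(otus))  # 2
```

The read with one mismatch joins the first OTU, while the read at 96% identity falls below the cutoff and founds a second OTU.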

Statistical description and analysis of clinical data:
Clinical data (e.g., height, weight, maternal history, hormone levels and blood analysis) will be tested for normality using the Shapiro-Francia method (W test), with P>0.05 considered to conform to a normal distribution. Continuous variables conforming to a normal distribution will be expressed as mean ± standard deviation (mean ± SD), and those with a skewed distribution as median and interquartile range (median, Q1-Q3); categorical variables will be expressed as percentages (%).
For continuous variables, differences between two groups will be compared with the Student t-test when the data are normally distributed and with the Wilcoxon rank sum test when they are not; differences among multiple groups will be compared with one-way ANOVA when the data are normally distributed and with the Kruskal-Wallis H test when they are not. Categorical variables will be analyzed by the chi-squared test; if a count is < 10, Fisher's exact probability test will be used instead. All P values are two-sided, and P<0.05 is considered statistically significant.
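The reporting and test-selection rules above amount to a small decision table, sketched below. The code only formats summaries and names the test to use; it does not run any test, and the normality result is passed in rather than computed. Function names, argument names and the example values are hypothetical, and the quartiles follow the default convention of Python's statistics module, which may differ slightly from other software.

```python
import statistics


def summarize(values, is_normal):
    """Format a continuous variable as mean ± SD or as median (Q1-Q3)."""
    if is_normal:
        return f"{statistics.mean(values):.1f} ± {statistics.stdev(values):.1f}"
    q1, median, q3 = statistics.quantiles(values, n=4)
    return f"{median:.1f} ({q1:.1f}-{q3:.1f})"


def choose_test(kind, n_groups=2, is_normal=True, smallest_count=None):
    """Name the test implied by the variable type, group number and distribution."""
    if kind == "categorical":
        if smallest_count is not None and smallest_count < 10:
            return "Fisher's exact test"
        return "chi-squared test"
    if kind != "continuous":
        raise ValueError("kind must be 'continuous' or 'categorical'")
    if n_groups == 2:
        return "Student t-test" if is_normal else "Wilcoxon rank sum test"
    return "one-way ANOVA" if is_normal else "Kruskal-Wallis H test"


values = [1, 2, 3, 4, 5, 6, 7]  # hypothetical measurements
print(summarize(values, is_normal=True))   # 4.0 ± 2.2
print(summarize(values, is_normal=False))  # 4.0 (2.0-6.0)
print(choose_test("continuous", n_groups=3, is_normal=False))  # Kruskal-Wallis H test
```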

Statistical analysis of prediction model construction:
Model: Prediction model of early pregnancy location.
Pregnancy location (intrauterine pregnancy/extrauterine pregnancy) will be the dependent variable (Y). The independent variables (X) will be female genital tract bacterial type, BMI, vaginal discharge examination, blood analysis, duration of amenorrhea, abdominal pain, irregular vaginal bleeding, regular menstruation, history of spontaneous abortion, history of ectopic pregnancy, history of pelvic inflammatory disease, intrauterine contraceptive ring, first HCG value, first progesterone value, parametrial ultrasound, and pelvic effusion volume under B ultrasound. A random forest machine learning method in R software will be used to construct a prediction model for early pregnancy location based on female genital tract microbiota characteristics.
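The principle behind the random forest approach (bootstrap resampling, random feature selection and majority voting) can be illustrated with a deliberately reduced sketch. It is written in Python with the standard library only, whereas the protocol specifies R, and each "tree" here is a single-split stump, so it is a teaching aid and not the planned model. The two features and all values are synthetic and carry no clinical meaning; all names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    values = sorted({row[feature] for row in rows})
    majority = Counter(labels).most_common(1)[0][0]
    best = (len(rows) + 1, None, majority, majority)
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return feature, best[1], best[2], best[3]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))           # random feature
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Return the majority vote of all stumps for one row."""
    votes = []
    for feature, threshold, left_label, right_label in forest:
        goes_left = threshold is None or row[feature] <= threshold
        votes.append(left_label if goes_left else right_label)
    return Counter(votes).most_common(1)[0][0]


# Synthetic, well-separated toy data with two arbitrary features
rows = [[i * 0.1, i * 0.2] for i in range(10)] + [[9.0 + i * 0.1, 8.0 + i * 0.2] for i in range(10)]
labels = ["IUP"] * 10 + ["EP"] * 10

forest = fit_forest(rows, labels)
print(predict(forest, [0.5, 1.0]), predict(forest, [9.5, 9.0]))
```

A real analysis would use full decision trees, held-out validation and an established random forest implementation, with the predictors listed above as inputs.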

Discussion
This prospective cohort study will track the composition of, and changes in, the vaginal flora of women with pregnancy of unknown location, and will combine the microflora with clinical indicators to improve current diagnostic performance. The aim is to identify the type of early pregnancy so that timely and reasonable treatment can be given as soon as possible and the risk of ectopic pregnancy rupture reduced. In PUL, delay in the diagnosis of EP may increase maternal morbidity and mortality. (23) It is of great significance to rationalize the management of PUL while maintaining high performance in predicting ectopic pregnancy. We hope that the ability to predict ectopic pregnancy can form the basis of a new management plan that stratifies women with PUL into clinical high-risk and low-risk groups, and further guides the intensity of follow-up by doctors and the frequency with which patients return to the hospital, so that a reasonable diagnosis and treatment plan can be made, the treatment opportunity seized and the most suitable treatment chosen. This study may help predict not only the outcome of ectopic pregnancy but also the prognosis of threatened abortion. With its findings, we hope to treat patients earlier and improve their fertility.
Our study is a prospective multicenter cohort study on the characteristics of PUL vaginal microbiota.
Taking into account the impact of multicenter environmental factors and differences in doctors' assessments on the results, uniform training was carried out for doctors to improve the reliability of the results. The results of the present study could provide new guidance for determining pregnancy location in PUL.
The development of microbiota detection methods is important for clinical diagnosis and for screening of pregnancy location. The composition and function of the vaginal microbiota play an important role in pregnancy. Research in this field may further explain the mechanism of interaction between the vaginal microbiota and women's reproduction, and thereby clarify whether the relationship between the microflora and adverse pregnancy outcomes is causal. Once the mechanism is explained, further work can study the intervention and treatment of adverse pregnancy outcomes, and even rationally design the use of microbiota and its metabolites to improve the composition and abundance of the vaginal microflora. The 16S rRNA approach also enables a novel, as yet unexplored, evaluation of vaginal microbiota combined with clinical information as a diagnostic biomarker of EP or spontaneous abortion in women with a pregnancy of unknown location. However, the study design has the following limitations: the study period is as long as 13 weeks, and the research centers are limited to 4 centers in Guangdong Province and Beijing. It is therefore challenging to find all genus-level differences within a limited time and sample size, and future studies with larger sample sizes are needed to validate our results.
Based on our previous research and professional statistical support, we can improve the sensitivity and accuracy of the research and reduce the limitation of the small sample size. In addition, we plan to extend the sampling time and expand the goal to the relationship with early pregnancy outcome, so that more participants can be included and the findings made more persuasive.
In summary, this prospective multicenter cohort study on the relationship between PUL microbiota and pregnancy outcome establishes a multicenter clinical platform. The results may help with the early diagnosis of pregnancy location in the PUL population, inform follow-up and management plans, and provide a basis for subsequent medical decision-making. Early treatment may reduce the mortality of EP patients or increase the fertility rate of patients with spontaneous abortion. The study is expected to provide clinicians with more accurate and sensitive prediction plans and to provide basic information for inclusion in future guidelines. The next step may be to study the transplantation of vaginal microbiota, which may be an effective way to prevent ectopic pregnancy in advance.

Trial status
The first participant was enrolled on August 16, 2020. There are currently 103 patients enrolled and the enrollment will be completed at the end of 2021.