Overview of the cohort
This longitudinal cohort study was conducted at the CITIC-Xiangya Reproductive and Genetics Hospital (Figure 1). The study was approved by the Institutional Review Board of the hospital (LL-SC-2022014) and registered at ClinicalTrials.gov (NCT05404464). A multidisciplinary research team was established for the cohort, including epidemiologists, statisticians, reproductive and genetic clinicians, follow-up working groups, sample managers, and information engineers.
All couples with infertility who visited our infertility clinic between January 2016 and January 2026 have been, and will continue to be, regularly screened for registration by well-trained nurses. Participants who completed the preoperative examinations and were eligible for ART have been, or will be, included in the cohort after providing full informed consent.
The ART techniques used were limited to in vitro fertilization (IVF), intracytoplasmic sperm injection (ICSI), and preimplantation genetic testing (PGT). In addition, CITIC-Xiangya participated in the China National Birth Cohort (CNBC) study, for which we developed a cohort of 2,745 families receiving ART in 2017 [16]. Accordingly, we administered an additional questionnaire within the CXART cohort, thus constituting a CNBC sub-cohort. Biological sample collection began in 2016; however, the CITIC-Xiangya biobank, with complete procedures and facilities, was not established until 2019. Participants with stored biomaterials formed the Biobank sub-cohort within the main cohort (Figure 1).
ART process and follow-up
Figure 1 illustrates the overall process of CXART cohort development. Couples with infertility entered a fresh embryo transfer (ET) cycle at their first visit, whereas returning couples could proceed directly to a frozen ET cycle in accordance with a standard protocol. Patients in a fresh cycle generally underwent preoperative examination, ovulation induction, triggering, oocyte retrieval, sperm retrieval, insemination, embryo/blastocyst culture, and embryo transfer. Patients whose fresh ET was cancelled or who underwent PGT received frozen ET. Those who did not achieve a live birth after the last transfer could enter a frozen or fresh ET cycle, depending on the number and quality of their frozen embryos.
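The routing between fresh and frozen transfer described above can be sketched as a small decision function. This is a simplified illustration under stated assumptions, not the hospital's clinical protocol: the field names are hypothetical, and "number and quality of frozen embryos" is collapsed into a single assumed count of usable embryos.

```python
from dataclasses import dataclass


@dataclass
class TransferContext:
    first_visit: bool                 # first visit to the infertility clinic
    uses_pgt: bool = False            # embryos biopsied for PGT are frozen before transfer
    fresh_et_cancelled: bool = False  # fresh transfer cancelled (reason not modelled)
    usable_frozen_embryos: int = 0    # hypothetical stand-in for "number and quality"


def next_transfer_type(ctx: TransferContext) -> str:
    """Return 'fresh' or 'frozen' under simplified, assumed rules."""
    if ctx.first_visit:
        # A first-visit couple starts a fresh cycle; the transfer becomes
        # frozen if PGT is used or the fresh transfer is cancelled.
        return "frozen" if (ctx.uses_pgt or ctx.fresh_et_cancelled) else "fresh"
    # A returning couple (e.g. no live birth after the last transfer) uses
    # remaining frozen embryos if any are judged usable, otherwise starts a new fresh cycle.
    return "frozen" if ctx.usable_frozen_embryos > 0 else "fresh"
```

In practice the choice also depends on clinical judgement that a boolean and a count cannot capture; the sketch only makes the branching order explicit.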
After the transfer, in addition to patient self-reports, four rounds of follow-up were conducted by the active follow-up team: at 14 and 28 days after transfer, and at 42 days and 1 year after delivery. Human chorionic gonadotropin (HCG) testing and the first B-ultrasound examination were mainly completed in the hospital, whereas subsequent examinations and delivery records had to be uploaded to the hospital's WeChat official account or mini program. All patients were encouraged to report any abnormalities during pregnancy, and these details and any subsequent medical records were documented in the follow-up system (FUS), a module built into the electronic medical records (EMR) system. Pregnancy and delivery information that had not been reported was obtained by the follow-up staff via telephone or online at the corresponding time points. Growth and development data for the offspring at 1 year of age were obtained through telephone follow-up and recorded in the FUS.
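The four follow-up time points can be turned into due dates with simple date arithmetic. The sketch below is illustrative only: the key names are hypothetical, and interpreting "1 year after delivery" as the same calendar date one year later (with 29 February mapped to 28 February) is an assumption, not a rule stated by the cohort.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def add_one_year(d: date) -> date:
    """Same calendar date one year later; 29 Feb maps to 28 Feb (assumed convention)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def follow_up_schedule(transfer: date, delivery: Optional[date] = None) -> Dict[str, date]:
    """Due dates for the follow-up rounds; post-delivery rounds need a delivery date."""
    schedule = {
        "day14_after_transfer": transfer + timedelta(days=14),
        "day28_after_transfer": transfer + timedelta(days=28),
    }
    if delivery is not None:
        schedule["day42_after_delivery"] = delivery + timedelta(days=42)
        schedule["year1_after_delivery"] = add_one_year(delivery)
    return schedule
```

For example, a delivery on 20 September 2023 gives a 42-day follow-up on 1 November 2023 and a 1-year follow-up on 20 September 2024.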
Biological sample collection
We designed a biological sample collection protocol for this cohort in accordance with the bioethical principle of non-maleficence. Residual samples from the couples were collected on the day of oocyte retrieval during ART treatment, after informed consent was obtained; these included anticoagulated peripheral blood, follicular fluid, cumulus cells, and semen (Figure 1). Samples were collected by the clinical departments and sent to the biobank for standardized processing as follows: peripheral blood from both partners was separated into blood cells and plasma by centrifugation; clear follicular fluid was collected within 30 minutes after oocyte retrieval; cumulus cells were collected within 30 minutes after oocyte denudation and frozen in TRIzol reagent (Ambion, Life Technologies); and semen was frozen directly after insemination. All samples were processed by trained professionals and stored at -80°C in the CITIC-Xiangya biobank.
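The 30-minute collection windows in the protocol lend themselves to an automated timing check. This is a hypothetical sketch: the sample-type keys and function name are invented for illustration, and sample types without a stated window (blood, semen) are simply not time-checked here.

```python
from datetime import datetime, timedelta

# Collection windows stated in the protocol (minutes after the reference event).
# Sample types without a stated window are not time-checked in this sketch.
COLLECTION_WINDOW_MIN = {"follicular_fluid": 30, "cumulus_cells": 30}


def collected_in_time(sample_type: str, event_time: datetime, collected_time: datetime) -> bool:
    """event_time is oocyte retrieval (follicular fluid) or cumulus-cell removal (cumulus cells)."""
    window = COLLECTION_WINDOW_MIN.get(sample_type)
    if window is None:
        return True
    delay = collected_time - event_time
    # A negative delay means the timestamps are inconsistent, so it also fails the check.
    return timedelta(0) <= delay <= timedelta(minutes=window)
```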
Patient and cycle information was automatically entered into the RuRo sample-management system by scanning the barcode on each processed sample, and sample-related information was entered manually by sample managers. The system generated a unique identification code, which was marked on the corresponding sample, and the sample was then stored at the location assigned by the system. During storage, information and biological quality-control checks were performed regularly, and any subsequent use or destruction of samples was supervised and recorded.
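The registration workflow described above (record creation, unique code, assigned storage location, later usage events) can be illustrated with a toy in-memory registry. This is not the RuRo software or its API; the class, the `S00000001`-style code format, and the location strings are all hypothetical.

```python
import itertools


class SampleRegistry:
    """Toy stand-in for a sample-management system; not the RuRo software or its API."""

    def __init__(self, free_locations):
        self._free = list(free_locations)  # storage slots not yet assigned
        self._ids = itertools.count(1)     # source of sequential unique codes
        self.records = {}

    def register(self, pid, cycle_no, sample_type):
        """Create a record, generate a unique code, and assign the next free location."""
        if not self._free:
            raise RuntimeError("no free storage location")
        sample_id = f"S{next(self._ids):08d}"
        self.records[sample_id] = {
            "pid": pid,
            "cycle_no": cycle_no,
            "sample_type": sample_type,
            "location": self._free.pop(0),
            "history": ["stored"],
        }
        return sample_id

    def log_event(self, sample_id, event):
        """Append a later use or destruction event so the record stays up to date."""
        self.records[sample_id]["history"].append(event)
```

Keeping an append-only history per sample, rather than overwriting a status field, preserves the audit trail that supervised use and destruction require.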
Data collection
To establish the cohort, which is managed by the Clinical Data Center and Scientific Research Department, the Reproductive and Genetic Hospital of CITIC-Xiangya has, since 2016, successively cooperated with Qifeng Technology Co., Ltd. and Hanyun Medical Information Technology Co., Ltd. to build a pre-designed, ART-specialized EMR system. Each couple is linked to a unique patient ID (PID) number through their resident identity cards. All data are recorded by cycle, and each cycle is assigned a unique cycle number based on the course of treatment.
Figure 1 shows the process of longitudinal data collection, and details of the collected variables are provided in Supplementary Table S1. During the infertility clinic phase, the patients' socio-demographic characteristics, personal history, physical examination findings, and infertility diagnoses were collected from the hospital information system (HIS). In addition, CNBC sub-cohort participants completed questionnaires through the hospital's WeChat official account or mini program, including several standard scales such as the Pittsburgh Sleep Quality Index (PSQI), Perceived Stress Scale-10 (PSS-10), Center for Epidemiologic Studies Depression Scale (CESD), and Self-rating Anxiety Scale (SAS).
During each cycle, data on cycle information, pretreatment, stimulation, trigger, embryo transfer, and luteal support were collected from the HIS. For examinations and tests, we actively collected biochemical, immunological, endocrine, semen, blastocyst biopsy, genetic, and chromosomal test data from the laboratory information system (LIS), as well as B-ultrasound, salpingography, and hysteroscopy data from the picture archiving and communication system (PACS).
We collected laboratory records from the HIS, including sperm acquisition and processing, oocyte retrieval, fertilization and embryo culture, blastocyst culture, and embryo freezing and thawing. Expense information on the ART process was collected from the enterprise resource planning (ERP) system. Moreover, for the Biobank sub-cohort, data were also collected from the RuRo sample-management system, including patient and cycle information, sampling date, quantity, source, location, and usage. Follow-up data on implantation, clinical pregnancy, and perinatal and neonatal outcomes after ET, as well as offspring growth and development information, were extracted from the FUS.
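The data sources described in the preceding paragraphs can be summarized as a domain-to-system mapping, which is one convenient way to plan extraction jobs per source system. The domain names below are hypothetical groupings chosen for illustration; Supplementary Table S1 remains the authoritative variable list.

```python
from collections import defaultdict

# Hypothetical domain names; each value is the source system named in the text.
DATA_SOURCES = {
    "socio_demographics": "HIS",
    "personal_history": "HIS",
    "physical_examination": "HIS",
    "infertility_diagnosis": "HIS",
    "cnbc_questionnaire_scales": "WeChat account/mini program",
    "cycle_treatment": "HIS",         # pretreatment, stimulation, trigger, ET, luteal support
    "laboratory_tests": "LIS",        # biochemistry, immunology, endocrinology, semen, genetics
    "imaging": "PACS",                # B-ultrasound, salpingography, hysteroscopy
    "embryology_lab_records": "HIS",  # sperm processing through freezing and thawing
    "expenses": "enterprise resource planning",
    "biobank_samples": "RuRo",
    "pregnancy_and_offspring_outcomes": "FUS",
}


def domains_by_system(sources=DATA_SOURCES):
    """Group data domains by source system, preserving the order they were listed."""
    grouped = defaultdict(list)
    for domain, system in sources.items():
        grouped[system].append(domain)
    return dict(grouped)
```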
Data extraction, linkage and validation
According to the requirements of the research team, anonymized target data from the EMR, questionnaires, and biobank were extracted and organized into a pre-designed structured format by experienced medical informatics experts. We randomly selected 500 cycles and checked the extracted data against the raw data for consistency. The data extraction code was then adjusted, and extraction was repeated iteratively until the extracted data for every sampled cycle were 100% consistent with the raw data and complete.
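The validation step can be sketched as a field-by-field comparison over a random sample of cycles. This is an illustrative sketch, assuming both the extract and the raw records are available as dictionaries keyed by cycle number; the function name, the fixed seed, and the treatment of a missing field as a discrepancy are assumptions.

```python
import random

MISSING = object()  # sentinel marking a field absent from the extracted data


def validation_discrepancies(extracted, raw, n=500, seed=2022):
    """Compare a random sample of cycles field by field.

    extracted, raw: dict mapping cycle number -> dict of field values.
    Returns (cycle, field, extracted_value, raw_value) for every mismatch;
    a field missing from the extract counts as a mismatch (incompleteness).
    """
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    cycles = sorted(raw)
    sample = rng.sample(cycles, min(n, len(cycles)))
    problems = []
    for cycle in sample:
        ext = extracted.get(cycle, {})
        for field, raw_value in raw[cycle].items():
            ext_value = ext.get(field, MISSING)
            if ext_value != raw_value:
                problems.append((cycle, field, ext_value, raw_value))
    return problems
```

The iterative process then amounts to fixing the extraction code and re-running until this list is empty for the sampled cycles.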
The EMR system automatically linked the HIS, LIS, PACS, enterprise resource planning, and FUS modules through the unique PID and ART cycle numbers. A total of 162 couples submitted the questionnaire twice; for these couples, only the more complete questionnaire was retained. Questionnaire data were linked to EMR data in two steps: first, exact matching on PID; second, fuzzy matching on the automatically generated questionnaire survey date and the cycle start date, so that each questionnaire was assigned to the closest cycle of the corresponding couple. Biobank data were linked to the EMR data by PID and ART cycle number. For 122 cycles with missing or conflicting ART cycle numbers, we reviewed the original records and verified the cycles using other variables, including abbreviated name, archive number, progress notes, and the dates of cycle start, oocyte retrieval, ET, and delivery. To uniquely identify each cycle in the cohort, we further created a cohort ID based on the PID, cycle number, and cycle start date.
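The questionnaire de-duplication, two-step linkage, and cohort ID can be sketched as follows. Everything here is a hypothetical illustration: records are plain dictionaries, "most complete" is taken to mean the most non-missing answers, ties in date distance go to the first cycle listed, and the `PID-cycle-YYYYMMDD` cohort ID format is an assumed encoding of the three components named in the text.

```python
from datetime import date


def completeness(q):
    """Number of answered items (non-None values) in a questionnaire."""
    return sum(v is not None for v in q["answers"].values())


def keep_most_complete(questionnaires):
    """Keep one questionnaire per couple (PID): the one with the most answers."""
    best = {}
    for q in questionnaires:
        pid = q["pid"]
        if pid not in best or completeness(q) > completeness(best[pid]):
            best[pid] = q
    return list(best.values())


def cohort_id(cycle):
    """Assumed encoding of PID, cycle number and cycle start date."""
    return f'{cycle["pid"]}-{cycle["cycle_no"]}-{cycle["start_date"]:%Y%m%d}'


def link_questionnaires(questionnaires, cycles):
    """Return {cohort_id: questionnaire} after exact PID matching and nearest-date matching."""
    by_pid = {}
    for c in cycles:
        by_pid.setdefault(c["pid"], []).append(c)
    linked = {}
    for q in keep_most_complete(questionnaires):
        candidates = by_pid.get(q["pid"])  # step 1: exact match on PID
        if not candidates:
            continue  # couple has no cycle in the EMR extract
        # Step 2: fuzzy match, choosing the cycle whose start date is closest to the survey date.
        nearest = min(candidates, key=lambda cyc: abs((cyc["start_date"] - q["survey_date"]).days))
        linked[cohort_id(nearest)] = q
    return linked
```

For large tables, `pandas.merge_asof` with `by="pid"` and `direction="nearest"` performs the same nearest-date join in vectorized form.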
Logical checks were performed automatically to ensure data accuracy, and quality control practitioners conducted manual checks through monthly chart reviews. Furthermore, data were cleaned using transparent, prespecified rules, including standardization of medical texts, construction of variable dictionaries, and approaches for handling missing data and outliers. Diagnostic criteria for clinical conditions such as infertility and endometriosis, and the calculation of ART laboratory performance indicators such as the proportion of oocytes recovered, cleavage rate, and blastocyst development rate, were based on the corresponding international or Chinese standards and consensus statements [17-19]. The infertility core outcome set was defined in accordance with standardized definitions and reporting guidelines [20].
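A minimal sketch of automated logical checks and laboratory indicators is shown below. The denominators are assumptions based on commonly used consensus definitions (oocytes retrieved per follicle counted on trigger day; cleaved embryos on day 2 and blastocysts on day 5, each per normally fertilized 2PN zygote); the cohort's actual formulas follow references [17-19] and may differ. Field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabCycle:
    follicles_trigger_day: int
    oocytes_retrieved: int
    oocytes_inseminated: int
    zygotes_2pn: int
    cleaved_day2: int
    blastocysts_day5: int


def logic_errors(c: LabCycle) -> list:
    """Return human-readable violations of simple consistency rules."""
    errors = [f"{name} is negative" for name, value in vars(c).items() if value < 0]
    if c.oocytes_inseminated > c.oocytes_retrieved:
        errors.append("more oocytes inseminated than retrieved")
    if c.zygotes_2pn > c.oocytes_inseminated:
        errors.append("more 2PN zygotes than inseminated oocytes")
    return errors


def ratio(num, den):
    """Undefined (None) when the denominator is zero."""
    return num / den if den > 0 else None


def lab_indicators(c: LabCycle) -> dict:
    """Per-cycle indicators under the assumed definitions above."""
    return {
        "oocyte_recovery": ratio(c.oocytes_retrieved, c.follicles_trigger_day),
        "cleavage_rate": ratio(c.cleaved_day2, c.zygotes_2pn),
        "blastocyst_development_rate": ratio(c.blastocysts_day5, c.zygotes_2pn),
    }


def pooled_rate(cycles, num_field, den_field):
    """Laboratory-level rate: summed numerators over summed denominators."""
    num = sum(getattr(c, num_field) for c in cycles)
    den = sum(getattr(c, den_field) for c in cycles)
    return ratio(num, den)
```

Pooling numerators and denominators, rather than averaging per-cycle ratios, keeps cycles with few oocytes from being over-weighted in laboratory-level indicators.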