Study design
This retrospective cohort study used data from a single medical center collected between January 1, 2012, and December 31, 2021. We included singleton pregnancies delivered between 37 weeks 0 days and 41 weeks 6 days of gestation, with either spontaneous or induced labor. The exclusion criteria were elective cesarean section, preterm birth before 37 weeks, post-term birth at or after 42 weeks, intrauterine fetal death, and multiple gestation. Comprehensive information on maternal background, medical history, delivery details, and postnatal and neonatal care was obtained from electronic health records. Data were extracted according to the predefined common categories of the Japan Society of Obstetrics and Gynecology Perinatal Database. Participants who underwent vaginal delivery were categorized into a trial of labor after cesarean (TOLAC) group and a non-TOLAC group according to whether they had a history of cesarean section. Participants who required emergency cesarean section during the trial of vaginal delivery were treated as censored observations. Labor duration was defined as the total time from labor onset to delivery, encompassing both the first and second stages of labor. The onset of labor was determined from the participants’ self-reports.
Statistical analysis
We assumed an effect size of 0.20, an alpha error of 0.05, a beta error of 0.20, and a dropout rate of 5%. Based on the facility's 2021 birth statistics, we assumed the proportion of TOLAC (trial of labor after cesarean) deliveries to be 0.07, which yielded a required sample size of 3,007 participants. Based on prior studies [8],[14]–[18], we identified 14 factors potentially affecting labor duration: maternal age, body mass index (BMI), maternal nationality, history of vaginal birth, pre-pregnancy smoking habits, gestational diabetes, premature rupture of membranes, fetal sex, birth weight, fetal position, labor induction, labor analgesia, Kristeller maneuver, and vacuum extraction. The distributions of these variables are presented in Table 1. Between-group differences were assessed using the t-test for continuous variables and the chi-square test for categorical variables. The standardized mean difference (SMD) [19] was also calculated. Records with missing delivery time or with more than 25% missing values for any item were excluded. Because the database was routinely updated by medical staff immediately after childbirth, the remaining missing data were assumed to be missing at random (MAR) [20] and were handled using multiple imputation. Propensity scores were then estimated by logistic regression on these 14 factors, and the area under the receiver operating characteristic curve (ROC-AUC) of the propensity scores was computed. Inverse probability of treatment weighting (IPTW) [21] was applied to the dataset: participants in the treatment (TOLAC) group were weighted by the inverse of the propensity score, and those in the control (non-TOLAC) group by the inverse of one minus the propensity score. Because propensity scores close to zero or one inflate the variance of the estimates, we trimmed the top and bottom 1% of the propensity scores.
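The weighting, trimming, and balance steps described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the analyses themselves were run in R), and the function names `iptw_weights`, `trim_extremes`, and `smd` are hypothetical. It assumes that trimming means dropping the records with the lowest and highest 1% of propensity scores, and that the SMD for a continuous variable uses the average of the two group variances as the pooling term; neither detail is specified in the text.

```python
from math import sqrt
from statistics import mean, variance


def iptw_weights(treated, ps):
    """IPTW weights: 1/ps for treated (1) records, 1/(1 - ps) for controls (0)."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p) for t, p in zip(treated, ps)]


def trim_extremes(ps, fraction=0.01):
    """Indices of records kept after dropping the lowest and highest
    `fraction` of propensity scores (assumed meaning of 'trimming')."""
    n = len(ps)
    k = int(n * fraction)                      # records dropped at each end
    order = sorted(range(n), key=lambda i: ps[i])
    return sorted(order[k:n - k])


def smd(x_treated, x_control):
    """Standardized mean difference for a continuous variable, using the
    average of the two sample variances as the pooled variance (assumption)."""
    pooled_sd = sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    return (mean(x_treated) - mean(x_control)) / pooled_sd
```

With this convention, a record with a propensity score of 0.25 receives a weight of 4 if exposed, which shows why scores near zero or one can dominate a weighted estimate and motivates trimming.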
Labor duration was analyzed as time-to-event data. Survival curves were constructed and presented as labor duration curves, and hazard ratios were estimated using Cox proportional hazards regression. Statistical analyses were conducted using R software (version 4.2.3, R Foundation for Statistical Computing, Vienna, Austria).
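To make the hazard ratio estimation concrete, the following is a simplified sketch of a Cox partial-likelihood fit for a single binary exposure, with optional observation weights such as IPTW weights. It is not the R routine used in this study: the function name `cox_log_hr_binary` is hypothetical, ties are handled with the Breslow approximation as an assumption, and no standard error is computed. Here the event is delivery, and `events[i] == 0` marks a censored record.

```python
from math import exp


def cox_log_hr_binary(times, events, exposed, weights=None, tol=1e-9, max_iter=50):
    """Log hazard ratio for one 0/1 covariate via Newton-Raphson on the
    (optionally weighted) Cox partial likelihood, Breslow handling of ties."""
    n = len(times)
    if weights is None:
        weights = [1.0] * n
    # Process records from the longest to the shortest time so the risk set
    # (everyone with time >= t) can be accumulated incrementally.
    order = sorted(range(n), key=lambda i: times[i], reverse=True)
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        w0 = w1 = 0.0            # weighted numbers at risk: unexposed, exposed
        i = 0
        while i < n:
            t = times[order[i]]
            j = i
            # Add every record sharing this time to the risk set.
            while j < n and times[order[j]] == t:
                k = order[j]
                if exposed[k]:
                    w1 += weights[k]
                else:
                    w0 += weights[k]
                j += 1
            s1 = w1 * exp(beta)
            frac = s1 / (w0 + s1)    # weighted mean of the covariate in the risk set
            for k in order[i:j]:
                if events[k]:
                    score += weights[k] * (exposed[k] - frac)
                    info += weights[k] * frac * (1.0 - frac)
            i = j
        if info == 0.0:              # no information about beta; stop
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

Exponentiating the returned value gives the hazard ratio; a value above 1 corresponds to a higher instantaneous rate of delivery, that is, shorter labor, in the exposed group.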
Outcome
The primary outcome was the hazard ratio for labor duration associated with the presence of a cesarean scar after application of IPTW. The secondary outcome was the corresponding hazard ratio for labor duration without application of IPTW.
Sensitivity analysis
IPTW estimates the average treatment effect (ATE) across the entire study population, including participants with and without cesarean scars. However, extreme propensity scores can lead to unstable estimates [22]. Therefore, as a sensitivity analysis, we also performed propensity score matching. Matching participants in the exposed and control groups with similar propensity scores yields a matched subset whose covariate distribution is closer to that of the exposed population. Because propensity score matching and IPTW rest on different assumptions and have different limitations, examining the effect in both populations strengthens the robustness of the results.
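The matching step can be sketched as follows. The matching algorithm, matching ratio, and caliper are not stated in the text, so this example assumes greedy 1:1 nearest-neighbor matching without replacement and an illustrative caliper of 0.05 on the propensity score scale. The function name `greedy_match` is hypothetical.

```python
def greedy_match(ps, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Returns a list of (exposed_index, control_index) pairs. Each control is
    used at most once, and pairs farther apart than `caliper` are discarded.
    """
    exposed_idx = [i for i, t in enumerate(treated) if t == 1]
    available = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i in exposed_idx:
        if not available:
            break
        # Closest remaining control; ties are broken by the lower index.
        j = min(available, key=lambda c: (abs(ps[c] - ps[i]), c))
        if abs(ps[j] - ps[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs
```

Exposed participants with no control inside the caliper remain unmatched, so the matched subset reflects the exposed participants for whom comparable controls exist.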
Because the results could depend on the specific dataset, we conducted a further sensitivity analysis using the bootstrap method. The bootstrap algorithm can be used to align the values of the explanatory variables with those of a given target distribution [23]. We randomly resampled the original dataset with replacement to generate 1,000 bootstrap samples. For each sample, Cox proportional hazards models were fitted both with and without IPTW to calculate hazard ratios, and the distribution of the hazard ratios was estimated from the resulting values.
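The resampling procedure can be sketched generically. In this example, `statistic` is a hypothetical callable standing in for the model fit applied to each resample (in this study, a Cox model with or without IPTW). The percentile summary in `percentile_interval` is one common way to describe the resulting distribution and is an assumption, since the text does not state how the distribution was summarized.

```python
import random


def bootstrap_distribution(records, statistic, n_boot=1000, seed=0):
    """Resample `records` with replacement `n_boot` times and return the
    value of `statistic` computed on each bootstrap sample."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    n = len(records)
    estimates = []
    for _ in range(n_boot):
        sample = [records[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    return estimates


def percentile_interval(estimates, alpha=0.05):
    """Lower and upper empirical percentiles of the bootstrap estimates."""
    ranked = sorted(estimates)
    last = len(ranked) - 1
    return ranked[int((alpha / 2) * last)], ranked[int((1 - alpha / 2) * last)]
```

When weights are used, the propensity model would ordinarily be refitted within each resample so that the bootstrap distribution also reflects uncertainty in the estimated weights; whether this was done here is not stated.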