Study design, setting and data source
We conducted a multi-hospital retrospective cohort study of nulliparous women giving birth between 1 January 2011 and 31 December 2014 at three public hospitals (Auburn, Blacktown/Mount-Druitt and Westmead) in the Western Sydney Local Health District (WSLHD). We included all nulliparous women with no previous pregnancies in the study sample. Study data were extracted from the ObstetriX database held by the hospital maternity units. The ObstetriX database collects information on all women from their first antenatal visit through to the discharge of mothers and their babies from hospital.
We excluded women with missing information for pre-eclampsia, parity or candidate risk factors for pre-eclampsia. We also excluded women who were prescribed antiplatelet therapy in the first trimester, given the effectiveness of these agents for preventing pre-eclampsia.
The primary outcome was the development of pre-eclampsia of any severity or timing. During the study period, pre-eclampsia was defined as hypertension with new onset of significant proteinuria at ≥20 weeks’ gestation. Secondary outcomes were early-onset pre-eclampsia (requiring delivery <34 weeks’ gestation) and preterm pre-eclampsia (requiring delivery <37 weeks’ gestation).
We extracted maternal socio-demographic characteristics (age, country of birth, primary language spoken at home, and socioeconomic status classified from postcode using the Index of Relative Socio-economic Advantage and Disadvantage from the Socio-Economic Indexes for Areas (SEIFA)), risk factors for pre-eclampsia (listed below) and study outcomes from the ObstetriX database.
For development of the Western Sydney (WS) risk model, we selected 12 candidate risk factors: maternal age, body mass index (BMI), autoimmune disease, chronic hypertension, chronic renal disease, diabetes mellitus (type 1 or 2), multiple (multi-fetal) pregnancy, family history of pre-eclampsia, conception method, ethnicity, socioeconomic status, and smoking status. These candidate risk factors were identified from the antenatal guidelines [1-4], with the addition of conception method and smoking status, which were identified from a systematic review of published risk models. We categorised ethnicity into two groups based on country of birth and primary language spoken at home (Australian/New Zealand-born English speakers; immigrants and non-English speakers). We categorised socioeconomic status into two groups using the SEIFA index (most disadvantaged, SEIFA 1-2; most advantaged, SEIFA 3-5).
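The two-group categorisations described above can be sketched as follows. This is only an illustration: the field encodings (country and language as plain strings, SEIFA category as an integer from 1 to 5) and the function names are assumptions, not the ObstetriX coding.

```python
# Hypothetical encodings: the real database coding may differ.
ANZ_COUNTRIES = {"Australia", "New Zealand"}


def ethnicity_group(country_of_birth: str, home_language: str) -> str:
    """Two-group ethnicity variable from country of birth and home language."""
    if country_of_birth in ANZ_COUNTRIES and home_language == "English":
        return "Australian/New Zealand-born English speaker"
    return "immigrant or non-English speaker"


def ses_group(seifa_category: int) -> str:
    """Two-group socioeconomic status from the SEIFA category (assumed 1-5)."""
    if seifa_category in (1, 2):
        return "most disadvantaged"
    if seifa_category in (3, 4, 5):
        return "most advantaged"
    raise ValueError("SEIFA category must be an integer from 1 to 5")
```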
For validation of the NICE approach, we classified women with ≥1 high-risk factor or ≥2 moderate-risk factors as meeting the criteria of high-risk for aspirin prophylaxis, and refer to this group herein as “screen-positive”. All NICE-listed risk factors relevant for nulliparous women are collected in the ObstetriX database.
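The screen-positive rule can be written as a one-line check. The sketch below takes the counts of high-risk and moderate-risk factors as inputs; how each factor is derived from the record is not shown, and the function name is hypothetical.

```python
def nice_screen_positive(n_high_risk: int, n_moderate_risk: int) -> bool:
    """Screen-positive if >=1 high-risk factor or >=2 moderate-risk factors."""
    return n_high_risk >= 1 or n_moderate_risk >= 2
```

A woman with a single moderate-risk factor and no high-risk factor is therefore screen-negative under this rule.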
Women with missing values for study variables were excluded from the analysis requiring that variable.
We assessed the distribution of risk factors measured as continuous variables (age, BMI) visually by plotting a probability distribution curve. We performed a descriptive analysis of maternal risk factors by assessing the frequency of categorical variables as a percentage in all women, then separately for women who developed pre-eclampsia and those who did not.
Model development and validation
We split the study sample into two groups for model development and temporal validation by year of infant birth (model development sample 2011-2012, validation sample 2013-2014). For model development, we used a two-stage approach. First, to optimize prediction of pre-eclampsia from age, BMI and other NICE-listed moderate-risk factors, we developed a WS ‘base’ model by excluding women with NICE-listed high-risk factors (autoimmune disease, chronic hypertension, chronic renal disease and diabetes (type 1 or 2)). This approach optimizes the model for use in the large majority of women, who do not have high-risk factors, and would be sufficient in settings where women with high-risk factors are already referred for further testing and management. Second, we developed a WS ‘full’ model for use in all women by introducing women with high-risk factors into the development sample, retaining the base model risk score, and estimating coefficients for the high-risk factors. We internally validated each model in the development sample, then externally validated it in the validation sample to assess the potential for model overfitting. If the model fit was satisfactory, we planned to refit the model predictors in the entire study sample to develop a WS ‘final’ model. Further details of these analyses are given below.
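The temporal split can be sketched as below. Records are assumed to be dictionaries with a hypothetical `birth_year` key; the actual extract may store the infant's date of birth differently.

```python
DEVELOPMENT_YEARS = (2011, 2012)
VALIDATION_YEARS = (2013, 2014)


def split_by_birth_year(records):
    """Split records into development and temporal validation samples."""
    development = [r for r in records if r["birth_year"] in DEVELOPMENT_YEARS]
    validation = [r for r in records if r["birth_year"] in VALIDATION_YEARS]
    return development, validation
```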
WS base model
To develop the WS base model, we included the following candidate predictors in a multivariable regression model: maternal age in years, BMI in kg/m², socioeconomic status (high vs low), conception method (assisted, by use of medications such as clomiphene or fertilization procedures including intrauterine insemination, in-vitro fertilization and intracytoplasmic sperm injection, vs natural conception), smoking status (current smokers vs non-smokers), multiple pregnancy (yes vs no), family history of pre-eclampsia (yes vs no), and ethnicity (Australian/New Zealand-born English speakers vs immigrants and non-English speakers).
To decide how to model the factors measured on a continuous scale (maternal age, BMI), we graphically examined their relationship with the logit of pre-eclampsia using cubic splines. We assessed each factor, and possible interactions between factors such as maternal age and multiple pregnancy, by inspecting their effect size and p-value. We manually excluded factors that did not contribute to the model. We then developed the WS base model using the final predictors with no further stepwise procedures.
We performed internal validation of the model using bootstrap resampling to assess potential overfitting of the regression coefficients. The mean c-statistic (which corresponds to the area under the curve (AUC)) of the bootstrap models was compared with that of the WS base model. The c-statistic was derived from Somers' D (Dxy) using the following formula: AUC = 0.5 × (Dxy + 1). A well-fitted model will show minimal optimism. We planned to adjust the regression coefficients by the resulting shrinkage factor if required.
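The conversion between Somers' D and the AUC, and the optimism correction, can be sketched as follows. The bookkeeping shown (optimism taken as the mean difference between each bootstrap model's performance in its own bootstrap sample and in the original sample) is the usual optimism-bootstrap scheme and is assumed here; function names are illustrative.

```python
from statistics import mean


def auc_from_dxy(dxy: float) -> float:
    """AUC = 0.5 * (Dxy + 1), where Dxy is Somers' D."""
    return 0.5 * (dxy + 1.0)


def dxy_from_auc(auc: float) -> float:
    """Inverse of the conversion above."""
    return 2.0 * auc - 1.0


def optimism_corrected(apparent: float, boot_train: list, boot_test: list) -> float:
    """Subtract the mean bootstrap optimism from the apparent performance.

    boot_train[i]: performance of bootstrap model i in its own bootstrap sample.
    boot_test[i]:  performance of the same model in the original sample.
    """
    optimism = mean(train - test for train, test in zip(boot_train, boot_test))
    return apparent - optimism
```

The correction can be applied on the Dxy scale and then converted with `auc_from_dxy`, or applied directly to the c-statistic.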
To externally validate the base model, we applied the model algorithm in the validation sample. As described above, the base model was developed for use in women without high-risk factors; we therefore excluded women with high-risk factors from this analysis. We calculated the predicted probability of pre-eclampsia for each woman by calculating the log odds (Y) and the odds (exp(Y)) of pre-eclampsia and using the following equation: Probability = odds/(1 + odds). We presented the distribution of predicted probabilities in a histogram. We assessed model discrimination by calculating the AUC and 95% CI. We assessed model calibration in this sample using the Hosmer-Lemeshow goodness-of-fit test, with a p-value <0.05 indicating poor calibration. We also calculated the ratio of observed to expected pre-eclampsia events and graphically assessed calibration by plotting observed risks on the y-axis against predicted risks on the x-axis for subgroups of patients categorized by their predicted probabilities (1-<2%, 2-<3%, 3-<4%, 4-<5%, 5-<8%, ≥8%).
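The probability calculation, the observed:expected ratio and the grouping by predicted risk can be sketched as below. Function names are illustrative, and returning `None` for predicted risks below 1% is an assumption, since the text lists no band below 1%.

```python
import bisect
import math

BAND_EDGES = [0.01, 0.02, 0.03, 0.04, 0.05, 0.08]
BAND_LABELS = ["1-<2%", "2-<3%", "3-<4%", "4-<5%", "5-<8%", ">=8%"]


def predicted_probability(log_odds: float) -> float:
    """Probability = odds / (1 + odds), with odds = exp(log odds)."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


def risk_band(probability: float):
    """Return the predicted-risk band, or None below 1% (assumed)."""
    if probability < BAND_EDGES[0]:
        return None
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES, probability) - 1]


def observed_expected_ratio(outcomes, probabilities) -> float:
    """Observed events divided by the sum of predicted probabilities."""
    return sum(outcomes) / sum(probabilities)
```

A ratio near 1 indicates that the number of predicted events matches the number observed; the calibration plot then compares the observed proportion with the mean predicted probability within each band.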
WS full model
To develop the WS full model, we introduced women with NICE-listed high-risk factors (autoimmune disease, chronic hypertension, chronic renal disease, diabetes) into the model development sample. We developed a multivariable regression model in this sample by retaining the WS base model risk score (Y) and adding the four high-risk factors listed above as additional predictors. We manually excluded high-risk factors that were not strongly or statistically significantly associated with pre-eclampsia. We followed the same approach outlined above for the base model to undertake internal and external validation of the full model to assess potential model overfitting.
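One way to read the structure of the full model is sketched below: the base model risk score (Y) enters as a single predictor alongside indicator variables for the retained high-risk factors. Whether the coefficient on Y was estimated or fixed is not stated in the text, so the `base_slope` argument is an assumption, as are the function and key names. No coefficient values are implied.

```python
HIGH_RISK_FACTORS = (
    "autoimmune_disease",
    "chronic_hypertension",
    "chronic_renal_disease",
    "diabetes",
)


def full_model_log_odds(base_score, factors, intercept, base_slope, betas):
    """Log odds from the base risk score plus high-risk factor terms.

    factors: dict mapping factor name to True/False for one woman.
    betas:   dict of coefficients; a factor excluded from the model is
             simply absent and contributes nothing.
    """
    log_odds = intercept + base_slope * base_score
    for name in HIGH_RISK_FACTORS:
        if factors.get(name, False):
            log_odds += betas.get(name, 0.0)
    return log_odds
```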
WS final model
After assessment of overfitting and calibration of the model in the validation sample, we refitted the WS base and full models in the entire study sample to develop the final WS model. First, we fitted the WS base model in women without high-risk factors. We then retained the base model risk score and refitted the WS final model in the entire study sample to estimate the β-coefficients for the high-risk factors and a new intercept. We presented the beta (log odds ratio) estimates and 95% CIs for the intercept and each predictor.
Given the model is intended to be used to provide pre-eclampsia risk estimates to inform clinical decisions, we also assessed model sensitivity (95% CI), specificity (95% CI), positive predictive value (PPV, 95% CI), negative predictive value (NPV, 95% CI), positive likelihood ratio (LR) and negative LR to predict pre-eclampsia at specified cut-points determined by the risk thresholds for classifying high- vs low-risk. For our primary analysis, we used ≥8% as the risk threshold to classify high-risk, as recommended by the United States Preventive Services Task Force (USPSTF) for commencing aspirin prophylaxis based on the prevalence of pre-eclampsia in trials demonstrating the effectiveness of aspirin, and supported by a publication recommending a 6-10% risk threshold for informing aspirin decisions. We also examined the final model performance at 2%, 3%, 4%, 5% and 8% risk thresholds, and reported model sensitivity at 5% and 10% fixed false positive rates (FPRs) to allow comparison with published models identified from our previous systematic review. For these analyses, we classified women as true positive (predicted risk at or above the cut-point and pre-eclampsia), false positive (predicted risk at or above the cut-point and no pre-eclampsia), true negative (predicted risk below the cut-point and no pre-eclampsia) or false negative (predicted risk below the cut-point and pre-eclampsia).
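The accuracy measures at a given risk threshold follow directly from the four classification counts. The sketch below treats a predicted risk at or above the threshold as test-positive; confidence intervals are omitted, and the function names are illustrative.

```python
def confusion_counts(probabilities, outcomes, threshold):
    """Count (TP, FP, TN, FN) with predicted risk >= threshold as positive."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, outcomes):
        if p >= threshold:
            if y:
                tp += 1
            else:
                fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def accuracy_measures(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV and likelihood ratios."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_positive": _ratio(sensitivity, 1.0 - specificity),
        "lr_negative": _ratio(1.0 - sensitivity, specificity),
    }
```

Calling `confusion_counts(probabilities, outcomes, 0.08)` gives the counts for the primary ≥8% threshold; repeating the call with 0.02 to 0.05 covers the additional thresholds.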
In a secondary analysis, we assessed the discrimination of the WS final model to predict early-onset pre-eclampsia and preterm pre-eclampsia by estimating the AUC and sensitivity and specificity at ≥8% risk threshold in the entire study sample.
Model comparison with NICE approach
We compared the performance of the WS base and final models with the NICE approach by assessing the sensitivity, specificity, PPV, NPV, positive LR and negative LR of the NICE approach to predict pre-eclampsia in women without high-risk factors, for comparison with the WS base model, and in all women, for comparison with the final model. For these comparisons, we assessed model sensitivity by fixing model specificity at the same level as the NICE approach. We report the model risk threshold that corresponds to this specificity level. For both the WS final model and the NICE approach, we also calculated the number needed to treat (NNT) and the number needed to screen (NNS) to avoid one pre-eclampsia event under a strategy where women classified as high-risk are recommended aspirin. For each approach, the NNT was calculated by applying a relative risk (RR) reduction of 10% for aspirin, reported from the Perinatal Antiplatelet Review of International Studies (PARIS) individual participant data meta-analysis of randomized controlled trials, to the 'baseline' risk of pre-eclampsia observed for women classified as high-risk. The NNS was calculated by dividing the NNT by the proportion of pregnant women who were classified as high-risk using the approach.
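The NNT and NNS calculations reduce to two short formulas, sketched below. The absolute risk reduction is taken as the baseline risk multiplied by the relative risk reduction, which is the standard reading of the description above; function names are illustrative.

```python
def number_needed_to_treat(baseline_risk: float, rr_reduction: float = 0.10) -> float:
    """NNT = 1 / (baseline risk x relative risk reduction)."""
    absolute_risk_reduction = baseline_risk * rr_reduction
    return 1.0 / absolute_risk_reduction


def number_needed_to_screen(nnt: float, proportion_high_risk: float) -> float:
    """NNS = NNT / proportion of women classified as high-risk."""
    return nnt / proportion_high_risk
```

As a purely hypothetical worked figure, a baseline risk of 10% among high-risk women gives an absolute risk reduction of 1 percentage point and an NNT of 100; if 25% of women were classified as high-risk, the NNS would be 400.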
We performed a secondary analysis to assess the performance of the NICE approach for predicting preterm versus term pre-eclampsia (delivery ≥37 weeks’ gestation) and early-onset versus late-onset pre-eclampsia (delivery ≥34 weeks’ gestation). We estimated the OR and 95% CI using multinomial logistic regression and reported a p-value from the Wald chi-square test of the hypothesis of no difference in the performance of the approach between the pre-eclampsia subgroups (preterm versus term; early-onset versus late-onset).
We used SPSS version 25, SAS version 9.3 and R statistical software for all analyses. The R rms package was used for internal model validation (bootstrapping). A p-value <0.05 was regarded as statistically significant for all analyses.
We created an Excel spreadsheet to present the WS final model as a risk prediction calculator that can be used in the clinic to provide women with an individualised estimate of their probability of pre-eclampsia. We followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines to report our methods and findings.