External Validation of Models to Predict Unsuccessful Endometrial Ablation: A Retrospective Study

Study Objective: To externally validate previously presented and locally established prediction models that help counsel patients on failure of endometrial ablation (EA) or surgical re-intervention within 2 years after EA, called the ‘Failure model’ and the ‘Re-intervention model’ respectively. Design: Retrospective external validation study with a minimum follow-up of 2 years. Setting: Two non-academic teaching hospitals in the Netherlands. Patients: Pre-menopausal women (18+) who had undergone EA for abnormal uterine bleeding between January 2010 and November 2012. A total of 329 patients were eligible for analysis. Interventions: The devices used for EA were NovaSure (Hologic, Marlborough, Massachusetts, US) and ThermaChoice III (Ethicon, Somerville, New Jersey, US). Measurements and Main Results: The area under the receiver operating characteristic curve (AUROC) for the outcome failure within 2 years after EA was 0.59 (95% CI 0.53 – 0.65). Variables in this model were dysmenorrhea, age, parity ≥ 5 and preoperative menorrhagia. The Hosmer-Lemeshow test showed no significant difference between the observed and predicted outcome (Chi-square 4.62, P-value .80). The AUROC for the outcome surgical re-intervention within 2 years was 0.62 (95% CI 0.53 – 0.70). Variables in this model were dysmenorrhea, age, menstrual duration > 7 days, parity ≥ 5 and a previous caesarean section. The Hosmer-Lemeshow test showed no significant difference between the observed and predicted outcome (Chi-square 11.34, P-value .18). Conclusion: Both the failure model and the re-intervention model can be used to predict unsuccessful endometrial ablation in the general population within two years after the procedure. They can be used prior to surgery to facilitate tailor-made shared decision-making and to help counsel patients with regard to the potential outcome of their treatment by means of a personally calculated percentage.


Introduction
Endometrial ablation (EA) is a frequently used treatment for heavy menstrual bleeding, a common gynaecological problem. It is increasingly used because of its minimally invasive character, low costs, low risks and short recovery time [1][2][3][4]. In 2017, approximately 9000 endometrial ablations were performed in the Netherlands, whereas the number in the US was reported to be around 400,000 procedures [5]. Short-term success rates, up to one year, have suggested that EA is highly effective; however, long-term follow-up shows diminishing results. In fact, the prevalence of post-EA hysterectomy can be as high as 20%, mainly due to complaints of pain or abnormal uterine bleeding [6][7][8]. Current literature is inconclusive about which variables influence the outcomes of EA. For this reason, we previously developed two internally validated prediction models [9]. The first model, called the 'Failure model', identified variables that were significant in predicting EA failure.
Failure was defined as patient dissatisfaction, lower abdominal pain or complaints of abnormal uterine bleeding after EA. Significant variables were age, dysmenorrhea, parity ≥ 5 and preoperative menorrhagia. The AUC after internal validation was 0.68 [9]. The second model, called the 'Re-intervention model', predicted the outcome of surgical re-intervention within 2 years after EA. Significant variables were age, dysmenorrhea, menstrual duration > 7 days, parity ≥ 5 and previous caesarean section. The AUC after internal validation was 0.71 [9]. These internally validated models can be used to help counsel patients with regard to the potential outcome of their treatment by means of a personally calculated percentage. To encourage wider use of these models, the aim of this study is to externally validate both models, so that they can be implemented for patient counselling in the general population.

Study design
This retrospective external validation study used data from the external dataset published by Muller et al. [10].

Patients
This external dataset included pre-menopausal women (18+) undergoing endometrial ablation due to abnormal uterine bleeding. Women with a (suspected) malignancy or cavity-deforming abnormalities were excluded from the external validation.
Furthermore, women were excluded if the endometrial ablation could not be performed or was performed incompletely.
The duration of follow-up was at least two years after EA because, as stated in our previous article [9], literature has shown that most re-interventions take place within this time frame [9,13,14]. Our previous article was based on an (internal) dataset of patient outcomes in our hospital [9]. The term 'external dataset' refers to the patient outcomes from a study in a regional hospital in the eastern part of our country as published by Muller et al. [10]. This external dataset was used to validate our prediction models in the present study. The study was performed in accordance with the relevant guidelines and regulations.

Data extraction
The external dataset from Muller et al. provided the majority of the required information [10].
Where necessary, two of our researchers performed additional patient chart review to collect further relevant data (for example, pathology results). Data on one significant factor in the previously published re-intervention model, 'duration of menstruation > 7 days', could not be obtained: it was recorded neither in the given dataset nor in the electronic patient records.

Statistical analyses
The baseline characteristics of the patients in the internal and external datasets were compared. The predicted probability for each model (P-Failure or P-Re-intervention) was calculated using the previously developed, internally validated prediction models for failure of EA and surgical re-intervention respectively [9]. The internally validated formulas for the calculated probability were as follows: [...]

The area under the receiver operating characteristic curve (AUROC) and Nagelkerke's R-squared were used to evaluate model performance [15,16]. The AUROC was used to test the discriminative value of the models.
The AUROC ranges from 0.0 to 1.0, where a value of 0.5 indicates that a model predicts the outcome no better than random chance. Therefore, 0.5 should be considered the minimum value of the AUROC [15][16][17][18].
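As an illustration of the discrimination measure described above, the AUROC can be computed as the proportion of (event, non-event) patient pairs in which the patient with the event received the higher predicted probability, with ties counted as one half. The following minimal Python sketch is not the analysis code of this study; the function name `auroc` and the toy data are assumptions made purely for illustration.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Rank-based AUROC: the probability that a randomly chosen case with
    the event (1) has a higher predicted probability than a randomly
    chosen case without the event (0). Ties count as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without the event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not study data).
outcome = [0, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.80]
print(auroc(outcome, predicted))
```

A model that assigns the same probability to every patient yields 0.5 under this definition, which is why 0.5 is treated as the floor for a useful model.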
Calibration of the models was tested using the Hosmer-Lemeshow goodness-of-fit test and calibration plots. The Hosmer-Lemeshow test assesses the hypothesis of perfect agreement between the predicted and observed outcomes [15][16][17]. The slope and intercept of the regression line in the calibration plot were calculated for both models. A slope of one and an intercept of zero indicate perfect calibration, where the predicted and the observed outcomes are a perfect fit [16,19]. In the re-intervention model, 'duration of menstruation > 7 days' was a relevant factor [9]. However, this variable was not available in the external dataset [10].
Therefore, we performed a sensitivity analysis by calculating the discrimination and calibration in three different ways. This allowed us to evaluate the necessity of including this variable in the analysis.
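The calibration measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of this study: patients are sorted by predicted probability and split into `g` near-equal groups (ten by convention), the Hosmer-Lemeshow statistic is referred to a chi-square distribution with `g - 2` degrees of freedom, and the calibration line is fitted by ordinary least squares through the group means. All function names are hypothetical, and predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math
from typing import List, Sequence, Tuple


def _groups(y: Sequence[int], p: Sequence[float], g: int) -> List[list]:
    """Sort cases by predicted probability and split into g near-equal groups."""
    if len(y) < g:
        raise ValueError("need at least one case per group")
    pairs = sorted(zip(p, y))
    n = len(pairs)
    return [pairs[i * n // g:(i + 1) * n // g] for i in range(g)]


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], g: int = 10) -> Tuple[float, int]:
    """Return the Hosmer-Lemeshow chi-square statistic and its degrees of freedom."""
    stat = 0.0
    for grp in _groups(y, p, g):
        n = len(grp)
        expected = sum(pi for pi, _ in grp)   # expected number of events
        observed = sum(yi for _, yi in grp)   # observed number of events
        stat += (observed - expected) ** 2 / expected
        stat += ((n - observed) - (n - expected)) ** 2 / (n - expected)
    return stat, g - 2


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def calibration_line(y: Sequence[int], p: Sequence[float], g: int = 10) -> Tuple[float, float]:
    """Least-squares (intercept, slope) of observed event fraction on mean
    predicted probability per group; (0, 1) indicates perfect calibration."""
    grps = _groups(y, p, g)
    xs = [sum(pi for pi, _ in grp) / len(grp) for grp in grps]
    ys = [sum(yi for _, yi in grp) / len(grp) for grp in grps]
    mx, my = sum(xs) / g, sum(ys) / g
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# P-values for the chi-square values reported in the abstract,
# ASSUMING ten groups and therefore 8 degrees of freedom.
print(round(chi2_sf_even_df(4.62, 8), 2), round(chi2_sf_even_df(11.34, 8), 2))
```

Under the assumption of 8 degrees of freedom, the chi-square values of 4.62 and 11.34 reported in the abstract correspond to P-values of about .80 and .18, which agrees with the reported figures.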

Results
In the external dataset, a total of 613 patients were [...]

Taken together, as seen in Figure 6, these outcomes mean that the high predicted re-intervention rates are too high and the low predicted rates are too low. Figure 7 shows the distribution of the predicted re-intervention rates per patient according to the re-intervention model. Most patients had a predicted re-intervention rate between 4% and 22%. No patients had a predicted re-intervention rate above 58% or below 4%.

Discussion
Since internal validation of our models to predict unsuccessful endometrial ablation was promising [9], external validation was performed to examine whether the models are widely applicable and useful for the general population. An explanation of the models' significant variables, which are consistent with the literature [7][20][21][22], can be found in the article by Stevens et al. [9]. We are aware that other variables may play a role [20][21][22][23]. Patients with cavity-deforming abnormalities were excluded from the selection. Some studies suggest that intramural myomas can influence the outcome of EA as well [21]; however, other literature shows that only large submucosal myomas are a risk factor for failure of EA [22], and therefore this group was excluded from analysis. Furthermore, the number of myomas in this group was so small that they could not have influenced the outcome of our prediction model. El Nashar et al. [20] also developed an EA failure model; however, to the best of our knowledge, this model was not externally validated. At baseline, a significant difference in age was seen between the internal and external datasets. [...] [16].

Strengths and limitations
The retrospective design of this study can be seen as a limitation. Such a design carries a higher chance of missing data; in our case, the variable 'duration of menstruation > 7 days' of the re-intervention model was not known. However, we performed a sensitivity analysis, which showed no significant difference. Strengths were the multicentre [...]

Using the model
To facilitate general use of the models, a website was made (https://www.prediction-failure-ofendometrialablation.com). Patient characteristics can be entered, and the individually calculated percentages for re-intervention and failure are then provided. These models can be used during consultations to support patient counselling.
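How such a calculator turns patient characteristics into a percentage can be sketched as follows, assuming the models have the usual logistic regression form. The function name, the variable names, the intercept and every coefficient below are hypothetical placeholders invented for illustration; they are not the published values from [9] and must not be used for counselling.

```python
import math
from typing import Mapping


def predicted_percentage(intercept: float,
                         coefficients: Mapping[str, float],
                         patient: Mapping[str, float]) -> float:
    """Logistic model: percentage = 100 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    linear = intercept + sum(b * patient[name] for name, b in coefficients.items())
    return 100.0 / (1.0 + math.exp(-linear))


# PLACEHOLDER numbers, invented for illustration; NOT the coefficients of [9].
PLACEHOLDER_INTERCEPT = -1.0
PLACEHOLDER_COEFFICIENTS = {
    "age_years": -0.02,
    "dysmenorrhea": 0.5,
    "parity_ge_5": 0.8,
    "preoperative_menorrhagia": 0.3,
}

# Hypothetical patient: binary variables are coded 1 (present) or 0 (absent).
patient = {
    "age_years": 42,
    "dysmenorrhea": 1,
    "parity_ge_5": 0,
    "preoperative_menorrhagia": 1,
}

result = predicted_percentage(PLACEHOLDER_INTERCEPT, PLACEHOLDER_COEFFICIENTS, patient)
print(f"{result:.1f}%")
```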

Conclusion
Following external validation, both the re-intervention model and the failure model can be used to predict unsuccessful endometrial ablation within two years after the procedure. The outcome failure is defined as complaints of abnormal uterine bleeding, patient dissatisfaction or lower abdominal pain. The outcome re-intervention is defined as any surgical re-intervention within 2 years after EA.
Both models, used prior to treatment, can facilitate patient counselling and support the tailor-made shared decision-making process regarding EA for the general population.