Since internal validation of our models to predict unsuccessful endometrial ablation was promising (9), external validation was performed to examine whether the models are widely applicable and useful in the general population.
An explanation of the models’ significant variables, consistent with the literature (7,20–22), can be found in the article by Stevens et al. (9). We are aware that other variables may also play a role (20–23). Cavity-deforming abnormalities were excluded from the selection; some studies, however, report that intramural myomas can also influence the outcome of EA (21). Other literature shows that only large submucosal myomas are a risk factor for failure of EA (22); this group was therefore excluded from the analysis.
Furthermore, the number of myomas in this group was so small that they could not have influenced the outcome of our prediction model.
El Nashar et al. (20) also developed an EA failure model; however, to the best of our knowledge, this model has not been externally validated.
At baseline, a significant difference in age was seen between the internal and external datasets. However, this difference is only one year, which does not seem clinically relevant.
A second difference was seen in the proportions of patients with a previous caesarean section and patients with dysmenorrhea. Based on our internally validated models, both factors confer a higher chance of failure or re-intervention after EA. However, it is important to note that no significant difference was seen between the internal and external datasets in the models’ outcome measures of failure or re-intervention.
A possible explanation for the baseline difference in dysmenorrhea could be the subjective character of this variable.
Moreover, 49% of the hysterectomy pathology results in the external dataset showed signs of adenomyosis. This may explain the high prevalence of dysmenorrhea in the external dataset.
Remarkably, despite literature identifying adenomyosis as a factor for unsuccessful EA (7,20,22,24), it had no effect on the number of patients with dysmenorrhea included in the external dataset.
Patient selection is therefore important, suggesting that patients with dysmenorrhea should be screened for adenomyosis, for example using the recently developed MUSA criteria (25).
The baseline difference in patients with a previous caesarean section can possibly be explained by the increasing interest in uterine scar defects (niches) and subsequent bleeding problems in recent years (26,27). Patient selection for the external dataset took place between 2010 and 2012 (10), and for the internal dataset between 2004 and 2013 (9).
Although there is a fairly uniform policy in the Netherlands with regard to the diagnosis and treatment of abnormal uterine bleeding, it is possible that the pathophysiology of the niche is approached differently in various parts of the country.
In short, we are of the opinion that awareness of both dysmenorrhea and previous caesarean section in patients requesting EA is important.
After external validation, both the re-intervention model and the failure model can be used in the general population, with a moderate AUROC of 0.62 and 0.59, respectively. It should be noted that a certain degree of inaccuracy remains. Although the AUROC results are moderate, these prediction models can provide the clinician with a tool to discuss the pros and cons prior to surgery. Further research could focus on model updating (16).
Strengths and limitations
The fact that this study design requires retrospective data can be seen as a limitation. This design carries a higher chance of missing data; in our case, the variable ‘duration of menstruation >7 days’ of the re-intervention model was not available. However, we performed a sensitivity analysis, which showed no significant difference.
Strengths were the multicentre design and the additional chart review performed by two independent researchers. Since the hospitals participating in the external and internal validation were located in different parts of our (small) country, this validation can be seen as a geographical validation.
The models were built with logistic regression; however, it is also possible to use machine learning (ML). A study was therefore conducted to see whether ML can create better models than logistic regression. This study showed that for the outcome re-intervention, logistic regression is the better predictor (28). Nevertheless, ML remains worth considering, especially in large datasets with strongly predictive variables (29,30) and a small number of pre-defined variables in the model (29,31–33).
Prediction models can be used for patient counselling, in the hope that patients’ uncertainty can be assuaged with better insight into the outcome of their treatment. This can optimise the shared decision-making process and allow patients to make a decision based on their personally calculated percentage.
One notable issue with this approach, however, is that the interpretation of percentages is individual-specific. Some patients (or doctors) may find a failure rate of 30% acceptable, whereas for others this threshold might be 75%.
This encourages research not only into the outcomes of prediction models, but also into how their results influence the (clinical) decision-making of both patient and doctor.
Using the model
To facilitate general use of the models, a website was made:
Different patient characteristics can be entered, and the individually calculated percentages of re-intervention and failure will be provided. These models can be used during consultations to support patient counselling.