INFLUENCE 3.0 models were designed to support the shared decision-making process regarding personalised surveillance after curative treatment. INFLUENCE 3.0 is explicitly not meant to be used for treatment decision-making, because information on treatment in the model was derived from a retrospective database, meaning that treatment allocation was not random(22).
For the non-NST cohort, the Cox and RSF models performed similarly in LRR risk prediction. The Cox model was incorporated in the online tool to predict LRR, as it appeared to have less optimism in prediction – which increases its potential accuracy in an external population(20). In addition, Cox models are more interpretable than RSF models. The RSF model predicted CBC best. In the NST cohort, RSF performed best in predicting both LRR and CBC.
Discrimination of the models predicting LRR was moderate, with optimism-corrected 5-year AUCs of 0.77 (95%CI:0.77-0.77) and 0.77 (95%CI:0.76-0.78) for the non-NST (Cox) and NST cohort (RSF), respectively. The models predicting CBC also showed moderate discrimination, with optimism-corrected AUCs of 0.68 (95%CI:0.67-0.69) and 0.73 (95%CI:0.69-0.76) for the non-NST and NST cohort, respectively. Importantly, in the NST cohort the RSF model predicting CBC showed a higher degree of optimism than the Cox model. This may have been caused by the relatively low number of events in this population. A high degree of optimism is an indicator of overfitting, which means that the true performance of the model may be lower(20). Overfitting is more likely to occur in machine learning-based models, as these models are able to incorporate complex variable relationships. Such a model may learn a dataset in such great detail that it is likely to perform worse in another population(23). We used bootstrap resampling to obtain optimism-adjusted discrimination and calibration, and the results in the NST cohort still indicated moderate performance of the RSF model – which was much better than that of the Cox model. As INFLUENCE will initially be used in the Dutch breast cancer population, and because coefficients of the model were also corrected for optimism, we are confident that the model provides valid results in the Netherlands. In future, external validation should be performed to ensure validity of the model in other populations.
The declining AUC over time in LRR prediction could possibly be attributed to the typical pattern of LRR over time. LRR risk is highest in years 2 and 3 and declines thereafter(24). It is plausible that a model can better discriminate between patients with and without LRR in the first years due to this typical pattern.
The lower performance of the models predicting CBC (which was similar for INFLUENCE 2.0) may be caused by the fact that unmeasured factors such as genetic predisposition and family history – which are known key factors associated with CBC(25) – could not be included in the model. However, a large study in which a model predicting CBC risk was developed (PREDICTCBC-2.0) showed that, even in the presence of information on family history, BMI and important gene mutations, the model was only able to moderately discriminate between cases and non-cases, with a 5-year AUC of 0.65(26). The authors suggested that other risk factors such as breast density, alcohol use, and age at primiparity could improve predictions(26). Furthermore, socioeconomic status might be related to development of CBC, as lower socioeconomic status has been associated with higher breast cancer incidence(27), more advanced tumour stage(28,29) and undertreatment(30,31). Importantly, in the Dutch population these differences were only marginally observed(30,32,33), which was confirmed by the nonsignificant contribution of socioeconomic status (based on postal code) in the models predicting LRR and CBC in the present study (data not shown). Moreover, socioeconomic status based on postal code has been shown to be useful in monitoring disparities in healthcare, but it may not be accurate in individual risk prediction(34). To date, no available models predict CBC more accurately than INFLUENCE 3.0 and PREDICTCBC-2.0. Importantly, INFLUENCE 3.0 was not designed for use in women with hereditary breast cancer.
Comparison with INFLUENCE 2.0
New models were developed because updating the INFLUENCE 2.0 coefficients was not sufficient, given the inclusion of a broader population and new variables. INFLUENCE 3.0 is the preferred model for LRR and CBC prediction – including in patients treated with NST – as it is more representative of the contemporary breast cancer population. Compared to INFLUENCE 2.0 (non-NST cohort), the following additional factors were examined for potential predictive value: menopausal status, presence of DCIS component, results of molecular diagnostics, mode of detection (screening through the national programme or clinically detected) and use of immediate breast reconstruction. The latter two were of added value in the final model. Immediate breast reconstruction was associated with type of surgery and was therefore combined with this variable in the final model. Importantly, the positive association between use of immediate breast reconstruction and LRR risk can be explained by the fact that patients with prognostically favourable characteristics more often get an immediate breast reconstruction than patients with prognostically unfavourable characteristics(35). This underscores our statement that the model should not be used for treatment decision-making. Furthermore, INFLUENCE 3.0 included patients with T4 breast cancer, which was not the case in INFLUENCE 2.0 due to the smaller sample size and, consequently, too few events to make predictions. Menopausal status did not significantly contribute to the current multivariable models, probably due to the large predictive value of age. Molecular diagnostics was only performed in approximately 10% and 2.5% of the non-NST and NST population, respectively. This, combined with its likely association with treatment (which was included in the models), might explain its lack of predictive value.
The largest advantage of INFLUENCE 3.0 is the inclusion of patients treated with NST. Over time, NST has been increasingly applied, mainly in HER2-positive and triple-negative stage II-III breast cancer patients, aiming to reduce surgical morbidity and monitor tumour response(36,37). Including these patients substantially increases the applicability of the model in clinical practice.
INFLUENCE 3.0 does not predict the risk of distant metastasis due to reasons described above. However, the model was designed to be used as a guidance tool in the shared decision-making process concerning surveillance, which does not aim to detect distant metastases(3). To assess the risk of distant metastases, INFLUENCE 2.0 can still be used (https://www.evidencio.com/models/show/2238/).
Strengths and limitations
A strength of this study is that we used optimism-corrected performance measures based on bootstrapping, which has been shown to be superior to other approaches for estimating internal validity(38,39). We used 100 bootstrap samples, which is considered sufficient, as more repetitions only marginally improve estimates(39). Moreover, by looking at quarterly intervals, we could adequately judge the models’ accuracy at different time points during follow-up. This is crucial, as there can be multiple moments during follow-up at which patients and their caregivers discuss the frequency of surveillance visits.
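For readers less familiar with the procedure, the following is a minimal sketch of bootstrap optimism correction for the AUC. It is an illustration under simplifying assumptions and not the code used in this study: the outcome is treated as binary at a fixed time point (ignoring censoring and competing risks), a toy linear score stands in for the Cox and RSF models, the data are simulated, and all function names (auc, fit, predict, simulate, optimism_corrected_auc) are hypothetical.

```python
import math
import random


def auc(scores, labels):
    """Probability that a random event outranks a random non-event (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit(X, y):
    """Toy model (assumption): weight = difference in feature means, events minus non-events."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return [
        sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
        for j in range(len(X[0]))
    ]


def predict(weights, X):
    """Linear risk score for each row of X."""
    return [sum(w * v for w, v in zip(weights, x)) for x in X]


def simulate(n, seed=0, n_features=5):
    """Simulated data: only the first feature is related to the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [1 if rng.random() < 1 / (1 + math.exp(-x[0])) else 0 for x in X]
    return X, y


def optimism_corrected_auc(X, y, n_boot=100, seed=0):
    """Return (apparent AUC, mean optimism, optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(predict(fit(X, y), X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # both outcome classes are needed to fit and score
            continue
        w = fit(Xb, yb)
        # Optimism: performance in the bootstrap sample minus performance
        # of the same bootstrap model in the original sample.
        optimism.append(auc(predict(w, Xb), yb) - auc(predict(w, X), y))
    mean_optimism = sum(optimism) / n_boot
    return apparent, mean_optimism, apparent - mean_optimism


if __name__ == "__main__":
    X, y = simulate(200, seed=1)
    apparent, optimism, corrected = optimism_corrected_auc(X, y, n_boot=100)
    print(f"apparent={apparent:.3f} optimism={optimism:.3f} corrected={corrected:.3f}")
```

In this sketch, each bootstrap model is evaluated both in its own bootstrap sample and in the original sample; the average difference estimates the optimism, which is then subtracted from the apparent AUC. The default of 100 bootstrap samples mirrors the number used in the present study.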
The way in which we collected data on LRRs has, next to the benefit of a largely reduced workload, some limitations. First, our validation on the data from the first quarter of 2012 showed that our data were approximately 90% complete, which means that we miss approximately 10% of all LRRs. Reasons for missing these LRRs are clinical diagnoses (Palga only contains pathologically confirmed malignancies) or incomplete reporting. This may have resulted in an underestimation of the overall LRR risk and thus misclassification of the outcome variable. A study on the effect of misclassification in logistic regression models showed that AUCs became lower as the degree of misclassification became higher(40). This implies that AUCs may have been biased, depending on the extent of misclassification, and that true AUCs would have been higher. We assume that misclassification has a similar impact in survival models. As AUCs in our study still showed moderate performance, we support the use of INFLUENCE in clinical practice.
Data on distant metastases were only registered in case the patient was suspected to have an LRR. In case a patient was suspected to have metastases only, the file of the patient was not searched due to time constraints. Consequently, we missed information on dates of metastases for many patients, which in turn may have resulted in inadequate correction for competing risks(9) in survival analyses. We tested the impact of this limitation by rerunning the INFLUENCE 2.0 models (which did include all information on metastases) on the same cohort on which they were developed, while ignoring all data on metastases. This did not have clinically relevant effects on the final estimators, i.e. predicted risks and model performance measures were similar in both scenarios. We presume that this is because most patients with metastases died relatively soon after the diagnosis of these metastases, so that follow-up was not much longer than it would have been if we had had information on metastases (data not yet published).