Study design
A summary of the research procedure is shown in Figure 1. We conducted a retrospective cohort study. We identified women with a histological diagnosis of HSIL on colposcopy biopsy who underwent cold-knife conization at Qilu Hospital of Shandong University. The study was approved by the Ethical Committee of Qilu Hospital of Shandong University (KYLL-202107-134), which waived the requirement for informed consent. The study was conducted between January 2014 and December 2020. Patients were followed up until December 2021, with a maximum follow-up period of 7 years. We excluded women with other histological diagnoses, those without follow-up data, those with positive margins after cold-knife conization, and those who were immunosuppressed.
Follow-up
Patients had their first follow-up 4-6 months after cold-knife conization and their second follow-up 10-12 months after surgery. Liquid-based cytology and HPV testing were performed at each follow-up. A cervical biopsy was performed if a woman had abnormal cytology results (i.e., atypical squamous cells of undetermined significance or more severe lesions), positive HPV results, or abnormal findings on colposcopy. Specimens were collected for HPV testing with the Digene kit (Digene, USA) or the Roche Cobas 4800 kit (Roche Molecular, USA). The Digene assay produces quantitative results for 13 HR-HPV genotypes (HPV 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, and 68); a relative light unit/cutoff (RLU/CO) ratio ≥1.0 indicates a positive result. The Cobas 4800 assay produces qualitative results for HPV16, HPV18, and 12 other HR-HPV genotypes (HPV 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68).
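The follow-up rules above can be expressed as a small decision function. The following is a minimal sketch, not the study's actual code: the class `FollowUpVisit`, its field names, and the set of cytology labels are hypothetical and introduced only for illustration. The only values taken from the text are the RLU/CO cutoff of 1.0 and the three triggers for biopsy.

```python
from dataclasses import dataclass
from typing import Optional

# Cytology categories treated as abnormal (ASC-US or worse).
# The label strings are illustrative assumptions, not taken from the study.
ABNORMAL_CYTOLOGY = {"ASC-US", "ASC-H", "LSIL", "HSIL", "AGC", "SCC"}

# Positivity threshold for the Digene assay, as stated in the text.
RLU_CO_CUTOFF = 1.0


@dataclass
class FollowUpVisit:
    """One follow-up visit (hypothetical record layout)."""
    cytology: str                          # e.g. "NILM", "ASC-US", "LSIL"
    hc2_rlu_co: Optional[float] = None     # Digene RLU/CO ratio, if that assay was used
    cobas_positive: Optional[bool] = None  # Cobas 4800 qualitative result, if used
    colposcopy_abnormal: bool = False


def hpv_positive(visit: FollowUpVisit) -> bool:
    """Digene is positive at RLU/CO >= 1.0; Cobas 4800 is already qualitative."""
    if visit.hc2_rlu_co is not None:
        return visit.hc2_rlu_co >= RLU_CO_CUTOFF
    return bool(visit.cobas_positive)


def biopsy_indicated(visit: FollowUpVisit) -> bool:
    """Biopsy if cytology is abnormal, HPV is positive, or colposcopy is abnormal."""
    return (
        visit.cytology in ABNORMAL_CYTOLOGY
        or hpv_positive(visit)
        or visit.colposcopy_abnormal
    )
```

For example, a visit with normal cytology and a Digene ratio of exactly 1.0 would be flagged for biopsy, because the cutoff is inclusive.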
Criteria for residual/recurrent disease
Residual/recurrent disease was diagnosed by histological examination of colposcopy biopsy specimens. Histological evidence of CIN (LSIL or HSIL) was considered residual/recurrent disease. Residual lesions were defined as those diagnosed within the first year after cold-knife conization. Cervical lesions detected after one year were considered recurrences. In this study, 49 patients had residual disease and 44 patients had recurrent disease. Residual and recurrent disease were analyzed together.
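The time-based definitions above can be sketched as follows. This is an illustrative helper, not the study's code: the function names are hypothetical, time is assumed to be measured in months, and treating exactly 12 months as "within the first year" is an assumption the text does not settle.

```python
def classify_lesion(months_since_conization: float, cin_on_biopsy: bool) -> str:
    """Label a post-conization biopsy result.

    Returns "none" when no CIN (LSIL or HSIL) is found, "residual" when CIN is
    found within the first year, and "recurrence" when it is found later.
    The inclusive 12-month boundary is an assumption.
    """
    if not cin_on_biopsy:
        return "none"
    return "residual" if months_since_conization <= 12 else "recurrence"


def combined_endpoint(months_since_conization: float, cin_on_biopsy: bool) -> int:
    """Binary outcome used for modelling: residual and recurrent disease pooled."""
    return int(classify_lesion(months_since_conization, cin_on_biopsy) != "none")
```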
Predictors and Endpoints
The following clinical characteristics were included: age, pregnancy, parity, type of cervical transformation zone, pre-conization cytology, pre-conization HPV, endocervical curettage (ECC), HPV at the first follow-up after conization, cytology at the first follow-up after conization, HPV at the second follow-up after conization, cytology at the second follow-up after conization, Improved (conization histopathology lower than colposcopy biopsy), Severe (conization histopathology higher than colposcopy biopsy), and residual/recurrence information. Residual/recurrence time was defined as the time interval from surgery to the first appearance of CIN. Residual/recurrence was determined by colposcopy biopsy histology.
Model construction
We randomly split all patients into development and validation cohorts in a 6:4 ratio. In the development cohort, we first performed univariate analysis to screen for statistically significant features, and then constructed logistic regression models to explore the role of clinical factors in prognosis. To demonstrate the importance of follow-up data, we constructed four models: a model based on preoperative factors (Model A), a model based on first follow-up data (Model B), a model based on second follow-up data (Model C), and a model based on data from both follow-ups (Model D).
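The split and the four-model comparison could be set up along the following lines. This is a sketch on synthetic data using scikit-learn, not the study's pipeline: the feature names, the column assignment for each model, and the simulated outcome are all assumptions, and the univariate screening step is omitted. In particular, the text does not say whether Models B to D also include preoperative factors; here they do not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Hypothetical binary predictors standing in for the clinical variables.
feature_names = ["age_group", "preop_hpv", "fu1_hpv", "fu1_cytology",
                 "fu2_hpv", "fu2_cytology"]
X = rng.integers(0, 2, size=(n, len(feature_names))).astype(float)

# Synthetic outcome, loosely driven by the two follow-up HPV columns.
logit = -2.0 + 1.5 * X[:, 2] + 1.5 * X[:, 4]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Column indices per model (an assumed assignment, for illustration only).
feature_sets = {
    "Model A": [0, 1],        # preoperative factors
    "Model B": [2, 3],        # first follow-up
    "Model C": [4, 5],        # second follow-up
    "Model D": [2, 3, 4, 5],  # both follow-ups
}

# Random 6:4 split into development and validation cohorts.
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.4, random_state=0
)

models, val_auc = {}, {}
for name, cols in feature_sets.items():
    clf = LogisticRegression(max_iter=1000).fit(X_dev[:, cols], y_dev)
    models[name] = clf
    val_auc[name] = roc_auc_score(y_val, clf.predict_proba(X_val[:, cols])[:, 1])
```

Comparing `val_auc` across the four entries mirrors the comparison of Models A to D on the validation cohort.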
Each model was built using stratified 5-fold cross-validation to improve generalization ability. In each iteration, four folds of the development cohort were used for model training and the remaining fold was used for validation. The role of the cross-validation was to select the optimal hyper-parameters by maximizing the performance on the validation folds. After the cross-validation procedure, the predictive model was re-trained on the entire development cohort and evaluated in the validation cohort.
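A cross-validated hyper-parameter search of this kind can be sketched with scikit-learn's `GridSearchCV`. The data are synthetic, and the choice of the regularization strength `C` as the tuned hyper-parameter, the grid values, and AUC as the selection metric are assumptions; the text does not state which hyper-parameters were tuned or which metric was maximized.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic, imbalanced stand-in for the patient data.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Stratified 5-fold CV: each iteration trains on four folds and validates on one.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Assumed grid over the inverse regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=cv,
    scoring="roc_auc",
    refit=True,  # re-train on the whole development cohort with the best setting
)
search.fit(X_dev, y_dev)

# Evaluate the refitted model once on the held-out validation cohort.
val_auc = roc_auc_score(y_val, search.predict_proba(X_val)[:, 1])
```

Because `refit=True`, the object returned by `search` already holds the model re-trained on all development data, so the validation cohort is touched only once.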
Discrimination, defined as the ability of a model to distinguish patients who developed residual/recurrent disease from those who did not, was used to evaluate the performance of our models. In this study, discrimination was estimated by accuracy, sensitivity, specificity, false-positive rate (FPR), false-negative rate (FNR), and area under the receiver operating characteristic curve (AUC). In addition, we drew a nomogram, which is a reliable tool for graphically representing residual/recurrent probability. Finally, we used a calibration curve to intuitively assess the agreement between the actual residual/recurrent probability and the probability predicted by the nomogram.
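The threshold-based metrics and the AUC listed above follow directly from the confusion matrix and from pairwise ranking of cases against non-cases. The sketch below computes them in plain Python on a toy example; the labels, probabilities, and the 0.5 decision threshold are illustrative assumptions, not study data.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (TP, FP, TN, FN) for binary labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = int(prob >= threshold)
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def discrimination_metrics(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "FPR": fp / (fp + tn),          # 1 - specificity
        "FNR": fn / (fn + tp),          # 1 - sensitivity
    }


def auc(y_true, y_prob):
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 patients with residual/recurrent disease, 5 without.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1, 0.05]
metrics = discrimination_metrics(y_true, y_prob)
metrics["AUC"] = auc(y_true, y_prob)
```

Note that FPR and FNR are the complements of specificity and sensitivity, so reporting all four is redundant but convenient for clinical readers.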
In recent years, an increasing number of studies have used a variety of machine learning (ML) methods to construct clinical predictive models [13, 14]. In some cases, ML methods can perform better than traditional regression methods due to the complexity of implicit patterns in the data. Therefore, we further used six ML algorithms to validate our selected high-risk factors: support vector machine (SVM) [15], random forest (RF) [16], AdaBoost [17], decision tree [18], k-nearest neighbor [19], and naïve Bayes [20].
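The six algorithms named above all have standard scikit-learn implementations, so a side-by-side comparison can be sketched as below. The data are synthetic, every classifier uses library defaults, and the Gaussian variant of naïve Bayes is an assumption; the text does not report the settings actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the selected high-risk factors and the outcome.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

# Default-configured versions of the six algorithms (settings are assumptions).
classifiers = {
    "SVM": SVC(),
    "Random forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbor": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean cross-validated AUC for each algorithm on the same folds.
mean_auc = {
    name: cross_val_score(clf, X, y, cv=cv, scoring="roc_auc").mean()
    for name, clf in classifiers.items()
}
```

Using the same `cv` object for every classifier keeps the folds identical, so differences in `mean_auc` reflect the algorithms and not the partitioning.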
Statistical Analysis
Statistical analysis was conducted with R software (version 3.6.1) and the Python (version 3.6.4) ML library scikit-learn (version 0.19.1). The distributions of the potential predictive factors were compared between patients with residual/recurrent disease and controls using the chi-squared test. The DeLong test was used to assess differences between ROC curves. All tests were two-tailed, and statistical significance was defined as P<0.05.
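For a binary predictor, the chi-squared comparison reduces to a test on a 2x2 contingency table. The sketch below implements Pearson's statistic in plain Python under stated assumptions: it handles only 2x2 tables (one degree of freedom), applies no continuity correction, and uses made-up counts. Predictors with more than two levels need the general form, and the study's software settings are not reported.

```python
import math


def chi2_2x2(table):
    """Pearson chi-squared test for a 2x2 table, without continuity correction.

    `table` is [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of
    freedom, where the p-value is erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Illustrative counts only (rows: residual/recurrence vs. controls;
# columns: predictor positive vs. negative).
example_table = [[30, 20], [70, 180]]
chi2_stat, p_value = chi2_2x2(example_table)
significant = p_value < 0.05  # significance threshold stated in the text
```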