1. Patient Enrollment and Labeling
This was a retrospective study. Data for training and internal validation were collected between April 2016 and November 2021 from Hospital I (Zhujiang Hospital, Southern Medical University, Guangzhou), and data for external validation were collected from Hospital II (Tung Wah Hospital, Sun Yat-sen University, Dongguan). The study was conducted according to the Declaration of Helsinki and approved by the Research Ethics Committees of Zhujiang Hospital and Tung Wah Hospital. The inclusion criteria were (1) a definite diagnosis of ROP meeting the indications for ROP treatment, (2) anti-VEGF therapy as the initial treatment, (3) a clear clinical treatment outcome, (4) complete clinical data, (5) completed ROP screening, with follow-up lasting at least six months after treatment, and (6) no ocular or systemic diseases that might influence the outcomes of this study.
We divided the cases into two groups: reactivation and non-reactivation. All cases were annotated independently by three trained, experienced ophthalmologists. ROP screening followed the 2014 Chinese ROP Screening Criteria and was performed with a wide-field digital retinal imaging system (RetCam III). The most severe ROP condition of each child before treatment was recorded as the ROP diagnosis.
2. Treatment and Follow-up Screening
The indications for ROP treatment are type 1 pre-threshold ROP, threshold ROP, or aggressive ROP (A-ROP).[7] Type 1 ROP refers to zone I ROP of any stage with plus disease, zone I stage 3 ROP without plus disease, or zone II stage 2 or 3 ROP with plus disease.[13] A-ROP denotes rapidly progressing, severe ROP, with features such as plus disease without observed progression through the stages of ROP and rapid pathologic neovascularization.[33] Once type 1 ROP or A-ROP was diagnosed, treatment was initiated within 72 hours.
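The type 1 ROP definition above is a small decision rule, and writing it out makes the zone, stage, and plus-disease conditions explicit. The sketch below is illustrative only: the function name and argument encoding are hypothetical, and it assumes that the first two criteria both refer to zone I. It does not cover threshold ROP or A-ROP.

```python
def is_type1_rop(zone: int, stage: int, plus_disease: bool) -> bool:
    """Return True if the findings match the type 1 ROP definition in the text.

    zone: 1, 2, or 3 (zone I, II, III); stage: ROP stage; plus_disease: bool.
    Assumption: the first two criteria in the text both apply to zone I.
    """
    if zone == 1:
        # Zone I: any stage with plus disease, or stage 3 without plus disease.
        return plus_disease or stage == 3
    if zone == 2:
        # Zone II: stage 2 or 3 with plus disease.
        return plus_disease and stage in (2, 3)
    return False
```

For example, `is_type1_rop(2, 3, False)` is False, because zone II disease without plus disease does not meet the definition.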
All guardians of the infants were fully informed of the principles of the drugs, the treatment details, and the potential risks of anti-VEGF drugs; they then chose the injection drug and signed the informed consent form. Injections were performed by a trained ophthalmologist in a sterile operating room under topical anesthesia using a microscope. A 0.25 mg/0.025 mL dose of Conbercept or a 1 mg/0.025 mL dose of Aflibercept was drawn up and injected with a 30-gauge needle, which was inserted perpendicularly and redirected slightly toward the center of the eyeball after it passed the equator of the lens.[34] Topical tobramycin/dexamethasone eye drops were prescribed for the treated eyes for three days after injection. Post-injection follow-up eye examinations were performed on day one and, if there was no improvement by day one, again on days three to five. Subsequently, each infant was examined weekly to biweekly for at least six months after treatment.
Effective treatment was defined as regression of plus disease or disappearance of the ridge, a decrease in retinal vessel tortuosity, the presence of normal vessels, etc.[35] Reactivation was defined as vascular changes including recurrence of plus disease and neovascularization after partial or complete regression.[33] Once reactivation requiring treatment was diagnosed, treatment was arranged within 72 hours. We recorded the time of reactivation, the status of ROP, and follow-up treatment.
3. Collecting Data on Potential Predictive Factors
Based on previous research and clinical experience, the following four classes of potential predictive factors for recurrent ROP were obtained by one researcher and confirmed by another: (1) maternal factors: gestational hypertension, gestational diabetes mellitus, premature rupture of membranes, placental abruption, placenta previa, cesarean delivery, and IVF-ET; (2) infant variables: gestational age (GA), birth weight (BW), sex, 1-minute and 5-minute Apgar scores, multiple births, neonatal asphyxia, sepsis, respiratory distress syndrome, bronchopulmonary dysplasia, pneumonia, intraventricular hemorrhage, necrotizing enterocolitis, anemia, neonatal hypoxic-ischemic encephalopathy (HIE), congenital heart disease (atrial septal defect, ventricular septal defect, patent ductus arteriosus), pulmonary hypertension, and neonatal jaundice; (3) treatment: mechanical ventilation, oxygen therapy (post-treatment), and blood transfusion; (4) TR-ROP conditions: postmenstrual age (PMA) at initial TR-ROP treatment, zone I ROP, preretinal hemorrhage, A-ROP, and the anti-VEGF drug used.
The infant variables were collected before anti-VEGF treatment. All of these variables were recorded in electronic health records (EHRs). Missing data were handled in two ways: (1) if less than 20% of the values of a variable were missing, the missing values were imputed with the mean (continuous variables) or the mode (categorical variables), and (2) if more than 20% were missing, the variable was deleted.
Another possible predictive factor was the ROP screening retinal images taken before initial treatment. We collected fundus images from the RetCam III. Posterior fundus images from the same eye were grouped into one file and labeled as reactivation or non-reactivation. Exclusion criteria comprised: (1) retinal photographs taken by devices other than the RetCam III; (2) infants with other ocular diseases (e.g., persistent hyperplastic primary vitreous, congenital cataract, Coats disease, or other congenital retinal diseases); and (3) any images without an accurate label.
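The missing-data rule can be illustrated with a short sketch. It assumes that the mean is used for continuous variables and the mode for categorical ones, which the text implies but does not state, and that a variable with exactly 20% missing values is kept (the text leaves this case open). The function name and the data layout are hypothetical.

```python
from statistics import mean, mode

MISSING_THRESHOLD = 0.20  # cut-off stated in the text


def handle_missing(columns, categorical):
    """Impute or drop variables according to their share of missing values.

    columns: dict mapping a variable name to a non-empty list of values,
             with None marking a missing entry.
    categorical: set of variable names treated as categorical (assumption).
    """
    cleaned = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        n_missing = len(values) - len(observed)
        if n_missing > MISSING_THRESHOLD * len(values):
            continue  # more than 20% missing: drop the variable
        # Mode for categorical variables, mean for continuous ones.
        fill = mode(observed) if name in categorical else mean(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


example = {
    "gestational_age": [28.0, None, 30.0, 29.0, 27.0, 30.0],  # 1/6 missing
    "sex": ["M", None, "F", "M", "M", "F"],                   # 1/6 missing
    "apgar_1min": [None, None, 8, 9, 7, 8],                   # 2/6 missing
}
result = handle_missing(example, categorical={"sex"})
```

In this toy example the Apgar variable is dropped, while the missing gestational age and sex entries are filled with the column mean and mode, respectively.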
4. Training and Validation of Prediction Models
4.1. The Clinical Risk Factor Model (CRM)
In recent years, with the adoption of attention mechanisms, many ML models have become able to build prediction models from tabular data. A prediction model based on clinical factors was established using three algorithms: random forest (RF), support vector machine (SVM), and categorical boosting (CatBoost). CatBoost is an enhancement of the gradient boosted decision tree (GBDT) algorithm that natively handles categorical features. We used L2 regularization and cross-entropy as the loss function, so the final output was a value in the range 0.0–1.0. To increase the robustness of the algorithm, we first calculated the contribution of each feature using the GBDT algorithm, yielding an importance score between 0.0 and 1.0. The importance score was then passed in as one of the model’s features, and some low-importance features were dropped. For a fair comparison among the algorithms, we used a grid search for hyperparameter tuning.
For internal validation, we used five-fold cross-validation. The training data were randomly divided into five equal parts: four were used for training, and the remaining one was used for validation. This procedure was repeated until each part had served as the validation set; it was used to tune the parameters and evaluate the performance of the prediction model. For external validation, we used data from Hospital II. Finally, the importance of each investigated variable for reactivation was scored by CatBoost.
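As an illustration of hyperparameter tuning by grid search, the sketch below tries every combination in a parameter grid and keeps the best-scoring one. It is a generic outline, not the study's code: the grid values are hypothetical, and `toy_score` is a stand-in for whatever validation score (for example, cross-validated AUC) a real run would compute.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; a real run would score each setting on validation data.
grid = {"depth": [4, 6, 8], "l2_leaf_reg": [1, 3, 5]}


def toy_score(params):
    # Placeholder objective that peaks at depth=6, l2_leaf_reg=3.
    return -abs(params["depth"] - 6) - abs(params["l2_leaf_reg"] - 3)


best_params, best_score = grid_search(grid, toy_score)
```

Applying the same search procedure to RF, SVM, and CatBoost gives each algorithm the same tuning effort.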
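The five-fold procedure can be sketched as an index split in which each part serves once as the validation set. The function name and the fixed random seed are illustrative assumptions; when the sample size is not divisible by five, fold sizes differ by at most one.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, validation_indices) for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        validation = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, validation


splits = list(five_fold_splits(23))
```

Every sample appears in exactly one validation fold, so averaging the five validation results uses each case once.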
4.2. Retinal Photograph Model (RM)
Retinal photographs were used in the RM prediction model. The images were resized to a resolution of 400 × 400 pixels, and images in the training sets were augmented eight-fold with 90-degree rotations and horizontal and vertical flips.
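One way to obtain an eight-fold augmentation from 90-degree rotations and flips is to take the four rotations of an image together with the horizontal flip of each; the vertical flip is then included automatically, since it equals a 180-degree rotation followed by a horizontal flip. Whether the study used exactly this set of eight is an assumption. The sketch uses plain nested lists in place of image arrays.

```python
def rotate90(img):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def eight_fold(img):
    """Return the four rotations of img plus the horizontal flip of each."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


variants = eight_fold([[1, 2], [3, 4]])
```

For an image with no symmetry of its own, the eight variants are all distinct.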
ResNeSt is an improvement on the ResNet network; it integrates an attention mechanism that lets the network automatically learn a better receptive field, making it more stable and less error-prone. ResNeSt is available as an open-source project on GitHub (https://github.com/zhanghang1989/ResNeSt). We built the RM model with the ResNeSt network, which is highly specialized for image data and widely used in medical imaging. Considering the limited amount of data, we used transfer learning, usually implemented by loading ImageNet-pretrained weights into the model. Transfer learning has been shown to improve model performance and to partially mitigate insufficient learning ability, low classification accuracy in small-sample learning, and poor generalization caused by poor feature extraction.
To improve performance, the network was first trained on ImageNet and a glaucoma dataset (https://grand-challenge.org/), achieving 95% accuracy on the glaucoma dataset, and then transferred to the TR-ROP dataset; the network layers were subsequently fine-tuned on the TR-ROP reactivation dataset. Because the TR-ROP reactivation dataset was small, we augmented the training data with random blocks, random contrast adjustments, random gamma values, and mixup.
After training the DL prediction model, we obtained the predicted probability for each retinal image and binarized each element of the output vector to 1.0 or 0.0 using a threshold of 0.5, yielding the two classes (reactivation and non-reactivation). For internal validation, we used five-fold cross-validation on the training dataset, which was randomly divided into five independent parts. For several reasons, we did not obtain an external validation dataset for this model. The results were averaged after every part had been validated and were recorded to evaluate the performance of the DL prediction model. Details are shown in Fig. 1.
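Of the augmentations listed, mixup is the least self-explanatory: it blends two training samples and their labels with a random mixing coefficient drawn from a Beta distribution. The sketch below shows the standard formulation on flat pixel lists. The value `alpha=0.2` and the function signature are assumptions, as the text does not report the settings used.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two samples and their labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = lam * y1 + (1 - lam) * y2  # soft label between the two classes
    return x, y, lam


# Toy example: two "images" of two pixels each, with labels 0.0 and 1.0.
mixed_x, mixed_y, lam = mixup([0.0, 1.0], 0.0, [1.0, 0.0], 1.0,
                              rng=random.Random(0))
```

The mixed label is a value between 0.0 and 1.0, which fits a cross-entropy loss on soft targets.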
4.3. Combined Model (CM)
We combined clinical factors with retinal images to build a CM prediction model, which may improve on the RM and CRM prediction models. The training involved two steps. First, we trained a stable prediction model based on the CatBoost algorithm. Second, the output of the CatBoost model was used as part of the input to the next model, and we used cross-entropy as the basic loss function in the final model.
To improve the final performance of the model, we drew on the idea of focal loss and modified the loss function as follows:
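$$\mathrm{Loss} = a \left( -y \log y' - (1 - y) \log (1 - y') \right)$$
$$a = \frac{\exp(-b \, y \, G)}{Z}$$
Here y is the label and y' is the predicted probability; Z is the normalization factor, b is the table network weight, and G is the table network output score.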
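The modified loss can be written out numerically as below. This is a sketch under stated assumptions: the text does not define how Z is computed, so here Z is taken as the batch mean of exp(-b*y*G), which makes the weights a average to 1 over the batch. The default `b=1.0` and the clipping constant `eps` are likewise illustrative choices.

```python
import math


def weighted_cross_entropy(y_true, y_pred, g_scores, b=1.0, eps=1e-7):
    """Mean of a * cross-entropy, with a = exp(-b * y * G) / Z."""
    raw = [math.exp(-b * y * g) for y, g in zip(y_true, g_scores)]
    z = sum(raw) / len(raw)  # assumption: Z normalizes weights to mean 1
    total = 0.0
    for y, p, w in zip(y_true, y_pred, raw):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        cross_entropy = -y * math.log(p) - (1 - y) * math.log(1 - p)
        total += (w / z) * cross_entropy
    return total / len(y_true)


# With b = 0 every weight equals 1 and the plain cross-entropy is recovered.
plain = weighted_cross_entropy([1, 0], [0.8, 0.2], [0.9, 0.1], b=0.0)
```

With b > 0, positive cases that the table network already scores highly (large G) receive smaller weights, so training concentrates on cases the clinical model handles less well, in the spirit of focal loss.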
For training, we used mini-batch stochastic gradient descent to minimize the loss function. The final result was obtained by training for 400 epochs with cosine annealing of the learning rate.
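Cosine annealing lowers the learning rate from its initial value to a minimum along half a cosine curve. The sketch below uses the common single-cycle form over the 400 epochs mentioned above. The values of `lr_max` and `lr_min` are hypothetical, since the text does not report them, and it is an assumption that no warm restarts were used.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=400, lr_max=0.01, lr_min=0.0):
    """Learning rate at a given epoch under single-cycle cosine annealing."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cosine)


schedule = [cosine_annealing_lr(epoch) for epoch in range(401)]
```

The schedule starts at `lr_max`, passes the midpoint between the two rates at epoch 200, and reaches `lr_min` at epoch 400.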
4.4. Statistical Analysis
Statistical analysis was performed using SPSS v23 software (IBM, USA). We present continuous factors as means ± standard deviations and categorical factors as numbers and percentages. To compare the non-reactivation and reactivation groups, a chi-squared test or Fisher’s exact test was used for categorical factors, and a Student’s t-test was used for continuous factors. P < 0.05 was considered statistically significant. The area under the receiver operating characteristic curve (AUC), sensitivity (SEN), and specificity (SPC) were used to show the prediction models’ performance.
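For reference, the three performance measures can be computed from labels and predicted probabilities as sketched below. AUC is calculated as the proportion of positive-negative pairs in which the positive case receives the higher score (ties count as one half). Sensitivity and specificity are taken at a threshold of 0.5. Counting a probability of exactly 0.5 as positive is an assumption, and this code is illustrative rather than the study's software.

```python
def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Return (sensitivity, specificity) at the given probability threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_prob):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0, 1, 0]
probabilities = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
sen, spc = sensitivity_specificity(labels, probabilities)
area = auc(labels, probabilities)
```

In this toy example two of three reactivation cases and two of three non-reactivation cases are classified correctly, and six of the nine positive-negative pairs are ranked correctly.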