Search results and study characteristics
Our literature search initially identified 1036 studies, of which 587 were screened after the removal of duplicate records. Figure 1 shows the flowchart of the literature eligibility process. Finally, 22 studies were included in the systematic review, [12–33] and 14 of them were included in the quantitative meta-analysis. [12,13,16,19–23,26,27,29–31,33]
The characteristics of all eligible studies are summarized in table 1. In total, 348,861 fundus images and 22,560 OCT images were used for training, testing and validation. Across the included studies, SEN ranged from 80.0% to 98.7% and SPE from 79.5% to 100.0% for PM detection. Two categories (PM and non-PM) were reported as the primary outcome in 14 studies (63.6%); 5 categories of PM (META-PM) were reported in 4 studies (18.2%); and 3 categories of PM (ATN) were reported in 1 study (4.5%). The remaining 6 studies (27.3%) identified specific PM-related lesions (CNV, myopic traction maculopathy, retinal detachment, etc).
Most studies (n=20, 90.9%) applied convolutional neural networks (CNNs) to develop algorithms, of which 12 studies used ResNet. In addition, 1 study used a support vector machine (SVM) and 1 study used AdaBoost. Sixteen studies (72.7%) obtained images from hospitals and 6 studies (27.3%) from public databases, of which the PathologicAL Myopia (PALM) database was the most frequently adopted (n=4, 18.2%).
Risk of bias assessment and publication bias
We assessed the quality of all included studies using the QUADAS-2 tool, and the results are presented in supplementary appendix 3. Seven studies (31.8%) were graded as having a low risk of bias and low applicability concerns in all 4 domains. [16,18,20,26–28,33] For patient selection, 12 studies (54.5%) were graded as having an unclear risk of bias because the public datasets were not clearly described, and 12 studies (54.5%) had unclear applicability concerns because information on dataset composition was unavailable. For the index test, most studies (n=16, 72.7%) had a low risk of bias and low applicability concerns, and only 6 studies (27.3%) were graded as having an unclear risk of bias because of possible data overlap among datasets. For the reference standard, the risk of bias and applicability concerns were low in all included studies. Finally, for the flow and timing domain, 8 studies (36.4%) had an unclear risk of bias because the construction procedure of the public datasets was unclear. Furthermore, Deeks' funnel plot asymmetry test showed no significant publication bias (P=0.10), as shown in online supplementary appendix 4.
Meta-analysis for the performance of AI in PM and PM-CNV detection
For the detection of PM, the forest plots of SEN, SPE and 95% CIs for the included studies are shown in figure 2A and figure 2B. [13,16,19,20,22,23,27,29–31] Using the HSROC model, we obtained the SROC curve with a 95% confidence region and prediction region (figure 2C). The summary AUC was 0.99 (95% CI 0.97 to 0.99), and the pooled SEN, SPE, PLR, NLR, and DOR were 0.95 (95% CI 0.92 to 0.96), 0.97 (95% CI 0.94 to 0.98), 28.1 (95% CI 15.8 to 50.2), 0.06 (95% CI 0.04 to 0.08), and 495 (95% CI 243 to 1008), respectively. For the detection of PM-CNV, the forest plots for the included studies and the SROC curve plot are shown in figure 3. [12,13,21,26,33] The summary AUC was 0.99 (95% CI 0.97 to 0.99), and the pooled SEN, SPE, PLR, NLR, and DOR were 0.94 (95% CI 0.90 to 0.97), 0.96 (95% CI 0.94 to 0.98), 25.9 (95% CI 16.1 to 41.7), 0.06 (95% CI 0.03 to 0.10), and 435 (95% CI 220 to 860), respectively.
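To make the relationship between the pooled measures explicit, the following minimal Python sketch derives PLR, NLR and DOR from a single pair of SEN and SPE values using their standard definitions. It is an illustration only: the function name `likelihood_ratios` is hypothetical, and the pooled estimates reported above come from the HSROC model, so recomputing them from the rounded SEN and SPE is not expected to reproduce the reported figures exactly.

```python
def likelihood_ratios(sen, spe):
    """Return (PLR, NLR, DOR) for a given sensitivity and specificity.

    Standard definitions:
      PLR = SEN / (1 - SPE)
      NLR = (1 - SEN) / SPE
      DOR = PLR / NLR
    """
    if not (0.0 < sen < 1.0 and 0.0 < spe < 1.0):
        raise ValueError("sen and spe must lie strictly between 0 and 1")
    plr = sen / (1.0 - spe)
    nlr = (1.0 - sen) / spe
    dor = plr / nlr
    return plr, nlr, dor


# Rounded pooled estimates for PM detection, taken from the text above.
plr, nlr, dor = likelihood_ratios(0.95, 0.97)
print(f"PLR={plr:.1f}, NLR={nlr:.2f}, DOR={dor:.0f}")
```

With the rounded inputs 0.95 and 0.97, this gives a PLR of about 31.7, an NLR of about 0.05 and a DOR of about 614, compared with the reported 28.1, 0.06 and 495. The gap is most likely due to rounding of the pooled SEN and SPE, since all three ratios are very sensitive to small changes when SPE is close to 1.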
Heterogeneity analysis and meta-regression analysis
Because high heterogeneity (I² > 50%) was found in the forest plots of SEN and SPE for the detection of PM, we performed meta-regression to explore potential sources of heterogeneity. The DOR was not significantly associated with any of the following factors: research region (P=0.15); type of validation dataset (P=0.23); imaging modality (P=0.78); type of dataset (P=0.36); and total number of images (P=0.07).
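For readers unfamiliar with the I² statistic, the sketch below shows how it is conventionally computed from Cochran's Q under inverse-variance weighting. This is a generic illustration, not the software or the exact procedure used in this review; the function names and the input values are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q for study-level effects with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, k):
    """I² in percent for Cochran's Q over k studies: max(0, (Q - df) / Q) * 100."""
    df = k - 1
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical logit-sensitivity estimates and their variances (not study data).
effects = [2.9, 3.4, 2.2, 3.8]
variances = [0.04, 0.09, 0.05, 0.12]
q = cochran_q(effects, variances)
print(f"Q={q:.2f}, I2={i_squared(q, len(effects)):.1f}%")
```

An I² above 50% means that more than half of the observed variation in the estimates is attributed to between-study differences rather than chance, which is the threshold applied in the paragraph above.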
Subgroup analysis
The results of the subgroup analysis are summarized in table 2. Imaging modality and data source did not contribute significantly to diagnostic performance. By type of validation dataset, performance was higher on internal datasets (SEN=0.95, 95% CI 0.94 to 0.96; SPE=0.97, 95% CI 0.96 to 0.99; AUC=0.99, 95% CI 0.97 to 1.00) than on external datasets (SEN=0.93, 95% CI 0.92 to 0.95; SPE=0.96, 95% CI 0.94 to 0.97; AUC=0.99, 95% CI 0.98 to 0.99). By research region, performance was higher in developed countries (SEN=0.96, 95% CI 0.93 to 0.98; SPE=0.98, 95% CI 0.97 to 0.99; AUC=0.99, 95% CI 0.97 to 0.99) than in developing countries (SEN=0.94, 95% CI 0.90 to 0.95; SPE=0.96, 95% CI 0.93 to 0.98; AUC=0.98, 95% CI 0.97 to 0.99). By total data size, performance was higher with more than 5000 images (SEN=0.96, 95% CI 0.95 to 0.98; SPE=0.97, 95% CI 0.96 to 0.99; AUC=0.99, 95% CI 0.97 to 0.99) than with fewer than 5000 images (SEN=0.93, 95% CI 0.91 to 0.95; SPE=0.96, 95% CI 0.94 to 0.98; AUC=0.98, 95% CI 0.98 to 0.99).
Sensitivity analysis
The sensitivity analysis repeated the primary meta-analysis after excluding 5 studies that lacked sufficient information about the division of datasets or in-depth details of clinical data sources. [19,22,29–31] After exclusion, the pooled SEN was 0.94 (95% CI 0.90 to 0.97) and the pooled SPE was 0.96 (95% CI 0.95 to 0.98) for the detection of PM. These results were similar to our main findings; hence, there was no evidence that the main outcome was influenced by which studies were included.