Diabetic retinopathy (DR) is a common complication of diabetes that predominantly affects working-age individuals and can lead to loss of vision. By 2040, an estimated 600 million people will have diabetes, and approximately one third of them are expected to develop DR [1]. DR is usually identified by an ophthalmologist through visual examination of digital fundus images for the presence of one or more retinal lesions such as microaneurysms, soft exudates, haemorrhages, and hard exudates [2]. DR is broadly classified into non-proliferative DR (NPDR) and proliferative DR (PDR). NPDR is the preliminary stage, in which microaneurysms are visible in the fundus image, whereas PDR is the advanced stage and can lead to severe vision loss. NPDR is further subdivided into mild, moderate, and severe NPDR. The international clinical DR severity scale contains five grades for classifying DR from fundus images: grade 0 is no apparent retinopathy, grades 1, 2, and 3 correspond to mild, moderate, and severe NPDR, respectively, and grade 4 is PDR.
PDR can lead to blindness, and the manual evaluation of fundus images places a severe burden on ophthalmologists. Moreover, inaccurate grading of DR can occur when healthcare professionals are inadequately trained. Hence, automated methods for DR screening are warranted to assist ophthalmologists and trained healthcare practitioners. However, poor-quality digital fundus images can lead to false positives, so it is vital to first estimate the quality of the acquired fundus images before proceeding with DR grading [3]. Therefore, fully automated methods for accurate quality estimation (QE) of fundus images are in demand.
In the past decade, several state-of-the-art deep learning (DL) architectures, including AlexNet [4], VGGs [5], GoogLeNet [6], ResNet [7], DenseNet [8], EfficientNets [9, 10], and, more recently, vision transformer (ViT) [11] based models, were developed for various computer vision tasks such as object localization, detection, and classification. Although training DL models from scratch requires massive amounts of data, transfer learning (TL) allows already trained models to be adapted to new classification tasks, thereby eliminating the need for huge training datasets. Both TL and DL have played a major role in healthcare, driving the development of DL-based automated systems that operate on medical images such as radiographs, computed tomography, digital fundus images, positron emission tomography, and magnetic resonance imaging for diagnostic and prognostic tasks, as well as assisting medical practitioners in scenarios such as faster data acquisition and quality control [12–14]. EfficientNetV2 is one of the recently developed DL architectures, based on progressive learning combined with neural architecture search and scaling to improve both training speed and parameter efficiency [15]. EfficientNetV2 outperformed several previous state-of-the-art models, including ViTs, in image classification on the ImageNet challenge. The contributions of this work are as follows:
i) A fully automated method for overall QE of digital fundus images is proposed using an ensemble of EfficientNetV2 small (S), medium (M), and large (L) models, since model ensembling has proved effective in previous studies [16] (see the sketch after this list).
ii) The proposed ensemble model is cross-validated and tested on the publicly available DeepDRiD dataset, as QE on images from this dataset is known to be challenging [3].
iii) The performance of the proposed ensemble model for overall QE is further analysed after stratification by DR disease severity.
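As a rough illustration of contribution i), the following is a minimal sketch of the approach: pretrained EfficientNetV2-S/M/L backbones are fine-tuned via transfer learning for 2-class quality estimation, and their softmax outputs are averaged at inference (soft voting). The timm model names, optimiser settings, and data-loading details are illustrative assumptions, not the exact training configuration used in this work.

```python
# Sketch (assumed details): fine-tune pretrained EfficientNetV2-S/M/L for
# 2-class fundus quality estimation and average their softmax outputs.
import torch
import torch.nn.functional as F
import timm

MODEL_NAMES = ["tf_efficientnetv2_s", "tf_efficientnetv2_m", "tf_efficientnetv2_l"]
NUM_CLASSES = 2  # good vs. bad quality


def build_models(device="cuda"):
    models = []
    for name in MODEL_NAMES:
        # Transfer learning: start from ImageNet weights, replace the classifier head.
        m = timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)
        models.append(m.to(device))
    return models


def fine_tune(model, loader, epochs=5, lr=1e-4, device="cuda"):
    # Plain fine-tuning of all layers with a small learning rate (assumed schedule).
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:  # loader yields (B, 3, H, W) image tensors
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()


@torch.no_grad()
def ensemble_predict(models, images, device="cuda"):
    # Soft voting: average the softmax probabilities of the three models.
    for m in models:
        m.eval()
    probs = torch.stack([F.softmax(m(images.to(device)), dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)  # predicted quality label per image
```

Averaging probabilities rather than hard labels lets the three differently sized backbones contribute proportionally to their confidence, which is one common way to realise the ensembling referred to above.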
Related work
Several works exist in the literature on the quality estimation of digital fundus images using both machine learning and deep learning techniques. These works can broadly be divided into 2-class and 3-class classification problems, summarised in Table 1. In 2-class classification, images are labelled as either good or bad quality, whereas in the 3-class problem, images are labelled as good, moderate, or bad quality. In [17], a PLS regressor was developed based on handcrafted features and achieved an area under the receiver operating characteristic curve (AUC) of 95.8% on a private dataset. Further, using a support vector machine (SVM) classifier on a mixture of private and public datasets containing fundus images of varying resolutions, [18] demonstrated an accuracy of 91.4%, [19] obtained an AUC of 94.5%, and [20] achieved a sensitivity of 95.3% in fundus image QE. In other studies based on EyePACS Kaggle datasets [21, 22], pretrained deep learning models were fine-tuned for feature extraction, and the extracted features were fed to an SVM classifier to detect bad-quality fundus images; the highest classification accuracy in these studies was 95.4%. Using the DRIMDB dataset, several ML classifiers were developed, including gcforest and a random forest regressor [23–25], achieving accuracies above 98%.
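For illustration only, the following is a minimal sketch of the CNN-feature-plus-SVM pipeline described in [21, 22], assuming a pretrained AlexNet backbone from torchvision as a fixed feature extractor and a scikit-learn SVM; the preprocessing (e.g. any saliency-based steps used in the original studies) and hyperparameters are omitted or assumed.

```python
# Illustrative sketch (assumed details): pretrained CNN features + SVM
# for flagging bad-quality fundus images.
import torch
from torchvision import models
from sklearn.svm import SVC


@torch.no_grad()
def extract_features(images):
    # Pretrained AlexNet up to its penultimate layer as a fixed feature extractor.
    backbone = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    backbone.classifier = backbone.classifier[:-1]  # drop the 1000-way ImageNet head
    backbone.eval()
    return backbone(images).numpy()  # (N, 4096) feature matrix

# Hypothetical usage: X_train/X_test are preprocessed 224x224 fundus image tensors,
# y_train holds quality labels (0 = good, 1 = bad).
# svm = SVC(kernel="rbf").fit(extract_features(X_train), y_train)
# quality_labels = svm.predict(extract_features(X_test))
```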
Some recent studies addressed 3-class classification of fundus image quality using a light-weight CNN [26] and an ensemble of CNNs [27] on Kaggle datasets, obtaining accuracies above 85% for 3-class classification. In the most recent study [28], a pretrained ResNet50 fine-tuned on a Kaggle dataset demonstrated an accuracy of 98.6%. Overall, on the private and public datasets mentioned so far, the classification task is relatively easy because the quality classes are readily differentiable to the naked eye. However, in a recent digital fundus image QE challenge [3], the good- and bad-quality images in the DeepDRiD dataset are hard to differentiate, and the highest accuracy obtained in the challenge was 69.81%. Therefore, there is scope to improve QE on DeepDRiD images, and in the present study we explore this using EfficientNetV2 models and their ensemble [15].
Table 1
Previous works on assessment of fundus image quality using different machine learning and deep learning methods on various private and public datasets. DeepDRiD: diabetic retinopathy grading and image quality estimation challenge dataset; CNN: convolutional neural network; ML: machine learning; DL: deep learning; PLS: partial least squares; SVM: support vector machine.
Study | Method | Dataset | Image resolution | Performance (%)
Yu H et al. [17] | PLS regressor | Private – 1884 | 4752 × 3168 | AUC: 95.8
Yu F et al. [21] | SM + AlexNet + SVM | Kaggle – 5200 (subset) | Original: 2592 × 1944; resized: 256 × 256 | Accuracy: 95.4; AUC: 98.2
Yao Z et al. [18] | SVM | Private – 3224 | – | Accuracy: 91.4; AUC: 96.2
Welikala R.A. et al. [20] | SVM | UK Biobank – 800 (subset) | 2048 × 1536 | Sensitivity: 95.3; Specificity: 91.1
Wang S et al. [19] | SVM | Private and public – 536 | Private: 2560 × 1960; public: 570 × 760 and 565 × 584 | AUC: 94.5; Sensitivity: 87.4; Specificity: 91.7
S Feng et al. [22] | DT, SVM and DL | EyePACS at Kaggle – 4372 | Multiple resolutions | Accuracy: 93.6; Sensitivity: 94.7; Specificity: 92.3
S Ugur et al. [23] | Several ML classifiers | DRIMDB – 216 | 570 × 760 | Accuracy: 98.1
Raj A et al. [27] | Ensemble of CNNs | FIQuA (EyePACS at Kaggle) – 1500 | Multiple resolutions | Accuracy: 95.7 (3-class classification)
Perez AD et al. [26] | Light-weight CNN | Kaggle – 4768 (2-class); Kaggle – 28792 (3-class) | 896 × 896 | Accuracy: 91.1 (2-class); 85.6 (3-class)
Liu H et al. [25] | gcforest | DRIMDB – 216 (3-class); ACRIMA – 705 (2-class) | Multiple resolutions | Accuracy: 88.6 (DRIMDB); 85.1 (ACRIMA)
Karlsson RA et al. [24] | Random forest regressor | Private – 787 oximetry and 253 RGB; DRIMDB – 216 (194 were used) | 1600 × 1200 (oximetry); 3192 × 2656 (RGB); 760 × 570 (DRIMDB) | Accuracy: 98.1 (DRIMDB); ICC: 0.85 (oximetry); ICC: 0.91 (RGB)
Shi C et al. [28] | Pretrained ResNet50 | Kaggle – 2434 (2-class) | Multiple resolutions | Accuracy: 98.6; Sensitivity: 98.0; Specificity: 99.1
Liu R [3] | ISBI 2020 grand challenge | DeepDRiD – 2000 (2-class) | Multiple resolutions | Accuracy: 69.81