We successfully achieved our goal of developing an artificial intelligence algorithm able to locate the uterus in pelvic MR examinations, place measurement keypoints on it, and provide its three-dimensional measurement with satisfactory accuracy.
The object keypoint similarity (OKS) was close to 1, increasing from 0.92 in validation to 0.96 in testing. This difference is explained by the fact that the validation OKS was calculated on the cropped images, whereas the test OKS was calculated on the full-size images: the larger the image, the smaller the relative positioning error.
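To make the effect of image size concrete, the following minimal sketch computes a COCO-style OKS for one object and shows that the same absolute keypoint error yields a higher score when the reference scale is larger. The function name `oks`, the coordinates, the scale values, and the per-keypoint constants are all hypothetical; the exact normalisation used in our pipeline is not restated here, so this should be read as an illustration of the principle rather than as our implementation.

```python
import math


def oks(pred, truth, scale, kappas):
    """COCO-style object keypoint similarity for a single object.

    pred, truth: lists of (x, y) keypoints in the same units (e.g. pixels).
    scale: object scale s (in COCO, the square root of the object area).
    kappas: per-keypoint falloff constants k_i (hypothetical values here).
    All keypoints are assumed to be labelled (no visibility flag).
    """
    if not (len(pred) == len(truth) == len(kappas)) or not pred:
        raise ValueError("pred, truth and kappas must be non-empty and equal in length")
    total = 0.0
    for (px, py), (tx, ty), k in zip(pred, truth, kappas):
        d2 = (px - tx) ** 2 + (py - ty) ** 2  # squared keypoint error
        total += math.exp(-d2 / (2.0 * scale ** 2 * k ** 2))
    return total / len(pred)


# Hypothetical example: identical 2-pixel errors, two different reference scales.
truth = [(10.0, 12.0), (20.0, 20.0)]
pred = [(10.0, 10.0), (20.0, 22.0)]
kappas = [0.1, 0.1]

oks_small_scale = oks(pred, truth, 50.0, kappas)   # e.g. a cropped image
oks_large_scale = oks(pred, truth, 100.0, kappas)  # e.g. a full-size image
print(oks_small_scale, oks_large_scale)
```

Because the squared error is divided by the squared scale, doubling the scale in this sketch reduces the penalty for a given pixel error, which is consistent with the higher OKS observed on full-size images.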
One of the strengths of our study is that our network was tested on an external cohort that was not subject to selection bias, except with regard to subserous myomas. This performance supports the generalisability of the model.
To the best of our knowledge, only one study on uterine segmentation has been conducted to date. Kurata et al. evaluated a U-Net architecture for contouring the uterus on MR images [10]. They reached an average Dice similarity coefficient (DSC, which can be compared to OKS) of 0.82. Their study included 122 patients with uterine disorders. Our model was optimised using a substantially larger training database of 800 patients.
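For readers less familiar with the DSC, the sketch below shows the standard definition, twice the overlap divided by the sum of the two mask sizes, applied to binary masks flattened into lists. The function name `dice` and the toy masks are hypothetical. Note that the DSC quantifies region overlap whereas the OKS quantifies keypoint proximity, so the two scores share a 0 to 1 range but are only loosely comparable.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    mask_a, mask_b: flat sequences of 0/1 (or False/True) of equal length.
    Returns 1.0 when both masks are empty, by convention in this sketch.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if size_sum == 0:
        return 1.0
    return 2.0 * intersection / size_sum


# Hypothetical toy masks: manual contour versus automatic contour.
manual = [1, 1, 1, 0, 0, 0]
automatic = [0, 1, 1, 1, 0, 0]
print(dice(manual, automatic))
```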
In men, by comparison, a wide range of studies have addressed automatic segmentation of the prostate, with similar results. For example, Ushinsky et al. trained a customised hybrid U-Net CNN architecture on manually segmented MR images and achieved a DSC of 0.898 [11].
However, it is more complex for an AI tool to locate and segment the uterus than the prostate, because the uterus can vary in position, flexion, and shape. Moreover, it is surrounded by many structures (colon, bladder, ovaries).
Another strength of our study is that our training dataset benefited from the clinical heterogeneity of its cases, in terms of both pathological conditions and patient preparation. It included cases of cervical cancer and endometriosis, as well as examinations performed with rectal and vaginal opacification. This suggests that the performance of our CNN would be robust in prospective clinical settings.
Most studies on automated segmentation have used volumetric models or U-Net architectures. In contrast, our network's performance was achieved with VGG-11 and VGG-16 architectures. This approach is better suited to distance measurement because it is specifically designed to locate an organ and place measurement points on it. To do so, our pipeline uses two separate models.
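Once keypoints have been placed, a measurement reduces to a distance between two points scaled by the pixel spacing of the image. The sketch below illustrates this step only; the function name `distance_mm`, the keypoint coordinates, and the pixel spacing are hypothetical, and it assumes both keypoints lie in the same two-dimensional slice.

```python
import math


def distance_mm(point_a, point_b, spacing_mm):
    """Euclidean distance in millimetres between two in-plane keypoints.

    point_a, point_b: (row, column) positions in pixel units.
    spacing_mm: (row spacing, column spacing) in mm per pixel.
    """
    d_row = (point_a[0] - point_b[0]) * spacing_mm[0]
    d_col = (point_a[1] - point_b[1]) * spacing_mm[1]
    return math.hypot(d_row, d_col)


# Hypothetical keypoints at the two ends of a measurement axis,
# on an image with an assumed 0.5 mm isotropic in-plane spacing.
fundus = (40, 120)
cervix = (160, 210)
length_mm = distance_mm(fundus, cervix, (0.5, 0.5))
print(f"length: {length_mm:.1f} mm")
```

Applying the pixel spacing before taking the norm keeps the result correct when the row and column spacings differ.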
The average deviation between the AI measurements and those of the radiologists was 3.6 mm (SD 6.6 mm), whereas the inter-radiologist variability was 1.4 mm. Nevertheless, the R² coefficient approached 0.94 for length and width, indicating that agreement between the radiologists and the AI remained very strong. For thickness, the R² coefficient was lower, at 0.75, because the algorithm was challenged by the junctional zone in rare cases.
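The two agreement statistics quoted above can be computed as in the following sketch. It rests on two assumptions that are not stated in this section: that the deviation is the mean absolute difference between paired measurements, and that R² is the squared Pearson correlation. The function names and the measurement values are hypothetical and are not study data.

```python
def mean_abs_deviation(ai, reference):
    """Mean absolute difference between paired measurements (same units)."""
    if len(ai) != len(reference) or not ai:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum(abs(a - r) for a, r in zip(ai, reference)) / len(ai)


def r_squared(ai, reference):
    """Squared Pearson correlation between paired measurements."""
    if len(ai) != len(reference) or len(ai) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(ai)
    mean_ai = sum(ai) / n
    mean_ref = sum(reference) / n
    cov = sum((a - mean_ai) * (r - mean_ref) for a, r in zip(ai, reference))
    var_ai = sum((a - mean_ai) ** 2 for a in ai)
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    if var_ai == 0 or var_ref == 0:
        raise ValueError("correlation is undefined for constant measurements")
    return cov * cov / (var_ai * var_ref)


# Hypothetical uterine lengths in mm (illustrative values only).
ai_lengths = [70.0, 82.0, 91.0, 65.0]
radiologist_lengths = [72.0, 80.0, 95.0, 66.0]
print(mean_abs_deviation(ai_lengths, radiologist_lengths))
print(r_squared(ai_lengths, radiologist_lengths))
```

A high R² alongside a non-negligible mean deviation is possible because correlation is insensitive to a systematic offset or scaling between the two sets of measurements.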
The speed of our system is a major advantage over the time required for manual measurement. In our experience, a radiologist takes 37.89 seconds to measure a uterus in three dimensions, compared with 1.6 seconds for the algorithm, which is roughly 24 times faster. Our VGG-based network may therefore increase throughput.
Our algorithm can take over a routine task, thus freeing radiologists' time for more intellectually demanding work.
Our study had a few limitations that should be acknowledged. First, this was a retrospective, single-centre study. The database was created using three MRI scanners (General Electric Healthcare, Valenciennes Hospital, France). The generalisability to other centres or MRI equipment has therefore not yet been established. In addition, all included images were obtained using the same T2-weighted acquisition protocols. A comparative study of the algorithm's performance across different MRI parameters or protocols would be a useful next step.
A clear application of our AI tool in daily practice is easy to envisage: the algorithm's measurements could be displayed on the image server or automatically added to reports. Prospective studies are required to validate our network in a clinical setting. Further studies could also use the same pipeline to measure endometrial thickness or ovarian dimensions.