Three-dimensional measurement of the uterus on magnetic resonance images: development and performance analysis of an automated deep learning tool

doi:10.21203/rs.3.rs-2696476/v1


Research Article


https://doi.org/10.21203/rs.3.rs-2696476/v1

This work is licensed under a CC BY 4.0 License

Version 1


Background

The aim of our study was to develop, validate, and test a deep learning (DL) tool for fully automated measurement of the three-dimensional size of the uterus on magnetic resonance imaging (MRI) and to compare it to manual reference measurement.

Materials and Methods

In this single-centre retrospective study, 845 cases were included for training and validation. The ground truth was a manual measurement of the uterus on magnetic resonance (MR) images. A deep learning tool using a convolutional neural network (CNN) with VGG-16/VGG-11 architecture was developed. The performance of the model was evaluated using the objective keypoint similarity (OKS), mean difference in millimetres, and coefficient of determination R² on a new set of 100 patients.

Results

The OKS of our artificial intelligence (AI) model was 0.92 (validation) and 0.96 (test). These values indicate close agreement between the algorithm and the radiologists in the positioning of the measurement points. The average deviation and R² coefficient between the AI measurements and the manual ones were, respectively, 3.9 mm and 0.93 for two-point length, 3.7 mm and 0.94 for three-point length, 2.6 mm and 0.93 for width, and 4.2 mm and 0.75 for thickness. Inter-radiologist variability was 1.4 mm. A three-dimensional automated measurement was obtained in 1.6 s.

Conclusion

Our deep learning model can locate the uterus on MR images and place measurement points on it to obtain its three-dimensional measurement with a very good correlation with manual measurements.

Deep learning

Convolutional neural network

Artificial intelligence

Uterus

Measurement

MRI

1/ The proposed algorithm located the uterus on MR images and placed measurement keypoints on it with excellent accuracy. The mean OKS was 0.96.

2/ The average deviation between AI measurements and radiologists’ measurements was 3.6 mm. It remains inconsequential, especially for length and width, even though it slightly exceeds inter-radiologist variability (1.4 mm).

3/ This tool can easily be applied in clinical practice as an alternative to time-consuming manual tracing (1.6 s for automated three-dimensional measurement versus 37 s for manual measurement).

Several variations can be observed in the size of the uterus, due to genital activity, gestation, and pathology [1].

Uterine measurements are useful for both the treatment and the follow-up of gynaecological patients. Evaluating the size of the uterus helps to describe the development and senescence of the organ, to choose the most suitable procedure (intrauterine device insertion, hysteroscopy), and to assist surgical procedures such as laparoscopy or laparotomy.

Therefore, an accurate description of uterine measurements must be made in the magnetic resonance (MR) examination of the female pelvis [2].

Usually, the measurement is performed manually by radiologists, which consumes medical time and introduces inter-operator variability [3].

Artificial intelligence (AI) has become a potential solution to assist segmentation [4][5]. AI can thus be used to build automatic segmentation tools for given structures and enable significant improvements in radiological workflows.

Deep learning models have been successful in providing automatic segmentation for different organs such as the prostate, kidneys, and heart [6][7][8][9].

Measuring the uterus is a more challenging task due to anatomical variability, complex contrasts with surrounding tissues, and pathologies such as endometriosis or myomas that distort the contours of the uterus.

Along with ultrasound, magnetic resonance imaging (MRI) is the best imaging modality for the female pelvis, providing better reproducibility because it is not observer dependent.

The aim of our work was to develop, validate, and test an automated deep learning tool for MR images, to implement an automatic measurement of the three-dimensional size of the uterus, and to evaluate its performance compared with the manual measurements of radiologists.

This retrospective study of model creation was approved by the CERCI Independent Ethics Committee of Valenciennes Hospital under the reference CHV-2022-006. All patients were informed of the use of their medical data according to the legal framework imposed by CNIL MR-004. In addition, all data were pseudonymized beforehand.

DATA ACQUISITION

All women over 18 years of age who underwent pelvic MRI, including sagittal and axial T2-weighted images, in the women's imaging department of Valenciennes Hospital (France) between September 2021 and March 2022 were retrospectively identified in the institutional Picture Archiving and Communication System (EMR Manager, VEPRO AG, Version 8.2, Germany).

Pregnant women were excluded, as were examinations with severe motion artefacts, a highly deviated uterus, or subserous myomas (FIGO VI and VII).

These examinations were performed using two MR units, either at 1.5 T (SIGNA artist) or 3 T (SIGNA premier) (General Electric Healthcare, Cleveland, USA). The acquisition parameters are listed in Table 1.

Table 1

Magnetic resonance imaging (MRI) acquisition parameters
MRI Parameters	General Electric 1.5T, SIGNA artist, 2021		General Electric 1.5T, SIGNA artist, 2020		General Electric 3T, SIGNA premier, 2019
Plane	Sagittal	Axial	Sagittal	Axial	Sagittal	Axial
TE (ms)	100–120		100–120		115–120
TR (ms)	5000 − 1100		4000–10000		4000–13000
Number of excitations (Nex)	2		2		1.5	2
Field of view (mm)	[393x260]-[408x220]	363x240	393x260	[360x240]-[410x270]	332x220
Frequency (Hz)	41.67		41.67		50
Slice thickness (mm)	3.5-4.0		3.5-4.0		3.0-3.5
Interslice gap (mm)	3.5		3.5		0.5-3.0

Patients were randomly assigned to a training (80%) and validation set (20%), without any overlap.
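As an illustration of the random assignment described above, the following minimal sketch splits a list of patient identifiers into disjoint training and validation subsets. The function name `split_patients`, the fixed seed, and the integer identifiers are assumptions made for the example; the source does not describe how the split was implemented.

```python
import random


def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Randomly split unique patient IDs into disjoint training and validation lists."""
    ids = sorted(set(patient_ids))   # de-duplicate so no patient can appear twice
    rng = random.Random(seed)        # hypothetical fixed seed, for reproducibility only
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 800 included patients, identified here by placeholder integers
train_ids, val_ids = split_patients(range(800))
print(len(train_ids), len(val_ids))
```

Splitting at the patient level, rather than at the slice level, is what guarantees that slices from one patient never appear in both subsets.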

An additional set of MR images acquired between July and August 2021 was used for external validation using the same inclusion and exclusion criteria (test set).

DATA LABELLING

One radiologist (DM) used an artificial intelligence software specifically designed for medical imaging (Cleverdoc V1.9.0 platform) for labelling.

When the uterus was not visible, the label “not observable” was assigned to the examination. When visible, it was enclosed in a square region (label box). Three consecutive slices were annotated with points corresponding to the measurements of the uterus (Fig. 1). Thickness and length were annotated on sagittal T2-weighted images. The length was measured using two different methods: one major axis defined by two points, and the other defined by three points passing through the fundus, the cervix-isthmus junction, and the exocervix (preferred when the uterus is considerably flexed) (Fig. 2). The width was labelled on the axial T2-weighted images. These manual measurements defined the ground truth.
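The two length definitions can be made concrete with a short sketch. It assumes that keypoints are given as (x, y) pixel coordinates and that the in-plane pixel spacing in millimetres is available from the image header; the function names and the example coordinates are hypothetical and are not taken from the study.

```python
import math


def distance_mm(p, q, spacing_mm):
    """Euclidean distance between two (x, y) pixel coordinates, in millimetres."""
    dx = (q[0] - p[0]) * spacing_mm[0]
    dy = (q[1] - p[1]) * spacing_mm[1]
    return math.hypot(dx, dy)


def two_point_length(fundus, exocervix, spacing_mm):
    """Length along a single major axis defined by two points."""
    return distance_mm(fundus, exocervix, spacing_mm)


def three_point_length(fundus, isthmus, exocervix, spacing_mm):
    """Length along two segments joined at the cervix-isthmus junction."""
    return (distance_mm(fundus, isthmus, spacing_mm)
            + distance_mm(isthmus, exocervix, spacing_mm))


# Placeholder coordinates for a flexed uterus, with an assumed 1 mm x 1 mm pixel spacing
spacing = (1.0, 1.0)
fundus, isthmus, exocervix = (0, 0), (30, 40), (30, 70)
print(two_point_length(fundus, exocervix, spacing))
print(three_point_length(fundus, isthmus, exocervix, spacing))
```

For a flexed uterus the three-point length is longer than the two-point length, because the straight axis cuts across the bend.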

MODEL

The overall pipeline consists of two separate models applied in succession.

First, a box model locates the uterus.

Then, a keypoint model places the measurement points on the cropped version of the image by placing two points for each class. This is achieved by splitting each class into two subclasses. The point for each subclass is determined by its position (top-left or bottom-right). For a length composed of three points, the midpoint is given in a separate preprocessing step.

We used convolutional neural network (CNN) architectures with an encoder/decoder pattern.

The encoder network was composed of VGG-16 (box model) and VGG-11 (keypoint model).

It receives an image with a size of 224 × 224 pixels as input. Subsequently, it passes through five blocks which are separated by a max pooling layer that halves the height and width of the features (Fig. 3).
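The effect of the pooling layers on the spatial resolution can be followed with a few lines of arithmetic. The sketch below assumes the standard VGG layout, in which each of the five blocks is followed by one 2 × 2 max pooling layer; the source only states that the blocks are separated by pooling layers, so the number of pooling steps is an assumption.

```python
def feature_map_sizes(input_size=224, n_pools=5):
    """Side length of the feature map after each halving max pooling layer."""
    sizes = [input_size]
    for _ in range(n_pools):
        sizes.append(sizes[-1] // 2)  # each pooling layer halves height and width
    return sizes


print(feature_map_sizes())  # [224, 112, 56, 28, 14, 7]
```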

Our model’s decoders are single-instance boxes and keypoint detectors, which produce one box and one keypoint instance for each output class, respectively.

The input of the head is a four-dimensional vector of shape. The outputs are four values per class [x,y,w,h] for box position and two values per class [x, y] for keypoint position.
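Because the keypoint model works on the cropped image, its [x, y] outputs have to be mapped back to the full image before a distance can be measured. The following sketch shows one way to do this. It assumes that the box [x, y, w, h] gives the top-left corner and size of the crop in full-image pixels and that keypoints are expressed in pixels of the 224 × 224 resized crop; both conventions, and the function name `crop_to_full`, are assumptions and are not specified in the source.

```python
def crop_to_full(keypoint, box, crop_size=224):
    """Map a keypoint from resized-crop pixels back to full-image pixels."""
    kx, ky = keypoint
    bx, by, bw, bh = box  # assumed: top-left corner (bx, by), width bw, height bh
    return (bx + kx * bw / crop_size, by + ky * bh / crop_size)


# A keypoint in the crop, mapped back through a placeholder box
print(crop_to_full((112, 56), (100, 50, 448, 224)))  # (324.0, 106.0)
```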

TRAINING

We applied data augmentation techniques such as vertical and horizontal flipping, random rotation by a multiple of 90°, and translation of up to 0.1. For the keypoint model, we applied additional techniques, such as changing the brightness and contrast, blurring the image, adding Gaussian noise, or inverting the image’s colours.
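Geometric augmentations must be applied to the annotations as well as to the pixels. The sketch below illustrates this for a horizontal flip and a 90° counter-clockwise rotation using NumPy. It assumes keypoints are stored as (x, y) pixel indices; the helper names are hypothetical and the code is not the augmentation pipeline used in the study.

```python
import numpy as np


def hflip(image, keypoints):
    """Flip an image left-right and move its (x, y) keypoints accordingly."""
    _, w = image.shape[:2]
    return np.fliplr(image), [(w - 1 - x, y) for x, y in keypoints]


def rot90_ccw(image, keypoints):
    """Rotate an image 90 degrees counter-clockwise together with its keypoints."""
    _, w = image.shape[:2]
    return np.rot90(image), [(y, w - 1 - x) for x, y in keypoints]


# Mark one pixel so the transformed keypoint can be checked against the image
image = np.zeros((4, 6), dtype=np.uint8)   # height 4, width 6
keypoints = [(1, 2)]                       # x = 1, y = 2
image[2, 1] = 1

flipped, flipped_pts = hflip(image, keypoints)
rotated, rotated_pts = rot90_ccw(image, keypoints)
print(flipped_pts, rotated_pts)
```

After a flip or rotation, the point that was top-left may no longer be top-left, so subclass labels based on position may need to be reassigned in such a pipeline.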

The box model was run for 100 epochs (220,783 iterations), and the keypoint model was run for 250 epochs (40,320 iterations).

The Adam optimiser was used with a learning rate of 0.0001 to optimise the weights of the model.

STATISTICS

Model training

We tracked the model’s losses and calculated the intersection over union (IoU) to evaluate the accuracy of box positions, and the objective keypoint similarity (OKS) for each keypoint class. The OKS quantifies how close the predicted keypoint location is to the ground-truth keypoint: the closer the prediction is to the ground truth, the closer the OKS is to 1. Above 0.80, the model is considered very good. OKS is calculated as follows:

\(\mathrm{OKS}=\exp\left(-\frac{d^{2}}{2s^{2}k^{2}}\right)\)

where d is the distance between the ground truth keypoint and predicted keypoint, s is the area of the bounding box divided by the total image area, and k is the per-keypoint constant that controls the fall off.
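Both metrics can be computed in a few lines. The sketch below implements the OKS formula as written above and a standard IoU for boxes in [x, y, w, h] format, where (x, y) is assumed to be the top-left corner. The values of s and k in the example are placeholders chosen for illustration; the constants used in the study are not reported.

```python
import math


def oks(pred, truth, s, k):
    """OKS = exp(-d^2 / (2 s^2 k^2)) for one predicted and one ground-truth keypoint."""
    d2 = (pred[0] - truth[0]) ** 2 + (pred[1] - truth[1]) ** 2
    return math.exp(-d2 / (2 * s ** 2 * k ** 2))


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Placeholder scale and fall-off constants
print(oks((10, 10), (10, 10), s=1.0, k=5.0))   # identical points give 1.0
print(oks((13, 14), (10, 10), s=1.0, k=5.0))   # 5 pixels apart
print(iou((0, 0, 2, 2), (1, 1, 2, 2)))         # partial overlap
```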

Model testing

Four experts with experience in genital imaging (EP, GR, BH, and LD with 12, 2, and 1 years of experience, respectively) manually measured the size of the uterus in three dimensions on every MR image of the test set. The radiologists were blinded to the measurements made by others and to the machine. The same viewer (ViewerCleverdoc1.9.0) was used.

We calculated the absolute average difference (in millimetres) between the measurements of one dataset and those of another. The coefficient of determination R² was used to reflect how well the AI measurements matched those of the radiologists. If R² equals 1, the algorithm obtains strictly identical measurements to those of the radiologists.
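The two agreement statistics can be sketched as follows. The code assumes that R² is defined as 1 minus the ratio of the residual to the total sum of squares, with the radiologists' average taken as the reference; the source does not state which definition was used (the squared Pearson correlation is another common choice). The measurement values are invented for illustration only.

```python
import numpy as np


def mean_absolute_difference(a, b):
    """Average absolute difference between two series of measurements (same units)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean(np.abs(a - b)))


def r_squared(reference, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((reference - predicted) ** 2)
    ss_tot = np.sum((reference - reference.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Invented lengths in millimetres, not study data
radiologists = [70.0, 80.0, 90.0]
algorithm = [72.0, 79.0, 93.0]
print(mean_absolute_difference(radiologists, algorithm))
print(r_squared(radiologists, algorithm))
```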

The Cleverdoc V1.9.0 platform (Lille, France) was used for the statistical analyses.

A total of 845 MRI scans were collected. Forty-five patients were excluded: 37 because of myomas, 6 because of pregnancies, 3 because of a highly deviated uterus, and 2 because of poor image quality. Patient characteristics are shown in Table 2.

Table 2

Characteristics of the patients whose data was included for training, validation, and testing of the model
	Training and validation set (n = 800)	Test set (n = 100)
Age in years, median (interquartile range)	45 (33–58)	47 (34–56)
Vaginal gel marking: no	436 [55%]	60 [60%]
Vaginal gel marking: yes	364 [45%]	40 [40%]
Uterus position: anteflexed	704 [88%]	93 [93%]
Uterus position: retroflexed	96 [12%]	7 [7%]
MRI without pelvic pathology	177 [22%]	26 [26%]
Subperitoneal endometriosis	123 [15%]	13 [13%]
Adenomyosis	116 [14%]	12 [12%]
Myomas (FIGO 0 – V)	124 [15%]	19 [19%]
Cervical cancer	23 [3%]	2 [2%]
Endometrial pathology	75 [9%]	10 [10%]
Ovarian pathology	165 [21%]	16 [16%]
Hysterectomy	50 [6%]	-
Uterine malformation	7 [0.9%]	1 [1%]
Other pathology (static disorder, non-gynaecological pathology …)	82 [10%]	13 [13%]

From the 800 included patients for training and validation, 4,800 sets were obtained (three consecutive slices centred on the uterus for each sagittal and axial sequence).

An additional external cohort of 100 MR images was used for the model testing (Fig. 4).

Validation performance (initial dataset)

During the validation phase, the algorithm was able to locate the uterus and the measurement keypoints with excellent accuracy.

With the measurement by DM as ground truth, the mean OKS was 0.92, ranging from 0.90 to 0.94 (Table 3). The OKS was calculated using the cropped images.

Table 3

Objective keypoint similarity (OKS) values of the algorithm for each measurement keypoint, with the measurement by DM as ground truth.
Key point	Length2 topleft (L1)	Length2 bottomright (L2)	Length2 middle (L3)	Length1 topleft (L4)	Length1 bottomright (L5)	Width topleft (W1)	Width bottomright (W2)	Thickness topleft (T1)	Thickness bottomright (T2)	Average (av)
OKS	0.92	0.90	0.94	0.90	0.90	0.94	0.93	0.92	0.93	0.92

Test performance (external dataset)

We observed an improvement in the accuracy of our model when we switched to a new, previously unseen cohort for testing. With the average of the radiologists’ measurements as ground truth, the mean OKS of our DL tool was 0.96, ranging from 0.95 to 0.98, as reported in Table 4. The OKS was calculated using full-size images.

Table 4

Objective keypoint similarity (OKS) values of the algorithm and of each radiologist, for each measurement keypoint, with the average of measurements by radiologists as ground truth.
Key point	Length2 Topleft (L1)	Length2 Bottom right (L2)	Length2 middle (L3)	Length1 Topleft (L4)	Length1 Bottom right (L5)	Width Topleft (W1)	Width Bottom right (W2)	Thickness Topleft (T1)	Thickness Bottomright (T2)	Average (av)
GR	0.96	0.96	0.98	0.95	0.95	0.99	0.98	0.94	0.94	0.96
EP	0.97	0.97	0.98	0.96	0.95	0.99	0.98	0.95	0.95	0.97
LD	0.96	0.96	0.97	0.97	0.96	0.99	0.98	0.95	0.95	0.97
BH	0.96	0.96	0.97	0.96	0.95	0.98	0.97	0.95	0.95	0.96
AI	0.95	0.95	0.97	0.96	0.95	0.97	0.98	0.95	0.94	0.96

Regarding the execution speed, it took less than 5 min for the model to extract all measurements, that is, one three-dimensional measurement in approximately 1.6 s. In comparison, the average time for a manual measurement by a radiologist was clocked at 37.89 s.

Correlation Between Manual And Automated Measurements

Across the 100 MR images of the test set, the average deviation between AI measurements and radiologists’ measurements was 3.6 mm (± 6.6 standard deviation [SD]). By dimension, the deviation was 3.9 mm for two-point length, 3.7 mm for three-point length, 2.6 mm for width, and 4.2 mm for thickness (Fig. 5, Table 5).

Table 5

Statistics of uterine dimension measurements by the radiologists (ground truth) and by the algorithm (AI)
	Minimum (mm)		Maximum (mm)		Median (mm)		Average (mm)		Standard Deviation (SD)
	Ground truth	AI	Ground truth	AI	Ground truth	AI	Ground truth	AI	Ground truth	AI
Length 1 (2 points)	44.73	1.59	123.19	127.94	76.46	76.93	79.95	78.09	17.13	19.28
Length 2 (3 points)	42.14	46.04	123.96	119.75	77.23	74.69	79.43	76.19	16.43	15.46
Thickness	19.78	6.3	73.56	74.08	39.23	36.90	39.89	36.23	11.13	11.95
Width	32.16	35.95	93.65	90.98	52.72	52.58	54.42	54.60	12.35	11.06

The R² coefficients of determination between the algorithm’s measurements and the average of the radiologists’ measurements were 0.93 for two-points length, 0.94 for three-points length, 0.93 for width, and 0.75 for thickness, as shown in Fig. 6.

Variability Between Measurements By Radiologists

The mean difference in measurements between all radiologists was 1.4 mm: 1.27 mm for two-points length, 2.2 mm for three-points length, 1.14 mm for width, and 0.93 mm for thickness (Table 6).

Table 6

Average Deviation (AD) in millimeters between all radiologists
Measures	Length 2 (3 points)	Length 1 (2 points)	Width	Thickness
EP	2.11	1.37	0.97	0.93
LD	2.12	1.2	1.12	0.92
BH	2.23	1.42	1.31	1.08
GR	2.36	1.11	1.15	0.8
Average	2.2	1.27	1.14	0.93
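The averages reported for inter-radiologist variability can be recomputed directly from the per-radiologist values in Table 6. The short check below does so; the column labels follow the order given in the text (three-point length, two-point length, width, thickness), and the variable names are chosen for this example only.

```python
# Per-radiologist average deviations in millimetres, copied from Table 6
table6 = {
    "EP": [2.11, 1.37, 0.97, 0.93],
    "LD": [2.12, 1.20, 1.12, 0.92],
    "BH": [2.23, 1.42, 1.31, 1.08],
    "GR": [2.36, 1.11, 1.15, 0.80],
}
columns = ["three-point length", "two-point length", "width", "thickness"]

# Mean of each column across the four radiologists
means = [sum(col) / len(col) for col in zip(*table6.values())]
# Overall inter-radiologist variability: mean of the four column means
overall = sum(means) / len(means)

for name, value in zip(columns, means):
    print(name, value)
print("overall", overall)
```

The column means agree with the reported 2.2, 1.27, 1.14, and 0.93 mm to within rounding, and the overall mean rounds to 1.4 mm.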

We performed a secondary analysis to highlight the distribution of the AI errors. Nineteen of the 200 images (100 axial + 100 sagittal) had an absolute deviation (averaged over all measurements on the image) slightly greater than 8 mm. This represents less than 10% of the images.

We successfully achieved our goal of developing an artificial intelligence algorithm able to locate the uterus in pelvic MR examinations, place measurement keypoints on it, and provide its three-dimensional measurement with satisfactory accuracy.

The OKS was close to 1, improving from 0.92 (validation) to 0.96 (test). This difference is explained by the fact that the OKS of the validation phase was calculated on the cropped images, whereas that of the test phase was calculated on the full-size images. The larger the image, the smaller the positioning error.

One of the strengths of our study is that our network was tested in an external cohort which did not have a selection bias applied, except for subserous myomas. This performance favours the generalisation of this model.

To the best of our knowledge, only one study on uterine segmentation has been conducted to date. Kurata et al. evaluated a U-net architecture to contour the uterus on MR images [10]. They reached an average Dice similarity coefficient (DSC, which can be compared to OKS) of 0.82. Their study included 122 patients with uterine disorders. Our model was optimised by using a substantially larger training database of 800 patients.

In parallel, for men, a wide range of studies have been carried out on the automatic segmentation of the prostate, with similar results. For example, Ushinsky et al. trained a customised hybrid U-Net CNN architecture on manually segmented MR images and achieved a DSC of 0.898 [11].

However, it is more complex for an AI tool to locate and segment the uterus than the prostate, because the uterus can have different positions, bends, or shapes. Moreover, it is surrounded by many structures (colon, bladder, ovaries).

Another highlight of our study is that our training dataset was strengthened by the clinical heterogeneity of its cases, both in terms of pathological conditions and patient preparation. It included cervical cancer, endometriosis, and rectal vaginal opacification. This suggests that the performance of our CNN would be robust in prospective clinical settings.

Most studies on automated segmentation have used volumetric models or U-Net architectures. In contrast, our network’s performance was achieved with the VGG-11/16 architectures. This model is more suitable for distance measurements, because it is specifically designed to locate an organ and place measurement points on it. To do so, our pipeline operates using two different models.

The average deviation between the AI measurements and those of the radiologists was 3.6 mm (± 6.6 SD), while the inter-radiologist variability was 1.4 mm. However, the R² coefficient was approaching 0.94 for lengths and width, meaning the coherence remained extremely strong between the radiologists and AI. For thickness, however, the R² coefficient was 0.75, owing to the algorithm being challenged by the junctional zone in rare cases.

The speed of our system is a major advantage over the time required for manual segmentation. In our experience, it takes a radiologist 37.89 seconds to measure a uterus in three dimensions, set against 1.6 seconds for the algorithm. Our VGGnet may increase the throughput.

Our algorithm can take over a basic task, thus freeing radiologists’ time for more demanding intellectual tasks.

Our study had a few limitations that should be acknowledged. First, this was a retrospective, monocentric study. The database was created using three MRI scanners (General Electric Healthcare, Valenciennes Hospital, France), so the generalisability to other centres or MRI equipment has not yet been established. Second, we included only images obtained using the same T2-weighted acquisition protocols. A comparative study of the performance of the algorithm across different MRI parameters or protocols could address this point.

We can easily imagine a clear application of our AI tool in daily practice. The measurements of the algorithm can be displayed on the image server or automatically added to reports. Subsequent studies are required to prospectively validate our network in a clinical setting. We could consider further studies using the same pipeline to measure endometrial thickness or ovarian dimensions.

AD	Average Deviation
AI	Artificial Intelligence
CNN	Convolutional Neural Network
DL	Deep Learning
DSC	Dice Similarity Coefficient
IoU	Intersection over Union
ML	Machine Learning
OKS	Objective Keypoint Similarity
SD	Standard Deviation

SUMMARY STATEMENT

Deep learning for three-dimensional measurement of the uterus on MRI is feasible and achieves performance similar to that of expert radiologists.

Ethical approval and consent to participate :

We confirm that all methods were carried out in accordance with relevant guidelines and regulations. This retrospective study of model creation was approved by the CERCI Independent Ethics Committee of Valenciennes Hospital under the reference CHV-2022-006. All patients were informed of the use of their medical data according to the legal framework imposed by CNIL MR-004. In addition, all data were pseudonymized beforehand.

Informed consent was obtained from all subjects and/or their legal guardian(s).

Consent for publication :

Not Applicable.

Availability of data and materials :

Competing interest :

We have no conflicts of interest to disclose.

Funding :

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Authors’ contributions :

Daphné Mulliez wrote the main manuscript text and prepared all figures and tables. Edouard Poncelet, Guillaume Ramette, Lan Anh Dang and Blandine Hamet participated in the external validation phase (test). Laurie Ferret contributed to the statistics section.

All authors reviewed the manuscript and consented for publication.

Acknowledgements :

We would like to warmly thank Theo Hiraclides and Kevin Delplanque for their involvement.

Goldstuck N. Assessment of uterine cavity size and shape: a systematic review addressing relevance to intrauterine procedures and events. Afr J Reprod Health. Sep 2012;16(3):130-9.
Ludwin A, Martins WP. Correct measurement of uterine fundal internal indentation depth and angle: an important but overlooked issue for precise diagnosis of uterine anomalies. Ultrasound Obstet Gynecol. 2021;58(3):497-9. doi:10.1002/uog.22192
Brouwer CL, Steenbakkers RJHM, van den Heuvel E, Duppen JC, Navran A, Bijl HP, et al. 3D Variation in delineation of head and neck organs at risk. Radiat Oncol Lond Engl. Mar 2012;7:32. doi:10.1186/1748-717X-7-32
Cardenas CE, Yang J, Anderson BM, Court LE, Brock KB. Advances in Auto-Segmentation. Semin Radiat Oncol. Jul 2019;29(3):185-97. doi:10.1016/j.semradonc.2019.02.001
Kalantar R, Lin G, Winfield JM, Messiou C, Lalondrelle S, Blackledge MD, et al. Automatic Segmentation of Pelvic Cancers Using Deep Learning: State-of-the-Art Approaches and Challenges. Diagn Basel Switz. Oct 2021;11(11):1964. doi:10.3390/diagnostics11111964
Han S, Hwang SI, Lee HJ. The Classification of Renal Cancer in 3-Phase CT Images Using a Deep Learning Method. J Digit Imaging. Aug 2019;32(4):638-43. doi:10.1007/s10278-019-00230-2
Van Gastel MDA, Edwards ME, Torres VE, Erickson BJ, Gansevoort RT, Kline TL. Automatic Measurement of Kidney and Liver Volumes from MR Images of Patients Affected by Autosomal Dominant Polycystic Kidney Disease. J Am Soc Nephrol JASN. Aug 2019;30(8):1514-22. doi:10.1681/ASN.2018090902
Sforazzini F, Salome P, Moustafa M, Zhou C, Schwager C, Rein K, et al. Deep Learning–based Automatic Lung Segmentation on Multiresolution CT Scans from Healthy and Fibrotic Lungs in Mice. Radiol Artif Intell. Jan 2022;4(2):e210095. doi:10.1148/ryai.210095
Van Assen M, Muscogiuri G, Caruso D, Lee SJ, Laghi A, De Cecco CN. Artificial intelligence in cardiac radiology. Radiol Med (Torino). Nov 2020;125(11):1186-99. doi:10.1007/s11547-020-01277-w
Kurata Y, Nishio M, Kido A, Fujimoto K, Yakami M, Isoda H, et al. Automatic segmentation of the uterus on MRI using a convolutional neural network. Comput Biol Med. Nov 2019;114:103438. doi:10.1016/j.compbiomed.2019.103438
Ushinsky A, Bardis M, Glavis-Bloom J, Uchio E, Chantaduly C, Nguyentat M, et al. A 3D-2D Hybrid U-Net Convolutional Neural Network Approach to Prostate Organ Segmentation of Multiparametric MRI. Am J Roentgenol. Jan 2021;216(1):111-6. doi:10.2214/AJR.19.22168

No competing interests reported.
