In this study, we found almost perfect inter-rater reliability of kidney measurements, kidney volumes, and HI among three raters using 3D semi-automated segmentation software in an 86-kidney sample. ICC estimates were higher for length than for width. This difference can be attributed to the poles being easier to identify than the hilar border during kidney segmentation, which results in greater variability in width measurements.
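For readers less familiar with the statistic, the ICC estimates discussed here can be illustrated with a minimal sketch. It assumes a two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The ICC model used in this study is not restated in this section, so that choice, the function name `icc_2_1`, and the toy ratings are illustrative assumptions rather than the study's analysis.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_subjects, n_raters) array, e.g. one row per kidney
    and one column per rater. Assumed ICC form; not taken from the study.
    """
    x = np.asarray(ratings, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2 or x.shape[1] < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    n, k = x.shape

    grand_mean = x.mean()
    subject_means = x.mean(axis=1)  # one mean per kidney
    rater_means = x.mean(axis=0)    # one mean per rater

    # Two-way ANOVA sums of squares without replication.
    ss_subjects = k * np.sum((subject_means - grand_mean) ** 2)
    ss_raters = n * np.sum((rater_means - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Toy ratings (not study data): 6 subjects scored by 4 raters.
toy_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(toy_ratings), 2))
```

Because this form penalizes systematic offsets between raters as well as random scatter, raters who consistently measure a dimension larger or smaller than their peers lower the estimate even when their rankings of the kidneys agree.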
High inter-rater reliability was also observed in parenchymal and collecting system volumes. This is important because both variables are among the criteria used to assess UTD, and the collecting system volume could signal obstructive uropathy, which has potential implications for kidney function [14].
When inter-rater reliability was analyzed among kidneys with UTD P3, HI was the only variable with a lower ICC estimate than the others. HI expresses the dilated collecting system volume relative to the capsular volume, and these volumes can be harder to delineate during segmentation in an enlarged kidney whose shape is distorted by dilation [7]. Nevertheless, the inter-rater reliability of HI was substantial in UTD P3 kidneys and almost perfect in non-UTD P3 kidneys, with higher estimates when the parenchyma was not affected and/or the dilation was less severe. Similarly, inter-rater reliability for HI was lower for kidneys with UTD P2 than for non-UTD P3 kidneys, yet higher than for UTD P3 kidneys. This observation supports the notion that reliability diminishes with increasing dilation grade.
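To make the role of the two segmented volumes concrete, the sketch below computes an index from a capsular volume and a collecting system volume. It assumes a volumetric analogue of the area-based hydronephrosis index, that is, the percentage of the capsular volume not occupied by the dilated collecting system. The exact formula implemented by the segmentation software is not given in this section, and the function name and example volumes are hypothetical.

```python
def hydronephrosis_index(capsular_volume_ml, collecting_system_volume_ml):
    """Assumed volumetric HI, in percent.

    HI = 100 * (capsular volume - collecting system volume) / capsular volume.
    This definition is an assumption for illustration, not the study's formula.
    """
    if capsular_volume_ml <= 0:
        raise ValueError("capsular volume must be positive")
    if not 0 <= collecting_system_volume_ml <= capsular_volume_ml:
        raise ValueError("collecting system volume must lie within the capsule")
    return (
        100.0
        * (capsular_volume_ml - collecting_system_volume_ml)
        / capsular_volume_ml
    )


# Hypothetical example: two raters agree on an 80 mL capsular volume but
# delineate the collecting system of the same kidney differently.
hi_rater_a = hydronephrosis_index(80.0, 20.0)
hi_rater_b = hydronephrosis_index(80.0, 24.0)
print(hi_rater_a, hi_rater_b)
```

Under this assumed definition, any disagreement on the collecting system boundary shifts the index directly, which may help explain why HI is the variable most affected when dilation distorts the kidney.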
3D US allows for volumetric evaluation and improved visualization of the collecting system. Coupled with the semi-automatic segmentation approach, it provides an objective, standardized, and quantitative method for categorizing patients with UTD [5]. We found a strong positive correlation between 2D and 3D kidney dimensions, supporting the reliability of 3D US for assessing kidney size in the context of UTD. These attributes underscore the value of semi-automatic 3D segmentation values as clinical parameters in the follow-up of patients with UTD. Given that US is a noninvasive, easily accessible, and radiation-free primary imaging modality [5], the high inter-rater agreement among users is a favorable finding for the clinical implementation of this tool.
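The comparison of 2D and 3D dimensions can be sketched as a correlation between paired measurements. The example assumes Pearson's coefficient, computed with NumPy's `corrcoef`. The correlation method used in the study is not restated in this section, and the paired lengths below are invented for illustration, not study data.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two paired 1D measurement series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be paired 1D series of length >= 2")
    # corrcoef returns the 2x2 correlation matrix; the off-diagonal is r.
    return float(np.corrcoef(x, y)[0, 1])


# Invented paired kidney lengths in cm (2D caliper vs. 3D segmentation).
length_2d_cm = [5.1, 6.3, 7.0, 8.2, 9.4]
length_3d_cm = [5.3, 6.2, 7.4, 8.1, 9.8]
print(round(pearson_r(length_2d_cm, length_3d_cm), 3))
```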
Various 3D kidney segmentation models have been proposed to analyze the renal parenchyma in kidneys with UTD [10, 11, 15]. These models have evolved towards semi-automatic segmentation as a solution to the drawbacks of manual segmentation, which is time-consuming, labor-intensive, and susceptible to inter-operator variability. Fully automatic methods also face challenges; for example, the diverse shapes of kidneys and variations in image intensity distributions can affect automatic segmentation [16]. Moreover, to make segmentation more accurate, semi-automatic tools address the variability in the direction of US wave propagation by assigning weights to the calculated values based on the reference point of orientation relative to the position of the transducer [17]. One of these models has been tested to evaluate the performance of the segmentation algorithm, yielding promising results for segmentation accuracy and HI estimation [10]. Nevertheless, the software’s reliability across multiple raters was not tested, making our study one of the first efforts to examine the consistency of semi-automatic segmentation software among raters [18].
This study has limitations. Raters were aware that their values were being assessed for reliability, which may have influenced their behavior and increased agreement. However, the quantitative nature of the assessments provided a higher level of objectivity.
In conclusion, our study demonstrates that semi-automated 3D US segmentation of pediatric kidneys provides reliable measurements of renal dimensions, parenchymal volumes, and HI across raters in kidneys with varying degrees of UTD. Identifying these reliable parameters could help determine when more invasive exams, such as MAG3 renography or MRU, are necessary to assess renal function. Additionally, this study establishes the potential for future comparisons between 3D US parenchymal volume measurements and those obtained from established imaging modalities (MAG3 renography and MRU).