We investigated the intra- and inter-observer reliability and variation of the freehand ROI method in nine different regions of the brain in a sample of 40 healthy adults.
The SNR measurements showed that the image quality was sufficient for reliable quantitative measurements. In general, the SNR of the b = 0 s/mm² images should be at least 20 in order to derive reliable FA values (36). In our study, the SNR was well above 20 in all regions, and the measured SNR values were comparable to those of other studies (46, 47).
A limitation of this study was that the commercial program did not include eddy current and subject motion corrections. In addition, the spatial resolution was somewhat lower than that of modern imaging.
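As an illustration of the SNR threshold discussed above, the following is a minimal sketch under one common convention: SNR taken as the mean signal in a tissue ROI divided by the standard deviation of a background (air) region on the b = 0 image. The text does not state which SNR definition was used, so the function name `roi_snr`, the threshold constant, and all intensity values are hypothetical.

```python
from statistics import mean, stdev

# Assumed convention (not stated in the text): SNR = mean ROI signal / SD of background noise.
MIN_SNR_FOR_FA = 20  # threshold quoted in the text for b = 0 images

def roi_snr(roi_signal, background_noise):
    """Return the SNR of a tissue ROI relative to a background noise region."""
    return mean(roi_signal) / stdev(background_noise)

# Hypothetical voxel intensities, for illustration only.
tissue = [400, 410, 390, 405]
background = [10, 14, 8, 12]

snr = roi_snr(tissue, background)
print(f"SNR = {snr:.1f}, sufficient for FA: {snr >= MIN_SNR_FOR_FA}")
```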
FA values are considered to reflect the integrity of the white matter. Although FA is not in itself a specific parameter in a diagnostic sense, it provides indirect information about myelination, fiber packing density, and fiber orientation (48). It is well known that FA values vary widely at different anatomic levels of the brain (12, 13, 40, 45, 49). Specifically, Lee et al (2009) reported that regional FA values varied from 0.21 in deep gray matter (putamen) to 0.81 in tightly packed parallel white matter tract bundles, such as the genu of the corpus callosum (12). The corresponding results in this study were 0.32 for deep gray matter (thalamus) and 0.86 for the genu of the corpus callosum. Regions with coherently oriented fibers, such as the cerebral peduncle, internal capsule, and corpus callosum, exhibited higher anisotropy than regions with less coherence, such as the centrum semiovale and other subcortical regions (50). Because the regional variability of FA is in general very large, possible anatomical mismatch should be taken into account in inter-observer and intergroup comparisons (49). By comparison, less regional variation is found in ADC values (13). In our study, the mean ADC values varied from 0.7 to 0.8 × 10⁻³ mm²/s, and in other similar studies the variation was 0.7 to 0.9 × 10⁻³ mm²/s (45, 51–53). In the frontobasal area, the FA and AD values were lower and the ADC and RD values were higher than in other WM regions. The FA values were in line with a tractography study by Deng et al (2018), with a mean FA value of 0.41 (profile 0.3 to 0.52) in the uncinate fasciculus and 0.54 (profile 0.40 to 0.68) in the forceps minor (54). In our study, the FA values were 0.57 and 0.51, respectively. The results of Lieberman et al (2014) were also very close to ours in the uncinate fasciculus (55). The FA and ADC values were almost identical to those found in our previous study (30 subjects) in most of the regions (40).
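For reference, the four parameters compared above are conventionally derived from the eigenvalues of the diffusion tensor, ordered so that λ1 ≥ λ2 ≥ λ3. The text does not give these formulas, so the block below shows the standard definitions, assuming that ADC here denotes the mean diffusivity.

```latex
\begin{align}
\mathrm{ADC} &= \frac{\lambda_1 + \lambda_2 + \lambda_3}{3}, \\
\mathrm{AD}  &= \lambda_1, \\
\mathrm{RD}  &= \frac{\lambda_2 + \lambda_3}{2}, \\
\mathrm{FA}  &= \sqrt{\frac{3}{2}}\,
  \sqrt{\frac{(\lambda_1 - \mathrm{ADC})^2 + (\lambda_2 - \mathrm{ADC})^2 + (\lambda_3 - \mathrm{ADC})^2}
             {\lambda_1^2 + \lambda_2^2 + \lambda_3^2}}
\end{align}
```

Under these definitions, a region with tightly packed parallel fibers has small λ2 and λ3 relative to λ1, which gives a low RD and a high FA, as observed in the corpus callosum.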
The biggest difference between our present and previous studies was found in the genu of the corpus callosum (14%). In this region, measurements were previously made on sagittal images (40) rather than on axial images, as in the present study.
Asymmetry between the hemispheres was found in some of the regions. In the pyramidal tracts, such as the posterior limb of the capsula interna and corona radiata, the FA values were higher and the ADC and RD values lower in the left hemisphere. The present results are in good agreement with previous studies (13, 40, 56). Some of the observed asymmetry in our study may be attributed to the handedness of the volunteers, because 39 of the 40 volunteers were right-handed. Corresponding hemispheric differences were obtained for right-handers in another study (56). In the case of the corona radiata, a phase artifact could also be one possible explanation. In this region, phase artifacts were present in 55% of cases in the left hemisphere but were not present at all in the right hemisphere. Fat misregistration raises the FA value locally and decreases the ADC value. An artifact can affect the ROI in its vicinity, even if the visible part of the artifact is cropped out. In the frontobasal regions, the FA values were found to be higher in the right hemisphere, which is in agreement with previous findings (40, 57). Jahanshad et al (2010) found that the frontal lobe variance in asymmetry is strongly due to genetic factors (57). In our study, higher FA values were usually found in the right hemisphere in the frontobasal area. Bonekamp et al (2007) reported that small hemispheric differences could be due to slight slice angulation (58). Therefore, keeping the same slice position and orientation in longitudinal studies is essential (49).
In terms of age-related changes, we found significant differences between the youngest age group (18–30 years) and the other age groups (31–40, 41–50, and 51–60 years). Specifically, in the youngest age group, the FA values were higher and the RD values lower in the frontobasal area in both hemispheres when compared to the other age groups. For FA, this result was already reported in our previous study (41). Other studies have also found changes in the frontal regions of the brain caused by aging (16, 17). In general, several studies have found a negative correlation between age and FA and a positive correlation between age and RD in white matter (21, 22, 59). These variations may be related to changes in myelination and axon density (17, 60).
In the present study, acceptable intra-observer variability (≤ 10%) was found in six out of nine regions for FA, while three regions had moderate but adequate variation. For ADC and AD, all regions had acceptable variability. In the RD results, seven out of nine regions had acceptable or moderate variation and two had high variation (genu and splenium of the corpus callosum). The percent variation of the RD values in the corpus callosum is naturally high, because the mean value is clearly lower than in the other regions. Low RD values are due to the fact that the fibers are tightly packed and parallel to each other. In this case, the percent variation was not a good indicator for assessing reliability. Overall, the variation results were in line with our previous study (40). It is noteworthy that the freehand method gives, on average, 4% lower variation in the pyramidal regions than the circle method (13, 41). In contrast, in our study, the freehand method gave a slightly higher variation in the corpus callosum than the circle method in previous studies (13, 41). This may also be due to the fact that in our study, ROIs were drawn on the axial image, whereas in previous studies they were drawn on the sagittal image (13, 41). Thus, in this particular region, it would be better to use the circle method on a sagittal image than the freehand method on an axial image. The inter-observer (n = 15) variability was acceptable or moderate in seven out of nine regions. The inter-observer variabilities are in line with our previous study (40).
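The point that a low mean value inflates the percent variation can be illustrated with a small sketch. It assumes that variation is expressed as a within-subject coefficient of variation (SD of the repeated readings divided by their mean); the text does not specify the exact formula, and the function name and all values below are hypothetical.

```python
from statistics import mean, stdev

def percent_variation(first, second):
    """Mean within-subject coefficient of variation (%) for two repeated readings."""
    cvs = []
    for a, b in zip(first, second):
        pair = (a, b)
        # SD of the two readings relative to their mean, as a percentage
        cvs.append(stdev(pair) / mean(pair) * 100.0)
    return mean(cvs)

# Hypothetical readings with the same absolute difference (0.04) between repeats.
high_mean_region = percent_variation([0.68], [0.72])  # mean 0.70
low_mean_region = percent_variation([0.33], [0.37])   # mean 0.35

print(f"{high_mean_region:.1f}% vs {low_mean_region:.1f}%")
```

With an identical absolute difference between repeats, halving the mean doubles the percent variation, which is why a region with very low RD can show high variation without being measured less consistently.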
Intra-observer repeatability was at a very good level according to the 95% limits of agreement. The results varied according to region: in tightly packed white matter tracts, such as the posterior limb of the capsula interna, the difference between the limits was small, whereas it was greater in areas containing crossing fibers, such as the corona radiata and centrum semiovale. Overall, the results were consistent with our previous research (40). The inter-observer agreement was lower than the intra-observer agreement in all regions, and others have reported similar results (13, 40, 61, 62). Inter-observer agreement results have commonly been about one-third lower than intra-observer results (61, 62).
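The 95% limits of agreement referred to above are conventionally obtained with the Bland-Altman approach, as the mean difference between paired measurements plus or minus 1.96 times the standard deviation of the differences. The sketch below assumes that convention; the function name and the FA values are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(first, second):
    """Return (lower, upper) 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)                # systematic difference between sessions
    half_width = 1.96 * stdev(diffs)  # spread of the differences
    return bias - half_width, bias + half_width

# Hypothetical repeated FA readings of one region in four subjects.
session_1 = [0.70, 0.72, 0.68, 0.71]
session_2 = [0.69, 0.73, 0.68, 0.70]

lower, upper = limits_of_agreement(session_1, session_2)
print(f"95% limits of agreement: {lower:.3f} to {upper:.3f}")
```

A narrow interval between the two limits corresponds to the small differences reported for tightly packed tracts, and a wide interval to the larger differences reported for crossing-fiber regions.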
The intra-observer reliability was high according to the average measures of the ICC analysis. Overall, the average ICC results were excellent for all four parameters. The repeatability result was also excellent (above 0.8) in eight out of nine regions for FA and in all regions for the ADC. The repeatability of the freehand method was significantly improved compared to our previous study (40). The average ICC increase was 0.4 (37%) in terms of the FA and ADC parameters. The higher ICC values were probably due to increased observer experience in selecting a slice and in avoiding artifacts and the partial volume effect of border areas. The single intra-observer ICC analysis was, on average, excellent in terms of the ADC and RD parameters and moderate in terms of the FA and AD parameters. The results showed excellent or moderate repeatability in seven out of nine regions for all DTI parameters. The region with the highest single ICC values was the forceps minor, with excellent reliability for each parameter. Good reliability was also found in the following regions: the uncinate fasciculus, thalamus, and the genu and splenium of the corpus callosum. High reliability in the corpus callosum is consistent with previous studies using the ROI method (45, 63, 64) and also with the TBSS method (38). Inadequate results (ICC < 0.69) were found in the cerebral peduncle (FA, ADC and RD) and centrum semiovale (AD). The reason for the inferior reliability of the cerebral peduncle was a susceptibility artifact, more specifically one caused by an air cavity. This artifact causes local changes in the parameter values. Although efforts were made to avoid distorted areas in the ROI, the effects of the artifact were also reflected in the surrounding areas. The low reliability of the AD values in the centrum semiovale can be explained by the multitude of crossing fibers in the subcortical white matter.
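The difference between the single and average ICC measures can be made concrete with a short sketch. The text does not state which ICC model was used, so this example assumes a two-way consistency model, often written ICC(3,1) for single measures and ICC(3,k) for average measures. The function name and the data are hypothetical.

```python
def icc_consistency(ratings):
    """Two-way consistency ICC; ratings has one row per subject, one column per repeat.

    Returns (single_measures_icc, average_measures_icc).
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of repeated measurements
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average

# Hypothetical FA readings: four subjects, two measurements each.
fa = [(0.70, 0.71), (0.60, 0.62), (0.80, 0.79), (0.65, 0.66)]
single, average = icc_consistency(fa)
print(f"single ICC = {single:.3f}, average ICC = {average:.3f}")
```

Whenever the single-measures ICC is positive, the average-measures ICC is higher, because averaging repeated readings reduces the error variance. This is consistent with the average results being reported as excellent while some single results are only moderate.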
Generally, the regions with high reliability and low variation share some common features. These regions have low anatomical variation and tightly packed fibers with a common orientation (65). They also often have a better SNR and fewer partial volume effects, and are less affected by “crossing” fibers. In addition, a larger ROI size increases the SNR value and improves the repeatability (65). When a larger ROI is used in a limited region, a larger percentage of voxels is likely to be shared between the two measurements than with a smaller ROI. The results of the repeated measurements are thus close to each other.
In future studies, larger samples of carefully collected normal DTI data with high spatial and angular resolution will be needed. In those studies, more subjects should be recruited for each age group in order to perform a reliable analysis of the effect of age. In addition, it would be interesting to study how much the reliability of the measurements improves when different methods, such as the ROI, tractography, and TBSS, are used simultaneously.