How Accurate Is Semi-quantitative Elastography for Calculating Strain Ratios? A Validity and Reliability Study

Background: Elastography is a promising imaging technique for evaluating various musculoskeletal tissues. It uses acoustic radiation force pulse sequences to generate shear waves, which propagate perpendicular to the ultrasound beam and cause transient displacements from which the tissue's stiffness is calculated. Objective: The aims were to assess the validity, inter-examiner reliability and stability of semi-quantitative elastography for calculating strain ratios (SR) in a homogeneous gel phantom at different locations within the image. Methods: A diagnostic accuracy study was performed in a homogeneous stiffness phantom. Two examiners (one novice and one experienced) each performed 50 imaging captures in 5 series of 10 captures, with 1 minute between captures and 30 minutes between series. Each examiner assessed the SR in two locations. The difference between examiners, stability of measurements, SR error and absolute error, mean error of the measurements and coefficient of variation were calculated. Results: The agreement between examiners, validity and stability of measurements were higher in the central area than in the lateral limits of the images. The experience of the examiner was relevant for the concordance of the measurements in the lateral limits of the images (SR difference of 0.13 ± 0.02; p<0.001), but not in the central area (SR difference of 0.00 ± 0.01; p>0.05). Conclusion: Semi-quantitative elastography seems to be an accurate tool for assessing stiffness differences within the same image. Further validity and reliability studies in different materials and in-vivo tissues are needed to ensure the utility of this method.


Introduction
Ultrasound imaging (US) is a safe, portable, and low-cost imaging method for assessing soft tissues, including skeletal muscle and viscera, and is widely used by different specialties (e.g., physiotherapists, cardiologists, radiologists, hepatologists or gynecologists) [1]. In recent years, several technical reports have assessed the validity and/or reliability of different imaging procedures [2] and imaging methods [3,4].
Elastography is an imaging technology sensitive to tissue stiffness that has been further developed and refined in recent years to provide a quantitative assessment of tissue stiffness [5]. Although the first elastography method was "strain imaging" (which consists of manual compression of the tissue with the ultrasound transducer), "shear-wave imaging" is the most recent technology; it measures the physical tissue displacement generated by shear waves perpendicular to the direction of the generating force produced with the transducer [6]. Recent developments have made strain elastography more accurate by providing a real-time indicator bar that shows, on a 1-to-6 scale (where 1 is not appropriate and 6 is the most appropriate), the optimal pressure needed [7]. This semi-quantitative method provides relative stiffness differences (expressed as a percentage or "strain ratio") between two areas within the same image.
Prior evidence assessing SR accuracy with both strain and shear-wave elastography in calibrated phantoms showed greater accuracy for the shear-wave method [8,9]. Although shear-wave elastography, unlike strain elastography, does not depend on manual compression by the assessor, it is more expensive and less accessible. In addition, all US methods are susceptible to bias from shadowing, reverberation, clutter artifacts or operator experience [10].
Therefore, considering 1) the accessibility advantages of strain elastography and 2) the inconsistent semi-quantitative SR results reported across evaluated methods, systems and positions of the reference region of interest [8], an important step before semi-quantitative elastography can be correctly used and interpreted in research or clinical practice is establishing the validity, reliability and stability of its measurements.
Accordingly, our aims were to determine, for semi-quantitative elastography strain ratio (SR) calculation, 1) validity, 2) inter-examiner reliability and 3) stability of measurements, considering different locations within the images and the assessor's experience, by using a phantom with homogeneous stiffness under optimal conditions.

Material And Methods
This diagnostic accuracy study was conducted between September 2020 and November 2020 at a private university in Madrid (Spain) and comprised construct validity, stability of measurement and inter-examiner reliability analyses. This type of study focuses on judgement based on the accumulation of evidence using a specific measuring instrument (here, semi-quantitative elastography SR). The methodology requires three steps: to calculate construct validity, examining the relationship between the measure being evaluated and a known variable score related to the construct measured by the instrument (a homogeneous stiff gel phantom with known SR, SR = 1); to calculate stability of measurement, performing repeated measurements of the same target at different points in time; and to calculate inter-examiner reliability, analyzing the equivalence of ratings obtained by observers with different experience [11]. This study followed the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines and checklist [12]. No Ethics Committee approval was needed because the research involved neither animals nor humans.
Imaging Acquisition procedure
A SonoSite Blue Phantom™ (Sarasota, FL) Vascular Access BPO100 was placed on a rigid table. All images were acquired with one Alpinion eCube i8 (Gyeonggi-do, Ltd, Korea) with a 4-cm-wide linear transducer E8-PB-L3-12T (3-12 MHz). Room light, temperature, and all US settings were established under the same conditions for both examiners. Frequency was set to 12.0 MHz; gain to 55 dB; dynamic range to 85; brightness to 17; and depth to 4 cm. To ensure optimal sound wave incidence, the transducer was placed perpendicular to the surface of the phantom to obtain a long-axis image of the internal cylindrical structure, avoiding any inclination of this structure and capturing the lumen of the cylinder with the maximum amplitude throughout the image (Fig. 1a).
This US equipment shows a 0-to-6 scale bar in the upper left part of the image indicating the quality of the applied pressure, to optimize the elastography measurement. The transducer pressure was therefore carefully adjusted according to this scale to maintain optimal pressure during all image captures (Fig. 1b).
Two examiners participated in this procedure. One experienced examiner (10 years of practice in the use of US imaging) and one novice examiner (1 year of practice) performed the transducer placement, and each captured fifty images as described. Acquisitions were performed in 5 series of 10 captures, with 1 minute between captures and 30 minutes between series.

Measurement assessment procedure
Once the images were captured, all of them were assessed using the measurement tools of the US equipment to calculate the SR. Two different SRs were calculated as follows (Fig. 2):
1) Lateral limits: First, we used the caliper to measure 1 cm from the top right corner of the image to the left. Then we used the area selector tool to contour one rectangle with a 1-cm width and a height equal to the distance between the most superficial limit of the phantom and the most superficial limit of the cylindrical structure. Finally, another rectangle with the same shape and area was placed at the top left corner of the image to obtain the SR.
2) Central area: Within the central 2 cm that were not included in the previous measurement, we first divided the distance between the surface of the phantom and the upper limit of the cylindrical structure by 2. Second, we contoured a rectangle with a 2-cm width and a height equal to the upper half of the distance previously calculated. Finally, the SR between the upper rectangle and the lower rectangle was calculated.
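The geometry of the two region-of-interest (ROI) pairs described above can be sketched as follows. This is only an illustration under stated assumptions: the `Rect` class, the function names and the 1.5 cm surface-to-cylinder distance are hypothetical, the coordinate origin is assumed to be the top left corner of the image with the phantom surface at depth 0, and the lower central rectangle is assumed to occupy the lower half of the measured distance. The SR itself is computed by the scanner, not by this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned ROI in image coordinates (cm); hypothetical helper type."""
    x_cm: float       # left edge, measured from the left border of the image
    y_cm: float       # top edge, measured from the phantom surface (assumed at 0)
    width_cm: float
    height_cm: float

    @property
    def area_cm2(self) -> float:
        return self.width_cm * self.height_cm


def lateral_rois(image_width_cm, surface_to_cylinder_cm, roi_width_cm=1.0):
    """Two equal 1-cm-wide rectangles at the top left and top right corners."""
    left = Rect(0.0, 0.0, roi_width_cm, surface_to_cylinder_cm)
    right = Rect(image_width_cm - roi_width_cm, 0.0,
                 roi_width_cm, surface_to_cylinder_cm)
    return left, right


def central_rois(image_width_cm, surface_to_cylinder_cm, roi_width_cm=2.0):
    """Upper and lower halves of the central 2-cm-wide column."""
    half = surface_to_cylinder_cm / 2
    x = (image_width_cm - roi_width_cm) / 2   # centers the column horizontally
    upper = Rect(x, 0.0, roi_width_cm, half)
    lower = Rect(x, half, roi_width_cm, half)  # assumption: lower half of the distance
    return upper, lower


# 4 cm image width matches the transducer; 1.5 cm depth is a made-up example value
left, right = lateral_rois(4.0, 1.5)
upper, lower = central_rois(4.0, 1.5)
```

With a 4 cm wide image, the lateral rectangles cover the outer 1 cm on each side and the central pair covers the remaining 2 cm, so the two measurements do not overlap.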
All the measurements were performed by the same rater, who had 10 years of experience. Every image was coded with an alphanumerical code and presented in a randomized order to blind the rater.
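One possible way to implement this kind of blinding is sketched below. The source does not describe how the codes were generated, so the function name `blind_codes`, the six-character code length, the seed and the image identifiers are all hypothetical choices made for illustration.

```python
import random
import string


def blind_codes(image_ids, seed=None, code_length=6):
    """Map unique random alphanumeric codes to image ids in shuffled order.

    The returned dict (code -> original id) is the unblinding key and
    should be kept away from the rater; its key order is the reading order.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_uppercase + string.digits
    codes = []
    while len(codes) < len(image_ids):
        candidate = "".join(rng.choices(alphabet, k=code_length))
        if candidate not in codes:      # guard against the rare duplicate code
            codes.append(candidate)
    shuffled = list(image_ids)
    rng.shuffle(shuffled)               # randomize the presentation order
    return dict(zip(codes, shuffled))


# Hypothetical identifiers: examiner label plus capture number
ids = [f"{examiner}-{n:02d}" for examiner in ("EXP", "NOV") for n in range(1, 51)]
key = blind_codes(ids, seed=2020)
```

Fixing the seed makes the allocation reproducible for auditing; leaving it as `None` draws a fresh allocation each run.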

Statistical Analysis
Data analysis was conducted with the Statistical Package for the Social Sciences (SPSS) Version 21 for Mac OS. Normal distribution of the SR data was verified with the Shapiro-Wilk test. Inter-examiner reliability of SR calculation was assessed, taking examiner experience into account, by calculating the mean of the measurements with its upper and lower limits (95% CI) and the mean difference between examiners (DBE = SR scored by the experienced examiner - SR scored by the novice examiner). Stability of the measurements was calculated as SoM% = standard deviation of the error × 100. Validity was also assessed taking examiner experience into account, by calculating the mean error (E = known SR - SR obtained by the examiner = 1 - SR obtained by the examiner), the mean absolute error (AE = absolute value of E), the mean error of the measurements (MEM = (mean AE of the experienced examiner + mean AE of the novice examiner) / 2), the mean percent error (PE% = AE / mean error of the measurements × 100), and the mean coefficient of variation (CV% = standard deviation / mean). All analyses were performed for the SR calculated in both the lateral limits and the center of the images. Student's t-tests for independent samples were used to determine differences between examiners, between location areas and with respect to the reference. All tests were two-tailed, with p-values < 0.05 considered significant.
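The definitions above can be made concrete with a short sketch. The study used SPSS; the Python code below is only an illustration using the standard library, and the function names, the use of the sample standard deviation and the example readings are assumptions, not the authors' procedure or data. The t statistic shown is the classic pooled-variance form for independent samples; obtaining a p-value would additionally require the t distribution, which is omitted here.

```python
from math import sqrt
from statistics import mean, stdev, variance

KNOWN_SR = 1.0  # homogeneous phantom: the reference strain ratio is 1


def examiner_metrics(sr_values, known_sr=KNOWN_SR):
    """Summarize one examiner's SR readings for one image location."""
    errors = [known_sr - sr for sr in sr_values]   # E = known SR - obtained SR
    abs_errors = [abs(e) for e in errors]          # AE = |E|
    return {
        "mean_sr": mean(sr_values),
        "mean_error": mean(errors),
        "mean_abs_error": mean(abs_errors),
        # SoM% = SD of the error * 100 (sample SD is an assumption)
        "som_percent": stdev(errors) * 100,
        # CV = SD / mean, kept as a fraction; multiply by 100 for a percentage
        "cv": stdev(sr_values) / mean(sr_values),
    }


def compare_examiners(experienced, novice):
    """Between-examiner summary: DBE, MEM and a pooled-variance t statistic."""
    exp_m = examiner_metrics(experienced)
    nov_m = examiner_metrics(novice)
    n1, n2 = len(experienced), len(novice)
    pooled_var = ((n1 - 1) * variance(experienced)
                  + (n2 - 1) * variance(novice)) / (n1 + n2 - 2)
    dbe = exp_m["mean_sr"] - nov_m["mean_sr"]      # DBE = experienced - novice
    return {
        "dbe": dbe,
        "mem": (exp_m["mean_abs_error"] + nov_m["mean_abs_error"]) / 2,
        "t_stat": dbe / sqrt(pooled_var * (1 / n1 + 1 / n2)),
        "df": n1 + n2 - 2,
    }


# Hypothetical readings for illustration only, not data from the study
experienced_sr = [1.02, 0.98, 1.05, 0.95, 1.00]
novice_sr = [1.10, 0.90, 1.08, 0.94, 0.98]
summary = compare_examiners(experienced_sr, novice_sr)
print(summary)
```

Running the same two functions once on the lateral-limit readings and once on the central-area readings would give the per-location values that the analysis compares.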

Results
A total of 100 images were captured and included for analysis, 50 by the experienced examiner and 50 by the novice examiner. From these 100 images, 100 measurements were performed in the lateral limits and 100 in the central area of the images. Table 1 shows inter-examiner reliability and instrument stability data of SR calculations. In general, the agreement between examiners and the stability of measurements were higher in the central area than in the lateral limits of the images (the differences between examiners were 0.12 ± 0.03; p < 0.001 in the lateral limits and 0.00 ± 0.01; p > 0.05 in the central area). Thus, the experience of the examiner was relevant for the concordance of the measurements in the lateral limits of the images (SR difference of 0.1 ± 0.02; p < 0.001), but not in the central area (SR difference of 0.00 ± 0.01; p > 0.05).
Validity data are shown in Table 2. In general, the mean error of measurements was significantly higher (p < 0.001) in the lateral limits (0.13 ± 0.07) than in the central area (0.05 ± 0.03) of the images. SR calculations in the central area and in the lateral limits showed no statistically significant differences from the known reference (p > 0.05). The experience of the examiner showed no statistically significant differences for the error or the absolute error of the measurements (p > 0.05).

Discussion
This study assessed the reliability, stability of measurements and validity of semi-quantitative elastography SR calculation, considering the experience of the examiner, under optimal conditions (controlling the 90° transducer position and the pressure by using a cylindrical vessel-mimicking structure as reference). In general, the reliability, stability of measurements and validity of semi-quantitative elastography SR calculation were acceptable. Elastography could be one of the most important technological breakthroughs in the field of ultrasound imaging since the development of Doppler imaging or panoramic US, while retaining the main advantages of US over other imaging techniques (e.g., low cost, short examination time, noninvasiveness, and accessibility) [13].
A previous phantom study conducted by Franchi-Abella et al. [8] reported that qualitative, semi-quantitative and quantitative data collected with strain and shear-wave elastography can correctly classify targets as harder or softer than the background. However, SRs were more accurate with shear-wave elastography than with strain elastography. In line with this study, our results showed the central areas of the images to be more accurate, reliable and stable than the lateral areas for calculating SR. One likely explanation for this phenomenon is a higher number of crossing sound waves in the middle of the transducer, which would improve the quality of both the greyscale and the elastography images.
There is also evidence supporting better SR accuracy and CVs (CV = 0.08-0.65 for SR ranging 1.57-2.47) when the stiffness difference between a target and a control point is large, whereas CVs are higher if the difference is small (CV = 1.22-1.7 for SR ranging 0.40-0.60) [8]. This could be explained by the fact that SRs are relative: if the difference between two points is small, the equipment may not be sensitive enough to detect it, whereas larger differences are more easily detectable. In this study, our phantom was homogeneous and the known value was SR = 1. Our results showed smaller CVs in the center of the image than in the lateral areas (0.05-0.06 and 0.14-0.15, respectively). Thus, experience does not seem to be a limitation for obtaining reproducible and valid measurements in the central areas of the image.

Clinical implications
This study could be useful for future work that uses SR calculation to develop specific protocols for musculoskeletal tissues, in both research and clinical practice, with semi-quantitative methodologies. The musculoskeletal tissue most often assessed with elastography is probably the tendon, based on the hypothesis of altered stiffness in the presence of tendon injury [20]. Although a previous study reported patterns in healthy tendon elastography, describing them as a uniformly firm structure or as heterogeneous tissue with interwoven longitudinal or spindle-shaped soft tissue strands [21], it remains controversial how to interpret changes in B-mode US with no abnormalities in elastography, or changes in elastography with an unaltered B-mode image [22].
In addition, elastography imaging has been used to assess many muscle pathologies, including muscular dystrophy [23] and myositis [24], as well as stiffness differences after exercise [25] or between patients and controls [26]. Myofascial trigger points (MTrPs) have also been assessed with different US imaging methods, since the manual identification of MTrPs shows poor reliability [27] and imaging techniques are needed to visualize MTrPs, the twitch response and changes in stiffness. Previous studies using B-mode US consistently describe MTrPs as hypoechoic regions [28], and Doppler US has revealed a specific pulsatility index response in MTrPs [29]. Furthermore, one study conducted by Jafari et al. [30] assessed MTrP stiffness with elastography imaging to distinguish MTrPs quantitatively from normal tissue, obtaining significant differences between MTrPs and the normal part of the muscle. The methodology assessed in this study could be applied to calculate the SR between an MTrP and a control point and between two control points (since it showed good validity and reliability), its correlation with pressure pain thresholds, and SR changes after treatment.
As in tendons, a previous study reported poor reproducibility of elastography in skeletal muscles [31], probably due to a non-standardized contraction/relaxation state of the muscle or to image anisotropy.

Limitations
This study has some limitations. First, it was performed on an artificial material under optimal conditions. We do not know whether similar results would be observed in real subjects (e.g., because of image anisotropy due to the round morphology of muscle bellies or vessels, or a non-standardized state of the muscle and subject). Second, just one homogeneous material was used. Evaluation in more materials with known stiffness or strain ratios is needed to confirm our findings. Finally, further research is needed comparing these semi-quantitative analyses with quantitative results expressed in metric units.

Conclusion
We found that semi-quantitative elastography SR calculation shows acceptable inter-examiner reliability, validity and stability of measurement. Reliability, validity and stability were similar regardless of the examiner's experience with ultrasound and transducer use. However, central areas seem to be more reliable and accurate than the lateral limits of the image. This paper proposes technical considerations for future studies assessing SRs of tendons, skeletal muscles or MTrPs.

Strain ratio (SR) calculation of lateral and central areas