Comparison of reliability of measurements among fetal facial profile parameters by operators with different levels of experience at 15-23 weeks of gestation

Mandible anomalies are associated with many syndromes. Various methods, with differing reliability, have been proposed to assess the fetal mandible. This study aimed to compare the reliability of measurements of five fetal facial profile parameters by operators with different levels of experience at 15–23 weeks of gestation in Thai fetuses. A prospective observational study was conducted. The inferior facial angle (IFA), anteroposterior mandibular diameter (APD), mandible width (MD), maxilla width (MX) and mandible length (ML) were measured in 123 normal fetuses using 2D ultrasonography by 3 operators with different levels of experience. Each participant was examined by 2 operators. Each operator performed three independent measurements of each parameter and was blinded to the other operator's results. Reliability of measurement was evaluated using the intraclass correlation coefficient (ICC) for both intraobserver and interobserver variability. Bland-Altman analysis was used to evaluate agreement between the operators' measurements.


Results
The success rate of ML measurement was the highest (100%) among the five parameters for all operators. The failure rate of MX measurement was high in fetuses at a gestational age of less than 18 weeks. Intraobserver reliability of APD, MD, MX and ML measurements was excellent for all operators (ICC 0.958-0.986), while that of IFA measurement was moderate to excellent (ICC 0.560-0.923), depending on the operator's experience. Interobserver reliability varied between pairs of operators; only 2 parameters, APD and ML, showed excellent interobserver reliability for both pairs of operators (ICC > 0.9), with good agreement. Interobserver reliability of MX measurement was good for both pairs of operators (ICC 0.606-0.709); MD was excellent for operators 1 and 2 (ICC 0.867) but moderate for operators 1 and 3 (ICC 0.576); and IFA was good for operators 1 and 2 (ICC 0.602) but poor for operators 1 and 3 (ICC 0.128).

Conclusions
Among the 5 parameters, the reliability of ML measurement was the highest, followed by APD, while that of IFA was the lowest. ML and APD measurements were feasible and reproducible, whereas MX measurement was limited in fetuses with a gestational age of less than 18 weeks. Additionally, the reliability of IFA measurement depended on the operator's experience.

Background
Abnormal development of the fetal mandible can be caused by either genetic or environmental factors.
Micrognathia and retrognathia are common fetal mandible malformations, and the two conditions coexist in most cases [1]. Micrognathia is an abnormally small mandible, whereas retrognathia refers to an abnormal position of the mandible, which is displaced posteriorly relative to the maxilla. Many syndromes are associated with micrognathia, as evidenced by 659 entries in the OMIM database [2].
Mandible anomalies predispose neonates to upper airway obstruction, which can lead to suffocation; prenatal recognition therefore allows the neonatal care team to prepare to provide emergency care [3][4][5]. Assessment of the fetal facial profile is thus crucial in fetal anomaly screening.
Micrognathia can be assessed subjectively or objectively, but subjective assessment of the fetal facial profile detects only severe forms of micrognathia [6]. Objective methods for assessing micrognathia or retrognathia have been proposed in previous studies using indices, ratios or facial angles, such as the inferior facial angle [3], frontal nasomental angle [7], jaw index [8], chin index [9], mandible length [10], mandible width/maxilla width ratio [3], transverse and anteroposterior jaw diameters [11] and facial maxillary angle [12], with appreciable clinical utility. Normative data for these parameters have been established in several studies in different populations, with discrepancies in facial measurements.
Reliability of fetal mandible measurements has also varied among studies.
Fetal mandible measurement is not routinely performed in our practice, so the reliability of these measurements needed to be evaluated. We were interested in the following fetal mandible parameters: the inferior facial angle, the jaw index (anteroposterior mandibular diameter/biparietal diameter × 100), the mandible width/maxilla width ratio and the mandible length. The inferior facial angle aids the sonographic diagnosis of fetal retrognathia [3], while the other parameters are useful for recognizing micrognathia [3,8,10].
We hypothesized that the reliability of fetal facial profile measurement differs among methods. This prospective study was conducted to compare the reliability of measurements of five parameters (fetal inferior facial angle, anteroposterior mandibular diameter, mandible width, maxilla width and mandible length) by three operators with different levels of experience at 15 to 23 weeks of gestation in Thai fetuses.

Methods
This prospective observational study was conducted between April and May 2020 at the Maternal Fetal Medicine Unit (MFM unit), Songklanagarind Hospital, a tertiary care center in southern Thailand, after approval by the Ethics Committee of the Faculty of Medicine, Prince of Songkla University (REC.63-060-12-4). The inclusion criteria were singleton pregnant women older than 18 years, at a gestational age of 15 to 23 weeks, who attended the MFM unit for a quadruple test or second trimester fetal anomaly screening. All women who met the inclusion criteria were invited to participate, and written informed consent was obtained from all participants before enrollment. Women with known fetal anomalies, or in whom obvious anomalies were detected during screening ultrasonography, were excluded.
Three operators with different levels of experience in second trimester fetal anomaly screening took part: operator 1 (PB), a first-year MFM fellow; operator 2 (CS), a senior staff member of the MFM unit with more than 20 years of experience; and operator 3 (RS), a junior staff member of the MFM unit with 4 years of experience. Before the study began, all operators underwent two weeks of standardization, with audit and feedback, in 2D ultrasonographic measurement of the five parameters: fetal inferior facial angle (IFA), anteroposterior mandibular diameter (APD), mandible width (MD), maxilla width (MX) and mandible length (ML).
Each parameter was measured according to a standard protocol, as follows.

Inferior facial angle (IFA)
IFA was measured as described by Rotten et al [3]. It was defined on a sagittal view of the fetal facial profile (Figure 1) as the angle formed by two crossing lines: 1) a reference line, drawn orthogonal to the vertical part of the forehead at the level of the nasal bone synostosis; and 2) a profile line, joining two landmarks: the tip of the mentum and the anterior border of the more protrusive lip. In this study, the angle was measured on 2D ultrasonography with automatic calculation.
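Because the IFA depends only on the directions of its two lines, its geometry can be illustrated from landmark coordinates. The Python sketch below is not the ultrasound system's automatic calculation. It assumes hypothetical (x, y) landmark positions in millimetres on the mid-sagittal image, with two points marking the vertical part of the forehead, and it reports the non-obtuse angle between the two undirected lines. That convention is an assumption, not one taken from Rotten et al.

```python
import math


def direction(p, q):
    """Direction vector from landmark p to landmark q (hypothetical mm coordinates)."""
    return (q[0] - p[0], q[1] - p[1])


def line_angle_deg(u, v):
    """Non-obtuse angle (0 to 90 degrees) between two undirected lines."""
    dot = abs(u[0] * v[0] + u[1] * v[1])
    norms = math.hypot(*u) * math.hypot(*v)
    # Clamp to guard against floating-point ratios slightly above 1.
    return math.degrees(math.acos(min(1.0, dot / norms)))


def inferior_facial_angle(forehead_top, forehead_bottom, mentum_tip, lip_anterior):
    """Sketch of the IFA from hypothetical landmark coordinates."""
    fx, fy = direction(forehead_top, forehead_bottom)
    # Reference line: orthogonal to the vertical part of the forehead.
    # Its anchor point (nasal bone synostosis level) does not affect the angle.
    reference = (-fy, fx)
    # Profile line: tip of the mentum to the anterior border of the more protrusive lip.
    profile = direction(mentum_tip, lip_anterior)
    return line_angle_deg(reference, profile)


# Hypothetical example: vertical forehead, profile line rising at 45 degrees.
print(round(inferior_facial_angle((0, 10), (0, 0), (0, -30), (10, -20)), 1))  # 45.0
```

Moving the reference line up or down the forehead leaves the result unchanged, which is why the sketch uses only the line's direction.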

Anteroposterior mandibular diameter (APD)
The APD is used to calculate the jaw index, APD/biparietal diameter (BPD) × 100, for the assessment of fetal micrognathia. In this study, only the reliability of the APD measurement was evaluated, because BPD is a standard fetal biometric measurement that is performed routinely. The APD was measured as described by Paladini et al [8]. The jaw was assessed on the axial plane at the base of the cranium, just caudal to the lower dental arch, with the whole horseshoe-shaped mandible visualized. The laterolateral diameter of the mandible was drawn by joining the bases of the two rami, and the APD was then measured from the symphysis mentis to the middle of the laterolateral diameter (Figure 2).
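Once the three landmarks are placed, the APD and the jaw index reduce to simple arithmetic. This Python sketch assumes hypothetical landmark coordinates in millimetres on the axial image, and its function names are illustrative. It shows only how the middle of the laterolateral diameter and the jaw index follow from the definitions above.

```python
import math


def midpoint(p, q):
    """Middle of the segment joining p and q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def anteroposterior_mandibular_diameter(symphysis, left_ramus_base, right_ramus_base):
    """APD: symphysis mentis to the middle of the laterolateral diameter.

    The laterolateral diameter joins the bases of the two rami, so its
    middle is the midpoint of those two landmarks.
    """
    return math.dist(symphysis, midpoint(left_ramus_base, right_ramus_base))


def jaw_index(apd, bpd):
    """Jaw index = APD / BPD x 100 (APD and BPD in the same unit)."""
    return apd / bpd * 100


# Hypothetical coordinates (mm) and a hypothetical BPD of 40 mm.
apd = anteroposterior_mandibular_diameter((0, 20), (-15, 0), (15, 0))
print(apd, jaw_index(apd, 40))  # 20.0 50.0
```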

Mandible width (MD)
Mandible width (MD) and maxilla width (MX) were used to calculate the MD/MX ratio, as one method of mandibular evaluation.
MD was measured, as described by Rotten et al [3], on an axial plane of the mandible caudal to the base of the cranium at the level of the alveolus (dental arch). A line was drawn orthogonal to the sagittal axis, 10 mm posterior to the anterior osseous border, approximately at the level of the canines. MD was measured along this line from one external bone table to the other (Figure 3).

Maxilla width (MX)
As described by Rotten et al [3], MX was measured on the axial plane of the maxilla using the same method as for mandible width (Figure 4).
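The placement of the MD and MX measurement line, and the resulting MD/MX ratio, can be sketched as follows. The coordinates, the helper names and the representation of the sagittal axis as a vector pointing posteriorly are all hypothetical. The operator still places the calipers on the external bone tables; the sketch only locates the line 10 mm behind the anterior osseous border and computes the ratio.

```python
import math


def measurement_line(anterior_border, posterior_direction, offset_mm=10.0):
    """Locate the transverse line used for MD (and, by the same method, MX).

    anterior_border: hypothetical (x, y) of the anterior osseous border, in mm.
    posterior_direction: vector along the sagittal axis pointing posteriorly.
    Returns the point offset_mm behind the border and a unit vector
    orthogonal to the sagittal axis, along which the width is measured.
    """
    dx, dy = posterior_direction
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm
    centre = (anterior_border[0] + offset_mm * ux, anterior_border[1] + offset_mm * uy)
    return centre, (-uy, ux)


def bone_width(left_outer_table, right_outer_table):
    """Width from one external bone table to the other."""
    return math.dist(left_outer_table, right_outer_table)


def md_mx_ratio(md, mx):
    """Mandible width normalized to maxilla width."""
    return md / mx


# Hypothetical values: MD of 16 mm against an MX of 20 mm.
centre, transverse = measurement_line((0.0, 0.0), (0.0, -5.0))
md = bone_width((-8.0, -10.0), (8.0, -10.0))
print(centre, md, md_mx_ratio(md, 20.0))  # (0.0, -10.0) 16.0 0.8
```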

Mandible length (ML)
ML was measured as described by Otto and Platt [10], along the main portion of one ramus of the jaw between the temporomandibular joint and the symphysis mentis (the junction of the mandibular rami). To obtain the measurement plane, the temporomandibular joint is identified just below an axial plane of the orbits, and the transducer is then rotated inferiorly until the full length of the mandible is visualized. The symphysis mentis, which appears as an anechoic cartilaginous area between the right and left mandibular rami, is then identified, and ML is measured once both landmarks are visualized (Figure 5).
Transabdominal 2D ultrasonography was performed using a GE Voluson E10 or GE Voluson S10 (GE Thailand) with a curvilinear probe (frequency 2-5 MHz). Each participant was scanned independently by two operators (operators 1 and 2, or operators 1 and 3), with complete measurement of the five parameters: IFA, APD, MD, MX and ML. The first operator took three independent measurements of each parameter, and the second operator then scanned the same participant independently on the same day. Each operator was blinded to the other's results. Each operator was allowed a maximum of 30 minutes per participant to complete all measurements of the 5 parameters; any measurement not obtained within this time was declared a failure. The three values of each parameter were used to calculate intraobserver variability, and the mean value of each parameter for each operator was used for interobserver variability analysis.
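The paper does not state which ICC model was used. As one plausible choice, the Python sketch below computes ICC(2,1) in Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement) from a table with one row per fetus. For intraobserver variability, the columns would be one operator's three repeated measurements. For interobserver variability, they would be each operator's mean of three measurements, as described above. The model choice and the function names are assumptions.

```python
from statistics import mean


def icc_2_1(table):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    table: one row per fetus, one column per measurement (repeat or operator).
    Rows containing a failed measurement are assumed to be removed beforehand.
    """
    n, k = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(row[j] for row in table) for j in range(k)]

    # Partition the total sum of squares into subjects, columns and residual.
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def operator_means(triplicates):
    """Mean of each fetus's three repeated measurements by one operator."""
    return [mean(reps) for reps in triplicates]


def interobserver_table(op_a_triplicates, op_b_triplicates):
    """Pair two operators' per-fetus means into an n x 2 table for icc_2_1."""
    pairs = zip(operator_means(op_a_triplicates), operator_means(op_b_triplicates))
    return [list(pair) for pair in pairs]


# Illustrative data only: 6 subjects x 4 measurements.
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(example), 2))  # 0.29
```

If a consistency (rather than absolute agreement) ICC were intended, the `k * (ms_cols - ms_error) / n` term would be dropped from the denominator, which gives ICC(3,1).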

Results
A total of 123 cases were enrolled in the study: 69 were examined by operators 1 and 2, and 54 by operators 1 and 3. The mean gestational age was 17.3 (1.8) weeks. In every case, both operators of each pair successfully measured at least one fetal facial profile parameter. Among the 5 parameters, ML measurement was the most feasible, with 100% success for all operators. Success rates were also high for APD and MD (more than 90%) for all operators. IFA had relatively low success rates compared with the other parameters for operators 1 and 2, but not for operator 3, whereas the failure rate of MX measurement was highest for operator 3 but not for operators 1 and 2 (Table 1). For all operators, most MX measurement failures occurred in fetuses with a gestational age of less than 18 weeks (80%-100%).

Intraclass correlation coefficients (ICC) for intraobserver variability of the 5 parameters are shown in Table 1. Reliability of measurement was excellent for APD, MD, MX and ML for all operators. For IFA, intraobserver reliability varied among operators: moderate for operator 1, good for operator 2 and excellent for operator 3. Table 2 shows interobserver variability for the two pairs of operators. Reliability between operators 1 and 2 ranged from good (IFA and MX) to excellent (APD, MD and ML). Between operators 1 and 3, however, reliability was excellent only for APD and ML, good for MX, moderate for MD and poor for IFA. Ranked from best to worst, the ICCs between operators 1 and 2 were ML > APD > MD > MX > IFA, whereas those between operators 1 and 3 were ML > APD > MX > MD > IFA. For both pairs of operators, reliability was best for ML, followed by APD, and worst for IFA.

Table 3 shows the mean differences in fetal facial profile measurements between operators. For both pairs of operators, Bland-Altman plots of the interobserver differences for the 5 parameters showed small mean differences and good agreement, with few outliers (Fig. 6A-E, operator 1 vs operator 2; Fig. 7A-E, operator 1 vs operator 3). The outliers in ML measurement (1 case for operators 1 and 2; 3 cases for operators 1 and 3) were fetuses at higher gestational ages (19-23 weeks).
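Bland-Altman analysis summarizes paired readings by their mean difference (bias) and the 95% limits of agreement, bias ± 1.96 SD of the differences. The Python sketch below computes these values and flags points outside the limits. Treating such points as the outliers mentioned above is an assumption, as are the function names and the example values.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired readings a and b.

    Limits of agreement = bias +/- 1.96 * SD of the differences a - b.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def plot_points(a, b):
    """(x, y) points of a Bland-Altman plot: pair mean vs pair difference."""
    return [((x + y) / 2, x - y) for x, y in zip(a, b)]


def outlier_indices(a, b):
    """Indices of pairs whose difference lies outside the limits of agreement."""
    _, lower, upper = bland_altman(a, b)
    return [i for i, (_, d) in enumerate(plot_points(a, b)) if not lower <= d <= upper]


# Hypothetical ML values (mm) from two operators for ten fetuses.
op1 = [20.0] * 9 + [25.0]
op2 = [20.0] * 10
print(bland_altman(op1, op2)[0], outlier_indices(op1, op2))  # 0.5 [9]
```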

Discussion
This study revealed that, among the 5 fetal facial profile parameters, ML measurement was the most feasible and the most reliable, followed by APD, while IFA was the least reliable for all operators. Overall reliability was acceptable for all parameters except IFA.
Regarding feasibility, ML was the best parameter, with 100% success for all operators. In our experience, the proper plane for ML measurement was easy to obtain in various fetal positions, regardless of operator experience. APD and MD measurements were also feasible, with high success rates for all operators (> 90%). Both parameters are assessed on the axial plane of the fetal mandible at only slightly different levels, and the short standardization training period was sufficient for operators to become familiar with them. For IFA, success rates were relatively low compared with the other parameters for 2 operators (the fellow and the senior MFM staff member), and measurement failure was due exclusively to fetal position. A mid-sagittal view of the fetal facial profile is required, but some fetuses remained in the prone position for longer than 30 minutes, the time limit for declaring failure in this study. Extending the measurement time would probably have increased the success rate; however, to limit participant inconvenience, we capped the scanning time at this point. Success rates for MX measurement varied widely among operators, being highest for the senior and lowest for the junior MFM staff member. This was possibly related to several factors, such as operator experience, technique, fetal position and gestational age at measurement. For all operators, failure rates were high in fetuses of low gestational age (less than 18 weeks) because the maxilla was too small to be measured using the described protocol. In the study by Rotten et al, MX was measured from 18 weeks of gestation onward [3].
For intraobserver variability, all parameters except IFA showed excellent ICCs for all operators. This may be because these 4 parameters (APD, MD, MX and ML) are measured by drawing a straight line between two landmarks that are clearly and easily recognized. Our findings are supported by previous studies by Otto and Platt for ML measurement (ICC 0.9990) [10] and by Paladini et al for jaw measurement (ICC 0.97) [8]. However, Watson and Katz reported that measurement of the mandible diameter lacked reliability because the full length of the horizontal part of the mandible was difficult to visualize. They applied strict validity criteria to the transverse diameter, which was measured by drawing a line between the bases of the two rami of the mandible, touching the anterior part of the fetal hypopharynx [11]. Rotten et al modified the method by measuring the transverse diameter of the mandible 10 mm caudal to the symphysis mentis, instead of along the full length of its horizontal part. This reduced the difficulty of the measurement, and the mandible width was normalized to the maxilla width (MD/MX ratio) [3]. We therefore followed their method and obtained the mandible width measurement without difficulty.
In contrast, intraobserver variability of IFA measurement was relatively low compared with the other parameters for all operators, because measurement error can arise from various factors, such as operator experience (only the senior MFM staff member achieved an excellent ICC), fetal position, or error in measuring the angle formed by two crossing lines. To minimize measurement error, we suggest taking at least 3 independent measurements, with a small range between the minimum and maximum values. A wide range among the measured values implies that more measurement training is required.
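The suggestion of taking at least three measurements with a small spread can be written as a simple acceptance check. In this Python sketch, the tolerance `max_range` is hypothetical. The study does not define a numerical limit, so a unit would need to choose one for each parameter.

```python
def measurement_range(values):
    """Spread between the largest and smallest repeated measurement."""
    return max(values) - min(values)


def needs_repeat(values, max_range, min_count=3):
    """Flag repeated measurements that are too few or too widely spread.

    max_range is a hypothetical, unit-chosen tolerance in the measurement's
    own unit (degrees for IFA, mm for distances); the study gives no value.
    """
    return len(values) < min_count or measurement_range(values) > max_range


# Hypothetical IFA triplicates (degrees) checked against a 5-degree tolerance.
print(needs_repeat([64.0, 66.5, 65.0], max_range=5.0))  # False
print(needs_repeat([58.0, 70.0, 64.0], max_range=5.0))  # True
```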
Interobserver variability of the 5 parameters showed poor to excellent ICCs, depending on the operators and the parameter measured. Reliability was acceptable for all parameters except IFA. Excellent ICCs were noted for two parameters, ML and APD, for both pairs of operators, and Bland-Altman plots confirmed good agreement. IFA showed the lowest ICCs among the 5 parameters for both pairs of operators, which implies that this method requires more training cases than the others. According to expert opinion (Paladini 2010), IFA measurement is limited by fetal position, whereas the jaw index can be measured even when the fetal facial profile is not visible, because only an axial view of the mandible is required [1].
A strength of our study was that five fetal facial profile parameters were evaluated for both feasibility and reliability in the same participants by operators with different levels of experience. Additionally, the sample size was adequate to demonstrate significant findings, and the study had a prospective design with real-time measurement. However, one limitation should be noted: we limited the scanning time for all measurements of each participant by each operator to 30 minutes. The success rate of some parameters might be improved by extending the scanning time or by using 3D reconstructed views. However, 3D ultrasonography is available only in some centers in our country, so we focused on 2D ultrasonography in order that the results might be useful for general obstetricians.

Conclusions
Among the five fetal facial profile parameters, the reliability of ML measurement was the highest, followed by APD, while that of IFA was the lowest for all operators. Measurements of all parameters except IFA were reproducible, with acceptable reliability. MX measurement was limited in fetuses with a gestational age of less than 18 weeks. The reliability of IFA measurement was related to the operator's experience. We recommend mandible length measurement for the objective assessment of the fetal mandible during second trimester fetal anomaly screening.

Figure 2. Anteroposterior mandibular diameter measurement
Figure 4. Maxilla width measurement