Classifying fetal nutritional status against an appropriate ultrasound standard is important for guiding pregnancy management. A fetus incorrectly classified as SGA or LGA will lead the clinician to intensify monitoring of the pregnant woman and, in the specific case of GDM, even to modify the diet or start insulin treatment. Therefore, we believe that the clinician should choose the curve that best identifies newborns with true alterations in nutritional status (malnutrition or overnutrition).

This study shows that in newborns of mothers with GDM, the rates of SGA and LGA differ by the reference curve used, INTERGROWTH21st or customized. The SGA rate using INTERGROWTH21st was 4.8%, significantly lower than 10.8% observed using customized curves. In contrast, the LGA rate using INTERGROWTH21st was 26%, compared to 13.4% using our customized curves as the reference. Therefore, in our population, the customized method identifies more SGA while INTERGROWTH21st identifies more LGA.

These SGA results are consistent with those recently published by Francis et al. [15], who reported overall SGA and LGA rates of 10.5% and 9.5%, respectively, using customized curves. Using INTERGROWTH21st, Francis et al. observed an overall SGA rate of 4.4%, very similar to the 4.7% rate in our sample and, as in our study, they found an unexpectedly high LGA rate of 20% (similar to our 25%, considering that we analyzed a diabetic population).

Similarly, Anderson et al. [21] reported significantly lower SGA rates using INTERGROWTH21st than using customized curves (4.5% vs 11.6%). In their cohort, the customized LGA rate was 8.9% and the INTERGROWTH21st LGA rate was 20.8%, with wide variation by ethnicity (European women 23.7%, Indian 6.8% and Pacific 32.%) [NH Anderson, personal communication].

The use of PI to assess the nutritional status of the newborn has certain limitations: it reflects not only fat mass but also head size, lean mass and bone mass, which potentially limits its accuracy as a measure of adiposity. However, accurate measures of body composition are usually costly. Chen et al. [35] recently reported that, although skinfold measures may have more discriminative power for total body adiposity, simple anthropometric measures (such as PI) correlated strongly with neonatal adiposity, and concluded that these simple measures could be of value in large epidemiological studies.

We found that the SGA and LGA classifications by each method (customized vs INTERGROWTH21st) reflect differences in their ability to identify true alterations in the PI as an indicator of the neonatal nutritional status.

The RR of malnutrition (PI <10th centile) in newborns classified as SGA by customized curves was higher than that of newborns classified as SGA by INTERGROWTH21st. This may be because 33.3% of children classified as SGA by the customized method, but as AGA by INTERGROWTH21st, had a PI <10th centile (suggestive of malnutrition). However, when we compared the two RRs, we did not find a statistically significant difference (p = 0.2).

Likewise, the accuracy of the customized curves for identifying newborns with a PI <10th centile was greater than that of INTERGROWTH21st (LR+ of 3.86 vs 2.74, respectively). That is, using customized curves, a malnourished newborn is 3.86 times more likely to be classified as SGA than a normally nourished newborn.
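As a minimal illustration of how a positive likelihood ratio is obtained from a 2×2 classification table, the following Python sketch computes LR+ as sensitivity divided by the false-positive rate (1 − specificity). The function name and the counts are hypothetical and are not the data of this study.

```python
def positive_likelihood_ratio(tp, fn, fp, tn):
    """LR+ = sensitivity / (1 - specificity) from 2x2 table counts.

    tp: low-PI newborns classified as SGA
    fn: low-PI newborns not classified as SGA
    fp: normal-PI newborns classified as SGA
    tn: normal-PI newborns not classified as SGA
    """
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    return sensitivity / false_positive_rate


# Hypothetical counts chosen only to illustrate the arithmetic.
lr_plus = positive_likelihood_ratio(tp=8, fn=12, fp=20, tn=160)
print(f"LR+ = {lr_plus:.2f}")
```

An LR+ computed this way does not depend on the prevalence of the condition, which is why it is convenient for comparing two reference curves applied to the same sample.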

In a previous study by our team [22], carried out in an unselected population, the customized method was superior to the population-based method for the identification of newborns with a PI at birth <10th centile. This superiority of the customized method was more evident in the highest ranges of maternal weight and height.

Owen et al. [36] found a similar relationship between customized birth weight percentiles and neonatal malnutrition, but concluded that, in a low-risk population, the customized curves are only moderately useful for identifying neonates with a low PI, with a positive likelihood ratio of 4.3 (95% CI: 2.5–7.1). Agarwal et al. [37] also found that the PI at birth was lower in newborns classified as SGA by customized curves than in those classified as SGA by population curves. The apparent superiority of the customized method for detecting a PI <10th centile should be interpreted with caution, since neither the difference between the RRs nor the difference between the ROC curves of the two methods was statistically significant. With the prevalence of malnutrition found in the sample (8.9%), we would have needed at least 1200 individuals to reach a statistical power of 80%. The lack of statistical significance could therefore be due to an insufficient sample size.

Similarly, the RR of overnutrition (PI >90th centile) associated with LGA classification by customized curves (RR 5.26) was greater than that associated with LGA classification by INTERGROWTH21st (RR 3.57). This result was obtained even though the proportion of children classified as LGA by INTERGROWTH21st was significantly higher than with the customized method (25.6% vs. 13.2%); it is explained by the fact that the majority (86.2%) of children classified as LGA by INTERGROWTH21st but as AGA by the customized method had a normal PI. This difference should also be interpreted with caution, since the RRs had wide confidence intervals and the subsequent comparison of the two RRs showed that the difference was not statistically significant (p = 0.19).

Further, our analysis of the accuracy of each method for identifying newborns with a PI >90th centile revealed that the customized method had a greater LR+ (5.40) than INTERGROWTH21st (2.54). Hence, using customized curves, an overnourished newborn is 5.40 times more likely to be classified as LGA than a normally nourished newborn. Given that identifying fetal overnutrition is critical in GDM, we consider the difference in the PPV of the two methods for identifying a PI >90th centile to be especially relevant. Using our customized curves, the probability that a fetus classified as LGA has a PI >90th centile is 51.61%, whereas using INTERGROWTH21st the probability drops to 32.20%. These results are consistent with those found by Gonzalez et al. [28]. However, in our study, analysis of the ROC curves showed that the AUCs of the two methods were very similar, and the small difference observed (0.70 vs. 0.68) was not statistically significant. Again, given the prevalence of newborns with a PI >90th centile, the lack of significance could be due to an insufficient sample size, since 1603 individuals would have been required to reach a statistical power of 80%.
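The link between LR+ and PPV can be sketched with the odds form of Bayes' theorem: post-test odds = pre-test odds × LR+. The Python snippet below applies it to the two LR+ values reported above. The prevalence used is an assumed placeholder, not a figure from this study, so the resulting PPVs are illustrative only and are not expected to match the 51.61% and 32.20% reported here.

```python
def ppv_from_lr(lr_plus, prevalence):
    """Positive predictive value from LR+ and pre-test prevalence."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * lr_plus
    return post_test_odds / (1 + post_test_odds)


# ASSUMPTION: hypothetical prevalence of PI > 90th centile (illustration only).
assumed_prevalence = 0.10

ppv_customized = ppv_from_lr(5.40, assumed_prevalence)     # LR+ from the text
ppv_intergrowth = ppv_from_lr(2.54, assumed_prevalence)    # LR+ from the text
print(f"PPV customized:      {ppv_customized:.1%}")
print(f"PPV INTERGROWTH21st: {ppv_intergrowth:.1%}")
```

Unlike LR+, the PPV changes with prevalence, so PPVs obtained in a GDM cohort would not transfer directly to a population with a different rate of overnutrition.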

Another aspect worth discussing is the low sensitivity of SGA classification by either method for identifying undernourished newborns. The specificity, in contrast, is acceptable. In our opinion, this shows that the same cut-off point (10th centile for SGA) can classify a child as normal or small depending on the reference curve.

The relatively small sample led to our primary limitations, including some RRs with overlapping or wide confidence intervals, which hampered their interpretation. However, the relative risks were usually large enough to be considered clinically relevant. It should be noted that the assumptions on which we based our sample size estimate were not fulfilled. In our estimate, we assumed a non-exposed/exposed ratio of 8, on the premise that we would find approximately 10% SGA and 10% LGA with each method. However, using INTERGROWTH21st, for example, we found 163 AGA and only 11 SGA newborns, which raises the non-exposed/exposed ratio to 14.8. With this ratio, we would have needed 488 AGA newborns to obtain significance when detecting a minimum risk of 3; to detect a minimum risk of 2.5, which is the observed one, we would need an even larger sample (51 SGA and 751 AGA). Therefore, although overall our study seems to indicate that the customized method could surpass INTERGROWTH21st in identifying alterations in nutritional status, we think that this study needs to be complemented with a larger sample.
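To illustrate why a smaller minimum detectable risk requires a larger sample, the following Python sketch uses a standard normal-approximation formula for comparing two proportions with unequal group sizes (no continuity correction). This is not necessarily the method used for the estimates above; the baseline risk in the non-exposed group is a hypothetical value, and the results are not expected to reproduce the figures quoted in the text.

```python
from math import ceil, sqrt
from statistics import NormalDist


def exposed_sample_size(p0, rr, k, alpha=0.05, power=0.80):
    """Exposed-group size needed to detect relative risk `rr`.

    p0: risk in the non-exposed group
    k:  non-exposed / exposed ratio (non-exposed size is k times the result)
    """
    p1 = rr * p0
    if not (0 < p0 < 1 and 0 < p1 < 1):
        raise ValueError("p0 and rr * p0 must both lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + k * p0) / (1 + k)  # pooled risk under the null hypothesis
    numerator = (
        z_alpha * sqrt((1 + 1 / k) * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0) / k)
    ) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Ratio observed with INTERGROWTH21st in the text: 163 AGA vs 11 SGA.
ratio = 163 / 11

# ASSUMPTION: hypothetical baseline risk of low PI among AGA newborns.
assumed_p0 = 0.07

n_exposed_rr3 = exposed_sample_size(assumed_p0, rr=3.0, k=ratio)
n_exposed_rr25 = exposed_sample_size(assumed_p0, rr=2.5, k=ratio)
print(f"non-exposed/exposed ratio: {ratio:.1f}")
print(f"exposed needed (RR 3.0): {n_exposed_rr3}")
print(f"exposed needed (RR 2.5): {n_exposed_rr25}")
```

Under any fixed baseline risk, lowering the target relative risk shrinks the difference between the two proportions and therefore increases the required sample size.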

In addition, selection and information biases could affect the estimates of the performance of the two reference curves. We believe that our results can be extrapolated to other populations of pregnant women with adequate monitoring, because obstetricians, endocrinologists, family doctors and primary care midwives monitored the pregnant women with GDM using the criteria for diagnosis, follow-up and treatment established by the Spanish Society of Gynecology and Obstetrics.