The searches identified 883 citations. The search results are reported in a flow diagram (Figure 1) adapted from the PRISMA-ScR52 structure.
There were 19 studies5,9,10,53,54,57,60,62,64,66,67,69,70,72,73,75,77,79,80 that included late-preterm and term infants but did not report them as a distinct group with respect to NE diagnosis, CP outcome and the use of GMA as a predictive tool. Table 4 (Additional file 4) summarizes the characteristics of these excluded studies. Their key characteristics varied widely and are summarized here. Publication dates ranged from 1997 to 2019. Most were prospective studies (14 of the 19), and most used clinical assessment alone to identify infants at high risk (10 of the 19). With regard to assessment tools, 18 of the 19 studies used Prechtl's GMA. For the age at CP diagnosis, 12 of the excluded studies used the same criterion as this review, that is, CP diagnosed at a minimum age of 2 years. CP was diagnosed with a variety of standardized assessments, the most frequent being the assessment by Amiel-Tison and Grenier58. Eight studies either used non-standardized methods or did not clearly state their method.
Table 5 (Additional file 4) summarizes the key findings of these excluded studies and the reasons for their exclusion. These findings showed that in high-risk infants, including those with NE, GMA is a strong predictor of CP53, especially when performed during the fidgety period10,54,67,69,75,79. In 1997, Prechtl et al.5 demonstrated the importance of movement quality: in a mixed group of preterm and term infants, abnormal quality and absent fidgety movements predicted neurological abnormalities, most of which were diagnosed as CP, with a sensitivity of 96%. The excluded studies show that this result has since been replicated repeatedly, with cramped-synchronized (CS)60 and absent fidgety70 GMs being highly predictive of CP. The trajectory of GMs across serial assessments was reported to be a more important predictor of CP than any single assessment54.
The GMA is more sensitive than the traditional neurological examination55,60,64, and its sensitivity increases when it is combined with other modalities, such as electroencephalography (EEG)64, neuroimaging77, or the Hammersmith Infant Neurological Examination (HINE) together with neuroimaging70.
In these excluded studies, sensitivity was as high as 100%53,54,60,66,69, and specificity was close to53,70 or at 100%75. We contacted the authors of the study closest to our inclusion criteria, Solemani et al.75, because they delineated their population by NE and by GA, but their outcome was reported as "neurodevelopmental outcomes" rather than CP. The authors confirmed that they had not specifically reported CP, so the study could not be included. PPV and NPV were reported for eight of the excluded studies5,64,66,67,70,73,75,77, with PPV as high as 98% when GMA was combined with HINE and neuroimaging70 and NPV as high as 100%66. The limitations identified by the authors fell into three themes: limited external validity due to small sample sizes60,62,70,79,80, selection bias related to recruitment from high-risk populations5,9,70, and practice variation between sites53,66,69. The most common reasons for exclusion were failure to report participants with NE as a distinct group (most described their participants only as high-risk infants) and failure to stratify GA into the groups relevant to our questions (late-preterm and term).
Therefore, only two articles, one by Ferrari et al.81 and the other by Prechtl et al.82, met the selection criteria and were included in the final review (Figure 1)52. Both were case series81,82 from Italy. Together they included only 60 term neonates (34 and 26 participants, respectively); neither included late-preterm neonates. Ferrari et al.81 reported NE as a single group, whereas Prechtl et al.82 divided it into mild-moderate and severe. In both studies, the high-risk groups were identified by history only, and both used Prechtl's GMA. Table 6 (Additional file 4) presents the characteristics of these two studies. Both were published more than five years ago and reported sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).
The two studies used different standardized tools for CP diagnosis. Table 7 (Additional file 4) details the key findings, the outcomes evaluated and the limitations identified by the authors. In the more recent study, Ferrari et al.81 reported that the presence of any CS movements between term and four to five months post-term predicted CP with a sensitivity of 100%, a specificity of 68.7%, a PPV of 78.3% and an NPV of 100%. In the older study, Prechtl et al.82 compared predictive ability by the timing of the GMA: early assessment in the first two weeks of life versus late assessment at 15 to 22 weeks of life. Early assessment had a sensitivity of 100%, specificity of 46.2%, PPV of 65.0% and NPV of 100%, whereas late assessment had a sensitivity, specificity, PPV and NPV of 84.6% each. Neither study included infants treated with therapeutic hypothermia for NE, which was not yet the standard of care. Ferrari et al.81 identified selection bias as a limitation: infants with mild HIE as a contributor to NE may have been underrepresented because they were not referred for evaluation. Prechtl et al.82 did not state their limitations.
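All four accuracy measures reported above derive from the same 2×2 table of GMA result against CP outcome. As a minimal sketch, the Python function below computes them from the cell counts. The counts in the usage example are an assumption: they were back-calculated to be consistent with the percentages reported for Prechtl et al.82 (26 infants) and are not figures stated in that study. Because sensitivity and NPV both have the false negatives in their denominators, a sensitivity of 100% (no false negatives) necessarily implies an NPV of 100% whenever there is at least one true negative, which is a useful consistency check when extracting reported values.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of GMA result (index test) against CP outcome (reference standard)."""
    tp: int  # abnormal GMA, CP diagnosed at follow-up
    fp: int  # abnormal GMA, no CP
    fn: int  # normal GMA, CP diagnosed at follow-up
    tn: int  # normal GMA, no CP


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a row or column of the table is empty.
    return numerator / denominator if denominator else math.nan


def diagnostic_accuracy(t: TwoByTwo) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV as proportions between 0 and 1."""
    return {
        "sensitivity": _ratio(t.tp, t.tp + t.fn),
        "specificity": _ratio(t.tn, t.tn + t.fp),
        "ppv": _ratio(t.tp, t.tp + t.fp),
        "npv": _ratio(t.tn, t.tn + t.fn),
    }


# Hypothetical counts, back-calculated from the percentages reported for
# Prechtl et al. (26 infants, 13 with CP); not taken from the study itself.
early = TwoByTwo(tp=13, fp=7, fn=0, tn=6)
late = TwoByTwo(tp=11, fp=2, fn=2, tn=11)

for label, table in (("early", early), ("late", late)):
    metrics = diagnostic_accuracy(table)
    print(label, {name: f"{value:.1%}" for name, value in metrics.items()})
```

Keeping the four counts in one record makes it easy to see that sensitivity and specificity depend only on the columns (CP versus no CP), whereas PPV and NPV also depend on how common CP is in the sample.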
Risk of Bias
Although this scoping review did not require critical appraisal of the two included articles, the JBI critical appraisal tool83 helped to identify differences and similarities between these two case series. The main points are summarized here, with details presented in Table 8 (Additional file 5).
The quality of evidence derived from a review depends largely on the quality of the included studies. Neither study scored 100% on all ten questions; both scored 100% on six of the ten questions on the checklist, and because of their limitations both were assessed as moderate-quality case series. Both studies scored well for using valid methods to identify the condition in all participants, clearly reporting participant demographics and clinical information, and clearly reporting the outcomes of the cases. They also clearly reported the demographic information of the presenting sites and used appropriate statistical analysis.
According to the JBI method, authors should provide clear inclusion and exclusion criteria for study participants, specified in sufficient detail and including all information critical to the study. Ferrari et al.81 fulfilled this criterion, but Prechtl et al.82 did not state their exclusion criteria, which may limit the generalizability of their results. A good-quality case series should also clearly describe how the condition was measured, in a standard (i.e. the same way for all patients) and reliable (i.e. repeatable and reproducible) way. The clinical condition for our review is NE. Both studies listed a number of possible criteria for NE but did not state the number or combination of criteria required for the diagnosis, and so both scored 0% for this question. They did, however, use standard, albeit different, methods for grading NE severity: Ferrari et al.81 used Sarnat staging22, while Prechtl et al.82 used the Levene method84. With regard to consecutive inclusion, studies that report consecutive inclusion are more reliable than those that do not. Neither included study clearly stated whether it consecutively included every neonate meeting the inclusion criteria at its institution during the study period, so both scored 0% for this question. In a similar vein, the completeness of a case series contributes to its reliability, and studies that report complete inclusion are more reliable than those that do not. Neither Ferrari et al.81 nor Prechtl et al.82 clearly stated that they included all eligible patients, and both scored 0% for this question.
The main sources of bias were selection bias, information bias and sampling variation. Selection bias is typical of case series, because investigators choose a series of patients with a particular illness (NE) and a suspected linked outcome (CP)85. Selection bias limits the generalizability of results. Information bias is lower in retrospectively collected data, as these are limited to what is already documented in the medical chart. Both studies collected their data prospectively, making them susceptible to information bias. With regard to sampling variation, estimating the rate of a disease precisely, rather than by chance, requires a large sample size. Both studies had small samples: Ferrari et al.81 included 34 cases and Prechtl et al.82 included 26, with follow-up periods of more than three to four years. Sample size may also have been limited by the recruitment method, as neither study stated whether it included every neonate meeting the inclusion criteria at its institution during the study period.
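The effect of sampling variation on a small case series can be made concrete with a confidence interval. The sketch below uses the Wilson score interval for a binomial proportion; this choice of method is ours, for illustration only, and is not stated to have been used by either included study. The counts in the example (13 of 13 versus 130 of 130 infants with CP correctly identified by GMA) are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical: every infant with CP had an abnormal GMA (observed sensitivity 100%).
for cases in (13, 130):
    low, high = wilson_interval(cases, cases)
    print(f"{cases}/{cases}: 95% CI {low:.1%} to {high:.1%}")
```

Under these assumptions, a perfect observed sensitivity based on 13 cases is compatible with a true sensitivity as low as about 77%, whereas a tenfold larger sample raises the lower bound to about 97%. The Wilson interval is used here rather than the simple normal approximation because the latter collapses to zero width when the observed proportion is 0% or 100%.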