Study selection
EndNote X9 was used for citation collation. Duplicates were removed manually. Covidence was used for screening by two independent reviewers (JS and ML). Disagreements were resolved through a third reviewer (RB). The results of the search were reported in a Preferred Reporting Items for Systematic Reviews and Meta-analyses extension for scoping reviews (PRISMA-ScR) flow diagram.
Ethical approval was not required as this was a scoping review and did not contain information directly identifying patients or content requiring patient consent.
We conducted our bibliographic database searches between March 30th and April 30th, 2020. The reference lists of all relevant full-text studies identified were hand-searched for additional relevant studies. Citations were identified, duplicates were removed, and the remaining citations were screened by two independent reviewers (JS, ML). Relevant studies were identified for full-text review and retrieved via Google Scholar, institutional journal access, e-Resources and database sites.
Any disagreements that arose between the reviewers at each stage of the study selection process were resolved through discussion. A third reviewer (RB) was the final arbitrator for any unresolved disagreements. From the full-text articles, those that met most of the inclusion and exclusion criteria were selected for further review. From these, articles that fully met all the criteria were identified. The results of the search were reported in a Preferred Reporting Items for Systematic Reviews and Meta-analyses extension for scoping reviews (PRISMA-ScR) flow diagram31.
Each article was independently reviewed and assessed by two of the authors (JS, ML). Data were extracted from the articles using a data extraction tool developed by the reviewers. The distribution of the studies was determined by year of publication and by country of origin. These were important contextual factors to consider, as the age of a study and the physical and human resources of its country of origin may limit how well the study covers a representative portion of the population. These factors affect the generalizability of the results.
The study characteristics included gender, GA, birth weight, the numbers of term and late-preterm neonates in the sample, the overall sample size, the number of neonates diagnosed with NE and the number of cases of CP.
Following the searches, 883 citations were identified. Removal of duplicates yielded 537 titles and abstracts, of which 428 were excluded on title and abstract. In total, 109 articles were deemed eligible for full-text assessment and 71 full texts were reviewed, but only 20 met most of the inclusion criteria.
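The counts above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the review's methods: the variable names are illustrative, and the figures are those reported in this section (the 18 excluded and 2 included studies are described in the paragraphs that follow).

```python
# Counts as reported in this section; all names are illustrative only.
identified = 883
after_deduplication = 537
excluded_on_title_abstract = 428
eligible_for_full_text = 109
full_texts_reviewed = 71
met_most_criteria = 20
excluded_after_full_review = 18
included_in_final_review = 2

# Quantities implied by, but not stated in, the text.
duplicates_removed = identified - after_deduplication
eligible_but_not_reviewed = eligible_for_full_text - full_texts_reviewed


def check_flow():
    """Return a list of inconsistencies; an empty list means the counts reconcile."""
    problems = []
    if after_deduplication - excluded_on_title_abstract != eligible_for_full_text:
        problems.append("screening stage does not reconcile")
    if excluded_after_full_review + included_in_final_review != met_most_criteria:
        problems.append("final stage does not reconcile")
    return problems


print("duplicates removed:", duplicates_removed)
print("eligible but not reviewed in full:", eligible_but_not_reviewed)
print("inconsistencies:", check_flow())
```

The check only confirms that the stage-to-stage totals add up; it says nothing about why articles left the flow at each stage.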
There were 18 studies32,33,36,39,41,43,44,46,47,49,50,52,54,56,58-61 that included late-preterm and term infants but did not delineate them as a specific group with respect to their diagnosis of NE, their CP outcomes and the use of GMA as a predictive tool. Table 4 (Additional file 4) presents the summary of the characteristics of the excluded studies. There was wide variety in their key characteristics, which we summarize here. These studies had a wide date range, from 1997 to 2019. They were mainly prospective studies (13 of the 18) and the majority used clinical assessments only to identify infants at high risk (10 of the 18). The Prechtl GMA was almost exclusively the assessment used (17 of the 18). Our study question looked at a CP diagnosis by 2 years of age, and 11 of the excluded studies met this criterion. A variety of standardized assessments were used for the CP diagnosis, the most frequent being that of Amiel-Tison and Grenier37. Eight studies either used non-standardized methods or did not clearly state their method.
Table 5 (Additional file 4) presents the summary of the key findings of these excluded studies and the reasons for their exclusion. These findings showed that in high-risk infants, including those with NE, GMA is a strong predictor of CP32, especially when used in the fidgety period33,47,49,56,59,61. Absent fidgety movements50 and CS GM39 are highly predictive of CP. The trajectory of the GMA is more important as a predictor of CP33. The GMA is more sensitive than the traditional neurological examination34,39,44, and its sensitivity increases when it is combined with other modalities such as electroencephalogram (EEG)44, neuroimaging58, or Hammersmith Infant Neurological Examination (HINE) and neuroimaging50.
For these excluded studies, sensitivity values were as high as 100%32,33,39,46,49 and specificity was close to32,50 or at 100%56. We contacted the authors of the study closest to our inclusion criteria, Solemani et al.56, as they delineated their populations by NE and by GA but reported their outcome as “neurodevelopmental outcomes” and not CP. They informed us that they did not specifically report CP, and so the study could not be included. PPV and NPV were reported for seven of the excluded studies44,46,47,50,54,56,58, with some studies reporting PPV as high as 98% when GMA was used in combination with HINE and neuroimaging50 and NPV as high as 100%46. The limitations identified by the authors can be summarized as limited external validity due to small population size39,41,50,59,60, selection bias related to recruitment from high-risk populations43,49,50 and practice variation between sites32,46,49. The most common reasons for exclusion of these studies were failure to delineate their participants by diagnosis of NE, with most describing their participants as high-risk infants, or failure to delineate GA into the groups relevant to our questions (late-preterm and term).
Therefore, only two articles, one by Ferrari et al.62 and the other by Prechtl et al.63, were identified as meeting the selection criteria and were included in the final review. The results of the search are reported here in a flow diagram (Figure 1), adapted from the PRISMA-ScR25 structure.
The final two studies included were case series62,63 from Italy. The total number of participants was only 60 term neonates (34 and 26 participants respectively); neither study included late-preterm neonates. NE was reported as a single group by Ferrari et al.62 but divided into mild-moderate and severe by Prechtl et al.63. The high-risk groups in both studies were identified by history only. The GMA used by both studies was Prechtl. Table 6 (Additional file 4) presents the characteristics of these two studies. Both were published more than five years ago and reported on sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). A variety of standardized tools were used for CP diagnosis between the two studies.
Table 7 (Additional file 4) details the key findings and the outcomes evaluated, with the limitations identified by the authors. In the more recent study, Ferrari et al.62 reported that the presence of any CS movements between term and four to five months post-term had a sensitivity of 100%, a specificity of 68.7%, a PPV of 100% and an NPV of 78.3% for predicting CP. The older study, by Prechtl et al.63, examined how predictive ability depended on the timing of the GMA, that is, early assessments in the first two weeks of life versus late assessments between 15 and 22 weeks of life. Their findings were a sensitivity of 100%, specificity of 46.2%, PPV of 65.0% and NPV of 100% for the early assessments, compared with 84.6% across the board for sensitivity, specificity, PPV and NPV for the late assessments. Neither study included infants receiving therapeutic hypothermia for NE, which was not yet the standard of care. Ferrari et al.62 identified selection bias as a limitation, in that mild HIE as a contributor to NE may have been underrepresented because these infants were not referred for evaluation. Prechtl et al.63 did not state their limitations.
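For readers less familiar with the four accuracy measures quoted above, the sketch below shows how each is derived from a 2x2 table of GMA result against CP outcome. It is illustrative only: the cell counts are hypothetical, chosen so that a 26-infant table reproduces the percentages reported for the early and late assessments of Prechtl et al.63, and are not taken from that study. The class and function names are likewise assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a diagnostic 2x2 table (abnormal GMA vs. later CP)."""
    tp: int  # abnormal GMA, CP
    fp: int  # abnormal GMA, no CP
    fn: int  # normal GMA, CP
    tn: int  # normal GMA, no CP

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def as_percentages(table: TwoByTwo) -> dict:
    """Return the four measures as percentages rounded to one decimal place."""
    return {
        "sensitivity": round(100 * table.sensitivity(), 1),
        "specificity": round(100 * table.specificity(), 1),
        "ppv": round(100 * table.ppv(), 1),
        "npv": round(100 * table.npv(), 1),
    }


# Hypothetical counts (assumption): 13 infants with CP and 13 without.
early = TwoByTwo(tp=13, fp=7, fn=0, tn=6)
late = TwoByTwo(tp=11, fp=2, fn=2, tn=11)

print("early:", as_percentages(early))
print("late:", as_percentages(late))
```

With these assumed counts the early table gives 100.0, 46.2, 65.0 and 100.0, and the late table gives 84.6 for all four measures. Note that with so few infants, moving a single case between cells shifts a percentage by several points.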
Risk of Bias
Although this was a scoping review and did not require critical appraisal of the two included articles, the JBI critical appraisal tool64 helped to identify differences and similarities between these two case series. The main points are summarized here and details are presented in Table 8 (Additional file 5).
The quality of evidence derived from a review is largely dependent on the quality of the studies included. Neither study scored 100% on all ten questions of the checklist; the two studies scored 100% for six of the ten. On this basis, the two included studies were assessed as moderate quality case series with limitations. They scored well for using valid methods for identification of the condition for all participants, for clear reporting of the demographics of the participants, and for clear clinical information on the participants. The outcomes of the cases were clearly reported in both studies. Both also clearly reported the presenting sites' demographic information and used appropriate statistical analysis.
According to the JBI method, the authors should provide clear inclusion and exclusion criteria for the study participants. These criteria should be specified in sufficient detail, with all the information critical to the study. While Ferrari et al.62 fulfilled this criterion, Prechtl et al.63 did not state their exclusion criteria, which may limit the generalizability of their results. A good quality case series should clearly describe the method of measurement of the condition. This should be done in a standard (i.e. the same way for all patients) and reliable (i.e. with repeatable and reproducible results) way. The clinical condition for our study is NE. Both studies listed a number of criteria for possible inclusion for NE but did not state the number or combination of these criteria required for the diagnosis, and so scored 0.0% for this question. They did use a standard, albeit different, method for NE severity, with Ferrari et al.62 using the Sarnat staging10 and Prechtl et al.63 using the Levine method65. With regard to consecutive inclusion, studies that indicate consecutive inclusion are more reliable than those that do not. Neither of our included studies stated clearly whether every neonate meeting the inclusion criteria at their institutions during the identified periods was included consecutively. Thus they both scored 0.0% for this question. In a similar vein, the completeness of a case series contributes to its reliability: studies that indicate complete inclusion are more reliable than those that do not. Neither Ferrari et al.62 nor Prechtl et al.63 clearly stated that they included all the patients in their studies, and both scored 0.0% for this question.
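The per-question percentages described above can be tallied as follows. This is a simplified sketch under stated assumptions: the item labels are paraphrases of the points discussed in this section rather than the checklist's own wording, each item is coded as met or not met (the checklist itself may allow other responses such as "unclear"), and the percentage for an item is taken as the share of the two studies that met it. The authoritative scores are those in Table 8 (Additional file 5).

```python
# (Ferrari et al., Prechtl et al.): True = met, False = not met.
# Labels are paraphrased and the True/False coding is an assumption.
checklist = {
    "inclusion and exclusion criteria stated": (True, False),
    "condition measured in a standard, reliable way": (False, False),
    "valid methods to identify the condition": (True, True),
    "consecutive inclusion stated": (False, False),
    "complete inclusion stated": (False, False),
    "participant demographics reported": (True, True),
    "clinical information reported": (True, True),
    "outcomes reported": (True, True),
    "presenting site demographics reported": (True, True),
    "appropriate statistical analysis": (True, True),
}

# Share of the two studies meeting each item, as a percentage.
item_scores = {
    item: 100 * sum(met) / len(met) for item, met in checklist.items()
}

fully_met = [item for item, score in item_scores.items() if score == 100]
not_met = [item for item, score in item_scores.items() if score == 0]

print("items met by both studies:", len(fully_met))
print("items met by neither study:", len(not_met))
for item, score in item_scores.items():
    print(f"{score:5.1f}%  {item}")
```

Under this coding, six items are met by both studies and three by neither, which matches the summary given in the text.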
The biases include selection bias, information bias and sampling variation. Selection bias is typical of case series, as a case series is a chosen series of patients with a particular illness (NE) and a suspected linked outcome (CP)66. Selection bias limits the generalizability of results. Information bias is lower in retrospectively collected data, as it is determined by what is already documented in the medical chart. Both of these studies collected their data prospectively, making them susceptible to information bias. With regard to sampling variation, precise determination of the rate of a disease, other than by chance, requires a large sample size. Both studies can be described as employing small sample sizes: Ferrari et al.62 had 34 cases and Prechtl et al.63 had 26 cases, with a follow-up period of over three to four years. Sample size may have been limited by the collection method, as neither study stated whether it included every neonate meeting the inclusion criteria at their institutions during the identified periods.