Prediction Models for Successful External Cephalic Version: An Updated Systematic Review

Abstract Objective  To review the decision aids currently available or being developed to predict a patient's odds that their external cephalic version (ECV) will be successful. Study Design  We searched PubMed/MEDLINE, Cochrane Central, and ClinicalTrials.gov from 2015 to 2022. Articles from a pre-2015 systematic review were also included. We selected English-language articles describing or evaluating models (prediction rules) designed to predict an outcome of ECV for an individual patient. Acceptable model outcomes included cephalic presentation after the ECV attempt and whether the ECV ultimately resulted in a vaginal delivery. Two authors independently performed article selection following PRISMA 2020 guidelines. Since 2015, 380 unique records underwent title and abstract screening, and 49 reports underwent full-text review. Ultimately, 17 new articles and 8 from the prior review were included. Of the 25 articles, 22 proposed one to two models each for a total of 25 models, while the remaining 3 articles validated prior models without proposing new ones. Results  Of the 17 new articles, 10 were low, 6 moderate, and 1 high risk of bias. Almost all articles were from Europe (11/25) or Asia (10/25); only one study in the last 20 years was from the United States. The models found had diverse presentations including score charts, decision trees (flowcharts), and equations. The majority (13/25) had no form of validation and only 5/25 reached external validation. Only the Newman–Peacock model (United States, 1993) was repeatedly externally validated (Pakistan, 2012 and Portugal, 2018). Most models (14/25) were published in the last 5 years. In general, newer models were designed more robustly, used larger sample sizes, and were more mathematically rigorous. Thus, although they await further validation, there is great potential for these models to be more predictive than the Newman–Peacock model. 
Conclusion  Only the Newman–Peacock model is ready for regular clinical use. Many newer models are promising but require further validation. Key Points 25 ECV prediction models have been published; 14 were in the last 5 years. The Newman–Peacock model is currently the only one with sufficient validation for clinical use. Many newer models appear to perform better but await further validation.

Since the turn of the century, breech presentation has become one of the most common indications for primary cesarean delivery, as the landmark Term Breech Trial showed that vaginal delivery of a breech fetus increases the risk of death or serious morbidity for the neonate. 1,2 To decrease the prevalence of cesarean birth, current practice guidelines recommend offering patients an external cephalic version (ECV). This procedure attempts to manually reposition the fetus from breech into the cephalic position. 3 If successful, vaginal delivery can then safely follow an ECV, either immediately or in the weeks to follow. ECV procedures have proven to be a critical part of modern obstetrical practice by decreasing the number of cesarean deliveries and their associated complications. 4 ECV for a term breech presentation varies widely in success, ranging from 16 to 100%, with a pooled success rate of 58%. 3 While generally considered safe and noninvasive, ECVs can rarely result in complications, including placental abruption, umbilical cord prolapse, and fetal heart rate abnormalities. 5 Therefore, predicting whether an ECV is likely to be successful is an important clinical consideration. A patient with a low likelihood of success may opt for a planned cesarean delivery instead of first undergoing an ECV attempt.
Due to the high number of variables identified in relation to ECV outcomes, 7-9 researchers have turned to predictor models to calculate the likelihood of success of ECV. 10 A systematic review by Velzel et al published in 2015 examined prior predictor models for ECV success. 10 This article aims to update that systematic review with the new predictive models published in the last 7 years. In addition, we re-examine the older predictive models reviewed by Velzel et al and place them in the context of the new ones.

Sources
We searched the PubMed/MEDLINE, Cochrane Central, and ClinicalTrials.gov databases for all articles reporting ECVs published between January 1, 2015 and December 31, 2022. The starting date was chosen as the last date searched by the prior systematic review. 10 For PubMed/MEDLINE, the search criterion was "version, fetal[mesh] OR cephalic version [Title/Abstract]" with publication date as the only filter; the criteria for the other databases can be found in ►Supplementary Material S1, available in the online version. A clinical librarian assisted in developing the search strategy. The search results were exported to the web-based software Covidence (Veritas Health Innovation, Melbourne, Australia; available at: www.covidence.org).
Because "prediction models" have many synonyms (including clinical prediction rule, decision aid, clinical decision tool, clinical prediction algorithm, and prognostic model 11 ), we identified articles describing a prediction model by manual selection instead of by search criteria. Likewise, articles that were original works (rather than review articles) and that had an English translation were identified by manual selection.

Study Selection
This review focuses on predictor models for ECV success. For our purposes, a "model" is a multivariate function that reports the outcome of an ECV for an individual patient as a function of at least two variables that were known before the ECV was attempted. We included models regardless of the specific outcome returned. Possible outcomes included the odds of cephalic presentation at the end of the ECV procedure (i.e., initial ECV success), the odds of ultimately having a vaginal delivery, and the odds of ultimately having a vaginal delivery given that the ECV was initially successful. Prediction of the opposite of the aforementioned outcomes (e.g., odds of cesarean birth) was also acceptable, as such a model would be mathematically equivalent.
Two independently working reviewers (R.S.Y. and P.K.P.) selected articles by assessing titles and abstracts only. Any articles that were in conflict between the two reviewers were first discussed and, if there was no consensus, were included for full-text review. Similarly, both reviewers conducted the full-text review independently. Any disagreements after the full-text review stage were resolved with discussion. Throughout the process, we followed the updated (2020) version of the PRISMA guidelines (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) (►Supplementary Table S1, available in the online version). 12 After a list of included articles was created, citation searching was performed. Additionally, PubMed/MEDLINE was searched again for articles that either cited any of the included articles or were authored by the first or last author of any of the included studies. This was done iteratively until there were no new studies to include.

Quality Assessment
Velzel et al 10 customized a framework based on the guideline of Hayden et al in combination with the Quality in Prognosis Studies (QUIPS) tool to assess the prediction models in the studies for bias. 13,14 Four domains were analyzed to represent important aspects of prediction model studies: participants, predictors, outcome, and analysis. As recommended by Hayden et al, 14 the four domains were assessed in two steps.
During the first step, one to six signaling questions were introduced per domain (11 questions across all domains) and were scored with a "yes," "no," "partly," or "unclear." During the second step, we combined the scored responses to judge the overall risk of bias and quality. The weighted importance of certain domains was considered, with the "predictors" and "analysis" domains taking priority. Ultimately, the largest factor taken into account when assessing study quality was whether the outcome was likely to differ among groups within a similar population. The overall risk of bias and quality of the studies were then rated as low, moderate, or high risk accordingly.

Predictor Variable and Model Development Assessment
The development of a predictive model (clinical decision rule) is known to follow a three-stage sequence: derivation, validation, and impact analysis. 11,15 In this paradigm, the derivation stage encompasses the initial creation of the model and is usually done in a single article/study. Within this stage, the first step is to find a dataset upon which to derive the model (called the training data). Then, the univariate analysis step follows, whereby individual candidate predictor variables are compared between the success and failure groups, such as by a t-test or univariate logistic regression. Variables that seem to have predictive value (e.g., p-value < 0.2) progress to the third step: multivariate analysis. Usually, this step involves a multivariate logistic regression, with variables being removed or added to optimize the model's predictive value while preventing overfitting. Ideally, the predictive value of the final model is reported. Finally, the derivation stage culminates in a presentation step. A model presentation, such as a score chart, decision tree, nomogram, or equation, is created to practically use the model. If the multivariate analysis step was a logistic regression, then this step may not be explicitly stated, as the equation for the regression is assumed.
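The derivation steps above (univariate screening followed by multivariate logistic regression) can be sketched in code. This is a minimal illustration on synthetic data; the variable names (parity, AFI), coefficients, and threshold values are invented for the example and do not come from any of the reviewed ECV models.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical training data: two informative predictors and one noise
# variable; all coefficients below are invented for illustration.
parity = rng.integers(0, 4, n).astype(float)   # number of prior births
afi = rng.normal(14, 3, n)                     # amniotic fluid index (cm)
noise = rng.normal(0, 1, n)                    # unrelated variable
logit = -2.0 + 0.9 * parity + 0.12 * afi
success = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Univariate step: compare each candidate between the success and failure
# groups; keep variables with p < 0.2, the screening threshold noted above.
candidates = {"parity": parity, "afi": afi, "noise": noise}
kept = [name for name, x in candidates.items()
        if stats.ttest_ind(x[success == 1], x[success == 0]).pvalue < 0.2]

# Multivariate step: logistic regression on the retained predictors; the
# fitted equation itself can then serve as the model presentation.
X = np.column_stack([candidates[name] for name in kept])
model = LogisticRegression().fit(X, success)
probs = model.predict_proba(X)[:, 1]           # predicted odds of ECV success
```

In a real derivation study, the multivariate step would also involve stepwise removal or addition of variables and a check against overfitting, which this sketch omits.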
In the second stage, the validation stage, the model is tested on data from which it was not derived (these data are often referred to as the "testing data" or "validation data"). The easier and less rigorous form is internal validation, which can be performed by a random data split (whereby the original dataset is randomly divided into two groups: the training data and the validation data) or by advanced statistical techniques (cross-validation or bootstrapping).
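Both internal-validation approaches can be sketched briefly. The example below uses synthetic data and illustrates a random split and a bootstrap "optimism" correction; it is not drawn from any specific reviewed study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(1)
n = 600
X = rng.normal(size=(n, 3))                      # synthetic predictors
y = (rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))).astype(int)

# Internal validation by random data split: derive on the training part,
# evaluate on the held-out part.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
split_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

# Internal validation by bootstrapping: refit on resamples and estimate the
# "optimism" (AUC on the resample minus AUC back on the original training
# data), then subtract the average optimism from the apparent AUC.
apparent = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
optimism = []
for i in range(50):
    Xb, yb = resample(X_train, y_train, random_state=i)
    mb = LogisticRegression().fit(Xb, yb)
    auc_boot = roc_auc_score(yb, mb.predict_proba(Xb)[:, 1])
    auc_orig = roc_auc_score(y_train, mb.predict_proba(X_train)[:, 1])
    optimism.append(auc_boot - auc_orig)
corrected_auc = apparent - float(np.mean(optimism))
```

The optimism-corrected AUC gives a less biased estimate of performance than the apparent AUC while still using only the derivation dataset, which is why bootstrapping is considered more rigorous than a single random split.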
In external validation, the validation data have one or more fundamental differences (minor or major) from the training dataset. The smallest difference is found in temporal external validation, whereby the data are from the same source but from different years than the training data (e.g., training data are from 2018 to 2020 and testing data are from 2021 to 2023). In more rigorous ("broader") forms of validation, the investigational team, geographic region, or specific disease process may change. 16,17 Ideally, multiple types of validation should be performed, spanning many studies. 18 In our analysis, we considered two studies to be from the same investigational team if they shared at least one author and if they occurred at the same institution.
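Temporal external validation, the narrowest form described above, amounts to splitting a dataset by year rather than at random. A minimal sketch on synthetic data (the years and predictors are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 800
year = rng.integers(2018, 2024, n)   # each record tagged with a year, 2018-2023
X = rng.normal(size=(n, 2))          # synthetic predictors
y = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)

# Temporal split: derive the model on the earlier years and validate it on
# the later years from the same (simulated) source.
train = year <= 2020
test = ~train
model = LogisticRegression().fit(X[train], y[train])
temporal_auc = roc_auc_score(y[test], model.predict_proba(X[test])[:, 1])
```

Broader external validation differs only in where the test rows come from: a different hospital, region, or investigational team supplies them instead of later years of the same source.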
The final stage of model development is impact analysis, whereby a model is implemented into clinical practice and its ultimate impact on patient care is observed.
In this review, we read the original included articles, any articles they cited, any articles that cited the original articles, and any articles by the same investigational team to determine the latest stage of model development each model had reached.

Model Performance Assessment (Calibration and Discrimination)
For the models in this study, we assessed their calibration and discrimination metrics (if reported) to evaluate their overall model performance. Put simply, discrimination is how well the model can predict a discrete outcome (e.g., if the model predicts a successful ECV, will it actually be successful?). By contrast, calibration is how well the model can predict the odds of an outcome (e.g., if the model predicts a 75% chance of a successful ECV, will 75% of those cases be successful, no more and no less?). 19 Discrimination is easier to study and report because it can succinctly be summarized into one metric: the area under the curve (AUC) of the receiver operating characteristic (ROC) curve (although a figure of the ROC curve still adds value by showing point values of sensitivity, specificity, etc.). However, calibration is critical because, in actual clinical practice, physicians often prefer to use the odds reported by a model instead of just a binary outcome. The gold standard for reporting calibration is to display a calibration plot, which graphs predicted probability against observed probability. This cannot be as succinctly summarized into one statistic. A less ideal way of reporting calibration is the Hosmer-Lemeshow test, which returns a p-value that is statistically significant if a model is poorly calibrated; however, if the p-value is nonsignificant, this test offers little information on exactly how well calibrated a model is.
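The metrics above can be illustrated on synthetic predictions. The sketch below computes the AUC for discrimination, a Hosmer-Lemeshow-style decile chi-square statistic, and the points one would draw on a calibration plot; the data are simulated to be well calibrated by construction and do not represent any reviewed model.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 1000
# Hypothetical model output: predicted probabilities of ECV success, with
# outcomes drawn so that the predictions are well calibrated by construction.
predicted = rng.uniform(0.05, 0.95, n)
observed = (rng.random(n) < predicted).astype(int)

# Discrimination: AUC of the ROC curve (equivalently, the C-statistic).
auc = roc_auc_score(observed, predicted)

# Calibration, Hosmer-Lemeshow style: group by deciles of predicted risk
# and compare expected vs. observed successes with a chi-square statistic.
edges = np.quantile(predicted, np.linspace(0, 1, 11))
hl = 0.0
plot_points = []  # (mean predicted, observed rate) pairs for a calibration plot
for i in range(10):
    lo, hi = edges[i], edges[i + 1]
    mask = (predicted >= lo) & ((predicted < hi) if i < 9 else (predicted <= hi))
    exp_pos = predicted[mask].sum()            # expected successes in the group
    exp_neg = (1 - predicted[mask]).sum()      # expected failures
    obs_pos = observed[mask].sum()
    obs_neg = mask.sum() - obs_pos
    hl += (obs_pos - exp_pos) ** 2 / exp_pos + (obs_neg - exp_neg) ** 2 / exp_neg
    plot_points.append((predicted[mask].mean(), observed[mask].mean()))
p_value = stats.chi2.sf(hl, df=10 - 2)  # nonsignificant p suggests adequate calibration
```

Plotting `plot_points` against the diagonal gives the calibration plot described in the text, and it conveys far more than the single Hosmer-Lemeshow p-value: it shows where in the risk range the model over- or under-predicts.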
In our analysis, we reported whether articles included calibration and discrimination metrics, which metrics they used, whether the metrics were on the training or validation data, and what those values were.
(See ►Supplementary Material S4, available in the online version, for notes on categorizing Lau et al's models.) Thus, in total, the previous systematic review included eight articles describing seven models from six investigational teams.

New Studies from our 2015 to 2022 Systematic Search Update
Our computerized search for articles since January 2015 yielded a total of 412 references (308 PubMed/MEDLINE, 83 Cochrane Central, 21 ClinicalTrials.gov). Removal of 32 duplicates yielded 380 references for screening. During title and abstract review, 331 references were screened out, leaving 49 for full-text analysis. Of those 49 references, 3 were excluded because their full text was not available in English (2 French only, 1 Chinese only); 3 were removed for being abstracts only rather than full-length articles and thus not describing their methods and results in sufficient detail; 2 were excluded for being clinical trial registry entries without any available inferential statistics or analysis; and 25 were excluded for not meeting our criteria for a prediction model for ECV outcome (►Fig. 1, ►Supplementary Material S1 [available in the online version]).
Ultimately, 17 articles from our 2015 to 2022 search were included in this analysis, including 1 found from citation searching. These articles described a total of 18 models. With the incorporation of the 8 articles describing 7 models from the 2015 systematic review, 10 we analyzed a total of 25 different models across 25 articles from 20 investigational teams. Among the 25 articles, 22 proposed 1 to 2 models each while 3 validated already published models without proposing any new ones. An organization of the articles is displayed in ►Fig. 2.
This document was downloaded for personal use only.Unauthorized distribution is strictly prohibited.Predicting a Successful Cephalic Version-A Systematic Review Yerrabelli et al.
CAD, cephalic at delivery; this can be measured as whether the fetus had cephalic presentation on the day of delivery or whether the ultimate delivery (vaginal or C-section) was a breech delivery. IQR, interquartile range (25th-75th percentile). "Range" otherwise refers to the min-to-max range.

** This represents a value that was not directly reported, but was able to be calculated from information reported in the article (e.g., if the overall mean was not reported, but the means of various subgroups were).
Additionally, ►Tables 1 and 2 display the key components of the included studies, such as inclusion and exclusion criteria, location, and demographics. Additional information can be found in ►Supplementary Materials S5 and S6 (available in the online version).

Relationships between Articles
Of the 17 new articles, 4 had some form of overlap with articles from the prior systematic review (►Fig. 2). Two 29,30 were external validation studies of a 1993 prediction model 31 in a new population by a new investigational team; one of the validation studies also described a new model. 30 A third article 32 had a partially overlapping dataset with the investigational team's prior articles, 22,23 but proposed a model for predicting delivery type, while the prior articles focused on predicting cephalic presentation immediately after ECV. The fourth overlapping article 33 was performed in the same hospital and by some of the same researchers as their prior articles, 20,21 but used new data and proposed a new model.
The remaining 13 new articles used datasets, proposed models, and were performed by researchers entirely distinct from the 8 original articles. Of these, two were from the same team of authors and used the same dataset: Anand et al used the dataset to create a model predicting cephalic position after the ECV, while Palepu et al created a model to predict vaginal delivery. 34,35 The remaining 11 articles were completely independent from each other; 2 of these articles proposed two models each, 36,37 while the remaining 9 proposed a single model each [38][39][40][41][42][43][44][45][46] (►Fig. 2).
Two of the articles had an erratum: one was trivial 40,47 and only modified a sentence in the abstract, while the other was a correction of a figure describing the decision tree model. 41,48 See ►Supplementary Material S4, available in the online version, for notes on how distinct models were counted in certain circumstances.

Geographic Distribution
As shown in ►Table 3, Europe was the most represented region with 9 models taking place solely there (3 Spain, 2 Netherlands, 2 Germany, 2 Sweden), followed closely by Asia with 12 models (3 Hong Kong, 4 mainland China, 2 India, 1 Pakistan, 2 Israel). Only three models were from the United States, and only one of those was from the 21st century. 31,46,49 The remaining article was the only multinational study and was authored by a Canadian team; patients were primarily from Western Europe, Australia, South America, and Canada. 44

Notes to ►Table 3: a Articles were considered to be by the same investigational team if they had at least one overlapping author and were conducted at the same institution. b Primary research articles include those articles found by the previous or our systematic review. This count includes only the modeling research articles and not any background articles that describe the dataset (e.g., if the dataset was published for other reasons) or the ECV methodology. c Total number of patients is the sum of the sample sizes of the studies in the region. This represents the unique number of patients who received an ECV and were included in a dataset by modeling studies in that region. d Burgos et al's 2011 and 2012 dataset 22,23 contains all 1,000 ECV attempts in their hospital from 2002 to 2010; 517 of those attempts were successful. Burgos et al's 2015 dataset 32 contains all 627 successful ECVs in their hospital from 2002 to 2012. Since there is significant, but not complete, overlap, we do not know the exact number of unique patients included in at least one of their studies. e Hong Kong was transferred from the United Kingdom to the People's Republic of China on July 1, 1997. The Hong Kong articles were from July 1997 24 and March 2000. 50 We still separated the region from mainland China because of major cultural differences between the two regions. f Hutton et al's 2017 article used a multinational dataset from 22 countries. Patients were primarily from Western Europe, Australia, South America, and Canada. The author team was from Canada. 44

American Journal of Perinatology © 2024. Thieme. All rights reserved.

Notes to ►Table 4: Reporting the number of ECVs instead of the number of patients was also acceptable. d The reported success rate could be for any ECV outcome. Possibilities include cephalic presentation immediately after the procedure, ultimately having a vaginal delivery, and having a vaginal delivery given cephalic presentation immediately after the procedure. e Overall assessments only concern the operationalized items mentioned in this table. Other aspects will be discussed in the Discussion section. At least 10 "Yes" responses were required for a low risk of bias/high quality article and at least 6 for a medium risk of bias/medium quality article.

Notes to ►Table 5: Not all models had a multivariate analysis step that was distinct from the model presentation step. Therefore, for all models, the +/− evaluation was done based on the univariate analysis only. As an initial step, Hutton et al separated the data by parity and created a separate tree model for each parity (which is mathematically equivalent to a larger tree that combines both trees by parity); thus, it is unclear if they actually tested parity in the univariate analysis. k Among those articles using tocolysis, a diversity of agents was used: fenoterol, atosiban, terbutaline, ritodrine, hexoprenaline, and magnesium sulfate.

Quality and Bias
The results of the study quality assessment based on the signaling questions are shown in ►Table 4. Of the 17 new articles, 10 studies were identified as having low overall risk of bias and high study quality, 22,[33][34][35][36]39,41,[44][45][46] 6 studies were identified as having moderate bias and moderate study quality, 29,37,38,42,43,47 and 1 was considered to have high bias and low quality. 30 This is in addition to Velzel et al's assessment of the eight prior articles, of which four were of low risk of bias and high study quality and four were of moderate risk of bias and study quality. 10 Studies were considered to have moderate risk of bias and quality if model-building strategies were not described, as this introduces concerns regarding consistency in applying the models to a similar population.

Predictor Variables
The predictor variables used in the models are displayed in ►Table 5, divided into patient, ultrasound, clinical exam, and procedural characteristics. The most commonly included patient characteristic was parity (included in 19 of the 25 models), followed by maternal age (6/25 models) and gestational age at ECV (6/25). Among the ultrasound characteristics, the most commonly included were placental location (12/25), amniotic fluid amount (11/25), type of breech (6/25), and estimated fetal weight (EFW; 5/25). Among the clinical exam characteristics were breech engagement or station (10/25), ability to palpate or grasp the fetal head (8/25), uterine tone or contractions (5/25), and maternal weight gain or body mass index (BMI; 8/25). While the large majority of articles studied maternal weight or BMI, only a few indicated whether this was collected or intended to be a pregravid, peri-ECV, or peripartum BMI. Of the eight models using some form of maternal weight or BMI, one model used BMI at ECV, 46 one model used the BMI increase from prepregnancy to ECV, 39 two models from the same article used prepregnancy BMI, 36 and four models were not clear on when BMI was evaluated. 30,31,41,44

Phases of Model Development
Thirteen models had no form of validation. 44,45,49 Seven models (from six articles) had internal validation by either random split, bootstrapping, and/or cross-validation, but had no form of external validation. 24,33,34,36,41,46 Of the remaining five models with external validation, three models solely had temporal external validation, 21,23,50 while two had additional forms as well. 31,39 Lin et al's model had temporal external validation; additionally, the ECV operator was not restricted during the validation phase. 39 The Newman-Peacock (NP) model was proposed (n = 106) and temporally validated (n = 266) by an American team in 1993. 31 The model was externally validated again by independent teams in Pakistan in 2012 (n = 166) and Portugal in 2018 (n = 266) 29,30 (►Table 7). The validation studies were of appropriate size and showed measurable, though modest, predictive ability: the Pakistani study had an approximate AUC of 0.590 (not reported in the article, but able to be estimated from the given information) and the Portuguese study reported an AUC of 0.642. By comparison, the original study had 266 patients and an AUC of 0.727 (not reported in the article, but calculable from the given information). 29,30 With the exception of the aforementioned NP model, no other articles had external validation in another hospital system or by another investigational team, and no other models had a second external validation study.
Many models did not have a presentation. Of those that did, the most common were score charts, as shown in ►Tables 7-12, and equations, as shown in ►Supplementary Material S2, available in the online version. Two models were presented as decision trees (flowcharts) (►Figs. 3 and 4). Lau et al had a combination table (►Table 13). 24 We homogenized the different presentations to standardized formats and displayed them in the aforementioned tables and figures.

Model Performance (Calibration and Discrimination)
Only 7 of the 25 models displayed a calibration plot, which is the gold standard for calibration reporting (►Table 6). Nine of 25 models performed the Hosmer-Lemeshow statistical test, including 5 that performed it in addition to the calibration plot and 4 that performed it as the sole method of calibration reporting. Of the 14 articles that reported neither a calibration plot nor a Hosmer-Lemeshow test, 6 had some other metric that can loosely be used to measure calibration (e.g., likelihood ratios or observed vs. predicted chance by score value), while the remaining 8 had none whatsoever. Those articles that reported some form of calibration metric all reported good calibration, except Dong et al, which reported good calibration for their ECV success model but not their delivery outcome model. 36

Some metrics for model discrimination (e.g., sensitivity, AUC, accuracy, positive predictive value) were displayed in 17 of the 25 models (►Table 6). In 13 models, a value for the AUC of the ROC curve (or the mathematically equivalent C-statistic) was reported for at least one portion of the dataset. For 10 of these models, the article displayed the ROC curve as a figure. Four models did not have an AUC reported, but discrimination metrics such as accuracy, specificity, and sensitivity were reported or could be calculated from values reported in the text, tables, or figures.

This represents a value that was not directly reported, but was able to be calculated from information reported in the article.
d This represents a value that was not directly reported, but was able to be estimated from information reported in the article.

Discussion
The field of prediction modeling of ECVs is booming. The majority of models in this review were published in the last 5 years (14/25 from 2018-2022). Additionally, the articles that produced the more recent models are more robust, with larger sample sizes, better reporting, and more rigorous statistical measures (e.g., using bootstrapping and cross-validation, presenting calibration metrics, etc.). This is evidenced by the larger percentage of studies identified as high quality and low bias than in the prior review, despite using similar signaling questions. We suspect the increase in research activity and quality has two contributors: (1) increased clinical need for ECVs as vaginal breech deliveries have become increasingly rare, and (2) increased availability of programming and data science, both in the general public and in the medical community.

Areas for Improvement in the Literature and for Future Articles
Despite the increase in research output and quality, there remains much to be improved. First, in any field, prediction modeling studies of the highest quality should always evaluate and report calibration metrics. Clearer reporting is also needed for the measurement of amniotic fluid index (AFI), parity, and EFW; the specific definitions of certain variables, such as when BMI was taken; the exact definition of a successful model outcome; and the inclusion and frequency of transverse lies, repeat ECVs, and vaginal births after cesarean in the study group. Second, there are numerous articles in the earliest stages of development and scant work thereafter. Only five articles have been externally validated, and four of those were narrow validations. The NP model is the exception: it has been validated by new investigational teams in other countries (►Table 7). We believe the medical community would benefit if further research were directed away from development of new models and toward external validation of the many already published ones.
Interestingly, there have been geographical changes in the articles. In the last 3 years, most of the articles were from India, mainland China, and Israel, although there are also many articles from Europe in the last 5 years. Dahl et al's article is the only one from the United States since 1999. 46 Because geographic regions have significant differences in patient demographics, physician practice patterns, and cultural values, it is possible for a model to be successful in one region but unsuccessful in another. The increase in models from Asia benefits the research community by diversifying the model populations. However, there is a scarcity of United States models (either created or validated in the United States) relative to Europe. Therefore, we encourage more American researchers to pursue research in this exciting field.
Lastly, we provide recommendations on conducting future systematic review updates in this area in ►Supplementary Material S5, available in the online version.

Recommendations for Clinical Practice
Based on the findings in this review, the NP model for predicting ECV success (the odds of cephalic presentation after the ECV procedure) is currently the most clinically useful model. Even though it is the oldest model in the review, it includes three of the most widely used prediction features: parity, placental location, and station. Furthermore, it is the only model that has been externally validated in a significantly different population and by a different investigational team than the one that created the model.
With that said, the NP model's predictive value is limited because most individuals fall into the middle score groups and because the underlying studies did not consider oblique or transverse lies. With the improvement of data science techniques over the last several years, it is likely that one of the more recently proposed models has better sensitivity and specificity in predicting ECV success or the likelihood of vaginal delivery post-ECV. Nevertheless, without external validation, physicians should be cautious in employing these models in their patient populations, especially in locations distant from where the study was conducted. In the upcoming years, we anticipate further validation of several of the models in this review, which may yield a more clinically appropriate prediction model for counseling patients on the likelihood of ECV success.
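For reference, the sensitivity and specificity discussed above reduce to simple counts once a model's prediction is dichotomized; the sketch below uses hypothetical predictions and outcomes, not data from any reviewed study:

```python
# Minimal sketch of sensitivity and specificity for a dichotomized
# ECV-success prediction. The predictions and outcomes are hypothetical.

def sensitivity_specificity(pred, actual):
    """Return (sensitivity, specificity) for binary predictions."""
    tp = sum(1 for p, a in zip(pred, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(pred, actual) if p == 0 and a == 0)
    fn = sum(1 for p, a in zip(pred, actual) if p == 0 and a == 1)
    fp = sum(1 for p, a in zip(pred, actual) if p == 1 and a == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical model predictions (1 = predicted success) and outcomes.
pred   = [1, 1, 0, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 1, 0, 1, 0]
sens, spec = sensitivity_specificity(pred, actual)
print(sens, spec)  # → 0.75 0.75
```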
Table 13 Combination table for the model proposed by Lau et al in 1997 (this was one of the two models presented in the article, but the only one displayed as a combination table)24

Fig. 1 Flowchart of the study selection process, in agreement with the 2020 PRISMA guidelines.12

Fig. 2 Categorization of articles in this and the prior systematic review.
was acceptable if the information needed was available in a citation in the article (e.g., a previous article by the same authors describing methodology).
a Burgos 2011, Burgos 2012, and Burgos 2015 are three successive papers by the same team that include progressively larger datasets. Only Burgos 2011 and 2012 were included in Velzel 2015's systematic review.
b The Anand 2019 and Palepu 2022 articles use the same datasets.
c
American Journal of Perinatology © 2024. Thieme. All rights reserved.

d N− for the nulliparous model, but Y+ for the multiparous model (i.e., the variable was only univariately statistically significant and was ultimately included only for multiparous patients).
e Type of breech/overall fetal position (frank vs. non-frank breech).
f Position of the fetal spine/back (anterior, posterior, lateral).
g Velzel et al 2018 used 1/AFI instead of AFI directly in their logistic regression model.
h Aisenbrey used uterine tone and contractions synonymously, while Lau and López-Pérez distinguished the two. Wong used a definition of uterine tone that clearly did not consider contractions, while Kok/De Hundt used uterine tone but did not clarify their definition.
i Fetal head clock orientation relative to the abdomen. Zheng et al defined 5 options: fetal head under the maternal xiphoid process (1-11 o'clock), in the upper left maternal abdomen (1-2 o'clock), in the upper right maternal abdomen (10-11 o'clock), in the left maternal abdomen (2-3 o'clock), and in the right maternal abdomen (9-10 o'clock). Svensson et al defined four options: left arcus, left side, right arcus, right side.
j Breech location by abdominal exam (into/out of the pelvis) per Aisenbrey.
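Footnote g notes that Velzel et al 2018 entered 1/AFI, rather than AFI, into their logistic regression. The sketch below shows how such a reciprocal-transformed predictor enters a logistic model; the intercept and coefficients are hypothetical illustrations, not Velzel et al's fitted values:

```python
import math

# Sketch of a logistic regression prediction using 1/AFI as a predictor,
# as footnote g describes for Velzel et al 2018. The intercept and
# coefficients below are hypothetical and for illustration only.

def predict_ecv_success(afi_cm, parous):
    """Hypothetical predicted probability of ECV success."""
    b0, b_inv_afi, b_parity = 0.5, -6.0, 1.2  # made-up coefficients
    logit = b0 + b_inv_afi * (1.0 / afi_cm) + b_parity * parous
    return 1.0 / (1.0 + math.exp(-logit))

# With a negative coefficient on 1/AFI, a lower AFI (less fluid) drives
# 1/AFI up and the predicted probability of success down.
print(round(predict_ecv_success(afi_cm=15, parous=1), 3))
print(round(predict_ecv_success(afi_cm=6, parous=1), 3))
```

The reciprocal transform lets the model capture a steeply rising risk at very low fluid volumes while flattening out at normal volumes, which a linear AFI term cannot do.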

Source: Adapted with permission from Zheng LG, Zhang HL, Chen RX, et al. Scoring system to predict the success rate of external cephalic versions and determine the timing of the procedure. Eur Rev Med Pharmacol Sci. 2021;25(1):45-55. doi:10.26355/eurrev_202101_24345.

Fig. 4 Isakov 2019 decision tree, adapted to a standardized format. Percentages were not explicitly given by the article but were estimated from its figures.41,48
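Decision-tree models such as the one in Fig. 4 are evaluated as a cascade of binary splits ending in a predicted probability. In the sketch below, the split variables, thresholds, and leaf probabilities are hypothetical and are not taken from Isakov 2019:

```python
# Sketch of how a clinical decision tree like the one in Fig. 4 is
# evaluated: a cascade of binary splits ending in a predicted success
# probability. The split variables, thresholds, and leaf probabilities
# are hypothetical, not those of the Isakov 2019 tree.

def predict(parous, afi_cm, engaged):
    """Walk a toy decision tree and return a success probability."""
    if parous:
        if not engaged:
            return 0.80  # hypothetical leaf probability
        return 0.55
    # Nulliparous branch: split on amniotic fluid index.
    if afi_cm >= 10:
        return 0.45
    return 0.25

print(predict(parous=True, afi_cm=12, engaged=False))  # → 0.8
```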

Table 1 Key characteristics, inclusion criteria, and exclusion criteria of the included studies

Table 1 (Continued)
Abbreviations: AFI, amniotic fluid index; SDP, single deepest pocket; wga, weeks of gestational age; yo, years old.
a This indicates the location of the patient population, which is not necessarily the location of the author affiliations or of the journal.
b Transverse and oblique lies are generally known to have significantly higher success rates than true breech presentations. As such, authors may decide to exclude them from their model datasets but still perform ECVs on these patients.
c This reports all other exclusions not in the prior categories. The text was standardized to be more easily compared across articles.
d In other articles by Silva et al (e.g., Vaz de Macedo 2015), only the first ECV was used.
e Burgos et al 2015 used a dataset significantly overlapping the datasets from Burgos et al 2011 and Burgos et al 2012.
f Burgos et al 2015 stated, "Vaginal birth after cesarean delivery is standard practice at the study hospital and is attempted when there are no contraindications (≥1 previous classic cesarean, previous uterine surgery accessing the uterine cavity, previous uterine rupture, contraindications for vaginal birth, or >3 cesareans)." However, this was not mentioned in Burgos et al 2011 and Burgos et al 2012.
g Kok et al 2011 and De Hundt et al 2012 included both vaginal and "abdominal" delivery in their definition of multiparity. Therefore, it is implied that prior cesarean sections were not an exclusion criterion for these studies.
h Hutton et al 2017 included the last successful ECV attempt. If all failed, then they included the first ECV failure. Multiple attempts occurred in 138 (11.0%) cases.
i Svensson et al 2021 did not detail specific inclusion or exclusion criteria.
j Anand et al 2019 and Palepu et al 2022 were from the same group of authors. Anand et al 2019 report their dataset beginning January 1, 2010, while Palepu et al 2022 report it beginning January 1, 2011. Both had the same number of ECVs and reported the same end date (December 31, 2017). Thus, we presume that the earlier article had a typo in its start date.

Table 2 Population demographics of the included studies

Table 2 (Continued)
Abbreviations: SD, standard deviation.
Note: P(ECV) = probability of ECV success (i.e., cephalic presentation at the end of the ECV attempt). P(ECV + CAD) = probability of ECV success AND cephalic presentation at delivery. P(ECV + CAD + VD) = probability of ECV success AND cephalic presentation at delivery AND vaginal delivery (i.e., successful ECV attempt and vaginal delivery of a cephalic fetus). P(VD | ECV) = probability of vaginal delivery GIVEN successful ECV. P(VD | ECV + CAD) = probability of vaginal delivery GIVEN successful ECV AND cephalic presentation at delivery.
a Dahl et al did not report means: 11 (1.3%) oligohydramnios, 806 (96.8%) normal, 16 (1.9%) polyhydramnios. Presumably, data were missing for all but 833 (11 + 806 + 16) patients.
b Dahl et al's model outcome was cephalic presentation at the end of the ECV attempt and persistence until hospital discharge.
c Tasnim et al reported BMI in categories: BMI
d Silva et al did not report a mean for GA. Instead, they reported categories: 132 (39.3%) at 36 weeks, 134 (39.9%) at 37 weeks, 70 (20.8%) at 38 weeks.
e Burgos et al's 2011 and 2012 articles reported amniotic fluid amounts in categories: 33 (6.6%) scarce, 452 (90.4%) normal, 15 (3.0%) abundant (Phase 1 only).
f Burgos et al's 2015 article used a dataset significantly overlapping the datasets presented in Burgos et al's 2011 and 2012 articles.
g Burgos et al's 2015 article reported GA in categories: 451 (72.9%) at 37 weeks, 34 (5.4%) earlier, 140 (22.9%) later.
h Burgos et al's 2015 article only reported adverse events that were an indication for IOL/delivery or a specific delivery method (e.g., PROM).
i Burgos et al 2015 predicted vaginal delivery given a successful and uncomplicated ECV, cephalic presentation at delivery, and delivery at the same institution.
j López-Pérez et al did not report a mean for BMI. Instead, they reported categories: 64 (21.8%) at
k López-Pérez et al's model outcome was cephalic presentation at the end of the ECV attempt. They specifically stated that this was regardless of any complications, including emergency CS.
l The outcome for Kok 2011's, De Hundt 2012's, and Velzel 2018's articles (all published by the same investigational team) was cephalic presentation after the ECV attempt that was maintained for ≥30 minutes.
m Velzel et al's Table 1 did not clarify whether the reported values were means with min-max ranges or medians with min-max ranges. We assume the former based on wording in the methods.
n Hutton et al did not report mean values for maternal age, GA at ECV, BMI, and AFI. Instead, they reported the medians of 4 subgroups, which cannot be used to calculate an overall mean. The same situation occurred for Zheng et al's GA at ECV and AFI values, but with only two subgroups (failures vs. successes).
o Hutton et al included multiple ECV attempts in the same pregnancy. Thus, their model outcome was cephalic presentation BOTH at the end of the last ECV attempt AND at birth (Group A).
p Ebner et al did not report a mean for amniotic fluid. Instead, they reported categories: 15 (12.7%) reduced, 97 (82.2%) normal, 4 (3.4%) increased.
q Svensson et al did not report mean values for maternal age, GA at ECV, BMI, and amniotic fluid amount. Instead, they reported categories: maternal age:
r Bilgory et al did not report a mean for AFI. Instead, they reported categories: 66 (7.0%) at 5-7.9 cm, 756 (79.9%) at 8-19.9 cm, 124 (13.1%) at ≥20 cm.
s Dong et al did not report mean values for maternal age, GA, BMI, and AFI. Instead, they reported categories: maternal age: 72 (21.0%) at

Table 3 Aggregate geographic distribution of the articles and prediction models

Table 4 Risk of bias and study quality of the newly included studies

Table 5 Predictor variables evaluated and used by the prediction models found in the included studies

Table 5 (Continued)
Note: If there is both a general characteristics table and a table with binned groups, the latter was used when possible. If a characteristic is only found in the former (e.g., Dong et al 2022), then the former was used. If p-values/CIs are given both for a general comparison test (e.g., t-test) and for odds ratios, then the latter was used. If a CI, but not a p-value, was given, then the CI was used for assessing statistical significance.
a Velzel et al 2018 was the only one to include race. They included it as a binary variable, "Caucasian" or not.
b This is a change from Velzel et al's 2015 review.
c

Table 6 Evaluation of model development and model performance of the prediction models found in the included studies

Table 6 (Continued)
Note: See ►Supplementary Material S6, available in the online version, for more information; S6 describes the specific software programs used in the articles. In situations where it was not completely clear which data partition (training, internal validation, or external validation) the metrics were reported for, the text "implied" was added in parentheses; this represents a value that was not directly reported but could be calculated from information reported in the article.
a
c In Burgos 2011, an ROC curve is shown for a model that only has 3 of the 4 final variables. Burgos 2012 has the numbers for the final model, but no graphs.
d Dahl 2021 implies that the reported AUC numbers are for internal validation and not for the training cohort, but this is not explicitly stated. A similar situation occurred in Kok 2011.
e For Lau et al's uterine tone model, close examination of the article reveals methods that are functionally equivalent to k-fold cross-validation with k
f Dahl et al 2021 provide a web link for their calculator (https://www.ecvcalculator.com/). This calculator only works on Firefox as of July 2023; on Safari, Chrome, and Edge, it returns "There was something wrong, please contact the admin." after submitting the input.
g For the GNK PIMS score, it is unclear if Tasnim et al 2012 is the initial study describing model development or an external validation study. If the latter, the model development study was not referenced and could not be found.
h López-Pérez 2020 has a screenshot of a calculator, but there is no way for readers to access a working calculator.
i For López-Pérez 2020's accuracy and related metrics, the total is only 227 even though the n is 317 for the study. The article does not clarify why.
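Footnote e above mentions k-fold cross-validation. The following minimal sketch illustrates the resampling scheme itself; the toy outcomes and the deliberately trivial majority-class "model" are hypothetical stand-ins, not any reviewed model:

```python
# Minimal sketch of k-fold cross-validation: the data are split into k
# folds, and each fold in turn is held out for evaluation while the
# model is fit on the rest. The dataset and the "model" (predicting the
# majority outcome of the training folds) are toy stand-ins.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, near-equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(outcomes, k=5):
    """Mean held-out accuracy of a majority-class predictor."""
    accs = []
    for fold in k_fold_indices(len(outcomes), k):
        test = [outcomes[i] for i in fold]
        train = [y for i, y in enumerate(outcomes) if i not in fold]
        majority = 1 if sum(train) * 2 >= len(train) else 0
        accs.append(sum(1 for y in test if y == majority) / len(test))
    return sum(accs) / len(accs)

# Toy binary ECV outcomes (1 = success).
data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
print(round(cross_validate(data, k=5), 2))  # → 0.7
```

Because every observation is held out exactly once, the averaged accuracy is a less optimistic estimate of performance than evaluating on the training data.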

Table 7 Score chart for the Newman-Peacock model and associated success value by score value29-31
Source: Portions of the table were adapted with permission from Newman RB, Peacock BS, Peter VanDorsten J, Hunt HH. Predicting success of external cephalic version. Am J Obstet Gynecol. 1993;169(2):245-250. doi:10.1016/0002-9378(93)90071-P.
Abbreviations: AUC, area under the curve; SD, standard deviation.
Note: All three articles only reported numeric results for score groupings, not for individual score values. Furthermore, Tasnim et al did not adhere to the score groups defined by Newman et al (0-4, 5-7, 8-10). As such, aggregating the results of the three studies is not easily possible. However, we attempted to do so by extracting data at specific score values for Newman et al and Silva et al from the articles' figures and by computer simulations of possible data that achieve the reported means, SDs, and AUC. These results, as well as the aggregated results from all three studies, are in
a Procedure success is cephalic presentation immediately after the ECV procedure.
b AUC represents the area under the ROC curve and is a measure of model performance in the study population.
c

Table 9 Score chart proposed by Burgos et al in 2010 and 201222,23
Source: Adapted with permission from Burgos J, Cobos P, Rodriguez L, et al. Clinical score for the outcome of external cephalic version: a two-phase prospective study: Clinical score for external cephalic version. Aust N Z J Obstet Gynaecol. 2012;52(1):59-61. doi:10.1111/j.1479-828X.2011.01386.x.
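The structure of a score chart such as the NP model can be sketched as a point total mapped to a prognostic group. The component names and point values below are simplified placeholders, not the published chart; only the score groupings (0-4, 5-7, 8-10) come from the text above:

```python
# Sketch of how a score-chart model such as the Newman-Peacock (NP)
# chart turns per-variable points into a total score and a prognostic
# group. The component names and point assignments below are simplified
# placeholders, not the published chart; only the score groupings
# (0-4, 5-7, 8-10) are taken from the review text.

# Hypothetical points (0-2) for each predictor of an example patient.
EXAMPLE_POINTS = {
    "parity": 2,              # e.g., multiparous
    "placental_location": 1,  # e.g., lateral/fundal
    "station": 2,             # e.g., high, unengaged
    "dilatation": 1,
    "estimated_weight": 1,
}

def np_score_group(points):
    """Sum component points and map the total to a score group."""
    total = sum(points.values())
    if total <= 4:
        group = "0-4 (least favorable)"
    elif total <= 7:
        group = "5-7 (intermediate)"
    else:
        group = "8-10 (most favorable)"
    return total, group

total, group = np_score_group(EXAMPLE_POINTS)
print(total, group)  # the example patient totals 7, the intermediate group
```

A lookup of this form is what makes score charts attractive for bedside counseling: no calculator is needed beyond adding small integers.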

Table 8 Score chart proposed by Wong et al in 200050
Source: Adapted from Wong WM, Lao TT, Liu KL. Predicting the success of external cephalic version with a scoring system. A prospective, two-phase study. J Reprod Med. 2000;45(3):201-206.
a "The head was defined as palpable when the whole fetal head could be delineated per abdomen and was ballottable."50
b "The breech was considered unengaged if not fixed in the pelvis during abdominal examination."50
c "The uterine consistency was considered relaxed when, on deep palpation of the fetal parts, the fetal parts could be felt drifting away and returned to the examiner's hand. Otherwise, the uterus was considered tense."50
d A total score vs. procedure success table was not directly presented in Wong et al's article but can be extracted from the information given.

Table 10 Score chart proposed by De Hundt et al in their 2012 validation study of Kok et al's model20,52
Source: Adapted with permission from De Hundt M, Vlemmix F, Kok M, et al. External validation of a prediction model for successful external cephalic version. Am J Perinatol. 2012;29(03):231-236. doi:10.1055/s-0031-1285098.
a These values are estimated from ►Fig. 3 in De Hundt et al's 2012 article. The original data are not available.

Table 12 Score chart proposed by Zheng et al in 202138

Table 11 Score chart proposed by Anand et al in 201934
Source: Adapted with permission from Anand K, Keepanasseril A, Amala R, Nair NS. Development and validation of a clinical score to predict the probability of successful procedure in women undergoing external cephalic version. J Matern Fetal Neonatal Med. 2019;34(18):2925-2931. doi:10.1080/14767058.2019.1674803.
Note: Success results by score were not reported.
a Station indicates whether the fetal pole is engaged.