This meta-epidemiological study identified 235 systematic reviews addressing a question related to the prevalence of any clinical condition. Our investigations have found that across the included systematic reviews of prevalence, there are significant and important discrepancies in how these reviews consider searching, risk of bias and data synthesis. In line with our results, a recently published study assessed a random sample of 215 systematic reviews of prevalence and cumulative incidence, and also found great heterogeneity among the methods used to conduct these reviews.(7) In our view, the study conducted by Hoffmann et al. and our study are complimentary. For instance, in the first one, the authors included reviews published in any year and compared the characteristics of reviews published before or after 2015 and with or without metanalysis; in our study, we assessed other methodological characteristics of the included reviews, specially related to the conduction of meta-analysis.
One area for potential guidance to inform future systematic reviews of prevalence data is in the risk of bias or critical appraisal stage. As can be seen from the results of our review, there are a number of checklists and tools that have been used. Some of the tools utilized were not appropriate for this assessment, such as the Newcastle Ottawa Scale (designed for cohort and case-control studies)(8) or STROBE (a reporting standard)(9) as a quality assessment tool. Interestingly, some of the tools identified were designed specifically for studies reporting prevalence information, whilst other tools were adapted for this purpose, with the most adaptions to any one tool (n= 13, 5.5%) being made to the Newcastle Ottawa Scale. This is particularly concerning when reviews have used results of quality assessment to determine inclusion in the review, as was the case in 17% of the included studies. The combination of (a) using inadequate or inappropriate critical appraisal tools and (b) using the results of these tools to decide upon inclusion in a systematic review could lead to the inappropriate exclusion of relevant studies, which can alter the final results and produce misleading estimates. As such, there is an urgent need for the development and validation of a tool for assessing prevalence, along with endorsement and acceptance by the community to assist with standardization in this field. In the meantime, we urge reviewers to refer to the JBI critical appraisal tool,(4) which has been formally evaluated and is increasingly used across these types of reviews.
Encouragingly, 72.3% of the included reviews adhered to or cited a reporting guideline in their review. The main reporting guideline reported was the PRISMA statement.(6) However, this reporting guideline was designed particularly for reviews of interventions of therapies. As such, there have been multiple extensions to the original PRISMA statement for various review types (10, 11), yet no extension has yet been considered for systematic reviews of prevalence data. To ensure there is a reporting standard for use in prevalence systematic reviews, an extension or a broader version of the PRISMA statement including items important for this review type is recommended.
It was encouraging to see that multiple databases were often searched during the systematic review process, which is a recommendation for all review types. However, it is also important in systematic reviews that all the evidence is identified, and in the case of prevalence information, it may be particularly useful to search for data in unpublished sources, such as clinical registries, government reports, census data, and national administrative datasets for example.(12, 13) However, there are no clearly established procedures on how to deal with this kind of information. Further guidance on searching for evidence in prevalence reviews is required.
In our review, we found 64.7% of reviewers conducted meta-analysis. There has been debate regarding the appropriateness of meta-analysis within systematic reviews of prevalence, (14, 15) (16-18) largely surrounding whether synthesizing across different populations is appropriate, as we reasonably expect prevalence rates to vary across different contexts and where different diagnostic criteria may be employed. However, meta-analysis, when done appropriately and using correct methodology, can provide important information regarding the burden of disease, including identifying differences amongst populations and regions, changes over time, and can provide a summarized estimate that can be used when calculating baseline risk, such as in GRADE summary of findings tables.
The vast majority of meta-analyses used classic methods and a random effects model, which is appropriate in these types of reviews (17) (18). Although the common use of random effects across the reviews is encouraging, this is where the consistency ends, as we once again see considerable variability in the choice of methods to transform prevalence estimates for proportional meta-analysis. The most widely used approach was the Freeman-Tukey double arscine transformation. This has been recommended as the preferred methods for transformation (15, 18), although more recently it has come under question.(19) As such, further guidance and investigation into meta-analytical techniques is urgently required.
In reviews including meta-analysis, heterogeneity was assessed with the I2 in 94.7% of the included studies. Although I2 provides a useful indication of statistical heterogeneity amongst studies included in a meta-analysis, it can be misleading in cases where studies are providing large datasets with precise confidence intervals (such as in prevalence reviews).(20) Other assessments of heterogeneity, such as T2, may be more appropriate in these types of reviews. (20, 21) Prediction intervals include the expected range of true effects in similar studies.(22) This is a more conservative way to incorporate uncertainty in the analysis when true heterogeneity is expected; however, it is still underused in meta-analysis of prevalence. Further guidance on assessing heterogeneity in these types of reviews is required.
Of the 235 systematic reviews analyzed, only 9 (3.8%) included a formal quality assessment or process to establish certainty of the entire body of evidence. Separate to critical appraisal, quality assessment of the entire body of evidence considers other factors in addition to methodological assessment that may impact on the subsequent recommendations drawn from such evidence.(23) Of these 9 reviews, only 4 (1.7%) followed the GRADE approach (24) which is now the commonplace methodology to reliably and sensibly rate the quality of the body of evidence. The considerably small number of identified reviews that included formal quality assessment of the entire body of evidence might be linked to the lack of formal guidance from the GRADE working group. The GRADE approach was developed to assess issues related to interventions (using evidence either from interventional or observational studies) and for diagnostic tests, however there are several extensions for the application of GRADE methodology. While there is no formal guidance for GRADE in systematic reviews of prevalence, there is some guidance into the use of GRADE for baseline risk or overall prognosis (25) which may be useful for these types of reviews. While no formal guidance exists, using the guidance cited above can serve as an interim method and is recommended by the authors for all future systematic reviews of prevalence. Whilst the number of identified studies that utilized GRADE in particular is small, it is encouraging to see the adoption of these methods in systematic reviews of prevalence data, as these examples will help contribute to the design of formal guidance for this data type in future.
It is reasonable to expect some differences in how these types of reviews are conducted, as different authors groups will rationally disagree regarding key issues, such as whether Bayesian approaches are best or the ideal method for transforming data. However, the wide inconsistency and variability noted in our review are far beyond the range of what could be considered reasonable and is of considerable concern for a number of reasons. First (1); this lack of standardization may encourage unnecessary duplication of reviews as reviewers approach similar questions with their own preferred methods; (2) novice reviewers searching for exemplar reviews may follow inadequate methods as they conduct these reviews; (3) the general confusion in end users, peer reviewers and readers of these reviews as they are required to become accustomed to various ways of conducting these studies; (4) review authors themselves following inadequate approaches, missing key steps or information sources, and importantly increasing the potential for review authors to report inaccurate or misleading summarized estimates; (5) a lack of standardization across reviews limits the ability to streamline, automate or use artificial intelligence to assist systematic review production, which is a burgeoning field of inquiry and research; and (6) these poorly conducted and reported reviews may have limited or even detrimental impacts on the planning and provision of healthcare. As such, we urgently call for the following to occur to rectify these issues, (1); further methods development in this field and for updated guidance on the conduct of these types of reviews, (2); we urgently require a reporting standard for these reviews (such as an extension to PRISMA), (4); the development (or endorsement) of a tailored risk of bias tool for studies reporting prevalence estimates, (5); the further development and promotion of software(26) and training materials(27) for these review types to support authors conducting these reviews.
Limitations of our study
In this study, we have collated and interrogated the largest dataset of systematic reviews of prevalence currently available. Although only a sample, it is likely to be representative of all published prevalence systematic reviews, although it is important that we acknowledge we only searched MEDLINE over a period of 1 year using a search strategy that only retrieved studies with the terms “prevalence” and “systematic review” in the title. There is potential that systematic reviews of prevalence published in journals not indexed (or not published at all) are meaningfully different from those characterized in our results. However, given that reviews indexed on MEDLINE are (hypothetically) likely to be of higher quality than those not indexed, and given that we still identified substantial inconsistency, variability and potentially inappropriate practices in this sample, we doubt that a broader search will have altered our main conclusions significantly. Similarly, we believe that reviews that do not use the terms “prevalence” and “systematic review” in the title terms would have, overall, even more inappropriate methods. Regarding the timeframe limitation, we decided to include only reviews recently published because older reviews may not reflect the current practice.