“A study shows…” This phrase and its variations are commonly used in public discourse to imply that a research study unquestionably demonstrates an effect. However, a study’s results and conclusions should never be taken at face value; they are only as reliable as the study’s methods. To judge whether research can be trusted, a study must be critically appraised to assess whether methodological weaknesses may have pushed its estimates away from the truth. Critical appraisal involves evaluating the study’s design and potential errors to determine whether weaknesses in either area diminish the validity of the results and conclusions.
To assess the truthfulness of research aimed at improving human health, it is necessary to understand the fundamentals of how clinical studies are designed and conducted. Healthcare research often begins with pre-clinical studies that are performed in laboratory settings using various models, including cell cultures, invertebrates, and/or vertebrate animals. Pre-clinical studies demonstrate how biological systems and disease processes work and provide us with a basis for understanding if an intervention is safe for clinical research involving direct interaction with human subjects.
Clinical research seeks to identify and treat disease by examining the frequency, risk factors, underlying processes, natural history, available treatments, and prognostic factors of a disease. To identify these components, two or more groups are tested and compared to each other. Multiple types of clinical studies can be conducted - some more suitable than others depending on the objective - but there are two primary categories:
1) Interventional trials, where investigators assign an intervention to participants to prevent or treat disease. Examples include quasi-experimental trials and randomized controlled trials.
2) Observational studies, where the researchers observe populations or individuals but do not intervene, including ecological studies, cross-sectional studies, cohort studies, and case-control studies.
To treat disease, clinicians need to predict how a future patient will respond to a treatment based on how previous patients have responded. Clinical studies give healthcare providers the knowledge base necessary to make these predictions and to understand relationships between treatments, or risk factors, and outcomes. To predict how a population will respond to an intervention in the future, researchers perform sampling, a process where a small subset of a larger population is included in a trial. The study needs to be powered correctly, meaning the sample size is large enough, based on prior or assumed estimates of the effect and its variability, to detect a true difference between the groups being compared. Sampling allows interventions to be tested more quickly and with fewer resources than studying an entire population. Statistical inference then uses the sample to estimate population values, for example estimating the population mean from the sample mean, so that clinical predictions can be made.
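To make the idea of a correctly powered study concrete, the following is a minimal sketch of a sample size calculation for comparing the means of two equal-sized groups. It uses the common normal-approximation formula, n per group = 2 × (z for alpha + z for power)² × (SD / difference)², and is not taken from any particular trial. The function name, the default alpha of 0.05 and power of 0.80, and the blood pressure numbers are illustrative assumptions; real trials typically rely on dedicated software and a statistician.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate participants needed per group (hypothetical helper).

    delta: smallest difference in means worth detecting (assumed in advance)
    sigma: assumed standard deviation of the outcome in each group
    alpha: two-sided significance level
    power: desired probability of detecting a true difference of size delta
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)  # round up to whole participants


# Illustrative numbers only: detect a 5 mmHg difference when the SD is 10 mmHg.
print(sample_size_per_group(delta=5, sigma=10))
# Halving the difference to detect roughly quadruples the required sample.
print(sample_size_per_group(delta=2.5, sigma=10))
```

The sketch shows why smaller expected effects demand much larger trials: the required sample grows with the square of the ratio between the outcome's variability and the difference to be detected.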
To ensure that the estimate is as true as possible to the larger population, the trial needs to be free from errors and bias. Four types of errors exist:
1. Random errors, which occur purely due to chance.
2. Systematic errors, also known as bias, which affect the internal validity of the trial, that is, whether the study correctly answers the research question. Systematic errors are factors originating from the researchers and/or the participants that consistently shift the outcomes away from their true values.
3. Measurement errors, originating from the researchers, which are errors in the measurement of the exposure to the intervention, risk factors, and outcomes.
4. Design errors, which relate to the external validity of the results and address whether the researchers asked the correct question. Common design errors include studying the wrong sample, making the wrong comparison, or using an incorrect method of analysis.
Ideally, a trial will be free from systematic, measurement, and design errors, all of which are attributable to human actions. Because random errors occur purely by chance, they should be the only type of error that can influence the trial’s outcomes. Random errors are assessed via significance testing using a p-value. The p-value is the probability of observing a difference at least as large as the one found if chance, or in other words random error, were the only thing at work and no true difference existed between groups. For example, a p-value of less than 0.05 means that, if there were truly no difference, a difference this large would arise from random error less than 5% of the time. It is not the probability that the observed difference is itself due to chance. Assuming an ideal scenario, if the trial is free from systematic, measurement, and design errors and has been powered correctly, then the observed difference between groups can be considered a truthful estimate, which increases confidence in the trial’s results and conclusions.
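One way to see how significance testing relates a result to chance is a permutation test, sketched below under stated assumptions. If the treatment made no difference, the group labels would be arbitrary, so the code shuffles them many times and counts how often chance alone produces a difference in means at least as large as the one observed. The function name, the number of shuffles, and the outcome values are hypothetical and chosen only for illustration; this is one of several valid ways to obtain a p-value, not the method used by any study cited here.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in means (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign participants to groups at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            at_least_as_extreme += 1

    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical outcome scores, for illustration only.
treated = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5, 16.0, 14.4]
control = [11.0, 12.2, 11.8, 12.9, 12.4, 11.5, 13.1, 12.0]
print(permutation_p_value(treated, control))
```

A small value says only that a difference this large would be unusual if chance were the sole explanation. It says nothing about whether systematic, measurement, or design errors produced the difference.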
Unfortunately, the ideal scenario rarely, if ever, occurs. Instead, human-caused errors are common in research studies and lead to wide-ranging inconsistencies and contradictions in the literature. For example, discrepancies are evident when comparing randomized controlled trials to observational studies on the same topic. Observational studies examining the antioxidant properties of vitamins have demonstrated inverse associations with all-cause mortality, cancer, and cardiovascular disease. However, the apparent protective effect of antioxidant supplementation disappeared when multiple randomized controlled trials tested the same associations. Inconsistencies also occur between studies utilizing similar trial designs. Re-analysis of 37 randomized controlled trials changed 62% of the results, including re-interpretation of which patients should be treated and changes in the direction or magnitude of the treatment effect or in statistical significance. Similarly, out of 49 highly cited original clinical trials, subsequent studies contradicted 16% of the original research and found weaker treatment effects for a further 16%. Discrepancies may be due to errors in how trials are conducted, reported, and analyzed. Re-analysis of 250 controlled trials demonstrated that treatment effects were overestimated (P < 0.001) when trials used inadequate allocation concealment methods or failed to adequately report them.
Due to these inconsistencies and contradictions, the reported results of such trials cannot simply be accepted on faith. Of the error types that can distort results, systematic errors (bias) are among the most common and detrimental. Bias is defined as “a deviation away from the truth.” Hence, if research is biased, the obtained results will deviate from an accurate estimate of the true effect, leading to wrong conclusions.
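The practical difference between random error and bias can be illustrated with a small simulation, offered here as a sketch under invented assumptions. It draws simulated measurements around a known true value and optionally adds a constant offset to every reading to stand in for a systematic error, such as a miscalibrated instrument. The function name, the true value of 120, the spread of 10, and the offset of 5 are all hypothetical.

```python
import random


def estimate_mean(true_mean, sd, n, bias=0.0, seed=0):
    """Mean of n simulated readings; `bias` shifts every reading (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    readings = [rng.gauss(true_mean, sd) + bias for _ in range(n)]
    return sum(readings) / n


TRUE_MEAN = 120.0  # hypothetical true population value
SPREAD = 10.0      # hypothetical standard deviation (random error)
OFFSET = 5.0       # hypothetical systematic error added to each reading

for n in (10, 1000, 100000):
    unbiased = estimate_mean(TRUE_MEAN, SPREAD, n)
    biased = estimate_mean(TRUE_MEAN, SPREAD, n, bias=OFFSET)
    print(n, round(unbiased, 2), round(biased, 2))
```

As the sample grows, the unbiased estimate settles near the true value because random errors average out. The biased estimate settles just as firmly, but on the wrong value, which is why a larger sample cannot rescue a biased study.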
To evaluate systematic errors, the research is critically appraised to identify whether the following types of bias are present:
Numerous “assessment tools” exist that provide guidelines for assessing the above-listed bias domains. These tools have been created by various evidence-based healthcare organizations, such as Cochrane, the Scottish Intercollegiate Guidelines Network (SIGN), the Critical Appraisal Skills Programme (CASP), the Centre for Evidence Based Medicine (CEBM), and others. Assessment tools are available for each type of study design in the form of checklists, scales, and domain-based evaluation tools, and they guide the appraisal of biases that may limit a trial’s internal validity. The framework to assess the risk of bias of randomized controlled trials using the gold-standard assessment tool from Cochrane is provided in Part II of this series.