Background: Although randomized controlled trials (RCTs) are the gold standard for estimating treatment effects, they are often criticized for their limited generalizability. Observational data, which are readily available in large quantities, might alleviate this problem. However, observational data are potentially confounded. In this methodological study, we use a large-scale RCT as the benchmark to examine how well various statistical methods control for potential confounding in observational data.
Methods: We compare three types of methods that allow researchers to correct for such potential confounding: direct methods, inverse probability weighting (IPW) methods, and doubly robust (DR) methods. Uniquely, we compare estimates obtained from the population-wide Netherlands Cancer Registry (NCR) on colon cancer (n = 52,621) with estimates obtained from a large-scale RCT. Because the RCT differs from the observational data in both its sampling mechanism and its treatment assignment mechanism, we first resample the NCR data to reflect the distribution in the RCT data. Next, we correct for potential confounding using the three types of methods and subsequently compare their estimates with those obtained in the RCT.
Results: Although all estimators qualitatively approximate the findings of the RCT, methods that can flexibly model the response (i.e., direct estimation and DR estimation) consistently outperform the IPW method. Subgroup analysis indicates that relatively simple models suffice to estimate the treatment effect properly. However, these simple models do not properly identify heterogeneous treatment effects in stage 2 colon cancer. Careful sensitivity analysis using more flexible models reveals both the uncertainty about, and the potential heterogeneity of, the treatment effect in stage 2 cancer, and provides robust estimates of the treatment effect in stage 3 cancer.
Conclusions: Our results suggest that both the direct method and the DR method, when executed with care, can reliably estimate treatment effects from registry data. This methodological validation opens the door to more extensive use of registry data for the estimation of (individual) treatment effects. Additionally, the potentially meaningful subgroups of stage 2 colon cancer patients who, based on our analysis, seem to benefit from chemotherapy should be explored further.
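Illustration: The three estimator families named in the Methods (direct, IPW, and DR) can be sketched on toy data. The following is a minimal sketch only, not the analysis used in this study: the data are synthetic, the outcome is binary, there is a single binary confounder, and the nuisance models are simple stratified means. All function names, parameters, and the simulated risk difference of 0.1 are assumptions introduced for illustration.

```python
import random
from collections import defaultdict


def simulate(n=20000, seed=0):
    """Synthetic (hypothetical) data: binary covariate x confounds treatment t and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        t = int(rng.random() < (0.7 if x else 0.3))      # assignment depends on x
        y = int(rng.random() < 0.2 + 0.1 * t + 0.4 * x)  # assumed true risk difference: 0.1
        rows.append((x, t, y))
    return rows


def fit_nuisances(rows):
    """Stratified estimates of the propensity P(T=1 | x) and the outcome mean E[Y | t, x]."""
    n_x, n_tx, y_tx = defaultdict(int), defaultdict(int), defaultdict(int)
    for x, t, y in rows:
        n_x[x] += 1
        n_tx[(t, x)] += 1
        y_tx[(t, x)] += y
    propensity = {x: n_tx[(1, x)] / n_x[x] for x in n_x}
    outcome = {key: y_tx[key] / n_tx[key] for key in n_tx}
    return propensity, outcome


def estimate_effects(rows):
    """Return the naive, direct, IPW, and DR estimates of the average treatment effect."""
    e, m = fit_nuisances(rows)
    n = len(rows)
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    naive = sum(treated) / len(treated) - sum(control) / len(control)

    direct = ipw = dr = 0.0
    for x, t, y in rows:
        m1, m0, ex = m[(1, x)], m[(0, x)], e[x]
        w1, w0 = t / ex, (1 - t) / (1 - ex)  # inverse probability weights
        direct += m1 - m0                    # outcome-model contrast only
        ipw += w1 * y - w0 * y               # weighted outcomes only
        dr += (m1 - m0) + w1 * (y - m1) - w0 * (y - m0)  # contrast plus weighted residuals
    return {"naive": naive, "direct": direct / n, "ipw": ipw / n, "dr": dr / n}


estimates = estimate_effects(simulate())
for name, value in estimates.items():
    print(f"{name:>6}: {value:.3f}")
```

In this toy setting the unadjusted comparison is biased by the confounder, while the three adjusted estimators coincide because the stratified nuisance models are saturated. With parametric or more flexible models on real registry data, the estimators would generally differ, which is the situation the comparison in this study addresses.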