Background: Cochran’s Q statistic is routinely used for testing heterogeneity in meta-analysis. Its expected value (under an incorrect null distribution) is part of several popular estimators of the between-study variance, τ². Those applications generally do not account for estimation of the variances in the inverse-variance weights. Importantly, those weights make approximating the distribution of Q (more explicitly, QIV) rather complicated.
Methods: As an alternative, we investigate a Q statistic, QF, whose constant weights use only the studies’ effective sample sizes. For log-odds-ratio (LOR), log-relative-risk (LRR), and risk difference (RD) as the measures of effect, we study, by simulation, approximations to the distributions of QIV and QF as the basis for tests of heterogeneity.
Results: For LOR and LRR, a two-moment gamma approximation to the distribution of QF works well for small sample sizes, and an approximation based on an algorithm of Farebrother is recommended for larger sample sizes. For RD, the Farebrother approximation works very well, even for small sample sizes. For QIV, the standard chi-square approximation provides levels that are much too low for LOR and LRR and too high for RD. The Kulinskaya et al. (2011) approximation for RD and the Kulinskaya and Dollinger (2015) approximation for LOR work well for n ≥ 100 but have some convergence issues for very small sample sizes combined with small probabilities.
Conclusions: The performance of the standard χ² approximation is inadequate for all three binary effect measures. Instead, we recommend using a test of heterogeneity based on QF and provide practical guidelines for choosing an appropriate test at the .05 level for all three effect measures.
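
Illustration: The following is a minimal sketch, not the authors' implementation, of how the two statistics differ only in their weights, shown for LOR. Several details are assumptions made for illustration: the effective sample size is taken to be n_T n_C / (n_T + n_C), 0.5 is added to every cell of each 2×2 table before computing LOR and its variance, and the study counts are hypothetical. The function names cochran_q and log_odds_ratio are likewise hypothetical. The sketch computes only the statistics; it does not implement the gamma or Farebrother approximations to the null distribution.

```python
import math


def cochran_q(effects, weights):
    """Weighted sum of squared deviations from the weighted mean effect."""
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))


def log_odds_ratio(x_t, n_t, x_c, n_c, cc=0.5):
    """Return (LOR, estimated variance); cc is added to each cell (assumption)."""
    a, b = x_t + cc, n_t - x_t + cc   # treatment arm: events, non-events
    c, d = x_c + cc, n_c - x_c + cc   # control arm: events, non-events
    lor = math.log(a * d / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return lor, var


# Hypothetical studies: (events_T, n_T, events_C, n_C)
studies = [(12, 50, 8, 50), (20, 80, 15, 75), (5, 30, 9, 32), (30, 120, 22, 118)]

effects, variances, eff_n = [], [], []
for x_t, n_t, x_c, n_c in studies:
    lor, var = log_odds_ratio(x_t, n_t, x_c, n_c)
    effects.append(lor)
    variances.append(var)
    # Assumed form of the effective sample size; depends on sample sizes only
    eff_n.append(n_t * n_c / (n_t + n_c))

# QIV: weights are reciprocals of the estimated variances (random quantities)
q_iv = cochran_q(effects, [1 / v for v in variances])
# QF: weights are constants that do not depend on the observed event counts
q_f = cochran_q(effects, eff_n)

print(f"Q_IV = {q_iv:.4f}, Q_F = {q_f:.4f}")
```

Because the QF weights do not depend on the observed event counts, they are fixed once the sample sizes are known, whereas the QIV weights are themselves estimates.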