Revisiting Statistics and Evidence-Based Medicine: On the Fallacy of the Effect Size Based on Correlation and Misconception of Contingency Tables

BACKGROUND: Evidence-based medicine (EBM) is in crisis, in part due to bad methods, which are understood as the misuse of statistics that are themselves considered correct. The correctness of the basic statistics related to the effect size (ES) based on correlation (CBES) was questioned. METHODS: Monte Carlo simulation of two paired binary samples, mathematical analysis, conceptual analysis, bias analysis. RESULTS: Actual effect size and CBES are not related. CBES is a fallacy based on a misunderstanding of correlation and ES and on confusion arising from the term "2 × 2 table," which makes no distinction between gross crosstabs (GCTs) and contingency tables (CTs). This leads to the misapplication of Pearson's Phi, designed for CTs, to GCTs and to confusion of the resulting gross Pearson Phi, or mean-square effect half-size, with the implied Pearson mean square contingency coefficient. Generalizing this binary fallacy to continuous data and to correlation in general (Pearson's r) resulted in flawed equations directly expressing ES in terms of the correlation coefficient, which is impossible without including covariance, so these equations and the whole CBES concept are fundamentally wrong. Misconception of contingency tables (MCT) is a series of related misconceptions due to confusion with 2 × 2 tables and misapplication of related statistics. Problems arising from these fallacies are discussed, and the necessary changes to the corpus of statistics are proposed, resolving the problem of correlation and ES in paired binary data. CONCLUSIONS: Two related common misconceptions in statistics have been exposed, CBES and MCT. The misconceptions are threatening because most of the findings from contingency tables, including meta-analyses, can be misleading. Since exposing these fallacies casts doubt on the reliability of the statistical foundations of EBM in general, we urgently need to revise them.


KEY FINDINGS
The study:
• exposes for the first time two related common misconceptions in statistics, the fallacy of effect size based on correlation and the misconception of contingency tables;
• shows that the misconceptions are threatening and that most contingency table findings, including meta-analyses based on correlation, can be misleading;
• resolves the problems arising from these fallacies and proposes the necessary changes to the corpus of statistics;
• clarifies existing and introduces new statistical definitions, including for 2 × 2 tables, creating the basis for further development;
• questions the reliability of the statistical foundations of EBM and, for the first time, states the need to revise them.

INTRODUCTION
Evidence-based medicine (EBM) is "one of our greatest human creations," [1,2] but there is a growing awareness that it is undergoing a crisis, [1,3,4,5,6] in part due to "bad methods." [1] However, bad methods are usually reduced to the misuse of statistics, [1] that is, the misuse of methods that are correct in themselves. [6] Therefore, confidence in the reliability of the statistical foundations is the cornerstone of EBM. Unfortunately, there is cause for concern. This article exposes two common misconceptions in statistics: the fallacy of the effect size based on correlation (CBES), which has been around for over 70 years and has remained unnoticed, and a related misconception of contingency tables (MCT).
The concept of CBES is included in all statistical and meta-analysis manuals and is widely used, especially in psychometrics. [7,8,9,10,11] The basic equation [10]

[1] $r = \dfrac{d}{\sqrt{d^2 + a}}, \qquad a = \dfrac{(n_0 + n_1)^2}{n_0 n_1}$,

where $d$ is the effect size (ES) known as Cohen's d, $r$ is the coefficient of bivariate correlation commonly known as Pearson's (product-moment) coefficient of correlation, [12] and $n_0$, $n_1$ are the group sizes, given equal groups ($n_0 = n_1 = n$), reduces to

[2] $a = 4$,

so

[3] $r \cong \dfrac{d}{\sqrt{d^2 + 4}}$; [13]

this is the basic formula used by Cohen. [14] The corresponding equation for the dependence of $d$ on $r$ is

[4] $d = \dfrac{2r}{\sqrt{1 - r^2}}$. [9,14]

The CBES was the weakest point in Cohen's effect size concept, since the large ES ($d = 0.8$) corresponded to $r = 0.371$, which is a weak to moderate correlation according to Pearson. To get around this problem, Cohen had to introduce a "biserial" estimate connected to the raw "point" estimate by a correction factor of 1.253, [15] but even the adjusted "large" $r$ was only 0.465 and was still lower than Pearson's strong correlation limit of 0.5 (and much lower than the modern limits of 0.7 or 0.6 [16]). Motivated by this discrepancy, I investigated ES versus correlation to identify its cause.
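The figures quoted above can be checked directly. The sketch below assumes that equations [3] and [4] take Cohen's standard equal-group form, $r = d/\sqrt{d^2+4}$ and $d = 2r/\sqrt{1-r^2}$, and applies the 1.253 point-to-biserial factor mentioned in the text; the function names are illustrative only.

```python
import math

# Equation [3] (equal groups, as assumed here): r = d / sqrt(d^2 + 4)
def r_from_d(d: float) -> float:
    return d / math.sqrt(d * d + 4)

# Equation [4] (as assumed here): d = 2r / sqrt(1 - r^2), the inverse of [3]
def d_from_r(r: float) -> float:
    return 2 * r / math.sqrt(1 - r * r)

# Cohen's point-to-biserial correction factor quoted in the text
BISERIAL_FACTOR = 1.253

d_large = 0.8  # Cohen's "large" effect size
r_point = r_from_d(d_large)
r_biserial = r_point * BISERIAL_FACTOR

print(f"point r = {r_point:.3f}")                 # point r = 0.371
print(f"biserial r = {r_biserial:.3f}")           # biserial r = 0.465
print(f"round trip d = {d_from_r(r_point):.3f}")  # round trip d = 0.800
```

The round trip only confirms that [3] and [4] are algebraic inverses; it says nothing about whether either quantity describes the data.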

METHODS
A simple model shown in Table 1 was used to analyze ES versus correlation. A working model in MS Excel is available in the Supplement.

RESULTS
In conflict with equations [2]-[4], which imply a functional relationship between CBES and the actual effect size (AES), the simulation showed an extremely weak positive correlation between them (Figure 1A), as well as between AES and correlation (Figure 1B) and between AES and covariance (Figure 1C) (0 < R² < 0.02 in all cases). The result did not depend on the parameters of the samples (μ, σ). Note the striking difference between AES and CBES (Table 1). Thus, CBES and correlation are not related to AES, suggesting that equations [2]-[4] and the whole CBES concept are flawed.
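Because the Table 1 model itself is not reproduced here, the following is only an assumption-laden sketch of the kind of Monte Carlo comparison described: it draws paired binary samples with varying association, computes an "actual" ES (assumed here to be the mean difference over the pooled SD) and a CBES (Pearson's r converted by equation [4]), and reports R² between them. The sample generator, parameter ranges, and the 0.99 cut-off are hypothetical choices, so the printed R² need not match Figure 1.

```python
import math
import random

def mean(xs):
    return math.fsum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return math.sqrt(math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (sd(xs) * sd(ys))

def aes(x0, x1):
    # ASSUMPTION: actual ES = mean difference over the pooled SD (equal n)
    sp = math.sqrt((sd(x0) ** 2 + sd(x1) ** 2) / 2)
    return (mean(x1) - mean(x0)) / sp

def cbes(x0, x1):
    # CBES via equation [4]: d = 2r / sqrt(1 - r^2)
    r = pearson_r(x0, x1)
    if abs(r) > 0.99:
        raise ValueError("near-perfect association; equation [4] diverges")
    return 2 * r / math.sqrt(1 - r * r)

def paired_binary_sample(rng, n):
    # HYPOTHETICAL generator: with probability |link|, x1 copies (link > 0)
    # or flips (link < 0) x0; otherwise x1 is an independent Bernoulli(p1) draw.
    p0, p1 = rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)
    link = rng.uniform(-0.9, 0.9)
    x0, x1 = [], []
    for _ in range(n):
        a = 1 if rng.random() < p0 else 0
        if rng.random() < abs(link):
            b = a if link > 0 else 1 - a
        else:
            b = 1 if rng.random() < p1 else 0
        x0.append(a)
        x1.append(b)
    return x0, x1

def simulate(iterations=100, n=50, seed=1):
    rng = random.Random(seed)
    aes_vals, cbes_vals = [], []
    while len(aes_vals) < iterations:
        x0, x1 = paired_binary_sample(rng, n)
        try:
            a, c = aes(x0, x1), cbes(x0, x1)
        except (ZeroDivisionError, ValueError):
            continue  # skip degenerate draws: a constant sample or |r| near 1
        aes_vals.append(a)
        cbes_vals.append(c)
    return aes_vals, cbes_vals

aes_vals, cbes_vals = simulate()
r2 = pearson_r(aes_vals, cbes_vals) ** 2
print(f"R^2 between AES and CBES over {len(aes_vals)} iterations: {r2:.3f}")
```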

DISCUSSION
Although, since Cohen's milestone monograph, [7] the relationship between correlation and ES has seemed apparent and even trivial, it is actually a logical fallacy stemming from the simplistic notion that, since both are related to between-group differences, they are interrelated and therefore mutually convertible. As the mean difference normalized to variability (Table 2), ES is a kind of signal-to-noise ratio that characterizes the magnitude of the mean difference, regardless of the concordance of specific (paired) differences, and therefore applies to any sample, paired or unpaired.
Correlation, which is covariance cleared of variance (i.e., covariance divided by the product of the standard deviations), is a measure of relationship (causation or dependence) that characterizes the concordance of specific (paired) differences, regardless of their magnitude, and therefore applies only to paired samples. Thus, these are fundamentally different parameters that in principle cannot be reduced to each other, which can be proved mathematically.
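The contrast between the two parameters can be illustrated with a hypothetical paired sample: re-sorting one sample destroys the pairing but keeps the values, so an ES computed from means and pooled SD is unchanged, while Pearson's r changes. The data, seed, and function names below are illustrative assumptions.

```python
import math
import random

def mean(xs):
    return math.fsum(xs) / len(xs)  # fsum: the result does not depend on element order

def sd(xs):
    m = mean(xs)
    return math.sqrt(math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def cohen_d(x0, x1):
    # Mean difference over the pooled SD (equal group sizes); ignores pairing
    sp = math.sqrt((sd(x0) ** 2 + sd(x1) ** 2) / 2)
    return (mean(x1) - mean(x0)) / sp

def pearson_r(x0, x1):
    # Depends on which x1 value is paired with which x0 value
    m0, m1 = mean(x0), mean(x1)
    cov = math.fsum((a - m0) * (b - m1) for a, b in zip(x0, x1)) / (len(x0) - 1)
    return cov / (sd(x0) * sd(x1))

rng = random.Random(7)
x0 = [rng.gauss(10, 2) for _ in range(30)]
x1 = [x + rng.gauss(1, 1) for x in x0]  # hypothetical, strongly concordant pairs
x1_resorted = x1[:]
rng.shuffle(x1_resorted)  # same values, pairing destroyed

print(f"d: paired {cohen_d(x0, x1):.4f}, re-sorted {cohen_d(x0, x1_resorted):.4f}")
print(f"r: paired {pearson_r(x0, x1):.4f}, re-sorted {pearson_r(x0, x1_resorted):.4f}")
```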
Correlation coefficient $r$ (equation [8]), given equal variances ($s_0^2 = s_1^2 = s^2$), comes down to $r = \mathrm{cov}/s^2$, so the ES, following equation [7], is expressed through both variance and covariance (equation [11]). Variance is a pure measure of sample variability, correlation is a pure measure of the association (concordance of changes) of samples, and covariance is a complex parameter that combines variability and association. As seen from Table 3, correlation has nothing to do with variance, so the variance-based ES has nothing to do with correlation (Figure 1B). ES is also not related to covariance (Figure 1C), while correlation and covariance are strongly correlated (e.g., R² ≈ 0.9 in Figure 1D), since both depend on association. A visual representation of the CBES fallacy is given in Figure 2 (Table S1).

Figure 2. … Left column: discordant cases (r = −1); right column: concordant cases (r = 1).
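A small numeric check of these statements, using a hypothetical pair of samples in which one is a permutation of the other (so the variances are equal): rescaling both samples by k multiplies variance and covariance by k², leaves r unchanged, and with equal variances r equals cov/s².

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def var(xs):
    return cov(xs, xs)

def pearson_r(xs, ys):
    return cov(xs, ys) / math.sqrt(var(xs) * var(ys))

# HYPOTHETICAL data: x1 is a permutation of x0, so both have equal variances
x0 = [1, 2, 3, 4, 5]
x1 = [2, 1, 4, 3, 5]

for k in (1, 3):  # rescale both samples by k
    a = [k * x for x in x0]
    b = [k * x for x in x1]
    print(f"k={k}: var={var(a)}, cov={cov(a, b)}, r={pearson_r(a, b):.3f}, "
          f"cov/var={cov(a, b) / var(a):.3f}")
```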
Another reason why the inconvertibility of correlation and ES remains obscured is rooted in the misconception of contingency tables (MCT). Currently, any crosstab is considered a contingency table by default, [17,18] resulting in the severe fallacy shown in Figure 3.
Thus, we come to new definitions of crosstabs:
• Categorical crosstab is an n × m matrix that displays the mutual frequency distribution of two categorical variables having n and m categories, respectively.
• Binary crosstab (BCT) is a 2 × 2 matrix that displays the mutual frequency distribution of two binary variables.
• Gross crosstab (GCT) is a BCT that displays the mutual frequency distribution of two different binary variables, one of which is paired (i.e., its options are interrelated).
• Contingency table (CT) is a BCT that displays the frequency distribution of the feature pairs against the paired causal variable, so that the marginal statistics of the CT match the cells of the parent GCT for these paired binary variables (Figure 3-B1).
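One way to read these definitions in code, assuming a GCT whose rows are the paired condition (0 or 1) and whose columns are the binary outcome, and a CT that cross-classifies the outcomes of each pair: the sketch builds both tables from hypothetical paired data and shows that the CT margins reproduce the GCT cells. The layout and function names are assumptions for illustration.

```python
from collections import Counter

def gross_crosstab(x0, x1):
    # GCT (one reading): rows = condition 0/1, columns = outcome 0/1; counts observations
    return [[x0.count(0), x0.count(1)],
            [x1.count(0), x1.count(1)]]

def contingency_table(x0, x1):
    # CT: rows = outcome under condition 0, columns = outcome under condition 1; counts pairs
    pairs = Counter(zip(x0, x1))
    return [[pairs[(0, 0)], pairs[(0, 1)]],
            [pairs[(1, 0)], pairs[(1, 1)]]]

# HYPOTHETICAL paired binary outcomes for 10 subjects under conditions 0 and 1
x0 = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
x1 = [0, 1, 1, 1, 0, 1, 1, 1, 1, 1]

gct = gross_crosstab(x0, x1)    # 20 observations
ct = contingency_table(x0, x1)  # 10 pairs
row_margins = [sum(row) for row in ct]
col_margins = [sum(col) for col in zip(*ct)]
print("GCT:", gct)
print("CT:", ct)
print("CT row margins:", row_margins, "CT column margins:", col_margins)
```

Here the CT row margins reproduce the condition-0 row of the GCT and its column margins reproduce the condition-1 row; the converse does not hold, since many CTs share the same GCT.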
The differences between BCT, GCT and CT are summarized in Table 4.

Table 4. Differences between binary crosstabs, gross crosstabs and contingency tables.

CT has a double-decker design (Figure 3-IIB1), [26] where the first level is formed by the feature binary variable and the second by the causal binary variable, and it counts pairs. GCT and BCT … The example of thoroughbred racehorses [19] (Figure 3, … [27] and in large samples matches equation [4], albeit it is totally different in nature.

Figure 4. Binary effect sizes versus the gross Pearson Phi (φ̄): a Monte Carlo simulation (100 iterations). δb, the biased effect size by equation [4]; δs, the simplified effect size by equation [14]; d, the binary Cohen's d by equation [15].
So, the term Pearson's Phi (φ), or "mean square contingency coefficient," should only be applied to CTs. The result of applying equations [12] and [13] to GCTs is in fact the "mean-square (or chi-square) effect half-size" and should be denoted as the gross Pearson Phi (φ̄). Figure 3 shows that φ̄, which is an effect size parameter (part A), has nothing to do with φ (part B), which is a correlation parameter.
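The sketch below applies the standard signed 2 × 2 formula, $\varphi = (ad - bc)/\sqrt{(a+b)(c+d)(a+c)(b+d)}$, and the non-directional form $\sqrt{\chi^2/N}$ to a hypothetical GCT and to the CT built from the same 10 pairs. Treating these as the forms of equations [12] and [13], respectively, is an assumption; the point is only that the two tables give different values, so φ̄ from a GCT is not φ from its CT.

```python
import math

def phi_signed(t):
    # Directional 2x2 phi: (ad - bc) / sqrt of the product of the four margins
    (a, b), (c, d) = t
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def chi_square(t):
    # Pearson chi-square without continuity correction
    n = sum(map(sum, t))
    rows = [sum(row) for row in t]
    cols = [sum(col) for col in zip(*t)]
    return sum((t[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))

def phi_from_chi_square(t):
    # Non-directional form: sqrt(chi^2 / N); the sign is lost
    return math.sqrt(chi_square(t) / sum(map(sum, t)))

# HYPOTHETICAL tables from the same 10 paired binary observations
gct = [[4, 6], [2, 8]]  # rows: condition 0/1; columns: outcome 0/1 (counts observations)
ct = [[1, 3], [1, 5]]   # rows: outcome under condition 0; columns: under condition 1 (counts pairs)

for name, t in (("GCT", gct), ("CT", ct)):
    print(name, round(phi_signed(t), 4), round(phi_from_chi_square(t), 4))
```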
The last source of the CBES fallacy is equation [6], which establishes a relationship between correlation and $S_p$. Since $S_p$ determines the ES (equation [7]), which is functionally related to the significance of differences (SOD),

[16] $t = d\sqrt{n/2}$, [10]

it seems logical to conclude that correlation and ES are mutually related. This is a fallacy, the nature of which is parsed in Figure 5.
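For two independent groups of equal size n, Student's pooled-variance t and Cohen's d (mean difference over the pooled SD) satisfy $t = d\sqrt{n/2}$, which is one way to read the SOD relation above; treating it as the intended form is an assumption. The sketch checks the identity numerically on hypothetical normal samples.

```python
import math
import random

def mean(xs):
    return math.fsum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cohen_d(x0, x1):
    # Mean difference over the pooled SD (equal group sizes)
    return (mean(x1) - mean(x0)) / math.sqrt((var(x0) + var(x1)) / 2)

def student_t(x0, x1):
    # Classic independent two-sample t statistic with pooled variance
    n0, n1 = len(x0), len(x1)
    sp2 = ((n0 - 1) * var(x0) + (n1 - 1) * var(x1)) / (n0 + n1 - 2)
    return (mean(x1) - mean(x0)) / math.sqrt(sp2 * (1 / n0 + 1 / n1))

rng = random.Random(3)
n = 25
x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical group 0
x1 = [rng.gauss(0.5, 1.0) for _ in range(n)]  # hypothetical group 1

t = student_t(x0, x1)
t_from_d = cohen_d(x0, x1) * math.sqrt(n / 2)
print(f"t = {t:.6f}, d*sqrt(n/2) = {t_from_d:.6f}")
```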
As discussed above, ES depends on both variance and covariance (equation [11]). As schematically shown in Figure 5A, the basic ES values (δ1 and δ2) depend on the variance and correspond to the respective zero-correlation significances. With these particular variances, a change in correlation does indeed change the ES and significance. Therefore, the dependence of ES on correlation (equation [6]) is only valid for constant variance, that is, for the set of CTs within each GCT (Figure 5C), which is irrelevant in practice, since in real life we deal with only one association per GCT (Figure 5D). An example case is shown in Figure 5B-D. Two equal-sized GCTs with zero-correlation ESs of 0.2 and 0.5 allow a number of associations (Figure 5B), and within each GCT, the relationship between correlation and ES / significance is functional (Figure 5C). However, when the two GCTs are considered together, the relationship is blurred, since the same correlations (here, r ≈ 0.5 and r ≈ −0.5) correspond to different ESs / significances. In real life, when there is only one association per GCT, the relationship actually disappears (0 < R² < 0.02) (Figure 5D, Figure 1B), revealing itself only in the fact that the correlation is always positive (Figure 1). The main misconception stemming from this fallacy is a widespread tendency to draw conclusions about association based on the SOD and vice versa.
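The within-GCT dependence can be illustrated with the standard identity $\mathrm{var}(x_1 - x_0) = s_0^2 + s_1^2 - 2r s_0 s_1$: with the variances held fixed, the paired t statistic becomes a function of r, and at r = 0 it coincides with the independent-groups value $d\sqrt{n/2}$ (taking d as the mean difference over the common SD). The summary statistics below are hypothetical, and the sketch says nothing about comparisons across GCTs with different variances.

```python
import math

def sd_of_differences(s0, s1, r):
    # Standard identity: var(x1 - x0) = s0^2 + s1^2 - 2 r s0 s1
    return math.sqrt(s0 ** 2 + s1 ** 2 - 2 * r * s0 * s1)

def paired_t(mean_diff, s0, s1, r, n):
    # Paired t statistic: mean difference over its standard error
    return mean_diff * math.sqrt(n) / sd_of_differences(s0, s1, r)

# HYPOTHETICAL fixed summary statistics: only the correlation r varies
mean_diff, s0, s1, n = 0.3, 1.0, 1.0, 30
for r in (-0.5, 0.0, 0.5):
    print(f"r = {r:+.1f}: paired t = {paired_t(mean_diff, s0, s1, r, n):.3f}")
```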
Thus, the CBES fallacy lies in relating ES to correlation, whereas ES has nothing to do with correlation, so the concept is fundamentally wrong. However, in the case of binary variables, the misapplication of Pearson's Phi to GCTs results in the mean-square effect half-size, and this parameter is still misleadingly considered a measure of association. Generalization of this binary fallacy to correlation in general (Pearson's r) and to the entire range of binary and continuous data led to the erroneous equations [2]-[4], which, in turn, lead to erroneous meta-analyses, [10,11] misunderstanding of the nature of correlation, [7] and erroneous conversions and transformations based on the CBES. [28,29] The emergence of the CBES fallacy was associated with the introduction of meta-analysis [8,30] and the idea of effect size [7] in the early 1980s. The confusion with 2 × 2 tables seems to have started even earlier. At least, Cramér in 1946 [31] still correctly applied Pearson's Phi to CTs, while Cohen in 1988 [7] had already adopted the misconception. In fairness, this misconception seems to go back to Karl Pearson himself, who (or someone on his team) boldly calculated φ for all 2 × 2 tables, including GCTs. [32] In a sense, the confusion is due to the term "2 × 2 table," which makes no distinction between GCTs and CTs and facilitates the misuse of Pearson's Phi.
Despite its apparent inadequacy, the CBES concept has never been questioned and is included in all guidelines, [9,10,11] while equations [2]-[4] are commonly used for calculating ES and related conversions [28,29] and even for the correlation-based definition of ES, [10] so it is a common misconception. Notably, equation [1], quoted in the Introduction, [10] is itself flawed, since it combines correlation, which relates to paired samples, with unequal groups, that is, with unpaired statistics.
The same applies to the MCT, which is a series of related misconceptions. Typically, it manifests as treating GCTs, or even unpaired BCTs, as CTs, misleadingly attributing to them the ability to assess the association of the variables and leading to the misapplication of association statistics (e.g., applying association statistics to GCTs) and independence statistics (e.g., applying Pearson's chi-square to CTs). Fundamentally, this misconception stems from three fallacies: the confusion of effect and association, as discussed above for CBES; a misunderstanding of the relationship between GCTs and CTs as parent and child tables, leading to the belief that CTs simply arise in a single specific form; [26] and a lack of understanding of the pairwise nature of association: if samples are not paired (i.e., the pairs are not intrinsically bound), they can be re-sorted in any order, so any association is random (corresponds to a certain random order) and therefore meaningless. Finally, Pearson's Phi is sometimes calculated using equation [13] for the gross Pearson Phi, [33] which is non-directional and therefore is not a measure of association; association requires equation [12].
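The pairwise nature of association can be illustrated with two hypothetical unpaired binary samples: their gross counts (12 of 20 and 9 of 20 positives) are fixed, yet two arbitrary orderings of the second sample yield different pair-count tables and even opposite signs of φ. The samples and orderings are illustrative assumptions.

```python
import math
from collections import Counter

def contingency_table(x0, x1):
    # Counts pairs: rows = value in x0 (0/1), columns = value in x1 (0/1)
    pairs = Counter(zip(x0, x1))
    return [[pairs[(0, 0)], pairs[(0, 1)]], [pairs[(1, 0)], pairs[(1, 1)]]]

def phi_signed(t):
    (a, b), (c, d) = t
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# HYPOTHETICAL unpaired samples: 12 of 20 and 9 of 20 positives
x0 = [1] * 12 + [0] * 8
x1 = [1] * 9 + [0] * 11

# Two arbitrary orderings of the same unpaired x1 values
aligned = x1
reversed_order = x1[::-1]

for label, order in (("aligned", aligned), ("reversed", reversed_order)):
    t = contingency_table(x0, order)
    print(label, t, round(phi_signed(t), 3), "positives:", sum(x0), sum(order))
```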
An example of the consequences of these misconceptions is shown in Figure 6. Section I presents example CTs (Tables 1-3) taken from a credible source. [26] Sections II and III include the corresponding GCTs and CTs, respectively, obtained by adjusting the original tables. Only Table 1 was indeed a two-decker CT, obtained by applying two binary variables (opinions on the death penalty and on gun registration) to the same subjects (paired samples). However, the confusion of association and effect led to the misleading conclusion that "P value is 0.0232 … suggests that there is an association … ." In fact, there was no association (φ = −0.061, Table 1-III), and the conclusion is a statistical error caused by MCT.

Figure 6. An example of the misconception of 2 × 2 tables. [26]

The other examples are not CTs, since they count subjects, not pairs. A pseudo-two-decker design of … [34] Thus, this is a BCT (Table 2-II) that cannot be reduced to a CT, since any association in the BCT is random and therefore misleading. Finally, Table 3-I is a typical BCT with unequal groups that technically cannot be converted to a CT, so its pseudo-two-decker design is simply anecdotal. All examples misuse significance-based association inference.
In addition, in the Table 1 example, Pearson's chi-square was misapplied to the CT, so the reported p-value of 0.0232 (χ² = 5.15) is incorrect, and the actual p-value is 0.0013 (χ² = 10.283) (Table 5).

Table 5. Significance and association error due to misconception of contingency tables (Figure 3-II).

The example in Figure 6 shows that the MCT is threatening, because many of the findings obtained from CTs can be misleading. The misconception is widespread: the idea of CT is typically explained using GCTs; [11,26,35] BCTs are misleadingly referred to as CTs; [17,26] CTs are often (mostly?) pseudo-CTs; [26] and even true CTs are still misused in terms of significance testing [26] and association measures. [33] Given all of the above, there seems to be no publication on CTs that is unaffected by the misconception and therefore not misleading.
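Both p-values quoted above follow from their χ² statistics with 1 degree of freedom. The sketch converts χ² to an upper-tail p-value using only the standard library; it checks the arithmetic of the conversion and does not by itself decide which test statistic is appropriate for a CT.

```python
import math

def chi2_p_value_df1(chi2: float) -> float:
    # For 1 degree of freedom chi2 = z^2, so P(X > chi2) = P(|Z| > sqrt(chi2))
    # = erfc(sqrt(chi2 / 2)), using the complementary error function
    return math.erfc(math.sqrt(chi2 / 2))

for chi2 in (5.15, 10.283):
    print(f"chi2 = {chi2}: p = {chi2_p_value_df1(chi2):.4f}")
```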
Given the above, the following changes should be made to the corpus of statistics:
• The CBES concept should be abolished as misleading.
• New definitions for BCTs, GCTs and CTs should be adopted, and the term "2 × 2 table" should be avoided as confusing.
• All meta-analyses based on CBES should be revised.
• All findings and conclusions based on CTs should be revised.
• The relevant chapters in statistical and meta-analysis manuals should be revised.

CONCLUSIONS
This article exposes two common misconceptions in statistics, CBES, which has been around for over 70 years and has remained unnoticed, and MCT; together, they cast doubt on the reliability of the statistical foundations of EBM in general. If the statistical foundations are corrupted, then the problems of EBM are deeper than is believed, because they are not limited to the misuse of statistics but extend to bad statistics itself. However, this can be a problem and a solution at the same time, as many of the EBM problems may actually be caused by incorrect statistics and resolved by fixing these flaws. That is why we urgently need to revise the statistical foundations of EBM. This article completely revisits the correlation and effect size problems in binary data and corrects all their shortcomings.