Comparison of Cardinality Matching and Propensity Score Matching for Causal Inference in Observational Research

Background: Cardinality matching (CM), a novel matching technique, finds the largest matched sample meeting prespecified balance criteria, thereby overcoming limitations of propensity score matching (PSM) associated with limited covariate overlap, which are especially pronounced in studies with small sample sizes. The current study compares CM and PSM in terms of post-match sample size, covariate balance and residual confounding at progressively smaller sample sizes. Methods: We evaluated CM and PSM within a comparative cohort study of new users of angiotensin-converting enzyme inhibitor (ACEI) and thiazide or thiazide-like diuretic monotherapy identified from a U.S. insurance claims database. Candidate covariates included patient demographics and all observed prior conditions, drug exposures and procedures. Propensity scores were calculated using LASSO regression, and candidate covariates with non-zero beta coefficients in the propensity model were defined as matching covariates for use in CM. One-to-one matching was performed using progressively tighter parameter settings. Covariate balance was assessed using standardized mean differences. Hazard ratios were estimated using unconditional Cox models for negative control outcomes perceived as unassociated with treatment (i.e., true hazard ratio of 1). Residual confounding was assessed using the expected systematic error of the empirical null distribution of negative control effect estimates compared to the ground truth. Analyses were repeated within 10%, 1% and 0.5% subsample groups. Results: A total of 172,117 patients (ACEI: 129,078; thiazide: 43,039) met the study criteria. Compared to PSM, CM was associated with increased sample retention except for analyses failing to converge to a matched sample. Although PSM achieved balance across all matching covariates within the full study population, substantial matching covariate imbalance was observed within the 1% and 0.5% subsample groups. Meanwhile, CM achieved matching covariate balance across all analyses. PSM was associated with better candidate covariate balance within the full study population. Otherwise, both matching techniques achieved comparable candidate covariate balance and expected systematic error. Conclusion: CM found the largest matched sample meeting prespecified


Background
Randomization tends to produce comparable study groups in terms of both observed and unobserved covariates in controlled experimentation. Unfortunately, random assignment of treatment is conspicuously absent from observational studies 1 . In the absence of randomization, differences in covariate distributions between study groups may prevent valid statistical inference from data 2 . As such, a key component in the design of observational studies includes addressing the presence of confounding covariates to reduce study bias using statistical methods such as matching 3,4,5 .
While propensity score matching (PSM) is the most widely used matching technique for causal inference in observational research, the technique is subject to limitations. First, PSM is susceptible to substantial bias, large variance in estimates and poor sample retention in studies with limited overlap of covariate distributions between study groups 4,6,7 . Second, due to limited degrees of freedom, restrictions on the number of matching covariates used may be necessary to avoid model over-parameterization and overfitting 8 . These limitations are especially pronounced in studies with small sample sizes.
A novel matching method, cardinality matching (CM), uses recent advancements in integer programming to find the largest matched sample meeting a set of prespecified balance criteria 4 . For instance, CM solves for the optimal (i.e., largest) matched sample subject to investigator-defined constraints on the maximum standardized mean difference of covariates between study groups. By matching directly on the original covariates rather than propensity scores, CM handles issues of limited overlap of covariate distributions and maximizes sample size retention while meeting covariate balance criteria 4 .
The current study compares the performance of CM and PSM in an observational study of new users of angiotensin converting enzyme inhibitor (ACEI) vs. thiazide or thiazide-like diuretic monotherapy. Both matching techniques are evaluated in terms of post-match sample size, candidate and matching covariate balance and residual confounding at progressively smaller sample sizes and more stringent parameter settings.

Study design and data source
We conducted a retrospective comparative new-user cohort study in the IBM® MarketScan® Commercial Claims and Encounters Database (CCAE), which primarily consists of de-identified, patient-level health data from over 142 million individuals enrolled in employer-sponsored health insurance plans in the United States. The CCAE database includes adjudicated health insurance claims (inpatient, outpatient, and prescription) and enrollment data from large employers and health plans who provide private insurance coverage. Data were standardized to the Observational Health and Data Sciences and

Study population
We identified new users of ACEI and thiazide or thiazide-like diuretic monotherapy between October 1, 2015 and January 1, 2017. For each patient, we defined the index as the date of first drug exposure.
The study was limited to patients with a minimum of 365 days of continuous enrollment in the database prior to index. We required patients to have a recorded diagnosis for hypertension at or within 365 days prior to index (see Supplemental Appendix A for a list of codes used to query the database). As described in Suchard et al., new users were defined as patients whose first observed treatment for hypertension was ACEI or thiazide or thiazide-like diuretic monotherapy 10 . Patients with exposure to any other active ingredient listed within the five primary drug classes for the treatment of hypertension in the 2017 American College of Cardiology/American Heart Association (ACC/AHA) guidelines (i.e., ACEI, thiazide or thiazide-like diuretics, angiotensin receptor blockers, dihydropyridine calcium channel blockers, nondihydropyridine calcium channel blockers) within 365 days prior or 7 days post-index were excluded 10,11 .

Example outcome of interest
We examined the safety outcome of angioedema, which was identified from diagnoses recorded on inpatient and emergency room healthcare claim records. Patients with a recorded diagnosis for angioedema at index or at any time prior to index were excluded from the study.

Time-at-risk
The time-at-risk window was defined based on the intention-to-treat principle, and patients were followed from day 1 post-index to the earliest of July 31, 2019 or end of continuous observation in the database 12 . Analyses were limited to patients with a minimum time-at-risk of 1 day.

Patient demographic and clinical characteristics
We measured patient demographics at index including age, grouped into categories in 5-year increments; sex; and index year and month. Patient clinical characteristics included all observed condition, drug exposure, measurement and observation codes occurring within a long-term or short-term window (i.e., at or within 365 or 30 days prior to index, respectively). Furthermore, we measured all observed drug exposures occurring within the time-at-risk window. All drug exposures were grouped at both the ingredient-level and according to the Anatomical Therapeutic Chemical (ATC) classification system. Patient comorbidities were measured using the Charlson Comorbidity Index (CCI) 13 . Finally, we measured the following disease severity and risk scores: Diabetes Complications Severity Index (DCSI), CHADS2 score, and CHA2DS2-VASc score 14,15,16 . The CCI, DCSI, CHADS2 score and CHA2DS2-VASc score were measured based on all observed conditions occurring prior to the end of the time-at-risk window.

Large-scale propensity score matching
Candidate covariates were defined as all aforementioned patient demographic and clinical characteristics, and heuristic feature selection was used to identify candidate covariates with a frequency greater than 0.1%. We developed propensity models using LASSO regression with 10-fold cross-validation for hyperparameter tuning including all candidate covariates identified through heuristic feature selection, and propensity scores were calculated using the propensity model 17 . New users of ACEI and thiazide or thiazide-like diuretic monotherapy were matched at a 1:1 ratio using greedy matching enforcing a caliper of 0.10 and 0.20 of the pooled standard deviation of the logit of propensity scores in two separate analyses. To facilitate comparisons between CM and PSM, we defined matching covariates as candidate covariates with non-zero beta coefficients in the propensity model.
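The greedy 1:1 caliper matching described above can be illustrated with a minimal sketch. It assumes propensity scores have already been estimated, processes treated patients in input order, and defines the pooled standard deviation as the square root of the mean of the two group variances; the function names and these choices are illustrative assumptions, not the implementation used in the study.

```python
import math
import statistics


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_caliper_match(ps_treated, ps_comparator, caliper_sd=0.20):
    """Greedy 1:1 nearest-neighbour matching on the logit of the propensity score.

    Returns a list of (treated_index, comparator_index) pairs.
    """
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_comparator]
    # Assumption: pooled SD = sqrt of the mean of the two group variances.
    pooled_sd = math.sqrt((statistics.variance(lt) + statistics.variance(lc)) / 2.0)
    width = caliper_sd * pooled_sd

    available = set(range(len(lc)))
    pairs = []
    for i, value in enumerate(lt):
        if not available:
            break
        # Closest unused comparator; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(lc[k] - value), k))
        if abs(lc[j] - value) <= width:
            pairs.append((i, j))
            available.remove(j)  # matching without replacement
    return pairs


pairs = greedy_caliper_match([0.30, 0.50, 0.90], [0.31, 0.52, 0.10, 0.49])
print(pairs)
```

Treated patients with no comparator inside the caliper are dropped, which is why a tighter caliper tends to reduce the post-match sample size.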

Cardinality matching
Heuristic feature selection of candidate covariates was performed as previously described with one notable exception: due to memory constraints associated with CM, in analyses using the full study population, the heuristic feature selection used a frequency threshold of 2% instead of 0.1%. Specifically, CM failed to converge to a matched sample due to insufficient memory while attempting to match on approximately 220 million data points (172,117 patients and 1,237 matching covariates). The frequency threshold used within all subsample group analyses was consistent between CM and PSM.
Matching covariates (i.e., the covariates used in CM) were empirically selected: propensity scores were estimated as previously described, and matching covariates were defined as candidate covariates with non-zero beta coefficients in the LASSO propensity model. CM utilizes advancements in optimization algorithms to solve for the largest sample size meeting prespecified balance criteria (e.g., maximum standardized mean difference [SMD] of matching covariates) 4 . We performed CM using the following prespecified balance criteria in four separate analyses: exact marginal distributional balance (i.e., fine balance; SMD = 0) and maximum SMD of 0.01, 0.05 and 0.10 of matching covariates between study groups.
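To make the optimization target concrete, the following toy sketch finds the largest pair of equal-sized subsamples whose covariate means differ by no more than a maximum SMD. It uses exhaustive search rather than the integer programming that CM relies on, so it is only feasible for a handful of patients; the function names and the pooled standard deviation definition are assumptions made for illustration.

```python
from itertools import combinations
import math
import statistics


def pooled_sd(treated, comparator, j):
    """Pre-match pooled SD of covariate j (assumed: root mean of group variances)."""
    var_t = statistics.variance([row[j] for row in treated])
    var_c = statistics.variance([row[j] for row in comparator])
    return math.sqrt((var_t + var_c) / 2.0)


def is_balanced(treated, comparator, t_idx, c_idx, limits):
    """True if every covariate's absolute mean difference is within its limit."""
    k = len(t_idx)
    for j, limit in enumerate(limits):
        mean_t = sum(treated[i][j] for i in t_idx) / k
        mean_c = sum(comparator[i][j] for i in c_idx) / k
        if abs(mean_t - mean_c) > limit + 1e-12:
            return False
    return True


def cardinality_match(treated, comparator, max_smd):
    """Largest equal-sized subsamples meeting the balance limits (brute force)."""
    n_cov = len(treated[0])
    limits = [max_smd * pooled_sd(treated, comparator, j) for j in range(n_cov)]
    # Try the largest sample size first and stop at the first feasible one.
    for k in range(min(len(treated), len(comparator)), 0, -1):
        for t_idx in combinations(range(len(treated)), k):
            for c_idx in combinations(range(len(comparator)), k):
                if is_balanced(treated, comparator, t_idx, c_idx, limits):
                    return t_idx, c_idx
    return (), ()


treated = [(1.0,), (2.0,), (3.0,), (10.0,)]
comparator = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
print(cardinality_match(treated, comparator, max_smd=0.10))
print(cardinality_match(treated, comparator, max_smd=0.20))
```

In this toy data, loosening the balance criterion from 0.10 to 0.20 lets the outlying treated patient be retained, which mirrors the trade-off between stringency and sample retention examined in the study.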
All analyses were performed using an Amazon Web Services (AWS) Virtual Private Cloud (VPCx) m4.4xlarge Elastic Compute Cloud (EC2) instance. This instance included 16

Evaluation of post-match covariate balance
The performance of CM and PSM was compared in terms of post-match covariate balance. To determine the level of balance achieved within covariates indirectly and directly adjusted during matching, candidate and matching covariate balance, respectively, were assessed separately. SMDs, as defined by Rosenbaum et al. (see Eq. 1), were used to assess the post-match balance of candidate and matching covariates; specifically,

$$\mathrm{SMD} = \frac{\bar{x}_{\mathrm{treatment}} - \bar{x}_{\mathrm{comparator}}}{s_p} \tag{1}$$

where $\bar{x}_{\mathrm{treatment}}$ and $\bar{x}_{\mathrm{comparator}}$ represent the post-match covariate mean of the treatment and comparator group, respectively, and $s_p$ represents the pre-match covariate pooled standard deviation 19 . An absolute SMD less than 0.10 was considered balanced.
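As a worked illustration of the SMD definition, the sketch below computes the post-match mean difference scaled by the pre-match pooled standard deviation. The pooled standard deviation is assumed here to be the square root of the mean of the two pre-match group variances; the function name and the example data are hypothetical.

```python
import math
import statistics


def standardized_mean_difference(post_treated, post_comparator,
                                 pre_treated, pre_comparator):
    """Post-match mean difference divided by the pre-match pooled SD."""
    # Assumption: pooled SD = sqrt of the mean of the two pre-match variances.
    pooled = math.sqrt(
        (statistics.variance(pre_treated) + statistics.variance(pre_comparator)) / 2.0
    )
    return (statistics.fmean(post_treated) - statistics.fmean(post_comparator)) / pooled


# Hypothetical binary covariate (1 = condition recorded before index).
pre_t, pre_c = [0, 1, 1, 1], [0, 0, 0, 1]
smd = standardized_mean_difference([1, 1, 0, 0], [0, 0, 1], pre_t, pre_c)
print(abs(smd) < 0.10)  # balance check against the 0.10 threshold
```

Using the pre-match standard deviation in the denominator keeps the scale fixed, so a change in SMD after matching reflects a change in the mean difference rather than a change in the spread of the matched sample.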

Evaluation of post-match residual confounding
Residual study bias due to unmeasured potential confounders and systematic error may still exist subsequent to CM or PSM 20,21 . To quantify the magnitude of residual study bias, we included a total of 105 negative control outcomes in our experiment believed to be caused by neither ACEIs nor thiazide or thiazide-like diuretics, which, therefore, have a true hazard ratio equal to 1 20 . These negative control outcomes were identified through a data-rich algorithm and manual clinical review (see Supplemental Appendix B for a list of negative control outcomes used in the current study) 22 . Hazard ratios were estimated for negative control outcomes as well as the example outcome of interest (angioedema) using unconditional Cox proportional hazards models in the matched samples. Comparing the estimated hazard ratios of the negative control outcomes to the ground truth (of no effect) provides insight into residual study bias.
We assume the observed log hazard ratio ($\hat{\theta}_i$) depends on the log of the true effect size ($\theta_i$), which is assumed to be 0, plus a systematic error component ($\beta_i$), and let $\tau_i$ denote the standard error corresponding to $\hat{\theta}_i$. Furthermore, we assume $\beta_i$ to follow a normal distribution with parameters $\mu$ and $\sigma^2$, which we estimate using the observed estimates (i.e., $\hat{\theta}_i$) of negative control outcomes 20 . In summary, we assume:

$$\hat{\theta}_i \sim N(\theta_i + \beta_i, \tau_i^2), \quad \text{and} \quad \beta_i \sim N(\mu, \sigma^2).$$

To summarize the empirical null distribution into a single measure, we computed the expected systematic error (ESE), defined as the expected absolute systematic error based on the estimated null distribution parameters:

$$\mathrm{ESE} = \mathbb{E}\left[\,|\beta|\,\right], \quad \beta \sim N(\mu, \sigma^2).$$

Given a finite number of negative control outcomes and uncertainty in estimated hazard ratios due to limited sample size, the distribution parameters and, therefore, the ESE come with uncertainty, which we quantified using Markov chain Monte Carlo and expressed as 95% credible intervals.
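Because the systematic error is assumed to be normally distributed, the expected absolute systematic error has a closed form: it is the mean of a folded normal distribution, σ√(2/π)·exp(−μ²/(2σ²)) + μ·erf(μ/(σ√2)). The sketch below evaluates that expression for given point estimates of μ and σ; the function name is illustrative, and the credible intervals obtained by Markov chain Monte Carlo in the study are not reproduced here.

```python
import math


def expected_systematic_error(mu, sigma):
    """E|beta| for beta ~ N(mu, sigma^2), i.e. the folded normal mean."""
    if sigma == 0:
        # Degenerate null distribution: all systematic error equals mu.
        return abs(mu)
    spread_term = sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu**2 / (2.0 * sigma**2))
    shift_term = mu * math.erf(mu / (sigma * math.sqrt(2.0)))
    return spread_term + shift_term


# Hypothetical null distribution parameters on the log hazard ratio scale.
print(expected_systematic_error(0.0, 1.0))
print(expected_systematic_error(0.3, 0.2))
```

The measure grows with both the mean and the spread of the null distribution, so a null distribution that is centered on zero but wide still yields a nonzero ESE.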

Analyses of angioedema outcome
Unconditional Cox proportional hazards models were used to compare the safety outcome of angioedema between study groups in the matched samples. All hazard ratio (HR) estimates, 95% confidence intervals (CI) and p-values were calibrated to incorporate the uncertainty expressed in the empirical null distribution of negative control outcomes 20,23 . We considered a two-sided calibrated p-value < 0.05 to be statistically significant. For reference, we further examined uncalibrated effect estimates.
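One common way to calibrate a p-value against an empirical null is to test the observed log hazard ratio against N(μ, σ² + τ²) instead of N(0, τ²), where τ is the standard error of the estimate. The sketch below assumes that formulation; it is an illustration of the idea rather than the exact procedure used in the study, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def calibrated_p_value(log_hr, se_log_hr, mu, sigma):
    """Two-sided p-value under an empirical null N(mu, sigma^2 + se^2).

    Assumption: systematic error and sampling error are independent and normal.
    """
    z = (log_hr - mu) / math.sqrt(sigma**2 + se_log_hr**2)
    p = normal_cdf(z)
    return 2.0 * min(p, 1.0 - p)


# With mu = 0 and sigma = 0 this reduces to the usual two-sided p-value.
print(calibrated_p_value(0.196, 0.1, mu=0.0, sigma=0.0))
# A wider empirical null makes the same estimate less surprising.
print(calibrated_p_value(0.196, 0.1, mu=0.05, sigma=0.1))
```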

Analyses of subsample groups
All aforementioned analyses, with the exception of analyses of the angioedema outcome, were repeated in a series of progressively smaller subsample groups, including a 10%, 1% and 0.5% subsample group. The 10%, 1% and 0.5% subsample groups included 5, 50 and 100 subsample draws, respectively. Each subsample draw was performed by random sampling without replacement from the study population stratified by study comparison group.
Within each subsample draw, candidate covariates were defined as all aforementioned patient demographics and clinical characteristics observed within the respective subsample draw, and filtered using the aforementioned frequency thresholds. Propensity scores were estimated within each draw as previously described, and matching covariates were defined as those candidate covariates with non-zero beta coefficients in the propensity model for that draw. As such, a distinct set of candidate covariates and matching covariates was identified for each subsample draw; however, within individual subsample draws, candidate and matching covariates were consistent between CM and PSM.
For each subsample group, we assessed the average post-match sample size across its subsample draws. Meanwhile, candidate and matching covariate balance were assessed based on the SMD of covariates across all subsample draws within each subsample group considered jointly. Hazard ratios for negative control outcomes were estimated independently within each subsample draw using unconditional Cox proportional hazards models. Residual confounding was assessed based on the ESE of the empirical null distribution of negative control outcomes, which was derived from hazard ratio estimates considered jointly across all subsample draws within each subsample group. Analyses of the angioedema outcome were not performed across subsample groups due to insufficient occurrence of the outcome.
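A single stratified draw of this kind can be sketched as follows. Sampling is done without replacement within each comparison group so that the group proportions of the full population are preserved; rounding the per-group sample size to the nearest integer and the function name are assumptions made for illustration.

```python
import random


def stratified_subsample(ids_by_group, fraction, seed=None):
    """Draw the same fraction from every group, without replacement."""
    rng = random.Random(seed)
    sample = {}
    for group, ids in ids_by_group.items():
        n = round(len(ids) * fraction)  # assumed rounding rule
        sample[group] = rng.sample(ids, n)
    return sample


# Group sizes taken from the study population; the patient ids are placeholders.
population = {
    "ACEI": list(range(129_078)),
    "thiazide": list(range(43_039)),
}
draw = stratified_subsample(population, fraction=0.005, seed=1)
print({group: len(ids) for group, ids in draw.items()})
```

Repeating the call with different seeds would produce the multiple draws that make up each subsample group.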

Results
Post-match sample size
The average post-match sample size across all analyses is shown in Fig. 1. In the full study population, CM failed to converge to an optimal solution while requiring fine balance of matching covariates but was able to match every patient in the thiazide or thiazide-like diuretic group to a patient in the ACEI group (matched sample size = 86,078) at all other prespecified balance criteria. The use of more stringent balance criteria and a tighter caliper was associated with a slight reduction in post-match patient retention in CM and PSM, respectively, within subsample group analyses. With the exception of CM requiring fine balance of matching covariates, CM was associated with greater sample size retention as compared to PSM.

Post-match matching covariate balance
In the full study population, 1,237 matching covariates were identified by LASSO regression for analyses using PSM. Due to memory constraints associated with CM at larger sample sizes, the frequency threshold of heuristic feature selection used to limit candidate covariates considered during LASSO regression was increased from 0.1% to 2%.
The SMD of candidate covariates before matching and across all analyses is shown in Fig. 3. Overall, candidate covariate imbalance was negatively correlated with sample size. In the full study population, no imbalanced covariates were observed post-PSM (see Supplemental Appendix D). Similarly, PSM was associated with a small, albeit non-significant, improvement in the average SMD of candidate covariates in the full study population as compared to CM (see Supplemental Appendix E). Nevertheless, comparable improvements in candidate covariate balance were achieved by both matching techniques within each subsample group.

Post-match residual confounding
The expected systematic error (ESE) prior to matching and subsequent to CM and PSM within the full study population and each subsample group is shown in Fig. 4. As compared to the pre-match sample, both matching techniques were associated with a substantial decrease in ESE. Furthermore, the post-match reduction in ESE was most pronounced in analyses with smaller sample sizes. Specifically, CM and PSM were associated with a significant decrease in ESE relative to the pre-match sample across most analyses within the 1% and 0.5% subsample groups (e.g.,

Analyses of angioedema outcome
Results from analyses of the safety outcome of angioedema between new users of ACEI vs. thiazide and thiazide-like monotherapy within the full study population are presented in Fig. 5. As compared to thiazide or thiazide-like monotherapy, ACEI monotherapy was found to be associated with a significant increase in the risk of angioedema across all analyses (calibrated p < 0.05), and calibrated HR estimates did not significantly differ between CM and PSM. Furthermore, CM was associated with a slight decrease in the standard error of calibrated HR estimates relative to PSM. Similar trends were observed among uncalibrated effect estimates.

Discussion
In this applied comparison of CM and PSM among new users of ACEI vs. thiazide and thiazide-like diuretic monotherapy, CM found the largest matched sample meeting prespecified balance criteria. The performance of both matching techniques was assessed at progressively smaller sample sizes. While both matching techniques achieved similar candidate covariate balance, CM was associated with improved matching covariate balance in analyses with smaller sample sizes. Furthermore, CM was associated with improved patient retention as compared to PSM, translating to slight improvements in the precision of effect estimates. Finally, CM and PSM were associated with similar improvements in residual confounding, which was assessed based on the ESE of the empirical null distribution of negative control outcomes.
Prior literature comparing CM and PSM is limited. In a study examining the impact of earthquakes on electoral outcomes in Chile, Visconti et al. describe the performance of both matching techniques. Before matching, the study included a total of 172 observations. As compared to PSM, CM was associated with a decrease in both post-match sample size (108 vs. 154) and, as evidenced by an SMD greater than 0.10, matching covariate imbalance (0 vs. 13 out of 18 imbalanced matching covariates) 4 . Similarly, in a Monte Carlo simulation study, Resa et al. found CM to systematically select the largest sample size meeting a set of prespecified balance criteria 24 .
Consistent with prior literature, as evidenced by an SMD less than 0.10, CM achieved balance for all matching covariates across all analyses. While PSM achieved balance of all matching covariates in analyses with larger sample sizes (e.g., full study population and 10% subsample groups), the matching technique was associated with substantial matching covariate imbalance in analyses with smaller sample sizes (e.g., the 1% and 0.5% subsample groups). Furthermore, CM was associated with improved sample retention across all analyses with the exception of fine balance within the full study population, which failed to converge to an optimal solution, indicating that the achievement of prespecified balance criteria was not mutually exclusive with superior sample size retention.
Both candidate covariate imbalance and ESE were negatively correlated with sample size. As compared to the pre-match sample, improvements in candidate covariate balance were achieved with either matching technique; however, PSM achieved better candidate covariate balance in analyses with larger sample sizes. That being said, it is important to note that fewer matching covariates were used with CM as compared to PSM (717 vs. 1,237) in analyses within the full study population due to memory limitations associated with CM. The pre-match ESE of the full study population was significantly lower than that of the 1% and 0.5% subsample groups, indicating increased baseline residual confounding at smaller sample sizes. Reductions in residual confounding were comparable between matching techniques and especially pronounced in analyses with smaller sample sizes (e.g., the 1% and 0.5% subsample groups) as evidenced by a significant post-match decrease in the ESE relative to the pre-match sample. These findings may indicate both matching techniques are comparable in reducing residual confounding stemming from imbalances in unmeasured or otherwise unadjusted covariates.
The current study also found pre-match ESE to be significantly higher within analyses of smaller sample size. This finding may indicate the presence of an additional source of bias within the pre-match sample. Specifically, the increase in systematic bias may be due to a failure to meet the normality assumption on the likelihood distribution of Cox proportional hazards models at smaller sample sizes prior to matching. Additional research is necessary to explore this hypothesis.
Calibrated hazard ratio estimates were similar in direction and magnitude across all analyses within the study population, indicating ACEI monotherapy was associated with a significant increase in the risk of angioedema as compared to thiazide or thiazide-like monotherapy. However, as compared to PSM, CM was associated with a slight reduction in the standard error of estimates. Similar trends were observed among uncalibrated analyses. The increased precision of effect estimates may be due to the improved sample retention observed with CM.

Limitations
The current study was subject to limitations. First, due to memory constraints, the identification of matching covariates through LASSO regression within the full study population was limited to covariates with a minimum frequency of 2% for CM and 0.1% for PSM. As such, the performance of CM as compared to PSM in addressing potential confounding within studies of large sample sizes may have been underestimated. Nevertheless, this highlights practical limitations of CM in large-scale studies associated with limitations in computing power; CM failed to converge due to memory constraints using a dataset containing approximately 220 million data points (172,117 observations and 1,237 matching covariates) but successfully converged using a dataset containing approximately 120 million data points (172,117 observations and 717 matching covariates). These practical limitations may be overcome with access to more powerful computing resources.
Second, the use of negative control experiments limited analyses to subsample groups with a pre-match sample size sufficient to ensure the observation of negative control outcomes after matching. The current study addressed this limitation by considering the joint results of analyses across multiple subsample draws within subsample groups, and the smallest subsample group contained 860 patients. Nevertheless, the relative performance of CM may be improved in studies with even smaller sample sizes, which are more likely to suffer from issues of limited covariate overlap and potential model overparameterization.

Conclusion
The current study compared the performance of CM and PSM in terms of post-match sample size, covariate balance and residual confounding. CM found the largest matched sample meeting prespecified balance criteria, thereby achieving superior sample retention and, in analyses with smaller sample sizes, improved matching covariate balance as compared with PSM. Candidate covariate balance and residual bias were comparable between matching techniques. These findings support the use of CM as an alternative to PSM for causal inference in observational research with small sample sizes where balance on a specific set of matching covariates is desired. Further research is necessary to compare the performance of CM and PSM in studies where empirical covariate selection may not be possible due to limited sample size or availability of data.