Multinomial propensity score for ternary exposure for genetic study

Background: Propensity score (PS) is a popular method for reducing multiple confounding effects in observational studies. It is applicable mainly for situations wherein the exposure/treatment of interest is dichotomous and the PS can be estimated through logistic regression. However, multinomial exposures with 3 or more levels are not rare, e.g., when considering genetic variants, such as single nucleotide polymorphisms (SNPs), which have 3 levels (aa/aA/AA), as an exposure. Conventional PS is inapplicable for this situation unless the 3 levels are collapsed into 2 classes first. Methods: A simulation study was conducted to compare the performance of the proposed multinomial propensity score (MPS) method under various contrast codings and approaches, including regression adjustment and matching. Results : MPS methods had more reasonable type I error rate than the non-MPS methods, of which the latter could be as high as 30~50%. Compared with MPS-direct adjusted methods, MPS-matched cohort methods have better power but larger type I error rate. Performance of contrast codings depend on the selection of MPS models. Conclusions : In general, two combinations had relatively better performance in our simulation of ternary exposure: MPS-matched cohort method with Helmert contrast and MPS-direct adjusted regression with treatment contrasts. Compared with the latter, the former had better power but larger type I error rate as a trade-off.


Background
Propensity score (PS) approach is a useful and popular method in epidemiology for mitigating confounding effects in observational studies [1]. It was first proposed by Rosenbaum and Rubin to infer cause and effect from observational studies [2]. A PS is estimated through a conditional probability wherein a particular treatment is assigned given a vector of observed covariates. Adjustment for the PS is sufficient to remove bias due to all observed covariates.
An observational study attempts to estimate the effects of a treatment or exposure (X) on the outcomes (Y) of subjects who are not assigned at random to the treatment or control group [3]. In such a study, the adjustment for confounders (Zs) that are associated with X and Y is crucial. Otherwise, inference can be severely biased.
Common ways for adjustment include matching, stratification, and covariate-adjusted regression [4]. However, when potential confounders are numerous, the above approaches may be implausible. After stratifying on many covariates, some strata will have insufficient cases for analysis even for datasets with large samples. For covariate-adjusted regression, adjustment with excessive covariates or by applying a variable selection scheme can result in severe over-dispersion on the inference [5]. In such cases, PS is considered to be an efficient way to reduce the dimensions of covariates for valid inference. It adjusts for the effects of multiple confounders to estimate treatment effects and balances covariates (such as age, gender, or population principal components) such that the treatment and control groups are comparable [1].
For a binary exposure X, the logistic regression model is often used to estimate the true PS by regressing X on observed covariates. Several ways to adjust for covariates when estimating the effects of treatment on outcomes by using PS include matching, stratification, direct adjustment in regression, and inverse probability treatment weighing [6]. Ali et al. reported that in studies using PS to control for confounding, matching on the PS is the most common approach (68.9%), followed by PS direct adjustment (20.9%); together, such studies comprise more than 90% of the applications involving PS [7]. Therefore, in this work, we focused on the categories of the MPS-direct adjusted and MPS-matched methods.
Most of the applications of PS have been in binary exposure. However, multinomial exposures with 3 or more levels are not rare, e.g., smoking can have 3 statuses, namely, "never", "current", and "ever-smoking". In genetic studies, specific genetic variants, such as single nucleotide polymorphisms (SNPs), which have 3 levels (aa/aA/AA), can also be considered as an exposure. Conventional PS is inapplicable in this situation unless some levels are collapsed first. Yoshida et. al. (2019) considered an extended propensity score for exposure X with J+1 levels [8]. They suggested using PS vector ′ = ( 0 … ) with 1 probability of assignment for each level where = P(X = j|covariate Z′s) for j ∊ {0,1, … , J} and used simulation to study the influence of the proposed ternary PS trimming methods for the inverse probability weighting approach. Using the same PS vectors, Wang et al. [9] proposed an application of stratification on the multiple propensity score to dose-response relationships in drug safety studies, and Wang et al, [10] generalized the pair-patching scheme to a trio-matching via defining a new distance among subjects from 3 treatment groups and discuss the choice of optimal caliper. Another alternative for multiple treatments is the generalized boosted model, a machine learning approach that uses inverse probability weighting regression to capture complex relationships between a treatment assignment and pretreatment covariates [11].
For the application of PS in genetic association studies, Jiang and Zhang (2011) suggested using nonparametric techniques to obtain PS while adjusting for covariates, such as population stratification or environmental factors for SNPs of interest, to identify disease associations [12]. Instead of directly testing epistatic effects from numerous combinations of SNPs, Sengupta Chattopadhyaya et al. (2016) proposed using PS as a dimension-reduction tool to improve the marginal single-point association result for each SNP by accounting for the loss of heritability [13]. However, to be able to apply conventional PS, which assumes binary X for logistic regression, the 3-level SNP variable is first collapsed to 2 levels for dominant or recessive traits.
In this study, we compare 3 contrast codings for multinomial propensity scores (MPSs) to cope with genetic marker such as SNP, when it is considered as an exposure of interest. Simulation was used to assess the performances of the codings.

Methods
Let y be the outcome variable, x be the risk factor of interest, and Z = (z1…zp) be numerous covariates. Among these covariates, some are confounders of x that are associated with x and y. The purpose of studies is to investigate the association between x and y. We consider the following generalized linear model system among x, y, and Z: For binary y and binary x, the conventional procedure of PS is as follows: In the case of binary y and ternary x with numerous Zs, we considered the following 3 contrast codings to convert x into dummy variables before estimating PS.
In treatment contrast, 1 group needs to be assigned as the baseline reference, and x is decomposed into dummy variables T1 and T2 as follows: In the following example, AA (x = 2) was used as the reference group, as follows: Therefore, 1 = ( − ) and 2 = ( − ). If A is the disease allele, 1 is expected to be significant in all 3 (dominant, recessive, and additive) traits, whereas 2 is expected to be significant in the additive and recessive traits.
In Helmert contrasts, Therefore, suggests the difference between (aa, ), and that of 2 suggests the difference between and the average of Aa and . Therefore, if A is the disease allele, 1 is expected to be significant in the dominant and additive traits, whereas 2 is expected to be significant for the recessive and additive traits and has weak power for the dominant trait due to the effect size being partially offset by the cancel-out effect between AA and Aa.
Finally, we also consider a custom contrast, which we refer to as SNP contrasts, as shown below: Under the contrasts, Consequently, 1 = ( − ) and 2 = ( − ). Given A as disease allele, a recessive trait is expected to have significant 1 but nonsignificant 2 and a dominant trait is expected to have significant 2 but nonsignificant 1 , whereas an additive trait is expected to have significant 1 and 2 .
If 1 is significant but 2 is not, this situation indicates that ( ) is different from (aa, ). This situation is referred to as the recessive trait. If 2 is significant but 1 is not, this indicates that ( , ) is different from (aa). This situation is referred to as the dominant trait. On the other hand, if 1 and 2 are significant and have the same sign, this situation can be referred to as the additive trait.
For each contrast variable x1 and x2, we estimate the corresponding PSs via the stepwise logistic regression of xi on Zs as PS i = P( x i = 1 |selected s), i = 1, 2, and then fit y on (x1, x2, PS1, and PS2) along with some Zs selected through the variable selection procedure. We refer to the above procedure as the MPS method. In direct-adjustment approach, y is regressed on x1, x2, PS1, and PS2 directly. In case-control matching, since only 2 groups (case and control) needs to be balanced, the conventional matching scheme can be applied. In cohort matching with 3 exposure levels, instead of the trio-matching used by Wang et al. [10], we adopt a simpler double-pair-matching scheme which can be easily implemented in R package.
Regarding the exposure SNP marker =0/1/2 as group 1/2/3 respectively, when treatment contrast is used, a subject from group 3 was matched with one from group 1 based on PS1, and one from group2 based on PS2 separately, that together form a matched trio. Analogously when Helmert contrast is used, after a matched pair between group 1 and 2 based on PS1 being formed, their average PS2 was calculated, and then a subject from group 3 with the closest PS2 to the average was selected to form a trio.
Simulation study was conducted using R package (version 4.0.1) to assess the performance among the combination of 8 models (2 non-PS models, 3 MPS-direct adjustment models and 3 MPS-matching models as shown in Table 1) and 3 types of contrast codings (treatment, SNP and Helmert) via simulation.
Comparing the simulation setups of Yoshida et al. (2019) for the log-linear model and Lian (2003) for the logit model [5,8] reveals that Yoshida's simulation is mainly used for assessing marginal estimands. However, in our case wherein SNPs were considered as exposures, we focused on assessing its conditional effect with other SNPs and phenotypes being adjusted as covariates. Therefore, in our simulation, the exposureoutcome functions were specified, and the later setup by Lian (2003) was adopted.

Simulation setup
The data of sample size m = 100, such that the m by 1 vectors binary outcome y; ternary exposure of interest x; and 20 covariates 1 -20 , were generated by using the following linear equation system: logit (w )= • (z 1 + ⋯ + z 10 ), = 1, 2, According to the above linear system, z1-z5 are associated x only; z6-z10 are confounders that are associated with x and y; z11-z15 are associated with y only; and z16-z20 are unassociated with neither x nor y. The effect sizes re denoted by () as shown in Fig 1. The parameter setup used in this study is given by Table 1. The zis were generated from N(0,1), whereas w and y were generated from Bernoulli distribution as shown in Fig 2. The following models are the methods compared in this work. The corresponding fitting procedures are listed in Table 2.
---Insert Tables 1-2  Cochran-Armitage Trend (CAT) test with 3 weighting schemes, namely, additive, recessive, and dominant, were used [14]. CAT, which uses the Chi-square statistics of 1 degree of freedom, is considered to be more powerful than the regular Peason's Chi-square of 2 degrees of freedom [15]. Here, Model 6 was used as a reference for comparison with Models 7 and 8 because matched case-control is inefficient or may actually introduce additional cofounders and bias and is not recommended [16][17].

Assessment of performance
A total of 12 parameter setups () and 3 genetic traits (additive, recessive, and dominant) on exposure X were assumed in the simulation. For each of the 12 × 3 scenarios, n = 1000 replicates were generated. For the regression method (Models 1-7 Table 1 based on 1000 replicated samples corresponding to 1 of the parameter/trait scenario listed in Table 2. Notably, the proportion of significance is regarded as the type I error rate when the true  is 0 (no x-y association) and is regarded as power when ≠0. As a nominal Wald's statistics on , the T-statistic is supposed to follow an asymptotic standard normal distribution with variance 1; however, the variable selection procedure and the confounding effect can distort the assumption [5,18]. With STD_T > 1, the T-statistic has a heavier tail than the standard normal and therefore the null hypothesis is likely to be rejected; this situation results in a high type I error rate.
This scenario is known as over-dispersion. By contrast, STD_T < 1 is known as under-dispersion, which is likely to produce a conservative result.

Type I error comparison
Tables 3-4 list the type I error and std of T-statistics for Models 1-8 and 3 contrast codings across 12 simulated scenarios: 2 confounding (high and moderate), 2 x-y association (high and moderate), and 3 genetic traits (additive, dominant, and recessive). For each model, the contrast with the smallest type I error rate among the 3 genetic traits is bolded.
As expected, Model 1 had unacceptably large type I error rates that ranged from 30%-50% for the high-confounding scenario and from 15%-17% for the low-confounding scenario. After adjusting for the selected covariates, Among the 3 contrast codings, averaged SNP contrast had the smallest type I error, followed by Helmert contrast, and treatment contrast had the largest type I error. This situation was roughly true across all 3 genetic traits and 8 methods. The tendency shown in Table 4 for the low-confounding scenario was similar to that shown in Table   3, but with a smaller size of type I error.
Given that Models 1 and 2 had considerably larger type I error rates than other models, only MPS-Models 3-8 were compared for power. Tables 5-8  the worst power, which coincided with the conclusion that matched case-control is inefficient or may actually introduce additional cofounding and bias [16][17] and is therefore not recommended.

Power comparison
As expected, the powers of all models increased as the x-y association increased.
In general, the MPS-matched cohort (Models 7 and 8) had better power than the MPS-direct adjusted cohort (Models 3-5). In general, the power followed the order of

When confounding is absent
As a reference, Tables 9 and 10 show the results obtained with MPS when confounding variables were absent. As shown in Table 9, some zs were associated with y, but none were associated with x. As illustrated in Table 10, some zs were associated with x, but none were associated with y. In these cases, the MPS-matched cohort had lower power than the MPS-direct adjusted regression because matching might be unnecessary to reduce the sample size due to the drop in unmatched observations.

Discussion
Our results showed that in the presence of high confounding effect from multiple confounders, MPS methods had much more reasonable type I error rate than the

Conclusions
To summarize, for ternary exposure, when a strong confounding effect due to observed covariates is believed to exist, then the MPS-matched cohort method with proposed an approach of combining double-adjustment and matching in propensity score analysis [19]. Our results may suggest a future study on such combination in MPS.

Ethics approval and consent to participate
This study involved only computer simulation data, no IRB approval or consent to participate is needed.

Consent for publication
The manuscript contains no individual person's data in any form.

Availability of data and materials
R code for simulation in this study is available upon request.

Competing interests
All the authors declare no competing interest and author. All authors read and approved the final manuscript, and had agreed both to be personally accountable for the author's own contributions and to ensure that questions related to the accuracy or integrity of any part of the work.