Bayesian Variable Selection For High-Dimensional Data With An Ordinal Response: Application Predicting Prognostic Risk Group In Acute Myeloid Leukemia

Background: Acute myeloid leukemia (AML) is a heterogeneous cancer of the blood, though speciﬁc recurring cytogenetic abnormalities in AML strongly are associated with attaining complete response after induction chemotherapy, remission duration, and survival. Therefore recurring cytogenetic abnormalities have been used to segregate patients into favorable, intermediate, and adverse prognostic risk groups. However, it is unclear how expression of genes is associated with these prognostic risk groups. We postulate that expression of genes monotonically associated with these prognostic risk groups may yield important insights into leukemogenesis. Therefore, in this paper we propose penalized Bayesian ordinal response models to predict prognostic risk group using gene expression data. We consider a double exponential prior, a spike-and-slab normal prior, a spike-and-slab double exponential prior, and a regression-based approach with variable inclusion indicators for modeling our high-dimensional ordinal response, prognostic risk group, and identify genes through hypothesis tests using Bayes Factor. Results: Gene expression was ascertained using Aﬀymetrix HG-U133Plus2.0 GeneChips for 97 favorable, 259 intermediate, and 97 adverse risk AML patients. When applying our penalized Bayesian ordinal response models, genes identiﬁed for model inclusion were consistent among the four diﬀerent models. Additionally, the genes included in the models were biologically plausible, as most have been previously associated with either AML or other types of cancer. Conclusion: These ﬁndings demonstrate that our proposed penalized Bayesian ordinal response models are useful for performing variable selection for high-dimensional genomic data and have the potential to identify genes relevantly associated with an ordinal phenotype.


Background
Acute myeloid leukemia (AML) is a heterogeneous disease [1]. Cytogenetics, which is the study of chromosomes, is routinely performed in bone marrow and/or blood samples of AML patients at diagnosis. In fact, specific recurring cytogenetic abnormalities in AML strongly associate with attaining complete response after induction chemotherapy, remission duration, and survival and have therefore been used to segregate patients into favorable, intermediate, and adverse prognostic risk groups [2,3]. These risk groups have been used to guide therapeutic decisions such as post-remission therapy [4], but still represent only a gross examination of the underlying molecular traits of AML patients. We postulate that elucidating the molecular characteristics associated with these prognostic risk groups would aid clinicians in developing a more precise understanding of this disease and potentially identify therapeutic targets. Although these prognostic risk groups are categorical, they are also ordered and therefore are an ordinal categorical response. We therefore seek to fit an ordinal response model to high-dimensional gene expression data, where the number of genes (p) is much greater than the sample size or number of patients (n).
Given the high-dimensional feature space that accompanies high-throughput gene expression assays, we desire an ordinal response method that performs automatic variable selection. While frequentist approaches to fitting penalized ordinal response models have been developed, problematically, these methods lack variable selection methods that are rooted in a hypothesis testing framework. For example, the least absolute shrinkage and selection operator (LASSO) approach [5] was previously extended to ordinal response models where the solution can be obtained through the Generalized Monotone Forward Stagewise Method [6] or coordinate descent [7]. The penalty parameter, λ, or analogously the final model, is commonly selected using information criteria (e.g., BIC, AIC), cross-validation, generalized cross-validation, or ideas based on Stein's unbiased estimate of risk [5] and all variables having non-zero coefficients in the final model are considered important. However, a key disadvantage is that once λ (or the final model) is selected, the parameter estimates β are conditional on that selected value. Additionally, penalized regression models result in point estimates for the model parameters but generally lack estimates of variability, prohibiting confidence interval construction and hypothesis testing. Therefore most analysts identify variables as important predictors if they have a non-zero estimate in the selected model.
To overcome these shortcomings, herein we present four Bayesian ordinal response modeling methods that can be used to identify molecular features from high-dimensional datasets with an ordinal response. Model I is based directly on the Bayesian LASSO whereas Models II, III, and IV additionally include variable inclusion indicators. All four models permit hypothesis testing through Bayes factor which provides statistical evidence of which coefficients are or are not zero. Such approaches are relevant for identifying meaningful predictors in multivariable models, that is, to guide variable selection or identify a good subset of predictors.

Results
Data Pre-processing High-throughout gene expression data from the Affymetrix HGU133Plus2.0 GeneChips for 525 adult patients with de novo AML were downloaded from Gene Expression Omnibus (GSE14468) [8]. The Affymetrix Detection Call algorithm was used to determine whether probe sets were present, marginally present, or absent in each sample. The 3 ′ :5 ′ ratio for GAPDH and the percentage of present calls for each sample was examined [9]. Subsequently, samples with any quality concerns were excluded (N=4). To obtain probe set expression summaries, we used the robust multiarray average method [10]. We restricted our penalized Bayesian ordinal response models to the 446 patients for whom prognostic risk group was available, which included 97 favorable risk, 259 intermediate risk, and 97 with adverse risk [11]. Patients with inv(16)/t(16;16), t(8;21), or t(15;17) abnormalities, regardless of any other cytogenetic abnormality, were classified as favorable risk [8]. Patients with -5/del(5q), -7del(7q), t(6;9), t(9;22), 3q26 abnormality, or those complex karyotype (that is, havinf more than 3 abnormalities) were considered adverse risk, provided they lacked inv(16)/t(16;16), t(8;21), or t(15;17) abnormalities. All others were considered intermediate risk. We also filtered the probe sets to include only the most variable probe sets as determined by quantiles of probe set level standard deviation estimates. Prior to model fitting, probe set expression summaries were centered and scaled.
Thereafter, we fit our four proposed penalized Bayesian ordinal response models for high-dimensional covariate spaces. The prior, π, represents the proportion of important genes which can either be set to a fixed constant or assigned a hyperprior. For Models II, III, and IV, we examined different priors for π j including fixing π j = 0.50, fixing π j = 0.05, and assigning the hyperprior π j ∼ Beta (1,19). Our four models were programmed using the "rjags" package in the R programming environment. Using the "dclone" package we ran three chains with 5,000 burn-in, 5,000 tuning steps, and thinned to keep every third step in the sampling process to reduce auto-correlation in our posterior samples, resulting in 9,999 saved steps. Convergence was assessed using Gelman and Rubin's potential scale reduction factor.
Evaluation procedure for identifying important genes We considered two methods for identifying important variables for our proposed Bayesian ordinal response models for high-dimensional data, both of which are based on hypothesis tests using Bayes factor. First, we wanted to determine whether β j has a non-zero effect ∀j. Since β j is continuous, it is not possible to test β j = 0 directly. Instead, we tested an interval null hypothesis [12]. Suppose ǫ is a small positive value that is close to 0. For Models I, II and III, we tested H 0j : |β j | ≤ ǫ versus H aj : |β j | > ǫ. For Model IV, we tested H 0j : |γ j β j | ≤ ǫ versus H aj : |γ j β j | > ǫ. Bayes factor, B 10 , is defined as the ratio of posterior odds to prior odds, where the prior odds = P (H1) P (H0) and the posterior odds = P (H1|Data) P (H0|Data) . The derivation of prior odds for the four models is shown in the Supplemental Material and the posterior odds was estimated empirically through MCMC posterior output [13,14].
For Models II, III and IV, a variable inclusion indicator, γ j , is included in the model. When using the Gibbs sampler to generate sequences of γ j , j = 1, . . . , p the sequences converge in distribution to the posterior distribution of γ and provide relevant information for variable selection [15]. More specifically, when γ j is frequently one in the posterior samples, its corresponding β j is non-zero and therefore x j should be included in the model. When a γ j is more often zero in the posterior samples, its corresponding x j should be excluded from the model. Therefore, some have performed variable selection criteria based on whether the posterior mean of γ j is greater than 0.50 [16]. Rather than use a threshold, we tested the hypotheses H 0j : γ j = 0 versus H aj : γ j = 1 and considered the corresponding variable to be important when H 0j is rejected. We reject H 0j if the Bayes factor exceeded a certain threshold. The prior odds is obtained through the prior specification π = t or π ∼ Beta(c, d) which is detailed for each model in the Supplemental Material. The posterior odds is estimated empirically through MCMC posterior samples. To determine an appropriate threshold for Bayes factor, others previously characterized B 10 ∈ (3, 10) to represent substantial evidence in favor of the alternative, B 10 ∈ (10, 100) to represent strong evidence in favor of the alternative, and B 10 > 100 to represent decisive evidence in favor of the alternative [17]. In our application, we rejected H 0j if B 10 > 5.
Genes associated with acute myeloid leukemia prognostic risk group When applying Bayes Factor to the models that fixed the prior at π = 0.05 and testing H 0j : γ j = 0 versus H aj : γ j = 1, Models II, III, and IV identified 9, 12, and 19 and probe sets respectively (Table 1). Similar results were obtained when using π ∼ Beta (1,19) as the prior, which identified 10, 12, and 20 probe sets, respectively. Fewer probe sets were identified when using an uninformative prior (π = 0.50), with only 5, 1, and 3 probe sets identified in Models II, III, and IV, respectively.
Probe sets identified when applying Bayes Factor to test H 0j : |β j | ≤ ǫ versus H aj : |β j | > ǫ for Models II and III or to test H 0j : |γ j β j | ≤ ǫ versus H aj : |γ j β j | > ǫ for Model IV were always a subset of those identified when applying Bayes Factor to test H 0j : γ j = 0 versus H aj : γ j = 1. Following previous work of others in the logistic regression setting [14], for all four models we let ǫ = 0.10. When testing H 0j : |β j | ≤ ǫ versus H aj : |β j | > ǫ, Model I did not identify any probe sets using Bayes Factor; in fact, the largest Bayes Factor for Model I was 1.04608. Because Model I does not include γ j , no results are available for testing H 0j : γ j = 0 versus H aj : γ j = 1. We note that when applying Bayes Factor to the β j 's, one needs to specify ǫ, which is an arbitrary choice with no suitable way of providing guidance on selecting ǫ for case applications. This threshold is not required when applying Bayes Factor to test H 0j : γ j = 0 versus H aj : γ j = 1. Therefore, we prefer the H 0j : γ j = 0 versus H aj : γ j = 1 testing approach.
An informative prior, when π j was either fixed at 0.05 or ∼ Beta (1,19), identified more features than an uninformative prior (π j = 0.50). We suspect this corresponds to previous research in the frequentist setting that observed increased power under FDR control when the proportion of truly null features is accurately estimated [18]. Here the prior π j is an analogous quantity, reflecting the proportion of truly differentially expressed features in the dataset. For application data, the proportion of truly differentially expressed features is likely small, otherwise, a high proportion of differentially expressed genes would prove lethal to the organism. To determine if fixing π j versus placing a hyperprior on π j affected run time, we compared run times among our models and across different priors. There was not a large time difference when fixing the prior π j at 0.05 versus putting a Beta hyperprior on π j ( Table 2).
We compared our results to those obtained from the BhGLM R package [19]. Somewhat similar to our Model I, BhGLM includes a function bpolr that fits a Bayesian hierarchical ordered logistic regression model using a Student-t prior on β j rather than a double exponential prior. When this default prior was used, no probe sets were identified by bpolr.
Overall, there was general consistency between the probe sets identified, with 21 unique genes corresponding to 24 probe sets included in our nine fitted models (Table 3). Because no probe sets were identified by the BhGLM R package, there is no corresponding column for BhGLM in Table 3. Ten probe sets were identified by Models II, III, and IV for at least one of the three priors examined. Many probe sets that were identified interrogate genes that have already been associated with AML, including CEBPD, PBX3, HGF, FSCN1, TNFSF10, RUNX1T1, HOXB3, TAL1, SLC40A1, and CLEC11A. CEBPD is thought to be a tumor suppressor gene, given it is commonly hypermethylated in AML and thus results in low CEBPD expression [20]. PBX3 is co-expressed with HOXA9, specifically in patients with MLL-rearranged AML, and these two genes coordinate synergistically in leukemogenesis [21]. Serum levels of HGF were higher in AML patients compared to healthy subjects, and HGF was prognostic for complete remission attainment, leukemia-free and overall survival in AML [22]. FSCN1 is upregulated in several cancers and is over-expressed in AML compared to healthy controls [23]. TNFSF10 is the gene that encodes TRAIL, a protein that induces apoptosis in tumor cells, which differed in expression levels by European Leukemia Net risk group in AML patients [24]. Further, lower levels of TRAIL conferred worse prognosis in AML patients [24]. In fact, inhibitors of histone deacetylases (HDACIs) induced expression of TNFSF10 and hence TRAIL, demonstrating the important role of this gene in HDACI therapy in AML [25]. RUNX1T1 is involved in the RUNX1-ETO fusion product which results from the recurrent t(8;21)(q22;q22) abnormality that is common in AML [26]. Hypomethylation of HOXB3 was associated with increased expression in intermediate risk AML patients [27] and plays an important regulatory role, as its over-expression inhibits FLT3 -ITD in AML patients carrying that mutation [28]. Low TAL1 expression negatively impacts hematopoietic development and results in low myeloid production and decreased colony formation from CD34+ eythroid progenitors [29]. Low levels of SLC40A1, the gene that encodes the iron exporter ferroportin, has been associated with good prognosis in AML [30]. In fact, researchers previously found that SLC40A1 had lower expression levels in patients with core-binding factor AML, who all belong to the favorable risk group [30].
CLEC11A, formerly known LSLCL with homologous protein SCGF, is thought to be involved in early hematopoiesis and was detected in immature neutrophils in patients with chronic and acute myeloid leukemia as well as other hematologic disorders [31]. Other probe sets were associated with genes that have been previously described as prognostic markers or implicated in other cancers (NKX2-3, CFD, VNN1, ST18, H1-0, SLC44A1, MSRB3, and OLIG1 ). Though the function role of ARMH1 is unclear, NCBI's Entrez Gene states that this gene is over-expressed in bone marrow, which may be particularly relevant in AML. Two probe sets consistently identified by the three models, 204961 s at and 214651 s at, no longer map to a gene when using current annotation, though the latter was intended to interrogate HOX9A which is involved in leukemogenesis [21].

Discussion
Our study differs from the initial study of this publicly available acute myeloid leukemia dataset in some fundamental ways. The initial study sought to identify genes associated with CEBPA mutation status, which tends to confer favorable risk [8]. Herein we were interested in identifying genes whose expression is predictive of prognostic risk group, a three-level ordinal response. The initial study used Affymetrix Microarray Suite 5 to obtain probe set expression summaries whereas we used the more commonly applied RMA method [10]. Further, the initial study used a frequentist method, Prediction Analysis of Microarrays [32], whereas we used Bayesian methods. We identified several genes that have previously been linked to AML or cancer. Nevertheless, we did not identify any of the 19 probe sets which mapped to 16 unique genes in the primary paper.
The state-of-the-art in AML diagnosis has dramatically changed over the last few decades [33]. In this study, prognostic risk groups were based on cytogenetics as defined by the Eastern Cooperative Oncology Group/Southwest Oncology group classification scheme rather than the European LeukemiaNet (ELN) risk stratification system, therefore known prognostic mutations such as NPM1, FLT3 -ITD, CEBPA, RUNX1, ASXL1, and TP53 were not included when defining the three ordinal classes. Since the initial study, the ELN risk stratification system was developed by consensus using an expert panel which stratified patients into four prognostic risk groups: favorable, intermediate I, intermediate II, and adverse risk [34]. An evaluation of this initial ELN standardization system in a large cohort of AML patients demonstrated these categories are associated with attainment of complete remission, disease-free survival, and overall survival in younger (< 60) and older (≥ 60) patients [35]. Due to improved genetic testing and novel discoveries regarding the importance of genetic mutations, ELN was subsequently updated and treatment decision-making guides were outlined [36]. The new ELN risk stratification system includes three ordinal levels: favorable, intermediate, and adverse. Future research to identify molecular features associated with this new ELN risk stratification system may further our understanding of AML biology and identify the prognostic relevance of molecular features.
Our penalized Bayesian ordinal response models overcome shortcomings of frequentist methods, permitting hypothesis testing through Bayes factors. Through extensive simulation studies, we previously demonstrated the superiority of Model IV, the regression-based approach with variable inclusion indicators, over two frequentist methods, ordinalgmifs and ordinalNet [37]. Others have also suggested the use of Bayesian credible intervals for variable selection [16]. Therefore, we also briefly examined the results when variables were identified as important based on equal-tailed (ET) credible intervals and Highest Posterior Density (HPD) intervals. For Models I, II and III we identified the covariates as important when their corresponding 95% equal-tailed credible or HPD intervals for β j did not include zero. For Model IV, we identified the covariates as important when their corresponding 95% equal-tailed credible or HPD intervals for γ j β j did not include zero. Using 95% equal-tailed credible or HPD intervals yielded no features for Model I, and the 95% HPD intervals identified one feature for Model IV only when the prior was fixed at π j =0.05. Somewhat similarly, for Model III, 95% equal-tailed credible intervals identified one feature when the prior fixed at π j =0.05. However, when fitting Model II, 95% equal-tailed credible intervals identified four features when the prior fixed at π j =0.50 but one feature when the prior was either fixed at π j =0.05 or took on a Beta (1,19) hyperprior. Some may postulate that a credible interval covering 0 indicates the predictor is not statistically reliable. However, we identified several genes associated with both AML and cancer when using Bayes Factor. Therefore, we suspect that credible and highest posterior density intervals for this gene expression dataset cover zero due to multicollinearity, which results in sign flipping of the coefficient estimates when collinear variables are included in the model. We further note that because applying Bayes Factor when testing β j one needs to specify ǫ which is an arbitrary choice, we prefer using Bayes Factor to test H 0j : γ j = 0 versus H aj : γ j = 1 which does not require an arbitrary threshold.
When assuming proportional odds, the effect of each independent variable is consistent across different levels of response categories, in other words, the slopes across the different levels of the response categories are parallel. We used the latent variable model which may not be appropriate for models without proportional odds [38]. We are working to develop penalized Bayesian stereotype logit model for highdimensional data, which may be better suited for modeling ordinal responses that are assessed, such as cytogenetic and ELN risk group.

Conclusions
Our penalized Bayesian ordinal response Models II, III, and IV combined with the use of Bayes Factor for testing H 0j : γ j = 0 versus H aj : γ j = 1 can be used for modeling an ordinal response in the presence of a high-dimensional covariate space, such as data from high-throughput genomic assays. These identified relevant genes in our AML application data, and do not require specification of an arbitrary threshold for testing nor do they require selection of a specific value for the penalty parameter λ, because λ is assigned a a diffuse hyperprior. Because there is similarity between resulting Models II, III, and IV but noted differences in run times, we recommend Model IV.

Methods
Prior to introducing the four penalized Bayesian ordinal response models, we briefly review the cumulative logit model and Bayesian approaches to the cumulative logit model when n > p. We then review the Bayesian LASSO for continuous and dichotomous response models, then subsequently describe our four penalized Bayesian ordinal response modeling methods for p > n scenarios. I is based directly on the Bayesian LASSO whereas Models II, III, and IV additionally include variable inclusion indicators.
Cumulative Logit Model Let x = (x 1 , x 2 ..., x p ) ′ denote a vector of observed covariates, where p is the number of predictors and each subject's response, Y , is one of K ordinal categories. Let β = (β 1 , β 2 ..., β p ) ′ denote the vector of unknown regression parameters. Assuming proportional odds, the cumulative logit model has the form: where P r(Y ≤ k|x) is the cumulative probability of the event Y ≤ k given x.

Bayesian Ordinal Regression Models
Albert & Chib (1993) discussed Bayesian analyses for a binary response and generalized the method to a multinomial response under ordered (i.e., ordinal) and unordered cases [40]. For the ordinal response, an underlying latent continuous distribution was assumed to be Z i ∼ N (β ′ x i , σ 2 ) for i = 1, ..., n, and modeled as a linear combination of covariates. The ordinal response was represented by imposing cut-offs to the continuous response and modeled using a cumulative probit model. They assigned a diffuse prior for regression parameters β and cut-offs α. They then implemented a Gibbs sampler with initial values for β and α selected to be their MLEs [41,42]. Therefore their Bayesian ordinal regression model pertained to data sets where the sample size is larger than the number of covariates. Albert (2016) later used a uniform prior for α, under the constraint α 2 ≤ ... ≤ α K−1 , where α 1 was set to zero [43]. A similar uniform prior was suggested for β. The method was applied to an example data set bioChemists in the pscl R package which included 915 observations where gender, number of children aged 5 or younger, and number of articles produced by the Ph.D. mentor during the last 3 years were used to predict number of articles produced during last 3 years of Ph.D. The response variable was categorized to be ordinal with four categories, though the cut-offs for the ordinal categories were not provided. Model fitting was achieved using the MCMCoprobit function in the MCMCpack R package, which applies a hybrid Metropolis-Hastings and Gibbs algorithm under the probit link scenario.
Existing publications on proportional odds Bayesian ordinal regression models when number of observations exceeds the number of features, i.e. p < n, have mostly employed an underlying latent continuous variable Z for outcome Y . The cut-off values α 1 , α 2 , ..., α K−1 are specified such that the ordinal outcome We note that when proportional odds are assumed, the only parameters that designate class membership are the cut-off α's.

Proposed Ordinal Bayesian Models for High-Dimensional Data Model I: Bayesian LASSO Ordinal Regression Model
The seminal LASSO paper [5] briefly mentioned that the LASSO estimate can be derived as the Bayesian posterior mode when the regression parameters β j , j = 1, ..., p, have independent double-exponential (i.e., Laplace) priors, where τ = 1/λ is the inverse of shrinkage parameter λ. Initially, the Bayesian LASSO was described for continuous [44,45,46,47] and subsequently dichotomous outcomes [14,48]. Typically, a diffuse hyperprior for λ [47,14,48] or λ 2 [44,45,46] is used, which avoids the procedure of explicitly selecting a single value for the penalty term. A common choice is a Gamma(a, b) prior with small values for a and b so that the prior is diffuse and therefore non-informative [45]. It has been reported that given a and b are small (e.g. a = 0.1, b = 0.1), the posterior distributions are not sensitive to the choices of a and b [46] though larger values of a and b have also been used [14,47]. Our first model (Model I) is a Bayesian LASSO ordinal regression model. Following Tibshirani (1996) [5], we assign an independent double-exponential (DE) prior to each β j , j = 1, ...p, and extend the model from a continuous response to an ordinal response:

Model II: Spike and Slab Normal Prior
Many Bayesian variable selection methods have been proposed in recent years. Mitchell and Beauchamp (1988) introduced the "spike and slab" prior for each regression coefficient β j , j = 1, ..., p, which is a mixture of a point mass at 0 and a diffuse uniform distribution elsewhere [49]. Instead of using a probability mass at 0, George and McCulloch (1993) assigned the following prior to each β j : where the latent variable γ j takes a value of either 0 or 1 [15]. Setting σ 2 βj to a small value leads to a small variance for β j such that β j will frequently be close to 0 when γ j = 0. Alternatively, setting s j to a large value (e.g., s j > 1) leads to a moderate or large variance for β j such that β j will frequently be non-zero when γ j = 1. Letting P (γ j = 1) = 1 − P (γ j = 0) = π j , then π j represents the prior probability that β j is non-zero, or the prior probability that x j should be included in the model. Two different priors for γ were described. One lets each γ j be independent with a Bernoulli(1, π j ) distribution, where fixing π j = 0.5 is a special case. Kohn et al (2001) discussed a more flexible approach by considering a beta hyperprior Beta(c, d) for each π j , where j = 1, ..., p [50]. The parameters c and d can be chosen to match the desired value of mean and variance for the number of parameters that enter the model, where a smaller variance indicates a more informative hyperprior for π j . When c = d = 1, π j ∼ Uniform (0, 1), such that the hyperprior for π j is completely uninformative.
Our second model (Model II) assigns a prior to each β j similar to George and McCulloch's (1993) "spike and slab" normal prior [15]. We assume where σ 2 0 and σ 2 1 are constant. We set σ 2 0 to a small value and σ 2 1 to a large value such that β j has a small variance when γ j = 0 and β j has a moderate to large variance when γ j = 1. σ 2 1 should be selected such that the prior values for each β j is within a reasonable range. The model has the following formulation: Model III: Spike and Slab LASSO Prior Yuan and Lin (2005) discovered a connection between Bayesian variable selection, which introduces the binary vector γ, and the LASSO for a normal continuous outcome by assigning the following mixture prior to β j : is the point mass distribution centered at zero and DE(τ ) has the density τ exp(−τ |x|)/2 [51]. This forces β j = 0 if γ j = 0 so the model can be re-expressed under γ as: where X γ and β γ are columns of X and rows of β with corresponding γ = 1. Unlike the more widely used prior P (γ) = π |γ| (1 − π) 1−|γ| with a prespecified π, they proposed the following prior for γ: where |γ| = γ j , j = 1, ..., p. Their proposed prior avoids highly correlated predictors from entering the model simultaneously. They selected the model corresponding to γ that maximizes P (γ|Y ).
The model selected by the LASSO algorithm was that having the highest posterior probability under this setting when w = π 1−π τ 2 √ 2π * σ 2 = 1 for a normal continuous outcome. To avoid confusion, the constant defining the ratio of a circle's circumference to its diameter is represented with π * , whereas π is used to denote the probability for the Bernoulli prior of γ. Under an orthogonal design and when π = 0.5, w = 1 is equivalent with taking λ = 8σ 2 π * and τ = λ 2σ 2 .
The spike-and-slab LASSO assigns the following prior to each β j : with λ 1 small and λ 0 large and γ j = 1 corresponding to a large β j effect and γ j = 0 corresponding to a negligible or small β j effect [53]. The spike-and-slab LASSO has been extended to generalized linear models [19] and the Cox model [54] where a Bernoulli(π) prior is assigned for each γ j with π taking on either a fixed value [53] or assigned either a Beta [53] or Uniform [54] prior for π [19]. Our third model (Model III) is an extension of Ročková and George (2018) Spikeand-Slab LASSO model [53]. We assume Letting λ 0 be a large positive constant (e.g. λ 0 = 20), when γ j = 0, β j has small variance and clusters around 0. Instead of varying λ at different values as Ročková and George (2018), we assign a Gamma prior λ ∼ Gamma(a, b). The model has the following formulation: α k ∼ Normal(0, σ 2 α k ), α 1 < α 2 < ... < α K−1 , for k = 1, 2, ..., K − 1 γ j ∼ Bernoulli(π j ), for j = 1, ..., p π j = t or π j ∼ Beta(c, d), for j = 1, ..., p Model IV: Regression Approach with Variable Inclusion Indicator Kuo and Mallick (1998) discussed one drawback of George and McCulloch's method, that they need to choose sophisticated tuning factors for the two variances, i.e. σ 2 βj and s 2 j , in the hierarchical prior for each β j [55]. Instead of specifying a hierarchical model, they specified a regression model that incorporates 2 p submodels by including an indicator vector γ. Their linear regression model has the following form: γ j β j x ij + ǫ i , for i = 1, ..., n, j = 1, ..., p.
For j = 1, ..., p, γ j is an indicator variable that takes on a value of 0 or 1. As before, when γ j = 1, x j is included in the model. When γ j = 0, x j is omitted from the model.
An independent Bernoulli prior, Bernoulli(π j ), can be assigned to each γ j , j = 1, ..., p. They fixed π j at 0.5∀j so that the likelihood prior for each of the 2 p models are the same [55]. They approximated the posterior distribution of γ by means of γ from the Markov chain Monte Carlo (MCMC) algorithm, and suggested that predictors having higher posterior variable inclusion indicator frequencies should be included in the model. Kuo and Mallick (1998) prior for β is equivalent to Geweke (1996) by letting θ j = γ j β j , for j = 1, ..., p [56]. Then the prior for θ j is a mixture of point mass at 0 with probability 1 − π j and normal distribution with probability π j . This approach has been used by others [57,52].
Our fourth model (Model IV) incorporates the Bayesian variable selection method from Kuo and Mallick (1998) by including an indicator variable γ j for each β j , j = 1, ..., p [55]. We assume each γ j follows an independent Bernoulli distribution with probability π j , where π j can be a fixed constant. Following Kohn et al. (2001), we will additionally consider a more flexible approach by considering a beta hyperprior for π j : π j ∼ Beta(c, d) [50].