Bayesian variable selection in joint modeling of longitudinal data and interval-censored failure time data

Joint modeling of longitudinal data and survival data has gained great attention in the last two decades. However, most of the existing studies have focused on right-censored survival data. In this article, we study joint analysis of longitudinal data and interval-censored survival data and conduct Bayesian variable selection in this framework. A new joint model is proposed with a shared frailty to characterize the dependence between the two types of responses, where the longitudinal response is modeled with a semiparametric linear mixed-effects submodel and the survival time is modeled by a semiparametric normal frailty probit submodel. Several Bayesian variable selection approaches are developed by adopting Bayesian Lasso, adaptive Lasso, and spike-and-slab priors in order to simultaneously select significant covariates and estimate their effects on the two types of responses. Efficient Gibbs samplers are proposed in which all unknown parameters and latent variables are sampled directly from well-recognized full conditional distributions. Our simulation study shows that these methods perform well in both variable selection and parameter estimation. A real-life data application to the joint analysis of blood cholesterol level and hypertension is provided as an illustration.


Introduction
Longitudinal data commonly arise in epidemiological, socioeconomic, and medical studies where participants are measured multiple times at discrete time occasions during the study, and the research interests are to estimate the mean trajectory of the response over time and to assess the effects of potential covariates. Interval-censored data also arise in such studies, but the response variable of interest is the failure time of a certain event such as a disease or death. Since the event status is only evaluated at the examination times, the exact time of the failure event is never observed but is only known to fall within some interval formed by two adjacent examination times between which the failure status changed. In this article, we study joint modeling of longitudinal data and interval-censored data by borrowing information between correlated longitudinal and survival responses, particularly in the case where many potential covariates are available for both types of responses.
Joint modeling of longitudinal and survival responses has been popular in the statistical literature in the last two decades because it provides more efficient parameter estimation than separate univariate analyses (Elashoff et al., 2008; Diggle et al., 2008; Asar et al., 2015; Chen et al., 2018). Most of the existing works on joint modeling have focused on the joint analysis of longitudinal data and right-censored survival data, where the failure time is either exactly observed or right-censored for all subjects in the study. The research on joint analysis of longitudinal data and interval-censored data is relatively sparse due to the more complex data structure but has drawn more attention in recent years. Available research works on this topic include Lee et al. (2011), Gueorguieva et al. (2012), Ye et al. (2015), Rouanet et al. (2016), Chen et al. (2018), Grand et al. (2015), Yi et al. (2020), Weng et al. (2022), and Yi et al. (2022). It is worth noting that all of these approaches were developed from a frequentist perspective, and the development of Bayesian research on this topic is clearly needed.
In real-life longitudinal studies, there are usually many demographic variables and health-related characteristics available for participants examined multiple times. Identifying significant risk factors for the responses of interest is naturally an important and challenging research topic. Variable selection methods have been developed to tackle this problem, and existing techniques include forward selection, backward elimination, stepwise selection, and model comparisons using the Bayes factor or other information criteria (Akaike, 1974; Hocking, 1976; Lee and Tang, 2006). These traditional methods are usually computationally expensive and not well suited to models with a large number of covariates. In response, regularization approaches, such as the least absolute shrinkage and selection operator (Lasso), the adaptive Lasso, and the smoothly clipped absolute deviation (SCAD), have been developed for variable selection (Tibshirani, 1996; Fan and Li, 2001; Zou, 2006). In particular, Tibshirani (1996) showed that the Lasso performs parameter estimation and variable selection simultaneously based on the shrinkage property of the L1 penalty and suggested from a Bayesian perspective that Lasso estimates can be interpreted as the posterior mode when the regression parameters are assigned independent and identical Laplace (i.e., double-exponential) priors. Park and Casella (2008) developed an efficient Gibbs sampler for the Bayesian Lasso by exploiting a hierarchical representation of the full model. Leng et al. (2014) further proposed the Bayesian adaptive Lasso, which allows different penalties for different regression parameters. The spike-and-slab method is also widely adopted in variable selection (Mitchell and Beauchamp, 1988; George and McCulloch, 1997; Ishwaran and Rao, 2005). O'Hara and Sillanpää (2009) provided a comprehensive review and comparison of the above-mentioned Bayesian variable selection methods in conventional linear regression settings.
There are only a few works in the literature on variable selection in the context of joint modeling of longitudinal data and survival data due to the great complexity of such data. For joint analysis of longitudinal and right-censored survival data, Tang et al. (2017) proposed a longitudinal submodel with a normal mixture for the error distribution and developed a Bayesian Lasso combined with the use of penalized splines, and Xie et al. (2020) developed an adaptive group Lasso to select both time-varying and time-invariant effects in both the longitudinal and survival submodels. Yi et al. (2022) is the only available work to date on variable selection in joint modeling of longitudinal and interval-censored data; they developed Lasso and adaptive Lasso based on a Monte Carlo expectation-maximization (MCEM) algorithm in their estimation approach. All these studies adopted the frailty proportional hazards (PH) model for the survival time in their modeling.
The objective of this article is to develop efficient Bayesian variable selection approaches to simultaneously select covariates and estimate their effects in joint modeling of longitudinal and interval-censored survival data. The proposed model contains a semiparametric mixed-effects submodel for the longitudinal response, a frailty semiparametric probit submodel for the survival time (Lin and Wang, 2010; Wu and Wang, 2019), and a shared random effect to characterize the association between the longitudinal and survival responses. We explore Bayesian Lasso, adaptive Lasso, and spike-and-slab methods in this joint modeling framework in a unified manner with different prior specifications for the regression coefficients. Our proposed Gibbs samplers are computationally efficient because the full conditional distributions of all unknown parameters and latent variables take standard forms such as normal, gamma, and inverse-Gaussian distributions.
The remainder of the article is organized as follows. In Section 2, we present our proposed joint model for longitudinal and interval-censored survival responses as well as the observed likelihood. In Section 3, a unified Bayesian estimation approach is developed for the joint analysis with the incorporation of variable selection techniques, resulting in four computationally efficient Gibbs samplers. Section 4 evaluates and compares the proposed methods through a simulation study, and Section 5 presents a real-life application as an illustration of the proposed methods. Some concluding remarks are given in Section 6.
2 Joint model of longitudinal and survival responses

The proposed joint model
For a typical longitudinal study, let t_{i1} < t_{i2} < ... < t_{im_i} denote the sequence of pre-specified examination times for the ith individual. Let T_i denote the failure time of interest for subject i, X_i a vector of covariates for subject i, and y_{ij} = y(t_{ij}) the observed value of the longitudinal response at t_{ij} for j = 1, ..., m_i and i = 1, ..., n.
Under the longitudinal study design, the failure statuses are only examined at the observation times t_{ij}'s, so T_i is not directly observed but is known to fall within some interval (L_i, R_i] that is formed by the two closest examination times (including 0 and ∞) between which the failure status changed. We say that T_i is left-censored if L_i = 0, right-censored if R_i = ∞, and strictly interval-censored otherwise.
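The mapping from a subject's examination times and recorded event statuses to the observed interval (L_i, R_i] can be sketched as follows. This is an illustrative implementation only; the function name and status encoding are not taken from the study.

```python
import numpy as np

def censoring_interval(exam_times, statuses):
    """Form the observed interval (L, R] for one subject from examination
    times and the event statuses (0 = event-free, 1 = event occurred)
    recorded at those times. Illustrative only; names are not from the study.
    """
    exam_times = np.asarray(exam_times, dtype=float)
    statuses = np.asarray(statuses, dtype=int)
    if statuses[0] == 1:                            # event before the first visit
        return 0.0, float(exam_times[0])            # left-censored
    if statuses[-1] == 0:                           # event-free at the last visit
        return float(exam_times[-1]), float("inf")  # right-censored
    j = int(np.argmax(statuses == 1))               # first visit with the event
    return float(exam_times[j - 1]), float(exam_times[j])

print(censoring_interval([1.0, 2.0, 3.0], [0, 0, 1]))   # (2.0, 3.0)
print(censoring_interval([1.0, 2.0], [0, 0]))           # (2.0, inf)
```

Only the pair of adjacent visits where the status changes carries information about T_i, which is exactly why the exact failure time never enters the likelihood.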
We consider the following semiparametric mixed-effects model for the longitudinal response,

y_{ij} = μ(t_{ij}) + X_i'θ + ξ_i + u_i + ε_{ij},   (1)

where μ(·) is the unspecified baseline mean function over time, θ is the vector of fixed covariate effects on the response, ξ_i is a frailty that introduces within-subject association among the longitudinal measurements of subject i, u_i is a frailty shared by both the longitudinal and survival responses of subject i, and the ε_{ij}'s are independently and identically distributed random errors. It is assumed that ξ_i ~ N(0, σ_ξ^2), u_i ~ N(0, σ_u^2), and ε_{ij} ~ N(0, σ_ε^2), and all these frailties and random errors are mutually independent.
The following semiparametric normal frailty probit model is assumed for the event time T_i conditional on the frailty u_i,

F(t | X_i, u_i) = Φ(α(t) + X_i'β + u_i),   (2)

where F(·|X_i, u_i) is the conditional cumulative distribution function (CDF) of T_i given covariates X_i and frailty u_i, Φ(·) is the CDF of the standard normal distribution, β is the vector of covariate effects on the event time, and α(·) is an unspecified increasing function with lim_{t→0} α(t) = −∞ and lim_{t→+∞} α(t) = +∞. We assume that the longitudinal and survival responses are conditionally independent given the shared random effect, which induces the statistical dependence between the two responses. The statistical association between y_{ij} and T_i under the proposed joint model can be summarized in terms of Kendall's τ in the following simple explicit form,

τ = (2/π) arcsin(ρ),   (3)

where ρ is the Pearson correlation coefficient between the longitudinal response y_{ij} and the transformed failure time α(T_i), taking the form

ρ = −σ_u^2 / √{(σ_ξ^2 + σ_u^2 + σ_ε^2)(1 + σ_u^2)}.
The expression of Kendall's τ in (3) is obtained from the relationship between the Pearson correlation coefficient and Kendall's τ under a bivariate normal distribution (Kruskal, 1958; Hougaard, 2000), and this simple explicit form makes the association easy to calculate. The expression in (3) also shows that the statistical association does not depend on any covariates but only on the variances of the frailties and random errors involved in the joint model. Kendall's τ in equation (3) is always negative under the proposed joint model, meaning that larger values of the longitudinal response correspond to smaller values of the survival time, i.e., a higher risk of the survival event. If the statistical association runs in the opposite direction in a real-life study, one can modify model (2) by simply replacing u_i with −u_i.
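The calculation in (3) can be sketched numerically. The specific form of ρ used below, involving the three variance components, is an assumption consistent with the model description (the original display is missing from the text).

```python
import math

def kendalls_tau(sigma2_u, sigma2_xi, sigma2_eps):
    """Kendall's tau between y_ij and T_i via tau = (2/pi) * arcsin(rho).

    The form of rho below is an assumption consistent with the variance
    components of the joint model (the original display is missing):
    rho = -sigma2_u / sqrt((sigma2_xi + sigma2_u + sigma2_eps) * (1 + sigma2_u)).
    """
    rho = -sigma2_u / math.sqrt(
        (sigma2_xi + sigma2_u + sigma2_eps) * (1.0 + sigma2_u)
    )
    return (2.0 / math.pi) * math.asin(rho)

tau = kendalls_tau(1.0, 1.0, 1.0)   # approximately -0.27; always negative
```

As the text notes, τ is always negative, and increasing the shared-frailty variance σ_u^2 strengthens the (negative) association.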
Under the proposed joint model and the aforementioned assumptions, the observed likelihood takes the following form,

L_obs = ∏_{i=1}^n ∫∫ {∏_{j=1}^{m_i} g(y_{ij} | X_i, u_i, ξ_i)} [Φ(α(R_i) + X_i'β + u_i) − Φ(α(L_i) + X_i'β + u_i)] (1/σ_u) φ(u_i/σ_u) (1/σ_ξ) φ(ξ_i/σ_ξ) du_i dξ_i,   (4)

with the conventions α(0) = −∞ and α(∞) = +∞, where g(y_{ij} | X_i, u_i, ξ_i) is the normal density function of y_{ij} given covariates X_i and frailties u_i and ξ_i based on equation (1), and φ(·) is the probability density function of the standard normal distribution. The unknown parameters in this likelihood include the two baseline functions μ(·) and α(·), the two sets of regression parameters θ and β, and the three variance parameters of the frailties and random errors. Since the integrals in this observed likelihood do not have a closed form, estimation based directly on this likelihood is challenging from both frequentist and Bayesian perspectives.

Modeling unspecified functions with splines
Another complication in the estimation is that the two unknown baseline functions μ(·) and α(·) are both infinite-dimensional. To tackle this problem, spline methods are adopted so that one only needs to estimate a finite number of parameters for these two functions. Specifically, we use a linear combination of M-splines to approximate μ(·) as follows,

μ(t) = Σ_{h=1}^K η_h M_h(t),

where the M_h's are M-spline basis functions (Ramsay, 1988) and the η_h's are unconstrained spline coefficients. The values of the M-spline basis functions depend on the chosen knots and the degree d. The knots are typically an increasing sequence within the data range, and the degree d can be 1, 2, or 3, corresponding to linear, quadratic, or cubic M-spline basis functions. Let J denote the number of interior knots; the number of M-spline basis functions is then K = J + d + 1. We approximate α(·) in a similar manner. Since α is nondecreasing, following Lin and Wang (2010) we adopt the following monotone spline expression for modeling α(·),

α(t) = r_0 + Σ_{k=1}^{K*} r_k b_k(t),

where the b_k's are I-spline basis functions (Ramsay, 1988), which are integrals of the corresponding M-splines, the r_k's are spline coefficients constrained to be nonnegative, and r_0 is an unconstrained intercept. These I-spline basis functions are nondecreasing piecewise polynomials, taking the value 0 in the first stage, increasing up to 1 in the second stage, and staying at a plateau of 1 in the third stage. The number of I-spline basis functions K* is determined by the number of interior knots J* and the degree d* of the associated M-spline functions, i.e., K* = J* + d* + 1. The construction and evaluation of M-spline and I-spline functions can be performed directly using the R package splines2 (Wang and Yan, 2021).
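The M-spline/I-spline construction above can be sketched from scratch; the paper itself uses the R package splines2 for this, so the code below is only an illustration of the recursion and of the K = J + d + 1 count, with I-splines approximated by numerical integration.

```python
import numpy as np

def m_spline_basis(x, interior_knots, degree, lo=0.0, hi=1.0):
    """Evaluate the K = J + degree + 1 M-spline basis functions at x.

    M-splines (Ramsay, 1988) are B-splines normalized to integrate to 1.
    This is a from-scratch sketch; the paper itself uses the R package
    splines2 for the same construction.
    """
    k = degree + 1                                   # spline order
    t = np.concatenate([[lo] * k, np.asarray(interior_knots, float), [hi] * k])
    K = len(interior_knots) + degree + 1             # number of basis functions
    x = np.atleast_1d(np.asarray(x, float))

    def M(i, order):                                 # Ramsay's recursion
        if t[i + order] == t[i]:                     # zero-width support
            return np.zeros_like(x)
        if order == 1:
            return ((x >= t[i]) & (x < t[i + 1])) / (t[i + 1] - t[i])
        num = (x - t[i]) * M(i, order - 1) + (t[i + order] - x) * M(i + 1, order - 1)
        return order * num / ((order - 1) * (t[i + order] - t[i]))

    return np.column_stack([M(i, k) for i in range(K)])

# I-splines are the integrals of the M-splines; approximate them numerically.
grid = np.linspace(0.0, 1.0, 2001)
B = m_spline_basis(grid, interior_knots=[0.25, 0.5, 0.75], degree=3)   # K = 7
steps = np.diff(grid)[:, None]
I = np.vstack([np.zeros(B.shape[1]),
               np.cumsum(0.5 * (B[1:] + B[:-1]) * steps, axis=0)])

# A monotone spline: free intercept r_0 plus nonnegative coefficients r_k.
r0 = -2.0
r = np.array([0.5, 1.0, 0.2, 0.8, 0.3, 0.6, 0.4])
alpha = r0 + r0 * 0 + I @ r                          # nondecreasing over the grid
```

Each I-spline column rises from 0 to (approximately) 1, so any nonnegative combination plus a free intercept is nondecreasing, which is exactly what the monotone representation of α(·) requires.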
To ensure enough flexibility, one can use many knots when specifying the splines for modeling an unknown function. Potential problems of using a large number of knots are overfitting and an increased computational burden. To tackle these problems, shrinkage priors can be used for the spline coefficients to penalize large values and shrink the coefficients of unnecessary basis functions towards zero (Lin and Wang, 2010; Wang and Dunson, 2011).
3 Data augmentation and prior specification

Data augmentation
The complicated form of the observed likelihood makes posterior computation a great challenge no matter what priors are adopted for the parameters. To develop an efficient Bayesian estimation approach, we first propose a two-stage data augmentation that leads to a complete data likelihood with a simple form that is convenient for posterior computation.
In the first stage, we consider the following conditional likelihood given the u_i's and ξ_i's,

L_c = ∏_{i=1}^n {∏_{j=1}^{m_i} g(y_{ij} | X_i, u_i, ξ_i)} [Φ(α(R_i) + X_i'β + u_i) − Φ(α(L_i) + X_i'β + u_i)].   (5)

The second stage of our data augmentation is motivated by Lin and Wang (2010) and serves to simplify the likelihood contribution of the interval-censored survival data in the observed likelihood (4). For convenience, let δ_{i1}, δ_{i2}, and δ_{i3} be the indicators of T_i being left-, interval-, and right-censored, respectively. Since only one type of censoring occurs for each subject, δ_{i1} + δ_{i2} + δ_{i3} = 1 for each i. With these notations, we introduce a normal latent variable z_i ~ N(α(t_i) + X_i'β + u_i, 1) subject to the constraint z_i ∈ C_i for subject i, where t_i = R_i 1(δ_{i1}=1) + L_i 1(δ_{i1}=0), and the constrained space C_i takes the form (0, ∞) if δ_{i1} = 1 and (−∞, 0] otherwise. Based on this data augmentation, the complete data likelihood given all the z_i's, u_i's, and ξ_i's, denoted by (6), is a product of normal density functions in the y_{ij}'s and z_i's over the constrained spaces. It is straightforward to show that integrating out all the z_i's in the complete likelihood (6) leads to the conditional likelihood (5). This simple product form of (6) allows us to develop a computationally efficient Bayesian estimation approach.
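Sampling the latent z_i in a Gibbs step amounts to drawing from a unit-variance normal truncated to C_i. The mean α(t_i) + X_i'β + u_i and the one-sided constrained spaces used below are assumptions consistent with the surrounding description (the original displays are missing from the text).

```python
import numpy as np
from scipy.stats import truncnorm

def sample_latent_z(mean, left_censored, rng):
    """Draw z_i from N(mean, 1) truncated to the constrained space C_i.

    Assumed setup (the displays are missing from the text): C_i = (0, inf)
    for a left-censored subject and (-inf, 0] otherwise, so that
    P(z_i > 0) = Phi(mean) recovers the probit probability.
    """
    # truncnorm takes bounds on the standardized scale (x - loc) / scale
    a, b = (-mean, np.inf) if left_censored else (-np.inf, -mean)
    return truncnorm.rvs(a, b, loc=mean, scale=1.0, random_state=rng)

rng = np.random.default_rng(1)
z_pos = sample_latent_z(0.3, left_censored=True, rng=rng)    # lies in (0, inf)
z_neg = sample_latent_z(0.3, left_censored=False, rng=rng)   # lies in (-inf, 0]
```

Integrating the truncated-normal density over its half-line reproduces Φ(mean) or 1 − Φ(mean), which is the mechanism by which integrating out the z_i's recovers the conditional likelihood (5).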

Review of several Bayesian variable selection priors
Since this article considers the case where there are many potential covariates for both types of responses, we develop Bayesian variable selection techniques in this framework by adopting different priors for the two sets of regression parameters. Here we provide a brief review of several Bayesian variable selection priors for regression parameters in the conventional linear model setting. The linear model specifies y = 1_n μ + Xβ + ε, where y is an n × 1 vector of responses, 1_n is an n × 1 vector with all elements equal to 1, X is an n × p matrix of regressors, and ε is an n × 1 vector of independent and identically distributed normal errors with mean 0 and variance σ^2.
The Lasso estimator of the regression parameters β can be viewed as the minimizer of the following penalized least squares criterion with an L1 penalty (Tibshirani, 1996),

min_β (y − 1_n μ − Xβ)'(y − 1_n μ − Xβ) + λ Σ_{j=1}^p |β_j|,

where λ is a positive parameter that controls the degree of penalization. This Lasso estimator can also be viewed as the posterior mode in a Bayesian framework when a double-exponential prior is assigned to the regression parameters β. Park and Casella (2008) proposed the Bayesian Lasso using a conditional Laplace prior specification given σ^2 and developed a Gibbs sampler for the Bayesian Lasso. Their Gibbs sampler exploited the representation of the Laplace distribution as a scale mixture of normals with an exponential mixing density (Andrews and Mallows, 1974),

(a/2) e^{−a|z|} = ∫_0^∞ (2πs)^{−1/2} e^{−z^2/(2s)} (a^2/2) e^{−a^2 s/2} ds,  a > 0.

This eventually leads to the following hierarchical representation of the Bayesian Lasso,

β_j | τ_j^2, σ^2 ~ N(0, σ^2 τ_j^2),  τ_j^2 ~ E(λ^2/2),  j = 1, ..., p,

where E(a) denotes an exponential distribution with rate parameter a.
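The scale-mixture representation can be verified by simulation: drawing the mixing variance from the exponential density and then the coefficient from the conditional normal yields Laplace draws. The quick check below uses the variance identity Var(β) = E[τ^2] = 2/λ^2, which follows from the mixture.

```python
import numpy as np

# The Laplace prior as a scale mixture of normals: if tau2 ~ E(lam^2 / 2)
# (an exponential with rate lam^2 / 2) and beta | tau2 ~ N(0, tau2), then
# marginally beta has a Laplace distribution with rate lam. A quick Monte
# Carlo check uses the variance identity Var(beta) = E[tau2] = 2 / lam^2.
rng = np.random.default_rng(0)
lam = 1.5
tau2 = rng.exponential(scale=2.0 / lam**2, size=200_000)   # rate lam^2 / 2
beta = rng.normal(0.0, np.sqrt(tau2))
print(beta.var())   # close to 2 / lam^2 = 0.888...
```

This representation is what turns the non-conjugate Laplace prior into a pair of conjugate updates (normal for β, inverse-Gaussian for 1/τ^2) inside a Gibbs sampler.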
The Bayesian adaptive Lasso generalizes the Bayesian Lasso by allowing different penalties for different regression parameters (Zou, 2006); specifically, the adaptive Lasso estimator is the minimizer of the penalized criterion

min_β (y − 1_n μ − Xβ)'(y − 1_n μ − Xβ) + Σ_{j=1}^p λ_j |β_j|,

where λ_j is a positive penalty parameter associated with β_j for j = 1, ..., p. The hierarchical structure of the Bayesian adaptive Lasso prior (Leng et al., 2014) is

β_j | τ_j^2, σ^2 ~ N(0, σ^2 τ_j^2),  τ_j^2 ~ E(λ_j^2/2),  λ_j^2 ~ Ga(a_λ, b_λ),  j = 1, ..., p.

The spike-and-slab prior has also been widely used for variable selection (Mitchell and Beauchamp, 1988; George and McCulloch, 1997; Ishwaran and Rao, 2005). The spike component concentrates its mass near zero and shrinks small effects towards zero, while the slab component spreads its mass over a wide range of plausible values and helps identify the nonzero parameters.
The spike-and-slab prior is usually accompanied by a stochastic search variable selection process. First, indicator variables ∆ = (∆_1, ..., ∆_p) are used to indicate the allocation of each regression coefficient to the slab component or the spike component.
Here we define ∆_j = 1 if β_j is from the slab component. The variance σ^2_spike of the normal spike prior is taken to be much smaller than the variance σ^2_slab of the normal slab prior. The allocation probability P(∆_j = 1) = ω_j is assigned a Beta distribution with shape parameters a_ω and b_ω. Accordingly, the spike-and-slab prior has the following hierarchical structure,

β_j | ∆_j ~ (1 − ∆_j) N(0, σ^2_spike) + ∆_j N(0, σ^2_slab),  ∆_j | ω_j ~ Ber(ω_j),  ω_j ~ B(a_ω, b_ω),  j = 1, ..., p,

where Ber and B denote the Bernoulli and Beta distributions, respectively.
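A single forward draw from this hierarchy can be sketched as follows; the variance values match the simulation settings quoted later in the text, but the function itself is illustrative rather than part of the authors' sampler.

```python
import numpy as np

def draw_spike_slab(p, sigma2_spike, sigma2_slab, a_omega, b_omega, rng):
    """One draw of p coefficients from the spike-and-slab hierarchy:
    omega_j ~ Beta(a_omega, b_omega), Delta_j | omega_j ~ Bernoulli(omega_j),
    beta_j ~ N(0, sigma2_slab) if Delta_j = 1 and N(0, sigma2_spike) otherwise.
    """
    omega = rng.beta(a_omega, b_omega, size=p)
    delta = rng.binomial(1, omega)
    sd = np.where(delta == 1, np.sqrt(sigma2_slab), np.sqrt(sigma2_spike))
    return rng.normal(0.0, sd), delta

rng = np.random.default_rng(7)
beta, delta = draw_spike_slab(30, 0.01, 100.0, 1.0, 1.0, rng)
# Spike draws (delta_j = 0) stay near zero; slab draws range widely.
```

The posterior over ∆_j is what drives the stochastic search: coefficients repeatedly allocated to the spike are effectively selected out of the model.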

Bayesian variable selection priors for regression parameters
Now we apply the Bayesian Lasso priors, Bayesian adaptive Lasso priors, and spike-and-slab priors to both sets of regression parameters β and θ under the proposed model for longitudinal and interval-censored data. Specifically, the Bayesian Lasso priors for β and θ have the hierarchical structures

β_j | τ^2_{β,j} ~ N(0, τ^2_{β,j}),  τ^2_{β,j} ~ E(λ_β^2/2),  λ_β^2 ~ Ga(a_λ, b_λ),
θ_j | τ^2_{θ,j} ~ N(0, τ^2_{θ,j}),  τ^2_{θ,j} ~ E(λ_θ^2/2),  λ_θ^2 ~ Ga(a_λ, b_λ),

for j = 1, ..., p, together with σ_ε^{−2} ~ Ga(a_ε, b_ε), where Ga(a_ε, b_ε) denotes the gamma distribution with shape parameter a_ε and rate parameter b_ε. The Bayesian adaptive Lasso priors for β and θ have the same hierarchical structures except that the common penalty parameters are replaced by coefficient-specific ones, λ^2_{β,j} ~ Ga(a_λ, b_λ) and λ^2_{θ,j} ~ Ga(a_λ, b_λ) for j = 1, ..., p. The spike-and-slab priors for β and θ have the hierarchical structure

β_j | ∆_{β,j} ~ (1 − ∆_{β,j}) N(0, σ^2_spike) + ∆_{β,j} N(0, σ^2_slab),  ∆_{β,j} | ω_{β,j} ~ Ber(ω_{β,j}),  ω_{β,j} ~ B(a_ω, b_ω),

and analogously for the θ_j's. For the purpose of comparison, we also consider a basic model using general multivariate normal priors for β and θ.

Prior specifications for other parameters
In addition to the above prior specifications for the regression parameters β and θ, we assign a gamma prior Ga(a_u, b_u) for σ_u^{−2}, a gamma prior Ga(a_ξ, b_ξ) for σ_ξ^{−2}, a multivariate normal prior N_K(η_0, Σ_η) for η, a normal prior N(m_0, ν_0^{−1}) for r_0, and independent exponential priors E(ρ) for the r_k's, k = 1, ..., K*. The adoption of independent exponential priors for the r_k's serves to shrink small coefficients towards zero and helps prevent overfitting due to the use of many knots in the monotone spline specification (Lin and Wang, 2010; Wang and Dunson, 2011). To allow the data to inform the degree of shrinkage, we treat ρ as random and assign it a gamma prior Ga(a_ρ, b_ρ).

Proposed Gibbs samplers
Gibbs samplers can be derived based on the complete data likelihood (6) and the priors specified in Sections 3.3 and 3.4. Below we outline the Gibbs sampler associated with the Bayesian Lasso priors, referred to as the Bayesian Lasso Gibbs sampler; the Gibbs samplers based on the other prior specifications in Section 3.3 are similar but require some modifications. In each iteration, the Bayesian Lasso Gibbs sampler cycles through the full conditional distributions of the spline coefficients, the regression parameters β and θ, the latent scale parameters, the penalty parameters, the variance parameters, and the latent variables z_i, u_i, and ξ_i, all of which take standard forms.
The Gibbs sampler based on the Bayesian adaptive Lasso priors modifies the Bayesian Lasso Gibbs sampler only in the steps that update the penalty and scale parameters, sampling a coefficient-specific penalty parameter λ_j for each regression coefficient from its gamma full conditional.
The Gibbs sampler based on the multivariate normal priors for β and θ is referred to as the General Bayesian Gibbs sampler; it replaces the shrinkage-related steps of the Bayesian Lasso Gibbs sampler with direct draws of β and θ from their multivariate normal full conditionals.
It is appealing that the full conditional distributions of all the parameters and latent variables are well-recognized distributions with explicit forms in each of the proposed Gibbs samplers. Thus, sampling these unknowns is straightforward, and no black-box or complicated Metropolis steps are required. These properties make the proposed Gibbs samplers easy to implement and computationally efficient.
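The mechanics of such a sampler can be illustrated in the much simpler setting of Bayesian Lasso linear regression, where the Park-and-Casella full conditionals are a normal draw for β, inverse-Gaussian draws for the 1/τ_j^2, and an inverse-gamma draw for σ^2. This toy analogue is not the paper's joint-model sampler, only a sketch of the same conjugate-update pattern.

```python
import numpy as np

def bayes_lasso_gibbs(X, y, lam=1.0, n_iter=2000, seed=0):
    """Park-and-Casella-style Gibbs sampler for Bayesian Lasso linear
    regression (y assumed centered, no intercept). All full conditionals
    are standard -- normal for beta, inverse-Gaussian for 1/tau_j^2, and
    inverse-gamma for sigma^2 -- so no Metropolis steps are needed.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    tau2, sigma2 = np.ones(p), 1.0
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma^2 A^{-1}) with A = X'X + D_tau^{-1}
        A = XtX + np.diag(1.0 / tau2)
        cov = sigma2 * np.linalg.inv(A)
        cov = (cov + cov.T) / 2.0                    # symmetrize for sampling
        beta = rng.multivariate_normal(np.linalg.solve(A, Xty), cov)
        # 1/tau_j^2 | rest ~ inverse-Gaussian(sqrt(lam^2 sigma^2 / beta_j^2), lam^2)
        mu = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        tau2 = 1.0 / rng.wald(mu, lam**2)
        # sigma^2 | rest ~ inverse-gamma((n - 1 + p) / 2, rate)
        resid = y - X @ beta
        rate = resid @ resid / 2.0 + beta @ (beta / tau2) / 2.0
        sigma2 = 1.0 / rng.gamma((n - 1 + p) / 2.0, 1.0 / rate)
        draws[it] = beta
    return draws

# Toy data: only coefficients 0 and 2 are truly nonzero.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + rng.normal(size=200)
post = bayes_lasso_gibbs(X, y)[500:]   # discard 500 draws as burn-in
```

Every update is a draw from a named distribution, which is the property the text highlights; the joint-model samplers extend this pattern with additional conjugate steps for the spline coefficients, frailties, and latent z_i's.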
Simulation study

Here 0_n and 1_n denote n-dimensional vectors with all elements equal to zero and one, respectively. The two scenarios represent regular and sparse cases of important covariates, with total numbers of covariates p = 10 and p = 30, respectively. Equal numbers of covariates were generated from N(0, 1) and Bernoulli(0.5) in both scenarios. With these specifications, the longitudinal responses and the failure times were generated from the longitudinal model (1) and the survival model (2), respectively. The observed interval for the failure time of each subject was determined by the two adjacent observation times (i.e., the t_{ij}'s together with 0 and ∞) that contained the failure time. For each parameter setting, 500 data sets were generated, each with a sample size of n = 500.
For the spline specifications in modeling μ(·) and α(·), we chose cubic basis functions (i.e., d = d* = 3) and used 3 interior knots for each set of M-splines, i.e., J = J* = 3. The interior knots were specified as the quantiles of the observation times t_{ij}'s for modeling μ and the quantiles of the endpoints of the observed intervals (L_i, R_i]'s, excluding 0 and ∞, for modeling α. These specifications result in 7 basis functions for each set, i.e., K = K* = 7. We adopted the following prior specifications for the unknown parameters: a_u = b_u = 0.1, leading to a Ga(0.1, 0.1) prior for σ_u^{−2}; a_ξ = b_ξ = 0.1, leading to a Ga(0.1, 0.1) prior for σ_ξ^{−2}; m_0 = −6 and ν_0 = 1, leading to a N(−6, 1) prior for r_0; a_ρ = b_ρ = 1, leading to a Ga(1, 1) prior for ρ; a multivariate normal prior N_K(η_0 = 0_K, Σ_η = 100 I_K) for η = (η_1, ..., η_K)'; a_ε = b_ε = 0.1, leading to a Ga(0.1, 0.1) prior for σ_ε^{−2}; a_λ = 1 and b_λ = 2, leading to a Ga(1, 2) prior for all the Lasso parameters; and a_ω = b_ω = 1, leading to a B(1, 1) prior for all the allocation probabilities. The σ^2_spike and σ^2_slab were taken to be 0.01 and 100, respectively. We also used a multivariate normal prior N_p(0_p, 100 I_p) for both β and θ in the more basic model.
We ran each of the proposed Gibbs samplers with a chain of 5,000 iterations, and fast convergence of the MCMC was observed for all of our Gibbs samplers in the simulation.Statistical summaries such as the posterior means and quantiles were obtained based on the 4,000 iterations of MCMC after discarding the first 1,000 iterations as a burn-in for each Gibbs sampler and for each data set.
Tables 1 and 2 display BIAS (the difference between the average of the 500 posterior means and the true value), RMSE (the root mean squared error based on the 500 point estimates), and CP95 (the coverage probability of the 95% credible intervals) for the parameters under the proposed Bayesian approaches across the settings of σ_u^2 and σ_ε^2. As seen in the two tables, all of our proposed methods work well, with small bias and 95% coverage probabilities close to the nominal value 0.95 for all parameters. The Bayesian Lasso, Bayesian adaptive Lasso, and spike-and-slab methods produce smaller RMSEs than the general Bayesian method for most of the parameters across the setups, especially for the parameters of the unimportant covariates (i.e., those with coefficients equal to 0). Among these four methods, the spike-and-slab method has the smallest RMSEs for the parameters of the unimportant covariates.
We also compare the performance of these four Bayesian methods from a variable selection point of view. For each method, a regression parameter is labeled as "negative" if its 95% credible interval contains 0 and as "positive" otherwise. Tables 3 and 4 present the results of variable selection accuracy in terms of average size (aver.size): the average number of regression parameters labeled as "positive"; true negatives (TN): the average number of regression parameters correctly labeled as "negative"; and false negatives (FN): the average number of regression parameters incorrectly labeled as "negative". The true average size and number of true negatives are 5 and 5, respectively, in Scenario I and 10 and 20, respectively, in Scenario II. Several conclusions can be drawn from the two tables. First, all the methods have similar variable selection performance across the parameter settings in terms of σ_u^2 and σ_ε^2. Second, the three variable selection methods, Bayesian Lasso (BL), Bayesian adaptive Lasso (BAL), and spike-and-slab (SS), generally perform better than the general Bayesian method (GB), with average size and number of true negatives closer to the true values. Third, SS performs best among all methods in Scenario I but performs worst in the survival submodel in Scenario II. Overall, the three variable selection methods are comparable based on our simulation study.
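The labeling rule and the three summaries above can be computed directly from posterior draws, as in this sketch (the function and variable names are illustrative, not from the study):

```python
import numpy as np

def selection_metrics(draws, truth, level=0.95):
    """Label each coefficient "positive" if its credible interval excludes 0,
    then report the summaries used in Tables 3 and 4: average size, true
    negatives (TN), and false negatives (FN).

    draws: (n_samples, p) posterior draws; truth: length-p true coefficients.
    """
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2], axis=0)
    positive = (lo > 0) | (hi < 0)
    truth_zero = np.asarray(truth) == 0
    return {
        "aver.size": int(positive.sum()),
        "TN": int((~positive & truth_zero).sum()),
        "FN": int((~positive & ~truth_zero).sum()),
    }

rng = np.random.default_rng(0)
draws = np.column_stack([rng.normal(5.0, 0.1, 1000),    # clearly nonzero
                         rng.normal(0.0, 1.0, 1000)])   # centered at zero
print(selection_metrics(draws, [5.0, 0.0]))   # {'aver.size': 1, 'TN': 1, 'FN': 0}
```

Averaging these counts over the 500 simulated data sets yields the aver.size, TN, and FN entries reported in the tables.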

Real-life data application
Initiated in the 1970s at the Cooper Clinic in Dallas, Texas, the Aerobics Center Longitudinal Study (ACLS) aimed to investigate health outcomes associated with physical activity and cardiorespiratory fitness (Sui et al., 2013). Patients visited the private clinic periodically and received extensive medical examinations during each visit. More information about this study can be found in Sui et al. (2007) and Sui et al. (2017).
Here we consider the ACLS data from the 13,827 patients who had at least two clinic visits recorded between 1974 and 2006 and focus on a joint analysis of blood cholesterol level and the time to hypertension. The total cholesterol levels were measured in millimoles per litre (mmol/L) in this study, and hypertension was defined as a systolic blood pressure of 140 mmHg or higher, a diastolic blood pressure of 90 mmHg or higher, or physician diagnosis of hypertension (Sui et al., 2017). Cholesterol level was measured for all patients at each visit and thus has a longitudinal data structure. Blood pressure was also measured at each visit, and the time to hypertension has the structure of interval-censored data. The left, interval, and right censoring rates for the time to hypertension are 29.1%, 24.0%, and 46.9%, respectively.
Twenty-five covariates are considered in our analysis, and we are interested in variable selection and parameter estimation in this joint analysis. These covariates include age and body mass index (BMI) at entry, gender (1 for female and 0 for male), parental history of high blood pressure, diabetes, and premature coronary heart disease, depression, anxiety, smoking status, and heavy drinking (1 for yes and 0 otherwise), marital status (1 for being married and 0 otherwise), and physical activity status (1 for being inactive and 0 otherwise). The two continuous covariates are standardized before fitting the model. For the spline specifications, we choose the degree to be 3 and take 3 interior knots placed at the quartiles for both sets of M-splines. The Bayesian Lasso, Bayesian adaptive Lasso, spike-and-slab, and general Bayesian methods proposed in Section 3 are implemented for variable selection in the context of joint modeling. The same prior specifications are adopted as in the simulation study in Section 4. A total of 15,000 iterations were run for each of our Gibbs samplers, and the first 5,000 iterations were discarded as a burn-in. Statistical summaries were calculated based on the latter 10,000 iterations of the chains for each Gibbs sampler.
Tables 5 and 6 present the estimated covariate effects, together with their 95% credible intervals, on the longitudinal and survival responses, respectively. As seen in these two tables, the four methods yield very similar results, although the general Bayesian method gives slightly wider 95% credible intervals than the other three methods. For the longitudinal analysis, females in this study were more likely to have higher cholesterol levels than their male counterparts. Higher cholesterol levels were also associated with heavy drinking, chest pain or pressure, no leisure-time physical activity, being currently unmarried, parental history of coronary heart disease, a higher initial triglyceride level, and enrollment at an older age. For the survival analysis, earlier development of hypertension was associated with being male, a higher initial BMI, chest pain or pressure, rapid or irregular heartbeats, heavy drinking, being currently unmarried, no leisure-time physical activity, parental history of high blood pressure or of premature coronary heart disease, and enrollment at a younger age.
Other specifications of the monotone splines in the proposed methods were also investigated. Table 7 summarizes the estimation results for the variance parameters and Kendall's τ when using different numbers of interior knots. As seen clearly in Table 7, the estimation results are very close across the numbers of interior knots for all the proposed methods. This indicates that the proposed methods are robust to the monotone spline specifications, a phenomenon also observed in other works on monotone splines in the literature (Ramsay, 1988; Wang and Dunson, 2011; Wang et al., 2016).

Discussion
Although the Bayesian Lasso method developed by Park and Casella (2008) has been used in various models such as linear regression models (Hans, 2009; Lykou and Ntzoufras, 2013) and semiparametric structural equation models (Guo et al., 2012), there is a general dearth of work on it in the context of joint analysis of longitudinal and survival data. In this article, we develop several Bayesian variable selection methods, including the Bayesian Lasso, for joint analysis of longitudinal and interval-censored data. The proposed Gibbs samplers are computationally efficient because, thanks to our carefully calibrated data augmentation, all of the full conditionals are standard distributions that are easy to sample from. The proposed methods are shown to have excellent estimation performance as well as good variable selection performance in our simulation study.
The proposed methods can be extended to more advanced models. For example, one may incorporate more random-effects terms or time-varying covariates in the longitudinal submodel. One may also relax the normal frailty assumption in the joint model by allowing a nonparametric distribution for the shared frailty. Future research will be devoted to the exploration of such extensions.

Table 3
Simulation results for simulation Scenario I: 10 coefficients with 5 zeros. Summarized results include average size (aver.size): the average number of coefficients labeled as "positive"; true negatives (TN): the average number of coefficients correctly labeled as "negative"; and false negatives (FN): the average number of coefficients incorrectly labeled as "negative", for all parameter configurations for the longitudinal and survival submodels.

Table 4
Simulation results for simulation Scenario II: 30 coefficients with 20 zeros. Summarized results include average size (aver.size), true negatives (TN), and false negatives (FN) for all parameter configurations for the longitudinal and survival submodels.