Patient data came from the Pediatric Eczema Elective Registry (PEER), which consisted of children who were diagnosed with atopic dermatitis 9–11. Eligibility for the study included being between 2 and 17 years of age at enrollment and using pimecrolimus cream 1% for at least 6 weeks in the half year prior to enrollment. Every 6 months for up to 10 years, information regarding disease severity and medication use was obtained based on self-report. A subset of the cohort provided DNA samples, from which the KRT6A gene was sequenced. The study was approved by the University of Pennsylvania Institutional Review Board. All DNA samples were obtained after written informed consent was obtained, and all data were stripped of personal identifiers. Further, all methods were performed in accordance with the relevant guidelines and regulations. The set of KRT6A SNPs that we consider are based on the a priori decision to analyze variants with minor allele frequency of at least 5%.
The primary outcome of disease persistence (yes/no) was measured for each participant at every six-month survey. Subjects were deemed to not have persistent AD symptoms if they reported complete disease control in the previous 6 months and were no longer using prescription creams or ointments. Multiple measurements of the outcome of interest were available over time for each subject; the number of visits varied between 1 and 21. We used a Bayesian generalized linear mixed model (GLMM) to study the associations between SNPs of the KRT6A gene and AD persistence over time 12,13. We included random effects in the GLMM to account for correlations between measurements on the same individual and to allow subject-specific trends. The model was used to assess the persistence of AD over the follow-up time with larger odds ratio (OR) values suggestive of lower disease persistence when the variant is present. These models were fit for the entire sample (N = 740) and then stratified by race, White (N = 393) and Black (N = 326), using the brms package in R 12,14.
Specifically, for the full sample, the outcome for subject i at visit number j is modeled according to the following generalized linear mixed model with a logit link:
\(logit P\left({Y}_{ij}=1\right)= {\beta }_{0}+{b}_{0i}+{\beta }_{1}j+{b}_{1i}j+{\beta }_{2}\text{v}\text{a}\text{r}\text{i}\text{a}\text{n}\text{t}\) where \({\beta }_{0}, {\beta }_{1}, {and \beta }_{2},\) are the intercept and regression coefficients for the fixed effects of visit number, and the variant, respectively, and \({b}_{0i}\) and \({b}_{1i}\) are the random intercept and random slope for visit number, each of which is distributed as \(N\left(0,{\sigma }^{2}\right)\). For an adjusted analysis, we next fit a model that also includes the covariates age of onset, sex, and race. While age of onset and sex are not considered true confounders since they do not affect the genetic variation in individuals, we include these since they may explain some of the variability in the outcome. For the analyses stratified by race, we include age of onset and sex in the model.
For a Bayesian approach, we place priors on the fixed effect coefficients as well as the standard deviations and correlations of the random effects. Each \(\beta\) is given a \(N\left({\text{0,10}}^{2}\right)\) prior, which is a weakly informative prior to reflect absence of prior information about the effects of the variant considered. The standard deviation of subject-level effects \({\sigma }^{2}\) is given a half student-t prior with 3 degrees of freedom and a scale parameter determined by the standard deviation of the response variable after the link transformation. This choice of prior is weakly informative so that its influence is small but provides regularization for the purpose of convergence and sampling efficiency. Correlations between the random effects are given a LKJ(1) prior, so that all correlation matrices are equally likely. Our choices of noninformative priors minimize the impact of the prior relative to the data on resulting estimates. The Bayesian framework has been previously employed in transethnic meta-analysis of genomewide association studies 15,16.
Prior to the analysis of individual SNPs, all SNPs were evaluated to determine if they were correlated. Those with high correlation, based on pairwise squared correlation coefficient (\({R}^{2}\)) of at least \(0.70\), were combined as a single analytic unit. Table 1 gives the pairwise \({R}^{2}\) values between all the SNPs. Specifically, we see that rs177079, rs298118, rs298121, rs298122, and rs1054122 have pairwise \({R}^{2}\) values close to 1. We refer to this set as Group 1. We similarly define Groups 2, 3, and 4; the SNPs in each are provided in Table 2. Based on these groups, we create 4 composite variables that each indicate the presence of any of the SNPs in the group. For instance, the value of Composite 1 is 1 if the subject is observed to have at least one of SNPs in Group 1 and is 0 otherwise. There were three SNPs (rs376545, rs711317, rs3858631) that did not fall into any of the four groups in the sense that their pairwise correlations with all other SNPs considered had \({R}^{2}\) values less than 0.70. We then employed the Bayesian GLMM model to assess the association between each composite variable and the heal outcome.