22q11.2DS cohort and clinical variables
This study involved a well-characterized cohort of adults with a typical 22q11.2 microdeletion ascertained from a specialized 22q11.2DS clinic in Toronto, Canada. Typical 22q11.2 deletions were identified through standard clinical laboratory methods [13,24], and precise 22q11.2 deletion extents were confirmed using genome sequencing data (see Additional file 1: Table S1 for details).
To be included, participants had to have at least one recorded circulating lipid level of TC, TG, LDLC, and/or HDLC (Additional file 1: Figure S1), obtained from routine clinical bloodwork assessments. Measurements were taken predominantly in the non-fasting but not post-prandial state, as this was most feasible for this patient population [12]. For most individuals, we used their most recent bloodwork. LDLC levels were calculated using the Friedewald equation. However, where LDLC levels were unavailable because high TG levels make estimation by the Friedewald equation inaccurate [25], we used records of LDLC levels at other time points when available. No LDLC levels were calculated using the Friedewald equation when TG levels were >4.52 mmol/L, consistent with previous lipid genetics studies [9,26].
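As an illustration of the LDLC rule described above, the following is a minimal Python sketch (the study itself used R and clinical laboratory values; this is not the authors' code). It assumes the standard mmol/L form of the Friedewald equation, LDLC = TC - HDLC - TG/2.2, and the 4.52 mmol/L TG cutoff stated in the text. The function name and the choice to return None above the cutoff are assumptions made for illustration.

```python
from typing import Optional

# TG cutoff (mmol/L) above which no Friedewald estimate is calculated, as stated in the text.
TG_CUTOFF_MMOL_L = 4.52


def friedewald_ldlc(tc: float, hdlc: float, tg: float) -> Optional[float]:
    """Estimate LDLC (mmol/L) from TC, HDLC, and TG (all in mmol/L).

    Returns None when TG exceeds the cutoff, signalling that a measurement
    from another time point should be sought instead.
    """
    if tg > TG_CUTOFF_MMOL_L:
        return None
    # Standard mmol/L form of the Friedewald equation (assumed here).
    return tc - hdlc - tg / 2.2


# Hypothetical values, for illustration only.
example = friedewald_ldlc(tc=5.0, hdlc=1.2, tg=1.1)
```

Returning None rather than a number keeps unreliable estimates out of downstream models by construction.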
Additionally, we assessed other traits known to influence lipid levels or genetic background, including sex, age, BMI, T2D, psychotic illness [27,28], and ancestry. T2D was defined as having a hemoglobin A1c value ≥6.5% and/or a diagnosis of T2D indicated in medical records. Individuals diagnosed with schizophrenia or schizoaffective disorder were classified as “psychotic”; all other individuals were deemed “non-psychotic”. European versus non-European ancestry was assigned using principal component analysis (PCA) of common genetic variants (Additional file 1: Figure S2), which showed complete concordance with pedigree-derived information.
For details on genome sequencing methods and variant annotation, see Additional file 2: Supplementary Methods.
Polygenic risk score analyses
We used lipid PRSs that were previously constructed in a UK Biobank study [29] (PGS Catalog publication ID: PGP000263), with a development sample of 391,124 European individuals using the penalized regression (bigstatsr) method. Genotype positions and effect sizes for the TG (PGS001979), HDLC (PGS001954), LDLC (PGS001933), and TC (PGS001895) PRSs were retrieved from the PGS Catalog [30] (Additional file 1: Table S3). Individual-level PRSs for the study cohort were calculated using PRSice-2 following QC (Additional file 1: Figure S3).
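Conceptually, an individual-level PRS is a weighted sum of effect-allele dosages, with weights taken from the published score file. The Python sketch below illustrates that idea only. It is not PRSice-2, which may additionally average over the number of scored alleles and handle missing genotypes and strand alignment. The variant identifiers and weights are hypothetical.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of (effect size x effect-allele dosage) over variants present in both inputs.

    dosages: effect-allele count per variant for one individual (0, 1, or 2).
    weights: per-variant effect size from a score file.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical score file and genotypes, for illustration only.
weights = {"var_A": 0.5, "var_B": -0.2, "var_C": 0.1}
dosages = {"var_A": 2, "var_B": 1}  # var_C was not genotyped in this individual
score = polygenic_score(dosages, weights)
```

Variants missing from either input are simply skipped here, which is one reason QC of variant overlap precedes scoring.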
We tested for associations between each PRS and its corresponding lipid level using linear regression in 1) a univariable model and 2) a multivariable model that adjusted for other key phenotypic variables, batch (TCAG vs IBBC cohort and sequencing platform), and the first four principal components (PCs) of ancestry.
1) lipid level ~ lipid PRS
2) lipid level ~ lipid PRS + sex* + age + BMI + T2D* + psychotic illness* + cohort + sequencing platform + PC1–PC4
*binary variable
Binary variables were coded as 0 or 1 and all values were standardized using the scale() function in R to produce standardized beta coefficients. For regression analyses, TG levels were natural log transformed to approximate a normal distribution, as done previously [8,9,31] (Additional file 1: Figure S4). For individuals on statins, LDLC and TC levels were divided by 0.7 and 0.8, respectively, to adjust for the cholesterol-lowering effects of these medications, as done previously [8,9,32]. The variance in lipid level explained by each multivariable model was measured using the multiple R² metric. The variance in lipid levels explained by the PRS variable alone in a multivariable model (i.e., ΔR²) was calculated as the difference in multiple R² between the multivariable model including the PRS variable (full model) and the same model without the PRS variable (covariate-only model). Additionally, we tested for an interaction between TG-PRS and BMI by adding a TG-PRS × BMI interaction term both to a model that included only TG-PRS and BMI as independent variables and to the multivariable model (2) (Additional file 1: Table S4).
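The preprocessing and ΔR² steps described above can be sketched as follows. This is a minimal, standard-library Python illustration, not the authors' R code. It assumes ordinary least squares with an intercept, and that standardization uses the sample standard deviation, as R's scale() does by default. All function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def adjust_for_statin(value, on_statin, divisor):
    """Divide LDLC (divisor 0.7) or TC (divisor 0.8) for statin users; leave others unchanged."""
    return value / divisor if on_statin else value


def log_transform(tg_values):
    """Natural log transform of TG levels."""
    return [math.log(t) for t in tg_values]


def standardize(xs):
    """Center and scale to unit sample SD."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def r_squared(y, columns):
    """Multiple R^2 from an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def delta_r2(y, prs, covariates):
    """R^2 of the full model minus R^2 of the covariate-only model."""
    return r_squared(y, [prs] + covariates) - r_squared(y, covariates)


# Hypothetical toy data: the outcome depends on both the PRS and one covariate.
prs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
cov = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
y = [2 * p + 3 * c for p, c in zip(prs, cov)]
gain = delta_r2(y, prs, [cov])
```

Because the two models are nested, ΔR² computed this way cannot be negative apart from floating-point error. Solving the normal equations directly is adequate for a small illustration but is less numerically stable than the QR decomposition used by R's lm().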
Receiver operating characteristic (ROC) curve analyses
Given the elevated baseline risk for mild-moderate HTG for individuals with 22q11.2DS, we constructed ROC curves to classify mild-moderate HTG status based on logistic regression models using TG-PRS, sex, and BMI as predictor variables, independently or in various combinations (TG-PRS+BMI, TG-PRS+sex, BMI+sex, TG-PRS+BMI+sex). Logistic regression models were implemented using the glm() function in R, and all visualizations and analyses related to ROC curves were done using the R package “pROC” [33]. DeLong’s test for two correlated ROC curves was used to test for a difference between the areas under the curve (AUCs) of two ROC curves, and the optimal sensitivity and specificity of each ROC curve were determined using Youden’s J statistic. Confidence intervals for AUCs were calculated using 2000 bootstrap replicates.
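To make the AUC and Youden's J quantities concrete, the sketch below computes both from predicted probabilities and binary HTG labels. It is a standard-library Python illustration, not the pROC implementation: pROC may place thresholds differently (for example, at midpoints between observed scores), and DeLong's test and bootstrap confidence intervals are not shown. The scores and labels are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a case scores higher than a control (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_optimum(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing J = sensitivity + specificity - 1.

    A score >= threshold is classified as a case. On ties in J, the lowest threshold is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= t)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (t, sens, spec)
    return best


# Hypothetical predicted probabilities and HTG status (1 = mild-moderate HTG).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))             # 0.75
print(youden_optimum(scores, labels))  # (0.35, 1.0, 0.5)
```

The pairwise formulation of the AUC is quadratic in sample size, which is acceptable for clinic-sized cohorts but would be replaced by a rank-based calculation for large datasets.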
Rare variant analyses
To prioritize variants for assessment of clinical relevance with respect to causing extreme lipid levels (i.e., high TG, LDLC, or HDLC, and low HDLC), we restricted to variants affecting protein coding or splicing regions that are (1) very rare (gnomAD PopMax filtering allele frequency <0.2%), (2) loss of function (LoF) or predicted damaging missense, and (3) within genes relevant to lipid levels that are part of a targeted next generation sequencing (NGS) panel (n=33 candidate genes) used at a specialized genetics clinic for lipid metabolism disorders in London, Ontario [34] (Additional file 1: Table S5). Prioritized rare variants were then assessed using the American College of Medical Genetics and Genomics (ACMG) variant interpretation guidelines [35] or LDLR-specific guidelines developed by ClinGen [36]. For further details on variant prioritization, see Additional file 2: Supplementary Methods.
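The three prioritization criteria above amount to a conjunctive filter, sketched below in Python. This is an illustration only. The record fields, consequence labels, and example gene set are hypothetical stand-ins for the annotation output and the 33-gene panel, which are described in the supplementary material.

```python
from typing import NamedTuple, Set

# Criterion (1): gnomAD PopMax filtering allele frequency below 0.2%.
MAX_POPMAX_AF = 0.002


class Variant(NamedTuple):
    gene: str
    popmax_af: float
    consequence: str          # hypothetical labels: "LoF", "missense", "synonymous"
    predicted_damaging: bool  # in silico prediction; only consulted for missense variants


def is_prioritized(v: Variant, panel_genes: Set[str]) -> bool:
    """True when a variant is very rare, LoF or damaging missense, and in a panel gene."""
    very_rare = v.popmax_af < MAX_POPMAX_AF
    damaging = v.consequence == "LoF" or (
        v.consequence == "missense" and v.predicted_damaging
    )
    return very_rare and damaging and v.gene in panel_genes


# Hypothetical gene set and variants, for illustration only.
panel = {"LDLR", "GENE_X"}
candidates = [
    Variant("LDLR", 0.0001, "missense", True),
    Variant("LDLR", 0.0100, "LoF", False),      # too common
    Variant("GENE_Y", 0.0001, "LoF", False),    # not on the panel
]
prioritized = [v for v in candidates if is_prioritized(v, panel)]
```

Variants passing this filter would then go on to manual ACMG or ClinGen assessment; the filter itself makes no pathogenicity call.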
Additionally, we sought to assess whether being a carrier of a rare variant, including those with potentially smaller effect sizes that are not considered pathogenic/likely pathogenic per ACMG criteria, would be associated with altered lipid levels (Additional file 1: Table S5). An association between “rare variant carrier status” and lipid levels was assessed using the same univariable and multivariable linear regression models as for the PRS analyses, but with the rare variant carrier status variable in place of the PRS variable. For additional details on the filtering criteria for rare variants for this analysis, see Additional file 2: Supplementary Methods.
All statistical analyses were performed using R version 4.0.3. Statistical significance was defined as p<0.05. P-values were not adjusted for multiple testing.