A multi-trait epigenome-wide association study identified DNA methylation signature of inflammation among people with HIV

Inflammation underlies many conditions causing excess morbidity and mortality among people with HIV (PWH). A handful of single-trait epigenome-wide association studies (EWAS) have suggested that inflammation is associated with DNA methylation (DNAm) among PWH. Multi-trait EWAS may further improve statistical power and reveal pathways in common between different inflammatory markers. We conducted single-trait EWAS of three inflammatory markers (soluble CD14, D-dimers, and interleukin 6) in the Veteran Aging Cohort Study (n = 920). The study population was all male PWH with an average age of 51 years, and 82.3% self-reported as Black. We then applied two multi-trait EWAS methods—CPASSOC and OmniTest—to combine single-trait EWAS results. CPASSOC and OmniTest identified 189 and 157 inflammation-associated DNAm sites respectively, of which 112 overlapped. Among the identified sites, 56% were not significant in any single-trait EWAS. Top sites were mapped to inflammation-related genes including IFITM1, PARP9 and STAT1. These genes were significantly enriched in pathways such as “type I interferon signaling” and “immune response to virus”. We demonstrate that multi-trait EWAS can improve the discovery of inflammation-associated DNAm sites, genes, and pathways. These DNAm sites suggest molecular mechanisms in response to inflammation associated with HIV and might hold the key to addressing persistent inflammation in PWH.


Introduction
Antiretroviral therapy (ART) has successfully transformed human immunode ciency virus (HIV) infection from a once-terminal condition into a manageable chronic disease 1 .However, people with HIV (PWH) have an increased risk of a wide range of comorbidities compared to individuals of the same age without HIV infection 2 .This could be due to the persistent in ammation associated with HIV even in PWH who are virologically suppressed on ART.Chronic HIV infection elevates levels of speci c in ammatory biomarkers such as interleukin-6 (IL-6), soluble cluster of differentiation 14 (sCD14) and D-dimer, which have been shown to correlate with excess comorbidity risk among PWH with or without ART [3][4][5] .However, the underlying mechanisms remain largely unknown.
DNA methylation (DNAm) is an epigenetic mechanism that can regulate gene expression without altering the DNA sequence by adding a methyl group, usually at the c5 position of a cytosine that precedes a guanine nucleotide (CpG site).Evidence from current epigenome-wide association studies (EWAS) suggests that DNAm may play an important role in the pathophysiology of in ammation in the general population 6- 9 .The genetic and environmental determinants of in ammation could be very different for PWH and could modify the association between in ammation and DNAm 10,11 , thus ndings in the general population may not be applicable to PWH.To date, only one EWAS of sCD14 in 1,075 PWH has been conducted and identi ed 118 CpG sites, 10 of which predicted PWH's survival time among PWH 12 .One proposed hypothesis behind the association between in ammation and DNAm is that DNAm captures cumulative exposures and physiological responses to in ammation that lead to the variation in disease outcomes in PWH.An alternative hypothesis suggests that DNAm changes in response to HIV infection elevate and maintain the levels of in ammatory markers observed in PWH.The absence of clarity highlights the need for a more extensive and comprehensive study to better understand the DNAm signature of in ammation in PWH.
Many in ammatory markers may share a common physiological pathway and their levels could be intercorrelated.For example, sCD14 can indirectly trigger the release of IL-6 13 .Positive correlations between sCD14 level and D-dimer and between IL-6 and D-dimer have also been reported 14,15 .Nevertheless, current EWAS usually only study one trait at a time, missing the opportunity to systematically incorporate information from multiple available phenotypes.Empirical evidence from genome-wide association studies (GWAS) has suggested that searching for variants associated with multiple traits could improve statistical power 16,17 .A few applications of such multi-trait analysis in EWAS have also demonstrated a similar bene t 18, 19 .
We hypothesized that a multi-trait EWAS that jointly analyzes multiple in ammatory markers would enhance statistical power in detecting in ammation-associated DNAm.This approach also aimed to uncover epigenetic pathways shared among various in ammatory markers.In this study, we applied two methods originally developed for GWAS to explore the DNAm signature of three in ammatory markers (IL-6, sCD14 and D-dimers) simultaneously in a cohort of 920 PWH.We identi ed multiple novel CpG sites that were mapped to in ammation-related genes.Further functional analyses revealed that these genes were enriched in the immune response to viral infection.

Study population
We utilized data from the Veterans Aging Cohort Study (VACS), a study that prospectively enrolled veterans with and without HIV at Department of Veterans Affairs medical centers across the United States, matched on age, race/ethnicity, sex, and geographic region 20 .In 2005-2007, 1,525 PWH and 843 people without HIV from the original VACS cohort consented to provide whole blood samples for the VACS Biomarker Cohort (VACS-BC) 21 .The specimens were collected using serum separator and EDTA blood collection tubes, and stored at the central repository at the Massachusetts Veterans Epidemiology Research and Information Center in Boston, Massachusetts 21 .

In ammatory markers (independent variables)
Whole blood samples collected from VACS-BC participants were used to measure levels of in ammatory markers including IL-6, sCD14 and Ddimer.IL-6 was measured using the QuantiGlo® IL-6 immunoassay, sCD14 was measured by the Quantikine sCD14 Immunoassay and D-dimer was measured using the Liatest D-DI immunoturbidometric assay.Details of the measurement of these markers can be found in a previous publication 21 .For each marker, raw values that fell outside of the 3 standard deviation boundaries were de ned as outliers.Individuals with an outlier value in at least one marker were removed from analyses (n outlier =10).

DNA methylation data (dependent variables)
Blood samples of a subset of participants were selected for genome-wide CpG site pro ling using the Illumina In nium Human Methylation 450K BeadChip 22 .We refer to this subset as VACS-450K (n = 740).Subsequently, due to the availability of a newer-generation DNA methylation array and additional funding, another subset of participants' blood samples were selected for genome-wide CpG site pro ling using the Illumina Methylation EPIC (850K) BeadChip 23 .We refer to this subset as VACS-850K (n = 553).VACS-450K and VACS-850K are mutually exclusive, and the assignment of VACS participants to the two DNA methylation platforms was arbitrary 24 .Associations of DNAm with several outcomes have been consistently replicated in the VACS-450K and VACS-850K groups 12,24 .
Details of DNAm pro ling and quality control have been previously described 10,12,24 .Brie y, DNA samples extracted from whole blood were bisul te-converted, genome-wide ampli ed, fragmented, puri ed, hybridized to the arrays, uorescently stained, and scanned to assess for uorescence intensities following standard procedures.Intensities were then imported in R for quality control, quantile normalization, and batch correction using the min R package 25 .DNAm beta values were generated for each CpG site from the processed intensities.CpG sites from both platforms were mapped to Genome Research Consortium human build 37.After all procedures, there were DNAm values for 412,583 CpG sites in VACS-450K, 797,676 CpG sites in the VACS-850K, and 374,642 CpG sites common to both platforms.

Additional covariates
Demographic and clinical information including age, sex, race/ethnicity, HIV viral load, use of medications, chronic health conditions and opportunistic infections closest to the date of blood sample collection were obtained from the electronic health record.HIV viral load was dichotomized as less than 75 or more than 75 copies/ml.Hepatitis B (HBV) and hepatitis C (HCV) infections were both dichotomized as antibody positive or negative.Smoking and alcohol use were obtained from the VACS survey completed by participants closest to the date of blood collection 26,27 .Six main cell type proportions in whole blood (CD4 + T cells, CD8 + T cells, natural killer T cells, B cells, monocytes and granulocytes) were estimated using cell-type speci c CpG sites from a reference panel implemented in the R min package 28 .

Final analytical dataset
In the current study, we only included male PWH with complete information for all three in ammatory markers, DNAm, and the additional covariates in VACS-450K (n = 469) and VACS-850K (n = 451), forming the nal analytical dataset (n = 920).

EWAS of single in ammatory markers
Linear mixed effect regression models were utilized to assess the associations between levels of each in ammatory marker and DNAm at individual CpG sites separately in VACS-450K and VACS-850K.Chip IDs were included as a random effect to account for potential batch effects.The regression models were adjusted for age, race, HIV viral load, BMI, HCV, HBV, diabetes, any ART medication, antihypertensive medication usage, cigarette use, alcohol use and calculated cell-type proportions as xed effects.Summary statistics including beta-coe cients (Beta), standard errors (SE), t-statistics, and p-values were obtained from the linear mixed effect regression models.
In an EWAS of one in ammatory marker within one platform, signi cance was de ned as a Bonferroni corrected p-value less than 0.05.For the 374,642 CpG sites shared between VACS-450K and VACS-850K, meta-analyses were conducted using the inverse variance-based approach in METAL software 29 .Signi cance in the meta-analyses was de ned as having a p-value less than 0.05 after Bonferroni correction for 374,642 testing (nominal p-value < 1.33×10 − 7 ).

Multi-trait EWAS
Overall work ow of the multi-trait EWAS is summarized in Fig. 1.We considered two approaches for multi-trait EWAS.The rst one is the test for homogeneous genetic effect across phenotypes in the cross-phenotype association test (CPASSOC), which is essentially a xed-effect meta-analysis method that can account for the correlations induced by correlated traits measured on the same individuals 30,31 .CPASSOC was originally developed for GWAS data, and it was suggested to use a set of genetic variants not in linkage disequilibrium (LD) to estimate the correlation structure between phenotypes.In our application of CPASSOC to DNAm data, Pearson correlations between in ammatory markers were used to estimate the correlation structure.The CPASSOC package was downloaded from http://hal.case.edu/~xxz10/zhu-web//.
The second approach is an omnibus test (OmniTest) that uses the aggregated Cauchy association test combining p-values from multiple tests, including the minimum of p-values approach (MinP), generalized Berk-Jones test (GBJ), and the generalized higher criticism test (GHC) 32 .MinP accounts for the correlation structures among multiple phenotypes by computing a p-value adjusted for correlated tests (P ACT ), which is the probability of observing at least one p-value as small as the minimal p-value under the null hypothesis of no association, assuming that the tstatistics for three in ammatory markers follow an asymptotic normality 33 .Both GBJ and GHC were originally designed to test for the association between a set of genetic variants and an outcome while accounting for LD structures 34,35 .In OmniTest, the LD structures are replaced by the correlation structures among in ammatory markers estimated by Pearson correlations.
We considered these two methods because they showed similar performance in a previous study 32 , and we believe they could be complimentary to each other for epigenome-wide discovery.We applied these two approaches to three sets of summary statistics from EWAS of single in ammatory markers, separately in VACS-450K and VACS-850K.A meta-analysis by Fisher's method was conducted to combine pvalues from results in VACS-450K and VACS-850K.Bonferroni corrections were used to adjust for multiple testing in VACS-450K, VACS-850K and meta-analysis separately.

Associations of identi ed CpG sites with other phenotypes in the general population
To examine the difference between in ammation-associated DNAm signatures in the general population and PWH, we compared our results to an EWAS of C-reactive protein (CRP) in the general population 36 in terms of beta-coe cients and p-values.The identi ed CpG sites from multitrait EWAS in VACS were also searched in the EWAS catalog and EWAS atlas to explore their associations with different disease outcomes in the general population 37,38 .

Gene set enrichment analysis
We conducted pathway enrichment analysis based on mapped genes from identi ed CpG sites using the GOmeth function in R package missMethyl 39 .In missMethyl, signi cant enrichment of Gene Ontology (GO) terms was tested while accounting for the uneven distribution of CpG sites across the genome and CpG sites mapping to more than one gene.We then used REVIGO online software to remove functionally redundant GO terms and visualize the remaining non-redundant GO terms 40 .
There were 11, 39, and 112 overlapping CpG sites between CpG sites identi ed from CPASSOC and OmniTest in VACS-450K, VACS-850K and meta-analysis respectively (Fig. 3).The full list of identi ed CpG sites by CPASSOC and OmniTest are shown in Supplementary Table 1-3.The beta-coe cients for these CpG sites estimated from EWAS of single in ammatory markers showed high correlations (R 2 > 0.85) (Supplementary Fig. 8), suggesting that changes in methylated cells at these CpG sites are associated with concurrent increase or decrease of all three in ammatory markers.We compared our results to an EWAS of CRP in the general population 36 .One hundred out of 256 (39%) CRPassociated CpG sites in the general population were identi ed by the multi-trait EWAS in PWH (Supplementary Table 4) and their betacoe cients were highly correlated with those in single-trait EWAS in PWH (R 2 > 0.6) (Fig. 4).

Discussion
Our epigenome-wide study jointly analyzed three in ammatory markers and found a total of 295 associations between blood-based DNAm and in ammation in more than 900 male PWH from VACS.Many of these associations were signi cant in more than one single-trait EWAS, particularly IL-6 and sCD14.By leveraging information from all three in ammatory markers, we identi ed several CpG sites that were not revealed by any of the single-trait EWAS.The beta-coe cients estimated in the three single-marker EWAS were highly correlated.They were also correlated with beta-coe cients estimated from an EWAS of CRP in the general populations 36 .This suggests concurrent increase or decrease of in ammatory markers might be associated with changes of DNAm at these identi ed CpG sites.
The leading CpG sites were mapped to genes implicated in immune and antiviral response, such as STAT1 and IFITM1.Many identi ed genes harbored multiple signi cant CpG sites, suggesting that CpG methylation in these regions may be regulated concomitantly in response to in ammation in PWH.Many of the identi ed CpG sites have been reported as associated with at least one autoimmune disease such as rheumatoid arthritis, Crohn's disease, or multiple sclerosis in the general population [41][42][43][44] .The gene-set enrichment analysis based on signi cant CpG sites also con rmed the role of these genes in in ammatory processes by highlighting the defense response to virus, type-1 interferon signaling, and innate immunity pathways.
Identi ed genes including STAT1, IFITM1, IFIT1, IFIT3, MX1, OAS2-3, PARP9, PSMB8, TMEM49, EPSTI1, IFI27, IFI44L, IRF7, ISG20 and ISG15 are all interferon stimulated genes (ISGs), namely genes whose expression can be robustly up-regulated by interferon signaling 45 .Beta-coe cients for all signi cant CpG sites located in these ISGs are negative, suggesting that hypomethylation at these CpG sites at ISGs, which usually leads to their higher gene expression, is associated with in ammation.CpG sites on PARP9, MX1, IFI44L, EPSTI1, and TAP1 have been previously reported to be hypomethylated in ART-naïve PWH compared to individuals without HIV, which could be related to in ammation in response to HIV infection 46 .In our study, over half of the participants are virally suppressed, and the hypomethylation of the CpG sites at ISGs may be due to oxidative stress triggering a non-speci c response, herpesvirus reactivation, gut translocation of bacteria, or persistent viral replication 47 .
The transcription of ISGs is induced by interferon signaling through the Janus kinase (JAK)-signal transducer and activator of transcription (STAT) pathway 48 .The JAK-STAT pathway can both be triggered by and cause in ammation.In the JAK-STAT pathway, STAT1 molecules interact with other STAT proteins and MUC1 to bind to the promoter regions of other ISGs which leads to their increased expression 49,50 .The protein encoded by IFITM1, which harbored 10 signi cant CpG sites in our study, inhibits HIV replication partially through interfering with viruscell membrane fusion, viral particle infectivity, and viral protein synthesis [51][52][53] .IFIT proteins can develop a functional complex to restrict viral translation and modulate innate immune signaling 54 .Speci cally, IFIT1 molecules bind to an uncapped 5′-ppp on HIV RNA to sequester those RNA while IFIT3 proteins serve to enhance the binding function of IFIT1 molecules 54,55 .Proteins encoded by genes MX1, OAS2-3, IFI27, IRF7 and ISG20 have also been shown to inhibit HIV viral replication by diverse mechanisms [56][57][58][59] .On the other hand, proteins PARP9 and IFI44L may contribute to antiviral responses by mitigating STAT1 molecules in the JAK-STAT pathway 60,61 while ISG15 functions as a negative regulator of IFN-1 signaling to prevent overt in ammation 62 .A recent study has shown that cells treated with IFITM1-targeted Small interfering RNA (siRNA) cells had attenuated levels of STAT1 and ISG15, suggesting a mediating effect of IFITM1 protein in IFN-γ stimulated protein synthesis 63 .The interferon signaling pathways could be sustaining the persistent in ammation in PWH.Targeting DNAm at the ISGs-mediated pathways may provide new therapeutic avenues for addressing HIV-associated chronic in ammation and its many downstream effects.

Figures Figure 1
Figures

Figure 4 Comparison
Figure 4

Table 1
Demographic and clinical characteristics of the study population in the Veteran Aging Cohort Study (VACS).Mean ± standard deviation or median (quartile 1 -quartile 3) is reported for continuous variables, and n (%) is reported for categorical variables.
EWAS of IL-6 identi ed 43 CpG sites in VACS-450K, 13 CpG sites in VACS-850K, and 97 CpG sites in the meta-analysis (Table

Table 3
Top 20CpG sites associated with in ammation markers in meta-analyses of the results from cross-phenotype association analyses (CPASSOC) in VACS-450K and VACS-850K.

Table 4
Top 20CpG sites associated with in ammation markers in meta-analyses of the results from an omnibus test (OMNITEST) in VACS-450K and VACS-850K.