Positive Selection Affects the Expression of Systemic Lupus Erythematosus Associated Loci in Human Populations

Background: Systemic lupus erythematosus (SLE) shows marked population-specic disparities in disease prevalence, including substantial variation in manifestations and complications according to genetic ancestry. Several recent studies suggest that a substantial proportion of variation of gene expression shows genetic ancestry-associated differences in gene regulation on immune responses. Positive selection may act in a population-specic manner on expression quantitative trait loci (eQTLs) and thereby contributes to the difference in the differences of SLE prevalence and manifestation in human populations. We tested the hypothesises that some of the identied SLE risk polymorphisms display pleiotropic effects or polygenicity driven by positive selection. We performed a genome-wide scan for recent positive selection by using integrated Haplotype Score (iHS) statistics in different human populations. In addition, we estimated the timing of benecial mutations to understand what possible selective pressures drive positive selection at SLE-associated loci. Results: We identied several SLE risk loci that are population-specically under positive selection. Almost all SNPs that are under positive selection function as cis-eQTLs in different tissue types. We determined that adaptive eQTLs affect the expression of fewer genes than non-adaptive eQTLs, suggesting a limited range of effect of an eQTL at SLE risk sites that show signatures of positive selection. Furthermore, some positively selected SNPs are located in transcription factor binding sequences. The timing of positive selection for the studied loci suggests that both environmental and recent lifestyle changes during as well as after the Neolithic Transition may have become selectively effective. We propose a novel link between positively selected eQTLs at a certain SLE risk locus in Europeans and a physiological pathway not previously considered in SLE. Conclusions: We conclude that population-specic adaptive eQTLs contribute to the observed variation in specic manifestations and complications of SLE in different ethnicities. Our results suggest also that human populations adapt more rapidly to environmental and lifestyle stimuli via modication of gene expression without having to alter the genetic code. type I and type III interferon (IFN) signaling insulin haplotype to generate a sample of the posterior distribution for the time of the common ancestor (TMRCA). The model takes advantage of both the length of the ancestral haplotype on each chromosome and the accumulation of derived mutations on the ancestral haplotype to generate a sample of the posterior distribution for the time to the TMRCA. The model requires a sample (panel) containing the selected allele and a reference panel of haplotypes without the selected allele. We chose four representative populations for the analyses: LWK (Africa), GBR (Europe), BEB (South Asia) and CHB (East Asia). We used recombination rates from the sex-averaged recombination map from the deCODE to model recombination rate variation across the human genome. We analysed 1 Mb regions surrounding each genetic variant under selection with an assumed mutation rate of 1.6 x 10 − 8 . We ran three independent MCMC chains, each with 25,000 iterations. Subsequently, we discarded the rst 9000 iterations (burn-in), retaining the remaining successful iterations. Timing of selection (TMRCA estimates) are given in thousand years ago (kya). We assumed 25 years as generation time.


Background
Over the last few years, numerous genome-wide association studies (GWAS) have been performed in an attempt to uncover the genetic basis of human autoimmune diseases such as SLE. This in ammatory autoimmune disorder is characterised by an impaired clearance of apoptotic cells, production of autoantibodies against nuclear antigens, deposition of immune complexes and involvement of multiple organs, leading to considerable variation in clinical manifestations. SLE can lead to increased morbidity and early mortality from end-organ damage; more than half of all SLE patients will develop lupus nephritis, which is one of the severe complications of SLE [1]. Sex-speci c hormones and/or certain environmental factors are thought to trigger speci c genetic and epigenetic mechanisms that lead to failure of self-tolerance with increased in ammatory responses. These responses include both the innate and adaptive immune systems, ultimately leading to damage or destruction of multiple organs and tissues such as the skin, joints, kidneys, heart and brain [2]. A striking feature of SLE is that it affects mainly women of reproductive age (c. 90% of cases). A recent study reported that SLE ranks among the top 20 leading causes of death in females between 5 and 64 years of age [3]. This striking female predominance has been explained by the modulating effect estrogen has on the activation of the immune system (or alternatively, in males, the immunosuppressive effects of testosterone leads to reduced susceptibility for SLE in males). Moreover, SLE shows substantial differences in prevalence -it is more common in African-Americans and Asians than in Europeans [4,5]. People with African genetic ancestry who have migrated to North America or Europe are at a higher risk of developing SLE compared to those of North American or European origin [6,7]. The exact underlying mechanisms for the population-speci c disparities are currently under revision and remain elusive. Evidence from family and linkage analyses indicates that a strong genetic contribution to SLE, with an approximately ten-fold higher risk in monozygotic versus dizygotic twins [8]. The concordance rate in monozygotic twins, however, appears to vary substantially among studied cohorts, ranging from 25-57% [8][9][10]. Nonetheless, the genetic heritability of SLE is probably substantial, with reports from 44% [11] to 66% [12]. The strongest genetic association signal among the common variants from GWAS was obtained from the human leukocyte antigen (HLA) region and from genes involved in type I interferon (IFN) signalling and production; many other SLE susceptibility loci are located in gene regulatory acting sequences within or near genes with functional relevance to the immune system [13,14]. Importantly, many risk-variants are common, have small effect sizes and are often in near-perfect linkage disequilibrium (LD) with surrounding SNPs: this makes it challenging to unambiguously identify the causal genetic variants through traditional nemapping methods. In addition, the frequencies of risk loci can differ among human populations, resulting in different haplotypes and LD properties. The consensus is that genes close to the top associated risk SNP identi ed in GWAS are the most likely causal genes. Thus, however, does not apply to all the cases, as shown in recent studies [15]. For example, the SLE risk SNP rs4917014, which is a regulatory variant located close to the transcription factor encoding gene IKZF1, acts as a trans-eQTL altering C1QB and ve type 1 interferon response genes which are both hallmarks of SLE [15]. For GWAS these observations have proven to be highly challenging in trying to move from uncovering risk variants through GWAS to their biological interpretation [16,17].
From a population genetics viewpoint, the question arises whether any of the identi ed risk loci are adaptive to the environment, i.e., do changing conditions contribute to the risk of developing autoimmunity. Over the last 100,000 years, behaviourally modern humans have been subjected to a wide range of environmental changes, including the Out-of-Africa migration. The transition of most human populations from a hunter-gatherer form of subsistence to a sedentary, agro-pastoral lifestyle within the last 9,000-13,000 years ('Neolithic Transition') included substantial changes of environments and lifestyles related to the domestication of plants and animals. The transformations during the Neolithic demographic transition resulted, for example, in new dietary habits, increased demographic growth and elevated exposure to new pathogens (zoonosis) due to close contact with domesticated animals. The subsequent transition to modernity led to additional fundamental lifestyle and ecological changes such as lower fertility, excess of caloric availability, longer life and an ageing population, exposure to novel man-made pollutants and a decline of global biodiversity. According to the mismatch hypothesis, natural selection could not catch up with the speed of rapidly changing environments; this resulted in genomes maintaining alleles that were bene cial to survival in prior environments but may harm tness in contemporary habitats. Alleles that predispose post-transition populations to autoimmune-related diseases must have had substantial tness bene ts in prior environments. Due to strong selective forces in the past and despite their costs, these risk loci are being maintained in the genome. Ramos et al. [18] support the idea of recent positive selection as one important mechanism causing elevated populationspeci c frequencies of SLE risk loci. Moreover, Raj et al. [19] demonstrated that loci associated with several in ammatory-diseases, including several loci which show evidence of cis-regulatory effects on genes within the associated locus, are enriched for genomic signatures of recent positive selection. These two studies support the hypothesis that recent positive selection is a key mechanism causing an elevated population-speci c frequency of certain autoimmune risk loci.
Although most risk loci are consistently present among populations with different ancestries, some loci are exclusive to speci c populations. For example, risk alleles that are mapped to the genes PTPN22 and SH2B3-ATXN2 were identi ed as SLE risk variants only in Europeans and are apparently absent in Chinese populations [20]. Ramos et al. [18], who used integrated Haplotype Score (iHS) and F ST analysis, found evidence of positive selection at the PTPN22 locus in Africans. This suggests that the frequency of alleles at this locus differs among populations due to the effects of natural selection on allele frequency. Furthermore, many of the alleles associated with SLE function as expression quantitative trait loci (eQTLs), some of which have pleiotropic effects. Several recent studies suggest that a substantial proportion of variation of (immune-) gene expression shows genetic ancestry-associated differences in gene regulation on the immune response [21,22]. For African and European individuals, the study by Nedelec et al. [21] reported that cis-eQTLs of immune genes were signi cantly enriched for high iHS values, indicating the importance of regulatory genetic variation in recent human evolution. Accordingly, positive selection may act in a population-speci c manner on eQTLs and thereby contributes to the difference in the population-speci c prevalence of different diseases. We therefore hypothesise that some of the identi ed SLE risk polymorphisms display pleiotropic effects or polygenicity driven by positive selection. To investigate this idea, we conducted tests for potential eQTLs at genetic polymorphisms associated with SLE. We performed a genome-wide scan for recent positive selection by using iHS in populations with African, European, South Asian and East Asian genetic ancestry and investigated which tissues are mainly affected by positively selected eQTLs. Furthermore, we estimated the timing of selection in order to understand when a bene cial mutation arose and spread in human populations.

SLE risk loci under positive selection
Among the analysed 134 SLE risk loci (not included were the loci of the human leucocyte antigen region, HLA, which is known to be under some form of balancing selection), we identi ed 20 loci under positive selection ( Table 1)

GTEx eQTLs and functional analysis
Almost all of the SLE risk SNPs function as cis-eQTLs in different human tissue types for several genes ( Table 2). The relative magnitude of genes in uenced by the adaptive and non-adaptive SLE eQTLs in various tissue types is presented in Fig. 1. The eQTLs (both adaptive and non-adaptive eQTLs) in uence gene expression (eQTL-eGene interactions) mainly in tissues such as the thyroid, different brain tissues, skin, whole blood, adipose, oesophagus and muscle, suggesting that SLE-associated genetic variants in regulatory regions have tissue-speci c effects. In comparison to the non-adaptive eQTLs, adaptive eQTLs in uence signi cantly fewer (p < 0.03, t = 1.8459, t-test is one-tailed) genes in the different tissues. The funMotifs database shows ( Table 2) Estimate of timing of positive selection The timing of the selection of the loci IL12A, STAT4, TPRG1, TNIP1, HIP1, XKR6, BLK, RAD51B, UBE2L3, SYNGR1 falls in the period before the Neolithic Transition, between 45 kya and 13 kya. The timing of selection for the loci IL12RB2, ABHD6, BANK1, JAK2, CD44, DRAM1 and SH2B3-ATXN2, were estimated to a time frame of about 13 kya to 2 kya, i.e, during or after the Neolithic Transition (Table 2). Because most of these genes are involved in immunity, cell signaling, and metabolism, it is di cult to determine which selective mechanisms drive positive selection (Additional le 3). However, timing and eQTLs indicate that both environmental (i.e., pathogens) and lifestyle changes during and after the Neolithic Transition become possibly selective effective. The most recent timing for positive selection was calculated for the rs10774625-A allele at the SH2B3-ATXN2 locus which is only in Europeans under positive selection. This SNP functions as eQTLs in different tissues for the downstream gene aldehyde dehydrogenase 2 (ALDH2). The ALDH2 enzyme plays a key role in the major oxidative pathway of alcohol metabolism. The bene cial allele, which reaches a frequency of almost 50% in Europeans, is associated with increased expression. Because of the timing of positive selection, we think that the primary target of selection is the ALDH2 gene rather than ATXN2 or SH2B3. We therefore discuss below a possible novel link between ALDH2 and SLE.

Discussion
In this study, we identi ed several SLE-associated loci that are population- loci [18,19]. Similar to a previous study, we found that the number of positively selected SLE GWAS loci is higher in Europeans than in Africans [23]. Additionally, we found that no signatures of positive selection on SLE risk SNPs were shared between Africans and Europeans (exception: SLE risk region 7q11.23).
Most SLE GWAS, however, analysed mainly individuals with European genetic ancestry, which could result in a bias of selected loci when studying positive selection at the SLE-associated loci.
Almost all SNPs that are under positive selection function as eQTLs in different tissue types (Table 2). This is in line with a recent study which showed -for African and European individuals -that cis-eQTLs were signi cantly enriched for high iHS values, indicating the importance of regulatory genetic variation in recent human evolution [21,22]. We also detected that several TF binding motifs are located in adaptive sequences. Thus, it is not fully clear whether the potential SLE risk genes are per se under positive selection, or whether this pertains to eQTLs and TF binding motifs that regulate the expression of other genes. The adaptive eQTLs in uence gene expressions mainly in tissues such as the thyroid, brain, skin, whole blood, adipose, oesophagus and muscle, suggesting that these tissues were the primary targets of recent genetic adaptation. Moreover, few positive selected SNPs that function as eQTLs also in uence genes that are involved in immunity such as defensin beta 134 (DEFB134), APAF1 interacting protein (APIP) and neutrophil cytosolic factor 1 (NCF1). The latter gene, which is located in very close vicinity to GTF2I locus, has been reported to be associated with SLE in Japanese populations [24]. Furthermore, while some adaptive eQTLs affect the expression of multiple genes, the majority affect -as opposed to non-adaptive eQTLs -signi cantly fewer genes and tissues. This nding is in line with [25], who reported that adaptive eQTLs tend to affect fewer tissues than non-adaptive eQTLs. These results suggest that genetic adaptation reduces the effect of pleiotropy.
Purifying selection is expected to purge deleterious genetic polymorphisms from a population's gene pool. Importantly, our ndings show that risk variants are retained in the human genome. This somewhat contradictory result is attributable to the fact that phenotypic adaptations (i.e., epigenetics and eQTLs) can occur faster than genotypic adaptations. These changes entail the potential of a population to adapt more rapidly to environmental stimuli via modi cation of gene expression without having to alter the genetic code. Moreover, for a genetic polymorphism to be subjected to negative selection, it must exert a drastic harmful effect in terms of reducing individual tness. SLE mainly affects women of reproductive age and this disease therefore clearly reduces the tness of affected individuals. From an evolutionary perspective, the question arises as to why the underlying alleles have not been driven to low frequencies if SLE reduces survival and reproduction chances? Pathogen-mediated positive selection is thought to drive genetic adaptation at loci involved in immune responses. However, recent lifestyle changes during as well as after the Neolithic Transition may have become selectively effective (e.g., as shown for dairy farming and the associated genetic adaptation of the lactase gene).
Our study found a very recent timing of positive selection (2.2 kya -3.2 kya) for the bene cial allele rs10774625-A in Europeans at the SLE risk locus SH2B3-ATXN2. The protein encoded by SH2B3 in uences a variety of signalling pathways mediated by Janus kinase (JAK) and receptor tyrosine kinases (RTKs), acts as a negative regulator in cytokine signalling, and is involved in growth factor signalling activities [26]. The gene ATXN2 lies very close to this gene. It encodes a polyglutamine protein involved in RNA metabolism and metabolic homeostasis [27]. The SH2B3-ATXN2 region was found to be associated with various other diseases such as rheumatoid arthritis [28], type 1 diabetes [29] and coronary artery disease [30]. Our study shows that the SH2B3-ATXN2 region is exclusively under strong positive selection in Europeans, as reported in two previous studies [31,32]. SH2B3 has been suggested to play a role in protecting against bacterial infection because a risk allele for celiac diseases was found to be associated with activation of the NOD2 recognition pathway. This could potentially explain the selective sweep at this locus [33]. Importantly, we recorded signatures of positive selection across the genes SH2B3-ATXN2 (Additional le 4) which are also characterised by strong LD (Additional le 5); their close proximity on the chromosome hinders determining which of these two genes is actually under positive selection. Either one of the genes or potentially both are under positive selection; another possibility is that the identi ed eQTLs within this region are under positive selection because they control the expression of the downstream ALDH2 gene (Additional le 5 shows that SH2B3-ATXN2 and ALDH2 are located in the same extended LD block in Europeans). This suggests that the ALDH2 gene is the primary target of positive selection in Europeans. The encoded protein of this gene, which functions in the mitochondrial matrix, belongs to the aldehyde dehydrogenase family of proteins and represents the second enzyme of the major oxidative pathway of alcohol metabolism; it plays a key role in oxidising acetaldehyde into nontoxic acetate. The alcohol ush reaction is predominantly shown by people of Asian ancestry carrying an ALDH2*2 allele, i.e., carrying the inactive isozyme with limited activity to convert acetaldehyde into acetate. On contrast, people of Caucasian ancestry usually have only the active isozyme (ALDH2*1 allele). The positively selected allele rs10774625-A as well as other positively selected SNPs in this genomic region in Europeans function as eQTLs and are associated with increased ALDH2 expression. We therefore conclude that the culturally driven European lifestyle involving high alcohol consumption is the driving force for positive selection at this locus. The result is an increasing frequency of the SLE risk allele in Europeans. The frequency (in the 1000 Genomes populations) of the bene cial allele rs10774625-A in Europeans is about 0.48, whereas it is virtually absent in East-Asians. In Africans the allele frequency at this locus is 0.02; in South-Asians about 0.07 (Additional le 1). Finally, Neanderthal and Denisovan sequences show the ancestral allele (G) at this SNP.
How is this bene cial allele possibly liked to SLE susceptibility in Europeans? ALDH2 is not only a major detoxi cation enzyme for ethanol-derived acetaldehyde but also mediates the activation of nitric-oxide synthases (NOS), a family of enzymes catalysing the production of nitric oxide (NO) [34]. NO is involved in various vital physiological processes including host defence. However, NO can also form peroxynitrite (ONOO − ), reactive NO species (RNOS) exerting various proin ammatory actions [34,35]. Acute or chronic exposure to ethanol is apparently associated with increased RNOS through induction of NOS expression, protein nitration and lipid oxidation. Studies indicate that increased inducible NOS activity is associated with the progression of SLE [36]. Other studies, however, indicate that moderate alcohol consumption is associated with decreased SLE risk [37,38]. We propose that the bene cial allele associated with increased ALDH2 expression has positive effects on ethanol detoxi cation (after increased alcohol consumption) and NOS activation. Nonetheless, in SLE progression, increased inducible NOS activity (potentially due to oxidative stress) limits ALDH2 activity, possibly leading to (sex-speci c [39]) increased formation of RNOS and reactive aldehydes (such as 4-hydroxynonenal and malondialdehyde). This alters the immune response in SLE. We thus suggest that this risk SNP acts as a 'promoter' rather than 'trigger' of SLE in Europeans. Furthermore, this SNP is located within enhancer histone marks, and the funMotifs database shows that it also located within the binding sequence for the transcription factor forkhead box protein O1 (foxo1). This TF is involved in regulating glucose metabolism, insulin signalling and regulates metabolic homeostasis in response to oxidative stress. Foxo1 also interacts with NAD-dependent deacetylase sirtuin1 (SIRT1), which has been shown to inhibit T cell activation [40]. Thus, the regulatory elements at this locus, including the effects of positive selection, could contribute to substantial phenotypic/trait differences among individuals with different genetic ancestries.

Conclusion
Positive selection acts mainly on regulatory elements such as cis-eQTLs and transcription factor binding motifs, thereby affecting the expression of the various SLE-associated genes and genes located near the risk loci. Most loci analysed are population-speci cally under positive selection, suggesting local adaptation to speci c environmental/lifestyle conditions. Thus, positive selection in uences the frequency of haplotypes encompassing the genetic risk polymorphisms in populations differently.
Moreover, we also determined that adaptive eQTLs affect the expression of fewer genes than nonadaptive eQTLs, suggesting a limited range of effect of an eQTL at SLE risk sites that show signatures of positive selection. We conclude that population-speci c adaptive eQTLs contribute to the observed variation in speci c manifestations and complications of SLE in different ethnicities. We also highlight the proposed novel link between positively selected eQTLs at the SH2B3-ATXN2SLE risk locus in Europeans and the physiological pathway via ALDH2 in SLE.

SLE GWAS data
From the NHGRI-EBI GWAS catalog (https://www.ebi.ac.uk/gwas/) we obtained (accessed November 2019 -January 2020) 31 publications and the reported risk loci for SLE (Additional le 6). We included risk loci with a reported statistical threshold of < 5x10 − 8 in the studies, which yielded 145 genetic loci associated with SLE. However, we excluded the human leucocyte antigen (HLA) region because several studies already showed that HLA loci bear signatures of both positive and balancing selection.

Positive selection and FST analysis
We used the integrated Haplotype Score (iHS) statistics [44] which are implemented in the software program selscan version 1.2.0a (https://github.com/szpiech/selscan) [45] to detect positive selection in genomic population data. These statistics are based on the model of selective sweeps, i.e., a bene cial mutation rises quickly in frequency until it reaches xation within a population and thereby simultaneously reduces genetic diversity at sites in very close genetic vicinity. The iHS statistic compares integrated EHH values between alleles at a given SNP and is based on the decay of the haplotype homozygosity model as a function of recombination distance. The underlying rationale is that selected alleles will have an unusually long-range linkage disequilibrium (LD) given their frequency in the population. Signi cant (based on empirical p-values < 0.05) negative iHS values (absolute iHS score <-2.0) indicate unusually long haplotypes carrying the derived allele, and signi cant positive values (absolute iHS score > 2.0) are associated with long haplotypes carrying the ancestral allele [44]. All scans were run on phased whole chromosome data with the default selscan model parameters. The unstandardized iHS scores were normalized in frequency bins across the entire genome using the script norm, provided with the selscan program. We considered candidate regions of positive selection as genomic regions containing clustering of SNPs with high iHS statistics. This was quanti ed as ≥ 20 SNPs with |iHS| > 2.0 in 100 kb nonoverlapping windows. We used the LDlink, a web-based application (https://analysistools.cancer.gov/LDlink/?tab=home) [46], to explore population-speci c linking of correlated alleles between the SLE risk SNPs and SNPs under positive selection. Pairwise F ST were calculated for each SNP under positive selection using the Weir & Cockerham F ST calculation [47] implemented in the PLINK program. Negative F ST values were set to zero.

Genotype-Tissue Expression (GTEx) eQTL and functional analysis
We used the GTEx Portal V8 Release (https://www.gtexportal.org/home/) [48] to obtain expression quantitative trait loci (eQTLs) (accessed January -August 2020 (dbGaP Accession phs000424.v8.p2). In recent years, hundreds of thousands of human eQTLs have been catalogued in the GTEx database. GTEx reports global RNA expression within individual tissues: treating the expression levels of genes as quantitative traits enables identifying variations in gene expression that are highly correlated with genetic variation as eQTLs [49]. We used the R program [50] and the R packages ggplot2 [51] to calculate and plot the relative magnitude of eQTLs-eGene interaction in different human tissues (eGene is de ned as a gene in which the expression level is signi cantly associated with an eQTL). We used funMotifs DB (http://bioinf.icm.uu.se:3838/funmotifs/) (accessed August 2020) to analyse whether any of the SNPs under selection contain transcription factor (TF) motifs [52]. In this analysis we used as input type 'genomic region' (and the tissue blood) containing the core SNP under selection including 10 SNPs upand downstream of that SNP. The UniProt database (https://www.uniprot.org/) was used to extract information on gene/protein functions. The Reactome Database (https://reactome.org/) was used to obtained locations of the gene in the Pathway Browser. Genomic feature plots were generated by using R package Gviz [53].
Estimation for the timing of selection on bene cial alleles We estimated the timing of selection on a bene cial allele by applying a recently published method Startmrca [54].

Availability of data and materials
The SLE-GWAS data can be obtained from https://www.ebi.ac.uk/gwas/. The gnomic data can be obtained from 1000 Genomes database (see Additional le 6). The generated iHS dataset is available from the corresponding author on reasonable request.