GWAS for Systemic Sclerosis Identified six novel susceptibility loci including penetrating Fcγ-Receptor Region

DOI: https://doi.org/10.21203/rs.3.rs-2712663/v1

Abstract

We conducted a Japanese GWAS for systemic sclerosis (SSc) comprising 1,428 cases and 112,599 controls, the largest Asian GWAS for SSc ever, and identified three novel signals. The lead SNP in FCGR/FCRL region had a strong effect size (OR 2.05, P = 4.9×10−11). The complete LD SNP, rs10917688, was found in a cis-regulatory element and a part of binding motifs for IRF8. IRF8 was a significant locus in the European GWAS and rs10917688 showed an association only in the presence of the risk allele of IRF8 in Japanese. rs10917688 was marked with H3K4me1 in primary B cells, and the heritability was enriched in active histone marks of primary B cells. A meta-analysis with the latest European GWAS found additional 30 significant loci including three novel signals. PRS constructed with the effect sizes of the meta-analysis indicated potential portability of genetic associations beyond populations (AUC: 0.593). The fitting of PRS was improved by further prioritizing the top 5% SNPs of IRF8 biding sites in B cells, underscoring common genetic architecture across populations and critical roles of B cells and IRF8 for SSc development.

Introduction

Systemic sclerosis (SSc) is one of the systemic autoimmune diseases (AIDs) characterized by fibrosis in connective tissues and internal organs, such as the lung or the kidney as a consequence of microvascular dysfunction and dysregulated immune systems (1). Despite recent progress in disease management (2), SSc still has high morbidity and mortality mainly due to a poor understanding of its underlying pathophysiological mechanisms (1). The etiology of SSc is complex and is not fully understood, but as with most AIDs, it is widely accepted that both environmental (3, 4) and genetic factors contribute to the risk of the disease.

SSc can be classified according to clinical features or serological profiles. The clinical subtypes, limited cutaneous SSc (lcSSc) and diffuse cutaneous SSc (dcSSc), are based on the distribution of skin fibrosis. Organs involved in each subtype also differ and dcSSc tends to involve critical organs (e.g., the lungs), and hence has a worse prognosis than lcSSc. Anti-centromere antibody (ACA), and anti-topoisomerase antibody (ATA) are the two most prevalent autoantibodies and they do not usually co-exist in the same individual. Importantly, the presence of the autoantibody often parallels with clinical subtypes and thus can be considered biomarkers for SSc. Indeed, ACA is more prevalent among lcSSc patients while ATA is more found in dcSSc subjects (5).

After the introduction of genome-wide association studies (GWASs) (6), the number of genetic markers convincingly associated with SSc has exponentially increased. The first GWAS for SSc was reported by Arnett et al in 2010 (7) and there have been six large-scale GWASs published thereafter (813). As a result, disease risk variants that surpassed the genome-wide significant level (P < 5.0×10− 8) have been identified in a total of 27 loci outside the HLA region (Supplementary Table 1). Among these, SNPs in STAT4 have been well established for the trans-ethnic disease association. STAT4 has also been implicated in the association with multiple AIDs including systemic lupus erythematosus (SLE) and Rheumatoid arthritis (RA) (14). On the other hand, the rest of the loci were more pronounced mainly in Europeans.

A candidate gene analysis (CGA) has also reported multiple potential susceptibility loci to SSc. An association of TNFAIP3 has repeatedly been implicated by multi-ethnic CGAs (1517); however, the association exceeding the GWAS significant level has never been reported by any of the GWASs. PLD4 region had been known as one of the risk loci for Japanese SLE and was found also to be associated with Japanese SSc, but without fulfilling the GWAS significance (17).

While the previous studies had greatly contributed to a better understanding of genetic backgrounds of SSc pathology as well as shared and population-dependent genetic architecture, the study subjects enrolled were mostly limited to European descendants; there have been only three east Asian GWASs with limited sample sizes (11, 13, 18). Furthermore, it has been difficult to identify disease-associated variants with low allele frequencies in Asians because of the limited sample size and a lack of imputation reference panels containing a sufficient number of whole-genome sequencing (WGS) data of East Asian populations. Since rare susceptibility variants tend to have larger effect sizes compared to common SNPs, and importantly, may have population-specific effects, we can analyze rare variants by enrolling enough subjects including both cases and controls in the Asian population.

Here we conducted GWAS for Japanese SSc comprising a total of 114,027 subjects, consisting of 1,428 cases and 112,599 controls, taking advantage of an imputation reference panel containing more than 3,000 Japanese WGS data. These numbers of cases resulted in the largest Asian GWAS for SSc ever. We further conducted a trans-ethnic GWAS meta-analysis by combining our GWAS dataset with a comprehensive European meta-GWAS result and downstream analyses to clarify the genetic basis of SSc.

Results

Three novel risk SNPs identified in the non-HLA region by Japanese GWAS

The current study included two independent datasets, which resulted in a total of 114,108 samples (Supplementary Fig. 1). Set 1 consisted of 694 cases enrolled from the previous study (11) and 2,095 controls. Set 2 consisted of newly enrolled 734 cases and 110,504 controls enrolled from the BioBank Japan (BBJ) project (19, 20). Genotype phasing and imputation were conducted after rigorous quality controls (QCs) of samples and variants (Supplementary Fig. 2). Then, an association analysis was conducted for each set, which identified one and six significant loci in Set 1 and Set 2, respectively (Supplementary Fig. 3–4 Supplementary Table 2–3).

Next, we combined the two datasets to maximize the statistical power of association analysis. Consequently, a total of five significant loci were identified outside the HLA region (Fig. 1A, Table 1) and three of them were novel; rs6697139 in FCGR/FCRL region, rs5029949 in the TNFAIP3 region, and rs2819422 in the AHNAK2-PLD4 region (Fig. 1B-D). All five loci showed strong associations both in Set 1 and 2 with the same effect directions (Table 1). The genomic inflation factor λGC was 1.044 (Supplementary Fig. 5) and the intercept of linkage disequilibrium score regression (LDSC) was 1.023 showing no obvious confounding bias. The effect directions of known risk loci were shared for all the loci between previous GWASs and the present study (Supplementary Table 1), indicating a strong shared genetic architecture between Europeans and Japanese in addition to the validity of the present GWAS.

We also ran the Firth regression (21) and a regression based on a generalized linear mixed model with the saddle point approximation using SAIGE (22) to address possible overestimation of the low frequent variants (such as the FCGR/FCRL SNP) due to the imbalance of case-control numbers, which revealed comparable results with those obtained by a logistic regression (Supplementary Fig. 6). The associations of genetic variations in TNFAIP3 (1517) and PLD4 (17) have been described previously, but only by CGAs, while both genes have been well documented for the associations with SLE (23, 24). Thus, this is the first GWAS that identified the associations of these loci with SSc at the GWAS significance. STAT4 and IRF5 are associated with multiple AIDs including SSc (14).

No further significant SNPs were identified by the stepwise conditional analyses with a relaxed threshold level (P = 1.0×10− 6).

Exploration of causal SNPs and potential impacts of GWAS SNPs on the SSc pathology

Having identified significantly associated SNPs prompted us to search for potential causal variants in each locus.

We searched for potential deleterious exonic or loss-of-function (LoF) SNPs among the lead SNPs and their linked SNPs (r2 ≥ 0.8) and found that none of the SNPs tested were predicted to be LoF SNPs or potential deleterious exonic SNPs (Supplementary Data 1–4).

Next, we conducted fine-mapping for each significant locus and identified the plausible causal SNPs in the FCGR/FCRL, STAT4, and IRF5 regions, where the top two variants had posterior probabilities (PPs) of more than 0.3 (Fig. 2A-E, Supplementary Table 4).

Then, we conducted a tissue-wide association study (TWAS) using the gene expression profiles in multiple tissues in the GTEx (ver.7) (25) and those in six subsets of white blood cells (WBCs) (26) to evaluate associations between SSc susceptibility and gene expression profiles of multiple tissues or cell types. A total of 26 gene-tissue/cell-type pairs were found to be significantly associated with SSc susceptibility (Supplementary Table 5). Among these, IRF5 is a well-established risk gene for SSc. Notably, the changes of IRF5 expression were mainly observed in various non-lymphoid tissues or organs, most of which can be involved in the disease course of fibrosis (Supplementary Table 5). Among the genes located in close proximities on chromosome 14, AHNAK2, C14orf180, GPR132, and PLD4 (Fig. 2), the strongest association was observed for an increased PLD4 expression in the spleen (TWAS P = 6.3×10− 14, Supplementary Table 5). Thus, together with the previous findings of auto-immune phenotypes in pld4 mutant mice (17), the altered expression of PLD4 expression in immune-related cells or tissues including the spleen might confer SSc susceptibility.

We also conducted pathway analyses to measure pathological impacts by the GWAS significant SNPs. As expected, significant enrichment and a high proportion of overlapping annotated genes were observed in the SLE-related pathway followed by various immune-related pathways, such as the butyrophilin-related pathway, IL-20 family signaling, Fc γ receptor-mediated phagocytosis, adaptive immune system, JAK-STAT signaling pathway, or IFN-γ pathway (Supplementary Table 6).

Trans-ethnic meta-analysis of Japanese and European SSc GWAS

In search of risk SNPs of SSc shared between Japanese and European populations, we estimated the correlation of the effect sizes of causal SNPs between the two populations (Supplementary Table 7). The recently published European GWAS meta-analysis summary statistics (12) was used as a representative European dataset. We found the trans-ethnic genetic correlation estimate (ρge) of 0.738 ± 0.418 (standard error of mean), suggesting highly shared genetic architecture between the two populations.

Then, we conducted a trans-ethnic meta-analysis by an inverse variance method with a fixed-effect model. We took a total of 3,686,421 SNPs shared between the two data sets for the meta-analysis. Although the genomic inflation factors (λGC) were 1.110 and 1.098 with and without SNPs in the HLA region, respectively (Supplementary Fig. 7), the LDSC intercept was 1.05, indicating that the genomic inflation observed in the meta-analysis was mainly due to polygenic nature of SSc.

A total of 24 lead SNPs were identified (Fig. 3A, Table 2) and three of them, rs398390 in LINC01980-CMC1-EOMES region, rs10484921 in the ESR1 region, and rs2074 in the SLC12A5 region, were found to be novel (Fig. 3B-D). We further conducted conditional analyses by conditioning on the lead SNPs in each population followed by a meta-analysis to obtain additional independently associated SNPs, which was repeated until no further SNPs remained significant at a relaxed significant threshold (P = 1.0×10− 6). We identified five more secondary and one more tertiary independently associated SNPs (Table 2). Among the lead SNPs identified in Japanese GWAS, all of them in the Europeans (if present) showed the same direction of effects as those of the Japanese (Supplementary Table 8). Taken together, a total of 30 signals including three novel associations in 24 regions were identified in the trans-ethnic meta-GWAS analysis of Japanese and European SSc.

Subsequently, a fine-mapping analysis was conducted to identify potential causal SNPs shared between the two populations. We found that a single SNP had the PPs close to 1.0 in eight regions (Supplementary Table 9). Furthermore, each credible set had only two SNPs in another four regions with the PPs > 0.6 for top SNPs. Among these 12 regions, a total of 5 regions, namely, NF-ҡB, PRDM1-ATG5, IRF5, TNFSF4, and IRF8, were those which were not successfully fine-mapped using the European meta-GWAS dataset only (12) (Supplementary Table 10), showing that the present trans-ethnic meta-analysis not only identified novel susceptibility SNPs but also successfully fine-mapped candidate causal SNPs by taking advantage of difference in LD structures between the populations.

Transcriptional regulation of FcγR family genes via IRF8 binding to rs10917688-containing cCRE

For further detailed downstream analyses, we focused on the FCGR/FCRL region since this region showed a penetrating association with SSc, especially in the Asian population, one of the unexplored populations and a main data source of novel findings in the current study. rs10917688, one in complete LD with the lead SNP in the FCGR/FCRL region (Fig. 1A, B, Table 1) with the high effect size (OR 2.05), was found to be positioned within a candidate cis-regulatory element (cCRE) nearby a cluster of Fcγ receptor (FcγR) family genes according to the ENCODE database (27, 28). This is in line with most GWAS signals found in intronic or intergenic regions, in which regulatory elements for gene expression are frequently located. Among such SNPs, a transcription factor (TF)-binding motif analysis for the cCRE containing rs10917688 identified a total of 28 TF binding motifs in the cCRE. We noted that more than half of these TFs were known for immune cell differentiation or function, and they tended to be ranked higher than non-immune-related TFs (Supplementary Table 11). Among these immune-related TFs, IRF8 was the only TF predicted to bind to the sequence spanning rs10917688 with statistical significance (Fig. 4A and Supplementary Table 11), with higher binding probability on the risk allele, indicating that altered IRF8 binding can affect SSc susceptibility. Notably, one of the lead SNPs identified in the trans-ethnic meta-GWAS, rs11117420, is located in the IRF8 region (Table 2). Furthermore, the association of rs10917688 was observed only in the presence of the risk genotype (GG) of rs11117420 (Fig. 4B), indicating an IRF8-dependent association of rs10917688. IRF-targeted gene enrichment, which included FCGR2B and FCGR2C, was also identified by gene set analysis (Supplementary Table 12), further supporting the association between IRF8 and rs10917688.

Then, we explored the cell-type specificity of the observed potential IRF8 binding to the cCRE. We identified enhancer-like histone marks, H3K27ac in primary CD8+ memory T cells and H3K4me1 in primary B cells at this locus (Supplementary Table 13). We further examined if rs10917688 is an expression quantitative trait locus (eQTL) for neighboring genes (Fig. 1B) using the Japanese eQTL dataset of six WBC subpopulations (26). For FCGR2A and FCGR2B, of which expressions were found in B cells, Monocytes, and NK cells in the dataset, the effect allele (T) of rs10917688 showed a trend of less FCGR2A and FCGR2B expressions (Supplementary Fig. 8A, B). The same trend was observed only in CD8+ T cells for FCGR3A and FCRLB expressions (Supplementary Fig. 8C, D). We also found that rs10917688 was eQTL for FCRLB expression in whole blood (normalized effect size (NES) -0.20, p = 3.1×10− 6) on GTEx database (v8.1).

Together these results indicate that rs10917688 may affect the expression of nearby FcγR family gene(s) by altering the binding affinity of IRF8 in the target cells, especially B cells.

Different genetic architectures between the systemic and the limited types of SSc

Next, we explored genetic associations in major clinical/serological subtypes of SSc, lcSSc, dcSSc, ACA-positive SSc, ATA-positive SSc, SSc complicated with ILD (ILD-SSc) (Fig. 5A, Supplementary Fig. 9, Supplementary Table 14). We found that there was a stark contrast between the systemic types of SSc (dcSSc and ATA-positive SSc) and the limited types of SSc (lcSSc and ACA-positive SSc) in terms of the HLA association; the solely strong association was identified in the systemic types of SSc, while the associations of HLA was much weaker in the limited types of SSc. In addition, several SNPs outside the HLA region including STAT4 locus were also significantly associated with the limited types of SSc. Effect sizes of the non-HLA lead SNPs in each subset or whole SSc were comparable among different subsets except for those of three SNPs, rs5862323, rs11177005, and rs79834248, associated with the ACA-positive SSc. The effect sizes of these SNPs were comparable only with those of lcSSc (Fig. 5B). Likewise, the association of the lead SNP in the FCGR/FCRL locus, rs6697139, was significant only in whole SSc and the effect sizes were almost identical except for ACA-positive SSc (Supplementary Fig. 10, Supplementary Table 15). We also observed a strong association of the same HLA SNP as the systemic subtypes with ILD-SSc reflecting a higher incidence of ILD in the systemic subtypes than in the limited subtypes. Though marginally significant, rs57919238 in the NBEA region was a SNP identified only in ILD-SSc.

Together these results highlight the different genetic architectures, especially HLA genes, between the systemic and the limited types of SSc.

Heritability enrichment in autoimmune-related traits/cells and polygenic features

Since it is common that genetic architectures are shared among several different diseases or traits (29), we measured the genetic correlation between SSc and SLE (30) or 47 target complex diseases of BBJ (20). As we expected, there was a significant genetic correlation between SSc and SLE, followed by RA (without statistical significance, Supplementary Table 16). We also measured genetic correlations between SSc and 60 quantitative traits adopted in the BBJ project (19, 20) and found none of the tested traits were genetically correlated with Japanese SSc (Supplementary Table 17).

Next, we conducted a partitioned heritability enrichment analysis and found that the heritability was mostly enriched in the connective tissues and bones followed by the hematopoietic tissues in the Japanese SSc, while the heritability was significantly and highly enriched in the hematopoietic tissues in the European SSc (Fig. 6A, Supplementary Table 18). The cell-type-based partitioned heritability of active histone marks, H3K9ac, H3K27ac, H3K4me3, and H3K4me1, showed that H3K4me1, one of the enhancer-related histone marks, was significantly and highly enriched in primary CD19+ B cells followed by CD4+ effector T cells in Japanese SSc (Supplementary Table 19). The same trends were also observed in European SSc with higher enrichment in CD4+ regulatory T cells and Th17 cells followed by B cells to a lesser extent (Fig. 6B, Supplementary Table 20).

We further applied the gchromVAR (31) to our Japanese dataset and found that B cell was the only significant cell type with enrichment of genetic variations in chromatin accessibility regions among 16 blood cells tested (Supplementary Table 21).

Together these results clearly show the polygenic architecture of SSc, which is similar to other AIDs, with strong heritability enrichment in lymphohematopoietic systems, especially in B cells.

Prediction of SSc development by polygenic risk scores

Having confirmed the heritability and the polygenicity of SSc shared between the European and the Japanese populations, we constructed polygenic risk scores (PRSs) using β coefficients of the European GWAS taking account of Japanese LD structures and applied them to our Japanese datasets to examine the predictive performance for the SSc development. Using the best predictive parameter set of GWAS p-value (PT) and r2 of LD determined by the Japanese Set 1 dataset (r2 = 0.4, PT = 5.0×10− 6; Supplementary Table 22), the disease risk was tested in Set 2, resulting in the AUC of 0.593 and the Nagelkerke value of 0.0092 (OR 1.336, 95% CI 1.263–1.424, p = 9.98×10− 19). The performance was much better than that constructed by β coefficients of the Set1 Japanese dataset (AUC of 0.519 and Nagelkerke value of 0.00055 in Set2 with the same parameter set as above). The subjects in the top 5% quantile had significantly higher OR for SSc susceptibility compared to those in the median quantile (Fig. 7A). When we used the effect sizes of a meta-analysis of the European and the Japanese Set 1 datasets, the predictive performance further improved (AUC from 0.593 to 0.604 and Nagelkerke pseudo R2 from 0.0092 to 0.0118; Supplementary Table 23) and the subjects in the top 5% quantile had a significantly higher risk than those in the median PRS (Fig. 7B) with the higher OR compared to one observed in the comparison with the use of the European GWAS effect sizes (Fig. 7A). These results highlight a shared genetic architecture of SSc between European and Japanese populations.

Next, we tested if the PRS predicts subsets of SSc better than a whole set of SSc using the best parameter sets determined above. Intriguingly, predictive performances were slightly higher for lcSSc or ACA-positive SSc, suggesting a more polygenic nature of the limited cutaneous types of SSc. Furthermore, in the intra-case setting, the PRS significantly discriminated SSc patients with ACA from those without ACA (Supplementary Table 24). We also examined a possible correlation between PRS and the age of SSc onset, which revealed no significant correlation (Supplementary Table 25).

Since the present findings consistently demonstrated the significant roles of B cells and IRF8 in SSc pathology, we hypothesized that the integration of functional annotations in B cells would improve the fitness of PRS. We obtained the top 5% of SNPs for IRF8 binding in one of the B cell lines, RAMOS cells, by IMPACT software (32) (33) and measured an improvement in the predictive performance of PRS. As expected, prioritizing these top 5% SNPs plus the lead SNPs of the meta-analysis further improved the predictive performance of PRS (AUC from 0.604 to 0.610 and Nagelkerke pseudo R2 from 0.0117 to 0.0130; Supplementary Table 26) and the top 5% quantile subjects had a significantly higher risk of SSc than those with the median PRS (Fig. 7C) with the higher OR compared to those observed without the SNP prioritization (Fig. 7A, B). Together these data further support the importance of IRF8 and B cells in SSc development as well as better trans-ancestral portability of PRS by prioritizing SNPs annotated according to TF-binding in tissue and cell-type specific manners.

Discussion

In the present study, we conducted the Japanese GWAS for SSc comprising a total of 114,027 subjects consisting of 1,428 cases and 112,599 controls, resulting in the largest Asian GWAS for SSc ever, and identified three novel significant loci. One of the SNPs, rs6697139 located at the intergenic region of FcγR family genes as well as its complete LD SNP, rs10917688, had a strong effect size (OR ~ 2.0) and were found to be plausible causal SNPs by fine-mapping. Considering the much higher MAF but with no significant association in the European dataset, this Japanese-specific loci was worth further investigation. Notably, rs10917688 is positioned within a cCRE and likely to be a part of binding motifs of IRF8, a key TF for the developmental trajectory of B cells (34) as well as dendritic cells, macrophages, and NK cells (35). One of the enhancer-related histone marks, H3K4me1, was identified in B cells and heritability of active histone marks was enriched in B cells both in European and Japanese populations, suggesting IRF8-FCGR/FCRL axis in B cells might be a novel pathological mechanism in SSc.

FcγRs are expressed on the surface of both innate and adaptive immune cells and confer immune modulatory responses by binding the Fc portion of immunoglobulin G (IgG). Based on their binding affinity, FcγRs are divided into low-affinity and high-affinity FcγRs and there are five low-affinity FcγRs, FcγRIIa, FcγRIIb, FcγRIIc, FcγRIIIa, and FcγRIIIb, each of which is encoded by FCGR2A, FCGR2B, FCGR2C, FCGR3A, and FCGR3B, respectively (36). Due to the close proximities of FcγR gene locations and segmental duplications (SD), loci with two or more highly similar and duplicated regions, it is difficult to determine which gene is critically affected by a causal SNP for SSc development. SD loci are enriched for immune genes and often show copy number variations (CNVs) (36). Indeed, CNVs in the FCGR region have been suggested to be associated with multiple AIDs including RA, SLE, celiac disease, and inflammatory bowel disease (36). However, the association of CNVs with SSc has never been reported, and thus the effect of the SNPs rather than CNVs is likely to be associated with SSc. In addition, the association was not significant despite higher MAFs in European populations, suggesting the association of these loci is specific to Japanese, which may be independent of CNV.

FCRLA encodes a protein similar to the FcγR and is selectively expressed in peripheral and germinal center B cells, but not in terminally differentiated plasma cells. Thus, it has been considered to be involved in B-cell development through consensus Immunoreceptor Tyrosine-based Activation Motifs (ITAMs) and Immunoreceptor Tyrosine-based Inhibitory Motifs (ITIMs) (37). The ligands had long been unknown, but it has recently been found that they bind IgA, IgM, and IgG inside endoplasmic reticulum (ER) and hence they are now supposed to work as ER chaperones (38). Although they are mainly retained in ER, one of the isoforms has a longer signal peptide that allows for secretion outside cells, implying its role as a secreted protein (39). Despite these advances in understanding, FCRLA is still a gene of unknown function both for physiological and pathological conditions.

FcγRIIa, CD32a, is expressed in monocytes, neutrophils, and eosinophils, and mediates phagocytosis of opsonized antigens or immune complexes. FcγRIIa also has ITAMs and ITIMs and thus can be functionally divergent. A well-known functionally relevant SNP, rs1801274, is a nonsynonymous mutation, which alters arginine (R) to histidine (H) at amino acid position 131 of the extracellular domain and confers binding to IgG2 and IgG3 with higher affinity than RR receptors (40). The SNP has been implicated for the susceptibility to SLE, Kawasaki disease, cystic fibrosis, and several infectious diseases including invasive pneumococcal or meningococcal disease, severe malaria, dengue fever, respiratory syncytial virus, and SARS-CoV (40).

FcγRIIb, CD32b, is a solely inhibitory receptor among FcγRs expressed mainly on B cells but is also expressed on dendritic cells, macrophages, and mast cells. It inhibits phagocytosis of immune complexes and antibody production by B cells (41). FCGR2B has been implicated for its association with multiple AIDs including RA (42), SLE (43, 44), type I diabetes (45), and IgG4-related disease (46). Although the genetic association of FCGR2B with SSc has never been reported, one small study observed higher levels of anti-FcγRIIB/C antibodies in sera of Japanese dcSSc patients compared to those of lcSSc or non-SSc controls (47).

Despite limited sample numbers and genotype information, our eQTL analysis using the Japanese eQTL dataset of six WBC subpopulations (26) showed a trend of decreased FCGR2A and FCGR2B expression in relevant cell types, B cells, NK cells, and Monocytes. Although we were not able to assess an eQTL effect on FCRLA due to a lack of genotype information in the database, the gene is also located in close proximity to rs6697139 and rs10917688. Thus, further eQTL studies with a sufficient number of samples are highly warranted to measure the effect of the variant, rs10917688, on any of the gene(s) in this region.

Interferon-regulatory factor 8 (IRF8), also known as interferon consensus sequence binding protein (ICSBP), is a TF exclusively expressed in hematopoietic lymphoid and myeloid cells. IRF8 deficiency in humans was previously documented and the subjects suffered from severe immunodeficiency due to depletion or impaired functions of dendritic cell subsets, monocytes, and NK cells (48). Sequence variants near IRF8 have repeatedly been identified as risk factors for various AIDs including SSc (12) according to the previous GWASs. IRF8 has also been implicated in early B-cell development, Igk rearrangement, germinal center formation, and plasma cell generation (49). Several mouse experiments showed that IRF8 worked together with other TFs, such as PU.1, IRF4, IKAROS, or E2A, to modulate lineage specification, commitment, and differentiation in B cells (50, 51). An intronic variant, rs8057456, was one of GWAS significant variants for serum immunoglobulin levels in the GWAS of Scandinavian populations (52). Together these studies strongly suggest that genetic variations of IRF8 in B cells can modulate susceptibility to autoimmunity including SSc. The present study indicated that IRF8 may bind to a motif containing FCGR/FCRL variant, rs10917688, within a cCRE, which has never been reported so far. Furthermore, our genotype combination analysis (Fig. 4B) revealed that IRF8-binding to this motif is indispensable for the risk effect of rs10917688, supporting an interactive effect of IRF8 and the FCGR/FCRL variant. Although we were not able to identify eGene(s) for rs10917688 and thus the result has not been fully convinced, further independent studies for both east Asians and other populations and validation experiments such as reporter assays or single nucleotide editing approaches will validate our findings as well as clarify more precise molecular mechanisms.

As can be observed in other AIDs, our study revealed that SSc has also polygenic architecture. Intriguingly lcSSc or ACA-positive SSc tended to more fit the PRS than systemic types of SSc, implying different genetic backgrounds and hence pathological mechanisms between these distinctive phenotypes. The PRS constructed from GWAS SNPs moderately fit the predictive model indicating a potential utility of PRS for a risk assessment of SSc in clinical settings. Furthermore, prioritizing the top 5% SNPs annotated according to IRF8-binding in RAMOS cells improved the predictive performance showing the better trans-ancestral portability of PRS with the use of IMPACT-annotated SNPs.

The trans-ethnic GWAS meta-analysis identified a total of 30 GWAS loci, most of which were derived from those identified in European populations. Three of these loci including EOMES, ESR1, and SLC12A5 have never been reported. Eomesodermin encoded by EOMES and T-bet are TFs belonging to the T-box family and known to differently regulate CD8+ T cell differentiation and function as well as exhaustion. (5355). Considering the association of EOMES variation with multiple AIDs such as RA (56), multiple sclerosis (57), and ankylosing spondylitis (58), as well as immune-related traits including lymphocyte count and granulocyte count (59), variations of EOMES may be a trigger of autoimmunity shared among multiple AIDs including SSc. A nuclear hormone receptor, estrogen receptor 1, encoded by ESR1 is involved in various gene expression which affects cellular proliferation and differentiation in given tissues. ESR1 binds to nuclear factor-κB (NF-κB) and these two molecules mutually trans-repress each other to regulate cellular response including cytokine production (60). Of note, rs230534 in the NFKB1 region was also one of the lead SNPs of our meta-analysis (Table 2). Since females are ~ five times more affected by SSc than males, the association of ESR1 with SSc is quite reasonable, but on the other hand, it was surprising that previous studies have never identified this locus; it might be due to the relatively weak association and effect sizes. Potassium-chloride transporter member 5 (KCC2) encoded by SLC12A5 is a brain-specific chloride potassium symporter and is well-characterized in neuronal cells for its function of maintaining intracellular chloride concentrations (61). However, the functional relation of these loci to immune regulations or tissue fibrosis has never been known thus far and is remained to be investigated.

As we expected, most of the susceptibility genes including novel risk loci identified in the present study overlapped with those associated with other AIDs, especially SLE. Consequently, there was a significant genetic correlation between SSc and SLE, and to a lesser extent RA despite the relatively small number of cases in the present study. Our study did not answer several questions, such as how identified and unidentified susceptibility variants, HLAs, interaction with other genes (62), or environmental factors contribute to the difference in disease phenotypes among AIDs, and this question is also still open to be explored.

As is often the case of current GWASs, we were not able to include SNPs with very low allele frequencies and other types of genetic variations such as structural variants not tagged by SNPs, which are the main limitations of the present study and remained for future studies with more sample sizes and better imputation utilizing a WGS-based reference panel. Nevertheless, our largest-scale Asian GWAS provides valuable insights into the complex genetic architecture of SSc as well as the prioritization of targets for future functional studies.

Methods

Study subjects

A total of 1,499 cases consisted of 712 who had been enrolled in the previous study (11) and 787 who were newly enrolled. A total of 112,609 controls consisted of 2,105 from a single center and 110,504 from BBJ project in which 200,000 individuals with one of 47 common diseases were enrolled (19, 20). All the subjects were Japanese nations. The diagnosis of SSc was made by physicians according to the ACR/EULAR classification criteria (2013) (63).

Written informed consent was obtained from all the participants before enrollment into the study. The ethical committees approved this research at the RIKEN Center for Integrative Medical Sciences.

Genotyping and quality control

All the case samples and the non-BBJ control samples were genotyped using Illumina Infinium CoreExome Array or a combination of Illumina Infinium Core Array and Illumina Infinium Exome Array. BBJ samples were genotyped using Illumina Human OmniExpress Exome BeadChip or a combination of Illumina HumanOmniExpress and HumanExome BeadChips (CoreExome Array and OmniExpress Exome BeadChip are not exome arrays, but genome-wide arrays).

The genotype QC criteria were set as follows and variants that did not meet any of these criteria were excluded from further analyses; genotyping call rate ≥ 99%, Hardy–Weinberg equilibrium p-values (HWE-P) > 1.0×10− 6, allele frequency difference from those of the imputation panel < 3.0%, and MAF > 0.01.

The subjects who met any of the following criteria were excluded from the analyses; subjects with genotyping call rate < 0.98, those who were in a high degree of relatedness showing PiHAT > 0.25 estimated by PLINK1.9 (64), or those who were outliers of East Asian (EAS) in the principal component analysis (PCA) (Supplementary Fig. 1, 2). For the PCA, we extracted variants shared between our datasets (Set1 and Set2) and those in the HapMap project.

Imputation

After the sample and the variant QCs, genotypes were phased and imputed altogether to enhance the accuracy of genotyping imputation. For a reference genotype panel, we used a previously constructed imputation panel from the phase 3 1,000 genome project ver.5 (1KGp3ver5) data (65) combined with high-depth WGS data from 3,256 Japanese subjects from BBJ (J3K) (manuscript in preparation), which enables rare variant detection (66). The genotype data were phased and imputed by EAGLE (ver.2.3.5) and Minimac4 (ver.1.0.0), respectively (67). Variants with low imputation accuracy (R2 < 0.3) or MAF < 0.01 among control samples were excluded. After the imputation, 9,246,028 autosomal variants and 255,517 X chromosome variants remained.

Association analysis of Japanese SSc

An initial association analysis was conducted by a logistic regression model using PLINK (ver.2.0) (64). Ten genetic principal components were used as covariates. For an X chromosome analysis, males and females were separately analyzed first and then meta-analyzed with an inverse-variance fixed-effect model to estimate overall associations. The genome-wide significant threshold of p-value was set at 5.0×10− 8. Novel variants were defined as those which passed the genome-wide significant threshold and had not been identified as significant variants in the previous GWASs or were at least 1Mb apart from a significant locus in the previous GWASs. Conditional analyses were conducted using GCTA-COJO (68), where association analyses were repeated conditioning on given significant variants in a region within ± 1 Mb from the variant until the association did not meet the threshold p-value of significant level. We used a relaxed threshold level, p = 1.0×10− 6 (69), for the conditional analyses (70). The obtained results were confirmed by conducting conditional analyses using PLINK2.0.

We also applied the Firth regression (21) using PLINK2.0 and a generalized linear mixed model using a scalable generalized linear mixed model for region-based association test (SAIGE) (22) to address potential bias caused by low allele frequencies and case-control imbalance. For the Firth regression, all the case samples (N = 1,428) were included while control samples consisted of randomly selected 13,946 samples from BBJ and the Set 1 control samples, which resulted in an improvement of case/control ratio from 1/79 to 1/11. For the linear mixed effect model, only identical samples were excluded in the kinship adjustment (PiHAT > 0.90).

Fine-mapping analysis

In search of a causal SNP in each independently associated region, we utilized a statistical fine-mapping based on the asymptomatic Bayes factors. Bayes factor was calculated from minor allele frequencies and the p-values through Wakefield's approximations (71), and posterior probabilities that a single SNP is causal in the region based on the two key assumptions; 1) all SNPs have been genotyped, and 2) there is a single causal variant. The 95% credible set of SNPs located within 250kb distance from the genomic position of each lead SNP was defined. For visualization, regional locus zoom plots were drawn with LocalZoom (v0.14.0) (72).

Transcriptome-wide association study

TWAS, an association analysis for disease susceptibility based on gene expressions estimated by summary statistics, was conducted with FUSION software (73) using the GTEx ver.7 multi-tissue models consisting of 48 cell/tissue types (25). We also conducted TWAS for subsets of white blood cells, CD4 T cells, CD8 T cells, NK cells, Monocytes, B cells, and neutrophils, using the Japanese eQTL study for 105 Japanese healthy subjects (26). The statistical significance threshold was based on Bonferroni correction accounting for all the tested genes (0.05/279,331).

Disease-related pathway analysis

Potential disease-related pathways were explored by FUMA (v1.3.8), an integrative online platform for functional mapping and annotation of genetic associations (74). The GWAS summary statistics was uploaded for the initial SNP2GENE analysis and the resultant mapped genes were utilized for the GENE2FUNC analysis.

Trans-ethnic genetic effect correlation

The transethnic genetic effect correlation between the European and Japanese was estimated using the python package, Popcorn (ver.0.9.9) (75). Briefly, the cross-population scores were computed using the 1KGp3-EAS plus J3K and 1KGp3-EUR, and the heritability and the transethnic genetic correlation of a pair of GWAS summary statistics were fitted to the scores (65).

Trans-ethnic meta-GWAS analysis with European meta-GWAS dataset

A trans-ethnic meta-analysis between Japanese and European SSc was conducted by PLINK (ver.1.9) with the use of an inverse-variance fixed-effects model. The latest GWAS meta-analysis summary data of the European population consisted of 9,095 cases and 17,584 controls (12) was used. Conditional analyses were also conducted using GCTA-COJO, where the association analysis was conducted by conditioning on significant variants in each population and the resultant association data were meta-analyzed to obtain significant variants. The analyses were repeated until no variants reached a relaxed threshold level of significance, p = 1.0×10− 6.

SNP annotation

Functional annotations of given variants, including potential alteration of protein function for exonic variants, were identified by ANNOVAR (version: 2017-07-17) (76). For the lead variants and their strong LD variants (R2 ≥ 0.8), we examined if these variants were also eQTL variants in the previous eQTL study of leukocyte subgroups (26) or GTEx ver.8 (25). Enrichment of GWAS significant variants in cell type-specific active histone marks was measured using Haploreg ver.4.1, an online tool exploring annotations for noncoding variants based on DNAse and chromatin immunoprecipitation (ChIP) sequencing data (77). LoF variants or deleterious exonic variants were explored among the lead variants and their LD variants (R2 ≥ 0.8) by VEP/LOFTEE (v1.0.2) or Polyphen2 (v2.2.13) and SIFT (v5.2.2), respectively.

Transcription factor binding motif analysis

Transcription factor binding motif analysis was conducted using Tomtom (v5.3.3) (78), an online-based motif comparison tool, using the cCRE sequence containing the reference allele (C) or the alternative allele (T) of rs10917688 as input sequences.

LD estimation

The regional plots were drawn using LocusZoom software (79). LD of a given variant with the corresponding lead variant was estimated using PLINK (ver.1.9) referring to the imputation results of the data of 1KGp3ver5 plus J3K.

LD score regression

To estimate the heritability, the LDSC was conducted using LDSC software (ver.1.0.0) (80) with a liability scale under the disease prevalence of 0.1% for Japanese (6) and 0.2% in European populations (1), respectively. Partitioned heritability enrichment was also measured in specific cell groups and detailed cell types using the baseline model (ver.2.2) (81). To measure genetic correlations between SSc and various complex diseases (82), the summary statistics of 40 target diseases of BBJ and that of Japanese SLE (30) were utilized.

gchromVAR

The gchromVAR weights chromatin features by posterior probabilities of fine-mapped variants and computes the enrichment for each cell type versus an empirical background matched for GC content and feature intensity (31). The gchromVAR was applied to the Japanese SSc GWAS summary statistics using 18 bulk ATAC-seq for FACS-sorted hematopoietic progenitor populations derived from bone marrow samples of multiple healthy individuals. The threshold of significance was based on Bonferroni correction (P = 0.002778).

Polygenic risk scores for SSc susceptibility

PRSs were calculated to assess the polygenic feature of SSc and the predictive ability of given SNPs for disease susceptibility.

Since the European dataset had the largest case numbers among the datasets used in the present study and thus enabled the most accurate approximate coefficients of variants, we initially used the summary statistics of the European GWAS to generate matrices consisting of r2 of LD and GWAS p-values. To test the better approximation of the coefficients, we meta-analyzed the European and Set 1 Japanese GWASs and used the summary statistics.

Set 1 Japanese dataset was used as a discovery dataset to assess the predictive ability of PRS models and to determine the threshold of p-values for variants included in the construction of the PRS. Set 2 Japanese dataset was used as a test dataset to evaluate the performance of the PRS.

SNPs (N = 3,652,217) outside the HLA region shared between Japanese and European datasets were extracted and SNPs with a minor allele frequency < 0.01 or those with an imputation quality score (Rsq) < 0.3 for each cohort were excluded. We applied the standard pruning and thresholding method to construct the PRS (83). Specifically, the clump function of PLINK (1.90b) was used to generate eligible SNPs with the 250 kb window. A total of 9 pruning thresholds of LD (r2) from 0.1 to 0.9 and a total of 20 thresholds of p-values in GWAS (PT) (5×10− 8, 5×10− 7, 1×10− 6, 5×10− 6, 1×10− 5, 5×10− 5, 5×10− 4, 0.05, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) were set to construct PRS matrices. For each, r2 was used to generate different PRSs based on the PTs. The natural logarithms of the GWAS odds ratios (ORs) were used as weights across all datasets. The SNP alleles used in the PRS were aligned to risk alleles for SSc susceptibility. All the effect sizes were set to positive values and the effect alleles were defined accordingly. The PRS was the sum of the weighted allele counts (by their respective GWAS effect sizes) across all the SNPs included in the PRS according to the following formula:

$${PRS}_{i}={\sum }_{k=1}^{n}{\beta }_{k}{X}_{k,i}$$

where i denotes each subject, n denotes the number of variants passing a threshold, βk denotes an effect size of k-th SNP and Xk,i denotes the genotype dosage of k-th SNP in individual i.

After the calculation of PRSs, a logistic regression model was applied to calculate the association between the PRS and susceptibility to SSc for the Japanese subjects. The area under the receiver operating characteristic (AUROC), sensitivity, and specificity of each model, was generated by the R package pROC version 1.13.0. To assess the goodness of fit of a given model, Nagelkerke’s pseudo-R2 metric was also calculated.

We split the dataset into 20 quantiles according to individual PRS to evaluate the performance of the PRS to distinguish case and control. The tenth quantile in 20 quantiles was used as the reference (84). We applied logistic regression by comparing the remaining subgroups with the reference.

IMPACT

IMPACT (Inference and Modeling of Phenotype-related ACtive Transcription), is a genome annotation strategy to identify regulatory elements defined by cell-state-specific TF binding profiles and can capture more cis-eQTL variation than sequence-based annotations (32, 33).

To examine the improvement of PRS performance with the use of IMPACT-annotated SNPs, the top 5% of SNPs annotated according to IRF8-binding in one of B cell lines, RAMOS cells, and the lead SNPs of the meta-analysis for the European and Set1 Japanese GWASs were prioritized to generate PRSs. The rest of the procedures were the same as described above, and the predictive performance and the goodness of fit of the model were compared with those obtained without prioritization of SNPs.

Declarations

Acknowledgments

We deeply thank all the individuals who participated in the study and the staff in the BBJ project for their efforts. We also thank Midori Suzuki in Aichi Cancer Center Research Institute for array genotyping experiments and all members in Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences for the technical support.

Funding

This study was supported by Medical Research and Development (AMED) under Grant Number JP21kk0305013, JP21tm0424220 and JP21ck0106642, Japan Society for the Promotion of Science KAKENHI Grant JP20H00462, THE KATO MEMORIAL TRUST FOR NAMBYO RESEARCH and the Medical Research Support Project of the Shizuoka Prefectural Hospital Organization.

Author contributions

Conceptualization: Y. Ishikawa, C. Terao

Methodology: Y. Ishikawa, N. Tanaka, N. Otomo, Y. H. Suetsugu, Koike, K. Tomizuka,

S. Yoshino, X. Liu, S. Ito, K. Hikino, T. Amariuta, C. Terao

Investigation: Y. Ishikawa, N. Tanaka, C. Terao

Visualization: Y. Ishikawa, C. Terao

Funding acquisition: C. Terao

Project administration: C. Terao

Supervision: C. Terao

Writing – original draft: Y. Ishikawa, C. Terao

Writing – review & editing: Y. Ishikawa, N. Tanaka, Y. Asano, M. Kodera, Y. Shirai,

M. Akahoshi, M. Hasegawa, T. Matsushita, S. Kazuyoshi, S. Motegi, H. Yoshifuji, A. Yoshizaki,

T. Kohmoto, K. Takagi, A. Oka, M. Kanda, Yoshihito Tanaka, Y. Ito, K. Nakano, H. Kasamatsu,

A. Utsunomiya, A. Sekiguchi, H. Niro, M. Jinnin, K. Makino, T. Makino, H. Ihn, M. Yamamoto,

C. Suzuki, H. Takahashi, E. Nishida, A. Morita, T. Yamamoto, M. Fujimoto, Y. Kondo, D. Goto,

T. Sumida, N. Ayuzawa, H. Yanagida, T. Horita, T. Atsumi, H. Endo, Y. Shima, A. Kumanogoh,

J. Hirata, N. Otomo, H. Suetsugu, Y. Koike, K. Tomizuka, S. Yoshino, X. Liu, S. Ito, K. Hikino,

A. Suzuki, Y. Momozawa, S. Ikegawa, Yoshiya Tanaka, O. Ishikawa, K. Takehara, T. Torii,

S. Sato, Y. Okada, T. Mimori, F. Matsuda, K. Matsuda, T. Amariuta, I. Imoto, K. Matsuo,

M. Kuwana, Y. Kawaguchi, K. Ohmura, C. Terao

Competing interests

H. Yoshifuji has received lecture fee from Chugai, Boehringer Ingelheim, and Nippon Shinyaku.

Data and materials availability

The data underlying this article are available in the article and in its online supplementary material. Individual-level deidentified patient data, survey results, interview transcripts, statistical code, and spreadsheets would be available from Chikashi Terao (ORCID 0000-0002-6452-4095) at any time only on reasonable request. There is no additional information available.

References

  1. C. P. Denton, D. Khanna, Systemic sclerosis. Lancet 390, 1685-1699 (2017).
  2. C. Campochiaro, Y. Allanore, An update on targeted therapies in systemic sclerosis based on a systematic review from the last 3 years. Arthritis Res Ther 23, 155 (2021).
  3. M. De Martinis, F. Ciccarelli, M. M. Sirufo, L. Ginaldi, An overview of environmental risk factors in systemic sclerosis. Expert Rev Clin Immunol 12, 465-478 (2016).
  4. I. Walecka, M. Roszkiewicz, A. Malewska, Potential occupational and environmental factors in SSc onset. Ann Agric Environ Med 25, 596-601 (2018).
  5. Y. Hamaguchi, Autoantibody profiles in systemic sclerosis: predictive value for clinical evaluation and prognosis. J Dermatol 37, 42-53 (2010).
  6. Y. Ishikawa, C. Terao, Genetics of Systemic Sclerosis. J Scleroderma Relat Disord 5, 192-201 (2020).
  7. F. C. Arnett et al., Major histocompatibility complex (MHC) class II alleles, haplotypes and epitopes which confer susceptibility or protection in systemic sclerosis: analyses in 1300 Caucasian, African-American and Hispanic cases and 1000 controls. Ann Rheum Dis 69, 822-827 (2010).
  8. T. R. Radstake et al., Genome-wide association study of systemic sclerosis identifies CD247 as a new susceptibility locus. Nat Genet 42, 426-429 (2010).
  9. Y. Allanore et al., Genome-wide scan identifies TNIP1, PSORS1C1, and RHOB as novel risk loci for systemic sclerosis. PLoS Genet 7, e1002091 (2011).
  10. O. Gorlova et al., Identification of novel genetic markers associated with clinical phenotypes of systemic sclerosis through a genome-wide association strategy. PLoS Genet 7, e1002178 (2011).
  11. C. Terao et al., Transethnic meta-analysis identifies GSDMA and PRDM1 as susceptibility genes to systemic sclerosis. Ann Rheum Dis 76, 1150-1158 (2017).
  12. E. López-Isac et al., GWAS for systemic sclerosis identifies multiple risk loci and highlights fibrotic and vasculopathy pathways. Nat Commun 10, 4955 (2019).
  13. W. Pu et al., Exome-Wide Association Analysis Suggests LRP2BP as a Susceptibility Gene for Endothelial Injury in Systemic Sclerosis in the Han Chinese Population. J Invest Dermatol 141, 1254-1263.e1256 (2021).
  14. C. Yang et al., STAT4: an immunoregulator contributing to diverse human diseases. Int J Biol Sci 16, 1575-1585 (2020).
  15. P. Dieudé et al., Association of the TNFAIP3 rs5029939 variant with systemic sclerosis in the European Caucasian population. Ann Rheum Dis 69, 1958-1964 (2010).
  16. E. Koumakis et al., Brief report: candidate gene study in systemic sclerosis identifies a rare and functional variant of the TNFAIP3 locus as a risk factor for polyautoimmunity. Arthritis Rheum 64, 2746-2752 (2012).
  17. C. Terao et al., PLD4 as a novel susceptibility gene for systemic sclerosis in a Japanese population. Arthritis Rheum 65, 472-480 (2013).
  18. X. Zhou et al., HLA-DPB1 and DPB2 are genetic loci for systemic sclerosis: a genome-wide association study in Koreans with replication in North Americans. Arthritis Rheum 60, 3807-3814 (2009).
  19. A. Nagai et al., Overview of the BioBank Japan Project: Study design and profile. J Epidemiol 27, S2-S8 (2017).
  20. M. Hirata et al., Cross-sectional analysis of BioBank Japan clinical data: A large cohort of 200,000 patients with 47 common diseases. J Epidemiol 27, S9-S21 (2017).
  21. X. Wang, Firth logistic regression for rare variant association tests. Front Genet 5, 187 (2014).
  22. W. Zhou et al., Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts. Nat Genet 52, 634-639 (2020).
  23. S. Akizuki et al., PLD4 is a genetic determinant to systemic lupus erythematosus and involved in murine autoimmune phenotypes. Ann Rheum Dis 78, 509-518 (2019).
  24. X. Liu, H. Qin, J. Wu, J. Xu, Association of TNFAIP3 and TNIP1 polymorphisms with systemic lupus erythematosus risk: A meta-analysis. Gene 668, 155-165 (2018).
  25. G. Consortium, The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318-1330 (2020).
  26. K. Ishigaki et al., Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis. Nat Genet 49, 1120-1125 (2017).
  27. E. P. Consortium, An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).
  28. C. A. Davis et al., The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res 46, D794-D801 (2018).
  29. C. Orvain, S. Assassi, J. Avouac, Y. Allanore, Systemic sclerosis pathogenesis: contribution of recent advances in genetics. Curr Opin Rheumatol 32, 505-514 (2020).
  30. X. Yin et al., Meta-analysis of 208370 East Asians identifies 113 susceptibility loci for systemic lupus erythematosus. Ann Rheum Dis, (2020).
  31. J. C. Ulirsch et al., Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat Genet 51, 683-693 (2019).
  32. T. Amariuta et al., IMPACT: Genomic Annotation of Cell-State-Specific Regulatory Elements Inferred from the Epigenome of Bound Transcription Factors. Am J Hum Genet 104, 879-895 (2019).
  33. T. Amariuta et al., Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat Genet 52, 1346-1354 (2020).
  34. H. Xu et al., Regulation of bifurcating B cell trajectories by mutual antagonism between transcription factors IRF4 and IRF8. Nat Immunol 16, 1274-1281 (2015).
  35. N. M. Adams et al., Transcription Factor IRF8 Orchestrates the Adaptive Natural Killer Cell Response. Immunity 48, 1172-1182.e1176 (2018).
  36. L. Franke et al., Association analysis of copy numbers of FC-gamma receptor genes for rheumatoid arthritis and other immune-mediated phenotypes. Eur J Hum Genet 24, 263-270 (2016).
  37. T. J. Wilson, S. Gilfillan, M. Colonna, Fc receptor-like A associates with intracellular IgG and IgM but is dispensable for antigen-specific immune responses. J Immunol 185, 2960-2967 (2010).
  38. T. Santiago et al., FCRLA is a resident endoplasmic reticulum protein that associates with intracellular Igs, IgM, IgG and IgA. Int Immunol 23, 43-53 (2011).
  39. S. Kulemzin et al., Characterization of human FCRLA isoforms. Immunol Lett 152, 153-158 (2013).
  40. M. P. Holgado et al., Fcγ Receptor IIa (FCGR2A) Polymorphism Is Associated With Severe Respiratory Syncytial Virus Disease in Argentinian Infants. Front Cell Infect Microbiol 10, 607348 (2020).
  41. F. Nimmerjahn, J. V. Ravetch, Fcgamma receptors as regulators of immune responses. Nat Rev Immunol 8, 34-47 (2008).
  42. A. M. Walsh et al., Integrative genomic deconvolution of rheumatoid arthritis GWAS loci into gene and cell type associations. Genome Biol 17, 79 (2016).
  43. X. W. Zhu et al., Comprehensive Assessment of the Association between FCGRs polymorphisms and the risk of systemic lupus erythematosus: Evidence from a Meta-Analysis. Sci Rep 6, 31617 (2016).
  44. J. S. Verbeek, S. Hirose, H. Nishimura, The Complex Association of FcγRIIb With Autoimmune Susceptibility. Front Immunol 10, 2061 (2019).
  45. L. Yip, R. Fuhlbrigge, R. Alkhataybeh, C. G. Fathman, Gene Expression Analysis of the Pre-Diabetic Pancreas to Identify Pathogenic Mechanisms and Biomarkers of Type 1 Diabetes. Front Endocrinol (Lausanne) 11, 609271 (2020).
  46. C. Terao et al., IgG4-related disease in the Japanese population: a genome-wide association study. The Lancet Rheumatology 1, E14-E22 (2019).
  47. T. Kadono, M. Tomita, Z. Tamaki, S. Sato, Y. Asano, Serum levels of anti-Fcγ receptor IIB/C antibodies are increased in patients with systemic sclerosis. J Dermatol 41, 1009-1012 (2014).
  48. S. Hambleton et al., IRF8 mutations and human dendritic-cell immunodeficiency. N Engl J Med 365, 127-138 (2011).
  49. R. Lu, Interferon regulatory factor 4 and 8 in B-cell development. Trends Immunol 29, 487-492 (2008).
  50. H. Wang, H. C. Morse, IRF8 regulates myeloid and B lymphoid lineage diversification. Immunol Res 43, 109-117 (2009).
  51. S. Carotta et al., The transcription factors IRF8 and PU.1 negatively regulate plasma cell differentiation. J Exp Med 211, 2169-2181 (2014).
  52. S. Jonsson et al., Identification of sequence variants influencing immunoglobulin levels. Nat Genet 49, 1182-1191 (2017).
  53. E. L. Pearce et al., Control of effector CD8+ T cell function by the transcription factor Eomesodermin. Science 302, 1041-1043 (2003).
  54. M. A. Paley et al., Progenitor and terminal subsets of CD8+ T cells cooperate to contain chronic viral infection. Science 338, 1220-1225 (2012).
  55. L. M. McLane et al., Role of nuclear localization in the regulation and function of T-bet and Eomes in exhausted CD8 T cells. Cell Rep 35, 109120 (2021).
  56. E. Ha, S. C. Bae, K. Kim, Large-scale meta-analysis across East Asian and European populations updated genetic architecture and variant-driven biology of rheumatoid arthritis, identifying 11 novel susceptibility loci. Ann Rheum Dis 80, 558-565 (2021).
  57. A. H. Beecham et al., Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet 45, 1353-1360 (2013).
  58. A. Cortes et al., Identification of multiple risk variants for ankylosing spondylitis through high-density genotyping of immune-related loci. Nat Genet 45, 730-738 (2013).
  59. S. Sakaue et al., A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 53, 1415-1424 (2021).
  60. H. Liu, K. Liu, D. L. Bodenner, Estrogen receptor inhibits interleukin-6 gene expression by disruption of nuclear factor kappaB transactivation. Cytokine 31, 251-257 (2005).
  61. Y. Ben-Ari, Excitatory actions of gaba during development: the nature of the nurture. Nat Rev Neurosci 3, 728-739 (2002).
  62. S. Kawano et al., Phenotype conversion from rheumatoid arthritis to systemic lupus erythematosus by introduction of Yaa mutation into FcγRIIB-deficient C57BL/6 mice. Eur J Immunol 43, 770-778 (2013).
  63. F. van den Hoogen et al., 2013 classification criteria for systemic sclerosis: an American College of Rheumatology/European League against Rheumatism collaborative initiative. Arthritis Rheum 65, 2737-2747 (2013).
  64. S. Purcell et al., PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559-575 (2007).
  65. A. Auton et al., A global reference for human genetic variation. Nature 526, 68-74 (2015).
  66. N. Tanaka et al., Eight novel susceptibility loci and putative causal variants in atopic dermatitis. J Allergy Clin Immunol, (2021).
  67. P. R. Loh, P. F. Palamara, A. L. Price, Fast and accurate long-range phasing in a UK Biobank cohort. Nat Genet 48, 811-816 (2016).
  68. J. Yang, S. H. Lee, M. E. Goddard, P. M. Visscher, GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88, 76-82 (2011).
  69. C. N. Spracklen et al., Identification of type 2 diabetes loci in 433,540 East Asian individuals. Nature 582, 240-245 (2020).
  70. J. Yang et al., Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44, 369-375, S361-363 (2012).
  71. J. Wakefield, Bayes factors for genome-wide association studies: comparison with P-values. Genet Epidemiol 33, 79-86 (2009).
  72. A. P. Boughton et al., LocusZoom.js: Interactive and embeddable visualization of genetic association study results. Bioinformatics, (2021).
  73. A. Gusev et al., Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet 48, 245-252 (2016).
  74. K. Watanabe, E. Taskesen, A. van Bochoven, D. Posthuma, Functional mapping and annotation of genetic associations with FUMA. Nat Commun 8, 1826 (2017).
  75. B. C. Brown, C. J. Ye, A. L. Price, N. Zaitlen, Agian Genetic Epidemiology Network Type2 Diabetes Consortium, Transethnic Genetic-Correlation Estimates from Summary Statistics. Am J Hum Genet 99, 76-88 (2016).
  76. K. Wang, M. Li, H. Hakonarson, ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38, e164 (2010).
  77. L. D. Ward, M. Kellis, HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res 40, D930-934 (2012).
  78. S. Gupta, J. A. Stamatoyannopoulos, T. L. Bailey, W. S. Noble, Quantifying similarity between motifs. Genome Biol 8, R24 (2007).
  79. R. J. Pruim et al., LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336-2337 (2010).
  80. B. K. Bulik-Sullivan et al., LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291-295 (2015).
  81. H. K. Finucane et al., Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 47, 1228-1235 (2015).
  82. B. Bulik-Sullivan et al., An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236-1241 (2015).
  83. A. V. Khera et al., Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 50, 1219-1224 (2018).
  84. S. W. Choi, T. S. Mak, P. F. O'Reilly, Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc 15, 2759-2772 (2020).

tables

Tables 1 and 2 are available in the Supplementary Files section.