1.1 Study design and data summary
The overall study design is shown in Fig. 1. We used summary statistics retrieved from publicly available GWASs, ASD from the Psychiatric Genomics Consortium (Ncase/Ncontrol = 18,381/27,969) 2, IBS from the UK BioBank (Ncase/Ncontrol = 53,400/433,201) 15, multisite chronic pain from the UK BioBank (N = 387,649) 16, and fatigue from the UK BioBank (N = 108,976) 17. Polygenic risk score analyses were conducted in the large Dutch Lifelines cohort.
1.2 Estimation of genetic correlations
We performed genome-wide genetic correlation analyses for ASD and FSS including IBS, pain, and fatigue, respectively. First, global genetic correlations (rg) were estimated from GWAS summary statistics by linkage disequilibrium score regression (LDSC) 26. Global genetic correlation refers to the correlation of genetic effects between two related traits throughout the genome, highlighting the degree of pleiotropy and genetic overlap 26, 27. We filtered included genetic variants using the following criteria: imputation score (INFO) > 0.9 and minor allele frequency (MAF) > 0.01. Indels, strand-ambiguous SNPs, and SNPs with duplicated rs numbers were removed. We adjusted p-values for multiple testing by the false discovery rate (FDR) using the Benjamini-Hochberg procedure and a p < 0.05 was considered statistically significant.
The global genetic correlation describes the average effect of pleiotropy across all causal loci, but the underlying architecture of correlations at individual loci can vary 21. To gain better insight into genetic correlation at the locus level (within genome-wide partitioned LD blocks), local genetic correlations between ASD and FSS were calculated by the Local Analysis of coVariant Association (LAVA) tool. Unlike the global correlation analysis, the local approach enables us to identify and estimate genetic correlation in specific genomic regions, thereby providing insights into their local effects and shared genetic basis. The method has been detailed elsewhere 28. Briefly, 2495 semi-independent LD blocks were defined using European-ancestry 1000 Genome data 28. Then we conducted univariate association analyses on the local genetic signal of each trait. Regions that showed significant univariate associations for more than one trait were selected to test for local genetic correlation. For this purpose, we used a relatively lenient p-value threshold of 0.05. Finally, pairwise bivariate local genetic correlation analyses across the selected regions were estimated between ASD, IBS, pain, and fatigue. For the correlation analyses, we adjusted p-values for multiple testing by FDR using the Benjamini-Hochberg procedure and a p < 0.05 was considered statistically significant.
1.3 Multi-trait analysis of GWAS
Because of the putative genetic correlations between ASD and FSS, the multi-trait analysis of GWAS (MTAG) was conducted to perform the meta-analysis of the four traits 22. MTAG leverages the genetic correlations between multiple traits to generate trait-specific effect estimates for each SNP using GWAS summary statistics. As the authors describe, MTAG is robust to potential sample overlap between different GWASs. It improves the effect estimates of each trait and boosts statistical power to detect genetic associations Therefore, it can also improve the power of PRS 22. We filtered included genetic variants using default MTAG parameters 29. Briefly, variants were restricted to those common to all four traits of the GWAS, with a MAF > 0.01. For each trait, we calculated the 90th percentile of the SNP sample-size distribution. All SNPs with a sample size below 75% of this calculated value were removed.
To identify genomic risk loci from the MTAG results, functional Mapping and Annotation (FUMA) analyses were conducted 30. SNPs with p-value < 5 × 10− 8 were genome-wide significant. Lead SNPs were LD-pruned based on a 250 kilobase (kb) window and r2 < 0.1 using the 1000 Genome European population as a LD reference panel. Lead SNPs which were closer than 250 kb were merged into one genomic risk locus. ANNOVAR employed in FUMA was used to map SNPs to genes 31. A locus that was not found to be significant in previous GWASs and was not in LD with a previous loci in GWASs (r2 < 0.1 within 1 megabase (Mb) window) for the phenotype was labeled as ‘novel.’
1.4 PRS analyses
To validate the shared architecture between ASD and FSS, PRS analyses were conducted in the Lifelines Cohort, a large and multi-generational population-based cohort study starting in 2006 32, 33. Lifelines assesses biomedical, socio-demographic, behavioural, physical, and psychological factors related to the health and disease of over 167,000 residents living in the three northern provinces of the Netherlands. This cohort is broadly representative of socioeconomic characteristics, diseases, and general health of the population in the north of the Netherlands 34. The participants were recruited between 2006 and 2013 at baseline and followed up from 2014 onwards. The Lifelines Cohort Study was conducted according to the principles of the Declaration of Helsinki and approved by the ethics committee of the University Medical Centre Groningen. All participants signed an informed consent form 35.
For the current study, we included 14,134 participants with European ancestry from the Lifelines cohort with genetic data and our phenotypic data of interest. DNA samples were genotyped using the Illumina Global Screening Array (N = 10,758) and Illumina CytoSNP12v2 array (N = 3,376). After quality control, both genotyping datasets were then imputed at the Sanger imputation server using the Haplotype Reference Consortium panel r1.1 36. Details of genotyping, quality control, and imputation in Lifelines have been published elsewhere 37. Two PRSs of ASD were calculated in the Lifelines samples based on the summary statistics of ASD GWAS and ASD MTAG results, respectively. SNPs of low quality were removed from summary statistics (INFO ≤ 0.8, MAF ≤ 0.01, strand-ambiguous or duplicated SNPs). To obtain an independent set of SNPs, the LD-driven clumping procedure was performed in PLINK version 1.90 (r2 < 0.1, kb = 250) using the 503 European samples from 1000 Genome projects as reference. Then weighted PRSs were calculated as the sum of risk allele dosages weighted by the corresponding effect size estimated in the GWAS results. PRSs were constructed across a range of p-value thresholds (5 × 10− 8, 5 × 10− 7, 5 × 10− 6, 5 × 10− 5, 5 × 10− 4, 5 × 10− 3, 0.05, 0.5, 1) and were then standardized using z-score transformations. Thereafter, we conducted a principal component analysis (PCA) on the resulting PRSs and used the first principal component as the final ASD PRS in subsequent association analyses 38. This approach uses PCA to reweight the SNPs included in the PRS to achieve maximum variation across all the p-value thresholds. It avoids choosing one optimal p-value threshold, therefore controlling type-1 error and preventing overfitting.
Linear regression models were applied to examine the associations between ASD PRSs and phenotypes of ASD, IBS, pain, and fatigue, adjusted for age, sex, arrays, and 10 genetic principal components that correct population structure. Measurements of the phenotypes were described in the supplementary material. PRS and phenotype measures were standardized to Z-scores (mean = 0 and standard deviation = 1) to facilitate interpretation. We adjusted p-values for multiple testing by FDR using the Benjamini-Hochberg procedure and a p < 0.05 was considered statistically significant.
1.5 Bidirectional Mendelian randomization analyses
Bidirectional two-sample MR was conducted to infer putative causal relationships between ASD and FSS from GWAS summary statistics. MR estimates the causal effects of the exposure on the outcome using genetic variants as IVs 24. We performed the method of random-effect inverse variance weighted (IVW) as the main analysis, and four complementary methods as sensitivity analyses, including MR Egger, weighted median, and median-based methods (simple and weighted) 39. IVs were selected based on association statistics of the exposure GWAS summary statistics (Stable 1). First, genome-wide significant SNPs (p < 5×10− 8) were selected. For SNPs that were not available in outcome GWASs, a proxy SNP identified in high-LD (r2 > 0.7) was used. If the proxies were not available (e.g., the only one significant SNP in the GWAS of fatigue is 1:64178756_C_T and it does not have an rs id or proxy), we chose SNPs that have a p-value less than 5×10− 7. Second, independent genetic variants were selected by LD clumping procedure at r2 < 0.001 within 10 Mb. SNP alleles were harmonized between exposure and outcome summary statistics and palindromic SNPs with intermediate allele frequencies (0.42–0.58) were discarded. The Steiger filtering method was applied to remove potential invalid IVs that explain more variance in the outcome than the exposure 40. To evaluate horizontal pleiotropy, heterogeneity tests by MR-Egger and IVW methods and Egger-intercept tests were applied 41. MR analyses were performed using the TwoSampleMR (version 0.5.6) package for R (version 4.0.5) 40, 42. We adjusted p-values for multiple testing by FDR using the Benjamini-Hochberg procedure and a p < 0.05 was considered statistically significant.