LDSC identified significant genetic correlations between all the included traits (Supplementary Fig. 1). However, genetic correlations with headache/migraine were lower than the other traits (rgaverage = 0.27 vs. 0.58). A common factor CFA fit well (\(\:\chi\:\)2(5) = 21.61, p = 0.001, AIC = 41.61, CFI = 0.99, SRMR = 0.03; Fig. 2b). Standardized loadings ranged from 0.35 (for headache/migraine) to 0.89 (for fatigue). Although the loading of headache/migraine met our pre-specified minimum of 0.30, it was substantially lower than the other traits (all > 0.65), suggesting that headache/migraine was not adequately represented by the common factor. To maintain a higher standard of representation and consistency, we excluded headache/migraine from the final model. After its exclusion, model fit improved significantly (\(\:\varDelta\:\chi\:\)2(3) = 12.31, p = 0.006; fit statistics: \(\:\chi\:\)2(2) = 9.30, p = 0.01, AIC = 25.30, CFI = 1, SRMR = 0.03).
GWAS of the somatoform factor identified 134 significantly associated loci (Neff = 799,429; Supplementary Table 1), of which 44 were not GWS in any of the input GWAS, and 8 had not been previously associated with somatoform traits (Fig. 2a). The most strongly associated locus (p = 6.80e-16) was outside the MHC region of chromosome 6 nearest to gene UHRF1BP1, which is involved in assisting or regulating epigenetic modifications.69 The lead SNP in this locus (rs9469907) functions as an eQTL for several genes, including C6orf106, SNRPC, and CLPS, in PsychENCODE and GTEx brain tissues.
In SNP-level PheWAS (Supplementary Table 2 and Supplementary Fig. 2), the lead SNPs from the 8 novel loci were associated primarily with physical health, immunological, and mental health measures (e.g., insomnia, C-reactive protein levels, body mass index, wellbeing, depression, and Type 2 diabetes). Based on the QSNP test, 8 loci exhibited heterogeneous effects across the somatoform traits (Supplementary Table 3). Of the lead SNPs from these loci, all but one showed the strongest association with pain intensity. The exception, rs10795616, was most strongly associated with health satisfaction (p = 2.21e-08).
Gene Mapping and Functional Annotation
MAGMA identified 874 genes based on position, eQTL, and chromatin interactions. One-third (33.98%; n = 297) of genes were mapped by more than one approach, and 11.21% (n = 98) were mapped by all three. eQTL and chromatin interaction plots are in Supplementary Fig. 3. Candidate SNPs showed an enrichment of intronic (p = 7.95e-317), intergenic (p = 7.43e-297), non-coding RNA intronic (p = 1.08e-8), 3' UTR (p = 7.87e-7), and 5' UTR (p = 2.62e-4) functional categories (Fig. 2c). Of candidate SNPs (n = 5,555), 30 had a CADD score > 20, and 255 (4.59%) had scores ≥ 12.37, which is suggestive of deleteriousness to gene function (Supplementary Table 4). Of the candidate SNPs, 59.17% (n = 3287) had RegulomeDB scores indicative of regulatory functions related to transcription factor binding and gene expression (i.e., 1a-1f; Supplementary Table 5).
Gene-Based Enrichment
MAGMA gene-set analyses showed enrichment for genes involved in negative regulation of synaptic transmission (b = 0.71, SE = 0.14, p = 3.24e-07). Gene-property analyses identified significant gene enrichment in 11 GTEx v8 brain tissues (Fig. 3a). The strongest associations were for the cerebellar hemisphere (b = 0.05, SE = 0.01, p = 1.43e-12), cerebellum (b = 0.05, SE = 0.01, p = 2.63e-12), and frontal cortex (b = 0.05, SE = 0.01, p = 1.97e-9). In addition, there was significant enrichment for gene expression in the pituitary gland (b = 0.05, SE = 0.01, p = 5.64e-6). Gene-property analyses using BrainSpan brain samples from 11 developmental stages showed enrichment for gene expression in the early mid-prenatal (b = 0.04, SE = 0.01, p = 0.001) and late mid-prenatal (b = 0.05, SE = 0.02, p = 0.002) stages (Supplementary Fig. 4).
To investigate cell-specific enrichment, we performed cell-type specificity analyses using 16 human brain scRNA-seq datasets and identified 7 cell-specific gene expression profiles associated with the somatoform factor after multiple testing correction (Fig. 3b). Of the seven, three (GABAergic neurons in the prefrontal cortex at gestational week 26, GABAergic neurons in the human midbrain, and inhibitory neurons from PsychENCODE adult brain samples) were independently associated with the somatoform factor, with the others jointly explained by their association with the independent cell types. In cross-dataset conditional analyses, GABAergic neurons at gestational week 26 and inhibitory adult neurons were significantly collinear, suggesting that the associations of these two cell types are driven by similar genetic signals.
SMR with Brain eQTLs and Blood Plasma pQTLs
We conducted summary-data-based Mendelian randomization (SMR) analyses to examine whether the associations of SNPs with somatoform traits were mediated by effects on gene expression in the brain and protein expression in blood plasma. We identified 28 genes whose expression levels in the brain exerted putatively causal effects on somatoform traits (Bonferroni-adjusted p < 0.05 and pSMR > 0.05; Fig. 4a). Among these were UHRF1BP1, which is known to interact with a key regulator of DNA methylation, and HLA-DRB1, part of the human leukocyte antigen (HLA) family of genes that is critical in initiating immune responses and implicated in many autoimmune conditions.
Using data from the UKB Pharma Proteomics Project (UKB-PPP), we identified 117 genes whose protein levels were significantly associated with somatoform traits, including several members of the CD300 family that are involved in modulating inflammatory responses (Fig. 4b). Additionally, genes involved in neural development and synaptic processes were significant, such as HS6ST1 and LRRN1. Using plasma proteomic data from deCODE, we identified 57 genes whose effects on protein abundance were putatively causally associated with somatoform traits (Fig. 4c). Across the UKB-PPP and deCODE analyses, 6 genes were consistently identified: two members of the CD300 family of genes (i.e., CD300A and CD300C), CLEC4G, HS6ST1, LRRN1, and RNASET2.
Transcriptome-Wide Association Analyses
To examine genes whose expression in enriched tissues was related to somatoform traits, we performed two TWAS using MetaXcan.44,45 In the first, we used S-MultiXcan to simultaneously examine gene expression profiles across the 12 tissues that showed significant enrichment in MAGMA analyses. After Bonferroni correction (p = 0.05/14,368 = 3.48e-6), we identified 158 genes with significantly altered expression levels associated with somatoform traits (Fig. 5a). In a second TWAS of brain tissues in psychiatric cases and controls,70 we identified 131 genes that showed significantly different (p \(\:\le\:\) 4.16e-6) expression levels (Fig. 5b). Across the two TWAS, 34 genes were consistently identified, including genes related to immune function (e.g., MST1R), oxidative stress response (e.g., GPX1), neural development (e.g., DPYSL5 and SORCS3), and neurotransmitter signaling (e.g., GRK4). Several showed enriched expression across brain tissues (e.g., DPYSL5 and GPX1; Supplementary Fig. 5). Across the two TWAS and the SMR analysis, five genes (CCDC144CP, PPP6C, SCAI, UHRF1BP1, USP32P3) were consistently implicated.
Polygenicity, Discoverability, and Polygenic Overlap with Psychopathology
We used MiXeR software to conduct univariate and bivariate causal mixture models to assess the polygenicity (i.e., the number of variants estimated to be needed to explain 90% SNP-heritability) and discoverability (i.e., the causal effect size variance) of somatoform traits, as well as their polygenic overlap with psychopathology spectra (Supplementary Fig. 6). MiXeR models estimated 11,321 causal SNPs for the somatoform factor (SD = 562.64), with an average discoverability of 7.50e-6 (SD = 3.55e-7). The somatoform factor and externalizing psychopathology were moderately genetically correlated (rg = 0.46, SD = 0.02), and shared an estimated 76% of their causal variants (SD = 0.18). Similarly, the somatoform factor and internalizing psychopathology were significantly genetically correlated (rg =0.63, SD = 0.01) and shared 79% of their causal variants (SD = 0.12). Estimates were also similar for the polygenic overlap between general psychopathology and the somatoform factor (rg = 0.69, SD = 0.01, Dice = 0.83, SD = 0.07).
Genetic Correlations
Genetic correlations were performed between the somatoform factor and 1,426 publicly available GWAS. After Bonferroni correction (p = 0.05/1,426 = 3.51e-5), the somatoform factor was significantly genetically correlated with 646 phenotypes (Supplementary Table 6 and Supplementary Fig. 7). Consistent with the somatoform factor representing PPS for which there is no identifiable medical cause, one of the top genetic correlations was with “symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified” (rg = 0.80, SE = 0.02, p = 1.69e-242). Of the significant associations, at least 70 (10.84%) were with pain-related conditions and medications. Many psychiatric traits were also significantly genetically correlated with the somatoform factor, including major depressive disorder (rg = 0.60, SE = 0.02, p = 3.29e-212), mood swings (rg = 0.60, SE = 0.02, p = 1.73e-166), and loneliness/isolation (rg = 0.62, SE = 0.02, p = 6.02e-136). Results identified significant genetic correlations with obesity-related phenotypes (e.g., waist circumference, body mass index, and whole-body fat mass), socioeconomic status (e.g., educational attainment, unemployment, and financial difficulties), and general health (e.g., having a longstanding illness, taking prescription medications, and lacking physical activity).
Lab- and Phenome-Wide Association Scans
BioVU and PMBB Meta-Analysis. After meta-analyzing effects across BioVU and PMBB, there were 229 significant associations with the somatoform PGS (Fig. 6). The top associations included obesity (beta = 0.18, SE = 0.01, p = 2.76e-76), tobacco use disorder (TUD; beta = 0.18, SE = 0.01, p = 9.69e-72), type 2 diabetes (beta = 0.17, SE = 0.01, p = 3.67e-69), and mood disorders (beta = 0.14, SE = 0.01, p = 3.00e-52). There were many significant associations with pain-related conditions, including nonspecific chest pain (beta = 0.12, SE = 0.01, p = 2.36e-38), unspecified muscle pain (beta = 0.18, SE = 0.02 p = 3.64e-29), abdominal pain (beta = 0.09, SE = 0.01, p = 2.18e-27), and chronic pain (beta = 0.12, SE = 0.01, p = 2.83e-22). Study-specific results from BioVU and PMBB are in Supplementary Tables 7 and 8.
BioVU LabWAS. After Bonferroni correction, LabWAS identified 40 significant associations of the somatoform factor with biomarkers (Supplementary Fig. 8), including elevated C reactive protein levels (CRP; beta = 0.03, SE = 0.005, p = 4.43e-15) and erythrocyte sedimentation rates (beta = 0.06, SE = 0.01, p = 1.92e-06). Other notable associations were with higher erythrocyte distribution widths (beta = 0.06, SE = 0.004, p = 6.66e-52) and white blood cell counts (beta = 0.05, SE = 0.004, p = 1.58e-37), and with lower levels of iron (beta = -0.04, SE = 0.01, p = 1.75e-6) and Vitamin D (beta = -0.06, SE = 0.01, p = 5e-19). Consistent with the link with diabetes in the PheWAS, the somatoform PGS was also related to higher hemoglobin A1c levels (beta = 0.03, SE = 0.005, p = 4.43e-15).
Yale-Penn. In the Yale-Penn sample, there were 229 significant PheWAS associations after Bonferroni correction (Fig. 6 and Supplementary Table 9). The sample is enriched for substance use disorders, and these were the most prevalent associations identified. However, other phenotypes like lower educational levels (beta = -0.17, SE = 0.01, p = 1.15e-38), poorer self-reported health rating (beta = -0.18, SE = 0.02, p = 1.29e-25), lower household income (beta = -0.32, SE = 0.04, p = 2.19e-17), and greater childhood adversity (beta = 0.28, SE = 0.04, p = 1.16e-12) were also significant in the deeply phenotyped sample.
Drug Repurposing
We used LINCS to match medication signatures to somatoform gene expression signatures. After Bonferroni correction, we identified 324 perturbagens (Supplementary Table 10) comprising a wide array of mechanisms and classes across both emerging (i.e., undergoing clinical trials; n = 113) and FDA-approved (n = 211) therapeutics. Several were of relevance to PPS, including analgesics (e.g., diclofenac, ibuprofen, and venlafaxine), antidiarrheals (e.g., loperamide), and antidepressants (e.g., bupropion). Of note, two identified drugs targeted the MAP2K1 gene and reversed the gene expression signature found in the PsychENCODE TWAS. Both drugs (pd-0325901 and selumetinib) are kinase inhibitors, with selumetinib having been approved to treat symptomatic plexiform neurofibromas in children with a genetic disorder that causes tumors to grow along nerves.71 Additionally, of the identified perturbagens, ten had gene targets that mapped to GWS SNPs (Supplementary Fig. 9). These included four dopamine receptor antagonists (i.e., nemonapride, melperone, benperidol, and carmoxirole) four kinase inhibitors (i.e., vemurafenib, dabrafenib, pf-562271, and sorafenib), an HDM2 antagonist (i.e., serdemetan), and a calcium channel blocker (i.e., nimodipine). After Bonferroni correction, the DRUGSETS analysis identified one significant association with Anatomical Therapeutic Chemical (ATC) code G04B (t = 3.89, p = 5.44e-5), which comprises urological drugs.
Potentially Causal Effects of the Gut Microbiome
Of the 418 taxa in the MiBioGen and Dutch Microbiome Projects, 326 (130 from MiBioGen and 196 from the Dutch Microbiome Project) had genetic variants associated with their abundance at p < 1e-5 and were included in MR analyses. Of these, 23 (10 from MiBioGen and 13 from the Dutch Microbiome Project) exhibited putatively causal effects on the somatoform factor prior to multiple testing correction, but only four remained significant after Benjamini-Hochberg false discovery rate (FDR) correction (all from the Dutch Microbiome Project; Supplementary Fig. 10). One significant result was based on a single SNP so was not interpreted. At the species level, Adlercreutzia equolifaciens (beta = -0.04, SE = 0.01, p = 5.13e-5) and Ruminococcus bromii (beta = -0.05, SE = 0.01, p = 6.65e-4) exhibited putatively causal protective effects on the somatoform factor. Similarly, the genus Adlercreutzia (beta = -0.04, SE = 0.01, p = 5.14e-5) was also significant and potentially protective. In the MiBioGen cohort, the genus Adlercreutzia was not putatively causally associated with the somatoform factor. However, analyses in that cohort showed a consistent direction of effect. In the MiBioGen cohort, the putative effects of Ruminococcus on the somatoform factor were in a matching direction as well, and the association with the genus was significant prior to FDR correction (beta = -0.01, SE = 0.006, p = 0.04). Therefore, findings in MiBioGen provide relatively modest replication of our results.
Steiger directionality tests supported the proposed causal direction of microbiota on the somatoform factor (all ps < 2.05e-5). We used the Egger intercept to evaluate confounding by pleiotropy, but due insufficient instruments, we were only able to evaluate this for Ruminococcus bromii (Egger intercept = -0.02, SE = 0.05, p = 0.76). There was no evidence for heterogeneity in the effects of instruments for Adlercreutzia equolifaciens (Q(1) = 0.02, p = 0.88), Ruminococcus bromii (Q(2) = 3.65, p = 0.16), or Adlercreutzia (Q(1) = 0.03, p = 0.86).