Sex-stratified GWAS datasets
The East Asian sex-stratified GWAS summary statistics datasets of T2D and PAD were obtained from the BioBank Japan (BBJ) and the Asian Genetic Epidemiology Network (AGEN). BBJ (http://jenger.riken.jp/en/)54 is a database that collects disease samples and GWAS data from > 200,000 registered Japanese individuals (male: female = 53.10%: 46.90%; average baseline age at 62.70 and 61.5 for men and women). The sex-stratified BBJ-T2D GWAS55 (Ncase = 40,250 [male = 25,705, female = 14,545], Ncontrol = 170,615 [male = 82,774, female = 87841]) and BBJ-PAD GWAS56 (Ncase = 3,593 [male = 2,784, female = 809], Ncontrol = 208,860 [male = 106,563, female = 102,297]) were both derived from a linear mixed model via SAIGE57, adjusted by age and top five genetic principal components. In addition, we downloaded sex-stratified GWAS summary statistics of T2D from the AGEN (https://blog.nus.edu.sg/agen/summary-statistics/t2d-2020/). The AGEN-T2D GWAS1 contains 22 East Asian population-based cohorts (Ncase = 55,397 [male = 28,027, female = 27,370], Ncontrol = 162,425 [male = 89,312, female = 135,055]) and was generated by a fixed-effect inverse-variance-weighted meta-analysis using METAL58. Considering the sample overlap between AGEN-T2D and BBJ-PAD (~ 14%) is less than that between BBJ-T2D and BBJ-PAD, in cross-trait analysis (e.g., LDSC, LAVA, and MR), we used the analysis between AGEN-T2D and BBJ-PAD as the main result and the ones between BBJ-T2D and BBJ-PAD as a sensitivity analysis.
The European sex-stratified GWAS summary statistics datasets of T2D and PAD were generated with our in-house UK Biobank (UKB) data (application number: 65805). UKB involves > 500,000 individuals of European ancestry from the UKB cohort (male: female = 45.60%: 54.40%; average baseline age at 56.74 and 56.35 for men and women). Quality control was conducted by removing: (1) The sample with non-British ethnicity and a missing rate > 0.05. (2) SNPs with call rate < 0.05, minor allele frequency (MAF) < 0.01, imputation INFO score < 0.8, Hardy-Wein-berg test P < 1× 10− 6. Finally, a set of ~ 425,000 individuals and 8.8M SNPs were included in this study. Cases of a certain disease were determined with the code of the International Classification of Diseases 10th (ICD-10) (UKB Field ID: 41270). An individual with an ICD-10 code of non-insulin-dependent diabetes mellitus (E11) or diseases of arteries, arterioles and capillaries (I70-I79) was identified as a T2D or a PAD patient1, and the corresponding control is the individual without T2D or PAD. Then, we used BOLT-LMM57 to generated sex-stratified UKB-T2D GWAS (Ncase = 22,813 [male = 14,059, female = 8,754], Ncontrol = 402,585 [male = 181,145, female = 221,440]) and UKB-PAD GWAS (Ncase = 6,294 [male = 3,712, female = 2,582], Ncontrol = 419,104 [male = 191,492, female = 227,612]) after adjusting age and cryptic relatedness. We converted the SNP effects (\(\beta\)) and their associated standard errors (\(SE\)) to a quantitative scale (\(\beta {\prime }\) and \(SE{\prime }\)) using the approximate approach \(\beta {\prime }\left(\text{o}\text{r} SE{\prime }\right)= \frac{\beta \left(\text{o}\text{r} SE\right)}{\mu (1-\mu )}\), where µ is the proportion of cases59.
We also generated the sex-stratified reference genome (including 23 chromosomes) of East Asians and Europeans. The East Asian reference genome was generated with 491 Chinese males and 831 Chinese females. The European reference genome was generated with 10,000 British males and 10,000 British females randomly sampled from the whole British population. All reference genome were hg19-based, and SNPs with MAF < 0.01 or located within the major histocompatibility complex (MHC) region (chromosome 6: 28,477,797 − 33,448,354)60 was excluded.
Global and local Genetic correlation analysis
We used LDSC to investigate the genetic correlation between T2D and PAD in males and females across the genome. The pre-computed LD scores of two populations were drawn from 481 East Asians and 489 Europeans from the 1000 Genomes and are available at the LDSC website (https://alkesgroup.broadinstitute.org/LDSCORE/). Then, we produced the sex-stratified LD score in the X-chromosome through the in-house reference. Before applying LDSC analysis, the SNPs to input were filtered with HapMap3 data31 (for autosomes) and the reference genome (for X-chromosome) for the purpose of quality control. The SNPs which were strand-ambiguous (i.e., the A/T and G/C SNPs) were excluded.
Single trait LDSC was conducted to estimate the sex-specific liability-scale heritability (h2) of T2D and PAD on global levels. The population prevalence of T2D (male: 8.6%61, female: 6.8%61) and PAD (male: 6.3%62, female: 7.0%62) in East Asians were set according to previous studies conducted by Wu et al.61 and Wang et al.62. Similarly, the corresponding prevalence of T2D(male: 3.8%63, female: 3.1%63) and PAD (male: 6.4%64, female: 5.1%64) in Europeans were collected based upon existing research reported by The Information Centre63 and Kroger et al.64.
Then, we implemented sex-stratified cross-trait LDSC to calculate the genetic correlation (rg) between T2D and PAD with unconstrained intercept26,65. For genetic correlation (rg), a P < 0.05/2 = 0.025 was considered significant. To quantify the sex difference in rg, the Z statistic was constructed as follows: \(Z=\frac{{Z}_{male}^{{\prime }}-{Z}_{female}^{{\prime }}}{\sqrt{s}}\), where \({Z}^{{\prime }}\) is converted from rg using the Fisher's Z-transformation66 (\({Z}^{{\prime }}=0.5\text{l}\text{n}\left(\frac{1+{r}_{g}}{1-{r}_{g}}\right)\)), \(s\) is defined as the standard error for the difference66(\(s=\sqrt{\frac{1}{{n}_{male}-3}+\frac{1}{{n}_{female}-3}}\)), and \(n\) refers to the sample size of rg66 (\(n=\frac{1-{r}_{g}}{{SE}_{rg}^{2}}+2\)). Differences were deemed significant if the P of the Z statistics was less than 0.05. The Z statistic of h2 was also established.
Local Analysis of [co]Variant Annotation (LAVA)27 was performed to estimate genetic correlations on particular genomic regions. It accounts for correlated SNPs due to LD by converting marginal SNP effects into their combined effects, based on an external LD reference. The inputted GWAS data and reference genome data are the same as LDSC analysis. For locus definition, we download the Europeans’ example locus file (based on 1000g data phase 3, build GRCh37/hg19) provided by LAVA’s support data(https://github.com/josefin-werme/LAVA/tree/main/support_data)67, which includes 2,495 semi-independent blocks of ~ 1 Mb. And we used approximately LD-independent breakpoint dataset of East Asians provided by Berisa et al68, which partitions the genome into 1445 blocks of ~ 1.8 Mb (based on 1000g data phase 1, build GRCh37/hg19). Using LAVA, we detected the univariate local genetic signal within each trait (i.e., the local h2). Loci that have significantly estimated local h2 (defined with a default threshold of P < 0.05 in univariate test) in both T2D and PAD were then used for bivariate tests (two-sided). P of the local rg were corrected (i.e., Padjust = 0.05/ [number of bivariate tests]) and the threshold was set to 0.05.
Z statistic was constructed to test the sex difference of local rg on particular region using the same method mentioned above. The lower and upper bound of the 95% CI of rg were transferred into the standard error of rg using the following equation: \({SE}_{rg}=\frac{{{r}_{g}}_{upper}-{{r}_{g}}_{lower}}{2*1.96}\). The threshold of significant sex difference was defined by P < 0.05.
The mendelian randomization analysis
For each sex, we performed Mendelian Randomization (MR)28 methods to infer the causal relationship between T2D and PAD. MR28 is a technique that can be used to determine the causal effects of a risk factor (i.e., exposure) on a trait (i.e., outcome). Six models were used in our analysis to lower the weakness or intrinsic biases in a single method, including Inverse-variance-weighted (IVW)69, MR-Egger70, generalized summary-data-based Mendelian Randomization (GSMR)71, weighed median72, weighted mode73, and the causal analysis using summary effect estimates (CAUSE)74.
Of these, five methods (IVW, MR-Egger, GSMR, weighted median, and weight mode) selected instrumental SNPs survived from LD clump with GWAS P < 5×10− 8 or 1×10− 5 when T2D or PAD served as “exposure” since no significant SNP remains in PAD GWAS at the P threshold of 5×10− 8. The LD clump was performed with LD \({r}^{2}\)<0.05 within a 1000-kb window. CAUSE selected more instrumental SNPs with GWAS P < 1×10− 3 and with LD \({r}^{2}\) < 0.1. To evaluate pleiotropy, we used MR-Egger intercept (indicates the uncorrelated pleiotropy), CAUSE q (indicates the uncorrelated pleiotropy), CAUSE η (indicates the correlated pleiotropy) and ELPD P (indicates the power of causal model). Small MR-Egger intercept, CAUSE q, CAUSE η and significant ELPD P are ideal. The MR analysis mentioned above can be performed using R packages “TwoSampleMR”, “gsmr” and “cause”. If significant pleiotropy is detected (P < 0.05), outlier tests would be performed to filter SNPs and check the robustness of the estimates via R package “MR-PRESSO”. A value of Bonferroni-corrected level MR P (i.e., <0.05/12≈4.17×10−3, adjusted by six bi-directional tests) was regarded as a significant causal relationship.
Then, we converted the logit-scale causal effects (i.e., \({\beta }_{MR}\)) estimated by MR models to liability-scale based on the method proposed by Byrne et al75:
$${{\beta }_{\text{M}\text{R}}}_{\text{l}\text{i}\text{a}\text{b}\text{i}\text{l}\text{i}\text{t}\text{y}}= \frac{{Z}_{{K}_{\text{x}}}{K}_{\text{y}}(1-{K}_{y})}{{Z}_{{K}_{\text{y}}}{K}_{\text{x}}(1-{K}_{\text{x}})}{{\beta }_{\text{M}\text{R}}}_{\text{l}\text{o}\text{g}\text{i}\text{t}}$$
,
where \({K}_{\text{x}}\) and \({K}_{\text{y}}\) refer to the population prevalence of exposure and outcome, and \({Z}_{{K}_{\text{x}}}\) and \({Z}_{{K}_{\text{y}}}\)represent the values of the standard normal distribution at the corresponding prevalence. Then, we transformed the liability-scale \(\beta\) to odds ratios (OR). The sex difference of causal effect between two sexes estimated with liability \({\beta }_{MR}\) was evaluated by paired t-test. Significance was set at a P less than 0.05.
Functional gene analysis
We explored the mechanism underlying the genetic relationship between T2D and PAD by investigating the functional gene of single trait T2D, PAD, and their cross-trait meta-analysis. Cross-trait GWAS of T2D and PAD were generated with inverse-variance-weighted meta-analysis via Multi-Trait Analysis of Genome-wide association summary statistics (MTAG)76, for investigating sex-specific risk genes shared by T2D and PAD. MTAG uses GWAS summary statistics from multiple traits and can boost statistical power when the traits are genetically correlated. Novel SNPs were ascertained with criteria of genome-wide significant (MTAG GWAS P\(<\)5\(\times\)10−8) associations with the cross-trait shared architecture of T2D and PAD but were not included in the single trait T2D or PAD.
Novel functional genes were annotated if they exhibited significant associations with the cross-trait shared architecture of T2D and PAD using MAGMA. We used 19,427 protein-coding genes (NCBI 37.3) for annotation, where SNPs were mapped to a gene if they were located within its transcription region. The 1000 Genomes Project European- and East Asian-based LD reference panels were utilized to correct LD structure. The sex-stratified gene-based analysis was performed with the SNP-wise Mean model, using GWAS summary SNP P. The significance thresholds of candidate genes were set at MAGMA Padjust <0.05. The number of significant genes overlapped in two sexes was then used for chi-square test to find whether overlapped genes are more than expected by chance. To investigate the function of candidate genes identified by MAGMA and shared by two sexes, we used R package clusterProfiler77 (https://guangchuangyu.github.io/software/clusterProfiler) to conduct GO analysis (Biological Processes).
Summary data-based Mendelian randomization analysis (SMR)31 was applied to capture sex-differential causal genes from single trait (T2D or PAD) and multi-trait GWAS. SMR uses the top SNPs in cis-eQTL (GWAS P < 5×10− 8) as instrumental variables to detect potential functional genes. Integrating GWAS summary data and eQTL information, it can enhance the power to search for significant gene-trait associations78. In our study, we accessed the public blood-based cis-eQTL summary data79, which contains 19,250 probes and covered > 30,000 Europeans, from the eQTLGen Consortium (https://eqtlgen.org/cis-eqtls.html). Then, SMR analysis was applied to each sex-stratified single trait GWAS, and the cross-trait GWAS of T2D and PAD generated with MTAG76. To correct the unexpected associations confounded by linkage (e.g., causal SNPs are in LD with each other, which alter the gene expression and the trait, respectively), we conducted HEIDI test31 in each SMR estimate. Functional genes were determined with MAGMA-significant genes with SMR FDR < 0.05/ [number of MAGMA-significant genes] and HEIDI P > 0.05 from at least 10 SNPs.