Genome-wide association study reveals novel genetic markers associated with endurance athlete status


Abstract
Background: The genetic predisposition to elite athletic performance has been a controversial subject owing to underpowered studies and the small effect sizes of identified genetic variants. The aims of this study were to investigate the association of common single-nucleotide polymorphisms (SNPs) with endurance athlete status in a large cohort of elite athletes using a GWAS approach, followed by functional validation of significant SNPs by metabolomics analysis. Results: The association between 476,728 SNPs of the Illumina DrugCore Gene chip and endurance athlete status was investigated in 753 European international-level athletes (594 males, 159 females) by comparing allelic frequencies between athletes specialized in sports with a high (n=630) and a low/moderate (n=123) aerobic component. Validation of results was performed by comparing the frequencies of the most significant SNPs between 176 elite Russian endurance athletes and 173 Russian controls or 43 sprinters. Two novel SNPs showed significant associations with endurance athlete status, at the Bonferroni level of significance (rs56330321 in ATP2B2, p=1.46E-7) and the FDR level of significance (rs2635438 in SYNE1, p=2.54E-7), respectively.
A replication study using Russian cohorts and a subsequent meta-analysis confirmed the association of the rs56330321 and rs2635438 SNPs with endurance athlete status at genome-wide significance (p=5.13E-09 and 1.91E-08, respectively). Metabolomics analysis revealed several amino acids and lipids associated with the identified SNPs, with potential roles in performance enhancement. Conclusions: This is the first report of GWAS-significant SNPs and related metabolites associated with elite athlete status. Further investigations of the functional relevance of the identified SNPs and metabolites in relation to enhanced athletic performance are warranted.

Background
Elite athletic performance is a multi-factorial trait with input from both genetic and environmental factors. The superior performance of elite athletes has historically been considered the outcome of a special talent shaped by intensive training. This talent is now believed to be a product of additive genetic components that predispose the athlete to endurance, speed, strength, flexibility and coordination trainability under the control of strong environmental cues, including exercise and nutrition. In this model, genetic predisposition, together with the ability to respond to training, is the key to the superior physical performance of elite athletes (1).
Sports can be classified according to the type and intensity of the exercise required during competition. The percentage of maximal oxygen uptake (VO2max) is a determining factor in the categorization of endurance sports, as it reflects the maximal cardiac output, the oxygen transport capacity, and the blood volume (2). Accordingly, sports can be divided into events with a low, moderate or high aerobic (dynamic) component (3). Similarly, the percentage of maximal voluntary contraction (MVC), which reflects the greatest amount of tension a muscle can generate and hold, is used to classify sports into disciplines with a low, moderate or high power component (3).
Classical twin and family genetic studies have suggested that VO2max is up to 94% heritable (4,5). Genome-wide association studies (GWAS) in athletes versus non-athletes have uncovered many new loci associated with VO2max (6,7) and elite endurance performance (8). A more recent review of genetic predisposition to elite athletic endurance highlighted 93 endurance variants (9). However, research into the genetics of athletic performance has been hindered by small sample sizes and a complex phenotype (10). One of the first GWAS in athletes, using 143 K single-nucleotide polymorphisms (SNPs) and a subsequent meta-analysis of 45 promising genetic markers in 1520 endurance athletes and 2760 controls, revealed only one statistically significant marker (rs558129 at GALNTL6) associated with endurance status in world-class athletes, although the association did not reach the genome-wide level of significance (11). Therefore, the genetic predisposition to endurance traits remains unclear, largely owing to relatively underpowered cohorts of elite athletes.
Metabolomics analysis has presented a novel tool to validate genomics data by providing an intermediate phenotype (metabolites) in association with the identified genetic variants (12,13). Pilot metabolomics studies have revealed differences in the metabolic signature of moderate and high endurance elite athletes, such as steroid biosynthesis, fatty acid metabolism, oxidative stress and energy-related molecular pathways (14,15).
In this study, we aimed to investigate the association between multiple SNPs and endurance athlete status in a relatively large cohort of European elite athletes specialized in sports with a high or a low/moderate aerobic component using a GWAS approach, and to replicate our findings in elite Russian athletes and matched controls. We also aimed to perform functional validation using metabolomics analysis by identifying metabolites that are associated with significant endurance-related SNPs.

Methods
This study aimed to investigate the genetic predisposition to elite athletic endurance by conducting the largest GWAS in elite athletes to date, followed by functional validation through a metabolomics study to shed light on the mechanisms underlying the genetic associations.

Discovery study
Seven hundred and fifty-three consenting European international-level athletes (594 males, 159 females) from different sports disciplines, who participated in national or international sports events and tested negative for doping substances at anti-doping laboratories in Qatar (ADLQ) and Italy (FMSI), were included in this study. No other information about the participants was available because of the strict anonymization process undertaken by the anti-doping laboratories. This study was performed in line with the World Medical Association Declaration of Helsinki - Ethical Principles for Medical Research Involving Human Subjects. All protocols were approved by the Institutional Research Board of ADLQ (F2014000009). Athletes were dichotomized into groups with different aerobic (dynamic) and power (static) components (Table 1) based on their sport types as described previously (3). Table 1 also lists, for each analysis, the number of participants per sport type in each class/group and their genders.

Replication study
The Russian athletes' study involved 219 athletes (95 females, age 21.9 (3.5) years, 124
Subsequently, the re-suspended pellet was loaded onto the beadchip and incubated overnight at 48 °C in a hybridization oven. On the third day, the beadchips underwent enzymatic base extension and fluorescent staining. Lastly, after coating, the beadchips were imaged using iScan.

Replication study
Molecular genetic analysis in Russian cohorts was performed with DNA samples obtained from leukocytes (venous blood). Four ml of venous blood were collected in tubes containing EDTA (Vacuette EDTA tubes, Greiner Bio-One, Austria). Blood samples were transported to the laboratory at 4°C and DNA was extracted on the same day. DNA extraction and purification were performed using a commercial kit according to the

Data Extraction and SNP Identification
Raw data were extracted, peak-identified and QC processed using Illumina iScan hardware and software. These systems are built on a web-service platform utilizing Microsoft's .NET technologies, which run on high-performance application servers and fiber-channel storage arrays in clusters to provide active failover and load-balancing.

Metabolomics
Screening of serum metabolites was performed in 490 elite athletes (Table S1) using protocols established at Metabolon, Durham, NC, USA. The platform utilizes Waters ACQUITY ultra-performance liquid chromatography (UPLC) and a Thermo Scientific Q-Exactive high resolution/accurate mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and Orbitrap mass analyzer operated at 35,000 mass resolution. Detailed protocol and QC measures were previously published (14,42).

Statistical analysis
Following genotyping using Illumina's Drug Core SNP array, analysis was performed using Plink v1.9. Quality control measures were applied to the genotype data set to exclude samples with low genotype call rate or excess heterozygosity. Accordingly, SNPs with a genotype call rate < 98%, minor allele frequency < 1%, or deviating from Hardy-Weinberg equilibrium (P < 10E-6) were excluded. After filtering the data with the above criteria,
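The SNP-level filters listed above (call rate, minor allele frequency, Hardy-Weinberg equilibrium) can be illustrated with a minimal Python sketch. This is not the PLINK implementation: the function names and the 0/1/2 genotype coding are hypothetical, a 1-df chi-square test stands in for PLINK's Hardy-Weinberg test, and the Hardy-Weinberg cutoff of 1e-6 is an assumed reading of the threshold written in the text.

```python
from math import erfc, sqrt

def called(genotypes):
    """Drop missing calls (None); genotypes are coded 0, 1 or 2 copies of one allele."""
    return [g for g in genotypes if g is not None]

def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    return len(called(genotypes)) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes."""
    calls = called(genotypes)
    freq = sum(calls) / (2 * len(calls))
    return min(freq, 1 - freq)

def hwe_p_value(genotypes):
    """Chi-square (1 df) test of Hardy-Weinberg proportions (illustrative stand-in)."""
    calls = called(genotypes)
    n = len(calls)
    p = sum(calls) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [calls.count(0), calls.count(1), calls.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

def passes_qc(genotypes, min_call_rate=0.98, min_maf=0.01, hwe_cutoff=1e-6):
    """Apply the three SNP filters; hwe_cutoff is an assumed value."""
    if call_rate(genotypes) < min_call_rate:
        return False
    if minor_allele_frequency(genotypes) < min_maf:
        return False
    return hwe_p_value(genotypes) >= hwe_cutoff
```

A SNP is kept only if it clears all three filters; the call-rate check runs first so that the later statistics are never computed on a SNP with no called genotypes.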

Genome-Wide Association study
Athletes from the discovery cohort were classified into different groups of sports following previously published sports classification criteria (3), as shown in Table 1.
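The comparison of allelic frequencies between two athlete groups can be sketched as a chi-square test on a 2x2 table of allele counts. The sketch below is an illustration only, not the PLINK routine used in the study; the function name and the allele counts in the usage lines are hypothetical. Only the group sizes (630 and 123 athletes, hence 1260 and 246 alleles) come from the text.

```python
from math import erfc, sqrt

def allelic_association(group1_a, group1_b, group2_a, group2_b):
    """Chi-square (1 df) test on allele counts from two groups.

    group1_a / group1_b: counts of allele A / allele B in the first group
    group2_a / group2_b: counts of allele A / allele B in the second group
    Returns (odds_ratio, chi2, p_value). Assumes no zero margins or cells.
    """
    n = group1_a + group1_b + group2_a + group2_b
    cross = group1_a * group2_b - group1_b * group2_a
    margins = ((group1_a + group1_b) * (group2_a + group2_b)
               * (group1_a + group2_a) * (group1_b + group2_b))
    chi2 = n * cross ** 2 / margins
    odds_ratio = (group1_a * group2_b) / (group1_b * group2_a)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return odds_ratio, chi2, p_value

# Hypothetical allele counts: 630 high-aerobic athletes (1260 alleles)
# versus 123 low/moderate-aerobic athletes (246 alleles).
odds_ratio, chi2, p_value = allelic_association(126, 1134, 62, 184)
```

An odds ratio below 1 in this layout means allele A is less frequent in the first group than in the second.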
The PCA of the genotyping data revealed no influence of sport disciplines (Figure 1A) [...] reaching FDR level of significance of 5% (p=2.54E-7). Table 2

Functional metabolic validation of GWAS significant SNPs
To validate the potential functionality of the identified GWAS SNPs, metabolomics of 750 metabolites was carried out in a subset of the discovery cohort (n=490), and enriched metabolic pathways associated with rs56330321 and rs2635438 were determined (Table 3). Among the metabolic pathways associated with rs56330321, ceramide, fatty acid (acyl carnitine), polyamine and creatine metabolites were significantly altered by rs56330321 genotype (Table 3, Figure 3). In contrast, gamma-glutamyl amino acid and glutamate metabolic pathways were significantly changed with rs2635438 genotype (Table 3, Figure 4).
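One common way to relate a metabolite to a SNP is to regress the metabolite level on the number of copies of the allele of interest. The statistical model behind Table 3 is not described in this section, so the following Python sketch is an assumption made for illustration: the function name is hypothetical, the model is an unadjusted additive (per-allele) least-squares fit, and the data in the usage lines are invented.

```python
def per_allele_effect(dosages, levels):
    """Least-squares fit of metabolite level on allele dosage (0, 1 or 2).

    Returns (slope, intercept); the slope is the estimated change in
    metabolite level per additional copy of the allele. Assumes at least
    two distinct dosage values are present.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: six athletes, two per genotype class.
slope, intercept = per_allele_effect([0, 1, 2, 0, 1, 2],
                                     [1.0, 1.5, 2.0, 1.0, 1.5, 2.0])
```

A positive slope corresponds to higher metabolite levels in carriers of the counted allele; a real analysis would also need covariate adjustment and multiple-testing correction across metabolites.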

Discussion
Genetic predisposition to cardiorespiratory fitness and response to exercise training has been previously described (5,16-20). Since endurance sports are characterized by increased cardiorespiratory capacity, elite endurance performance is also expected to be genetically influenced (21). However, genetic studies of elite athletic endurance have shown inconsistent results (10,21-23). The aim of this study was to carry out the largest GWAS of elite European athletes to date using a unique SNP microarray that is enriched with genes involved in different metabolic pathways with direct influence on various physiological pathways characteristic of elite athletes. The GWAS revealed two novel SNPs (rs56330321 and rs2635438) associated with endurance at the Bonferroni and FDR levels of significance, respectively. Validation of the results in an independent cohort of elite Russian athletes and controls confirmed the association of rs56330321 and rs2635438 with endurance athlete status. Subsequent meta-analysis of the two cohorts showed for the first time that both SNPs were associated with endurance athlete status at the genome-wide level of significance.
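The two multiple-testing criteria named above differ in how strict they are. As a minimal Python sketch (not the study's own code), the Bonferroni threshold divides the significance level by the number of tests, whereas the Benjamini-Hochberg step-up procedure controls the false discovery rate. The function names are hypothetical, and the number of SNPs tested after quality control is not stated in this section, so `n_tests` and the p-values in the usage lines are placeholders.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that controls the family-wise error rate."""
    return alpha / n_tests

def benjamini_hochberg(p_values, q=0.05):
    """Indices of p-values declared significant at FDR level q (step-up rule)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under its own threshold.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Placeholder p-values for ten hypothetical tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p <= threshold]
fdr_hits = benjamini_hochberg(p_values, q=0.05)
```

In this toy example only the first test clears the Bonferroni threshold, while the first two are retained under the FDR criterion, which mirrors how one SNP can reach Bonferroni significance and another only FDR significance.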
The two novel SNPs (rs56330321 and rs2635438) are located within the genes ATP2B2 and SYNE1, respectively. Although these two genes have not previously been implicated directly in physical performance, their potential roles in cell signaling and the cytoskeletal structure of skeletal muscle cells have been established (24,25). The functional relevance of these SNPs in relation to enhanced athletic performance was investigated using metabolomics analysis. The metabolic pathways associated with the two significant SNPs included various ceramides, fatty acid acyl carnitines, polyamines, creatine/creatinine, gamma-glutamyl amino acids and glutamate metabolites. The functional relevance of these associations remains to be validated.
The top GWAS-significant SNP (rs56330321) is located within an intron of ATP2B2. This gene codes for the plasma membrane Ca2+ ATPase 2 (PMCA2), which belongs to the P-type primary ion transport ATPases. These enzymes can remove bivalent calcium ions from the cell against high gradients, playing a pivotal role in intracellular calcium homeostasis (24). PMCA2 is mainly expressed in the inner ear, the cerebellum and the mammary gland, with an established role in hearing and balance in mice (26) and humans (27). The expression of different isoforms and splice variants is highly regulated according to the physiological demand of the cell (28). The association between PMCA2 and physical performance has not been previously described. The under-representation of the rs56330321 A allele in athletes specialized in sports with a high aerobic component in the discovery and replication cohorts, compared to other athletes and controls, may suggest that carrying the A allele is disadvantageous for endurance athletes.
The rs56330321 A allele is associated with higher levels of several ceramides, fatty acid acyl carnitines, polyamines (except for acisoga, N-(3-acetamidopropyl)pyrrolidin-2-one) and creatine/creatinine (Figure 3). Ceramides tend to accumulate in skeletal muscles and promote insulin resistance. Chronic endurance exercise lowers muscle ceramides and promotes insulin sensitivity in exercising muscle (29). Since the A allele is associated with higher ceramide levels, it could compromise the beneficial effect of exercise on insulin sensitivity in carriers (30). The A allele is also associated with higher levels of fatty acid acyl carnitines, a hallmark of active fatty acid oxidation. During endurance exercise, fatty acid oxidation increases, sparing glycogen and delaying muscle fatigue (31). Despite the beneficial effect of fatty acid oxidation in endurance athletes, the elevated fatty acid acyl carnitines in A allele carriers may represent a compensatory mechanism to counteract ceramide-induced impairment of fatty acid oxidation (32). The A allele was also associated with higher polyamine accumulation, except for acisoga. The increase in polyamine concentration in skeletal muscle after physical exercise reflects oxidative processes related to muscle adaptation to exercise (33). The elevated polyamines in A allele carriers may therefore reflect higher oxidative mechanisms, as also suggested by the increased acyl carnitines, in response to endurance exercise. The elevated creatine/creatinine levels in A allele carriers may suggest poorer renal function compared to GG individuals, perhaps contributing to their lower prevalence among high-endurance athletes (34). The direct link between rs56330321 and the levels of these metabolites is yet to be determined.
The second GWAS significant SNP (rs2635438) is located within the intron of SYNE1. This gene encodes a spectrin repeat containing protein expressed in skeletal and smooth muscle, and peripheral blood lymphocytes, that localizes to the nuclear membrane.
Mutations in this gene have been associated with autosomal recessive spinocerebellar ataxia 8, Emery-Dreifuss muscular dystrophy type 4, dominant muscular dystrophy and Emery-Dreifuss muscular dystrophy-like (35-40). Both the discovery and replication cohorts showed that the G allele is under-represented in endurance athletes compared to other athletes, suggesting that carriers of the G allele may have lower endurance ability, perhaps through the replacement of healthy muscle tissue by fibrosis and fatty infiltration described in recessive arthrogryposis families carrying mutations in the SYNE1 gene (40). The G allele is associated with lower gamma-glutamyl amino acids and glutamine but higher glutamate metabolism. Glutamine has various ergogenic benefits, including increased muscle strength and better recovery (41). The lower levels of glutamine in G allele carriers may partially explain their lower prevalence among endurance athletes. The direct link between rs2635438 and the levels of these and other metabolites remains to be determined.

Conclusions
This study reports the first GWAS significant SNPs associated with endurance athlete status in genes with no previous association with physical performance. This study also shows levels of metabolites associated with these SNPs and suggests potential role in

Consent for publication: Not applicable.
Availability of data and materials: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests:
The authors declare no competing interests.

Tables
Table 1. Classification of GWAS participants according to sports classes. Distribution of elite athletes in various categories based on sport type-associated peak dynamic (maximal oxygen uptake percentage; VO2max) and peak static (maximal voluntary muscle contraction percentage; MVC) components achieved during competition as described previously (3).