Disentangling the Genetics of Sarcopenia: prioritization of NUDT3 and KLF5 as genes for lean mass and HLA-DQB1-AS1 for hand grip strength based on associated SNPs

doi:10.21203/rs.2.16139/v2


Research article


This work is licensed under a CC BY 4.0 License

Journal Publication

published 24 Feb, 2020

Published version: BMC Medical Genetics


Background: Sarcopenia is a clinically important skeletal muscle disease that occurs commonly in old age and in various disease sub-categories. Broadening our knowledge of the genetics of muscle mass and strength is important because it may allow us to identify, on the basis of genetic markers, patients at increased risk of developing a specific musculoskeletal disease or condition such as sarcopenia. We used bioinformatics tools to identify gene loci that regulate muscle strength and lean mass, which can then be targeted for downstream experimental validation in the lab. Single nucleotide polymorphisms (SNPs) associated with muscle traits and with specific genes were selected according to their phenotype-association p-values, as is traditionally done in genome-wide association studies (GWAS). We developed and applied a combination of expression quantitative trait loci (eQTL) and GWAS summary information to prioritize causative SNPs and to identify the genes specifically associated with the tissue of interest (muscle).

Results: We identified NUDT3 and KLF5 as candidate genes for lean mass and HLA-DQB1-AS1 as a candidate gene for hand grip strength. The associated regulatory SNPs are rs464553, rs1028883 and rs3129753, respectively.

Conclusion: Transcriptome-wide association study (TWAS) approaches, which combine GWAS and eQTL summary statistics, proved helpful for statistically prioritizing genes and their associated SNPs for the disease phenotype under study, in this case sarcopenia. Potentially regulatory SNPs associated with these genes can then be verified in the wet lab, depending on the phenotype they are hypothesized to affect.

Medical Genetics

Keywords: Genetics of Sarcopenia; prioritization of NUDT3 and KLF5; genes for lean mass and HLA-DQB1-AS1

Background

Many diseases originate from more than one genetic locus. Sarcopenia, for example, is a multifactorial(1) degenerative loss of skeletal muscle mass, a condition that may pose a great risk for the aging world population. Since 2006, GWAS have allowed us to trace the multiple genetic factors underlying various traits using statistical tools, enabling more focused research on specific loci of interest(2). The data produced by these studies, which now number in the thousands, are available online, so that further downstream research can be conducted and new results incorporated. This is valuable because musculoskeletal diseases are among the leading causes of disability in the world, and their treatment costs the world medical industry around 125 billion dollars annually(3).

In this paper, we combine summary-level data from GWAS with publicly available eQTL data, such as those from GTEx(4) and Westra et al.(5). Based on the available data and our approach of combining phenotype-associated SNPs (single nucleotide polymorphisms) with tissue-relevant gene-associated SNPs, TADs (topologically associated domains) were plotted at the regions of interest. A TAD is a self-interacting genomic region, meaning that DNA sequences within a TAD physically interact with each other more frequently than with sequences outside it(6).
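To make the idea of placing a SNP or gene relative to TADs concrete, the following Python sketch (ours, not the authors' pipeline) looks up which TAD contains a genomic position and how far that position lies from the nearest TAD boundary. It assumes non-overlapping TADs given as half-open [start, end) intervals; the helper name `locate_in_tads` and the coordinates in the usage example are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (chromosome, start, end): one TAD as a half-open interval [start, end)
Tad = Tuple[str, int, int]


def locate_in_tads(chrom: str, pos: int, tads: List[Tad]) -> Optional[Tuple[Tad, int]]:
    """Return the TAD containing (chrom, pos) and the distance in bp to its
    nearest boundary, or None if the position lies outside every TAD.
    Assumes TADs on the same chromosome do not overlap."""
    on_chrom = sorted(t for t in tads if t[0] == chrom)  # same chrom, so sorted by start
    starts = [t[1] for t in on_chrom]
    i = bisect_right(starts, pos) - 1  # last TAD starting at or before pos
    if i < 0 or pos >= on_chrom[i][2]:
        return None
    tad = on_chrom[i]
    return tad, min(pos - tad[1], tad[2] - pos)


if __name__ == "__main__":
    # Hypothetical TAD calls and position, for illustration only.
    tads = [("chr13", 73_000_000, 74_000_000), ("chr13", 74_000_000, 75_200_000)]
    print(locate_in_tads("chr13", 73_950_000, tads))
    # -> (('chr13', 73000000, 74000000), 50000)
```

A binary search over sorted TAD starts keeps each lookup logarithmic, which matters only when many SNPs are checked against genome-wide TAD calls.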

We developed and applied a combination of expression quantitative trait loci (eQTL) and GWAS summary information to prioritize causative SNPs and to identify the genes specifically associated with the tissue of interest (muscle).

Method

GWAS results for lean mass (LM) and hand grip strength (HG) in various large human populations were published by Karasik et al.(7), Zillikens et al.(8), Willems et al.(9) and Tikkanen et al.(10). Following consensus in the literature, we used the genome-wide significance threshold of 5 × 10^-8 to select associated SNPs for follow-up. Summary eQTL data were obtained from Westra et al.(5) and from the GTEx(4) consortium: the Westra et al. eQTLs were derived from cultured HSMM (human skeletal muscle myoblasts), whereas the GTEx eQTLs were derived from human striated muscle samples.
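The 5 × 10^-8 threshold is usually motivated as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests (0.05 / 10^6 = 5 × 10^-8). A minimal sketch of the SNP selection step, assuming GWAS summary rows that carry a SNP identifier, position and p-value; the `GwasHit` record and the SNP names `rsA`, `rsB` and `rsC` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

GENOME_WIDE_ALPHA = 5e-8  # 0.05 / 1e6, the conventional genome-wide threshold


@dataclass(frozen=True)
class GwasHit:
    snp: str
    chrom: str
    pos: int
    p: float


def genome_wide_significant(rows: Iterable[GwasHit], alpha: float = GENOME_WIDE_ALPHA) -> List[GwasHit]:
    """Keep SNPs with p < alpha, ordered from most to least significant."""
    return sorted((r for r in rows if r.p < alpha), key=lambda r: r.p)


if __name__ == "__main__":
    rows = [
        GwasHit("rsA", "chr1", 1_000, 3e-9),
        GwasHit("rsB", "chr2", 2_000, 1e-6),
        GwasHit("rsC", "chr3", 3_000, 4e-8),
    ]
    print([h.snp for h in genome_wide_significant(rows)])  # -> ['rsA', 'rsC']
```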

GWAS and eQTL summary data were combined using summary-data-based Mendelian randomization (SMR)(11). For the GTEx eQTL summary data, the analysis was run for all tissues, and we then looked for genes enriched in skeletal muscle tissue, in particular genes enriched in skeletal muscle relative to the aggregate of all other tissue types. For the resulting genes of interest, we plotted and examined TADs at the relevant regions in corresponding muscle tissues, namely psoas (skeletal, striated muscle) and bladder (smooth muscle), based on the chromatin contact maps of Schmitt et al.(12).
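For reference, Zhu et al.(11) define the approximate SMR test at a single SNP from the z-statistics of the SNP-trait (GWAS) and SNP-expression (eQTL) associations, T_SMR = z_GWAS² z_eQTL² / (z_GWAS² + z_eQTL²), which is approximately χ² distributed with one degree of freedom; the effect of expression on the trait is estimated by the ratio b_GWAS / b_eQTL. The Python function below is a hypothetical re-implementation for illustration only: the study presumably used the SMR software itself, the HEIDI heterogeneity test is not included, and the effect sizes in the example are invented.

```python
import math
from typing import Tuple


def smr_test(b_gwas: float, se_gwas: float, b_eqtl: float, se_eqtl: float) -> Tuple[float, float, float]:
    """Approximate single-SNP SMR test as defined by Zhu et al. (2016).

    b_gwas, se_gwas: SNP effect on the trait and its standard error (GWAS)
    b_eqtl, se_eqtl: SNP effect on gene expression and its standard error (eQTL)
    Returns (b_xy, t_smr, p): the ratio estimate of the expression effect on
    the trait, the SMR statistic, and its chi-square (1 df) p-value.
    """
    z_gwas = b_gwas / se_gwas
    z_eqtl = b_eqtl / se_eqtl
    b_xy = b_gwas / b_eqtl
    t_smr = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)
    # Upper tail of a chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2))
    p = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p


if __name__ == "__main__":
    # Invented effect sizes: z_gwas = 5 and z_eqtl = 8, so t_smr = 1600 / 89
    print(smr_test(0.05, 0.01, 0.4, 0.05))
```

Because T_SMR is never larger than the smaller of z_GWAS² and z_eQTL², a gene can only reach significance when both the trait association and the eQTL are strong at the same SNP.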

Results

The GTEx tissue analysis showed that, for lean mass, two genes, NUDT3 and KLF5, were enriched in skeletal muscle (Figures 1, 2); both were also found in the Westra et al.(5) eQTL analysis (Table 2). In the GTEx tissue analysis for hand grip strength, one gene, HLA-DQB1-AS1, was specifically enriched in skeletal muscle compared with other tissues (Figure 3), with rs3129753 as the associated SNP. Many other genes enriched in both skeletal muscle and other tissues (the common intersections in Figures 1–3) were also found in the Westra et al. eQTL analysis of our GWAS summary dataset; genes enriched in skeletal muscle as well as in any other tissue should be given second priority. NUDT3 and KLF5 are therefore strong candidate genes for lean mass, and their associated regulatory SNPs are rs464553 and rs1028883, respectively. TADs were plotted for psoas and bladder tissue (skeletal and smooth muscle, respectively; Figures 4–7). In bladder (detrusor) muscle, KLF5 lies within a FIRE (frequently interacting region)(12) inside a TAD on chromosome 13 (Figure 4).
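The prioritization rule described above (genes significant only in skeletal muscle first, genes shared between skeletal muscle and at least one other tissue second) amounts to set operations over the per-tissue gene lists summarized in the Venn diagrams. A sketch, assuming each tissue's significant genes are already available as a set; the function name, the tissue labels and the gene names in the example are placeholders, not study results.

```python
from typing import Dict, Set


def prioritize_by_tissue(sig_genes: Dict[str, Set[str]], target: str = "Muscle_Skeletal") -> Dict[str, Set[str]]:
    """Split genes significant in the target tissue into two priority tiers.

    first:  significant in the target tissue only
    second: significant in the target tissue and in at least one other tissue
    """
    in_target = sig_genes.get(target, set())
    in_others = set().union(*(genes for tissue, genes in sig_genes.items() if tissue != target))
    return {"first": in_target - in_others, "second": in_target & in_others}


if __name__ == "__main__":
    # Placeholder tissue labels and gene names, not study results.
    sig = {
        "Muscle_Skeletal": {"GENE_A", "GENE_B", "GENE_C"},
        "Whole_Blood": {"GENE_C", "GENE_D"},
        "Liver": {"GENE_E"},
    }
    tiers = prioritize_by_tissue(sig)
    print(sorted(tiers["first"]), sorted(tiers["second"]))  # -> ['GENE_A', 'GENE_B'] ['GENE_C']
```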

Discussion

Apart from combining GWAS and eQTL summaries for the tissue of interest, searching for genes whose enrichment is exclusive to that tissue, and checking whether regulatory SNPs lie near TAD boundaries, we also developed a novel scoring system to prioritize genes for functional validation. Functional validation is a slow and costly process: it can take months or even years to complete without any guarantee of a positive result. A scoring system is therefore needed to grade our results and assess which genes are most likely to affect muscle health.

Functional validation is essential for several reasons. First, relying on TADs has its limitations. Chromosomes separate active and inactive chromatin into compartments A and B, respectively: compartment A correlates with high gene expression, active histone marks and early replication timing, whereas compartment B replicates late, is enriched in repressive histone modifications and has low gene expression. Compartments can be further subdivided into megabase-sized genomic regions known as topologically associating domains (TADs)(13,14). The function of TADs is not yet fully understood, although disrupting TADs, e.g. through SNPs or InDels (insertions/deletions), may establish novel inter-TAD interactions. Such interactions have been associated with misexpression of Hox genes(15), up-regulation of proto-oncogenes(16) and developmental disorders(14). Second, functional validation may reveal drugs that affect muscle in previously unknown ways, allowing existing drugs to be repositioned according to their newly identified targets. Functional validation thus serves two purposes: validating the scoring system itself as an algorithm for prioritizing GWAS results and, more importantly, validating new targets for further research and, potentially, for repositioning existing drugs.

Our approach has limitations and requires validation, as the results show. Although the TAD plots obtained by combining phenotype-associated SNPs with tissue-relevant gene-associated SNPs place the genes of interest within a frequently interacting region, the other available data on these genes do not support our hypothesis that they affect muscle health. NUDT3 (Nudix Hydrolase 3), for example, encodes a Nudix hydrolase; Nudix proteins act as homeostatic checkpoints at important stages of nucleoside phosphate metabolic pathways, guarding against elevated levels of potentially dangerous intermediates(17). GWAS associate the RPS10-NUDT3 readthrough with BMI (P = 4 × 10^-12)(18), and the MalaCards database associates NUDT3 with hyperinsulinism and obesity in specific populations(19). KLF5 (Kruppel-like factor 5) encodes a member of the Kruppel-like factor subfamily of zinc finger proteins. The encoded protein is a transcriptional activator that binds directly to a specific recognition motif in the promoters of target genes; it acts downstream of multiple signaling pathways and is regulated by post-translational modification(20). Neither the GWAS Catalog(18) nor MalaCards(19) relates KLF5 to muscle health phenotypes. In contrast, the STRING database(21) links NUDT3 to ACTA2 (Actin Alpha 2) and GSK3B (Glycogen Synthase Kinase 3 Beta), which are related to actin production and energy metabolism, respectively(20). HLA-DQB1-AS1 (HLA-DQB1 Antisense RNA 1) is an RNA gene of the lncRNA (long non-coding RNA) class; according to MalaCards(19), it is related to malignant diseases and does not appear to be associated with muscle wasting disorders. Taken together, this information suggests that these genes are not directly related to muscle health, although they may play an indirect regulatory role.
Functional validation is needed to confirm or refute the hypothesis that these genes are associated with muscle health. We suggest that the genes above be scrutinized with a scoring system that prioritizes candidate genes for functional validation; the validation itself would be performed by knocking out each gene in C2C12 mouse myoblast cells, assessing gene expression by RT-qPCR, and comparing cell morphology with that of wild-type C2C12 cells. The proposed scoring system is as follows. Candidate genes were obtained from Zillikens et al.(8) and Karasik et al.(7) for LM, and from Willems et al.(9) and Tikkanen et al.(10) for HG. Genes reported by Karasik et al., Zillikens et al. and Willems et al. were graded as first-tier genes, whereas genes reported by Tikkanen et al. were graded as second-tier genes, because the Tikkanen et al. study was published at the end of 2018, when the database was already being compiled. The SNP list was mined by cis-eQTL analysis of transcripts within 2 Mb of each SNP position, as described by Zillikens et al.(8). Other similar datasets were also scrutinized, and genes near the SNPs were ranked with a purpose-built scoring system drawing on the following publicly available databases: MalaCards(19), the COXPRESdb(22) gene co-expression database, the PubMed search engine, the Ensembl database(23), the Mouse Genome Informatics database(24), HaploReg(25) and LDlink(26).
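Two mechanical parts of this scoring scheme, the cis window around each SNP and the tier assigned by source study, can be sketched as follows. We assume "within 2 Mb" means at most 2 Mb on either side of the SNP on the same chromosome, and that each transcript is represented by a single coordinate; `Transcript`, `cis_genes`, `study_tier` and the example coordinates are hypothetical. The database-driven part of the score (MalaCards, COXPRESdb and the others) is not modeled here.

```python
from dataclasses import dataclass
from typing import List

CIS_WINDOW_BP = 2_000_000  # "within 2 Mb", assumed here to mean +/- 2 Mb
FIRST_TIER_STUDIES = {"Karasik", "Zillikens", "Willems"}
SECOND_TIER_STUDIES = {"Tikkanen"}


@dataclass(frozen=True)
class Transcript:
    gene: str
    chrom: str
    pos: int  # single representative coordinate (assumption, e.g. the TSS)


def cis_genes(snp_chrom: str, snp_pos: int, transcripts: List[Transcript],
              window: int = CIS_WINDOW_BP) -> List[str]:
    """Genes with a transcript on the SNP's chromosome within `window` bp."""
    return sorted({t.gene for t in transcripts
                   if t.chrom == snp_chrom and abs(t.pos - snp_pos) <= window})


def study_tier(study: str) -> int:
    """Tier 1 for Karasik, Zillikens and Willems; tier 2 for Tikkanen."""
    if study in FIRST_TIER_STUDIES:
        return 1
    if study in SECOND_TIER_STUDIES:
        return 2
    raise ValueError(f"unknown source study: {study}")


if __name__ == "__main__":
    tx = [Transcript("GENE_A", "chr1", 1_500_000),
          Transcript("GENE_B", "chr1", 4_000_000),
          Transcript("GENE_C", "chr2", 1_000_000)]
    print(cis_genes("chr1", 1_000_000, tx), study_tier("Tikkanen"))  # -> ['GENE_A'] 2
```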

Combined with our approach to gene prioritization, the functional validation method above might help identify new loci responsible for LM or HG, and thus new genetic markers for sarcopenia. The pharmaceutical industry might also use this approach to identify targets for new drugs or to reposition existing drugs in light of new data on their activity. The greatest hurdle for drug repositioning today is the lack of robust databases needed to produce good results(27). Functional validation of the results presented in this study can test whether our approach to gene prioritization helps resolve this problem.

Conclusions

The current work focused primarily on combining GWAS and eQTL data through summary-data-based Mendelian randomization (SMR). Genes exclusive to the tissues of interest were further ranked by importance using Venn diagrams and the corresponding TAD plots, which we inspected for TAD boundaries near which the associated regulatory SNPs could be located. NUDT3 and KLF5 for lean mass and HLA-DQB1-AS1 for hand grip strength, with their associated SNPs (rs464553, rs1028883 and rs3129753), had the highest priority as candidate targets for new or repositioned drugs.

One limitation of this study is that the eQTL analysis did not include trans-associated SNPs. Another is the limited knowledge of TAD function. Functional validation is required to assess the results further.

We propose wet-lab validation to determine whether a gene is upregulated in patients carrying the SNP genotype that was positively associated with that gene's expression in the current study, thereby either confirming or refuting the effect of the prioritized genes on muscle tissue. We have also proposed further steps to prioritize candidate genes as targets for new drugs or for existing drug repurposing.
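The text does not state how RT-qPCR expression changes would be quantified in this validation; one widely used option is the 2^-ΔΔCt (Livak) method, sketched below under the assumption of roughly 100% amplification efficiency for both the target and the reference-gene assays. The function name and the Ct values in the example are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(ct_target_test: Sequence[float], ct_ref_test: Sequence[float],
                     ct_target_ctrl: Sequence[float], ct_ref_ctrl: Sequence[float]) -> float:
    """Relative expression of a target gene by the 2^-ddCt (Livak) method.

    dCt = mean Ct(target) - mean Ct(reference gene), computed per group;
    ddCt = dCt(test, e.g. knockout) - dCt(control, e.g. wild type);
    fold change = 2 ** -ddCt. Assumes ~100% efficiency for both assays.
    """
    d_test = mean(ct_target_test) - mean(ct_ref_test)
    d_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    return 2.0 ** -(d_test - d_ctrl)


if __name__ == "__main__":
    # Hypothetical Ct values: dCt(test) = 8.0, dCt(control) = 6.0, ddCt = 2.0
    print(fold_change_ddct([25.5, 26.5], [17.5, 18.5], [23.5, 24.5], [17.75, 18.25]))  # -> 0.25
```

A fold change of 0.25 would mean the target transcript is at one quarter of its control level; when efficiencies differ markedly from 100%, an efficiency-corrected model is the safer choice.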

List of Abbreviations

SNPs – Single Nucleotide Polymorphisms, GWAS – Genome-Wide Association Study, eQTL – expression Quantitative Trait Loci, TADs – Topologically Associated Domains, LM – Lean Mass, LD – Linkage Disequilibrium, HG – Hand Grip strength, TWAS – Transcriptome-Wide Association Study, RT-qPCR – Real-Time quantitative Polymerase Chain Reaction, Metal – meta-analysis, GTEx – Genotype-Tissue Expression, SMR – Summary-data-based Mendelian Randomization, lncRNA – long non-coding RNA, InDels – insertions/deletions, FIRE – frequently interacting region

Ethical Approval and Consent to participate

No individual-level data were used for the analysis; therefore, ethical approval and consent to participate were not required.

CONSENT FOR PUBLICATION

No individual-level data were used for this study; therefore, consent for publication was not required.

AVAILABILITY OF DATA AND MATERIAL

The data are available at www.tinyurl.com/abinarain: navigate to the Educational section and click the link titled ‘GWAS eQTL Summary Approach with TADs for Skeletal Muscle work’.

COMPETING INTERESTS

None declared.

AUTHORS' CONTRIBUTIONS

ANS conceived the presented idea, performed the statistical computational analysis of data and worked on the manuscript.
BG designed the gene scaling system, the functional validation algorithm and contributed to the manuscript.

Funding

This research was funded by the Israel Science Foundation (grant No. 1121/19).

Acknowledgement

The author is grateful for the various software tools, such as SMR, that have been made available for academic use. The author thanks David Karasik of Bar-Ilan University, who provided the GWAS data from his previous research on hand grip strength and lean mass. David Laaksonen of UEF contributed to the writing of the manuscript.

References

1. Cruz-Jentoft AJ, Baeyens JP, Bauer JM, Boirie Y, Cederholm T, Landi F, et al. Sarcopenia: European consensus on definition and diagnosis. Age Ageing. 2010 Apr 13;39(4):412–23.
2. Haines JL, Hauser MA, Schmidt S, Scott WK, Olson LM, Gallins P, et al. Complement factor H variant increases the risk of age-related macular degeneration. Science. 2005 Apr 15;308(5720):419–21.
3. Barbe MF, Gallagher S, Massicotte VS, Tytell M, Popoff SN, Barr-Gillespie AE. The interaction of force and repetition on musculoskeletal and neural tissue responses and sensorimotor behavior in a rat model of work-related musculoskeletal disorders. BMC Musculoskelet Disord [Internet]. 2013 Dec 25 [cited 2019 Jan 26];14(1):303. Available from: http://bmcmusculoskeletdisord.biomedcentral.com/articles/10.1186/1471-2474-14-303
4. Aguet F, Brown AA, Castel SE, Davis JR, He Y, Jo B, et al. Genetic effects on gene expression across human tissues. Nature. 2017 Oct 11;550(7675):204–13.
5. Westra HJ, Peters MJ, Esko T, Yaghootkar H, Schurmann C, Kettunen J, et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat Genet. 2013 Oct;45(10):1238–43.
6. Pombo A, Dillon N. Three-dimensional genome architecture: players and mechanisms. Nat Rev Mol Cell Biol. 2015;16:245–57.
7. Karasik D, Zillikens MC, Hsu YH, Aghdassi A, Akesson K, Amin N, et al. Disentangling the genetics of lean mass. Am J Clin Nutr. 2019;109(2):276–8.
8. Zillikens MC, Demissie S, Hsu Y-H, Yerges-Armstrong LM, Chou W-C, Stolk L, et al. Large meta-analysis of genome-wide association studies identifies five loci for lean body mass. Nat Commun [Internet]. 2017 Dec 19 [cited 2019 Feb 11];8(1):80. Available from: http://www.nature.com/articles/s41467-017-00031-7
9. Willems SM, Wright DJ, Day FR, Trajanoska K, Joshi PK, Morris JA, et al. Large-scale GWAS identifies multiple loci for hand grip strength providing biological insights into muscular fitness. Nat Commun [Internet]. 2017 Jul 12 [cited 2019 Feb 11];8:16015. Available from: http://www.nature.com/doifinder/10.1038/ncomms16015
10. Tikkanen E, Gustafsson S, Amar D, Shcherbina A, Waggott D, Ashley EA, et al. Biological Insights Into Muscular Strength: Genetic Findings in the UK Biobank. Sci Rep [Internet]. 2018 Dec 24 [cited 2019 Feb 11];8(1):6451. Available from: http://www.nature.com/articles/s41598-018-24735-y
11. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016 May 1;48(5):481–7.
12. Schmitt AD, Hu M, Jung I, Xu Z, Qiu Y, Tan CL, et al. A Compendium of Chromatin Contact Maps Reveal Spatially Active Regions in the Human Genome. Cell Rep [Internet]. 2016 [cited 2019 Dec 21];17(8):2042–59. Available from: www.cell.com/
13. Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, et al. Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions. Nature [Internet]. [cited 2019 Dec 21];485(7398):376–80.
14. Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, et al. Disruptions of Topological Chromatin Domains Cause Pathogenic Rewiring of Gene-Enhancer Interactions. Cell. 2015;161(5):1012–25.
15. Narendra V, Rocha PP, An D, Raviram R, Skok JA, Mazzoni EO, et al. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. [cited 2019 Dec 21]; Available from: http://www.ncbi.nlm.nih.gov/geo/
16. Flavahan WA, Drier Y, Liau BB, Gillespie SM, Venteicher AS, Stemmer-Rachamimov AO, et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature [Internet]. 2016 [cited 2019 Dec 21];529(7584):110–4.
17. Safrany ST. A novel context for the ‘MutT’ module, a guardian of cell integrity, in a diphosphoinositol polyphosphate phosphohydrolase. EMBO J. 1998 Nov 16;17(22):6599–607.
18. Buniello A, Macarthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2018;47:1005–12.
19. Rappaport N, Twik M, Plaschkes I, Nudel R, Stein TI, Levitt J, et al. MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search. Nucleic Acids Res [Internet]. 2017 [cited 2019 Feb 11];45:877–87. Available from: https://academic.oup.com/nar/article-abstract/45/D1/D877/2572056
20. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses. Curr Protoc Bioinforma [Internet]. 2016 Jun 20 [cited 2019 Dec 21];54(1). Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpbi.5
21. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res [Internet]. 2018 [cited 2019 Dec 22];47:607–13. Available from: https://string-db.org/
22. Obayashi T, Hayashi S, Shibaoka M, Saeki M, Ohta H, Kinoshita K. COXPRESdb: a database of coexpressed gene networks in mammals. Nucleic Acids Res [Internet]. 2008 [cited 2019 Dec 21];36:77–82. Available from: http://coxpresdb.
23. Zerbino DR, Achuthan P, Akanni W, Ridwan Amode M, Barrell D, Bhai J, et al. Ensembl 2018. Nucleic Acids Res [Internet]. 2018 [cited 2019 Feb 11];46. Available from: http://www.ensembl.org
24. Finger JH, Smith CM, Hayamizu TF, Mccright IJ, Xu J, Law M, et al. The mouse Gene Expression Database (GXD): 2017 update. Nucleic Acids Res [Internet]. 2017 [cited 2019 Dec 21];45. Available from: http://www.informatics.jax.org/gxdlit
25. Ward LD, Kellis M. HaploReg: A resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res [Internet]. 2012 [cited 2019 Apr 22];40(D1):930–4. Available from: http://compbio.mit.edu/HaploReg
26. Machiela MJ, Chanock SJ. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics [Internet]. [cited 2019 Mar 21]; Available from: https://academic.oup.com/bioinformatics/article-abstract/31/21/3555/195027
27. Doostparast Torshizi A, Wang K. Next-generation sequencing in drug development: target identification and genetically stratified clinical trials. Drug Discov Today. 2018;23:1776–83.

Please see the supplementary files section to access the tables.

Tables.docx


Editorial decision: Minor revision
25 Jan, 2020
Editor assigned by journal
21 Jan, 2020
Submission checks completed at journal
20 Jan, 2020
Editor invited by journal
20 Jan, 2020


Status:

Journal Publication

Version 2
