Disentangling the Genetics of Sarcopenia: prioritization of NUDT3 and KLF5 as genes for lean mass and HLA-DQB1-AS1 for hand grip strength based on associated SNPs

doi:10.21203/rs.2.16139/v1

Download PDF

Research article

Disentangling the Genetics of Sarcopenia: prioritization of NUDT3 and KLF5 as genes for lean mass and HLA-DQB1-AS1 for hand grip strength based on associated SNPs

https://doi.org/10.21203/rs.2.16139/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 24 Feb, 2020

Read the published version in BMC Medical Genetics →

You are reading this older preprint version

Read the latest preprint version →

Background: Sarcopenia is a skeletal muscle disease of clinical importance that occurs commonly in old age and in various disease sub-categories. Widening the scope of knowledge of the genetics of muscle mass and strength is important because may allows us to identify new genetic markers or identify patients with an increased risk to develop a specific musculoskeletal diseases or condition such as sarcopenia. It might also allow us to identify drugs that affect muscle in ways unknown before and therefore to reposition drugs to other uses, in accordance to their newly found target. We used bioinformatics tools to identify gene loci responsible for regulating muscle strength and lean muscle mass, which can then be a target for downstream lab experimentation validation. SNPs associated with various disease traits for the muscles and specific loci were chosen according to their muscle phenotype association p-value, as traditionally done in the GWASs. We developed and applied a combination of expression quantitative trait loci study (eQTLs) and GWAS summary information, to prioritize causative SNP and point out the unique genes associated in the tissues of interest (muscle).

Results: We found NUDT3 and KLF5 for lean mass and HLA-DQB1-AS1 for hand grip strength as candidate genes to target for these phenotypes. The associated regulatory SNPs are rs464553, rs1028883 and rs3129753 respectively.

Conclusion: TWAS approaches of combining GWAS and eQTL summary statistics proved helpful in statistically prioritizing genes and their associated SNPs for the disease phenotype of study, here, Sarcopenia. Potentially regulatory SNPs associated with these genes can be analyzed with respect to TADs and then targeted for knock out in either C2C12 mouse myoblast cells, adipocytes or any other relevant cell, depending on the phenotype it is hypothesized to affect, as a downstream experimental validation plan.

Medical Genetics

Genetics of Sarcopenia

prioritization of NUDT3 and KLF5

genes for lean mass and HLA-DQB1-AS1

Many diseases known to man originate from more than one genetic locus. A GWAS (genome-wide association study) of lean muscle mass (LM) or hand grip strength (HG) was done in the studies by Karasik et al.[1], Zilikens et al.[2], Willems et al.[3] and Tikkanen et al.[4], in various large human populations. Various sample GWAS plots for LM and HG are also provided at the author’s webpage (www.tinyurl.com/abinarain and then navigate to “Educational Stuffs”). As such, genetic inheritance studies, although outlining genetic architecture, were unable to determine the specific loci responsible for these diseases[5].

Since 2006, GWASs have allowed us to trace the multiple genetic factors for such diseases using statistical tools that can lead to a more effective research of specific locus of interest[6]. The data produced by these studies, which now rank in the thousands, are available online so further downstream research can be conducted, and new results can be incorporated. This is indeed valuable, since, for example, musculoskeletal diseases are one of the leading causes of disability in the world [7]; treatment of these diseases costs the world medical industry around 125 billion dollars annually[7].

As shown in the scientific literature[8], muscle strength, hand grip (HG) in our case, and lean muscle mass (LM) are two of the most important factors that can predict muscle health. These are especially important in the context of sarcopenia, as they allow evaluating its progression. Furthermore, research has shown that with age these two factors tend to dissociate and thus need to be measured independently, as one cannot predict the other[8]. The discovery of new therapeutic targets which might increase LM or HG or both may be useful in the management or treatment of otherwise incurable diseases.

Previous studies have identified potential targets for research[9]. These targets were overlaid with eQTL and GWAS p values. These p values can later be added to data collected from other reliable online databases and summed into a grading system (scale), which can allow us to choose the most relevant genes to be knocked out, for example, in the mouse C2C12 myoblast cell line using the siRNA technique. However, for this paper, we present only the combination of summary level data from GWASs and publicly available eQTLs such as those from studies by GTEx[21] and Westra et. al. [20], and then plot the topologically active domains (TADs), at the loci of interest generated by this combination approach of phenotype-associated SNPs (Single Nucleotide Polymorphism) and tissue-relevant gene-associated SNPs. A TAD is a self-interacting genomic region, meaning that DNA sequences within a TAD physically interact with each other more frequently than with sequences outside the TAD[22]. The function of TADs is not fully understood yet, although disrupting the TADs e.g. because of SNPs or InDels (insertions deletions) can lead to chromosomal structure disruption and could lead to diseases by affecting gene regulation.

Method

The results of GWAS for lean mass (LM) and hand grip strength (HG) was published in studies by Karasik et al,.[1], Zilikens et al.[2], Willems et al.[3] and Tikkanen et al.[4], in various large human populations. The summary of eQTL data was obtained by studies by Westra et. al. [20] and by the GTEx21 consortium for various tissues. With these data, we executed “Summary-data-based Mendelian randomization” (SMR)[23] analysis. We chose not to investigate the SNPs for further categorization of pleiotropy or causality, which can be done using the Heidi test. For the case of GTEx eQTL summary data, the execution was done for all tissues, and then we observed for genes which were enriched specifically in skeletal muscle tissue or specifically compared to the aggregate of all other tissue types. For the genes of interest as described in the analysis section, we went on to examine the TAD plots for the corresponding skeletal muscle tissues such as the psoas and bladder from the study of Schmitt et. al. [24]

Our GTEx tissue analysis found that for lean mass, two genes, NUDT3 and KLF5, were enriched in skeletal muscles (Figure 1, 2), and they were also found in Westra et. al. [20] (Westra) eQTL analysis (Figure 4). In the GTEx tissue analysis for the hand grip trait, we found one gene, HLA-DQB1-AS1, which was specifically enriched in skeletal tissue compared to other tissues (Figure 12), with the associated SNP as rs3129753. Many other genes found to be enriched in skeletal muscle tissues and other tissues in common intersection as in Figures 1, 2, 12, were also found in Westra eQTL analysis with our GWAS summary dataset. We propose that those genes that are found to be exclusively enriched only in skeletal muscle tissues by both GTEx and Westra eQTL summary dataset combined with our GWAS summary dataset using the Mendelian randomization approach be given the first priority for further experimental validation, and least priority should be given to those genes which are found to be not enriched in skeletal muscle tissue in either the Westra or GTEx analysis. The second priority should be given to the genes found to be enriched in skeletal muscle tissue as well as any other tissue. Clearly, NUDT3 and KLF5 are very strong candidate genes for lean mass study, and their associated regulating SNP are rs464553 and rs1028883 respectively. Most importantly, the associative SNPs for these genes are also listed in figures 3–11, and RT-qPCR (Reverse Transcriptase quantitative Polymerase Chain Reaction) can be possibly performed on skeletal muscle samples from people with and without these SNP genotypes, to confirm the enrichment of these genes.

Chromosomes separate active and inactive chromatin into compartments A and B, respectively where compartment A correlates with high gene expression, active histone marks, and early replication timing, whereas the compartment B replicates late, is enriched with repressive histone modifications and has low gene expression. Compartments can be further subdivided into megabase-sized genomic regions known as topologically associating domains (TADs)[25][26], which act as regulatory scaffolds and are demarcated by binding sites of the architectural protein CTCF. Disruption of TAD boundaries results in the establishment of novel inter-TAD interactions. These have been shown to be associated with misexpression of Hox genes [27], up-regulation of proto-oncogenes [28], and developmental disorders [29]. TAD plots for the psoas and bladder tissues (which are skeletal and smooth muscle types, respectively) were also plotted and one of them is shown as Figure 13 for KLF5 bladder tissue. Clearly, these sites can be further investigated for their corresponding SNP positions and whether they are in TAD boundaries or not by knock down of the SNPs and observing for any disruption in cellular activity. TAD plots for other relevant genes using raw and normalized data can be found at the author’s webpage www.tinyurl.com/abinarain and then navigating to the “Educational Section”.

Apart from the combination of GWAS and eQTL summaries for the tissue of interest and searching for exclusivity of enhancement of gene and the presence of SNP regulation near TAD boundaries, as discussed above, we also incorporated other factors, such as gene score for relevant diseases, phenotype also known in mice to be tested, coexpression of the gene in question in relevant databases, known epigenetic factors, and medical implications for known drugs. Though the results for this study will still need time to

be substantiated for publication, we feel that it would make sense to state the method for the sake of completeness.

Potential genes were obtained from the work of Zillikens [2], Karasik[1] for LM and for HG, Willems[3] and Tikkanen[10]. Genes provided by Karasik, Zillikens and Willems were graded as first tier genes, while genes provided by Tikkanen were graded as second tier genes. The reason behind this was that the Tikkanen research [10] was published at the end of the year 2018, while the database was already being collected. The list of SNPs was mined with cis Expression quantitative trait loci analysis (eQTLs) for transcripts within 2 Mb of the SNP position was carried out as described by Zillikens et al [2]. Other similar datasets were scrutinized, and genes in proximity of SNPs were scaled according to a specifically developed scoring system with the following components: GWASs, eQTLs, Malacards, OMIM, TAD and epigenetic data.

For GWASs: 1 point for each significant P value for a locus in any published muscle-related GWAS. For eQTLs, 1 point if the locus has a muscle-specific significant P value (is an eQTL), and 0 if not. Data were provided by three eQTL studies [2,3,10]. Scoring was as follows: 1 point for each time the gene is expressed in relevant tissues (muscle, tendon or the tibial nerve). Data were also extracted from the Ensembl data base (build 95) [11]. One point for was given for each relevant disease the gene was associated with according to “Malacards 1” version 4.9.0.20 and OMIM. The Malacards scoring was on a scale of 0 to infinity. For this study, each disease with a score of more than 3 received one point in the score. OMIM was used for verification of the Malacards data in mice. One point was given for each time the gene was associated with a muscle-related phenotype in mice, in accordance with the mice genome informatics database version 6.3.0.14. One point was subtracted each time the gene was shown to be co-expressed with another gene that has a relevant effect or phenotype according to the COXPRESdb gene co-expression database (version 7.1) [15]

The Malacards relevance score algorithm can be found on this webpage: https://www.malacards.org/pages/searchguide

available datasets for Homo sapiens). Co-expression reduces the chance that a specific gene is responsible for a given trait; thus a point was subtracted. Genes of interest were also examined in relation to the active TAD to which they are mapped, to reveal whether a gene is in euchromatin and whether it interacts with other genes which might carry out a similar function or have a similar phenotype as the gene of interest. TADs were analyzed from samples of psoas muscle (striated) and bladder muscle (m. detrusor, smooth muscle) For epigenetic data: 1 point was awarded each time the SNP appeared to be an enhancer in a relevant tissue, according to HaploReg[16].

Proxies for every variant in the SNP table was extracted using Ldlink[17], the output was filtered according to the following criteria: LD R² = 0.8, location within the region of the gene of interest and a score of 5 or less according to RegulomDB[18], which means that the proxies had sufficient data validating their annotation.

In case two genes received the same score, biological data will be used to determine which one is more relevant to the study. Relevant loci will be then knocked out in the C2C12 mouse myoblast cell line. Identifying new loci responsible for LM or HG may be used by the pharmaceutical industry as new targets for pharmaceutical products or repositioning of existing drugs in accordance to new data on the activity of these drugs. The greatest hurdle with drug repositioning today is lack of solid databases needed to conduct efficient repositioning [9]. This study of future can serve as a test to whether this approach to gene prioritization can resolve this problem.

The current work focused primarily on the combined bioinformatic approaches using GWASs and eQTLs for summary-data based Mendelian randomization (SMR). The results of exclusivity of the tissues of interest were further classified for their importance based on Venn diagrams and their corresponding TAD plots to look for the TAD boundaries where the associated regulating SNPs could be localized. NUDT3 and KLF5 for lean mass and HLA-DQB1-AS1 for hand grip strength and their associated SNPs (rs464553, rs1028883 and rs3129753) had the highest priority as candidate targets for new or repositioned drugs.

We propose conducting RT-qPCR to further ascertain the association of enhancement of a gene for patients with a SNP genotype that was associated positively with gene enrichment in the current study. We also proposed further steps that can be taken to further prioritize candidate genes as targets for new drugs or for drug repurposing. One limitation of this study is that the eQTL analysis was not done on trans-association SNPs.

SNPs—Single Nucleotide Polymorphism, GWAS—Genome wide association study, eQTL—expression quantitative trait loci, TADs—Topologically Associated Domains, LM—Lean Mass, LD—Linkage Disequilibrium, HG—Hand Grip strength, TWAS—Transcriptome Wide Association Study

- Ethical Approval and Consent to participate
No individual level data was used for the analysis, and thus consent for publication was not needed.
- CONSENT FOR PUBLICATION
No individual level data has been used for this study, and thus consent for publication was not needed.
- AVAILABILITY OF DATA AND MATERIAL
www.tinyurl.com/abinarain and then navigate to Educational Section
- COMPETING INTERESTS
None to be declared
- FUNDING

The publication costs of this project were funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 802825 to Principal Investigator Minna U Kaikkonen)

ACKNOWLEDGEMENT

The author is thankful for the various software tools that have been made available for academic purposes such as SMR. The author is thankful to Bili Gasman of Bar Ilan University who wrote the gene knockout strategy as a future course of action, and David Karasik of Bar Ilan University who provided GWAS data that he carried out for hand grip and lean mass study for his previous research work. David Laaksonen of UEF helped in contribution of the written content of the manuscript.

Karasik D, Zillikens MC, Hsu YH, et al. Disentangling the genetics of lean mass. Am J Clin Nutr 2019;109(2):276–8.
Zillikens MC, Demissie S, Hsu Y-H, et al. Large meta-analysis of genome-wide association studies identifies five loci for lean body mass. Nat Commun [Internet] 2017 [cited 2019 Feb 11];8(1):80. Available from: http://www.nature.com/articles/s41467-017-00031-7
Willems SM, Wright DJ, Day FR, et al. Large-scale GWAS identifies multiple loci for hand grip strength providing biological insights into muscular fitness. Nat Commun [Internet] 2017 [cited 2019 Feb 11];8:16015. Available from: http://www.nature.com/doifinder/10.1038/ncomms16015
Tikkanen E, Gustafsson S, Amar D, et al. Biological Insights Into Muscular Strength: Genetic Findings in the UK Biobank. Sci Rep [Internet] 2018 [cited 2019 Feb 11];8(1):6451. Available from: http://www.nature.com/articles/s41598-018-24735-y
Risch N, Merikangas K. The Future of Genetic Studies of Complex Human Diseases. Science (80- ) [Internet] 1996 [cited 2019 Jan 12];273(5281):1516–7. Available from: http://www.sciencemag.org/cgi/doi/10.1126/science.273.5281.1516
Haines JL. Complement Factor H Variant Increases the Risk of Age-Related Macular Degeneration. Science (80- ) [Internet] 2005 [cited 2019 Jan 12];308(5720):419–21. Available from: http://www.sciencemag.org/cgi/doi/10.1126/science.1110359
Barbe MF, Gallagher S, Massicotte VS, Tytell M, Popoff SN, Barr-Gillespie AE. The interaction of force and repetition on musculoskeletal and neural tissue responses and sensorimotor behavior in a rat model of work-related musculoskeletal disorders. BMC Musculoskelet Disord [Internet] 2013 [cited 2019 Jan 26];14(1):303. Available from: http://bmcmusculoskeletdisord.biomedcentral.com/articles/10.1186/1471-2474-14-303
Heymsfield SB, Cristina Gonzalez M, Lu J, Jia G, Zheng J. Skeletal muscle mass and quality: evolution of modern measurement concepts in the context of sarcopenia. [cited 2019 Feb 13];Available from: https://doi.org/10.1017/S0029665115000129
Doostparast Torshizi A, Wang K. Next-generation sequencing in drug development: target identification and genetically stratified clinical trials. Drug Discov Today [Internet] 2018 [cited 2019 Feb 11];23(10):1776–83. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1359644618300400
Tikkanen E, Gustafsson S, Amar D, et al. Biological Insights Into Muscular Strength: Genetic Findings in the UK Biobank OPEN. [cited 2019 Jan 23];Available from: www.nature.com/scientificreports
Zerbino DR, Achuthan P, Akanni W, et al. Ensembl 2018. Nucleic Acids Res [Internet] 2018 [cited 2019 Feb 11];46. Available from: http://www.ensembl.org
Rappaport N, Twik M, Plaschkes I, et al. MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search. Nucleic Acids Res [Internet] 2017 [cited 2019 Feb 11];45:877–87. Available from: https://academic.oup.com/nar/article-abstract/45/D1/D877/2572056
Stelzer G, Rosen N, Plaschkes I, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses [Internet]. In: Current Protocols in Bioinformatics. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2016 [cited 2019 Feb 11]. p. 1.30.1-1.30.33.Available from: http://doi.wiley.com/10.1002/cpbi.5
Finger JH, Smith CM, Hayamizu TF, et al. The mouse Gene Expression Database (GXD): 2017 update. Nucleic Acids Res [Internet] 2017 [cited 2019 Feb 11];45. Available from: http://www.informatics.jax.org/gxdlit
Obayashi T, Hayashi S, Shibaoka M, Saeki M, Ohta H, Kinoshita K. COXPRESdb: a database of coexpressed gene networks in mammals. Nucleic Acids Res [Internet] 2008 [cited 2019 Feb 11];36:77–82. Available from: http://coxpresdb.
Ward LD, Kellis M. HaploReg: A resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res 2012;40(D1):930–4.
Machiela MJ, Chanock SJ. Genetics and population analysis LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. [cited 2019 Mar 21];Available from: https://academic.oup.com/bioinformatics/article-abstract/31/21/3555/195027
Boyle AP, Hong EL, Hariharan M, et al. Annotation of functional variation in personal genomes using RegulomeDB. [cited 2019 Mar 21];Available from: https://genome.cshlp.org/content/22/9/1790.full.pdf
Shiori TN, Ryo N, Kiyoko O, Eto NN. A novel in vitro model of sarcopenia using BubR1 hypomorphic C2C12 myoblasts. Cytotechnology [Internet] [cited 2019 Feb 13];68. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5023544/pdf/10616_2015_Article_9920.pdf
Westra HJ et. al., Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat Genet. 2013 Oct;45(10):1238-1243. doi: 10.1038/ng.2756. Epub 2013 Sep 8. Available from: https://www.ncbi.nlm.nih.gov/pubmed/24013639
GTEx Consortium, Genetic effects on gene expression across human tissues, Nature volume550, pages204–213 (12 October 2017). Available from: https://www.nature.com/articles/nature24277
Pombo, A; Dillon, N (April 2015). "Three-dimensional genome architecture: players and mechanisms". Nature Reviews Molecular Cell Biology. 16 (4): 245–57. doi:10.1038/nrm3965. PMID 25757416
Zhihong Zhu, Futao Zhang, Han Hu, Andrew Bakshi, Matthew R Robinson, Joseph E Powell, Grant W Montgomery, Michael E Goddard, Naomi R Wray, Peter M Visscher & Jian Yang Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nature Genetics volume 48, pages 481–487 (2016)
Schmitt, A., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C., Li, Y., Lin, S., Lin, Y., Barr, C. & Ren, B. (2016). A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human Genome. Cell Reports, 17(8), 2042-2059.
Dixon J.R., Selvaraj S., Yue F. , Kim A. , Li Y. , Shen Y. , Hu M. , Liu J.S. , Ren B. , Topological domains in mammalian genomes identified by analysis of chromatin interactions., Nature. 2012; 485: 376-380
Nora E.P. , Lajoie B.R. , Schulz E.G. , Giorgetti L. , Okamoto I. , Servant N. , Piolot T. van Berkum N.L. , Meisig J. , Sedat J. , et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature. 2012; 485: 381-385
Narendra V. , Rocha P.P. , An D. , Raviram R. , Skok J.A. , Mazzoni E.O. , Reinberg D. , CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science. 2015; 347: 1017-1021
Flavahan W.A., Drier Y. , Liau B.B. , Gillespie S.M. , Venteicher A.S. , Stemmer-Rachamimov A.O. , Suvà M.L. , Bernstein B.E. , Insulator dysfunction and oncogene activation in IDH mutant gliomas., Nature. 2016; 529: 110-114
Lupiáñez D.G. , Kraft K. , Heinrich V. , Krawitz P. , Brancati F. , Klopocki E. , Horn D. , Kayserili H. , Opitz J.M. , Laxova R. , et al., Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015; 161: 1012-1025

Download PDF

Journal Publication

published 24 Feb, 2020

Read the published version in BMC Medical Genetics →

Review #2 received at journal
26 Nov, 2019
Editorial decision: Major revision
26 Nov, 2019
Reviewer #3 agreed at journal
18 Nov, 2019
Review #1 received at journal
24 Oct, 2019
Reviewer #2 agreed at journal
14 Oct, 2019
Reviewer #1 agreed at journal
13 Oct, 2019
Editor invited by journal
11 Oct, 2019
Editor assigned by journal
11 Oct, 2019
Reviewers invited by journal
11 Oct, 2019
Submission checks completed at journal
10 Oct, 2019

You are reading this older preprint version

Read the latest preprint version →

Disentangling the Genetics of Sarcopenia: prioritization of NUDT3 and KLF5 as genes for lean mass and HLA-DQB1-AS1 for hand grip strength based on associated SNPs

Status:

Journal Publication

Version 1

Abstract

Figures

background

results

Discussion

conclusions

Declarations

references

Status:

Journal Publication

Version 1