In addition to combining GWAS and eQTL summary statistics for the tissue of interest, searching for exclusivity of gene enhancement, and looking for SNP regulation near TAD boundaries, we incorporated a novel scoring system to prioritize genes for functional validation of our results. Functional validation is slow and costly: it can take months or even years to complete, with no promise of a positive result. A scoring system is therefore vital for scrutinizing and grading our results to assess which genes might affect muscle health. Functional validation is itself vital for several reasons. First, relying on TADs has its limitations. Chromosomes separate active and inactive chromatin into compartments A and B, respectively. Compartment A correlates with high gene expression, active histone marks, and early replication timing, whereas compartment B replicates late, is enriched in repressive histone modifications, and has low gene expression. Compartments can be further subdivided into megabase-sized genomic regions known as topologically associating domains (TADs)(13,14). The function of TADs is not yet fully understood, although disrupting them, e.g. through SNPs or indels (insertions/deletions), may establish novel inter-TAD interactions. Such interactions have been shown to be associated with misexpression of Hox genes(15), up-regulation of proto-oncogenes(16), and developmental disorders(14). Second, functional validation might allow us to identify previously unknown effects of drugs on muscle, and therefore to reposition existing drugs for other uses in accordance with their newly found targets. Functional validation thus serves two purposes: first, validating the scoring system itself as an algorithm for GWAS result validation, and second, more importantly, validating new targets for further research and, potentially, for repositioning of existing drugs.
Our approach has its limitations and requires validation, as the results show. Although the TADs plotted by combining phenotype-associated SNPs with tissue-relevant gene-associated SNPs show that the genes of interest lie within frequently interacting regions, the remaining data on these genes do not support our hypothesis that they affect muscle health. NUDT3 (Nudix Hydrolase 3), for example, encodes a Nudix protein; these proteins act as homeostatic checkpoints at important stages of nucleoside phosphate metabolic pathways, guarding against elevated levels of potentially dangerous intermediates(17). GWAS associate the RPS10-NUDT3 readthrough with BMI with a P-value of 4 × 10⁻¹²(18). The MalaCards database also associates NUDT3 with hyperinsulinism and obesity in specific populations(19). KLF5 (Kruppel Like Factor 5) encodes a member of the Kruppel-like factor subfamily of zinc finger proteins. The encoded protein is a transcriptional activator that binds directly to a specific recognition motif in the promoters of target genes. It acts downstream of multiple signaling pathways and is regulated by post-translational modification(20). Neither the GWAS Catalog(18) nor the MalaCards database(19) relates this gene to muscle health phenotypes. In contrast, the STRING database(21) reports a relationship between NUDT3 and both ACTA2 (Actin Alpha 2) and GSK3B (Glycogen Synthase Kinase 3 Beta), which are related to actin production and energy metabolism, respectively(20). HLA-DQB1-AS1 (HLA-DQB1 Antisense RNA 1) is an RNA gene of the lncRNA (long non-coding RNA) class; it is related to malignant diseases and does not appear to be associated with muscle-wasting disorders according to MalaCards(19). Taken together, this information indicates that these genes are not directly related to muscle health, yet they may have an indirect regulatory role in it.
Functional validation is vital for confirming or refuting the hypothesis that these genes are associated with muscle health. We suggest that the above genes be scrutinized using a scoring system that prioritizes candidate genes for functional validation. Validation will be done by knocking out the gene in C2C12 mouse myoblast cells, assessing gene expression using RT-qPCR, and comparing cell morphology to that of wild-type C2C12 cells. The scoring system proposed for functional validation of our results is as follows. Candidate genes were obtained for LM from the work of Zillikens et al. (8) and Karasik et al. (7), and for HG from Willems et al. (9) and Tikkanen et al. (10). Genes provided by Karasik et al., Zillikens et al. and Willems et al. were graded as first-tier genes, while genes provided by Tikkanen et al. were graded as second-tier genes, because the Tikkanen et al. study was published at the end of 2018, when our database was already being compiled. The list of SNPs was mined using cis-expression quantitative trait locus (cis-eQTL) analysis for transcripts within 2 Mb of the SNP position, carried out as described by Zillikens et al.(8). Other similar datasets were scrutinized, and genes in proximity to SNPs were scored according to a specifically developed scoring system that used the following publicly available resources: MalaCards(19), the COXPRESdb gene co-expression database(22), the PubMed search engine, the Ensembl database(23), the Mouse Genome Informatics database(24), HaploReg(25) and the LDlink database(26).
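The tiering and cis-window steps described above can be illustrated with a minimal Python sketch. Only the 2 Mb window and the study-to-tier assignment come from the text. Everything else is an assumption made for illustration: the class and function names, the use of a single start coordinate per transcript, and the ranking rule that breaks ties within a tier by counting how many databases support a gene. The actual weighting of the databases is not specified here, so the count is a placeholder, not the scoring system itself.

```python
from dataclasses import dataclass

# 2 Mb cis window around each SNP, as stated in the text.
CIS_WINDOW_BP = 2_000_000

# Tier by source study, as stated in the text (first tier = 1, second tier = 2).
STUDY_TIER = {"Zillikens": 1, "Karasik": 1, "Willems": 1, "Tikkanen": 2}


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int
    study: str


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # assumption: one start coordinate represents the transcript


def in_cis_window(snp, gene, window=CIS_WINDOW_BP):
    """True if the gene is on the SNP's chromosome and within the window."""
    return snp.chrom == gene.chrom and abs(snp.pos - gene.start) <= window


def candidate_tiers(snps, genes):
    """Map each gene near at least one SNP to its best (lowest) tier."""
    tiers = {}
    for gene in genes:
        for snp in snps:
            if in_cis_window(snp, gene):
                tier = STUDY_TIER[snp.study]
                tiers[gene.symbol] = min(tier, tiers.get(gene.symbol, tier))
    return tiers


def rank_candidates(tiers, evidence):
    """Order genes by tier, then by number of supporting databases.

    The database count is a hypothetical stand-in for the real scoring rule.
    """
    return sorted(tiers, key=lambda g: (tiers[g], -len(evidence.get(g, ())), g))
```

A call such as `rank_candidates(candidate_tiers(snps, genes), evidence)` would then return gene symbols in the order in which they might be considered for knockout, where `evidence` maps each gene symbol to the set of databases that support it. Genes with no SNP inside the window are dropped before ranking.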
The above functional validation method, combined with our approach to gene prioritization, might help in identifying new loci responsible for LM or HG, and thus new genetic markers for sarcopenia. This approach may also be used by the pharmaceutical industry to identify targets for new pharmaceutical products or to reposition existing drugs in accordance with new data on their activity. The greatest hurdle in drug repositioning today is the lack of solid databases needed to produce good results(27). Functional validation of the results presented in this study can serve as a test of whether our approach to gene prioritization can resolve this problem.