Apart from combining GWAS and eQTL summaries for the tissue of interest and searching for exclusive enhancement of a gene and for SNP regulation near TAD boundaries, as discussed above, we also incorporated other factors: gene scores for relevant diseases, phenotypes known in mice, coexpression of the gene in question in relevant databases, known epigenetic factors, and medical implications for known drugs. Although the results of this study still need time to be substantiated for publication, we feel that it makes sense to state the method for the sake of completeness.
Potential genes were obtained from the work of Zillikens [2] and Karasik [1] for LM, and of Willems [3] and Tikkanen [10] for HG. Genes provided by Karasik, Zillikens and Willems were graded as first-tier genes, while genes provided by Tikkanen were graded as second-tier genes. The reason for this was that the Tikkanen study [10] was published at the end of 2018, when the database was already being compiled. The list of SNPs was mined by cis-expression quantitative trait loci (cis-eQTL) analysis for transcripts within 2 Mb of the SNP position, carried out as described by Zillikens et al. [2]. Other similar datasets were scrutinized, and genes in proximity to SNPs were ranked according to a specifically developed scoring system with the following components: GWASs, eQTLs, Malacards, OMIM, TAD and epigenetic data.
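The 2 Mb cis window described above can be illustrated with a minimal sketch. It is not the pipeline of Zillikens et al. [2]; the `Transcript` record, the function names, and the choice to measure distance from the SNP to the nearest transcript boundary are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List

CIS_WINDOW_BP = 2_000_000  # 2 Mb window stated in the text


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record; field names are illustrative only.
    gene: str
    chrom: str
    start: int
    end: int


def distance_to_snp(transcript: Transcript, snp_pos: int) -> int:
    """Distance in bp from the SNP to the nearest transcript boundary.

    Assumption: a SNP lying inside the transcript has distance 0.
    """
    if transcript.start <= snp_pos <= transcript.end:
        return 0
    return min(abs(transcript.start - snp_pos), abs(transcript.end - snp_pos))


def cis_candidates(snp_chrom: str, snp_pos: int,
                   transcripts: List[Transcript],
                   window: int = CIS_WINDOW_BP) -> List[str]:
    """Return genes whose transcripts lie on the same chromosome within the window."""
    return [
        t.gene
        for t in transcripts
        if t.chrom == snp_chrom and distance_to_snp(t, snp_pos) <= window
    ]
```

Under these assumptions, a transcript on another chromosome, or more than 2 Mb from the SNP, is never tested as a cis-eQTL target.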
For GWASs: 1 point for each significant P value for a locus in any published muscle-related GWAS. For eQTLs: 1 point if the locus has a muscle-specific significant P value (i.e., is an eQTL), and 0 if not. Data were provided by three eQTL studies [2,3,10]. Scoring was as follows: 1 point for each relevant tissue (muscle, tendon or the tibial nerve) in which the gene is expressed. Data were also extracted from the Ensembl database (build 95) [11]. One point was given for each relevant disease with which the gene was associated according to Malacards (version 4.9.0.20) and OMIM. The Malacards relevance score has no upper bound (it ranges from 0 upward); the scoring algorithm is described at https://www.malacards.org/pages/searchguide. For this study, each disease with a Malacards score of more than 3 contributed one point. OMIM was used to verify the Malacards data in mice. One point was given for each muscle-related phenotype with which the gene was associated in mice, according to the Mouse Genome Informatics database (version 6.3.0.14). One point was subtracted each time the gene was shown to be co-expressed with another gene that has a relevant effect or phenotype according to the COXPRESdb gene co-expression database (version 7.1) [15] (available datasets for Homo sapiens). Co-expression reduces the chance that a specific gene is responsible for a given trait; thus a point was subtracted.
Genes of interest were also examined in relation to the active TAD to which they are mapped, to reveal whether a gene is in euchromatin and whether it interacts with other genes that might carry out a similar function or have a similar phenotype to the gene of interest. TADs were analyzed from samples of psoas muscle (striated) and bladder muscle (m. detrusor, smooth muscle). For epigenetic data: 1 point was awarded each time the SNP appeared to be an enhancer in a relevant tissue, according to HaploReg [16].
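The point system above can be summarized in a short sketch. The `GeneEvidence` record and its field names are hypothetical, and the counts are assumed to have been collected beforehand from the databases named in the text. The TAD component is left out because the text does not assign it a numeric value.

```python
from dataclasses import dataclass, field
from typing import List

MALACARDS_MIN_SCORE = 3.0  # a disease counts only if its Malacards score exceeds 3


@dataclass
class GeneEvidence:
    # Hypothetical container; each field holds a pre-computed count or flag.
    gwas_hits: int = 0                # significant loci in muscle-related GWASs
    is_muscle_eqtl: bool = False      # muscle-specific significant eQTL
    expressed_tissues: int = 0        # relevant tissues expressing the gene
    malacards_scores: List[float] = field(default_factory=list)
    mouse_phenotypes: int = 0         # muscle-related phenotypes in mice
    coexpressed_genes: int = 0        # co-expressed genes with a relevant phenotype
    enhancer_tissues: int = 0         # relevant tissues where the SNP is an enhancer


def priority_score(ev: GeneEvidence) -> int:
    """Sum the additive components and subtract the co-expression penalty."""
    disease_points = sum(1 for s in ev.malacards_scores if s > MALACARDS_MIN_SCORE)
    return (
        ev.gwas_hits
        + int(ev.is_muscle_eqtl)
        + ev.expressed_tissues
        + disease_points
        + ev.mouse_phenotypes
        + ev.enhancer_tissues
        - ev.coexpressed_genes
    )
```

Because the co-expression term is subtracted, a gene with many co-expressed partners can rank below a gene with less positive evidence, which matches the rationale given for the penalty.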
Proxies for every variant in the SNP table were extracted using LDlink [17]. The output was filtered according to the following criteria: LD R² = 0.8, location within the region of the gene of interest, and a score of 5 or less according to RegulomeDB [18], which means that the proxies had sufficient data validating their annotation.
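The three filtering criteria can be expressed as a single predicate, sketched below. The `Proxy` record is hypothetical and does not reflect the LDlink output format. Two further assumptions are made: the R² value of 0.8 is treated as a minimum threshold, and a RegulomeDB rank such as "2b" is reduced to its leading integer before comparison with 5.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Proxy:
    # Hypothetical record; not the actual LDlink output schema.
    rsid: str
    pos: int
    r2: float
    regulome_rank: str  # assumed to look like "1a", "2b", "5", "6"


def regulome_level(rank: str) -> int:
    """Extract the leading integer of a RegulomeDB-style rank (assumed format)."""
    match = re.match(r"\d+", rank)
    if match is None:
        raise ValueError(f"unrecognized rank: {rank!r}")
    return int(match.group())


def keep_proxy(p: Proxy, gene_start: int, gene_end: int,
               min_r2: float = 0.8, max_level: int = 5) -> bool:
    """Apply the LD, location, and annotation criteria to one proxy."""
    return (
        p.r2 >= min_r2                      # assumption: 0.8 is a lower bound
        and gene_start <= p.pos <= gene_end
        and regulome_level(p.regulome_rank) <= max_level
    )


def filter_proxies(proxies: List[Proxy], gene_start: int, gene_end: int) -> List[Proxy]:
    """Keep only the proxies that satisfy all three criteria."""
    return [p for p in proxies if keep_proxy(p, gene_start, gene_end)]
```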
If two genes receive the same score, biological data will be used to determine which one is more relevant to the study. Relevant loci will then be knocked out in the C2C12 mouse myoblast cell line. Newly identified loci responsible for LM or HG may be used by the pharmaceutical industry as new drug targets or for repositioning existing drugs in accordance with new data on the activity of these drugs. The greatest hurdle in drug repositioning today is the lack of solid databases needed to conduct efficient repositioning [9]. In the future, this study can serve as a test of whether this approach to gene prioritization can resolve this problem.