Data collection and curation.
Following the workflow described in “Materials and methods” (Figure 1), we screened 25,312 papers and selected 989 publications related to MI, then used 22 different entries to describe the patients. After processing, MIgene contained 664 genes (515 from non-GWAS and 179 from GWAS), 68 phenotypes, 3606 mutants and 7985 studies (Figure 1 and Supplementary Tables S9-S10).
Spermatogenesis is a highly organized process of cell proliferation in the seminiferous tubules and terminal differentiation that produces mature spermatozoa. Disturbed spermatogenesis causes azoospermia, oligozoospermia and other defects of sperm count, motility, and morphology10,36. Across all phenotypes, studies of azoospermia, spermatogenic failure, and oligozoospermia account for 46.47%, 45.37%, and 34.64%, respectively.
Analysis of genes, molecular consequence, variant types for clinical significance.
From the 989 papers about MI, a total of 664 genes were obtained and classified by four categories of clinical significance and two study types (Figure 2A, Table 2 and Supplementary Tables S11-S12). Of these genes, 515 came from non-GWAS and 179 from GWAS, with 30 genes present in both. In addition, 280 genes were associated with more than two types of clinical significance. For example, the c.2039A>G mutant of the FSHR gene showed four types of clinical significance depending on conditions such as phenotype, zygosity and ethnicity (Table 3)37,38, which suggests that the same variant of one gene can yield different clinical significance results. The distribution of clinical significance across all genes and variants is summarized in Supplementary Table S13 and suggests that MI is a multifactorial disease.
Table 2
Distribution of genes among clinical significance for study types.
| Clinical significance / Study type | Sum | non-GWAS | GWAS | Both studies |
|---|---|---|---|---|
| related-damage | 363 | 334 | 46 | 17 |
| related-protection | 47 | 46 | 1 | 0 |
| unrelated | 534 | 410 | 141 | 17 |
| unknown | 76 | 70 | 6 | 0 |
| Total | 664 | 515 | 179 | 30 |
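The counts in Table 2 can be cross-checked with the inclusion-exclusion rule: genes reported in both study types are counted once, so each "Sum" should equal non-GWAS + GWAS - both. The following is a minimal sketch of that check; the counts are transcribed from Table 2, while the names `TABLE2`, `union_size` and `inconsistent_rows` are illustrative and not part of MIgene.

```python
# Counts transcribed from Table 2: (sum, non-GWAS, GWAS, both studies).
TABLE2 = {
    "related-damage": (363, 334, 46, 17),
    "related-protection": (47, 46, 1, 0),
    "unrelated": (534, 410, 141, 17),
    "unknown": (76, 70, 6, 0),
    "Total": (664, 515, 179, 30),
}


def union_size(non_gwas: int, gwas: int, both: int) -> int:
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return non_gwas + gwas - both


def inconsistent_rows(table: dict) -> list:
    """Return the row names whose 'Sum' differs from the inclusion-exclusion value."""
    return [
        name
        for name, (total, non_gwas, gwas, both) in table.items()
        if union_size(non_gwas, gwas, both) != total
    ]


if __name__ == "__main__":
    # An empty list means every row is internally consistent.
    print(inconsistent_rows(TABLE2))
```

The four category rows add up to more than the 664 genes in the "Total" row because, as stated in the text, one gene can carry more than one type of clinical significance.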
Table 3
The partial results of rs6166 c.2039A>G (p. Asn680Ser) of FSHR gene.
| Studies ID | PMID | Position | Zygosity | Clinical significance | Design type | Study type | Population origin | Phenotypes |
|---|---|---|---|---|---|---|---|---|
| S000658 | 20454649 | 48962782 | heterozygous | related-damage | case-control | non-GWAS | Turkish | spermatogenesis impairment, male infertility: non-obstructive azoospermia |
| S000659 | 20454649 | 48962782 | heterozygous | related-protection | case-control | non-GWAS | Turkish | spermatogenesis impairment, male infertility: severe oligozoospermia |
| S000660 | 20454649 | 48962782 | homozygous | related-protection | case-control | non-GWAS | Turkish | spermatogenesis impairment, male infertility: non-obstructive azoospermia |
| S000661 | 20454649 | 48962782 | homozygous | unrelated | case-control | non-GWAS | Turkish | spermatogenesis impairment, male infertility: severe oligozoospermia |
| S000665 | 10022448 | 48962782 | heterozygous | unrelated | case-control | non-GWAS | German | male infertility: non-obstructive azoospermia, severe oligozoospermia |
| S000666 | 10022448 | 48962782 | homozygous | unrelated | case-control | non-GWAS | German | male infertility: non-obstructive azoospermia, severe oligozoospermia |
| S002696 | 17169197 | 48962782 | heterozygous | unrelated | case-control | non-GWAS | Italian | spermatogenesis impairment, male infertility: non-obstructive azoospermia |
| S002702 | 17169197 | 48962782 | heterozygous | unrelated | case-control | non-GWAS | Italian | spermatogenesis impairment, male infertility: severe oligozoospermia |
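To illustrate how one variant can carry several clinical significance labels, the rows of Table 3 can be grouped by population and tallied by label. This is only a sketch: the tuples are transcribed from Table 3 (with the position, design type and study type columns omitted and the phenotypes shortened), and the record layout and function names are assumptions made for the example, not the MIgene schema.

```python
from collections import Counter, defaultdict

# Rows transcribed from Table 3 (rs6166, FSHR). Hypothetical layout:
# (study_id, pmid, zygosity, clinical_significance, population, phenotype)
ROWS = [
    ("S000658", "20454649", "heterozygous", "related-damage", "Turkish", "non-obstructive azoospermia"),
    ("S000659", "20454649", "heterozygous", "related-protection", "Turkish", "severe oligozoospermia"),
    ("S000660", "20454649", "homozygous", "related-protection", "Turkish", "non-obstructive azoospermia"),
    ("S000661", "20454649", "homozygous", "unrelated", "Turkish", "severe oligozoospermia"),
    ("S000665", "10022448", "heterozygous", "unrelated", "German", "non-obstructive azoospermia, severe oligozoospermia"),
    ("S000666", "10022448", "homozygous", "unrelated", "German", "non-obstructive azoospermia, severe oligozoospermia"),
    ("S002696", "17169197", "heterozygous", "unrelated", "Italian", "non-obstructive azoospermia"),
    ("S002702", "17169197", "heterozygous", "unrelated", "Italian", "severe oligozoospermia"),
]


def significance_by_population(rows):
    """Map each population to the set of clinical significance labels seen in it."""
    grouped = defaultdict(set)
    for _study, _pmid, _zygosity, significance, population, _phenotype in rows:
        grouped[population].add(significance)
    return dict(grouped)


def significance_counts(rows):
    """Count how many study records carry each clinical significance label."""
    return Counter(row[3] for row in rows)


if __name__ == "__main__":
    for population, labels in significance_by_population(ROWS).items():
        print(population, sorted(labels))
    print(significance_counts(ROWS))
```

Because Table 3 shows only part of the records for this variant, the grouping covers three of the four labels mentioned in the text.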
There were 103 genes (non-GWAS: 85 genes, GWAS: 19 genes) found exclusively in "related-damage" patients, corresponding to 38 phenotypes. Counting the genes for each phenotype showed that the top three phenotypes were spermatogenic failure (59 genes), azoospermia (47 genes) and asthenospermia (20 genes) (Figure 2B and Supplementary Table S14).
Furthermore, the top three molecular consequences in MIgene were missense (37.5%), intron (19.1%) and synonymous (10.2%) variants (Figure 2C). In the related-damage group, the top three were missense (44.3%), intron (10.1%) and splice site (9.8%) variants (Figure 2D). Notably, intronic mutations could affect MI, consistent with the reported extent and functional significance of intron retention39.
The comprehensive collection in the MIgene database gave us an overview of related-damage genes across chromosomes. The gene ontology analysis revealed that every chromosome except chromosome 21 carried a certain number of these genes (Figure 2E and Supplementary Table S15). Importantly, many mtDNA genes participated in MI. For example, the mtDNA 4977 deletion was found to be related to MI40,41.
Enrichment analysis of genes and phenotypes.
To find further evidence for the association between genes and phenotypes, an enrichment analysis was performed based on the hypergeometric distribution. The enrichment results indicated that the larger the number of samples for an enriched item in the database, the more stable the enrichment results (Figure 3). Taking oligoteratozoospermia as an example, the prioritization of MI candidate genes in the related-damage group is presented in Figure 3A. The gene most relevant to oligoteratozoospermia was PLOG.
It is well known that one gene can give rise to different phenotypes, so the phenotype enrichment rank for a gene was explored further. Using PLOG as a training gene, the phenotypes were ranked by enrichment and plotted (Figure 3B). The top-ranked phenotype was oligoteratozoospermia, in accord with Figure 3A.
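As an illustration of the hypergeometric principle, the sketch below computes a one-sided enrichment p-value, that is, the probability of observing at least k studies of a given gene among the n studies of a phenotype when K of the N studies in the database involve that gene. The one-sided upper tail is a common formulation and is assumed here; the exact statistic and any multiple-testing correction used by MIgene are not stated in this section, and the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of studies in the database (assumed background)
    K: studies involving the gene of interest
    n: studies annotated with the phenotype of interest
    k: studies involving both the gene and the phenotype
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("K and n must lie between 0 and N")
    upper = min(K, n)
    # Sum the probability mass from k up to the largest possible overlap.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical toy numbers, not taken from MIgene.
    print(hypergeom_enrichment_p(N=10, K=4, n=3, k=2))
```

A smaller p-value indicates that the gene and the phenotype co-occur in more studies than expected by chance. With few studies for an item, adding or removing a single study shifts the p-value substantially, which is in line with the observation that larger sample numbers give more stable enrichment results.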
Data search and navigation.
MIgene provides users with a powerful, multi-faceted search engine and a user-friendly interface to access, browse and retrieve different data types and analysis results. The website interface comprises seven sections: “Home”, “Browser”, “Submit case”, “Download”, “Tutorial”, “Contact” and “Analysis” (Figure 4). The “Home” page provides a brief introduction to MI, the information accessible in the database, and gene or phenotype search. There are four search modules: ‘Gene Symbol’, ‘Phenotype’, ‘rs_ID’ and ‘Mutant’. Search terms are auto-completed after a few letters are typed in the corresponding search box and are cross-accessible through inter-linkages. After selecting “Browser” in the navigation bar, users can freely browse the complete list of MI entries, including genes, related phenotypes, clinical significance, and supporting evidence. On the “Submit case” page, users can submit new genes, mutants, and phenotypes to the database. These data will be stored, curated and then entered into the database. In addition, MIgene will be updated periodically according to the latest publications. The “Tutorial” page presents the database’s guidelines.
MIgene provides a detailed report for each gene. First, taking the gene FSHR as an example, MIgene shows basic information on the gene and protein, including protein sequence annotations, function analysis and related external databases such as OMIM28, InterPro42, KEGG33, GO32, String31 and Compartment. Homology, enriched phenotypes, and co-expressed proteins are also provided. For the variant information of FSHR, users can view the variant types and their statistical results and download the filtered contents at any time. Clicking the “view” button displays the full details of the genomic mutation. The numbers of phenotypes and clinical significance categories associated with the gene are also counted. For example, oligozoospermia is one of the phenotypes related to FSHR. It was addressed by 109 studies, divided into four groups: 5 related-damage, 5 related-protection, 97 unrelated and 2 unknown. Second, for each phenotype, MIgene gives the definition, the number of studies, the enriched genes and other contents, including SNPs, indels, deletions, duplications, insertions, and related clinical significance. Third, in the rs_ID module, MIgene shows the basic information of the rs_ID, the number of studies and the clinical significance statistics. Finally, the database provides a powerful and convenient way to search for the mutants of genes and phenotypes for MI.