Resequencing analysis revealed DNA sequence variations relative to the reference sequence in both the long and short strains, and across the whole genome the variations were detected more frequently in the long strain. The variations were present in genes belonging to several pathways, including caffeine metabolism, tyrosine metabolism, tryptophan metabolism, metabolism of xenobiotics by cytochrome P450, the longevity regulating pathway, and circadian rhythm.
Small variants, namely single-nucleotide variants (SNVs) and multi-nucleotide variants (MNVs) such as deletions, insertions, and replacements, were detected in the long and short strains. The total number of small variants was larger in the long strain than in the short strain (Fig. 1). The most frequent type of small variant was the SNV, which accounted for 86.3% (5,813 / 6,734) of small variants in the long strain and 86.7% (1,279 / 1,476) in the short strain (Fig. 1A). Relative to the reference nucleotide, SNVs occurred most frequently between adenine and guanine or between cytosine and thymine in both strains (Fig. 1B), at frequencies up to three times those of other base combinations, indicating that transitions were more frequent than transversions. Deletions and insertions ranged from one to nine bases in both strains, and deletions or insertions of one and three bases were especially frequent (Fig. 1C). Homozygous variants were more frequent than heterozygous variants in all linkage groups, by a factor of approximately 2 to 15 in the long strain and 2 to 12 in the short strain (Fig. 1D). Homozygous variants were most frequent in linkage group 7 (LG7) in the long strain and in linkage groups 2 (LG2) and 8 (LG8) in the short strain. The ratio of homozygosity to heterozygosity was largest in LGX in the long strain and in LG2 in the short strain.
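The SNV proportions and the transition/transversion distinction described above can be reproduced with a short calculation. The following is a minimal sketch for illustration, not the pipeline used in this study: the function names `snv_proportion` and `classify_snv` are hypothetical, and only the variant counts are taken from the text.

```python
# Illustrative sketch only: helper names are hypothetical,
# and the counts below are the ones reported in the text (Fig. 1A).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snv_proportion(snv_count: int, total_small_variants: int) -> float:
    """Return the percentage of small variants that are SNVs (one decimal)."""
    if total_small_variants <= 0:
        raise ValueError("total_small_variants must be positive")
    return round(100 * snv_count / total_small_variants, 1)


def classify_snv(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion.

    Transitions stay within one base class (A<->G or C<->T);
    transversions switch between purines and pyrimidines.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(snv_proportion(5813, 6734))  # long strain  -> 86.3
print(snv_proportion(1279, 1476))  # short strain -> 86.7
print(classify_snv("A", "G"))      # -> transition
print(classify_snv("A", "C"))      # -> transversion
```

Of the twelve possible single-base substitutions, four are transitions and eight are transversions, which is why an excess of A/G and C/T changes stands out as a bias rather than as a chance outcome.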
Genes with variants were more numerous in the long strain (3,384) than in the short strain (1,075), and 718 genes were shared between the strains (Fig. 2A). Among these genes, the most common number of non-synonymous variants per gene was one in both strains, and the frequency gradually decreased as the number of variants per gene increased (Fig. 2B). The functions of the genes with variants were sorted into four categories by enrichment analyses (Fig. 2C). In the biological process, molecular function, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway categories, the long strain had larger fold enrichment and larger statistical values than the short strain, whereas in the “cellular component” category the short strain had larger fold enrichment and statistical values than the long strain. The Gene Ontology (GO) term “metabolic process” had the largest statistical value in the long strain, and the KEGG Ontology (KO) terms “ECM-receptor interaction”, “ABC transporters”, and “other glycan degradation” had larger statistical values in the long strain than in the short strain.
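The gene counts in Fig. 2A can be split into strain-specific and shared sets by simple set arithmetic. The sketch below is only an illustration under the assumption that the 718 shared genes are included in both per-strain totals; the function name `partition_gene_counts` is hypothetical and is not part of the analysis described here.

```python
# Illustrative sketch only: the helper name is hypothetical.
# Assumes the shared genes are counted within both per-strain totals.
def partition_gene_counts(n_long: int, n_short: int, n_shared: int) -> dict:
    """Split two overlapping gene counts into Venn-diagram regions."""
    if n_shared > min(n_long, n_short):
        raise ValueError("shared count cannot exceed either per-strain count")
    return {
        "long_only": n_long - n_shared,
        "short_only": n_short - n_shared,
        "shared": n_shared,
        "union": n_long + n_short - n_shared,
    }


# Counts reported in the text: 3,384 (long), 1,075 (short), 718 (shared).
regions = partition_gene_counts(3384, 1075, 718)
print(regions)
# -> {'long_only': 2666, 'short_only': 357, 'shared': 718, 'union': 3741}
```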
Structural variations, including large-scale insertions and deletions (InDels), copy number variations (CNVs), and presence/absence variations (PAVs), were analyzed in both strains (Fig. 3). Large-scale insertions and deletions were analyzed in the range of 10 to 390 bases, and insertions and deletions of 15–20 bases were the most frequent in both strains (Fig. 3A). In both strains, CNV deletions were more frequent at sizes from 5 to 16 kb than at the other size ranges examined, whereas CNV duplications were consistently rare, at 0 to 3 cases (Fig. 3B). On a larger size scale, up to 7,000 kb, variations smaller than 500 kb were the most frequent in the long strain (Fig. 3C). All of these variations are mapped onto each linkage group in Fig. 4A, which shows large-scale insertions and deletions appearing consistently in every linkage group (A and B), less frequent CNV duplications (C), and more frequent CNV deletions (D). Large CNV deletions were present in LG6 and LG7 in the long strain and in LG2 in the short strain. These variations were sorted into GO and KO terms (Fig. 4B). The term “neuroactive ligand-receptor interaction” had the largest statistical value in the long strain.
A protein–protein interaction (PPI) network including enzymes involved in dopamine metabolism was constructed (Fig. 5). Tyrosine hydroxylase (Th) was connected with DOPA decarboxylase (Ddc) and dopamine N-acetyltransferase (Dat), and these enzymes have been reported as differentially expressed genes in the long strain in an RNA-seq analysis [13]. Th also had DNA sequence variations in the short strain (Fig. 5). Within the PPI network, proteins with variants were more frequent in the long strain. Yellow-like protein had variants in the long strain, and it was connected indirectly with Ddc and Th and directly with Dat.
Pathways containing genes with variants in both the long and short strains were analyzed for “caffeine metabolism (tca00232)” (Fig. 6A), “tyrosine metabolism (tca00350)” (Fig. 6B), “tryptophan metabolism (tca00380)” (Fig. 6C), “metabolism of xenobiotics by cytochrome P450 (tca00980)” (Fig. 6D), “longevity regulating pathway - multiple species (tca04213)” (Fig. 6E), and “circadian rhythm - fly (tca04711)” (Fig. 6F). The tyrosine metabolism and longevity regulating pathways have previously been listed as pathways containing focal genes that are differentially expressed between the long and short strains, as detected by RNA-seq [13]. Except for circadian rhythm, the number of variants in the genes of these pathways was larger in the long strain than in the short strain (Fig. 6).