Comparative analysis of mitochondrial genomes in different species reveals the evolution of G-quadruplexes

G-quadruplexes (G4s) are noncanonical structures that can form in the genomes of a range of organisms and are known to play various roles in cellular function. G4s can also form in mitochondrial DNA (mtDNA) because of their high guanine content, and these G4s may play roles in regulating gene expression, DNA replication, and genome stability. However, little is known regarding the evolution and dissemination of G4s in mitochondria. Here we analyzed the potential G4-forming sequences in mtDNA of 16 species from various families and demonstrated that the heavy strand of mtDNA of higher-order organisms contained higher levels of G4 regions than that of lower-order organisms. Analysis of the codons in the light strand revealed enrichment of guanine/cytosine-rich regions in higher eukaryotes and of adenine/thymidine-rich regions in lower-order organisms. Our study showed the diversity of G4s in species ranging from lower to higher orders. In particular, mammals such as humans, chimpanzees, and monkeys display a greater number of G4s than lower-order organisms. These potentially play a role in a range of cellular functions and assist in the evolution of higher organisms.


Introduction
Mitochondria are known as the powerhouse of the cell because of their ability to generate adenosine triphosphate, which is essential for normal cellular function. They also play a vital part in various cellular functions including calcium homeostasis, apoptosis, stem cell generation, and heme synthesis. 1,2 Most eukaryotic mitochondrial DNAs (mtDNAs) are double-stranded, circular molecules that typically occur at several hundreds to thousands of copies per cell. In the majority of eukaryotes, nearly 90% of mtDNA comprises coding regions, unlike nuclear DNA, and its genetic code differs slightly from that of nuclear DNA. 3 Most mammalian mtDNAs encode 37 genes, including 13 genes that form the essential subunits of the mitochondrial respiratory chain complexes, while the organism's remaining genes are encoded by the nuclear genome. 4 MtDNA consists of two strands that can be distinguished by their nucleotide composition and are termed heavy (H) and light (L) strands. Typically, mtDNA strands are separated on the basis of density using the classical biochemical technique ultracentrifugation, producing the H strand, which has a high guanine (G) + thymidine (T) content, and the L strand, which contains low G + T content. 5 The number of coding sequences in the H and L strands is often confused, with the H strand being assumed to contain a large number of coding sequences and L strand fewer. 6,7 However, a detailed analysis of most vertebrate mitochondrial sequences demonstrated that the L strand is the main coding strand. 6,7 Mitochondria are proposed to have evolved from endosymbiotic bacteria, and phylogenetic analysis confirms that the lineage of the mtDNA is closely related to that of alphaproteobacteria. 8,9 This has raised a number of intriguing questions for the fields of mitochondrial and evolutionary research, such as how mitochondria integrate and adapt within the host, and whether mitochondria played an important role in the transition from prokaryote to eukaryote. The advent of the genomics era has allowed researchers to try to solve these mysteries.
Nucleic acids are known to form structures other than the traditional double helix. A G-quadruplex (G4) is a stable secondary structure of nucleic acid that can arise from single-stranded G-rich DNA and RNA sequences. 10 G4s are formed by Hoogsteen hydrogen bonding between four Gs to form a planar G-tetrad. They exhibit extremely high stability under physiological conditions in the presence of monovalent metal cations (such as Na+, K+, and Li+) and are resistant to degradation by nucleases. 11 G4 regions in the nuclear genome form in G-rich sequences and have been visualized in human cells. 12 The formation of such G4s plays a role in various biological functions such as transcription, DNA replication, DNA damage repair, and telomere maintenance. Most importantly, many studies have shown that G4 structures are prevalent in the promoter regions of genes and play an important role in the regulation of gene expression. 13 Many in vitro techniques and computational analysis methods have been developed to explore the background characteristics of G4 sequences and their distribution in the genome of various organisms. One study mapped the formation of G4s in the DNA of multiple species of bacteria, plants, and eukaryotes including human, mouse, and Drosophila, validating the vast majority of G4 structures predicted by computational analysis and demonstrating the existence of G4s in the genome of a range of species. 14 The role of DNA secondary structures and their contribution to the evolution of species is poorly explored. One study comparing the genomes of multiple species showed that DNA secondary structures appeared in genomes from lower organisms and gradually evolved from lower to higher organisms. 15 It is hypothesized that the increase in G4 structures might facilitate the development of new gene regulatory mechanisms to achieve increasingly complex cellular, physiological, and behavioral activities. 16 Another study using a comparative bioinformatic analysis of seven species of Saccharomyces revealed that the G4 structures are relatively conserved throughout its evolution. 17 MtDNA is known to form more G4 secondary structures than genomic DNA because of the high G content of the H strand, but the evolution of G4s in mitochondria from lower- to higher-order organisms has not been investigated.
In this study, we performed a sequence-wide predictive analysis of G4 motifs in the mtDNA of different species ranging from unicellular to multicellular species and analyzed their evolutionary pattern. We found that the number of G4 DNA motifs increased from lower- to higher-order organisms, suggesting that mitochondria have evolved over time. The increase in the number of G4s with increasing complexity of the organism suggests that they have a potential functional role in facilitating this increasing complexity.

Identification of G4s in mitochondria of multiple species
Using the QGRS mapping tool, 18 we identified G4 motifs in the mtDNA of 16 different species belonging to different families ranging from unicellular to multicellular. Most mitochondrial genomes are between 14 and 17 kb, although the Plasmodium mtDNA is relatively small at 6 kb. We analyzed both the H and L strands and found that most of the G4-forming sequences lie on the H strand because of its greater G content. The mitochondrial reference sequences for the different species were downloaded from the University of California Santa Cruz genome browser, and a phylogenetic tree was generated by performing multiple-sequence alignment using ClustalW followed by maximum likelihood analysis. The comparative analysis of the mitochondrial sequences of organisms from Saccharomyces to Homo sapiens showed an increasing number of G4s in the H strand of higher-order organisms (Figure 1). The number of G4 motifs ranged from a minimum of two in Drosophila to a maximum of 172 in H. sapiens. However, an analysis of the location of the G4-forming regions in various organisms identified an enrichment of G4 motifs in transcription start sites from lower- to higher-order organisms. 15 It was suggested that this characteristic plays a key role in various epigenetic mechanisms and helps the organism to evolve to meet physiological and biological challenges. The analysis of G4-forming sites in the mtDNA of various species showed an increase in G4 motifs with greater complexity of the organism (Figure 2b). The GC content was also found to increase with the complexity of the organism, suggesting that it correlates with increased G4 formation and plays a role in evolution.
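The strand-wise motif search described above can be illustrated with a minimal sketch. It does not reproduce the QGRS Mapper algorithm or its scoring; as a stand-in, it assumes a simplified and commonly used pattern of four runs of at least three Gs separated by loops of 1 to 7 nucleotides. The function names and the toy sequence are hypothetical, and scanning the reverse complement is used here only as a way of examining the opposite strand.

```python
import re

# Simplified stand-in pattern (an assumption, not the QGRS Mapper algorithm):
# four runs of at least three Gs separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


def count_g4_both_strands(seq: str) -> dict[str, int]:
    """Count putative G4 motifs on the given strand and on its reverse complement."""
    return {
        "given_strand": len(find_g4_motifs(seq)),
        "complementary_strand": len(find_g4_motifs(reverse_complement(seq))),
    }


if __name__ == "__main__":
    toy = "TTGGGTTAGGGTTAGGGTTAGGGAA"  # hypothetical toy sequence, not real mtDNA
    print(find_g4_motifs(toy))
    print(count_g4_both_strands(toy))
```

Because the pattern is stricter than a scored search that also admits shorter G runs, counts from this sketch would not be expected to match the numbers reported in the text.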

Potential role of mtG4s in different species
The G4 distribution in the mtDNA of the 16 species from yeast to mammals was plotted with respect to their genome size (Figure 3). The Circos plot shows the varying GC content of the organisms; the GC skew was also markedly higher in the higher-order organisms, as is evident from the plot. The density of the G4 motifs in these species indicated an even distribution in the mitochondrial genome, but they were clustered densely in the higher mammalian species with a stepwise reduction to lower organisms. Interestingly, most of the G4s are located on the H strand, which acts as a template for very few genes, rather than the L strand, which has a high C content and acts as the template for most genes. Mitochondria are well known to accumulate reactive oxygen species (ROS), which are known to oxidize DNA. Of the four DNA bases, G is most frequently oxidized (to 8-oxo-7,8-dihydroguanine; 8-OG) by ROS because of its low redox potential. Because G4s are composed of several runs of Gs, they become an easy target for ROS that can affect DNA stability and in turn can modify cellular processes. 19 The increase of G4 in the H strand of mtDNA may act as a cellular stress to control mitochondrial replication. The formation of 8-OG in G4s has several implications: it is known to act as an on/off switch for transcription and may play an important role in the processes of mitochondrial replication and repair. 20

Figure 3. G4 localization in the mtDNA of 16 species based on the phylogenetic analysis. From the outside in, the circles show the organism and its mtDNA, the GC content (line plot), the GC skew, the localization of the G4s (tiles), and the score of the G4-forming regions (heatmap).
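The GC content and GC skew tracks of such a plot can be computed along the following lines. This is a minimal sketch that assumes the conventional definitions, GC content = (G + C) / length and GC skew = (G - C) / (G + C), evaluated in sliding windows. The window and step sizes are not given in the text and are left as caller-supplied parameters, and the function names are hypothetical.

```python
from typing import Iterator


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); returns 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int) -> Iterator[tuple[int, float, float]]:
    """Yield (start, gc_content, gc_skew) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)


if __name__ == "__main__":
    toy = "GGGGCCCCATATGGGC"  # hypothetical toy sequence, not real mtDNA
    for start, content, skew in sliding_windows(toy, window=4, step=4):
        print(start, content, skew)
```

A positive skew in a window indicates more G than C on the strand being read, which is the property that distinguishes the G-rich H strand from the C-rich L strand.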
Human mtDNA contains 37 genes coding for two ribosomal RNAs, 22 transfer (t)RNAs, and 13 polypeptides. All 37 genes have homologs in most other species, although the arrangement of the genes varies between species. Mitochondrial translation differs from the cellular process, the key features being a reduced number of tRNAs and a reassignment of codons. To evaluate the consequences of increasing the level of G in the H strand compared with the major coding (L) strand, we compared the usage of 60 codons other than the four stop codons and ATG in the highly translated genes of the L strand from 11 species and normalized the data. As shown in Figure S2, we found that the lower eukaryotes have high levels of A- or T-rich codons, whereas the higher eukaryotes are rich in G or C. The levels of codons without C and G, such as TTT, ATT, and TTA, were unusually high in the lower-order organisms Drosophila, Caenorhabditis elegans, and Saccharomyces. We next compared the frequency of amino acids by normalizing using eight major coding genes from the L strand of the 11 species (Figure S3). We found high levels of leucine in higher-order organisms, where it is encoded by a CG-rich codon, whereas the lower-order organisms had high levels of phenylalanine, which is encoded by AT-rich codons. In addition, the lower-order organisms showed a strong preference for codons low in G or C content for all amino acids, indicating a strong codon bias, while the higher-order organisms used the codons equally with a slight preference for C-rich codons.
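The normalized codon-usage comparison can be sketched as follows. This is an illustrative outline only: the normalization used in the study is not specified, so relative frequency within a single coding sequence is assumed here. The set of excluded codons is left to the caller because stop codons differ between mitochondrial genetic codes, and the function names and toy sequence are hypothetical.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def relative_codon_usage(cds: str, exclude: frozenset = frozenset()) -> dict[str, float]:
    """Relative frequency of each codon after dropping the excluded codons."""
    kept = {codon: n for codon, n in codon_counts(cds).items() if codon not in exclude}
    total = sum(kept.values())
    return {codon: n / total for codon, n in kept.items()} if total else {}


def at_only_fraction(usage: dict[str, float]) -> float:
    """Summed frequency of codons that contain neither G nor C (e.g. TTT, ATT, TTA)."""
    return sum(freq for codon, freq in usage.items() if set(codon) <= {"A", "T"})


if __name__ == "__main__":
    toy_cds = "ATGTTTTTTATTGGC"  # hypothetical toy coding sequence
    usage = relative_codon_usage(toy_cds, exclude=frozenset({"ATG"}))
    print(usage)
    print(at_only_fraction(usage))
```

Comparing `at_only_fraction` across species is one simple way to summarize the A/T-rich versus G/C-rich codon preference described in the text.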
We then compared the codon usage of human mitochondrial genes with that of species such as chimpanzee, mouse, and C. elegans. We calculated the increase of C in the mtDNA across eight different protein-coding genes by aligning with the respective amino acid encoded, and devised a score for comparison. Based on the total C increase in humans, the respective amino acid codons in other organisms were given scores of 1, 2, 3 and -1, -2, -3 for the relative frequency of C (Table 1). This ideally should show the overall change in C content in the protein-coding genes compared with that in higher-order species. Most interestingly, we identified a greater increase in the C content of human mtDNA genes when compared with that of C. elegans. The overall increase in C content of the human mtDNA genes was smaller relative to mouse than relative to C. elegans.
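One possible reading of this scoring scheme is sketched below. The exact scoring rule is not spelled out in the text, so the sketch assumes that each aligned codon pair is scored by the difference in the number of C bases between the reference (human) codon and the codon of the other species, which gives values from -3 to 3. The function names, the gap handling, and the toy sequences are hypothetical.

```python
def codon_c_score(reference_codon: str, other_codon: str) -> int:
    """Difference in C count (reference minus other) for one aligned codon pair."""
    return reference_codon.upper().count("C") - other_codon.upper().count("C")


def per_codon_c_scores(reference_cds: str, other_cds: str) -> list[int]:
    """Score every aligned codon pair; pairs containing a gap ('-') are skipped."""
    if len(reference_cds) != len(other_cds) or len(reference_cds) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    scores = []
    for i in range(0, len(reference_cds), 3):
        ref, other = reference_cds[i:i + 3], other_cds[i:i + 3]
        if "-" in ref or "-" in other:
            continue  # assumption: alignment gaps carry no score
        scores.append(codon_c_score(ref, other))
    return scores


def total_c_score(reference_cds: str, other_cds: str) -> int:
    """Net gain (positive) or loss (negative) of C in the reference sequence."""
    return sum(per_codon_c_scores(reference_cds, other_cds))


if __name__ == "__main__":
    reference = "CCCCTAGCC"  # hypothetical codon-aligned toy sequences
    other = "TTTCTAGCT"
    print(per_codon_c_scores(reference, other))
    print(total_c_score(reference, other))
```

Under this reading, a larger positive total for a distantly related species than for a closely related one would correspond to the pattern reported for C. elegans versus mouse.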
The chimpanzee, which is closely related to human, showed only a slight reduction in C content relative to human, indicating a higher C content than in the other species. The increase in C content from the lower-order to higher-order organisms may contribute to a functional role for the mitochondria in the development of the organism to adapt to its physiological needs and evolve. Interestingly, the GRSF1 protein is found only in vertebrates, which are known to have a high level of G4-forming sequences. During evolution, only the mtDNA has undergone a change from GC-poor to GC-rich involving some type of dedicated mechanism for control of G4-forming lncRNA. However, the significance of high levels of G4-forming lncRNA in the mitochondria has not been identified.
The nuclear lncRNA telomere repeat-containing RNA (TERRA), which is G-rich and has the potential to form G4s, is known to play an important role in the regulation of telomerase activity and in the protection of the DNA. TERRA is also known to play a role in protecting the cell from oxidative stress by scavenging ROS. 24 The mitochondrial lncRNAs may likewise play a role in scavenging ROS because the mitochondrion is a hot spot for various ROS that are deleterious to mtDNA.

Summary
In this study, the mtDNA of a range of species from lower orders to higher orders was selected and analyzed for G4 motifs using QGRS. Although this algorithm is widely used to predict G4 formation in nucleic acid sequences, it may have limitations in the identification of actual G4-forming motifs inside cells. However, it can be used as a preliminary tool to detect intracellular G4-forming sites, and with the advent of next-generation sequencing techniques, it is now possible to map these G4-forming sites in cells. Sequencing has confirmed that most of the computationally predicted sequences do form G4 in cells. The comparative analysis of the mtDNA of various species revealed an increased number of G4 sites in higher organisms. These may have played various roles in the evolution of multicellular organisms by regulating transcription and replication and scavenging ROS inside the cell. We also compared the C content of eight protein-coding mtDNA genes of humans with that of three other species (chimpanzee, mouse, and C. elegans) that are related more or less closely based on a phylogenetic tree. This analysis clearly identified an increase of C in the coding genes of humans in comparison with C. elegans and to a lesser extent with chimpanzees, the species most closely related to human. The increase in C in higher organisms may play a role in the development of the organism to meet changing physiological demands and other factors. The noncoding transcripts in mtDNA have unusually high levels of G compared with genomic transcripts. The increase in G in these noncoding transcripts means that they have a high likelihood of forming G4 structures, because it is well-known that RNA can also form secondary structures such as G4 that render it very stable. Because mitochondria are well known to produce ROS, the G4-forming RNA may act as a scavenger or sensor for ROS inside the cell.
In summary, over long-term evolution, the level of G4 has gradually increased from lower- to higher-order organisms. This increase in G4 may have facilitated many aspects of mitochondrial replication and transcription to allow the development of multicellular organisms to meet cellular needs. The increase in G content may also have played a key role in acting as a signal and scavenger for ROS in cells. Our analysis may assist in an initial understanding of mtDNA evolution with respect to secondary structure, but further detailed analysis and confirmation of G4s in the mtDNA of various species is required to identify the potential role of G4.