Spectrum characteristics of motif subsets
To separate the three components of the tri-modal spectrum, we divided the total set of motifs into XY0, XY1 and XY2 motif subsets (X, Y = A, C, G, T) according to the XY dinucleotide classification method (Online methods), and obtained the spectra of these subsets for the human, chicken and yeast genomes.
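The classification step can be sketched in Python. This is a minimal sketch under stated assumptions, not the Online-methods implementation. It assumes that a k-mer belongs to XY0, XY1 or XY2 according to whether it contains zero, one, or two or more occurrences of the dinucleotide XY. It also assumes that a spectrum records, for each occurrence frequency, how many k-mers have that frequency. The helper names `count_kmers`, `xy_index` and `subset_spectra` are hypothetical.

```python
from collections import Counter
from itertools import product


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(b in "ACGT" for b in kmer):
            counts[kmer] += 1
    return counts


def xy_index(kmer: str, xy: str) -> int:
    """Occurrences of dinucleotide xy in kmer, capped at 2.
    ASSUMPTION: XY0 / XY1 / XY2 = 0 / 1 / >=2 occurrences of XY."""
    n = sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == xy)
    return min(n, 2)


def subset_spectra(seq: str, k: int, xy: str) -> dict:
    """Spectrum of each XYi subset as {frequency: number of k-mers with that frequency}.
    All 4**k k-mers are enumerated, so absent k-mers fall into the frequency-0 bin."""
    counts = count_kmers(seq, k)
    spectra = {0: Counter(), 1: Counter(), 2: Counter()}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        spectra[xy_index(kmer, xy)][counts[kmer]] += 1
    return spectra
```

Under these assumptions, `subset_spectra(genome_sequence, 8, "CG")` would return the CG0, CG1 and CG2 8-mer spectra of a genome.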
We found that only the CG0, CG1 and CG2 motif subsets each show an independent unimodal spectrum (Fig. 1b). In the human genome, the spectra of the CG2, CG1 and CG0 subsets correspond exactly to peak2, peak1 and peak0 of the total motif spectrum, respectively. The motifs underlying the tri-modal spectrum are therefore cleanly separated. The spectra of the XY0, XY1 and XY2 subsets of the other 15 dinucleotides lack this feature: they remain tri-modal, like the spectrum of the total motifs. In the other two representative genomes, the CG0, CG1 and CG2 subsets show the same distribution features as in the human genome, even though their total motif spectra are not tri-modal (Fig. 1c). The multi-modal motif spectrum of a genome can thus be explained by the spectra of the CG2, CG1 and CG0 motif subsets. If the three subset spectra lie far apart, their superposition, the total motif spectrum, is tri-modal. If they lie very close together, the superposition is unimodal.
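The superposition argument can be illustrated numerically. The sketch below is only a toy. It models the three subset spectra as equal-weight normal curves with a common standard deviation, which is an assumption and not a fit to any genome, and then counts the local maxima of their sum. The function names `mixture_density` and `count_modes` are hypothetical.

```python
import math


def mixture_density(x: float, means: list, sd: float) -> float:
    """Equal-weight mixture of normal densities sharing one standard deviation."""
    norm = 1.0 / (sd * math.sqrt(2 * math.pi) * len(means))
    return norm * sum(math.exp(-0.5 * ((x - m) / sd) ** 2) for m in means)


def count_modes(means: list, sd: float = 1.0, steps: int = 4000) -> int:
    """Count local maxima of the mixture on an evenly spaced grid."""
    lo, hi = min(means) - 5 * sd, max(means) + 5 * sd
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_density(x, means, sd) for x in xs]
    # A mode is a grid point above its left neighbour and not below its right one,
    # so a two-point plateau at a peak is counted once.
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] >= ys[i + 1])
```

With centers 0, 10 and 20 (far apart relative to the spread) the sum has three modes. With centers 0, 0.5 and 1 it has a single mode.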
CG independent selection law of genome sequences
We analyzed the spectra of the total motifs and of the XY0, XY1 and XY2 subsets of all 16 dinucleotides in animal genomes. In the 49 mammal genomes with clearly tri-modal motif spectra, the XY0, XY1 and XY2 subset spectra behave as in the human genome. In these genomes the total motifs are strictly separated into three independent subsets: CG0, CG1 and CG2 (Fig. 1c). We named this property the Evolution Independence of genome sequences. We also compared the spectra with the center frequency of the corresponding random sequences. The CG0 spectrum is distributed around the random center, whereas the most probable frequencies of the CG1 and CG2 spectra lie below it. This indicates that the occurrence frequencies of CG0 motifs result from random selection, whereas those of CG1 and CG2 motifs result from directional selection. We named this property the Evolution Selectivity of genome sequences. In addition, the CG1 and CG2 spectra are much narrower than the CG0 spectrum (Fig. 1c), meaning that the occurrence frequencies of CG1 and CG2 motifs are more conserved than those of CG0 motifs. We named this property the Evolution Conservatism of genome sequences. In general, k-mers with directionally selected and conserved usage are regarded as functional motifs. In mammal genomes, only the spectra of the three CG motif subsets have all three properties. None of the XY2, XY1 and XY0 spectra of the other 15 dinucleotides satisfy all three simultaneously. We named this phenomenon the CG Independent Selection Law of genome sequences, abbreviated as the CG independent selection law.
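The comparison with the random center can be sketched as follows. This sketch assumes that the "corresponding random sequence" is an i.i.d. sequence with the genome's base composition. Under that model, a k-mer's expected count is (L - k + 1) times the product of its base probabilities, where L is the sequence length. The source does not specify the random model, and the helpers `base_composition`, `random_center` and `most_probable_frequency` are hypothetical.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Fraction of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def random_center(kmer: str, seq_length: int, comp: dict) -> float:
    """Expected count of kmer in an i.i.d. random sequence of the given length.
    ASSUMPTION: stand-in for the paper's 'corresponding random sequences'."""
    p = 1.0
    for b in kmer:
        p *= comp[b]
    return (seq_length - len(kmer) + 1) * p


def most_probable_frequency(spectrum: dict) -> int:
    """Mode of a spectrum given as {frequency: number of k-mers}."""
    return max(spectrum, key=spectrum.get)
```

Under a uniform composition every k-mer has the same center, (L - k + 1)/4^k, so the mode of a CG1 spectrum can be compared directly with that single value. With a skewed composition, the center differs from one k-mer to another.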
In the 63 other vertebrate and invertebrate genomes, the motif spectra are quasi-bimodal or unimodal. Even so, the spectra of the CG0, CG1 and CG2 motif subsets also obey the CG independent selection law (Fig. 1c).
TA independent selection law of genome sequences
In plant, fungal and bacterial genomes, the CG independent selection law is evident in some species but not in others. By examining the spectra of the XY2, XY1 and XY0 motif subsets in these genomes, we found a second type of independent selection law. The spectra of the TA2, TA1 and TA0 motif subsets also show the three properties: evolution independence, evolution selectivity and evolution conservatism (Fig. 1e). We named this the TA Independent Selection Law. Apart from the CG and TA subsets, the XY2, XY1 and XY0 spectra of the other 14 dinucleotides do not satisfy all three properties simultaneously.
Based on these results, we re-examined the spectra of all motif subsets in all genomes, from human to bacteria. We found that the CG and TA independent selection laws coexist in genomes. In general, the CG independent selection law is more evident in the genomes of higher organisms, and the TA independent selection law is more evident in the genomes of lower organisms.
Quantitative characterization of CG and TA independent selections
To study the two independent selections, we characterized the spectra quantitatively. In most genomes, the spectra of the total motifs and of the XY motif subsets are unimodal. These unimodal distributions are not normal. They resemble a χ2 distribution with few degrees of freedom, with the mass concentrated on the left and a long right tail. For such distributions, the mean frequency is correlated with the standard deviation. We wanted to use the mean frequency and the standard deviation to describe the location and the spread of a spectrum independently. For that purpose, the observed distributions should be transformed to be as close to normal as possible. After trying several options, we applied a square-root transformation to all motif frequencies. The transformed distributions are very close to normal (Fig. 1d, f).
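The effect of the square-root transformation on a right-skewed, χ2-like distribution can be checked with a small simulation. The values below are simulated, not motif frequencies. They are drawn from a χ2 distribution with 3 degrees of freedom, generated as a gamma variate with shape 1.5 and scale 2. The sample skewness serves only as a rough indicator of how far a distribution is from normal.

```python
import math
import random
import statistics


def skewness(values: list) -> float:
    """Sample skewness: third central moment divided by the cubed population SD."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


rng = random.Random(0)  # fixed seed so the illustration is reproducible
# Simulated right-skewed values: chi-square with 3 df = gamma(shape=1.5, scale=2)
raw = [rng.gammavariate(1.5, 2.0) for _ in range(20000)]
root = [math.sqrt(v) for v in raw]  # the square-root transformation

print(f"skewness before: {skewness(raw):.2f}, after sqrt: {skewness(root):.2f}")
```

On data of this shape, the skewness drops substantially after the transformation. This is the qualitative behaviour the text relies on.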
Based on the mean frequency and standard deviation of the total motif spectra and the XY motif spectra, we characterized each spectrum with two parameters, separability and conservatism (Online methods). For a given spectrum, the separability is denoted δXYi and the conservatism ρXYi (X, Y = A, C, G, T; i = 0, 1, 2). Both parameters eliminate the effects of absolute motif frequency and genome size. They can therefore be used to compare spectra within a genome and across genomes. For the CG0, CG1 and CG2 spectra of a genome, the separability values are denoted δCG0, δCG1 and δCG2, and the conservatism values ρCG0, ρCG1 and ρCG2. For the TA0, TA1 and TA2 spectra, they are denoted δTA0, δTA1, δTA2 and ρTA0, ρTA1, ρTA2, respectively. The values of these parameters for 920 genomes are listed in Additional file 1: Table S1.
To assess the sensitivity of separability and conservatism, we calculated both parameters for the 16 kinds of XY0, XY1 and XY2 motif spectra of the 920 genomes and computed their variances across genomes. The variances of the separability and conservatism of the CG1/CG2 and TA1/TA2 spectra are markedly greater than those of the other XY1/XY2 spectra (Fig. 2a, b). This means that the CG1/CG2 and TA1/TA2 spectra are more sensitive to the species genome than the other XY1/XY2 spectra. For the 16 kinds of XY0 spectra, the variances are very small (Fig. 2c). The CG0/TA0 spectra are therefore much less sensitive to the species genome than the CG1/CG2 and TA1/TA2 spectra. We used the separability and conservatism of the CG1 and CG2 spectra to represent the intensity of CG independent selection in a genome. Likewise, we used those of the TA1 and TA2 spectra to represent the intensity of TA independent selection. Some other XY spectra still show weak sensitivity. Examples are the spectra of motifs containing the GC, CC and GG dinucleotides and those containing the AT, TT and AA dinucleotides (Fig. 2a, b). We suggest that this is a synergistic effect of the CG and TA independent selections. The phenomenon merits further study.
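The variance comparison across genomes amounts to two steps. For each subset, compute the sample variance of a parameter over all genomes, then rank the subsets by that variance. Below is a minimal sketch. The input layout (a dict from subset name to a list of per-genome values) and the function name `rank_by_variance` are hypothetical.

```python
import statistics


def rank_by_variance(param_by_subset: dict) -> list:
    """param_by_subset: {subset name: [parameter value in genome 1, genome 2, ...]}.
    Returns (subset, sample variance) pairs, most variable subset first."""
    variances = {name: statistics.variance(vals) for name, vals in param_by_subset.items()}
    return sorted(variances.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with toy values `{"CG1": [1.0, 3.0, 5.0], "TA1": [0.5, 2.0, 3.5], "AC1": [2.0, 2.1, 1.9]}` the ranking is CG1, TA1, AC1. These numbers are illustrative, not data from Table S1.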
Relations between the separability and the conservatism
The distributions of separability and conservatism are similar for the CG1 and CG2 spectra and for the TA1 and TA2 spectra. We therefore performed a linear correlation analysis between separability and conservatism within each species group. The separability of the CG1 and CG2 spectra, and of the TA1 and TA2 spectra, correlates significantly and positively with their conservatism (Table 2). In other words, for spectra of motifs containing the CG or TA dinucleotide, the higher the separability, the more conserved the spectrum. We named this property the Evolution Correlation of genome sequences. It is the fourth property of the CG and TA independent selection laws. Furthermore, the separability of the CG1 spectrum correlates significantly and positively with that of the CG2 spectrum. The conservatism of the CG1 spectrum likewise correlates significantly and positively with that of the CG2 spectrum, and the same holds between the TA1 and TA2 spectra (Table 2). This indicates that the CG1 and CG2 motif subsets, and likewise the TA1 and TA2 motif subsets, follow the same evolutionary selection pattern. We named this property the Evolution Homoplasy of genome sequences. It is the fifth property of the CG and TA independent selection laws. Across species groups, the correlation between the separability and conservatism of the CG0/TA0 spectra shows no consistent pattern: some correlations are positive and some are negative. We suggest that CG0 and TA0 motifs are the fundamental 'materials' that reflect the basic structure of a genome sequence.
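Below is a sketch of the per-group correlation test. It computes the Pearson coefficient r and the t statistic t = r·sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom. The two-tailed p-value of that t statistic is what the * and ** marks in Table 2 summarize. The function name `pearson_r` is hypothetical. In practice, `scipy.stats.pearsonr` returns both r and the two-tailed p-value.

```python
import math


def pearson_r(x: list, y: list) -> tuple:
    """Pearson correlation r between two samples, plus its t statistic (n - 2 df)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)  # perfect correlation: t is unbounded
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t
```

For example, for δCG1 and ρCG1 values of the 13 primate genomes, the t statistic would be compared against a t distribution with 11 degrees of freedom.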
There are exceptions in higher animal genomes. In primate genomes, both the evolution correlation and the evolution homoplasy of the TA1 and TA2 spectra have disappeared. This means that TA independent selection has disappeared in primates. In rodent, other mammal and other vertebrate genomes, the evolution correlation of the TA1 and TA2 spectra has disappeared, but the evolution homoplasy remains (Table 2). Our results thus suggest that, as the level of genome evolution increases, the evolution correlation disappears first, followed by the evolution homoplasy.
In summary, by analyzing the intrinsic regularities of the k-mer spectra of genome sequences, we identified the CG and TA independent selection laws. These laws have five properties: evolution independence, evolution selectivity, evolution conservatism, evolution correlation and evolution homoplasy.
Table 2 Linear correlation coefficients between the separability and conservatism of the three CG 8-mer spectra and the three TA 8-mer spectra
| Correlated pair | Pri (13) | Rod (14) | Mam (22) | Vrt (20) | Inv (43) | Dic (28) | Mon (11) | Gal (21) | Sac (54) | Aga (73) | Pez (118) | Arc (200) | Eub (300) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| δCG0-ρCG0 | -0.619* | -0.726** | -0.612** | -0.885** | -0.505** | -0.330 | -0.105 | 0.745** | -0.287* | 0.119 | 0.406** | 0.865** | 0.863** |
| δCG1-δCG2 | 0.921** | 0.660* | 0.928** | 0.977** | 0.968** | 0.938** | 0.801** | 0.982** | 0.981** | 0.920** | 0.871** | | |
| δCG1-ρCG1 | 0.801** | 0.736** | 0.911** | 0.960** | 0.909** | 0.872** | 0.856** | 0.893** | 0.975** | 0.925** | 0.895** | 0.948** | 0.929** |
| δCG2-ρCG2 | 0.941** | 0.899** | 0.929** | 0.850** | 0.846** | 0.769** | 0.983** | 0.974** | 0.928** | 0.863** | 0.730** | | |
| ρCG1-ρCG2 | 0.902** | 0.865** | 0.947** | 0.887** | 0.881** | 0.801** | 0.746** | 0.929** | 0.985** | 0.952** | 0.952** | | |
| δTA0-ρTA0 | -0.210 | 0.637* | 0.233 | 0.165 | 0.793** | 0.675** | 0.622* | -0.150 | 0.387** | -0.765** | 0.211* | 0.569** | 0.284** |
| δTA1-δTA2 | 0.157 | 0.708** | 0.830** | 0.968** | 0.913** | 0.882** | 0.897** | 0.976** | 0.977** | 0.961** | 0.701** | | |
| δTA1-ρTA1 | -0.183 | 0.457 | 0.180 | 0.321 | 0.546** | 0.475* | 0.753** | 0.958** | 0.861** | 0.788** | 0.876** | 0.950** | 0.849** |
| δTA2-ρTA2 | 0.464 | 0.657* | 0.081 | 0.141 | 0.736** | 0.459* | 0.877** | 0.955** | 0.766** | 0.753** | 0.865** | | |
| ρTA1-ρTA2 | 0.370 | 0.815** | 0.452* | 0.486* | 0.726** | 0.814** | 0.913** | 0.948** | 0.857** | 0.914** | 0.668** | | |
Note: Numbers in parentheses are the numbers of genomes in each group. Two-tailed significance: *, P < 0.05; **, P < 0.01.
Independent selection laws and genome evolution
Following the species taxonomy (Table 1 and Additional file 1: Table S1), we calculated point estimates and confidence intervals (at the 0.05 significance level) of the separability and conservatism for each species group (Additional file 2: Table S2). The mean values of δCG1/δCG2 and ρCG1/ρCG2 differ markedly among species groups, and their confidence intervals are wide. The separability and conservatism of the CG1 and CG2 spectra correlate positively with the level of genome evolution. They are also highly sensitive to the individual genomes within a species group (Fig. 2d). The mean values of δCG0 and ρCG0 change little among species groups, and their confidence intervals are relatively narrow. The separability and conservatism of the CG0 spectra are thus relatively insensitive both to genome evolution and to the individual genomes within a group. Similarly, the mean values of δTA0 and ρTA0 change very little among species groups, and their confidence intervals are relatively narrow. The separability and conservatism of the TA0 spectra are likewise insensitive to genome evolution and to the genomes within a group. The mean values of δTA1/δTA2 and ρTA1/ρTA2 differ markedly among species groups, and their confidence intervals are relatively wide. The separability and conservatism of the TA1 and TA2 spectra correlate negatively with the level of genome evolution. They are highly sensitive to the genomes within a group, especially in the lower species groups (Fig. 2e).
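Below is a sketch of a group-wise point estimate and confidence interval. It assumes the intervals are two-sided 95% intervals for the group mean, since the text does not state how they were constructed. `mean_ci` is a hypothetical helper. By default it uses a normal critical value, which is only an approximation for small groups such as the 11 monocotyledon genomes. For such groups, a t quantile with n - 1 degrees of freedom should be passed as `crit` instead.

```python
from statistics import NormalDist, fmean, stdev


def mean_ci(values: list, level: float = 0.95, crit: float = None) -> tuple:
    """Return (mean, lower, upper) for a two-sided confidence interval of the mean.
    crit defaults to the normal quantile; pass a t quantile for small groups."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two genomes per group")
    m = fmean(values)
    se = stdev(values) / n ** 0.5  # standard error from the sample SD
    if crit is None:
        crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return m, m - crit * se, m + crit * se
```

Applied to the δCG1 values of each species group in turn, this gives one (mean, lower, upper) triple per group, the form of summary shown in Fig. 2d, e.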
Intensity distributions of CG and TA independent selections
The analysis above shows a significant positive linear relationship between the separability and the conservatism of the CG1 and CG2 spectra and of the TA1 and TA2 spectra. For each genome, we therefore used only the separability of the CG1 spectrum (δCG1) to represent the intensity of CG independent selection. Likewise, we used the separability of the TA1 spectrum (δTA1) to represent the intensity of TA independent selection. For ease of comparison, the mean separability δ1 of the other 14 XY1 spectra is taken as the background value of each genome. If δCG1 > δ1 (or δTA1 > δ1), CG (or TA) independent selection is considered obvious. In the distribution plots (Fig. 3), the genomes on the abscissa are arranged in ascending order of δCG1. The names and order of the genomes in each panel are given in Additional file 1: Table S1.
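The background criterion can be written down directly. In this sketch, the input for one genome is a dict mapping each of the 16 dinucleotides to the separability of its XY1 spectrum. The layout and the names `background_delta1`, `selection_flags` and `order_by_cg_intensity` are hypothetical.

```python
from statistics import fmean

DINUCLEOTIDES = [x + y for x in "ACGT" for y in "ACGT"]


def background_delta1(delta_xy1: dict) -> float:
    """Mean separability of the 14 XY1 spectra other than CG1 and TA1."""
    return fmean(delta_xy1[d] for d in DINUCLEOTIDES if d not in ("CG", "TA"))


def selection_flags(delta_xy1: dict) -> dict:
    """Flag CG / TA independent selection as obvious when it exceeds the background."""
    d1 = background_delta1(delta_xy1)
    return {
        "delta1": d1,
        "CG_obvious": delta_xy1["CG"] > d1,
        "TA_obvious": delta_xy1["TA"] > d1,
    }


def order_by_cg_intensity(genomes: dict) -> list:
    """genomes: {name: {dinucleotide: delta_XY1}}; names sorted by delta_CG1, small to large."""
    return sorted(genomes, key=lambda name: genomes[name]["CG"])
```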
Animal genomes: In general, CG independent selection is pronounced and TA independent selection is weak in animal genomes. The intensity of CG independent selection increases with the level of genome evolution. In some groups, such as the other vertebrate genomes, the intensity of CG independent selection varies greatly. We propose that the intensity of CG independent selection also reflects the evolution rate of a genome. If the intensity is higher than the average of its species group, the genome is evolving fast. If it is lower than that average, the genome is evolving slowly. For instance, the lamprey (the leftmost point of the other vertebrate group in Fig. 3a) has the lowest intensity of CG independent selection. This indicates that the lamprey genome is evolving very slowly, which is consistent with biologists' conclusions. Several species have markedly higher intensities (the right end of the corresponding groups in Fig. 3a). These are the medium ground finch, zebra finch and budgerigar among the other vertebrates, the opossum and wallaby among the other mammals, and the Chinese hamster and squirrel among the rodents. We suggest that these organisms are evolving fast within their species groups.
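The group-relative reading of δCG1 described above can be sketched as a simple comparison with the group mean. This only encodes the heuristic as stated. It is not a calibrated estimate of evolution rate, and `relative_pace` and its input format are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def relative_pace(genomes: list) -> dict:
    """genomes: list of (name, group, delta_CG1).
    Labels each genome by comparing its delta_CG1 with the mean of its group."""
    by_group = defaultdict(list)
    for _, group, d in genomes:
        by_group[group].append(d)
    means = {g: fmean(v) for g, v in by_group.items()}
    labels = {}
    for name, group, d in genomes:
        if d > means[group]:
            labels[name] = "faster"
        elif d < means[group]:
            labels[name] = "slower"
        else:
            labels[name] = "at group mean"
    return labels
```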
In the vertebrate group, the intensity of TA independent selection fluctuates around the background value δ1, and it essentially disappears in the other mammal, rodent and primate groups. In vertebrates, TA independent selection thus fades as the level of genome evolution increases, consistent with the analysis in Table 2. In the invertebrate group, the intensity of CG independent selection is negatively correlated with that of TA independent selection. In other words, TA independent selection is inhibited by CG independent selection. We named this phenomenon TA Inhibition. Moreover, in some invertebrate genomes where CG independent selection is markedly intense, the intensity of TA independent selection falls clearly below the background value δ1. In these genomes, TA independent selection is strongly inhibited by CG independent selection. We named this phenomenon Strong TA Inhibition.
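The inhibition labels used in this and the following paragraphs can be expressed as a rule on δCG1, δTA1 and the background δ1. The text does not quantify "obviously high". This sketch therefore assumes that any value above δ1 counts as high and any value below δ1 as low. `inhibition_pattern` is a hypothetical name.

```python
def inhibition_pattern(delta_cg1: float, delta_ta1: float, delta1: float) -> str:
    """Label one genome using the background delta1 as the only threshold.
    ASSUMPTION: 'obviously high' means above delta1, 'low' means below it."""
    cg_high = delta_cg1 > delta1
    ta_high = delta_ta1 > delta1
    if cg_high and delta_ta1 < delta1:
        return "strong TA inhibition"
    if ta_high and delta_cg1 < delta1:
        return "strong CG inhibition"
    if cg_high and ta_high:
        return "both obvious"
    if cg_high:
        return "CG obvious"
    if ta_high:
        return "TA obvious"
    return "neither obvious"
```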
Plant genomes: The negative correlation between CG and TA independent selections is evident. TA independent selection is inhibited by CG independent selection, and CG independent selection is also inhibited by TA independent selection. Combining this with the results for invertebrate genomes, we conclude that the inhibition is mutual. We named this phenomenon Mutual Inhibition (Additional file 3: Animation S1). We also compared the distribution patterns of some green algae genomes with those of dicotyledon genomes (Fig. 3b). This comparison shows that strong mutual inhibition also exists in plant genomes.
By general consensus among biologists, the order of evolutionary level is green algae, pteridophytes, monocotyledons and dicotyledons. The intensity of CG independent selection correlates positively, and that of TA independent selection negatively, with this level of genome evolution. The five leftmost green algae genomes in Fig. 3b are Bathycoccus prasinos, Coccomyxa subellipsoidea, Eudorina, Monoraphidium and Picochlorum. These genomes show markedly high TA independent selection and strong CG inhibition. We suggest that these five green algae genomes are evolving very slowly or represent ancient lineages.
Fungal genomes: Mutual inhibition and strong mutual inhibition also exist in fungal genomes. The intensity distributions of Agaricomycotina and Pezizomycotina genomes are similar to each other and differ from that of Saccharomycetales genomes (Fig. 3c-e). In Agaricomycotina and Pezizomycotina genomes, TA independent selection is very evident and strong CG inhibition is present, but there is no strong TA inhibition. In Saccharomycetales genomes, CG independent selection is very evident and strong TA inhibition is present, but there is no strong CG inhibition. The independent selection pattern of Saccharomycetales genomes is similar to that of invertebrate genomes.
The independent selection pattern of a genome is closely related to the life habits of the species. Among Agaricomycotina (Fig. 3c), species with remarkable TA independent selection and strong CG inhibition, such as Trametes versicolor and Trametes pubescens, usually grow on trees. Species with remarkable CG independent selection, such as Termitomyces and Leucoagaricus, usually live in soil. Among Pezizomycotina (Fig. 3d), species with remarkable TA independent selection and strong CG inhibition mainly infect plants; examples are Purpureocillium lilacinum and Tolypocladium ophioglossoides. Species with remarkable CG independent selection, such as Blastomyces gilchristii and Blastomyces dermatitidis, mainly infect higher animals, including humans. Among Saccharomycetales, the genome with remarkable CG independent selection and strong TA inhibition is that of Hanseniaspora guilliermondii (Fig. 3e).
Prokaryote genomes: Mutual inhibition and strong mutual inhibition are clearly present in prokaryote genomes. Here too, the independent selection pattern of a genome is closely related to the life habits of the species.
In the archaea group, the genomes with remarkable TA independent selection and strong CG inhibition are those of Halobacteria and Methanomicrobia. Conversely, the genomes with remarkable CG independent selection and strong TA inhibition are those of Methanomada and Thermoprotei (Fig. 3f). Among Halobacteria, the species with the most remarkable TA independent selection and strong CG inhibition are halophiles. Examples are Halosimplex sp. TH32, Halarchaeum acidiphilum MH1–52–1 and Halarchaeum sp. CBA1220 (the leftmost genomes in Fig. 3f). Some Methanomada with remarkable CG independent selection and strong TA inhibition have been found in deep-sea hydrothermal vents or marsh-gas environments, such as Methanocaldococcus jannaschii and Methanobrevibacter arboriphilus ANOR1. Others have been found in the stomachs, intestines and gingiva of animals, such as Methanosphaera cuniculi, Methanobrevibacter olleyae and Methanobrevibacter oralis (the rightmost genomes in Fig. 3f).
Among eubacteria, most Actinobacteria and some Proteobacteria genomes have remarkable TA independent selection and strong CG inhibition. Most Spirochaetales and some Firmicutes genomes have remarkable CG independent selection and strong TA inhibition (Fig. 3g). The Actinobacteria with the most remarkable TA independent selection and strong CG inhibition live in soil and water and usually infect plants and fungi. Examples are Agrococcus sp. SGAir0287, Agrococcus jejuensis and Agrococcus carbonis (the three leftmost genomes in Fig. 3g). The Spirochaetales with the most remarkable CG independent selection and strong TA inhibition live in soil and decaying organic matter and usually infect animals, including humans. Examples are Borrelia recurrentis, Borrelia duttonii Ly and Borrelia miyamotoi LB–2001 (the three rightmost genomes in Fig. 3g).
Based on these results, we propose a mechanism of genome evolution in which evolution is determined by the intensities of CG and TA independent selections and by the mutual inhibition between them.
Independent selection laws and G+C content of genome sequences
G+C content is the most basic quantity describing the composition of a DNA sequence. Here we analyzed the relationship between the two independent selection laws and the G+C content of genome sequences. Except in primate and rodent genomes, δCG1/δCG2 and ρCG1/ρCG2 correlate significantly and negatively with G+C content. In the same genomes, δTA1/δTA2 and ρTA1/ρTA2 correlate significantly and positively with it. That is, the intensity of CG independent selection correlates negatively, and that of TA independent selection positively, with the G+C content of genome sequences. In primate and rodent genomes, there is no significant correlation between CG independent selection and G+C content. There is also no consistent correlation between TA independent selection and G+C content (Additional file 4: Table S3 and Fig. 4). We suggest that the loss of mutual inhibition in primate and rodent genomes is the main reason these correlations weaken or disappear.
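G+C content itself is straightforward to compute. The minimal sketch below assumes that ambiguous symbols such as N are excluded from the denominator, a convention choice not stated in the text. `gc_content` is a hypothetical name. Correlating it with δCG1 or δTA1 across genomes then only requires a Pearson correlation routine such as `scipy.stats.pearsonr`.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T); other symbols ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt
```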
The correlation between the independent selections and G+C content can be explained by the proposed mechanism of genome evolution. Theoretically, the average G+C content is 65.34% in the CG1 6-mer subset, 43.75% in the CG0 6-mer subset, 34.65% in the TA1 6-mer subset and 56.24% in the TA0 6-mer subset. When δCG1 rises, δTA1 must fall because of the mutual inhibition between the CG and TA independent selections. That is, x̅CG1 falls or the total number of 6-mers in the CG1 6-mer spectrum decreases. At the same time, x̅TA1 rises or the total number of 6-mers in the TA1 6-mer spectrum increases (Additional file 5: Figure S1A, B and Additional file 3: Animation S1). Fewer CG1 6-mers, which have high G+C content, and more TA1 6-mers, which have low G+C content, both lower the G+C content of the genome sequence. Conversely, when δCG1 falls, δTA1 must rise, and both changes increase the G+C content of the genome sequence (Additional file 5: Figure S1C, D and Additional file 3: Animation S1). The mechanism of genome evolution thus reveals a deeper biological meaning of G+C content. We conclude that the G+C content of a genome sequence is a comprehensive representation of genome evolution.
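The theoretical G+C values quoted above can be checked by enumerating all 4,096 6-mers. In this sketch, CG0 is read as the 6-mers containing no CG and CG1 as the 6-mers containing at least one CG, and likewise for TA. This reading is our assumption. We chose it because it reproduces the quoted 43.75% and 65.34%; reading CG1 as exactly one CG gives about 63.83% instead. `mean_gc` is a hypothetical helper.

```python
from itertools import product


def mean_gc(kmers: list) -> float:
    """Average G+C fraction over a set of equal-length k-mers."""
    gc = sum(k.count("G") + k.count("C") for k in kmers)
    return gc / (len(kmers) * len(kmers[0]))


all_6mers = ["".join(p) for p in product("ACGT", repeat=6)]
cg0 = [k for k in all_6mers if "CG" not in k]
cg1 = [k for k in all_6mers if "CG" in k]  # ASSUMPTION: "CG1" = at least one CG
ta0 = [k for k in all_6mers if "TA" not in k]
ta1 = [k for k in all_6mers if "TA" in k]  # ASSUMPTION: "TA1" = at least one TA

for name, subset in [("CG0", cg0), ("CG1", cg1), ("TA1", ta1), ("TA0", ta0)]:
    print(name, len(subset), f"{100 * mean_gc(subset):.2f}%")
# CG0 2911 43.75%
# CG1 1185 65.34%
# TA1 1185 34.66%
# TA0 2911 56.25%
```

Under this reading, the TA1 and TA0 values are 34.66% and 56.25% when rounded to two decimals. The 34.65% and 56.24% in the text therefore appear to be truncated values. Swapping C↔T and G↔A maps CG onto TA and G+C bases onto A+T bases. As a result, the TA0 and CG0 averages (and the TA1 and CG1 averages) always sum to 100%.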
Evolution states and process of prokaryote genomes in the beginning of life
The intensity distributions of the two independent selections across 920 genomes (Fig. 3) provide a rich picture. It reflects the present evolutionary state of genomes and also their evolutionary states and processes in the early stages of life. Here we borrow an idea from astronomy to infer the process of genome evolution in the early stages of life. In astronomy, stellar evolution can be inferred from present-day images of the sky. The countless stars visible now show their "present" states. Because these stars are caught at different evolutionary stages, together they also reveal the process of stellar evolution. Likewise, in the intensity distribution maps, the abscissa can be read as a set of different species genomes. Read this way, the distribution represents the present evolutionary states of those species. The abscissa can also be read as the timeline of a single species genome. Read that way, the distribution represents the evolution of that species from ancient times to the present.
Our results indicate that higher organisms usually show obvious CG independent selection and lower organisms obvious TA independent selection. Prokaryotes first originated in oceans and lakes at a time when neither the water nor the atmosphere contained oxygen, that is, in an anaerobic environment. Suppose the intensity distribution of eubacteria genomes (Fig. 3g) is read as the timeline of a single species genome. We then infer that TA independent selection was the dominant mode of prokaryote genomes in the anaerobic environment, and that this mode suited prokaryotic life there.
The intensity distribution of archaea genomes presents a different picture. TA independent selection was the main evolutionary mode of early prokaryotes in the anaerobic environment. However, some prokaryotes lived in extreme environments such as deep-sea hydrothermal vents or marsh gas. To adapt, they gradually shifted from TA independent selection to CG independent selection, with TA independent selection being inhibited. To live in salt water (another kind of extreme environment), other prokaryotes retained TA independent selection, with CG independent selection being inhibited (Fig. 3f and the "Prokaryote genomes" section). The mutual inhibition between CG and TA independent selections thus appears to be a naturally selected strategy for prokaryotes living in extreme environments. There were therefore three kinds of prokaryotes at that time. The first had obvious CG independent selection and TA inhibition. The second had obvious TA independent selection and CG inhibition. Both of these kinds lived in extreme environments and correspond to the archaea. The third had obvious TA independent selection but lived in relatively mild environments, and corresponds to the eubacteria. Archaea have stronger environmental adaptability and evolutionary ability than eubacteria.
As early prokaryotes gradually released oxygen, the oxygen concentration in the earth's oceans and lakes, and in the atmosphere, gradually increased. The aerobic environment is another extreme environment, and all prokaryotes had to adapt to the transition from anaerobic to aerobic conditions. For eubacteria, the TA independent selection mode could not adapt quickly to rising oxygen. Because of their weak adaptability, most of them died out and only a few survived. This corresponds to the Great Oxidation Event about 2 billion years ago. Reading the intensity distribution of eubacteria (Fig. 3g) as a timeline, the surviving eubacteria appear to have adopted two strategies to live in the aerobic environment. One was to enhance the intensity of TA independent selection while inhibiting CG independent selection. The other was to shift from TA independent selection to CG independent selection while inhibiting TA independent selection. The mutual inhibition between CG and TA independent selections thus appears to be a naturally selected strategy for eubacteria living in aerobic environments. For archaea with obvious CG independent selection and TA inhibition, the aerobic environment was precisely a suitable one. This allowed them to leave extreme environments and live anywhere under aerobic conditions. The pattern of archaea with obvious TA independent selection and CG inhibition was also suitable for aerobic life. These archaea could likewise leave salt environments and live anywhere under aerobic conditions (Fig. 3f).
We consider the strong adaptability and evolutionary ability of archaea genomes, together with the stimulus of the aerobic environment, to be the main reasons for the transition from prokaryotes to eukaryotes.
Origin of eukaryote genomes
We read the intensity distributions of the eukaryote genome groups as timelines and compared their parts corresponding to the early stages of life (left parts in Fig. 3). Two distinct distribution patterns emerge. The first occurs in animals and Saccharomycetales, whose intensity distributions are similar: CG independent selection is obvious and TA independent selection is inhibited (see the right part of Fig. 3a, e). The second occurs in plants, Agaricomycotina and Pezizomycotina, whose intensity distributions are similar: TA independent selection is obvious and CG independent selection is inhibited (see the left part of Fig. 3b-d). Carl Woese's three-domain theory holds that eukaryotes are more closely related to archaea than to eubacteria [22]. The evolution pattern of archaea with obvious CG independent selection and TA inhibition resembles that of animals and Saccharomycetales. The evolution pattern of archaea with obvious TA independent selection and CG inhibition resembles that of plants, Agaricomycotina and Pezizomycotina. Based on the continuity and similarity of genome evolution modes, we propose the following origins. Animals and Saccharomycetales originated from archaea with obvious CG independent selection and TA inhibition. Plants, Agaricomycotina and Pezizomycotina originated from archaea with obvious TA independent selection and CG inhibition.
The independent selection modes and the life habits of species
The independent selection mode adopted by a genome is closely related to the living habits of the species (see the "Prokaryote genomes" section). Some eubacteria have obvious TA independent selection and strong CG inhibition, such as Actinobacteria. They usually live with or infect eukaryotes that originated from archaea with obvious TA independent selection and CG inhibition, such as plants. Some eubacteria and archaea have obvious CG independent selection and strong TA inhibition, such as Spirochaetales and Methanomada. They usually live with or infect eukaryotes that originated from archaea with obvious CG independent selection and TA inhibition, such as animals. This suggests that similar genome evolution modes determine the interaction preferences between prokaryotes and eukaryotes.