Taxonomic and functional categories
The list of archaea was initially divided into three major groups, following the taxonomic groups listed in the COG database, to observe the distribution of extremophiles and other archaea among them. Crenarchaeota had 21 species, all of them thermophilic or hyperthermophilic, seven acidophilic, and one metallotolerant. Euryarchaeota had 56 species: 11 hyperthermophilic, two psychrophilic, 21 halophilic, six alkaliphilic, two radiotolerant, seven acidophilic, and 21 methanogenic. Finally, Thaumarchaeota (THAU) had only four species, two of them psychrophilic and the other two non-extremophilic.
We obtained a series of line graphs for each type of extremophile, for each archaeal clade, and for the whole group of archaea, together with tables of the mean values and standard deviations used to generate the graphs. Table 2 summarizes the mean and standard deviation of the number of COGs, and of the percentage of COGs, in each functional category for each extremophilic group. Figure 1 combines the line graphs for the whole group of archaea and for every extremophilic group, and was used for an initial exploration of the differences among functional categories.
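The per-category summary described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the function name `category_stats`, the input layout (one dictionary of category counts per genome), and the toy values are all hypothetical, and the sample standard deviation is assumed because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def category_stats(genomes):
    """Return per-category mean and SD of COG counts and percentages.

    genomes: dict of species name -> {category letter: COG count}.
    Assumes every genome has at least one COG assigned.
    """
    categories = sorted({cat for counts in genomes.values() for cat in counts})
    stats = {}
    for cat in categories:
        # Absolute number of COGs in this category, one value per genome.
        quantities = [counts.get(cat, 0) for counts in genomes.values()]
        # Share of each genome's COGs that falls in this category.
        percents = [
            100.0 * counts.get(cat, 0) / sum(counts.values())
            for counts in genomes.values()
        ]
        stats[cat] = {
            "mean_q": mean(quantities),
            # Sample SD (n - 1); an assumption, see the lead-in.
            "sd_q": stdev(quantities) if len(quantities) > 1 else 0.0,
            "mean_pct": mean(percents),
            "sd_pct": stdev(percents) if len(percents) > 1 else 0.0,
        }
    return stats


# Hypothetical toy group of two genomes (values invented for illustration).
toy_group = {
    "species_a": {"K": 10, "L": 30},
    "species_b": {"K": 20, "L": 20},
}
toy_stats = category_stats(toy_group)
print(toy_stats["K"])
```

Running the same function once per extremophilic group, once per taxonomic group, and once for all archaea would give the kind of values that a table such as Table 2 reports.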
Two tables were prepared from the data obtained for every group. Tables 3 and 4 show the standard deviations of every extremophilic and taxonomic group, separated by sizes, for the most divergent functional categories observed in Figure 1 (K, L, T, O, C, E, H, P for quantities and L, P, J, T, C, G, E, H, P for percentages). We used these two tables to contrast the functional categories that vary most among groups with the variation within each group, so that we could identify changes that could only be related to the extremophilic groups.
We can observe that, as expected, some of the functional categories with the largest differences among all archaea show little variation within certain groups. Metabolic categories remain highly variable within most groups (C, E, H, P for quantities, Q; C, G, E, H for percentages, %), while K (Transcription) and L (Replication, recombination and repair) show little variation for Q, especially in extremophilic groups; for %, the categories with the least variation differ from group to group. Table 4 shows only the categories that presented the largest differences among all archaea. Categories with standard deviation values < 10 or < 1% will be referred to as categories of interest. The factor analysis allowed grouping into four protein components. The first is made up of the COGs T, L, D, K, P, H, X, B, U, O, R, S, Z, M, N, E, F, Q, I, W, G, and V. The second comprises COGs J and A, and the third consists only of COG W (p<0.05) (data in Suppl 1).
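As a minimal sketch of how the categories of interest could be selected, the following Python snippet filters within-group standard deviations against the two cut-offs. It assumes that the threshold of 10 applies to the number of COGs and the threshold of 1% to the percentages, which is our reading of the criterion; the function name `categories_of_interest` and the example values are hypothetical and are not taken from Tables 3 or 4.

```python
# Cut-offs taken from the text; which one applies to which measure is an
# assumption (10 for COG counts, 1 percentage point for percentages).
SD_Q_THRESHOLD = 10.0
SD_PCT_THRESHOLD = 1.0


def categories_of_interest(sd_q, sd_pct):
    """Return the categories whose within-group SD is below each cut-off.

    sd_q:   dict of category letter -> SD of the number of COGs.
    sd_pct: dict of category letter -> SD of the percentage of COGs.
    """
    by_quantity = sorted(c for c, sd in sd_q.items() if sd < SD_Q_THRESHOLD)
    by_percent = sorted(c for c, sd in sd_pct.items() if sd < SD_PCT_THRESHOLD)
    return by_quantity, by_percent


# Hypothetical standard deviations for one group (invented for illustration).
example_sd_q = {"K": 4.2, "L": 9.9, "C": 25.0}
example_sd_pct = {"K": 0.4, "L": 1.0, "C": 2.1}
by_quantity, by_percent = categories_of_interest(example_sd_q, example_sd_pct)
print(by_quantity, by_percent)
```

The comparison is strict, so a category with a standard deviation of exactly 10 or exactly 1% would not be selected.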
To observe variations specific to any extremophilic group among the categories of interest identified, we combined the line graphs of each extremophilic group with those of the taxonomic groups to which it belongs.
Hyperthermophiles and Thermophiles (HT)
In Figure 2 we compare the line graphs of the HT group with those of the two taxonomic groups to which its members belong, Crenarchaeota (CREN) and Euryarchaeota (EURY). We can observe a very similar amount and proportion of COGs of the T category (Signal transduction mechanisms) for HT and CREN, without a displacement of the HT line towards EURY, even though 11 HT species, one third of the total, belong to the EURY group. We can also observe that the other categories of interest for HT (K: Transcription; L: Replication, recombination and repair; O: Post-translational modification; C: Energy production and conversion; P: Inorganic ion transport and metabolism) have similar values in HT and CREN, but with a slight displacement towards the EURY line.
Halophiles (HL)
The 21 halophiles (HL) included in this study belong to the Euryarchaeota group (EURY), comprising about one third of the group (21 of 56 species). Here we can see that the categories of interest coincide between the two groups, and we observe relatively large differences only for categories T (Signal transduction mechanisms) and E (Amino acid transport and metabolism); the distribution of COGs, as seen in the percentages, is almost identical for both groups, with a slight increase in category T.
Acidophiles
Acidophiles are evenly distributed between Crenarchaeota (7 species, which are also thermophiles) and Euryarchaeota (7 species, 5 of which are also thermophiles) (Figure 4).
Psychrophiles
Psychrophiles show more divergent lines when compared with the Thaumarchaeota group (THAU), although only for the amount of COGs per category: we can observe an increase in the amount of COGs for K, L, T, C, E, H and P, and an increase in the distribution of COGs for T (Figure S1).
Alkaliphiles, Radiotolerants, and Metallotolerants
The six alkaliphilic archaea in this study belong to the Euryarchaeota group, and in Figure S2 we can observe little difference between these groups. We also show the graphs for radiotolerant and metallotolerant archaea (Figures S3 and S4), but as there are only 2 radiotolerant archaea and 1 metallotolerant archaeon, we cannot draw any conclusions about the distribution of their COGs.
Methanogens
Methanogens are not extremophiles, but they comprise almost all the non-extremophilic archaea listed in the COG database and were used as a control group for this study. There are 24 methanogens, all of them in the Euryarchaeota group (Figure S5).