Reconstruction of the giardial nucleolar proteome (GiNuP)
To obtain a reasonably complete nucleolar proteome of G. lamblia, we used two independent bioinformatic methods to identify putative nucleolar proteins in the genome of this protist: a homology search based on the known nucleolar proteins of three representative higher eukaryotes, and de novo prediction based on protein sequence features. In the homology search, BLAST queries with 246 yeast nucleolar proteins yielded 38 candidate Giardia orthologs. Likewise, queries with 281 A. thaliana and 4705 human nucleolar proteins yielded 57 and 189 candidate orthologs, respectively. The Giardia candidates orthologous to nucleolar proteins of H. sapiens, A. thaliana, and S. cerevisiae were pooled, and after removal of redundant entries 237 Giardia nucleolar protein candidates remained. Subsequent domain analysis of these sequences with the PFAM online service showed that 216 of them possess characteristic domains of various nucleolar proteins. These 216 were further confirmed as nucleolar proteins by BLAST searches against the NCBI nr protein database. Thus, the homology search approach identified 216 orthologs of the nucleolar proteins of the three representative eukaryotes in the G. lamblia genome database (Supplementary Table S1).
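The pooling and redundancy-removal step can be sketched as a set union. The following is a minimal illustration, assuming each BLAST run has already been reduced to a list of Giardia gene identifiers; the identifiers and the `pool_candidates` helper are hypothetical and are not taken from the study.

```python
def pool_candidates(*hit_lists):
    """Merge per-species candidate lists into one non-redundant, sorted list."""
    pooled = set()
    for hits in hit_lists:
        pooled.update(hits)  # a set keeps each identifier only once
    return sorted(pooled)


# Hypothetical Giardia identifiers, one list per query proteome.
yeast_hits = ["GL_0001", "GL_0002"]
arabidopsis_hits = ["GL_0002", "GL_0003"]
human_hits = ["GL_0001", "GL_0003", "GL_0004"]

candidates = pool_candidates(yeast_hits, arabidopsis_hits, human_hits)
total_hits = len(yeast_hits) + len(arabidopsis_hits) + len(human_hits)
print(total_hits, len(candidates))
```

With the counts reported above, the same operation reduces 38 + 57 + 189 = 284 candidate hits to 237 non-redundant candidates.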
Because each of the available nucleolar proteomes of the three higher eukaryotes contains specific proteins that have no homologs in the other two proteomes, it is reasonable to imagine that G. lamblia, though much more ancient, also has its own specific nucleolar proteins that are not present in other species. To identify such putative Giardia-specific nucleolar proteins, we examined all Giardia proteins in the genome database, first collecting the nuclear proteins and then identifying those predicted to localize to the nucleolus. First, we obtained 172 Giardia nuclear proteins predicted to carry a nuclear localization signal. We also screened the G. lamblia genome database with the keywords ‘nucleus/nuclear’ or ‘nucleolus/nucleolar’ and obtained 25 annotated nuclear/nucleolar proteins. All 197 (172+25) nuclear proteins were then subjected to protein sub-localization prediction, and 55 of them were predicted to be most likely localized to the nucleolus.
Altogether, after discarding the redundant ones, 255 (216+39) nucleolar proteins were identified in the G. lamblia genome database: 216 orthologs of the nucleolar proteins of the three representative eukaryotes and 39 Giardia-specific nucleolar proteins (Table S1). Based on the reported RNA-Seq data of G. lamblia [29], 246 of the 255 identified nucleolar proteins were also predicted from the transcriptome, which not only helped the annotation of our identified proteins but also confirmed that almost all of the corresponding genes are transcribed in Giardia.
Thus, we have reconstructed a putative nucleolar proteome of G. lamblia (GiNuP), which contains 255 individual nucleolar proteins.
Reconstruction of the ‘Higher Eukaryote Basic Nucleolar Proteome (HEBNuP)’
To compare the GiNuP with the nucleolar proteomes of the three representative higher eukaryotes, we investigated the orthologous relationships between pairs of, and among all three of, these eukaryotes in order to identify the nucleolar proteins present in all three genomes. Because the nucleolar proteomes of Arabidopsis and budding yeast contain far fewer proteins and may be incomplete, we collected all ortholog groups that contain human nucleolar proteins. This investigation revealed the following orthologous relationships: 1) there are 1058 orthologous groups between the human nucleolar proteome and the Arabidopsis whole proteome, containing 2341 human nucleolar proteins and 2780 Arabidopsis proteins; 2) there are 856 orthologous groups between the human nucleolar proteome and the budding yeast whole proteome, containing 1946 human nucleolar proteins and 1078 yeast proteins; 3) there are 799 orthologous groups among the human nucleolar proteome, the Arabidopsis whole proteome, and the budding yeast proteome, containing 1848 human nucleolar proteins, 2227 Arabidopsis proteins, and 1015 yeast proteins (Fig 1 and Supplementary Table S2). We call these 799 orthologous groups the ‘Higher Eukaryote Basic Nucleolar Proteome (HEBNuP)’.
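The selection of orthologous groups described above can be illustrated as a filter over ortholog groups. This is a sketch under stated assumptions: each group is represented as a mapping from a species label to its member proteins, with the human entry already restricted to known nucleolar proteins. The group identifiers, protein names, species labels, and both helper functions are hypothetical.

```python
def select_groups(groups, required_species):
    """Keep ortholog groups with at least one member in every required species."""
    return {
        gid: members
        for gid, members in groups.items()
        if all(members.get(sp) for sp in required_species)
    }


def count_members(groups, species):
    """Count the distinct proteins of one species across the given groups."""
    return len({p for members in groups.values() for p in members.get(species, [])})


# Hypothetical toy input: four ortholog groups.
groups = {
    "OG1": {"human": ["H1", "H2"], "arabidopsis": ["A1"], "yeast": ["Y1"]},
    "OG2": {"human": ["H3"], "arabidopsis": ["A2", "A3"]},
    "OG3": {"human": ["H4"], "yeast": ["Y2"]},
    "OG4": {"arabidopsis": ["A4"], "yeast": ["Y3"]},  # no human nucleolar member
}

human_arabidopsis = select_groups(groups, ["human", "arabidopsis"])
human_yeast = select_groups(groups, ["human", "yeast"])
all_three = select_groups(groups, ["human", "arabidopsis", "yeast"])

print(len(human_arabidopsis), len(human_yeast), len(all_three))
print(count_members(all_three, "human"))
```

In this toy input only OG1 has members in all three species, which mirrors how the 799 three-way groups form a subset of both pairwise sets (1058 and 856 groups).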
The functional inventories of the proteins in the HEBNuP and the GiNuP
The functional inventory of the 1848 human nucleolar proteins in the HEBNuP is as follows (Fig 2A): 1) 218 (12%) belong to the “Ribosome related” class; 2) 220 (12%) belong to the “mRNA related” class; 3) 222 (12%) belong to the “Translation related” class; 4) 176 (9.5%) belong to the “DNA binding related” class; 5) 69 (4%) belong to the “Chromatin related” class; 6) 86 (5%) belong to the “Mitotic cell cycle related” class; 7) 857 (46.5%) belong to none of the six classes and are therefore assigned to the “undefined function” class.
The functional inventory of the 255 proteins in the GiNuP is as follows (Fig 2B): 1) 73 (29%) belong to the “Ribosome related” class; 2) three (1%) belong to the “mRNA related” class; 3) 12 (5%) belong to the “Translation related” class; 4) 12 (5%) belong to the “DNA binding related” class; 5) six (2%) belong to the “Chromatin related” class; 6) one (0.4%) belongs to the “Mitotic cell cycle related” class; 7) 148 (57.6%) belong to the “undefined function” class.
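The two inventories can be tabulated and converted to percentages with a few lines of code. The sketch below uses the counts listed above; the `shares` helper and the variable names are illustrative assumptions, and recomputed percentages may differ slightly from the rounded values quoted in the text.

```python
HEBNUP_HU = {
    "Ribosome related": 218,
    "mRNA related": 220,
    "Translation related": 222,
    "DNA binding related": 176,
    "Chromatin related": 69,
    "Mitotic cell cycle related": 86,
    "undefined function": 857,
}

GINUP = {
    "Ribosome related": 73,
    "mRNA related": 3,
    "Translation related": 12,
    "DNA binding related": 12,
    "Chromatin related": 6,
    "Mitotic cell cycle related": 1,
    "undefined function": 148,
}


def shares(counts, exclude=()):
    """Return each class as a percentage of the total, skipping excluded classes."""
    kept = {name: n for name, n in counts.items() if name not in exclude}
    total = sum(kept.values())
    return {name: round(100 * n / total, 1) for name, n in kept.items()}


# The class counts add up to the dataset sizes given in the text.
hebnup_total = sum(HEBNUP_HU.values())
ginup_total = sum(GINUP.values())

# Share among annotated proteins only (the "undefined function" class removed).
ginup_annotated = shares(GINUP, exclude=("undefined function",))
print(hebnup_total, ginup_total, ginup_annotated["Ribosome related"])
```

Restricting the GiNuP to its 107 annotated proteins gives 73/107, or 68.2%, for the “Ribosome related” class.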
Comparative analysis between the GiNuP and the HEBNuP
To explore the evolution of the nucleolus, we compared the GiNuP and the HEBNuP in terms of protein homology and function. As shown above, the HEBNuP consists of 799 orthologous groups containing 1848 individual human nucleolar proteins (the HEBNuP-Hu protein dataset), and the GiNuP dataset contains 255 orthologous groups, each corresponding to one of the 255 Giardia nucleolar proteins. Because the human nucleolar proteome appears to be the most complete of the three higher eukaryote nucleolar proteomes, the nucleolar protein groups in the HEBNuP-Hu dataset were used as representatives of the HEBNuP for comparison with those of the GiNuP in the following analysis.
Comparison of the GiNuP with the HEBNuP in terms of protein homology shows that: 1) 200 orthologous groups (containing 200 individual Giardia nucleolar proteins) are shared by the GiNuP and the HEBNuP and make up the HEBNuP-GiNuP-shared dataset. Thus 78.4% (200 out of 255) of the Giardia nucleolar orthologous groups (and individual proteins) have orthologs in the HEBNuP, but these account for only 25.0% of the orthologous groups of the HEBNuP (and the 255 Giardia nucleolar proteins amount to only 13.8% of the number of individual human nucleolar proteins in the HEBNuP-Hu dataset). This means that the majority of Giardia nucleolar proteins belong to the common/basic nucleolar proteins of higher eukaryotes, and that higher eukaryotes have many more common/basic nucleolar proteins than Giardia; 2) 55 Giardia nucleolar orthologous groups (containing 55 individual Giardia nucleolar proteins) are specific to the GiNuP and make up the GiNuP-specific dataset; 3) 599 orthologous groups (containing 1253 individual human nucleolar proteins) are specific to the HEBNuP and make up the HEBNuP-specific dataset.
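The partition sizes and percentages in this comparison follow directly from the group counts. As a worked check, the sketch below recomputes them; the constant names and the `pct` helper are illustrative assumptions, while the numbers are those reported in the text.

```python
GINUP_GROUPS = 255          # orthologous groups (and proteins) in the GiNuP
HEBNUP_GROUPS = 799         # orthologous groups in the HEBNuP
HEBNUP_HU_PROTEINS = 1848   # individual human nucleolar proteins (HEBNuP-Hu)
SHARED_GROUPS = 200         # HEBNuP-GiNuP-shared dataset


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Groups left in each proteome after removing the shared ones.
ginup_specific = GINUP_GROUPS - SHARED_GROUPS
hebnup_specific = HEBNUP_GROUPS - SHARED_GROUPS

shared_of_ginup = pct(SHARED_GROUPS, GINUP_GROUPS)
shared_of_hebnup = pct(SHARED_GROUPS, HEBNUP_GROUPS)
ginup_vs_human = pct(GINUP_GROUPS, HEBNUP_HU_PROTEINS)

print(ginup_specific, hebnup_specific)
print(shared_of_ginup, shared_of_hebnup, ginup_vs_human)
```

The recomputed values reproduce the 55 and 599 specific groups and the 78.4%, 25.0%, and 13.8% figures.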
The functional distributions of the nucleolar orthologous protein groups in the five datasets mentioned above are shown in Fig 3, and the proportions of annotated proteins in each nucleolar functional class are shown in Fig 4. Comparison of the functional distributions of the proteins in the GiNuP and the HEBNuP shows that: 1) 68.2% of the annotated proteins in the GiNuP dataset and 68.9% of those in the HEBNuP-GiNuP-shared dataset are involved in the “Ribosome related” function, implying that the majority of the annotated Giardia nucleolar proteins participate in this function and that these proteins still perform it in higher eukaryotes. The remaining approximately 31% of the annotated proteins in each of these two datasets are involved in the other five functions, implying that, besides the major “Ribosome related” function, the other five nucleolar functions also exist in the Giardia nucleolus, although very few proteins perform them, and that these few proteins still perform the five functions in higher eukaryotes.
2) Half (50%) of the annotated proteins in the GiNuP-specific dataset are classified into the “Ribosome related” functional class, 25% into the “DNA binding related” class, and the other 25% into the “Translation related” class; none are classified into the other three functional classes. In the HEBNuP-specific dataset, 22.7%, 25%, 27.7%, 10.6%, 2.7%, and 11.2% of the annotated proteins are classified into the “Ribosome related”, “DNA binding related”, “Translation related”, “Chromatin related”, “mRNA related”, and “Mitotic cell cycle related” functional classes, respectively. This means that the basic “Ribosome related” function of the nucleolus also requires lineage- and even species-specific protein components in a given lineage or species, as do the other five nucleolar functions, and that such specific proteins, especially those for the other five functions, continuously increased during the evolution of eukaryotes. In addition, for both the GiNuP and the GiNuP-specific datasets, the proportions of annotated proteins in each of the other five functional classes are much lower than the proportion involved in the “Ribosome related” function, whereas for the HEBNuP-Hu and HEBNuP-specific datasets the proportions of nucleolar proteins involved in the other five functions are substantially higher relative to the “Ribosome related” function. This implies that the “Ribosome related” function likely arose and matured earlier than the other five functions, which became increasingly elaborate and complex later, especially during the evolution of higher eukaryotes.