Novel proteins and proteome mining
Of the 3,182 proteins detected in P. lunula, 175 (5.49 %) are sequences with no homology to any protein identified
so far in the databases. Within this group, 59 stand out (PSM ≥ 5) and should be considered, with a high level of confidence, novel proteins [see
Additional files 5 and 2]. This is not unexpected, since transcriptomic data sets
of dinoflagellates have shown, among other things, that in some cases more than 50
% of the genes have no homology with any gene documented so far [
31
</a>]. Many genes appear to be unique to dinoflagellates. For example, although many of
the redox-regulated genes identified in <em>P. lunula </em>are conserved, the majority do not show significant similarity to other genes
in the databases, as around 68 % were novel [<a href="#_ENREF_9">
9
</a>].</p>
Evolutionarily, dinoflagellate nuclear genomes are very dynamic, because they have
obtained plastid-targeted genes through successive horizontal gene transfer from different
kinds of sources (tertiary replacement plastid, peridinin plastid, bacteria, cyanobacteria,
haptophyte, red and green algae), giving rise to a highly chimeric nuclear genome [
60-63
</a>]. Furthermore, segmental genome duplications [<a href="#_ENREF_64">
64
</a>, <a href="#_ENREF_65">
65
</a>], and retro-transposition of cDNA to the genome [<a href="#_ENREF_66">
66
</a>], are likely to contribute to gene redundancy and the expansion of these genomes
[<a href="#_ENREF_28">
28
</a>]. Owing to their immense size, only a few dinoflagellate genomes have been sequenced,
and information on genomic structure has mainly been derived from the study of
individual genes. Tubulin, actin, polyketide synthase, rubisco, heat shock proteins,
chlorophyll-binding protein, proliferating cell nuclear antigen, LCF, and luciferin-binding
protein (LBP), are among those that have been investigated [<a href="#_ENREF_31">
31
</a>]. Interestingly, we detected all of these proteins in our study,
and particularly relevant is the identification of LBP [see Additional file 3], which
was thought absent from the bioluminescent system in the genus Pyrocystis.</p>
Luciferin-binding protein (LBP)
The bioluminescence system in dinoflagellates is unique, from a cellular and molecular
perspective. The production of light occurs in organelles named scintillons [
67
</a>], which contain the substrate luciferin, the LCF enzyme, and in most cases
a LBP [<a href="#_ENREF_68">
68-71
</a>]. The light emission depends on LCF-catalyzed oxidation of the luciferin, generally
protected from oxidation by a LBP that binds luciferin at physiological pH. However,
LBP has not received as much attention as LCF, perhaps because it has not been identified
in all bioluminescent dinoflagellates, and therefore has not been considered to date
as an essential component of this bioluminescence system [<a href="#_ENREF_72">
72
</a>]. <em>Lingulodinium polyedrum</em> (formerly <em>Gonyaulax polyedra</em>), the first organism from which this protein was isolated, cloned and fully sequenced,
is the main model organism for studies of LBP [<a href="#_ENREF_73">
73-75
</a>]. In <em>L. polyedrum</em>, the LBP gene occurs as two different types or isoforms (LBP<em>a</em> and LBP<em>b</em>), which share 86 % sequence identity and encode two proteins that are expressed in equal
amounts [<a href="#_ENREF_74">
74
</a>, <a href="#_ENREF_75">
75
</a>]. Additionally, each gene type has the typical dinoflagellate genomic organization
and is present in several non-identical tandem copies. Furthermore, in the <em>L. polyedrum</em> proteome, LBP was found to be very abundant, up to 1 % of the total protein [<a href="#_ENREF_75">
75
</a>, <a href="#_ENREF_76">
76
</a>].</p>
The LBP is well known in L. polyedrum [
68
</a>, <a href="#_ENREF_77">
77
</a>], <em>Noctiluca scintillans</em> [<a href="#_ENREF_78">
78
</a>], and it has also been found in the genera <em>Gonyaulax</em>, <em>Ceratocorys</em>, <em>Protoceratium</em>, and <em>Alexandrium</em> [<a href="#_ENREF_79">
79-82
</a>]; therefore, this gene seems to be a fundamental component of the molecular bioluminescence
system in dinoflagellates. Molecular studies have demonstrated a high variation among
gene copies, revealing a very diverse gene family comprising multiple gene types in
some cases [<a href="#_ENREF_72">
72
</a>]. In fact, LBP has not been detected in Pyrocystis protein extracts screened with
a universal antibody for LBP [<a href="#_ENREF_68">
68
</a>, <a href="#_ENREF_70">
70
</a>]. This cannot be taken as conclusive proof of the absence of the LBP protein,
given the high degree of variability reported for this gene. Currently, it is unknown
how many of the bioluminescent dinoflagellate species utilize LBP in their bioluminescence
system [<a href="#_ENREF_11">
11
</a>]; nevertheless, emerging information shows substantial evidence that LBP is an important
component of the dinoflagellate bioluminescence system [<a href="#_ENREF_72">
72
</a>].</p>
It was reported that dinoflagellate LBP (DinoLBP) primers yielded PCR products from
eleven of the eighteen species tested, demonstrating the presence of LBP in all
members of the Gonyaulacales with the exception of the genera Ceratium and Fragilidium. It is important to note that, owing to the differences in LBP between species, efficient
amplification of LBP from Ceratocorys horrida, Gonyaulax spinifera, and Alexandrium monilatum was possible only after a reduction in PCR stringency. Furthermore, the DinoLBP
primers amplified LBP in all the species known to contain it, except N. scintillans, which presents a highly divergent LBP sequence [
78
</a>]. Published research has shown the diversity of LBP in several closely related dinoflagellates,
with first reports in <em>P. reticulatum</em>,<em> C. horrida</em>, and <em>G. spinifera</em>, as well as the first partial sequences from <em>Alexandrium affine</em>, <em>A. monilatum</em>, <em>A. tamarense, and A. fundyense </em>[<a href="#_ENREF_72">
72
</a>]. These results also agree with an absence of LBP in <em>Pyrocystis spp.;</em> however, the lack of detection in <em>N. scintillans</em> suggests that the negative PCR results in <em>P. lunula, Protoperidinium crassipes, Ceratium longipes</em>, and <em>Fragilidium cf.</em> <em>subglobosum</em> are unlikely to be conclusive. Despite many efforts, the differences in LBP between
species have precluded the design of universal primers for LBP, as has previously been
done in the case of LCF [<a href="#_ENREF_83">
83
</a>]. LBP sequences have been shown to form a very diverse gene family [<a href="#_ENREF_75">
75
</a>, <a href="#_ENREF_76">
76
</a>]. Furthermore, phylogenetic studies on <em>Alexandrium spp.</em> also suggest the presence of more than one LBP type, similar to previous observations
with their LCF sequences [<a href="#_ENREF_83">
83
</a>]. Multiple gene types in genes involved in bioluminescence seem to be common in both
LBP and LCF for several species [<a href="#_ENREF_75">
75
</a>, <a href="#_ENREF_83">
83
</a>, <a href="#_ENREF_84">
84
</a>], creating more divergence between gene copies of LBP in these genomes. </p>
Studies in N. scintillans also revealed that LBP is present in diverse forms, as the result of important evolutionary
events such as gene fission or fusion. N. scintillans is unusual in that LCF and LBP are found as two domains in one gene, whereas they are
normally separate genes [
72
</a>]. It has been suggested that the origin of the hybrid structure LCF/LBP was either
by fusion of the two genes of photosynthetic species in <em>N. scintillans,</em> or by fission of the <em>N. scintillans</em> gene into those species. The hybrid LCF/LBP gene (2,396 bp) reported in <em>N. scintillans</em> [<a href="#_ENREF_72">
72
</a>, <a href="#_ENREF_78">
78
</a>] consists of part of the N-terminal region and the LCF domain followed by the LBP
domain. A detailed analysis of our results revealed a similar situation: we were
able to detect two different isoforms of LBP, one that seems to correspond
to the individual LBP gene, very similar (query length 422, query cover 95 %, ident.
75 %) to that reported for <em>A. tamarense </em>(GenBank AFN27006.1) [<a href="#_ENREF_72">
72
</a>] (Fig. 8) [see Additional file 6], and another that would represent the hybrid version
between LCF/LBP, most similar to the sequences reported for <em>L. polyedra </em>(GenBank AAA29164.1 and AAA29163.1) (query length 1,116; query cover 96 %; ident.
55 %)[<a href="#_ENREF_75">
75
</a>] and <em>A. catenella</em> (GenBank: ABY78836.1) (query length 1,116; query cover 95 %; ident. 54 %)[<a href="#_ENREF_81">
81
</a>] (Fig. 9) [see Additional file 7]. These results were corroborated by PCR, sequencing
and BLASTx analyses [<a href="#_ENREF_85">
85
</a>]. In fact, a similar situation occurs in the case of the LCF gene in <em>P. lunula,</em> where up to three isoforms have been reported: LCF<em>a</em> (GenBank AF394059.1), LCF<em>b</em> (GenBank AF394060.1), and LCF<em>c</em> (GenBank AF394061.1) [<a href="#_ENREF_84">
84
</a>]. The characteristic element of the hybrid LCF/LBP is the presence of the LCF/LBP
N-terminal domain (pfam05295, Luciferase_N); however, this part is not the catalytic
domain of the protein. It has been suggested that this region could mediate an interaction
between LBP and LCF or their relationship with the vacuolar membrane [<a href="#_ENREF_72">
72
</a>]. </p>
Additionally, in N. scintillans, both fragments present an identical N-terminal region of 108 bp. A study of the two
cDNA sequences showed that in the shorter fragment, the N-terminal region led directly
into the LBP domain in the absence of an LCF domain. Therefore, the combined LCF/LBP
and the single separated LBP, which shared the N-terminal region and LBP domain, but
not the LCF domain, corresponded to two different genes. It was also confirmed that
N. scintillans did indeed express both of these genes [
72
</a>]. It has been speculated that the second LBP in <em>N. scintillans</em> could have a role in binding luciferin in the scintillons or it could act to store
it in the cytoplasm [<a href="#_ENREF_78">
78
</a>]. It has been suggested that if the LCF/LBP of <em>N. scintillans </em>is ancestral to the LCF and LBP of photosynthetic species [<a href="#_ENREF_86">
86
</a>], the single LBP of <em>N. scintillans</em> could have been formed by mRNA splicing of LCF/LBP and subsequent retro-transposition
on the genome [<a href="#_ENREF_66">
66
</a>, <a href="#_ENREF_87">
87
</a>]. The N-terminal region in both genes seems to support this hypothesis. Having separate
LCF and LBP could enable differential regulation of each gene and this can be advantageous
if LBP needs to be stoichiometrically proportional to luciferin [<a href="#_ENREF_77">
77
</a>], but LCF, which has a triple catalytic capacity in photosynthetic species, can be
re-used [<a href="#_ENREF_72">
72
</a>]. </p>
Glutathione-S-transferase (GST) and the elusive luciferin
In our results, oxidation-reduction processes emerge as a key component, as evidenced
by the GO and KO analyses [see Additional files 3 and 4]. Oxidative stress is able
to induce a transcriptional response. In L. polyedrum, metal-induced oxidative stress increased the activity of superoxide dismutase
[
88
</a>], depending on the type of metal, its exposure time, and its concentration [<a href="#_ENREF_89">
89
</a>, <a href="#_ENREF_90">
90
</a>]. Furthermore, a study with 3,500 genes from <em>P. lunula</em> showed that up to 200 genes increased in abundance after treatment with sodium nitrite
(1 mM) or paraquat (0.5 mM). It is also important to note that the antioxidant enzyme
GST (GenBank AAN85429.1) [<a href="#_ENREF_9">
9
</a>], whose presence has been detected in our results [see Additional file 3], also possesses
the pfam05295 domain (Luciferase_N). In addition to the pfam05295 domain (N-terminal LCF/LBP), a GST-N-Sigma-like domain
is also present, which belongs to the Thioredoxin-like superfamily. These proteins
function as disulfide oxidoreductases (PDOs), altering the redox state of other proteins
through the reversible oxidation of their dithiol active site. The thiol group of
cysteine, in the reduced state, is able to donate a reducing equivalent (H+ + e-)
to other unstable molecules [<a href="#_ENREF_9">
9
</a>], such as reactive oxygen species like luciferin.</p>
In P. lunula, the gene encoding the antioxidant enzyme GST has been shown to carry an additional 255 bp extension at its end [
9
</a>], encoding an N-terminal sequence previously reported in LCF and LBP from <em>L. polyedrum </em>[<a href="#_ENREF_73">
73
</a>, <a href="#_ENREF_91">
91
</a>]. The similarity between the N-termini of GST, LCF and LBP is around 45 %, compared
with 50 % between <em>L. polyedrum</em> LCF and LBP [<a href="#_ENREF_92">
92
</a>]; exon recombination was suggested as a plausible explanation for this homology [<a href="#_ENREF_93">
93
</a>]. Although the function of this N-terminal region is still unknown, the presence
of this conserved sequence in GST could indicate that its function is not restricted
to bioluminescence and may be related to other roles, such as cellular processing or
localization of these proteins. It is also important to note that in <em>P. lunula</em>, unlike other bioluminescent dinoflagellates, the amounts of luciferin and LCF are
constant throughout the day and night [<a href="#_ENREF_70">
70
</a>]. Therefore, instead of daily <em>de novo</em> synthesis and destruction of all the components of the bioluminescent system, the
rhythm involves changes in their intracellular localization [<a href="#_ENREF_8">
8
</a>, <a href="#_ENREF_84">
84
</a>], and it is likely that the GST protein is involved in this process; however,
further studies are necessary to test this hypothesis.</p>
The luciferin from P. lunula is a tetrapyrrole-type molecule resembling chlorophyll a and krill luciferin
[
94
</a>]. This luciferin undergoes photo-oxidation and is extremely labile in the presence of O<sub>2</sub>, at high salt concentration, and at low pH [<a href="#_ENREF_95">
95
</a>]. <em>P. lunula</em> contains luciferin in larger amounts than other dinoflagellate species, even 100
times more than <em>L. polyedrum</em> [<a href="#_ENREF_94">
94
</a>], and this luciferin seems to be universal, because LCF from any dinoflagellate
can use it as a substrate to produce light [<a href="#_ENREF_96">
96
</a>]. Surprisingly, this luciferin could even cross-react with the bioluminescent system
of the krill (<em>Euphausia superba</em>) [<a href="#_ENREF_68">
68
</a>, <a href="#_ENREF_97">
97
</a>]. It has been hypothesized that <em>P. lunula</em> luciferin is a photo-oxidation breakdown product of chlorophyll a [<a href="#_ENREF_98">
98
</a>]. According to this paradigm, Liu and Hastings (2007) [<a href="#_ENREF_78">
78
</a>] hypothesized that heterotrophic dinoflagellates acquire luciferin from ingested
prey, either directly or from the degradation of chlorophyll provided by the prey.
However, the heterotrophic dinoflagellate <em>P. crassipes</em> can maintain its bioluminescence up to 1 year without the ingestion of chlorophyll
or luciferin containing food [<a href="#_ENREF_99">
99
</a>], and therefore must contain a luciferin originating from a different precursor molecule.
The presence of LBP in several photosynthetic species suggests that even within these
species, there might be an alternative luciferin molecule that requires LBP for stabilization
[<a href="#_ENREF_74">
74
</a>]. In any case, the hypothesis that the origin of luciferin is photo-oxidized chlorophyll
a [<a href="#_ENREF_98">
98
</a>] would only be plausible for <em>P. lunula</em>, which maintains its luciferin throughout the light-dark cycle. <em>L. polyedrum</em> only contains luciferin at night when photo-oxidation is not possible [<a href="#_ENREF_71">
71
</a>], so its synthesis cannot be explained by the photo-oxidation mechanism. Therefore,
it is likely that more than one mechanism is responsible for luciferin production, even
in closely related species [<a href="#_ENREF_11">
11
</a>].</p>
In fact, previous reports have confirmed the intracellular production of luciferin
in P. lunula [
100
</a>]. In this regard, Fresneau and Arrio (1988) [<a href="#_ENREF_101">
101
</a>], argued that bioluminescence of <em>P. lunula</em> is controlled by the reduction state of the luciferin precursor. It was reported
that luciferin and its precursor P630, so named for its excitation wavelength
peak (630 nm), possess the same peptide moiety. P630 is a chromopeptide that is more stable
than luciferin at low temperature in methanolic solutions, and it is composed of a
polypeptide (4.8 kDa) and a linear tetrapyrrole such as luciferin (600 Da). Cations
could oxidize P630 or break the bond between the peptidic chain and the extended tetrapyrrole.
Reduction of P630 is performed enzymatically by a NAD(P)H-dependent oxidoreductase
or chemically by 2-mercaptoethanol or dithiothreitol. Reduced P630 has the same spectral
characteristics as the purified luciferin, as LCF can oxidize this reduced molecule
with a light emission at 480 nm. It is very important to note that purified luciferin
spontaneously turned partly into P630, in methanol at -20 °C. All these observations
suggest that the level of interconversion between P630 and luciferin could be the
oxidation-reduction equilibrium in this system [<a href="#_ENREF_101">
101
</a>]. These observations suggest that reduced P630 is a luciferin molecule, and the oxidized
form seems, in these conditions, to be the precursor of luciferin [<a href="#_ENREF_102">
102
</a>]. According to Fresneau and Arrio (1988) [<a href="#_ENREF_101">
101
</a>], the bioluminescence is a complex mechanism controlled by at least two successive
reactions. The first is the reduction of the luciferin precursor P630 by a NAD(P)H-dependent
reductase [<a href="#_ENREF_102">
102
</a>], and the second is the LCF-luciferin reaction which generates the light emission.
Since this molecule is reversibly reduced, it may be considered as an interchange
point of reducing power involving a currently unknown electron transfer pathway including
NAD(P)H. P630 seems to be a fundamental component involved in complex light-modulated
reactions which are widespread in plants [<a href="#_ENREF_103">
103
</a>]. </p>
Fresneau and Arrio (1988) [
101
</a>] also suggest that a recycling process of the luciferin would lead to the impossibility
of reaching a level of zero light intensity with homogenates or with mixtures of the
purified enzymes and substrates. In such a cyclic process, we would expect a steady
state in the light emission, as in firefly extracts where ADP is converted into ATP
by a kinase [<a href="#_ENREF_104">
104
</a>]. According to these authors [<a href="#_ENREF_101">
101
</a>], the bioluminescence could be considered as a metabolic process related to the regulation
of excess intracellular reducing power produced through photosynthesis and respiration.
This assumption is consistent with the evidence of a specific chlororespiration reported
by Bennoun (1982) [<a href="#_ENREF_105">
105
</a>] for higher plant chloroplasts and other observations in cyanobacteria [<a href="#_ENREF_106">
106
</a>, <a href="#_ENREF_107">
107
</a>] of an alternative pathway of respiration in photosynthetic thylakoid membranes [<a href="#_ENREF_108">
108
</a>].</p>
Furthermore, studies focused on the NADPH-dependent detoxification of reactive carbonyls
indicate that eukaryotic cells use different systems to deal with the harmful effects
of these reactive molecules. The GST pathway, which is fueled by glutathione (GSH)
or thioredoxin redox cycle, conjugates aldehydes with GSH, supplying an important
mechanism of detoxification. In animals, the main detoxification pathway is GSH-dependent
[
109
</a>], and a GSH-dependent detoxification system is functional in plants as well. The
GST proteins constitute a large group; some of these are localized in the chloroplasts
and mitochondria. Because chloroplasts contain GSH at millimolar concentrations, the GSH-dependent
detoxification system seems to be very effective [<a href="#_ENREF_110">
110
</a>].</p>
GSTs comprise a large family of eukaryotic and prokaryotic phase II metabolic isozymes
that catalyze the conjugation of the reduced form of GSH to xenobiotic substrates
for detoxification [
111
</a>, <a href="#_ENREF_112">
112
</a>]. Could GST be the NAD(P)H-dependent reductase that controls the reduction
state of P630? The enzyme GST is also associated with cytochrome P450 [see Additional
file 3]. The presence of this GST with a characteristic domain pfam05295, which is
part of the cytochrome P450 metabolic cycle, could indicate that the GST enzyme is
involved in the synthesis process and/or storage of luciferin through an electron
transfer system unknown until now. Also striking is the absence of cox1 (cytochrome
c oxidase subunit I) and cox3 (cytochrome c oxidase subunit III) in our results (see the section above on the mitochondrial and plastid genomes), which could also be associated
with this unknown electron transfer pathway. These are questions that open new lines
of investigation in relation to the function of GST and the process of synthesis of
luciferin in these organisms. The domain pfam05295 (Luciferase_N) is the common thread
between LCF-LBP-GST in <em>P. lunula</em>, since all of these proteins contain it.</p>
The chromosomes of dinoflagellates, having a liquid crystalline structure [
113
</a>] with bivalent cations stabilizing the matrix [<a href="#_ENREF_114">
114
</a>], are condensed permanently at all stages of the cell cycle. It was suggested that
this DNA could play a role in the organization of the chromosome, maybe by a relation
with a matrix protein [<a href="#_ENREF_28">
28
</a>, <a href="#_ENREF_115">
115
</a>]. A large proportion of this transcriptionally inactive DNA seems to be repeated
sequences, and it is possible that this could contribute to the organization of the
genome [<a href="#_ENREF_82">
82
</a>]. The matrix of the nucleus is a network of fibers that also plays an important role
in the organization of the chromatin. Furthermore, studies in <em>Amphidinium carterae</em> showed the presence of two matrix proteins, topoisomerase II and lamins, similar
to what is found in higher eukaryotes [<a href="#_ENREF_116">
116
</a>]. In our case, it was possible to detect the presence of topoisomerase II and III
(3-alpha and 3-beta-1, respectively) [see Additional file 3]. Therefore, although
this chromatin is organized differently than in other eukaryotes, its nuclear matrix
is conserved [<a href="#_ENREF_117">
117
</a>].</p>
A sequence-specific DNA binding protein is the dinoflagellate nuclear associated protein
(Dinap1) [
118
</a>]. Besides Dinap1, homologue of the tubulin-like protein [<a href="#_ENREF_119">
119
</a>], a group of membrane-tethered transcription factors related to signaling pathways
[<a href="#_ENREF_120">
120
</a>], were found in <em>Alexandrium</em>. In our case, it was not possible to detect the presence of Dinap1, Dip1, DapB, DapC
or DapG. However, the presence of DapA (DapA-like; Aldolase-type TIM barrel) and
tubulin-like proteins (Tubulin alpha chain-like) is noteworthy [see Additional file
3]. A group of histone-like proteins (HLPs) [<a href="#_ENREF_121">
121
</a>] were first found in <em>C. cohnii</em> [<a href="#_ENREF_122">
122
</a>, <a href="#_ENREF_123">
123
</a>]. Nowadays, the presence of these proteins has been reported in other dinoflagellates
[<a href="#_ENREF_124">
124
</a>], including our case, where HLPs were detected [see Additional file 3]. In the past,
it was thought that dinoflagellates lacked histone proteins [<a href="#_ENREF_125">
125
</a>] and instead used HLPs for DNA organization [<a href="#_ENREF_126">
126
</a>]; nevertheless, recent studies have shown the presence of four core nucleosomal
histones, H2A, H2B, H3, and H4 [<a href="#_ENREF_127">
127
</a>, <a href="#_ENREF_128">
128
</a>]. It is very important to note that three of the four histones that make up the core
nucleosome in dinoflagellates have been found in our results (H2A, H3 and H4), along with
two histone-modifying enzymes: Histone-lysine N-methyltransferase setd3, and Histone
acetyltransferase [see Additional file 3]. Another basic protein identified in our results was DVNP (dinoflagellate/viral nucleoprotein)
[see Additional file 3]. DVNP is found only in dinoflagellates [<a href="#_ENREF_117">
117
</a>]. This protein can bind DNA and also be post-translationally modified. In some dinoflagellates,
DVNP seems to have displaced major histone functions [<a href="#_ENREF_129">
129
</a>]. It is evident that dinoflagellates show ancient conditions in the organization of chromatin,
from which higher eukaryotes evolved the highly conserved nucleosome based organization.</p>
Transcriptional regulation
Transcriptomic and genomic studies in dinoflagellates have revealed a paucity
of sequence-specific transcription factors, consistent with constant transcription
of most genes, with only some of them under transcriptional control [
29
</a>, <a href="#_ENREF_30">
30
</a>, <a href="#_ENREF_130">
130
</a>]. In our results [see Additional file 3], we highlight the presence of putative mediator
of RNA polymerase II transcription subunit 37c, DNA-directed RNA polymerase subunit
beta, DNA-directed RNA polymerase, transcription factor TFIID complex (transcription
initiation from RNA polymerase II promoter), RNA polymerase Rpb1, and DNA-directed
RNA polymerase 2A. In addition to RNAP II, the functional eukaryotic transcriptional
system requires other basal transcriptional factors (TF) [<a href="#_ENREF_131">
131
</a>, <a href="#_ENREF_132">
132
</a>]. TFIID performs the first step of promoter recognition, followed by the TATA binding
protein (TBP) and TBP-associated factors (TAFs) [<a href="#_ENREF_133">
133
</a>, <a href="#_ENREF_134">
134
</a>]. TBP binding is considered to be the rate-limiting step in the transcription process
[<a href="#_ENREF_135">
135
</a>]. Other important factors present in our results related to the transcription process
were: transcription factor BTF3, transcription regulators (SARP family), pre-mRNA
processing splicing factors (PRP8, 17, SPF27), and different splicing factors (U2AF,
3B subunit 1, 3B subunit 3-like, arginine / serine-rich 4) [see Additional file 3].
</p>
One of the most important post-transcriptional modifications is the removal of introns
[
136-138
</a>]. Dinoflagellate genes contain very few introns or lack them completely [<a href="#_ENREF_73">
73
</a>, <a href="#_ENREF_93">
93
</a>, <a href="#_ENREF_139">
139
</a>]. However, in <em>P. lunula</em>, the LCF<em>c</em> gene showed a 403 bp intron [<a href="#_ENREF_84">
84
</a>]. Additionally, studies of 17 dinoflagellate species reported introns in only three:
<em>Peridinium willei</em>, <em>Polarella glacialis</em> and <em>Thecadinium yashimaense</em> [<a href="#_ENREF_117">
117
</a>, <a href="#_ENREF_140">
140
</a>]. Nevertheless, it is evident that spliced leader (SL) trans-splicing is ubiquitous
in dinoflagellates [<a href="#_ENREF_141">
141-143
</a>]. In this process, a unique and highly conserved spliced leader sequence is transferred
to the 5′ end of mRNA molecules. The RNA that donates the spliced leader was found
to be mostly < 50 bp [<a href="#_ENREF_142">
142
</a>, <a href="#_ENREF_144">
144
</a>, <a href="#_ENREF_145">
145
</a>]. SL trans-splicing acts to transform polycistronic to monocistronic mRNA [<a href="#_ENREF_117">
117
</a>, <a href="#_ENREF_141">
141
</a>].</p>
The length of the SL exon is different between species [
146
</a>, <a href="#_ENREF_147">
147
</a>]. In dinoflagellates, it is a 22 nt sequence, 5′-DCCGUAGCCAUUUUGGCUCAAG-3′ (D = U, A,
or G) [<a href="#_ENREF_141">
141
</a>]. The dinoflagellate SL (DinoSL) sequence contains an Sm-binding motif (AUUUUGG) in
the exon, unlike all other SL RNAs where this conserved sequence is found in the intron
[<a href="#_ENREF_141">
141
</a>]. Although SL trans-splicing is not present in organelle-encoded transcripts, recently
a unique type of trans-splicing was found in the mitochondria of some dinoflagellates
[<a href="#_ENREF_148">
148
</a>]. SL trans-splicing is evolutionarily ancient in dinoflagellates, and nuclear-encoded
transcripts with 5 different SL sequences have been found: 3 of the SLs are 22 nt long and similar
to that of the core dinoflagellates (SL1), while the other two are truncated 21 nt SLs with
either A or G as the starting nucleotide (SL2) [<a href="#_ENREF_117">
117
</a>, <a href="#_ENREF_149">
149
</a>]. A careful analysis of our transcriptomic data [see Additional file 2] reveals the existence of up to thirteen transcripts that include the DinoSL sequence
(Table 1), and from this information we could infer the detection of an equal number
of introns in those transcripts. However, the proteomic information shows
that, of the thirteen transcripts containing the DinoSL sequence, only two can be
identified at the proteome level, and the functional description assigned to both
is hypothetical protein AN481_18150. This identification is derived from a metagenomic
analysis performed with freshwater cyanobacteria belonging to the genus <em>Aphanizomenon</em> (GenBank: LJOY01000095.1).</p>
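The search for DinoSL-bearing transcripts described above can be illustrated with a minimal Python sketch. It assumes that transcripts are available as plain nucleotide strings and that only exact, full-length matches to the 22 nt consensus are of interest; the function name `find_dinosl` is hypothetical and this is not the pipeline used in the study. The IUPAC code D is expanded to A, G, or T, and RNA input is converted to the DNA alphabet before matching.

```python
import re

# 22 nt DinoSL consensus quoted in the text, written in the DNA alphabet
# (U -> T). The leading IUPAC code D (U, A, or G) becomes [AGT].
# A lookahead is used so that overlapping occurrences are also reported.
DINOSL_PATTERN = re.compile(r"(?=[AGT]CCGTAGCCATTTTGGCTCAAG)")

# Sm-binding motif (AUUUUGG in RNA) located inside the SL exon.
SM_MOTIF_DNA = "ATTTTGG"


def find_dinosl(sequence: str) -> list[int]:
    """Return 0-based start positions of exact DinoSL matches.

    Hypothetical helper: accepts DNA or RNA letters in any case.
    Truncated or mismatched leaders are deliberately not detected.
    """
    dna = sequence.upper().replace("U", "T")
    return [match.start() for match in DINOSL_PATTERN.finditer(dna)]


if __name__ == "__main__":
    # Toy transcript (invented for illustration): DinoSL followed by a start codon.
    toy = "ACCGTAGCCATTTTGGCTCAAGATGGCC"
    print(find_dinosl(toy))
    print(SM_MOTIF_DNA in toy)
```

Because the leader is often incomplete at the 5′ end of assembled transcripts, a real screen would probably need to tolerate truncation or mismatches; that relaxation is left out here to keep the example exact.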
Dinoflagellate introns generally lack the usual splice sites (GU-AG), as found in
the case of the AT-TC intron reported in P. lunula LCFc [
84
</a>], the G(C/A)-AG introns in rubisco of <em>Symbiodinium</em> [<a href="#_ENREF_150">
150
</a>] and the AG-AG intron in sxtG of <em>Alexandrium</em> [<a href="#_ENREF_151">
151
</a>]. Some of these non-canonical splice sites, such as GC-AG, have been shown to function in other
animal and plant genomes [<a href="#_ENREF_140">
140
</a>]. The paucity of introns, as well as the presence of the SL, has led to the proposal
of a mRNA recycling system whereby mature mRNAs are inserted back into the genome
through a recombination process [<a href="#_ENREF_66">
66
</a>, <a href="#_ENREF_117">
117
</a>].</p>
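The distinction between canonical and non-canonical intron boundaries discussed above can be made concrete with a short Python sketch. It assumes the intron sequence is already delimited (first base of the intron to last base of the intron); the helper names `splice_site_pair` and `is_canonical` and the toy sequences are hypothetical. Boundaries are reported in the DNA alphabet, so the canonical GU-AG pair appears as GT-AG.

```python
def splice_site_pair(intron: str) -> str:
    """Return the donor-acceptor dinucleotides of an intron, e.g. 'GT-AG'.

    Hypothetical helper: the input must be the intron only, 5' to 3'.
    RNA letters are converted to the DNA alphabet (U -> T).
    """
    dna = intron.upper().replace("U", "T")
    if len(dna) < 4:
        raise ValueError("intron must be at least 4 nt long")
    return f"{dna[:2]}-{dna[-2:]}"


def is_canonical(intron: str) -> bool:
    """True when the intron has the usual GU-AG (GT-AG in DNA) boundaries."""
    return splice_site_pair(intron) == "GT-AG"


if __name__ == "__main__":
    # Toy sequences invented for illustration, not real intron sequences.
    for toy_intron in ("GUAAGUCCAG", "ATCCGGTTTC", "GCAAGTTTAG"):
        print(toy_intron, splice_site_pair(toy_intron), is_canonical(toy_intron))
```

Applied to annotated intron coordinates, a tally of the returned pairs would give the frequency of each boundary type in a gene set.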
Table 1: Spliced leader (SL) analysis.
Dinoflagellate nuclear DNA is widely methylated. The genes associated with the regulation
of methylation are unclear but some potential candidates have been reported. The S-adenosylmethionine
synthetase (SAM), which has been detected in our study [see Additional file 3], has
also been recognized in other dinoflagellates in the context of saxitoxin synthesis,
which requires methyl transfer [
152
</a>]. In addition, SAM itself is methylated, and the inhibition of its methylation produces
cell cycle arrest [<a href="#_ENREF_153">
153
</a>].</p>
Other components related to transcriptional regulation found in our data were: i)
ruvB-like 2, RNA-binding protein (RBP), ii) RRM (polyadenylate-binding protein 1,
splicing factor U2AF family SnRNP auxiliary factor large subunit, iii) RRM domain-containing
protein, and putative RRM domain and KH domain -SPAC30D11.14-like KH- protein), iv)
Pumilio (Pumilio homolog 2 isoform X1 and Pumilio RNA-binding repeat), v) S1 (30S
ribosomal protein S1), vi) KH (Tudor and KH domain-containing protein), vii) SAP (Heterogeneous
nuclear ribonucleoprotein U), viii) LSM (LSM domain, eukaryotic/archaea-type), ix)
ATP-dependent RNA helicase (OB NTP-binding), and x) La (Lupus La protein). RNA is
attached to a number of RBP ribonucleoprotein complexes that influence splicing, transport
to the cytoplasm, translation, stability and subcellular localization [
154
</a>]. Recently, a list of RBPs containing different RNA-binding domains
in mammals was reported [<a href="#_ENREF_155">
155
</a>], and when these were compared with sequences from different dinoflagellate species, four
RBPs were found to have the greatest level of sequence similarity: Pumilio, S1, the
OB NTP fold, and RRM. All these domains have been detected in our results [see Additional
file 3]. In general, studies have revealed that although dinoflagellates make
use of some transcriptional regulation, post-transcriptional control is the dominant mode of
gene expression regulation in these organisms [<a href="#_ENREF_30">
30
</a>].</p>
The basic mechanism of protein synthesis can be divided into four stages: i) initiation,
ii) elongation, iii) termination, and iv) recycling. Although translation can be
modulated at every stage, kinetic studies have shown that
initiation is the fundamental regulatory step [
156
</a>, <a href="#_ENREF_157">
157
</a>]. Initiation refers to the assembly of translation-competent ribosomes, in which
the initiator tRNA (Met-tRNAi) in the ribosomal P-site is base-paired to the mRNA
start codon. This step involves orthologs of the bacterial/eukaryotic factors IF1/eIF1A
and IF2/eIF5B, which are widely conserved across all domains of life [<a href="#_ENREF_30">
30
</a>]. In our results, we highlight the presence of eIF3F, translation initiation factor
1A (eIF1A), IF1, IF2, eIF4A, eIF4E, eIF4G, putative translation initiation factor
eIF5A, and translation elongation factors (Tu, Ts, 1-alpha, 1-beta, 2, 3, P, and G) [see Additional file 3].</p>
Additionally, a study of dinoflagellate transcriptomic datasets and environmental
cDNA revealed the presence of 79 ribosomal proteins that are frequently present in
eukaryotes [
145
</a>], some of which have been reported to be highly represented in dinoflagellate transcriptomes
[<a href="#_ENREF_31">
31
</a>]. Currently, little is known about dinoflagellate ribosomes. Nevertheless, it was
reported that <em>Lingulodinium</em> has ribosomal proteins comparable to those of higher plants and mammals [<a href="#_ENREF_28">
28
</a>, <a href="#_ENREF_29">
29
</a>, <a href="#_ENREF_130">
130
</a>]. In our case, numerous ribosomal proteins were detected [see Additional file 8]. The importance of translational control is emphasized in <em>L. polyedra</em> by the poor agreement between the quantity of a protein and the level of its transcript
[<a href="#_ENREF_158">
158
</a>]. </p>
Mitochondrial and plastid genome
Dinoflagellates have an extremely reduced mitochondrial genome, which encodes only
three proteins: i) cob (cytochrome b), ii) cox1 (cytochrome c oxidase subunit I),
and iii) cox3 (cytochrome c oxidase subunit III), together with two fragmented rRNAs. Cox2 has
apparently been relocated to the nuclear genome [
31
</a>]. In the proteome of <em>P. lunula</em>, evidence was found for the cob protein, but not for cox1 or cox3 [see
Additional file 3]. In dinoflagellates, the mitochondrial genome is also extensively
duplicated and recombined. All the genes are present in multiple copies, generally
interrupted by fragments of other genes [<a href="#_ENREF_159">
159
</a>]. The stop codon TGA must be either reassigned or changed by RNA editing (for example, to
TGG) to yield sense codons such as those for serine or tryptophan [<a href="#_ENREF_160">
160
</a>]. These genes are transcribed, some of the transcripts are polyadenylated [<a href="#_ENREF_161">
161
</a>], and they require trans-splicing to be translatable [<a href="#_ENREF_31">
31
</a>, <a href="#_ENREF_159">
159
</a>].</p>
In dinoflagellates containing peridinin plastids, the plastid genome is fragmented into plasmid-like
mini-circles, most of which contain 1-2 genes. The size of the
mini-circles differs between species. Nevertheless, all the mini-circles share
a conserved core in the non-coding region that could comprise transcription initiation
signals [
162
</a>]. The genes encoded in the plastid genome are linked to photosynthesis, including
the core subunits of the photosystem, cytochrome b6f, and ATP synthase complex (atpA,
atpB, petB, petD, psaA, psaB, psbA-E, psbI) as well as four other genes (ycf16, ycf24,
rpl28, and rpl23) [<a href="#_ENREF_31">
31
</a>]. In the proteome of <em>P. lunula</em>, evidence was found for expression of the ATP synthase complex (petB, psaA, psaB) and
cytochrome b6f proteins. Also noteworthy in our results is the presence of important
plastid-associated proteins such as: i) ferredoxin, ii) TPT (Phosphotransferase
KptA / Tpt1), iii) light harvesting protein, and iv) photosystem II (D2 subunit) [see
Additional file 3]. The light harvesting protein and the photosystem II subunits seem to be related to
the stabilization and protection of the photosystem, while ferredoxin and TPT play
an important role in the process of exporting the products of photosynthesis from
the plastid [<a href="#_ENREF_31">
31
</a>, <a href="#_ENREF_163">
163
</a>]. In addition, other plastid-related proteins were detected in our results: coproporphyrinogen
oxidase isoform 1, glyceraldehyde 3-phosphate dehydrogenase, fructose 1,6-bisphosphatase,
glutamate 1-semialdehyde 2,1-aminotransferase, transketolase, NAP50, fructose
1,6-bisphosphatase isoform 2, and uroporphyrinogen decarboxylase isoform 1 [see Additional
file 3]. Studies carried out on non-photosynthetic dinoflagellates (<em>Noctiluca</em>, <em>Oxyrrhis</em>, and <em>Dinophysis</em>) revealed that they contain cryptic plastid metabolisms and lack alternative cytosolic
pathways, suggesting that all free-living dinoflagellates are metabolically dependent
on plastids. These findings led to a hypothesis about the dependency on plastid organelles
in eukaryotes that have lost photosynthesis. Furthermore, it has also been proposed
that the evolutionary origin of bioluminescence in non-photosynthetic dinoflagellates
may be linked to plastid tetrapyrrole biosynthesis [<a href="#_ENREF_12">
12
</a>]. However, further studies are necessary to test this hypothesis.</p>
The discovery of bacteriorhodopsin, which contains all-trans retinal as a chromophore,
changed the notion that biological conversion of light energy to ATP is only possible
through photosynthesis [
164
</a>]. In dinoflagellates, this rhodopsin was first detected in P. lunula, in which it was reported to be expressed more actively in the early light phase than in the
early dark phase, suggesting its involvement in circadian photoreception and phase
shifting. This gene has also been reported in several dinoflagellates (<em>O. marina</em>, <em>Polarella antarctica</em>, <em>K. veneficum</em>, <em>A. catenella</em>, <em>P. lunula</em>) as well as in natural dinoflagellate assemblages. These results suggest that rhodopsin
is ubiquitous in dinoflagellates [<a href="#_ENREF_145">
145
</a>, <a href="#_ENREF_165">
165
</a>, <a href="#_ENREF_166">
166
</a>]. In fact, rhodopsin (GenBank: AAO14677.1) was also detected in our transcriptomic data (>TRINITY_DN107399_c0_g1_i6 len=1,117) [see Additional file 2]. Furthermore, this rhodopsin, closely related to xanthorhodopsin,
is unique in the sense that it harvests solar energy with carotenoid molecules as the
antenna [<a href="#_ENREF_167">
167
</a>]. Moreover, the dinoflagellate rhodopsin sequences contain the characteristic proton
donor site. Together, these facts led to the proposal that dinoflagellate rhodopsin is a proton-pumping
rather than a sensory rhodopsin. In mixotrophic species such as K. veneficum, it is remarkable that light stimulates heterotrophic growth [<a href="#_ENREF_168">
168
</a>], where rhodopsin-generated energy may be used to improve ingestion and digestion of
prey [<a href="#_ENREF_145">
145
</a>].</p>
The major contribution of the present work is a reference transcriptome
and proteome of P. lunula, now accessible to the research community, together with a functional description
of the 3,182 proteins identified by proteomics and transcriptomics, including 175 novel
proteins. The proteomic and transcriptomic data have been deposited in the ProteomeXchange and NCBI SRA databases,
respectively. In addition to this, a series of important factors related to the regulation
of gene expression were identified. This annotated transcriptome and proteome should
help to accelerate functional genomics in this species and perhaps in other commercially
and environmentally important microalgae, while full genome sequencing projects are
in progress. The transcriptomic and proteomic data should not be seen as a substitute
for the sequencing of the genome. These approaches have different strengths and weaknesses
and should be considered as complementary and not mutually exclusive. However, until
more genomes are available, the omic approaches could provide a valuable reference point. The LBP protein,
which had previously been considered absent from the bioluminescent
system of the Pyrocystis genus, stands out in our results. Furthermore, the presence of two isoforms of this gene in P. lunula, similar to reports in other dinoflagellate species, highlights the importance
of this protein in the bioluminescence system of P. lunula. It is also important to highlight the presence of the GST protein, which could be
involved in the process of synthesis and storage (controlling the oxidative degradation)
of luciferin in P. lunula.