The PEPC gene family was ubiquitous and conserved in plants
PEPC is a highly regulated enzyme that catalyzes the irreversible β-carboxylation of phosphoenolpyruvate in the presence of bicarbonate and Mg2+ to yield oxaloacetate and Pi, a reaction that serves a variety of physiological functions in plants [12, 27, 28]. As a key carboxylase, PEPC is encoded by genes that are widely distributed in green plants (Table 1). In the present study, we searched 17 genomes spanning the green plants and identified 264 homologous genes using the Arabidopsis PEPC genes (At1g53310, At1g68750, At2g42600 and At3g14940) as BLAST queries, and 179 homologous genes using the Pfam seed of the PEPCase domain (PF00311). Intersecting the two gene sets yielded 179 common PEPC homologs, which we then searched against the Conserved Domain Database. Of these, 109 genes contained a specific conserved domain: the plant-type PEPC (PTPC) domain (PEPCase) was identified in 90 genes and the bacterial-type PEPC (BTPC) domain (PRK00009) in 19 genes. The remaining 70 genes contained other PEPCase superfamily domains (Table 1, S1).
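The filtering logic described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the pipeline used in the study: the gene identifiers and the `classify_homologs` helper are hypothetical, and the domain labels simply reuse the names given in the text (PEPCase for PTPC, PRK00009 for BTPC).

```python
from collections import Counter

# Hypothetical gene IDs standing in for real BLAST and Pfam search results.
blast_hits = {"g1", "g2", "g3", "g4", "g5"}
pfam_hits = {"g1", "g2", "g3", "g4"}

# Hypothetical best conserved-domain hit per gene (labels follow the text).
cdd_hit = {
    "g1": "PEPCase",
    "g2": "PEPCase",
    "g3": "PRK00009",
    "g4": "PEPcase superfamily",
}


def classify_homologs(blast_hits, pfam_hits, cdd_hit):
    """Keep genes found by both searches and label each by its domain hit."""
    labels = {"PEPCase": "PTPC", "PRK00009": "BTPC"}
    common = blast_hits & pfam_hits  # genes supported by both searches
    return Counter(
        # Any hit other than the two specific domains counts as superfamily.
        labels.get(cdd_hit.get(gene), "PEPCase superfamily")
        for gene in sorted(common)
    )


counts = classify_homologs(blast_hits, pfam_hits, cdd_hit)
print(counts)
```

In this toy input, `g5` is dropped because only the BLAST search recovered it, which mirrors how the 264 BLAST hits were reduced to the 179 genes shared with the Pfam search.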
Table 1
The information of samples used in this study.

| ID | Species | Class | Type | PTPC | BTPC | PEPCase superfamily | Data resources |
|---|---|---|---|---|---|---|---|
| Smu | Spirogloea muscicola | Algae | C3-like | 3 | 2 | 4 | Figshare [29] |
| Mpo | Marchantia polymorpha | Bryophytes | C3-like | 1 | 1 | 1 | EnsemblPlants [30] |
| Ppa | Physcomitrella patens | Bryophytes | C3-like | 22 | 0 | 10 | EnsemblPlants [31] |
| Aan | Anthoceros angustus | Bryophytes | C3-like | 1 | 1 | 2 | Dryad [32] |
| Smo | Selaginella moellendorffii | Lycopods | C3 | 4 | 4 | 0 | EnsemblPlants [33] |
| Ien | Isoetes sinensis | Lycopods | CAM | 2 | 5 | 0 | Yan et al. unpublished data |
| Afi | Azolla filiculoides | Ferns | C3 | 2 | 0 | 3 | Fernbase [34] |
| Pbi | Platycerium bifurcatum* | Ferns | CAM | 2 | 0 | 2 | GigaDB [35] |
| Pab | Picea abies | Gymnosperms | C3 | 0 | 0 | 10 | Congenie [36] |
| Gmo | Gnetum montanum | Gymnosperms | C3 | 1 | 1 | 0 | Dryad [37] |
| Atr | Amborella trichopoda | Angiosperms | C3 | 1 | 1 | 0 | EnsemblPlants [38] |
| Aco | Ananas comosus | Angiosperms | CAM | 2 | 0 | 1 | EnsemblPlants [39] |
| Osa | Oryza sativa | Angiosperms | C3 | 9 | 1 | 4 | EnsemblPlants [40] |
| Zma | Zea mays | Angiosperms | C4 | 22 | 1 | 33 | EnsemblPlants [41] |
| Ath | Arabidopsis thaliana | Angiosperms | C3 | 8 | 1 | 0 | EnsemblPlants [42] |
| Ahy | Amaranthus hypochondriacus | Angiosperms | C4 | 3 | 0 | 0 | Phytozome [43] |
| Kfe | Kalanchoe fedtschenkoi | Angiosperms | CAM | 8 | 1 | 0 | Phytozome [9] |

Note: PTPC: plant-type PEPC; BTPC: bacterial-type PEPC.
The copy number of PEPC genes varied among clades of green plants. BTPC genes were found in 11 of the 17 species studied and were retained in fewer copies than PTPC genes (Table 1). We did not detect any PEPC gene carrying the specific PEPCase conserved domain in Norway spruce (Picea abies), perhaps owing to the extensive pseudogenization and transposable element insertion reported in conifers [36]. However, 10 PEPC homologs in Norway spruce contained a PEPCase superfamily domain and probably perform PEPC-like physiological functions. Owing to numerous whole-genome duplications [44], the PEPC gene family had relatively more copies in angiosperms, especially in maize, which contained 22 PTPC gene copies. Interestingly, the moss Physcomitrella patens also retained 22 PTPC copies, whereas its sister clades, hornworts (Anthoceros angustus) and liverworts (Marchantia polymorpha), each retained only one (Table 1). This extreme difference in gene content corresponds to different adaptation strategies during plant terrestrialization. Mosses underwent whole-genome duplication events (WGDs) that increased gene family complexity for coordinating multicellular growth and the dehydration response [31]. Liverworts, in contrast, have ancient dimorphic sex chromosomes, which resulted in a lack of WGDs and reduced proliferation of regulatory genes [30]. The genome of A. angustus is notably simplified and acquired stress-response and metabolic pathway genes through horizontal gene transfer from bacteria or fungi, which probably assisted its survival in a terrestrial environment [32].
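The copy-number statements above can be checked directly against Table 1. The following is a small sketch rather than part of the original analysis; the counts are transcribed from Table 1, while the variable names are illustrative assumptions.

```python
# (PTPC, BTPC, PEPCase superfamily) counts per species ID, copied from Table 1.
table1 = {
    "Smu": (3, 2, 4),   "Mpo": (1, 1, 1),  "Ppa": (22, 0, 10),
    "Aan": (1, 1, 2),   "Smo": (4, 4, 0),  "Ien": (2, 5, 0),
    "Afi": (2, 0, 3),   "Pbi": (2, 0, 2),  "Pab": (0, 0, 10),
    "Gmo": (1, 1, 0),   "Atr": (1, 1, 0),  "Aco": (2, 0, 1),
    "Osa": (9, 1, 4),   "Zma": (22, 1, 33), "Ath": (8, 1, 0),
    "Ahy": (3, 0, 0),   "Kfe": (8, 1, 0),
}

# Species that retain at least one BTPC gene.
with_btpc = [sp for sp, row in table1.items() if row[1] > 0]

# Species with the largest number of PTPC copies.
max_ptpc = max(row[0] for row in table1.values())
top_ptpc = [sp for sp, row in table1.items() if row[0] == max_ptpc]

# Total number of BTPC genes across all sampled species.
total_btpc = sum(row[1] for row in table1.values())

print(f"{len(with_btpc)} of {len(table1)} species carry BTPC genes")
print(f"Most PTPC copies ({max_ptpc}): {', '.join(top_ptpc)}")
print(f"Total BTPC genes: {total_btpc}")
```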
PEPC genes displayed highly conserved amino acid sequences in all green plants. Here, we predicted ten conserved motifs, with no overly similar pairs, from 109 PEPC proteins and found that every motif was at least 29 amino acids long and occurred in at least 104 of the 109 PEPC protein sequences (Table 2). Additionally, most amino acid sites were identical within each motif, and the linear order of the motifs, especially in PTPC genes, was also identical across green plants, although some motifs were repeated in certain genes (Fig. 1). These results clearly indicate that the PEPC gene family has been extremely conserved throughout its evolutionary history of more than 500 million years since its origin in algae [14, 20, 28].
Table 2
Conserved motifs of PEPC gene family in green plants.

| Motif | Width | Sites | LLR* | E-value | Consensus sequence |
|---|---|---|---|---|---|
| 1 | 29 | 109 | 8856 | 4.7e-3069 | RKPSGGIESLRAIPWIFAWTQTRFHLPVW |
| 2 | 50 | 109 | 14268 | 9.6e-5072 | [VI]KLTMFHGRGG[TS]VGRGGGPTHLAILSQPPDT[IV][HN]GSLRVT[VI]QGEVIEQSFG |
| 3 | 44 | 108 | 12398 | 9.6e-4364 | SSWMGGDRDGNPRVTPEVTRDVCLLAR[ML]MAANLYFSQIEDLMFE |
| 4 | 50 | 104 | 13093 | 2.6e-4568 | IQAA[FW]RTDEIRRT[PQ]PTPQDEMRAG[ML]SYFHETIW[KN]G[VL]PKFLRRVDTALKNI |
| 5 | 37 | 105 | 10252 | 2.1e-3574 | FSI[DE]WY[RL]NRINGKQEVMIGYSDSGKDAGR[LF]SAAWQ[LM][YF] |
| 6 | 41 | 109 | 10936 | 4.4e-3771 | DVL[DG]TF[HRK]V[IL]AELP[SA]D[SC]FGAY[IV]ISMATAPSDVLAVELLQREC |
| 7 | 41 | 109 | 10861 | 2.7e-3739 | DF[LM]RQVS[TC]FGLSLV[KR]LDIRQES[DE]RHT[DE]V[LM]DAIT[TN][HY]LGIGSY |
| 8 | 50 | 109 | 12887 | 1.6e-4467 | WRAL[ML]DE[MI]AVV[AS]T[KE]EYRS[IV]VF[QK][EN]PRFVEYFRLATPELEYGR[ML]NIGSRPSK |
| 9 | 41 | 108 | 10512 | 6.4e-3599 | [TM]L[QR][EA]MYN[EQ]WPFFRVTIDL[VI]EMVFAKGDP[GR]IAALYDKLLVS[ED] |
| 10 | 29 | 108 | 7470 | 7.5e-2482 | EEHLCFRTLQR[FY]TAATLEHGMHPPI[SA]PKP |

*LLR: log likelihood ratio.
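The bracketed consensus strings in Table 2 can be read as regular expressions, where each bracket lists the residues accepted at that position. The sketch below scans a protein for motif 10 this way. It is an illustration only: exact pattern matching is stricter than the probabilistic scoring a motif-discovery tool applies, the `find_motif` helper is hypothetical, and the toy protein is invented for the example.

```python
import re

# Consensus of motif 10, copied from Table 2; brackets mark alternative residues.
MOTIF_10 = "EEHLCFRTLQR[FY]TAATLEHGMHPPI[SA]PKP"


def find_motif(consensus, protein):
    """Return (1-based start, matched segment) for each exact consensus match."""
    pattern = re.compile(consensus)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein)]


# Invented toy protein: two flanking residues on each side of one motif variant.
toy_protein = "MA" + "EEHLCFRTLQRYTAATLEHGMHPPISPKP" + "GG"

hits = find_motif(MOTIF_10, toy_protein)
print(hits)
```

A sequence that differs from the consensus at a non-bracketed position would be missed by this exact search even if a motif scanner would still report it, so the approach is best treated as a quick sanity check.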
Despite this conservation, the PEPC gene family performs hyper-diverse housekeeping functions, both photosynthetic and non-photosynthetic, that support survival in terrestrial environments [19]. To understand the function and molecular mechanisms of the PEPC gene family, the three-dimensional (3D) structures of PEPC proteins have been elucidated by X-ray crystallographic analysis [45–48], which revealed many structure-function relationships in PEPC catalysis, allosteric control and regulatory phosphorylation [49]. In this study, we modeled the 3D structures of 90 PTPC proteins using the SWISS-MODEL server (Figure S1) and identified three structural templates with higher sequence identity: 5vyj.1.A (Fig. 2A) [48], 3zgb.1.A (Fig. 2B) [46] and 5fdn.1.A (Fig. 2C) [47] (Table S2). All PTPC proteins were modeled as tetrameric enzymes with one of three kinds of 3D structure: the 3zgb.1.A template was found across all species in this study, whereas the 5vyj.1.A template was found in seven species and the 5fdn.1.A template only in Arabidopsis (Fig. 3, Table S2). In the widespread 3zgb.1.A template, C4-PEPC isoforms carry two amino acid substitutions relative to C3-PEPC, one that increases PEP saturation kinetics and one that reduces inhibitor affinity [46, 50]. As a result, the efficiency of photosynthetic carbon fixation is greatly improved in C4 plants.
PEPC was convergent in C4 but not in CAM photosynthesis
The C4 and CAM photosynthetic pathways evolved independently more than 60 times from different phylogenetic lineages in angiosperms and vascular plants, respectively [4, 6, 8]. However, the molecular mechanisms underlying the convergent evolution of these two carbon-concentrating mechanisms remain poorly understood. Recent comparative genomic analyses indicated that specific amino acid substitutions at a few key sites can lead to highly predictable convergent evolution events [9, 51, 52]. In C4 and CAM plants, the PEPC enzyme catalyzes primary carbon fixation, raising the CO2 concentration at the active site of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), which increases water-use efficiency and the utilization of nitrogen and other mineral nutrients [3]. Interestingly, PEPC 1 is very important for regulating the core circadian clock in the CAM photosynthetic pathway of Kalanchoe laxiflora [23], and PEPC 2 of Kalanchoe fedtschenkoi shares a convergent amino acid substitution with diverse CAM species [9]. Moreover, PEPC genes also underwent parallel adaptive genetic changes in C4 grasses [10, 24]. Therefore, PEPC genes probably played crucial roles in the convergent evolution of C4 and CAM photosynthesis [12–14, 19, 25, 28, 49].
Here, we reconstructed the phylogenetic tree of the PEPC gene family in green plants with relatively broad sampling that included C3, C4 and CAM species across the major lineages of green plants (Fig. 1). The PEPC gene family consists of two major subfamilies, PTPC and BTPC, the former of which plays the critical role in initial carbon fixation in C4 and CAM photosynthesis [12, 13, 25, 27]. We therefore reconstructed the evolutionary history of PTPC genes by maximum likelihood and reconciled the gene tree with the species tree using duplication-loss reconciliation (Fig. 3). The reconciled tree showed that PTPC genes underwent at least 71 duplications and 16 losses in the evolutionary history of the species sampled in the present study, indicating that PTPC genes experienced multiple large-scale duplication events, possibly caused by whole-genome duplication during the evolution of green plants, especially in angiosperms (Fig. 3) [53]. Gene duplication is critical for plants to respond to new environments through neofunctionalization of gene copies [54, 55]. Previous studies assumed that PEPC isoforms in C4 and CAM species originated from a non-photosynthetic PEPC gene that already existed in C3 ancestral species [14, 22]. Our results, however, indicated that PEPC gene duplications correlated with the presence of C4 but not of CAM photosynthesis (Fig. 3), a pattern that was also found in orchids [56]. In other words, PEPC gene duplications were very important for the evolutionary origin of C4 photosynthesis but not for CAM pathways, in which posttranslational regulation of PEPC possibly played key roles [19, 27, 57].
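To make the idea of duplication-loss reconciliation concrete, the sketch below counts duplications with the classic lowest-common-ancestor (LCA) mapping: each gene-tree node is mapped to the LCA of its children's species, and a node is a duplication when it maps to the same species-tree node as one of its children. This is a simplified illustration, not the software used in the study. It assumes binary gene trees written as nested tuples, gene names of the hypothetical form `Species_copy`, and it does not count losses.

```python
def index_species_tree(tree):
    """Record the parent and depth of every node in a nested-tuple species tree."""
    parent, depth = {}, {}

    def walk(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                walk(child, node, d + 1)

    walk(tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes."""
    while a != b:
        # Always lift the deeper node (or a on ties) one level towards the root.
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes inferred as duplications under LCA mapping."""
    parent, depth = index_species_tree(species_tree)
    dups = 0

    def mapping(node):
        nonlocal dups
        if isinstance(node, str):
            return node.split("_")[0]  # leaf: species prefix of the gene name
        left, right = (mapping(child) for child in node)
        here = lca(left, right, parent, depth)
        if here == left or here == right:
            dups += 1  # node maps to the same place as a child: duplication
        return here

    mapping(gene_tree)
    return dups


# Toy example with three species IDs from Table 1 and invented gene copies.
species_tree = (("Zma", "Osa"), "Ath")
gene_tree = ((("Zma_1", "Zma_2"), "Osa_1"), "Ath_1")
n_dups = count_duplications(gene_tree, species_tree)
print(n_dups)
```

In the toy gene tree, the two maize copies group together before joining the rice copy, so the node joining them is inferred as a lineage-specific duplication, while the remaining nodes follow the species tree and are treated as speciations.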
To test whether convergent molecular evolution at the amino acid level existed in PTPC proteins, we performed a comprehensive search for convergent sites in C4 and CAM photosynthesis based on the PTPC gene tree of green plants using the PCOC pipeline, which can detect not only convergent substitutions to the exact same amino acid but also convergent shifts that correspond to convergent phenotypic changes [58]. We searched for convergent sites separately in all phenotypically convergent clades of CAM and of C4 photosynthesis. The results showed two convergent shifts in PTPC proteins of CAM species and three in those of C4 species, but no identical convergent substitution was detected in the clades of either photosynthetic pathway (Fig. 4), indicating that convergent molecular evolution at the amino acid level did not occur across all copies of PTPC proteins.
In addition to photosynthetic functions, PEPC genes also perform hyper-diverse non-photosynthetic functions, such as abiotic stress responses, fruit maturation, and seed formation and germination [12, 15, 17–19, 28, 59]. In other words, different isoforms of the PEPC gene family may perform different functions, and only a few PEPC isoforms correspond to the convergent evolution of C4 and CAM photosynthesis. We therefore further searched for convergent sites within individual gene groups, in which each phenotypically convergent species retained one clade or one gene copy (one-to-one). Interestingly, four convergent amino acid sites were discovered in one gene group (Ahy03/Zma11) in C4 species, two of which had been confirmed in previous studies [10, 24, 46, 50]. The convergent amino acid substitutions at the active site Ala774 (position 898 in Fig. 5) and the inhibitory site Arg884 (position 1010 in Fig. 5) are sufficient to switch the photosynthetic function from C3 to C4 activity [46]. However, when we searched the one-to-one gene groups in CAM species, no identical convergent sites were found, indicating that PEPC genes carry no identical convergent sites responsible for the photosynthetic conversion from the C3 to the CAM pathway.