Novel protein family found in Y. lipolytica secretome – basic analysis of amino acid sequences
Recently, the total secretome of Y. lipolytica W29 strains producing heterologous proteins under industrial fermentation conditions (10-liter fermenter in fed-batch mode) was determined by high-throughput proteomics (Onésime et al., to be published). Secretome data mining with X!TandemPipeline identified three proteins of unknown function, encoded by YALI0C05687g, YALI0D03245g and YALI0F04620g, with a sequence coverage of 7.08 (7.66)%, 11.66 (12.62)% and 19.03 (20.57)% (full-length form, with the mature form in parentheses) and an E-value of 50.199, 62.872 and 23.919, respectively. BLAST analysis against the Y. lipolytica genome from the GRYC database showed that these proteins belong to a multi-gene family of four members of unknown function, the fourth being YALI0F04598g (Table 1). Sequence-based predictions indicated that the polypeptide chains comprise 223 to 226 amino acids (complete form) or 206 to 209 amino acids (mature form), with a molecular weight of 21678 to 22725 Da and an isoelectric point (pI) ranging from 5.1 to 8.1. Systematic gene names from the genomes of Y. lipolytica strains E150 and W29, together with the abbreviated names used hereafter for convenience, are given in Table 1.
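The two coverage values reported for each protein differ only in the chain length used as denominator. The following is a minimal sketch of that arithmetic. The numbers of covered residues (16, 26 and 43) are not stated in the text; they are an assumption, back-calculated from the reported percentages and the chain lengths in Table 1.

```python
# Chain lengths (full-length, mature) taken from Table 1.
LENGTHS = {
    "YALI0C05687g": (226, 209),
    "YALI0D03245g": (223, 206),
    "YALI0F04620g": (226, 209),
}

# ASSUMPTION: residues covered by identified peptides. These counts are not
# given in the text; they were back-calculated from the reported percentages.
COVERED = {"YALI0C05687g": 16, "YALI0D03245g": 26, "YALI0F04620g": 43}


def coverage_percent(covered: int, length: int) -> float:
    """Percentage of the chain covered by identified peptides."""
    return round(100.0 * covered / length, 2)


def coverage_table() -> dict:
    """Return {gene: (coverage of full-length form, coverage of mature form)}."""
    return {
        gene: (
            coverage_percent(COVERED[gene], full),
            coverage_percent(COVERED[gene], mature),
        )
        for gene, (full, mature) in LENGTHS.items()
    }


for gene, (full_cov, mature_cov) in coverage_table().items():
    print(gene, full_cov, mature_cov)
# YALI0C05687g 7.08 7.66
# YALI0D03245g 11.66 12.62
# YALI0F04620g 19.03 20.57
```

Under this assumption the sketch reproduces the reported values, which is consistent with the same set of peptides being related either to the full-length or to the mature chain.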
Table 1
Nomenclature and basic biochemical characteristics of the newly identified proteins. Systematic gene names are from the E150 and W29 wild-type strain genomes. The number of amino acids in the polypeptide chain is given for the complete form, with the mature form in parentheses. The predicted molecular weight (monoisotopic mass from Expasy) and the predicted isoelectric point (pI) refer to the mature form.
E150 strain genome reference | W29 strain genome reference | Working name | Proposed new name | AA number* | Molecular weight (Da)* | Predicted pI*
YALI0D03245g | YALI1_D04128g | UP1 | eFbp1 | 223 (206) | 21717.43 | 5.40
YALI0F04598g | YALI1_F07042g | UP2 | eFbp2 | 224 (207) | 21678.16 | 5.10
YALI0C05687g | YALI1_C07288g | UP3 | eFbp3 | 226 (209) | 21950.50 | 6.47
YALI0F04620g | YALI1_F07093g | UP4 | eFbp4 | 226 (209) | 22725.72 | 8.09
* For the mature form, AA: Amino acid
Comparison of the amino acid sequences showed that the four proteins are highly similar, sharing between 50 and 70% sequence identity (Fig. 1). In addition, all four proteins carry a predicted 17-amino-acid signal peptide, with a probability of 0.9946 for D03245g (UP1) (MKFSHVTLAVVAATAIA), 0.9991 for F04598g (UP2) (MQFSTLALVTFAATAMA), 0.9926 for C05687g (UP3) (MKFSAVAVAAVASSALA) and 0.9995 for F04620g (UP4) (MKLSAVTFIALSAVCLA). For each protein, a similar 3D fold consisting of a six-helix bundle was predicted (Fig. 1 and section 3D structure modelling).
Uniqueness of the UPs due to lack of similarity with other protein sequences
Complete UP sequences were subsequently searched by BLAST against the available protein sequence databases. Strikingly, similarity was found only with homologous sequences from other Y. lipolytica strains (in addition to E150 and W29, homologs are present in the German H222 and Polish A101 strains; data not shown). Since screening the protein databases with the complete polypeptide sequences as queries did not retrieve any significant hits beyond the Y. lipolytica homologs, the conserved stretches of amino acids found in the multiple alignment were used instead. Seven conserved motifs, numbered 1 to 7, were identified in each UP sequence: AAP[TS], APV[FY][TS]LAPxxFA, GFLDFSGY, GT[KR]FD[KQ]AVY[EA]F[IL][VI]NSGx[KS]DFL, [IF]LxSPLL, W[IL]FGxKQTVQ and [TS]GF[DN]RA. These motifs served as queries to search the NCBI and UniProt protein sequence libraries. As expected, motif 1, located at the C-terminus of the signal peptide, was present in the highest number of protein sequences stored in the NCBI and UniProt databases (3192524 and 2106782 hits, respectively). Similarly, motifs 5 and 7 were found relatively frequently (18712/13174 and 7511/5086 hits, respectively). In contrast, motifs 2, 4 and 6 were each identified in only five (NCBI) or seven (UniProt) protein sequences. Irrespective of the queried database, the four UPs were among the identified motif-bearing proteins. This outcome indicates that each of these three conserved motifs is always accompanied by the other two, and that the motifs are specific to the newly identified protein family.
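The motif notation used above maps directly onto regular expressions, with x standing for any residue. The following is a minimal sketch of such a motif scan on a local sequence; it is not the database search actually performed, and the toy sequence is hypothetical, assembled only so that it contains motifs 2 and 6.

```python
import re

# Motifs 1 to 7 as listed in the text; "x" stands for any residue.
MOTIFS = {
    1: "AAP[TS]",
    2: "APV[FY][TS]LAPxxFA",
    3: "GFLDFSGY",
    4: "GT[KR]FD[KQ]AVY[EA]F[IL][VI]NSGx[KS]DFL",
    5: "[IF]LxSPLL",
    6: "W[IL]FGxKQTVQ",
    7: "[TS]GF[DN]RA",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate the motif notation into a regular expression (x -> any residue)."""
    return re.compile(motif.replace("x", "."))


def motifs_present(sequence: str) -> list[int]:
    """Return the numbers of the motifs found anywhere in the sequence."""
    return [n for n, m in MOTIFS.items() if motif_to_regex(m).search(sequence)]


# HYPOTHETICAL toy sequence (not a real UP sequence), built to carry motifs 2 and 6.
toy_sequence = "MKK" + "APVFTLAPGGFA" + "DDD" + "WIFGAKQTVQ" + "EE"
print(motifs_present(toy_sequence))  # [2, 6]
```

A sequence in which motifs 2, 4 and 6 are all reported would match the co-occurrence pattern described for the UP family.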
Cloning and overexpression of the UP genes
As no indication of the biological function of the UPs could be inferred from similarity searches of protein databases, overexpression strains were designed and constructed. Each UP gene was amplified with its specific primer pair (additional Table 1) from W29 wild-type genomic DNA and cloned as a BamHI/AvrII fragment into JMP4230 (additional Fig. 1), giving rise to plasmids JMP4440, JMP4442, JMP4444 and JMP4448 for UP1, UP2, UP3 and UP4 overexpression, respectively (Table 2). The individual genes were overexpressed under the control of a strong erythritol-inducible promoter [34].
Table 2
Plasmids used in this study.
Plasmid name | Characteristics | Use | Reference
JMP4230 | JMP62 pHu8EYK URA3ex | overexpression | [35]
JMP4440 | UP1; D03245g cloned in JMP4230 | overexpression | This work
JMP4442 | UP2; F04598g cloned in JMP4230 | overexpression | This work
JMP4444 | UP3; C05687g cloned in JMP4230 | overexpression | This work
JMP4448 | UP4; F04620g cloned in JMP4230 | overexpression | This work
JMP4472 | GGA-URA3ex_CRISPRrCas9-yl_RFP | gRNA for gene disruption | [36]
JMP4393 | GGA-LYS5ex_CRISPRrCas9-yl_RFP | gRNA for gene disruption | [36]
The expression cassettes were released from the plasmids by NotI digestion and transformed into strain JMY7126, which is deleted for the three genes encoding the main secreted lipases (Lip2, Lip7 and Lip8) and for the EYK1 gene to allow optimal erythritol induction [35]. This cloning strategy gave rise to strains JMY7283 (overexpressing UP1), JMY7287 (overexpressing UP2), JMY7291 (overexpressing UP3) and JMY7295 (overexpressing UP4) (Table 3).
Table 3
Strains used in this study
Name | Genotype | Auxotrophy | Reference
JMY399 | W29 wild-type French strain | no | [3]
JMY7126 | MATA ura3-302 leu2-270-LEU2-Zeta, xpr2-322, lip2Δ, lip7Δ, lip8Δ, lys5Δ, eyk1Δ | Ura-, Lys- | [35]
JMY7283 | JMY7126 + jmp4440 D03245g (UP1 oe) | Lys- | This study
JMY7287 | JMY7126 + jmp4442 F04598g (UP2 oe) | Lys- | This study
JMY7291 | JMY7126 + jmp4444 C05687g (UP3 oe) | Lys- | This study
JMY7295 | JMY7126 + jmp4446 F04620g (UP4 oe) | Lys- | This study
JMY8651 | JMY7126 + *GGE114 + YALI0B21582gΔ (fil- ; mhy1Δ) | Lys- | This study
JMY8673 | JMY7126 + YALI0C05687gΔ (Q1-up3Δ) | Ura-, Lys- | This study
JMY8674 | JMY7126 + YALI0D03245gΔ (Q1-up1Δ) | Ura-, Lys- | This study
JMY8675 | JMY7126 + YALI0F04598gΔ (Q1-up2Δ) | Ura-, Lys- | This study
JMY8683 | JMY7126 + YALI0F04620gΔ (Q1-up4Δ) | Ura-, Lys- | This study
JMY8684 | JMY8674 + YALI0F04598gΔ (Q2-up1Δ up2Δ) | Ura-, Lys- | This study
JMY8700 | JMY8684 + YALI0F04620gΔ (Q3-up1Δ up2Δ up4Δ) | Ura-, Lys- | This study
JMY8748 | JMY8700 + YALI0C05687gΔ (Q4-up1Δ up2Δ up4Δ up3Δ) | Ura-, Lys- | This study
JMY8761 | JMY8748 + YALI0B21582gΔ (Q4, fil- ; Q4 mhy1Δ) | Ura-, Lys- | This study
JMY8777 | JMY8761 + jmp4230 (Q4 URA3) | Lys- | This study
JMY8778 | JMY8761 + jmp4230 (Q4 URA3) | Lys- | This study
JMY8779 | JMY8761 + jmp4444 C05687g (Q4-mhy1Δ UP3oe) | Lys- | This study
JMY8780 | JMY8761 + jmp4444 C05687g (Q4-mhy1Δ UP3oe) | Lys- | This study
JMY8781 | JMY8761 + jmp4440 D03245g (Q4-mhy1Δ UP1oe) | Lys- | This study
JMY8782 | JMY8761 + jmp4440 D03245g (Q4-mhy1Δ UP1oe) | Lys- | This study
JMY8783 | JMY8761 + jmp4442 F04598g (Q4-mhy1Δ UP2oe) | Lys- | This study
JMY8784 | JMY8761 + jmp4442 F04598g (Q4-mhy1Δ UP2oe) | Lys- | This study
JMY8785 | JMY8761 + jmp4446 F04620g (Q4-mhy1Δ UP4oe) | Lys- | This study
JMY8786 | JMY8761 + jmp4446 F04620g (Q4-mhy1Δ UP4oe) | Lys- | This study
oe: overexpression, *GGE114: pSBA-U-Z_NDV_Acceptor vector: pSB1A3-Zeta up Not-Ura Bsa-RFP-Bsa Zeta down Not.
Synthesis and secretion of UP proteins in erythritol inducible strains
The overexpressing strains were cultivated in shake-flask batch cultures to study the UP overproduction pattern by proteomic analysis in non-inducing (YNBD2) and inducing (YNBD2E) media (Fig. 2). The band pattern observed on SDS-PAGE gels was unexpected, as the most intense bands appearing in the erythritol-induced cultures migrated below the anticipated region (red arrow in Fig. 2).
Thus, three regions were excised from each lane and subjected to proteomic analysis: region 1, around the expected size of the UP proteins (about 22 kDa), and regions 2 and 3, containing intense protein bands (boxed in Fig. 2). The aim was to determine the number of identifying peptides in each band and the protein coverage under both non-induced (G) and induced (E) conditions. As shown in Table 4, the numbers of identifying peptides were higher in the bands from the concentrated supernatants of induced cultures (E) that migrated at < 14 kDa, indicating that the UP proteins did not migrate at the expected size. The sequence coverage by the peptides identified under the inducing condition was high, ranging from 33.6 to 57% of the mature forms. Abundance varied between UPs, as shown by the number of identifying peptides detected under the induced condition: 19 to 11 spectra (Table 4).
Table 4
Proteomic analysis of secreted UP proteins. Number of peptides identifying the respective UP protein in the most intense bands migrating at < 14 kDa. The bands were excised from the SDS-PAGE gel (see Fig. 2) and analyzed by high-resolution mass spectrometry. Coded names: A2, concentrated supernatant from UP3 C05687g; B2, UP1 D03245g; C2, UP2 F04598g; D2, UP4 F04620g, each under non-induced (G) and induced (E) conditions.
UP genes | A2G | A2E | B2G | B2E | C2G | C2E | D2G | D2E
UP1 D03245g | 5 | 17 | 5 | 5 | 4 | 7 | 4 | 4
UP2 F04598g | 2 | 3 | 2 | 66 | 8 | 3 | 3 | 2
UP3 C05687g | 1 | 2 | 9 | 3 | 1 | 111 | 6 | 2
UP4 F04620g | 2 | 5 | 3 | 4 | 0 | 10 | 19 | 32
While the proteomic analysis confirmed the correct synthesis and secretion of the UPs in the overexpressing strains, which increased markedly when the inducer was supplied, it also indicated that the other UPs were constitutively expressed from their native promoters in these media (identified at lower abundance, typically with 1-10 identifying peptides). Such a background could impair adequate phenotype analysis. Therefore, it was necessary first to construct a quadruple deletant strain (Q4), and then to construct Q4 derivatives individually overexpressing each UP protein in this background.
Overexpression of UPs in a quadruple deletant strain (Q4-mhy1Δ) and phenotype analysis
Since accurate phenotype analysis could be impaired by unintended co-secretion of UPs other than the targeted one in the overexpressing strains, a quadruple deletant strain (Q4) was constructed. The strategy comprised successive gene deletions using the CRISPR-Cas9 method [36], as illustrated in Figure 3. First, replicative plasmids CRISPR-Cas9-gRNA-UPs-URA3 and CRISPR-Cas9-gRNA-UPs-LYS5 were constructed using the gRNA primer pairs designed for the corresponding target sites (additional Table 1). The plasmids were co-transformed into strain JMY7126, and prototrophic transformants were selected on YNBD2 minimal medium. After selection, the corresponding UP locus was amplified, screened for deletion and sequenced. Once the gene deletion was confirmed, the strains were grown in YPD to cure the replicative CRISPR-Cas9 plasmid. Strains bearing the expected deletion were stored (Table 3). The UP1 to UP4 single deletants (Q1) were named JMY8674 (up1Δ), JMY8675 (up2Δ), JMY8673 (up3Δ) and JMY8683 (up4Δ) (Fig. 3). Multiple gene deletion was then initiated from Q1-up1Δ by co-transformation with the CRISPR-Cas9-gRNA-UPs plasmids together with a PCR fragment amplified from the corresponding deleted strain, resulting in the Q4 strain JMY8748. In addition, since filamentation is known to affect HS phenotype analysis, deletion of the MHY1 gene, previously shown to abolish hyphae formation [37], was also introduced into the Q4 deletant using a CRISPR-Cas9-gRNA-MHY1-LYS5 vector, giving strain JMY8761 (Table 3). This strain was then transformed with the UP overexpression cassettes, yielding the overexpressing strains Q4-mhy1Δ-UP1OE, Q4-mhy1Δ-UP2OE, Q4-mhy1Δ-UP3OE and Q4-mhy1Δ-UP4OE. Strain JMY8761 was also transformed with an empty vector containing URA3, giving rise to JMY8777 (Q4-mhy1Δ-URA3), which was used as the control (Fig. 3).
Phenotype studies on HS of different aliphatic chain length
Assuming that the UPs are involved in the utilization of hydrophobic substrates (HS), we screened the quadruple deletant strain (Q4), in which all four loci are knocked out, and the overexpressing strains for growth on solid plates containing HS of different aliphatic chain lengths. Strain JMY8777, a derivative of strain JMY8761 transformed with the empty vector, was used as the control (Fig. 4).
As depicted in Figure 4, growth inhibition was observed for all strains grown on short-chain FAs (mC10 to mC14), except for the strain overexpressing UP3, for which growth was still observed up to the 10⁻³ dilution. In contrast, on the longer-chain FAs mC16 and C18:1 (triolein), growth was observed up to the 10⁻³ dilution. This shows that deletion of the four UP genes abolishes growth of the Q4 strain on short-chain FAs, particularly on mC10, which implies their specific involvement in the fixation and internalization of short-chain FAs. As growth of Q4 was impaired neither on mC16 nor on triolein, these HS must be fixed and internalized by some other mechanism. Furthermore, overexpression of UP3 alone alleviated the growth inhibition, particularly on mC10 but also on mC12 and mC14. This points to the involvement of UP3 in the transport of short-chain FAs, most distinctively mC10. Both UP2 and UP4 appeared to slightly alleviate the growth retardation of the Q4 strain on mC12 and mC14, suggesting a preference for these FAs. Overexpression of UP1 in the Q4 background had a minor positive impact on growth, mainly observed on mC12 plates, where the strain grew up to the 10⁻² dilution, versus 10⁻¹ for Q4. Based on this evidence, we postulate that the UPs are involved in the fixation and internalization of short- and medium-chain FAs. We suggest that they operate in a chain-length-dependent manner (UP3 being the only one acting on mC10; UP2 and UP4 acting mainly on mC12 and mC14), but with overlapping specificity (UP1 acts on mC10 to mC14). Interestingly, these differences are consistent with the sequence alignment, which orders the proteins as UP1, UP3, then UP2 and UP4.
Octanoic acid (C8) toxicity in Q4 and the overexpressing strains
Octanoic acid (C8) is known to be very toxic for Y. lipolytica [29,30]. Presuming that the UPs are involved in FA transport (based on drop test data; Fig. 4), we investigated the effects of the quadruple deletion (Q4) and of individual UP overexpression on C8 toxicity. The Q4 and Q4-derivative strains were grown in minimal media supplemented with different concentrations of C8 (0% to 0.2%; Fig. 5). Growth was monitored in the absence and in the presence of the inducer (erythritol). No major differences were observed in the absence of C8 (Fig. 5A). As shown in Figures 5B and 5C, deletion of the four UP genes (Q4 strain) increases C8 tolerance upon erythritol induction compared to the control strain (JMY8651). Overexpression of UP3 and UP4 increases C8 toxicity at 0.1% (Fig. 5B), while overexpression of any of the UPs increases C8 toxicity at 0.2% (Fig. 5C). Based on these observations, we postulate that the UPs are involved in the internalization of short-chain FAs. Indications of the UPs' substrate specificity could also be inferred from this assay, as UP3 and UP4 both showed higher affinity towards C8.
3D structure modelling
The four sequences of UP1 to UP4 share nearly 40% amino acid identity and are hence clearly homologous. They are therefore expected to fold into a similar 3D structure. The similarity of the primary structures of UP1 to UP4 to known proteins in the protein databases is too low to support reliable homology modelling. We therefore submitted the sequences of UP1-4 to the AlphaFold2 computational tool. AlphaFold is a family of deep-learning-based structure prediction tools that produced high-quality predictions in a blind structure prediction test (CASP14), including when no clear homologs are known [38]. The 3D structures of UP1-4, as modelled by AlphaFold, are highly similar to each other (Fig. 6).
Their 3D structures are highly similar from position 70 (50 if numbered from the mature protein) up to position 221 at the end of the sequence (UP1 numbering) (Fig. 6A). Consistently, the core of UP1-4 is predicted to fold into a single domain composed of five helices, about thirty amino acid residues long on average, that assemble into a helix bundle (helices 2 to 6 in Fig. 1). Notably, helix 4 is locally disordered at the same position in all the models, centered on the conserved sequence stretch F[IL][VI]NSGx[KS]DFL. This results in a topological kink that could help pack the helix bundle. A helix bundle is expected to form an inner cavity, which is observed here for the four proteins. Interestingly, the N-terminal part (residues 20-70 in Fig. 1, or 1-50 in the mature protein) forms an extension outside the main helical bundle. The predicted structure of this N-terminal part is roughly similar for UP1 to UP3 (Fig. 6B), with a single α-helix, but its position relative to the main helical domain is not accurately predicted, possibly owing to structural flexibility at this terminus. For UP4, the N-terminal part could also be predicted. In the rank 1 prediction, the N-terminus forms a single α-helix with an additional β-sheet of two strands. However, the rank 2 prediction does not contain these β-strands but instead shows a structure very similar to the N-terminal part of UP1 to UP3. The AlphaFold predictions of the helical domain in UP1 to UP4 appear reliable (Fig. 6C) on the following criteria: i) the pLDDT (predicted local distance difference test), a per-residue confidence metric computed by AlphaFold on a 0-100 scale, lies in the 60-80 range for the main helical domain in the rank 1 models of the UPs, which suggests that the overall fold of this domain is highly probable, whereas the structure of the N-terminal stretch and its position relative to the helical domain are less confidently predicted; ii) the predicted structures for the four independent sequences (UP1 to UP4) are highly similar, as expected for these clearly homologous proteins; iii) the independent predictions for the same sequence consistently yield the helical domain. As an example, the rank 1 to 5 predictions for UP1 are shown in Additional Fig. 2.
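AlphaFold writes the per-residue confidence metric (pLDDT) into the B-factor column of the model PDB files, so the confidence of a region can be summarized directly from the file. The following is a minimal sketch of such a summary. The toy records and their scores are hypothetical and were not taken from the UP models; the helper `toy_ca_line` exists only to build correctly aligned fixed-width records for the illustration.

```python
from statistics import mean


def ca_plddt(pdb_text: str) -> dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name 13-16, residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def mean_plddt(scores: dict[int, float], first: int, last: int) -> float:
    """Mean pLDDT over residues first..last (inclusive)."""
    return mean(v for k, v in scores.items() if first <= k <= last)


def toy_ca_line(resseq: int, plddt: float) -> str:
    """Build a fixed-width ATOM record for a CA atom (dummy coordinates)."""
    return (
        f"ATOM  {resseq:5d}  CA  ALA A{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


# HYPOTHETICAL scores: a low-confidence N-terminal stretch and a more
# confident helical stretch. These are not values from the UP models.
toy_pdb = "\n".join(
    [toy_ca_line(i, 40.0) for i in range(20, 25)]
    + [toy_ca_line(i, 75.0) for i in range(70, 75)]
)
scores = ca_plddt(toy_pdb)
print(mean_plddt(scores, 20, 24))  # 40.0
print(mean_plddt(scores, 70, 74))  # 75.0
```

Applied to a real model file, the same two calls would contrast the N-terminal extension with the helical domain.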
Search for known proteins with related 3D structures
The coordinates of the rank 1 predicted UP1 model were submitted to a systematic comparison against the full PDB using the DALI server [39]. A list of non-redundant structures detected as structurally similar to UP1 is shown in Table 5. The closest structures are the ligand binding domains 1 and 2 of protein Mp1 from Talaromyces marneffei (previously known as Penicillium marneffei) and the ligand binding domain of a protein from Aspergillus fumigatus, the lowest rms deviations being 2.7 Å for Mp1 domain 1 (5fb7-B) and 2.6 Å for the A. fumigatus protein (5j5K-A).
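For reference, the rms deviation used to rank structural neighbours is the root-mean-square distance between equivalent Cα atoms. The following is a minimal sketch of that quantity, assuming the two coordinate sets are already paired and superposed. DALI performs its own alignment and superposition, which are not reproduced here, and the coordinates below are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# HYPOTHETICAL Cα coordinates (in Å) for two equivalent residue pairs.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 2.0, 0.0)]
print(rmsd(model, reference))  # 2.0
```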
Table 5
Relevant protein structures similar to UP1 detected by DALI. Abbreviations: Z, Z-score as computed by DALI (a score above 2 is considered significant); rmsd, root-mean-square deviation of equivalent Cα atoms between the UP1 model and the PDB protein detected as structurally similar; D lali, length of the structural alignment; nres, number of residues (length of the chain); %id, percentage identity between the two proteins. The 3 PDB structures indicated in bold are shown in Figure 7.
PDB | Z | rmsd (Å) | D lali | nres | %id | Species | Protein and domain | Bound molecules
5fb7-B | 15.4 | 2.7 | 143 | 151 | 10 | Talaromyces marneffei | MP1 ligand binding domain 1 | Arachidonic acid (2 molecules)
5csd-A | 15.2 | 3.2 | 147 | 158 | 10 | Talaromyces marneffei | MP1 ligand binding domain 2 | Arachidonic acid
5j5K-A | 14.8 | 2.6 | 142 | 151 | 11 | Aspergillus fumigatus | AFMP4P ligand binding domain | Palmitic acid
5csd-D | 15.1 | 3.3 | 151 | 159 | 10 | Talaromyces marneffei | MP1 ligand binding domain 2 | Arachidonic acid (2 molecules)
5ecf-B | 15.3 | 2.8 | 142 | 150 | 9 | Talaromyces marneffei | MP1 ligand binding domain 1 | Arachidonic acid
5e7x-A | 15.2 | 3.1 | 147 | 155 | 9 | Talaromyces marneffei | MP1 ligand binding domain 1 | Palmitic acid
6zpp-A | 13.4 | 2.9 | 144 | 157 | 7 | Drechmeria coniospora | Virulence factor | 
Remarkably, these structures are FA binding domains and were solved bound to arachidonic acid (1 or 2 molecules) or palmitic acid [40]. A similar protein in apo form (i.e. without a bound FA chain) was previously solved for Drechmeria coniospora. With the exception of the additional N-terminal extension, which seems to be specific to the Y. lipolytica proteins, all these proteins appear structurally similar, as their core domains share the helix bundle. They are therefore strongly suspected to be functionally similar, since the topology of the helix bundle could support a comparable mode of FA binding (Fig. 7). In the known complexes, the FAs are always bound in an elongated hydrophobic pocket between the helices. Such a binding mode would also be possible in the predicted models of UP1-4.
In the known FA-complex structures, the orientation of the bound molecule and the position of the carboxylate group are variable. Indeed, in these FA binding proteins, the polar side chains of Q138 in domain 1 of Mp1 from T. marneffei (5E7X), of N105 and S165 in domain 1 of Mp1 from T. marneffei (5ECF chain B), and of S136 and S140 in domain 2 of Mp1 from T. marneffei (5FB7) form hydrogen bonds with the carboxylate group of either palmitic or arachidonic acid (Table 5).
It is not possible to infer such conserved positions for carboxylate binding in the UPs. Nevertheless, UP1 to UP4 should all display a putative binding site on the inner faces of the helices. The models display apolar residues there, which are particularly suited to binding alkyl moieties (Fig. 8). Among them, L61, F101, F127, V131, W161 and F182 are strictly conserved in the four proteins (UP1 mature protein numbering). Another strictly conserved residue, Y116, could possibly act as an H-bonding partner of the FA carboxylate. Additional minimization to relax the structures, followed by docking of FAs into UP1-4 with either one or two molecules bound, could be envisaged to profile FA binding capacity and specificity, but this is beyond the scope of this paper.
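Strictly conserved positions such as those listed above can be read off a multiple alignment by keeping the columns in which all sequences carry the same residue. The following is a minimal sketch of that check. The four aligned strings are hypothetical toy sequences, not the UP alignment, and column numbers refer to alignment columns rather than to UP1 mature numbering.

```python
def strictly_conserved(aligned):
    """Return (column number, residue) for columns identical in all sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [
        (i, column[0])
        for i, column in enumerate(zip(*aligned), start=1)
        if len(set(column)) == 1 and column[0] != "-"  # skip all-gap columns
    ]


# HYPOTHETICAL toy alignment of four sequences ("-" marks a gap).
toy_alignment = ["ALFKW", "ALYRW", "A-FKW", "ALFQW"]
print(strictly_conserved(toy_alignment))  # [(1, 'A'), (5, 'W')]
```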