Protease activity in the salivary gland and gut of E. onukii
Enzymatic activities of digestive proteases, such as aminopeptidases, cysteine proteases (cathepsin B and L) and serine proteases (trypsin and chymotrypsin), in the SG and gut can indicate differential expression of proteases in digestion-related tissues. Homogenates of SGs and guts dissected from E. onukii were mixed with substrates specific to each protease. The activities of leucine aminopeptidase, cathepsin B/L, trypsin and chymotrypsin in the SG and gut were examined, and the results are shown in Fig. 1. Enzymatic activities of digestive proteases were generally much higher in the gut than in the SG, except for leucine aminopeptidase activity, which was slightly higher (~ 15%, p < 0.01) in the SG than in the gut. Activities of cathepsin B/L, trypsin and chymotrypsin were significantly higher (p < 0.01) in the gut than in the SG (Fig. 1).
Transcriptomic and proteomic analysis of proteins in the SG and gut of E. onukii
1. RNA sequence assembly and functional annotations
The sequence reads were assembled either from tissue-specific RNA reads or from the pooled reads derived from RNA of both the SG and gut. The assembly results are summarized in Additional file 1: Table S1. The resulting contigs were annotated by BLASTx against the NCBI (National Center for Biotechnology Information) nr database. Around 42%, 57% and 38% of the contigs assembled from the SG, gut and pooled reads, respectively, had protein hits (Additional file 1: Table S1). The protein hits for the three sets of contigs were similar, as reflected in the species distribution of the hits (Additional file 2: Fig. S1). The largest share of hits (41–43%) came from hemipterans, followed by ~ 22% from Blattaria (Table 1).
Table 1
Distribution of the top hit annotations against the NCBI nr database in different insect orders
Insect order | Salivary gland-specific assembly | Gut-specific assembly | Combined assembly |
Hemiptera | 43.16% | 40.88% | 40.63% |
Blattaria | 21.88% | 21.35% | 22.15% |
Coleoptera | 4.69% | 4.13% | 4.97% |
Lepidoptera | 5.69% | 3.82% | 3.01% |
Hymenoptera | 2.41% | 2.41% | 2.42% |
Diptera | 0.51% | 0.62% | 0.57% |
Others | 21.66% | 26.78% | 26.26% |
2. Mapping of the assembled genes to the proteomic profiles derived from the SG and gut
To investigate expression of the E. onukii genes at the protein level, we used the assembled transcripts as a database and mapped the peptide sequences resulting from proteomic sequencing of SG and gut proteins to the protein sequences translated from the assembled contigs. The peptide mapping results are summarized in Additional file 1: Table S1. In total, 4,457 unique transcripts in the SG and 3,784 transcripts in the gut were matched by peptides derived from proteomic sequencing. The numbers of mapped proteins from the SG, from the gut and from both tissues are shown in Fig. 2a. In addition, the number of transcripts identified in the proteomic profiles was associated with increasing FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) values of the transcripts (Fig. 2b), similar to previous observations in the investigation of stinkbug proteases [21, 24].
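As a reminder of what an FPKM value expresses, the following minimal sketch computes it from raw counts using the conventional definition (fragments per kilobase of transcript per million mapped fragments). The function name, argument names and the example numbers are hypothetical and are not taken from the analysis pipeline used in this study.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000          # transcript length in kb
    millions = total_mapped_fragments / 1_000_000     # library size in millions
    return fragments / (kilobases * millions)


# Hypothetical contig: 500 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 20_000_000)
```

Normalizing by both transcript length and library size is what makes FPKM values comparable between contigs and between the SG and gut libraries.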
3. Identification of protease genes
We identified 930 unique contigs encoding putative proteases from SG and gut tissues by analyzing the BLAST annotation results. These proteases included aminopeptidases, carboxypeptidases, dipeptidases, aspartic proteases, cathepsin B, cathepsin L and serine proteases (trypsin, chymotrypsin and elastase). However, only 129 (14%) of the putative proteases were supported by at least 2 unique peptides from the proteomic sequencing profiles (Table 2): 26 aminopeptidases, 12 carboxypeptidases, 11 dipeptidases, 8 aspartic proteases, 14 cathepsin Bs, 18 cathepsin Ls and 40 serine proteases (Additional file 3). Fifty-two of the proteome-supported putative proteases contained a signal peptide and were therefore potential digestive proteases. The other 77 proteases either had no signal peptide or lacked the N-terminal sequence needed to determine whether one was present (Additional file 3). The protein sequences of the possibly secreted proteases in the gut or SG of E. onukii are provided in Additional file 4.
Table 2
Summary of identified digestive proteases
Category of protease | Nr-annotated contigs | Proteome-identified | Percentage |
Aminopeptidase | 148 | 26 | 18% |
Carboxypeptidase | 89 | 12 | 13% |
Dipeptidase | 46 | 11 | 24% |
Aspartic protease a | 39 | 8 | 21% |
Cathepsin B | 68 | 14 | 21% |
Cathepsin L | 197 | 18 | 10% |
Serine protease b | 343 | 40 | 12% |
Total | 930 | 129 | 14% |
a Aspartic protease includes cathepsin D and aspartic protease. |
b Serine protease includes trypsin, chymotrypsin and elastase. |
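The selection summarized in Table 2 (support by at least 2 unique peptides, then a signal peptide as an indicator of secretion) can be sketched as a simple filter. This is an illustration only: the record layout, field names and contig identifiers below are hypothetical, and the actual peptide mapping and signal peptide prediction tools are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ProteaseHit:
    contig_id: str            # hypothetical contig identifier
    category: str             # e.g. "Cathepsin L"
    unique_peptides: int      # unique peptides mapped from the proteome
    has_signal_peptide: bool  # result of a signal peptide prediction


def proteome_supported(hits, min_unique_peptides=2):
    """Keep annotated proteases mapped by at least `min_unique_peptides` peptides."""
    return [h for h in hits if h.unique_peptides >= min_unique_peptides]


# Toy records, not data from this study.
hits = [
    ProteaseHit("contig_0001", "Cathepsin L", 5, True),
    ProteaseHit("contig_0002", "Cathepsin L", 1, True),
    ProteaseHit("contig_0003", "Serine protease", 2, False),
    ProteaseHit("contig_0004", "Aminopeptidase", 0, False),
]

supported = proteome_supported(hits)
secreted_candidates = [h.contig_id for h in supported if h.has_signal_peptide]
```

In this toy set, two of the four annotated contigs pass the peptide threshold and one of those carries a signal peptide.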
4. Expression of protease transcripts
To assess overall expression of the transcripts in the SG and gut, the FPKM of each contig was estimated. The FPKM data were log-transformed, and boxplots showing the medians and full ranges of the FPKM values are presented in Fig. 3a, together with boxplots for the transcripts encoding proteases. The majority of the transcripts had very low FPKM values: the median FPKM was only 1.47 in the gut and 2.23 in the SG (Fig. 3a). In contrast, the proteases had much higher RNA expression levels. The median FPKM of proteases in the gut was 84.43, which was 54-fold higher than the median FPKM of all transcripts (Fig. 3a). Notably, proteases were highly transcribed in the gut of E. onukii: seven of the ten most abundant transcripts in the gut encoded proteases (Additional file 5). Protease transcript levels in the SG were lower than those in the gut, but still higher than those of the total SG contig set. The median FPKM of proteases in the SG was 8.70, around 4-fold higher than that of the total transcripts (Fig. 3a), although no protease transcripts were among the 100 most highly expressed transcripts in the SG (Additional file 5).
The FPKM of each identified protease in the SG or gut is listed in Additional file 3 and presented as a heatmap (Additional file 2: Fig. S2). Of all the identified proteases, 107 had an FPKM of at least 50 (either FPKM ≥ 1,000 or 50 ≤ FPKM < 1,000). The FPKM values of 49 proteases were higher than 1,000 in the gut, whereas only 10 proteases exceeded 1,000 in the SG (Additional file 3). Different types of proteases were apparently differentially expressed in the SG and gut. Cathepsins and serine proteases were the most abundant of all the proteases analyzed. In addition, the cathepsin B- and cathepsin L-like proteins that were highly transcribed in the gut also showed higher FPKM values in the SG than the other cathepsin-like proteins (Additional file 3). Serine proteases were also abundant in the gut: seven of the 40 potential serine proteases had FPKM values over 10,000. In contrast, only 2 serine proteases (EMoSerineProtease-24 and -26) showed FPKM values over 1,000 in the SG (Additional file 3). In the aminopeptidase group, EMoAminopeptidase-10 was the only aminopeptidase with an FPKM over 1,000 in the gut. Aminopeptidases were also highly transcribed in the SG of E. onukii: the FPKMs of EMoAminopeptidase-1 and -3 in the SG were 710.05 and 755.95, respectively, which were 33-fold and 123-fold higher than those in the gut (Additional file 3). Compared with the other protease groups, carboxypeptidases and dipeptidases were expressed at lower levels. EMoCarboxypeptidase-3 was the most abundant carboxypeptidase in the gut, and EMoCarboxypeptidase-5 and -6 were moderately transcribed (FPKM below 100). In contrast, EMoCarboxypeptidase-7, -8 and -9 were abundant in the SG but less transcribed in the gut. The most abundant dipeptidase in the gut was EMoDipeptidase-2 (FPKM = 402.60).
5. Tissue specific distributions of protease proteins
The numbers of potentially secreted proteases identified in the SG, in the gut or in both tissues are shown in Fig. 3b. The majority of the proteases (59%) were found in both tissues, with 8 and 34 proteases found uniquely in the SG and gut, respectively. Among the protease groups, the same number of aminopeptidases was found in the SG and gut, whereas more cathepsins and serine proteases were found in the gut than in the SG (Additional file 1: Table S2). These results were consistent with the enzymatic activity tests, in which higher cathepsin and serine protease activities were observed in the gut, while aminopeptidase activities differed little between the SG and gut. The different numbers of proteases identified in the SG and gut may therefore help explain the observed enzymatic activities. Notably, EMoCathepsin L-16 and EMoSerineProtease-21 were highly expressed in the gut, with FPKMs of 44,267 and 3,460, respectively (Additional file 3). However, the corresponding proteins were detected only in the SG and not in the gut. This observation suggested that some proteases might be transferred from the gut to the SG.
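The tissue distribution described above amounts to partitioning two sets of protein identifiers into shared and tissue-specific groups. A minimal sketch follows, using made-up identifiers; the function names are hypothetical and the counts do not correspond to Fig. 3b.

```python
def partition_by_tissue(sg_ids, gut_ids):
    """Split protease identifiers into shared and tissue-specific groups."""
    sg, gut = set(sg_ids), set(gut_ids)
    return {
        "both": sg & gut,      # detected in SG and gut
        "sg_only": sg - gut,   # detected only in the SG
        "gut_only": gut - sg,  # detected only in the gut
    }


def percent_shared(parts):
    """Percentage of all detected proteases that occur in both tissues."""
    total = sum(len(group) for group in parts.values())
    return 100 * len(parts["both"]) / total if total else 0.0


# Toy identifiers, not data from this study.
parts = partition_by_tissue(["P1", "P2", "P3"], ["P2", "P3", "P4", "P5"])
shared_pct = percent_shared(parts)
```

The same partition, applied per protease category, would give the kind of per-group comparison reported in Additional file 1: Table S2.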
Comparison of the top expressed proteases in E. onukii and in four other hemipterans
The SG and gut of E. onukii expressed similar proteases (Table 3), suggesting that the two tissues may play similar roles in food digestion. To determine whether other hemipteran insects have similar protease distributions, we analyzed available RNA-Seq data from SGs or guts of four hemipteran insects: two rice planthoppers (N. lugens and L. striatellus), one rice leafhopper (N. cincticeps) and one aphid (A. pisum). Annotation of proteins in N. lugens, L. striatellus, N. cincticeps and A. pisum is shown in File SX. Because genomic data for N. cincticeps are lacking, translated protein sequences of its predicted proteases are provided in Additional file 4. The top ten proteases of each insect, ranked by FPKM, are listed in Table 3. Aminopeptidases, carboxypeptidases and dipeptidases were among the most highly expressed proteases in the SG transcripts of the four hemipteran insects, whereas only one carboxypeptidase was among the ten most abundant digestive proteases in the SG of E. onukii. In the gut, the most abundant protease transcripts were serine proteases (trypsin/chymotrypsin) or cysteine proteases (cathepsin B/L) in four hemipterans, including E. onukii (RNA-Seq data for the N. cincticeps gut are unavailable). Unlike in E. onukii, however, the top expressed proteases in the SG and gut of the other hemipterans were not of the same types. Interestingly, the pattern of highly expressed proteases in the SGs of the three rice-feeding pests and in the guts of N. lugens and L. striatellus appeared to be similar (Table 3).
Table 3
Top 10 digestive proteases abundant in the salivary gland and gut
N. lugens | L. striatellus | A. pisum | N. cincticeps | E. onukii |
Salivary gland | Gut | Salivary gland | Gut | Salivary gland | Gut | Salivary gland | Salivary gland | Gut |
Protease | FPKM | Protease | FPKM | Protease | FPKM | Protease | FPKM | Protease | FPKM | Protease | FPKM | Protease | FPKM | Protease | FPKM | Protease | FPKM |
NlDipeptidase_1 | 320.56 | NlSerineProtease_1 | 7275.85 | LsCarboxypeptidase_10 | 764.96 | LsSerineProtease_1 | 20233.68 | ApAminopeptidase_14 | 511.50 | ApCathepsinB_8 | 16325.05 | NcAminopeptidase_9 | 1911.7 | EMoSerineProtease-24 | 3106.66 | EMoCathepsin L-15 | 82328.09 |
NlCathepsinL_1 | 274.25 | NlCathepsinB_3 | 6369.42 | LsCarboxypeptidase_7 | 673.81 | LsCathepsinB_1 | 14942.57 | ApCathepsinL_1 | 472.22 | ApAminopeptidase_3 | 7108.27 | NcAminopeptidase_16 | 884.43 | EMoCathepsin L-15 | 2073.88 | EMoCathepsin L-4 | 80838.84 |
NlAminopeptidase_10 | 30.18 | NlSerineProtease_2 | 3526.61 | LsCarboxypeptidase_3 | 623.67 | LsSerineProtease_17 | 9103.13 | ApAminopeptidase_26 | 378.70 | ApDipeptidase_2 | 4523.77 | NcAminopeptidase_10 | 294.66 | EMoSerineProtease-26 | 2023.19 | EMoSerineProtease-24 | 79106.64 |
NlAminopeptidase_4 | 29.3 | NlCathepsinB_5 | 3198.8 | LsCathepsinB_1 | 429.44 | LsAsparticProtease_1 | 7172.04 | ApAminopeptidase_19 | 347.40 | ApAminopeptidase_5 | 4104.76 | NcAsparticProtease_1 | 278.32 | EMoCathepsin L-5 | 1954.21 | EMoCathepsin L-5 | 59177.68 |
NlCathepsinL_3 | 23.15 | NlSerineProtease_4 | 1828.83 | LsDipeptidase_3 | 319.59 | LsSerineProtease_16 | 5891.83 | ApDipeptidase_2 | 288.18 | ApCathepsinB_9 | 2604.41 | NcCathepsinB_1 | 229.4 | EMoCathepsin L-4 | 1929.15 | EMoSerineProtease-26 | 56596.09 |
NlAsparticProtease_1 | 19.43 | NlCathepsinB_1 | 1765.26 | LsAminopeptidase_14 | 120.59 | LsSerineProtease_19 | 4743.02 | ApAminopeptidase_3 | 223.01 | ApCathepsinB_1 | 1922.46 | NcCathepsinL_5 | 130.7 | EMoCathepsin L-3 | 1290.16 | EMoCathepsin L-3 | 48466.44 |
NlCarboxypeptidase_1 | 18.21 | NlSerineProtease_7 | 1387.22 | LsCarboxypeptidase_11 | 86.57 | LsSerineProtease_18 | 4469.46 | ApCathepsinF_1 | 178.10 | ApCathepsinB_6 | 1789.98 | NcAsparticProtease_2 | 126.07 | EMoCathepsin L-14 | 1257.15 | EMoCathepsin L-16 | 44267.35 |
NlDipeptidase_1 | 17.53 | NlAminopeptidase_3 | 1263.03 | LsAsparticProtease_2 | 72.24 | LsCathepsinB_3 | 2285.8 | ApAminopeptidase_10 | 167.20 | ApCathepsinB_11 | 1528.08 | NcAminopeptidase_8 | 119.29 | EMoCathepsin L-16 | 1096.19 | EMoSerineProtease-12 | 27053.28 |
NlSerineProtease_1 | 15 | NlSerineProtease_6 | 1249.74 | LsAminopeptidase_11 | 66.05 | LsSerineProtease_15 | 2102.54 | ApCarboxypeptidase_1 | 161.76 | ApCathepsinB_5 | 1078.3 | NcCathepsinL_3 | 101.94 | EMoSerineProtease-12 | 834.63 | EMoCathepsin B-6 | 24658.81 |
NlAminopeptidase_11 | 14.84 | NlAminopeptidase_2 | 1182.63 | LsAminopeptidase_5 | 58.31 | LsSerineProtease_12 | 1704.64 | ApCathepsinK_1 | 147.47 | ApSerineProtease_3 | 817.72 | NcAminopeptidase_14 | 91.59 | EMoCarboxypeptidase-7 | 779.23 | EMoCathepsin L-14 | 18955.53 |
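The ranking behind Table 3 is a top-N selection by FPKM within each species and tissue. The sketch below illustrates it with five of the E. onukii salivary gland values listed in Table 3; the function name is hypothetical, and the use of `heapq.nlargest` is an assumption made for illustration rather than the method used in this study.

```python
import heapq


def top_proteases(fpkm_by_protease, n=10):
    """Return the n (name, FPKM) pairs with the highest FPKM, in descending order."""
    return heapq.nlargest(n, fpkm_by_protease.items(), key=lambda item: item[1])


# Subset of the E. onukii salivary gland column of Table 3.
eo_sg = {
    "EMoSerineProtease-12": 834.63,
    "EMoCathepsin L-15": 2073.88,
    "EMoCarboxypeptidase-7": 779.23,
    "EMoSerineProtease-24": 3106.66,
    "EMoSerineProtease-26": 2023.19,
}

top3_names = [name for name, _ in top_proteases(eo_sg, n=3)]
```

Here `top3_names` reproduces the order of the first three E. onukii salivary gland entries among the values supplied.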
Phylogenetic analysis of potential digestive proteases isolated from E. onukii and other hemipteran insects
Phylogenetic trees of aminopeptidases, cathepsin B/L-like proteins or serine proteases isolated from SGs and guts of six hemipteran insects (E. onukii, A. pisum, H. halys, N. cincticeps, N. lugens and L. striatellus) were constructed. In addition, phylogenetic analyses of the proteases from hemipteran insects and from other insect groups were also performed.
1. Aminopeptidase
There are many types of aminopeptidases [26]. Nine different aminopeptidase groups were identified in the SG and gut transcriptomes of A. pisum, H. halys, N. cincticeps, N. lugens, L. striatellus and E. onukii. The phylogenetic analysis grouped the aminopeptidases into 12 clades (Fig. 4a). Aminopeptidase N (APN) was the largest group and was distributed across four clades (A, B, F and G). The remaining aminopeptidases were divided into eight clades representing other varieties of aminopeptidase (Fig. 4a). Insect aminopeptidase N normally contains a gluzincin motif (GXMEN) and a zinc-binding motif (HEXXHX18E, in which X is any residue and X18 is a run of 18 residues) [27]. These two motifs were observed only in the APNs of clades A and B, not in those of clades F and G. Clades F and G consisted mainly of deduced aminopeptidases of E. onukii and N. cincticeps, respectively (Fig. 4b). In addition, clade A APNs generally had higher transcriptional levels in the gut than in the SG of the five hemipteran insects. In contrast, clade B APNs showed higher expression levels in the SG than in the gut (Fig. 4a). The differentially expressed APNs of clades A and B still formed their own clusters when APNs from more insect orders were included in the phylogenetic tree construction (Additional file 2: Fig. S3).
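Screening a deduced protein sequence for the two APN motifs can be expressed as a pair of regular expressions, with each X written as a single-character wildcard. This is a minimal sketch under that assumption; the function name and the toy sequences are hypothetical, and it does not reproduce the motif analysis actually performed for Fig. 4b.

```python
import re

# "." stands for any residue (X in the motif notation).
GLUZINCIN = re.compile(r"G.MEN")            # gluzincin motif GXMEN
ZINC_BINDING = re.compile(r"HE..H.{18}E")   # zinc-binding motif HEXXHX18E


def has_apn_motifs(protein_sequence: str) -> bool:
    """True if the sequence contains both the GXMEN and HEXXHX18E motifs."""
    seq = protein_sequence.upper()
    return bool(GLUZINCIN.search(seq)) and bool(ZINC_BINDING.search(seq))


# Toy sequences, not real aminopeptidases.
toy_with_motifs = "MKT" + "GAMEN" + "LLS" + "HELAH" + "Q" * 18 + "E" + "WK"
toy_without_zinc_motif = "MKTGAMENLLSHELAHQQQEWK"
```

A strict pattern like this reports only exact matches to the canonical motifs, so degenerate variants of either motif would need a more permissive expression.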
2. Cathepsin B- and L-like protein
Cathepsin B- and L-like proteins identified from the SG and gut of E. onukii and five other hemipterans were clearly grouped into two distinct clusters in the phylogenetic tree, i.e., the cathepsin B group and the cathepsin L group (Fig. 5). Within the cathepsin L group, the cathepsin L-like proteases of E. onukii clustered together in the same clade, except for EMoCathepsin L-1, which was closely related to NcCathepsin L-5 (Fig. 5). The cathepsin B-like proteases that showed higher transcriptional levels in the gut of E. onukii (EMoCathepsin B-3, -5, -6, -7, -9, -10, -11, -12 and -13) clustered together, with the exception of EMoCathepsin B-13. In contrast, EMoCathepsin B-8, the only cathepsin B-like protein of E. onukii showing a higher expression level in the SG, was grouped with four other SG-enriched cathepsin B-like proteases: HhCathepsin B-1, ApCathepsin B-3, NcCathepsin B-1 and NlCathepsin B-4 (Fig. 5).
3. Serine protease
The serine proteases (trypsin, chymotrypsin, elastase and other serine proteases) of E. onukii and five other hemipteran insects were mainly divided into two major clusters in the phylogenetic tree (Fig. 6). Most of the cluster I serine proteases showed low to moderate transcriptional abundance in the gut, whereas most of the cluster II serine proteases showed higher expression in the gut. Notably, a branch of cluster I serine proteases, including NcSerineProtease-6, EMoSerineProtease-4 and -36, ApSerineProtease-1, -2 and -6, and multiple other serine proteases that were highly transcribed in the SG of stink bugs [21, 24], grouped together (the clade colored red in Fig. 6). The putative serine proteases from E. onukii mostly grouped in cluster II, with the exception of EMoSerineProtease-4, -35, -36 and -37 (Fig. 6). In addition to the two major clusters, EMoSerineProtease-35 and seven other serine proteases from A. pisum, H. halys and L. striatellus formed two small clades that were independent of the other serine proteases (clusters III and IV). Interestingly, when venom serine proteases (serine proteases identified from the SG or saliva) of multiple insect orders were included in the phylogenetic analysis, the cluster II serine proteases, which were highly expressed in the gut of hemipterans, again grouped into a distinct clade located at the root of the tree (Additional file 2: Fig. S4).