Bioinformatic Comparison of Kunitz Protease Inhibitors in Echinococcus granulosus and E. multilocularis and the Genes Expressed in Different Developmental Stages of E. granulosus

Background: Cystic and alveolar echinococcosis, caused by the tapeworms Echinococcus granulosus and E. multilocularis, respectively, are important zoonotic diseases. Protease inhibitors are crucial for the survival of both Echinococcus spp., and Kunitz-type inhibitors play a regulatory role in the control of protease activity. In this study, we identified all the Kunitz-type protease inhibitors present in the genomes of these two tapeworms and analyzed the gene sequences using computational, structural bioinformatic and phylogenetic approaches to evaluate the evolutionary relationships of these genes. Results: A total of 19 genes from E. multilocularis and 23 genes from E. granulosus contained single or multiple Kunitz domains. A neighbor-joining phylogenetic tree indicated that the E. granulosus and E. multilocularis Kunitz domain peptides were divided into three branches containing 9 clusters. Based on available transcriptome data, we analyzed the expression of these Kunitz-domain protease inhibitors in four major developmental stages of E. granulosus and found that they were differentially expressed. Conclusion: We identified 19 and 23 Kunitz protease inhibitors in E. multilocularis and E. granulosus, respectively; the majority of the E. granulosus genes were expressed in one or more of the four developmental stages, with some being highly expressed in adult worms, indicating that these genes likely play different roles in the different developmental stages.


Background
Cystic echinococcosis (CE) and alveolar echinococcosis (AE) are both medically and economically important diseases caused by the metacestode stages of Echinococcus granulosus and E. multilocularis, respectively. The diseases impact hundreds of millions of people in Asia, Europe, America and Africa [1]. The control and treatment of echinococcosis are difficult. Frequent dosing of dogs with praziquantel has played a key role in disease control [2,3], but such control measures are challenging to implement in remote areas. A vaccine against adult worms in dogs is urgently needed [4].
The life cycle of these two tapeworms involves four major developmental stages in their definitive and intermediate hosts. The survival of these tapeworms relies on evading host immune responses and resisting attack by proteases. This is especially important for the adult parasites, which reside in the gastrointestinal tract, where high concentrations of proteases that are harmful to the worms are present.
Eukaryotic proteases, including serine (trypsin/chymotrypsin-like), cysteine (thiol) and aspartic (pepsin/cathepsin/rennin) proteases, play a fundamental role in the regulation of protein function. Their activities are controlled largely by protease inhibitors, which regulate proteases involved in a range of biological processes, including cell proliferation, inflammation, immune mechanisms and cell homeostasis [5][6][7]; protease inhibitors act mainly by preventing potentially harmful, excessive or inopportune proteolytic activity. Protease inhibitors, including inhibitors of aspartic, cysteine, metallo, serine and threonine proteases, are grouped into families and superfamilies based on their similarities in amino acid sequence and tertiary structure [8]. These similarities in primary and tertiary structure support the common ancestry of many protease inhibitor families.
Kunitz-type domain protease inhibitors (KDPIs) are an important type of protease inhibitor and belong to the I2 family of protease inhibitors [8,9]. These inhibitors contain at least one cysteine-rich Kunitz-type domain with α-helical and β-sheet elements. The Kunitz domain consists of around 60 amino acids, including six conserved cysteine residues that form three disulfide bonds in a characteristic pattern (C1-C6, C2-C4 and C3-C5) and stabilize the protein [9]. These inhibitors have been characterized from animals and plants [9,10], including helminths [11][12][13]. A previous study described eight genes (EgKU1-EgKU8) isolated from E. granulosus protoscoleces treated with pepsin/H+ [14]. We previously cloned and characterized two E. granulosus KDPIs, EgKI-1 (EG_08721; GenBank: EUB56407.1) and EgKI-2 (EG_07242; GenBank: EUB57880.1) [13]. EgKI-1 is highly expressed in the oncosphere (egg) stage and is a potent chymotrypsin and neutrophil elastase inhibitor that binds calcium and reduced neutrophil infiltration in a local inflammation model. EgKI-2 is highly expressed in adult worms; it is a potent inhibitor of trypsin and is a potential vaccine candidate against echinococcosis in dogs [13]. Beyond these two, the other E. granulosus and E. multilocularis KDPIs have received little attention.
In the present study, we identified all KDPI sequences predicted in the E. granulosus and E. multilocularis genomes and used computational tools to characterize these Kunitz domain protease inhibitors. We show that the majority of the E. granulosus KDPIs are expressed, that they are differentially expressed across life cycle stages, and that some are annotated with a range of Gene Ontology (GO) terms, indicating that these inhibitors likely function in different ways during the tapeworm's development.

General characterizations of Kunitz domain protease inhibitors
InterProScan and Motif Scan identified 19 and 23 genes encoding KDPIs in the E. multilocularis and E. granulosus genomes, respectively (Table 1). The KDPI family has a typical Kunitz domain of about 50 amino acids (Fig. 1) with a characteristic secondary structure formed by three disulfide bonds (Additional file 3: Fig. S1). The echinococcal Kunitz domains contain an average of 52.85 aa (range 47-55 aa), with the majority comprising 53 aa (Fig. 1; Additional file 1: Table S1).
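Because the C1-C6, C2-C4, C3-C5 pattern is defined by the order of the cysteines, it can be mapped onto any six-cysteine domain mechanically. The Python sketch below does this and summarizes domain lengths. It is an illustration only, not the InterProScan/Motif Scan pipeline used in this study, and it assumes the inputs are already-excised Kunitz domain sequences. Bovine pancreatic trypsin inhibitor (BPTI) is used as the test sequence because its bonds (Cys5-Cys55, Cys14-Cys38, Cys30-Cys51) are well established; the function names are hypothetical.

```python
from statistics import mean

# Canonical Kunitz disulfide pattern, written as ordinal cysteine indices
# (1 = first cysteine in the domain, 6 = last): C1-C6, C2-C4, C3-C5.
KUNITZ_PAIRS = [(1, 6), (2, 4), (3, 5)]


def cysteine_positions(domain: str) -> list[int]:
    """Return 1-based residue positions of every cysteine in a domain."""
    return [i for i, aa in enumerate(domain.upper(), start=1) if aa == "C"]


def predicted_disulfides(domain: str) -> list[tuple[int, int]]:
    """Map the C1-C6, C2-C4, C3-C5 pattern onto residue positions.

    Only defined for domains with exactly six cysteines; a domain that has
    lost a cysteine (e.g. C2) is rejected instead of being guessed at.
    """
    cys = cysteine_positions(domain)
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    return [(cys[a - 1], cys[b - 1]) for a, b in KUNITZ_PAIRS]


def length_summary(domains: list[str]) -> tuple[float, int, int]:
    """Mean, minimum and maximum domain length (amino acids)."""
    lengths = [len(d) for d in domains]
    return mean(lengths), min(lengths), max(lengths)


# Mature BPTI, the prototypic Kunitz domain (58 aa).
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"
print(predicted_disulfides(BPTI))  # [(5, 55), (14, 38), (30, 51)]
```

Applied to the 42 echinococcal domains, `length_summary` would reproduce the kind of mean and range reported above, and `predicted_disulfides` would flag the domains that do not carry all six cysteines.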
E. multilocularis and E. granulosus have 4 and 5 KDPIs containing transmembrane regions, respectively, and 78.95% (15/19) and 78.26% (18/23) of the E. multilocularis and E. granulosus KDPIs are predicted to be extracellular (Table 1). This agrees with the GO analysis (Table 2 and Additional file 2: Table S2) and suggests that most KDPIs may be involved in responses at the host-parasite interface. The TopPred program indicated that 4 E. multilocularis and 5 E. granulosus KDPIs are located in the cytoplasm (Additional file 1: Table S1), with the others (15 Em-KDPIs and 18 Eg-KDPIs) being extracellular.

Cluster and phylogenetic analysis of Kunitz protease inhibitors
Multiple sequence alignment and phylogenetic analysis of the amino acid sequences were used to infer the evolutionary relationships between the E. multilocularis and E. granulosus KDPIs and to compare them with KDPIs from other species. Figure 2 shows the evolutionary distances among the single Kunitz domains of the KDPIs, inferred using the neighbor-joining method. The analysis indicated that the E. granulosus and E. multilocularis Kunitz domain peptides were divided into three branches containing 9 clusters.
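The neighbor-joining method named above builds a tree by repeatedly joining the pair of nodes that minimizes the Q-criterion. The study does not state which software or distance model was used, so the Python sketch below starts from an already computed distance matrix and omits bootstrapping. The test matrix is the standard five-taxon textbook example, not KDPI data, and the function name is hypothetical.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Neighbor-joining (Saitou and Nei) on a symmetric distance matrix.

    Returns an unrooted tree in Newick format with branch lengths.
    Internal nodes are represented by their Newick subtree strings.
    """
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4g},{j}:{lj:.4g})"
        # Distance from u to every other remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[a][b]:.4g});"


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
print(neighbor_joining(names, matrix))  # ((c:4,(a:2,b:3):3),(d:2,e:1):2);
```

Neighbor-joining is fast and does not assume a molecular clock, which is one reason it is a common first choice for protein-family trees such as Fig. 2.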

Comparison of KDPI genes predicted from the E. granulosus and E. multilocularis genomes
We compared the KDPI genes predicted from the genomes of E. granulosus and E. multilocularis and found that some genes are species-specific. E. multilocularis has no homologues of the E. granulosus sequences EG_07242, EG_07266, EG_07243, EG_09006 and EG_09008, whereas EmuJ_001136700.1 and EmuJ_001137100.1 are specific to E. multilocularis.
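The study does not state how species-specific genes were called. One common approach, assumed here purely for illustration, is the reciprocal best hit (RBH): a gene is paired with its top-scoring match in the other genome only if that match scores it best in return. Note that lacking an RBH means lacking an obvious ortholog, which is stricter than lacking any homologue. The gene IDs and bit scores below are hypothetical.

```python
def best_hits(hits: dict[tuple[str, str], float],
              queries: list[str], subjects: list[str]) -> dict[str, str]:
    """Best-scoring subject for each query (queries with no hit are omitted)."""
    best = {}
    for q in queries:
        scored = [(hits[(q, s)], s) for s in subjects if (q, s) in hits]
        if scored:
            best[q] = max(scored)[1]
    return best


def species_specific(hits: dict[tuple[str, str], float],
                     genome_a: list[str], genome_b: list[str]):
    """Genes in each genome that have no reciprocal best hit in the other."""
    a_to_b = best_hits(hits, genome_a, genome_b)
    b_to_a = best_hits(hits, genome_b, genome_a)
    # A pair (a, b) is a reciprocal best hit if each is the other's best hit.
    rbh = {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}
    paired_a = {a for a, _ in rbh}
    paired_b = {b for _, b in rbh}
    only_a = [g for g in genome_a if g not in paired_a]
    only_b = [g for g in genome_b if g not in paired_b]
    return only_a, only_b


# Hypothetical bit scores: (query, subject) -> score, searched in both directions.
hits = {
    ("eg1", "em1"): 200.0, ("em1", "eg1"): 195.0,
    ("eg2", "em1"): 50.0, ("em1", "eg2"): 40.0,
}
print(species_specific(hits, ["eg1", "eg2"], ["em1", "em2"]))  # (['eg2'], ['em2'])
```

Here `eg2` has a weak hit to `em1` but is not its best match, so it is reported as lacking an ortholog, while `em2` has no hit at all.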
The specificity of a protease inhibitor for a protease is mainly determined by the amino acid residue at position P1 of its reactive site. It has been shown that the Lys (K) and Arg (R) P1 variants of bovine pancreatic trypsin inhibitor (BPTI) bind bovine trypsin about 10⁵-fold more strongly than BPTI with Tyr (Y) at P1 [15]. In addition, typical trypsin inhibitors have Arg (R) or Lys (K) at P1, whereas chymotrypsin inhibitors have Leu (L) or Met (M) at P1 [16]. Applying these criteria, 8 Em-KDPI sequences have R and 1 has K at P1, whereas 8 Eg-KDPI sequences have R at P1; these are predicted to be typical trypsin inhibitors. Furthermore, each of the two tapeworms has 3 or 4 sequences with L at P1, which are predicted to be chymotrypsin inhibitors (Fig. 1 and Table 1).
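The P1 rule above amounts to a lookup on a single residue. The Python sketch below applies it, assuming that P1 is the residue immediately after the second cysteine, as in BPTI (Lys15 follows Cys14). The study does not say how P1 was located, so this positional rule and the helper names are assumptions, and domains that have lost C2 are deliberately left unclassified.

```python
from collections import Counter
from typing import Optional

# P1 rules quoted in the text: Arg/Lys -> trypsin-like, Leu/Met -> chymotrypsin-like.
P1_CLASSES = {"R": "trypsin-like", "K": "trypsin-like",
              "L": "chymotrypsin-like", "M": "chymotrypsin-like"}


def p1_residue(domain: str) -> Optional[str]:
    """Residue immediately after C2, the P1 position in BPTI-like domains.

    Assumption: the domain keeps all six canonical cysteines; if C2 is lost,
    'the second cysteine' would really be C3, so such domains return None.
    """
    seq = domain.upper()
    cys = [i for i, aa in enumerate(seq) if aa == "C"]
    if len(cys) != 6 or cys[1] + 1 >= len(seq):
        return None
    return seq[cys[1] + 1]


def classify(domain: str) -> str:
    """Predicted target class from the P1 residue, or 'unclassified'."""
    return P1_CLASSES.get(p1_residue(domain), "unclassified")


BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"
bpti_leu = BPTI[:14] + "L" + BPTI[15:]  # hypothetical K15L variant
print(p1_residue(BPTI))  # K
print(Counter(classify(d) for d in [BPTI, bpti_leu]))
```

Tallying `classify` over all Em- and Eg-KDPI domains would give counts of the kind reported above; the prediction says nothing about inhibitory strength, which still requires biochemical assays.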

Secondary (2D) and tertiary (3D) structures of Kunitz domain protease inhibitors
Most of the single-domain E. multilocularis and E. granulosus KDPIs are small proteins of about 16 kDa and contain a relatively high percentage of Lys and Arg residues at the C-terminus. Like most Kunitz domain protease inhibitors, the Em- and Eg-KDPIs contain a conserved Kunitz-type sequence with 6 cysteine residues forming 3 disulfide bridges (C1-C6, C2-C4 and C3-C5) (Additional file 3: Fig. S1), which play a key role in the formation of the 2D and 3D structures of these KDPIs. For the single Kunitz domain sequences, secondary structure prediction indicated that α-helix and random coil structures account for 19.01-52.71% and 18.6-60.35% of Eg-KDPIs, respectively, followed by extended strands (13.1-26.67%) and β-turns (1.89-10.84%). In Em-KDPIs, α-helix and random coil structures account for 19-
Three-dimensional structure analysis showed that the single Kunitz domain sequences with 3 disulfide bonds share a similar structure containing an α-helix and random coils (Fig. 3). In single Kunitz domain sequences that have lost the second cysteine (C2), the structure differs from that of domains with 3 disulfide bonds (Fig. 1 and Table 1).
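The four categories reported above (α-helix, extended strand, β-turn, random coil) match those produced by the SOPMA predictor, although the study does not name its tool. The Python sketch below assumes a SOPMA-style per-residue state string using the letters h, e, t and c, and shows how per-domain percentages and the reported min-max ranges are derived. The function names and example strings are hypothetical.

```python
# Assumed SOPMA-style one-letter states for each residue.
SOPMA_STATES = {"h": "alpha_helix", "e": "extended_strand",
                "t": "beta_turn", "c": "random_coil"}


def composition(states: str) -> dict[str, float]:
    """Percentage of residues in each secondary-structure state."""
    states = states.lower()
    unknown = set(states) - set(SOPMA_STATES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    n = len(states)
    return {name: round(100 * states.count(code) / n, 2)
            for code, name in SOPMA_STATES.items()}


def composition_range(predictions: list[str]) -> dict[str, tuple[float, float]]:
    """Min-max range of each state across several domains, as reported in the text."""
    comps = [composition(s) for s in predictions]
    return {name: (min(c[name] for c in comps), max(c[name] for c in comps))
            for name in SOPMA_STATES.values()}


print(composition("hhhheeettc"))
# {'alpha_helix': 40.0, 'extended_strand': 30.0, 'beta_turn': 20.0, 'random_coil': 10.0}
```

Reporting ranges rather than means, as the text does, preserves the spread across domains, which is wide here (for example, 18.6-60.35% random coil in Eg-KDPIs).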

Expression of E. granulosus KDPIs in different developmental stages
To estimate expression of the KDPIs, transcript reads for these genes were obtained by Illumina HiSeq sequencing of total RNA from each of the 4 developmental stages of E. granulosus. The transcript read data were published in our previous paper [17].
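Raw read counts are not directly comparable between genes of different length or between stages sequenced to different depths. The normalization used is described in [17] and not restated here; as an assumption for illustration only, the Python sketch below shows transcripts per million (TPM), a common choice. The gene names and lengths are hypothetical.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Transcripts per million for one sample (one developmental stage)."""
    # Reads per kilobase: corrects for longer genes attracting more reads.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    # Scale so that the values in each sample sum to one million.
    total = sum(rpk.values())
    return {g: v * 1e6 / total for g, v in rpk.items()}


# Hypothetical example: g2 has twice the reads of g1 but is twice as long.
print(tpm({"g1": 100, "g2": 200}, {"g1": 1000, "g2": 2000}))
# {'g1': 500000.0, 'g2': 500000.0}
```

Running `tpm` once per stage yields a gene-by-stage table on a common scale, which is the kind of input the stage comparisons below rely on.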
The transcriptome analysis showed that these Kunitz peptides were differentially expressed in the different developmental stages of E. granulosus (Table 2). All the inhibitors except EG_09006 were expressed in one or more of the four stages, with some being highly and differentially expressed in one or two stages. EG_03480 (extracellular), EG_03481 (intracellular), EG_07242 (extracellular), EG_07243 (intracellular), EG_07244 (intracellular), EG_08716 (extracellular), EG_08720 (extracellular), EG_09490 (extracellular) and EG_10096 (extracellular) were significantly more highly expressed in the adult worm stage (Table 2 and Additional file 1: Table S1). EG_08716 is an extracellular protease inhibitor with 42 predicted GO terms, including cytoplasmic vesicle, neuromuscular process controlling balance, ionotropic glutamate receptor signaling pathway, regulation of epidermal growth factor receptor activity, synapse, regulation of the mitotic cell cycle, regulation of translation, and cellular copper and calcium ion homeostasis (Additional file 1: Table S1). Its expression profile suggests that this gene may play an important role in adult worm development and in protection against host protease attack. EG_07244 is also annotated as an endopeptidase, suggesting that the protein may have two functions in adult worms, as a peptidase and as a protease inhibitor.
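A simple way to see which stage dominates a gene's expression is to compare its top stage with the runner-up. The Python sketch below does this as a descriptive screen only: the log2 fold-change threshold of 2 and the pseudocount of 1 are hypothetical choices, not the criteria behind the significance calls in Table 2, which would require a statistical test on replicate data.

```python
import math

STAGES = ("adult", "oncosphere", "protoscolex", "cyst")


def stage_specificity(expr: dict[str, float], pseudocount: float = 1.0):
    """Top stage and its log2 fold change over the second-highest stage."""
    ranked = sorted(STAGES, key=lambda s: expr[s], reverse=True)
    top, second = ranked[0], ranked[1]
    # The pseudocount avoids division by zero for unexpressed stages.
    lfc = math.log2((expr[top] + pseudocount) / (expr[second] + pseudocount))
    return top, lfc


def flag_stage_enriched(table: dict[str, dict[str, float]],
                        min_log2fc: float = 2.0) -> dict[str, str]:
    """Genes whose top stage exceeds every other stage by >= min_log2fc."""
    hits = {}
    for gene, expr in table.items():
        top, lfc = stage_specificity(expr)
        if lfc >= min_log2fc:
            hits[gene] = top
    return hits


# Hypothetical expression values (e.g. TPM) for two genes.
table = {
    "geneA": {"adult": 80, "oncosphere": 10, "protoscolex": 5, "cyst": 3},
    "geneB": {"adult": 10, "oncosphere": 9, "protoscolex": 8, "cyst": 7},
}
print(flag_stage_enriched(table))  # {'geneA': 'adult'}
```

Comparing against the second-highest stage, rather than the mean of the others, keeps genes that are high in two stages (as some KDPIs are) from being labeled single-stage specific.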
EG_08721 is an extracellular inhibitor that was more highly expressed in the oncosphere than in the other stages, indicating that this protease inhibitor plays an important role in oncosphere biology. Because the oncosphere is the only stage that initiates primary infection, EG_08721 may help protect it against host protease attack and may be a candidate for vaccine development.
Although the protoscoleces (PSC) were activated with pepsin, only three KDPIs (EG_01779, EG_05317 and EG_07944) showed slightly elevated expression in this stage. In addition, EG_09490 and EG_09268 were highly expressed in the cyst membrane, and the proteins they encode may be potential targets for drug development.

Discussion
KDPIs occur in almost all living organisms from bacteria to plants and animals. Kunitz peptides show diverse biological activities including inhibition of proteases and/or blocking or modulating ion channels.
Gastrointestinal helminths survive in an environment rich in proteases, so these parasites must have mechanisms to control host protease activity. Kunitz domain inhibitors are therefore likely to be important for parasite survival, especially in intestine-dwelling helminths, by counteracting protease attack.
A remarkable difference between the larval stages of E. multilocularis and E. granulosus lies in the lesion pathology in the intermediate hosts. The metacestode of E. multilocularis is a tumor-like, infiltrating structure consisting of many small vesicles embedded in a connective tissue stroma. The continual, proliferative growth of parasite vesicles damages liver tissue, which results in the high mortality of AE. In contrast, E. granulosus cysts develop in internal organs (mainly the liver and lungs) of humans and other intermediate hosts as unilocular, fluid-filled bladders with a clear boundary between the cyst and host tissue. CE causes death in very few patients, and the prognosis is relatively good after surgical removal of the cystic lesion, whereas AE causes severe damage to the liver and patients require extensive treatment with albendazole to prevent relapse. However, little is known about the molecular mechanisms underpinning the biological differences between the two parasites and the diseases they cause.
In this study, based on the genomic information available for E. granulosus and E. multilocularis, we identified 23 and 19 KDPIs, respectively. The differential expression of these KDPI genes between E. granulosus and E. multilocularis may be associated with the differences in pathology caused by the metacestodes of the two species. It would be informative to determine whether these genes play a role in determining the different pathologies resulting from infection by the two cestodes in their intermediate hosts.
Signal peptide analysis showed that 89.47% of E. multilocularis KDPIs have a signal peptide, compared with only 60.87% of E. granulosus KDPIs. It is not known whether the higher proportion of E. multilocularis KDPIs with signal peptides, which are likely to be extracellular, is associated with the more virulent pathology of AE lesions.
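The two percentages correspond to 17 of 19 and 14 of 23 KDPIs (back-calculated: 17/19 = 89.47%, 14/23 = 60.87%). The study does not report a significance test for this comparison. The Python sketch below shows how the Fisher's exact test named in the Statistical analysis section could be applied to such a 2×2 table; it is a stdlib-only illustration, not the authors' code, and no p-value is claimed here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def pmf(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = pmf(a)
    # Sum every table no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: E. multilocularis, E. granulosus; columns: with / without signal peptide.
print(fisher_exact_two_sided(17, 2, 14, 9))
```

Fisher's exact test suits this comparison better than a chi-square approximation because one expected cell count is small (only 11 KDPIs lack a signal peptide).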
E. granulosus has 5 genes (EG_07242, EG_07266, EG_07243, EG_09006 and EG_09008) that are absent from E. multilocularis, whereas two genes, EmuJ_001136700.1 and EmuJ_001137100.1, exist only in the E. multilocularis genome. These species-specific genes may contribute to the differences in pathology between the two parasites.
Two Echinococcus stages, the oncosphere and the adult worm, are found in the gastrointestinal tract. The oncosphere is activated in the stomach and penetrates the intestinal wall before being carried to the internal organs, whereas the adult worm spends its whole life in the gastrointestinal tract, which contains high concentrations of proteases such as pepsin, trypsin and chymotrypsin. We previously showed that two KDPIs, EgKI-1 (EG_08721) and EgKI-2 (EG_07242), function as protease inhibitors. EgKI-1 (GenBank: EUB56407.1) is highly expressed in the oncosphere and EgKI-2 (GenBank: EUB57880.1) is highly expressed in the adult worm [13]. These KDPIs are differentially expressed and protect E. granulosus from protease attack in a stage-specific manner [12]. In this study, we showed that 11 out of 25 Eg-KDPIs were highly expressed in adult worms; these Eg-KDPIs likely protect against protease attack in the gut during adult worm development. EG_05483 and EG_08721 were relatively highly expressed in oncospheres, suggesting that their products might be potential vaccine candidates for use in dogs against the adult worm of E. granulosus.
In this study, we did not find any KDPIs that were differentially and highly expressed in protoscoleces, although a previous study described a multigene family of eight secreted Kunitz proteins (EgKU1-EgKU8) from E. granulosus protoscoleces that are preferentially expressed by pepsin/H+-treated worms [14].
The secondary structures of proteins, especially α-helices and β-strands, play key roles in molecular function, cell stability, mechanical signaling and tissue constitution, whereas random coils are flexible and tend to be exposed on the protein surface [18]. The basic structure of a Kunitz domain contains six highly conserved cysteine residues forming 3 disulfide bridges (C1-C6, C2-C4 and C3-C5), which stabilize the protein structure. Among these disulfide bonds, the C1-C6 and C3-C5 bridges are required for maintenance of the native conformation [19], whereas the C2-C4 bond stabilizes the folded structure [20]. We found that 10 sequences, including 5 from E. granulosus, had lost the second cysteine (C2), so no C2-C4 bridge can form in these proteins. It is not known whether these 5 E. granulosus proteins form alternative bridges that affect their function; if so, these genes may have distinct functional roles.
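Detecting a lost C2 cannot rely on counting cysteines alone, because a domain with five cysteines does not say which one is missing. The Python sketch below assumes the query and a six-cysteine reference come from the same multiple sequence alignment (such as the one in Fig. 1) and checks each conserved cysteine column. BPTI serves as the reference for illustration, and the query with Cys14 replaced is hypothetical.

```python
# Kunitz disulfide pattern as ordinal cysteine indices.
KUNITZ_PAIRS = [(1, 6), (2, 4), (3, 5)]


def disulfide_status(aligned_ref: str, aligned_query: str):
    """Compare a query to a reference at the reference's six cysteine columns.

    Both strings must come from the same multiple sequence alignment
    (gaps as '-'), so column i means the same site in each.
    Returns (missing cysteine ordinals, intact bridges, broken bridges).
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("sequences are not aligned to the same length")
    cols = [i for i, aa in enumerate(aligned_ref.upper()) if aa == "C"]
    if len(cols) != 6:
        raise ValueError("reference must contain exactly six cysteines")
    present = {k: aligned_query[col].upper() == "C"
               for k, col in enumerate(cols, start=1)}
    missing = [k for k, ok in present.items() if not ok]
    intact = [p for p in KUNITZ_PAIRS if present[p[0]] and present[p[1]]]
    broken = [p for p in KUNITZ_PAIRS if p not in intact]
    return missing, intact, broken


BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"
no_c2 = BPTI[:13] + "A" + BPTI[14:]  # hypothetical query with C2 (Cys14) lost
print(disulfide_status(BPTI, no_c2))  # ([2], [(1, 6), (3, 5)], [(2, 4)])
```

The output mirrors the reasoning in the text: losing C2 removes only the C2-C4 bridge, leaving the C1-C6 and C3-C5 bridges that maintain the native conformation.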
Hydropathy analysis showed that the Em- and Eg-KDPIs are highly hydrophobic, a typical characteristic of membrane proteins. The transmembrane regions consist of 20 hydrophobic amino acids, which could anchor the proteins in cell membranes.
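Hydropathy profiles are computed by averaging a per-residue hydrophobicity scale over a sliding window. TopPred, used above, has its own scale settings; the Python sketch below instead uses the Kyte-Doolittle scale with the 19-residue window and 1.6 threshold that Kyte and Doolittle suggested for spotting membrane-spanning segments. It is a swapped-in illustration, not the analysis performed in this study, and the test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq: str, window: int = 19) -> list[tuple[int, float]]:
    """(1-based window centre, mean hydropathy) for every full window."""
    seq = seq.upper()
    half = window // 2
    return [(i + half + 1, sum(KD[aa] for aa in seq[i:i + window]) / window)
            for i in range(len(seq) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Runs of window centres whose mean hydropathy reaches the threshold."""
    segments, start, prev = [], None, None
    for centre, score in hydropathy_profile(seq, window):
        if score >= threshold:
            if start is None:
                start = centre
            prev = centre
        elif start is not None:
            segments.append((start, prev))
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments


# Synthetic sequence: a 25-residue Leu stretch flanked by Asp residues.
print(candidate_tm_segments("D" * 20 + "L" * 25 + "D" * 20))  # [(25, 41)]
```

The returned ranges are window centres, so each run marks the core of a hydrophobic stretch; the full segment extends about half a window (9 residues) beyond each end.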
We previously showed that EgKI-1 is highly expressed in the oncosphere, indicating this protein helps protect this stage from digestion by trypsin, chymotrypsin and pancreatic elastase before it penetrates the intestinal wall.

Conclusion
In conclusion, based on whole-genome analysis, 19 and 23 Kunitz domain protease inhibitors, including single- and multi-domain inhibitors, were identified in E. multilocularis and E. granulosus, respectively. The differential expression of these KDPIs in different developmental stages of E. granulosus suggests that they may play different roles, including in the regulation of host immune responses. Further investigation will be required to determine precisely what roles they play in echinococcal development, as such information may provide new insights for the prevention and treatment of both cystic and alveolar echinococcosis.

Expression of Kunitz domain inhibitors in E. granulosus
Transcript reads were obtained for each of the KDPI genes expressed in the adult worm, oncosphere, protoscolex and cyst (cyst membrane) of E. granulosus using Illumina HiSeq sequencing, as described previously [17].

Statistical analysis
Data are presented as means or medians. The two-tailed Student's t test and the Mann-Whitney U test were used for comparisons between two groups. The chi-square test, followed by Fisher's exact test, was used to compare rates (or proportions) between two groups. P < 0.05 was considered statistically significant.
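The Mann-Whitney U test compares two groups by ranks rather than raw values, which makes it robust to the skewed distributions typical of expression data. The Python sketch below computes U with average ranks for ties and a two-sided p-value from the normal approximation, without tie or continuity correction; for very small groups an exact test is preferable. It illustrates the technique only and does not reproduce any analysis in this study.

```python
from math import erfc, sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]):
    """U statistics for both groups and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, u2, erfc(abs(z) / sqrt(2))


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

With complete separation between two groups of three, U is 0 and the approximate p-value is just under 0.05, which shows how little power such small groups give.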

Declarations
This study was funded by the National Natural Science Foundation of China (grant numbers 81830066 and U1803282).

Availability of data and materials
All data generated or analyzed during this study are included in this published article and the additional data files.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Figure 1
Alignment and clustering of E. multilocularis and E. granulosus Kunitz-type domain protease inhibitors.

Figure 2
Phylogram constructed using the neighbor-joining method to compare the sequences of E. granulosus and E. multilocularis KDPIs with KDPIs from bovine, human and other species. The accession numbers for sequences included in the phylogenetic analysis are shown after the branches. The bootstrap values are shown after the accession numbers. Differentially expressed genes of E. granulosus and E. multilocularis are highlighted in grey.

Figure 3
Three-dimensional structures of single Kunitz domain protease inhibitors in E. granulosus and E. multilocularis, modeled using SWISS-MODEL.