DOI: https://doi.org/10.21203/rs.2.20164/v1
Back ground: Echinococcosis (CE) is a zoonosis and in humans it occurs as a result of infection by the larva of Echinococcus granulosus . CE is seriously affects the development of animal husbandry and endangers human health. Due to the lack of in-depth understanding of the cystic fluid formation pathway, prevention and treatment of CE have been lack of innovative methods.
Result: High throughput RNA-sequencing (RNA-seq) of protoscoleces (PSCs) in the encystation process of total three biological replicates for each period on 0d, 10d, 20d, 40d and 80d were analyzed. The results demonstrated, a total of 32,401 transcripts and 14,903 genes, including numbers new genes, new transcript, stage-specific genes and differently expression genes (DEGs). Genes encoding proteins involved in several signaling pathways, such as putative G-protein coupled receptor (GPCR), tyrosine kinases and serine/threonine protein kinase were predominantly up-regulated during encystation process of PSCs. Moreover, three major antioxidant proteins of PSCs were identified, and these proteins demonstrated have a high expression level, including cytochrome c oxidase, thioredoxin glutathione, and glutathione peroxidase. Intriguingly, The Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis suggested that up-regulated DEGs involved in the vasopressin-regulated water reabsorption metabolic pathway might play important roles in the protein, carbohydrate, and other substances transport.
Conclusions: The present study carried out the transcriptomic analysis of the encystation process of E. granulosus PCSs, which provide valuable information for the mechanism of cystic fluid formation during the encystation process. These results provide a basis and reference for further studies for investigation of the molecular mechanisms involved in PSC growth and development. Keywords: Echinococcus granulosus , Encystation process, Differentially expressed genes, Protoscolex, RNA-seq
Cystic echinococcosis (CE), also called hydatid disease, is a chronic neglected zoonotic disease caused by the larvae of Echinococcus granulosus which endangers human health and causes huge economic losses in the animal husbandry [1]. The recent epidemiological studies showed that at least 270 million people (58% of the total population) are at risk of CE in Central Asia, including areas of Mongolia, Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, Uzbekistan, Iran, Pakistan and western China [2, 3]. The life cycle of E. granulosus characterized by long-term growth of larval stages (hydatid cysts) in the internal organs of humans and other intermediate hosts, especially in the liver and lungs [4]. Although benzimidazoles have been widely used to treat CE, the main disadvantages were the poor absorption and high hepatotoxicity [5, 6]. Thus, there remains great interest in developing new chemotherapies against CE.
The hydatid cysts is composed of cyst wall and contents, including cystic fluid, protoscoleces (PSCs) and brood capsules. The cyst wall mainly consists of the outer protective acellular mucin-rich laminated layer [7] and the inner layer of germinal cells (blastoderm) [8]. The germinal layer has many cellular nuclei, which can produce groups of vesicles into the cyst lumen, these vesicles become brood cysts after cell vacuolation and then develop into the initial PSCs [9]. Secondary hydatid cysts are formed by encystation during both in vivo development and in vitro culture of the PSCs [10]. However, despite it is clear that during the process of encystation, the volume of cysts gradually increases, as well as the amount of cystic fluid, and the echinococcus cyst fluid is the internal environment for the growth and development of germinal cells and PSCs, the functional genes and the key metabolic pathways linked to this encystation process are still unknown.
Recently, the second-generation high-throughput RNA sequencing technology is mainly used to detect the expression levels of all genes or transcriptomes in a sample under specific physiological conditions, and it plays an important role in gene expression and transcriptome regulation [11]. While whole-genome sequencing, proteomic analyses, and transcriptomic investigations recently been carried out to identify the DEGs and proteins between the different life stages (adult, oncosphere, hydatid cyst wall, larval worms, and pepsin/H+-activated PSCs) of E.granulosus [12-15]. An unexplored issue associated with the parasites survival in its host is what is the key DEGs and main metabolic pathways that used by PSCs to produce the energy, formation cyst fluid and transport molecular needed to maintain its growth and development during the encystation process.
Here, for the first time, we reported there were 1,991 and 2,517 up- and down-regulated DEGs, respectively, during encystation process in five different stages (0d, 10d, 20d, 40d, and 80d) of PSCs by using RNA seq analysis method. Through enrichment analysis of DEGs, it was found that PSCs were mainly involved in vasopressin-regulated water reabsorption during the process of encystation. Our study identified a range of potential target genes of drugs and vaccines treatment and provided metabolism pathways that are primarily involved in the PSCs encystation process. These findings could facilitate the development of new intervention tools for the treatment and control of CE.
Protoscolex collection and RNA extraction
E. granulosus hydatid cysts were collected from a naturally infected sheep liver, which freshly obtained from a slaughterhouse in Urumqi, Xinjiang, China. The hydatid fluid was aspirated aseptically using a 50 ml syringe, and then the PSCs were collected from hydatid fluid under aseptic conditions and washed 15 times with sterile PBS. Afterward, the PSCs were digested in 1% (W/V) pepsin for 30 min at 37 °C to release PSCs from the brood capsules. During the digestion, the samples were agitated every 5 minutes and observed under an inverted microscope until the tissues were digested completely. Subsequently, the collected PSCs were washed with a solution of PBS that contained penicillin and streptomycin antibodies (Sigma, USA), and stained with 0.4% trypan blue until the viability rate was more than 90%. PSCs culture conditions were referred to Liu et al., 2018 [16]. The PSCs were cultured in RPMI 1640 medium (Gibco, USA) (pH 7.2), with 10% fetal bovine serum and 1% penicillin-streptomycin. The average number of PSCs in monophasic culture medium was 4000 per bottle, and the incubator conditions were 37 °C and 5% CO2. The culture medium was changed every 2 to 3 days. Different stages in culture medium were isolated based on the morphological classification described by Elissondo et al [17]. The isolated of PSCs on different stages were stored in liquid nitrogen for RNA extraction. PSCs tissues were collected from 15 batches of five different culture periods (three replicates each): EgPSC_0d, EgPSC_10d, EgPSC_20d, EgPSC_40d, and EgPSC_80d. Total RNA was extracted from all samples by using TRIZOL-Reagent kit (Invitrogen, CA, USA). The extraction was carried out according to the instructions, and the concentration and integrity of RNA was detected by ultraviolet spectrophotometer and agarose gel electrophoresis.
Construction of strand-specific libraries and RNA-seq
Conversion of mRNA from total RNA in each sample to obtain a strand-specific cDNA library was carried out according to the manufacturer's instructions, using the Illumina® TruSeq® RNA Sample Preparation Kit v2 (Illumina, CA, USA) and Oligo (dT) bead enriched PSCs mRNA. Combined different cDNA libraries according to the effective concentration and target data volume, before undertaking Illumina sequencing (Mig, Shanghai, China).
Data assembly and annotation
The raw reads sequences including linker sequences, low-quality sequences, long sequences, short length sequences and ambiguous N (unknown nucleotide). Thus the software SeqPrep (https://github.com/jstjohn/SeqPrep) was used to perform quality control of the raw data and obtain high-quality data (clean data) before analyzing [18]. TopHat2 [19] software (http://ccb.jhu.edu/software/tophat/index.shtml) was subjected to sequence alignment analysis with the reference genome of E.granulosus. Then, total sequences were assembled and spliced using cufflinks software (http://cole-trapnell-lab.github.io/cufflinks/). After that, all transcripts were aligned to the public databases by BLASTn and BLASTx ( E-value cut-off <10-5) to provide information of transcripts expression and function, in this step, these unannotated transcripts were defined as new transcripts. The public databases include the nonredundant protein (NR) (ftp://ftp.ncbi.nih.gov/blast/db/); Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/); A manually annotated, non-redundant protein sequence database (Swiss-Prot) (http://ww.uniprot.org/); Homologous protein family (Pfam) (http://pfam.xfam.org/); Clusters of Orthologous Groups of proteins (COG) (http://www.ncbi.nlm.nih.gov/COG/) and Gene Ontology (GO) (http://www.geneontology.org/). The best alignment results were used to decide the direction of sequences.
Inter-samples correlation analysis
RSEM (http://deweylab.github.io/RSEM/), as a quantitatively analyzed the expression levels of genes and transcripts software, could be used to further analyzed the DEGs between different samples, and to reveal the regulation mechanism of genes by combining sequence function information [20]. Besides, RSEM distinguished which transcripts were different subtypes of the same gene through established the maximum likelihood abundance estimation model [21]. Venn diagram analysis was performed to display genes that specific and co-expression between samples.
Differential expression analysis
Since there were three biological replicates per period, the raw data were statistically analyzed directly using the DESeq2 software, which based on the negative binomial distribution [22]. EgPSC_0d was used as the control group, while other groups were the experimental groups, and a differential expression analysis of the genes from the samples was performed. Differentially expressed genes (DEGs) was threshold settings as previous studies (p-adjust <0.05&| log2FC | > = 1), transcript expression folds greater than 2 per group, and p-value less than 0.05, after multiple-test corrections, were selected [23].
Verification of RNA sequencing data by Q-PCR
12 DEGs were selected to verify the accuracy of the Illumina His seq X (Illumina, CA, USA) sequencing data, including six up-regulated DEGs and six down-regulated DEGs, respectively. The Primer 5.0 and Oligo 7.0 software were used to design and evaluate specific primers (Table 1). According to the previous study, Actin II was employed as an endogenous reference gene [24], and these primers were synthesized by Tsingke Biotech Company (Beijing, China). ALL 15 RNA templates that were used for RNA-seq were reverse transcribed to cDNA using Prime ScriptTM RT reagent Kit (Takara, Dalian, China) according to the manufacturer’s instructions, individually. The final cDNA products were diluted 10 times with nuclease-free water before QPCR. QPCR was performed using SYBR Premix Ex Taq GC kit (Takara, Dalian, China). The reaction system comprised 10µl SYBR QPCR mix (2x). Forward and reverse primers (0.8 µl, 10 µM each), 4 µl diluted cDNA and added ddH2O to bring the volume to 20 µl. The reaction conditions were as follows: 95 °C, 2 min; 95 °C, 10 s; 55 °C, 10 s; 72 °C, 20 s, 40 cycles; dissolution curve conditions: 65 °C, 5s; 95 °C, 5s; 4 °C, 30s. Each gene was processed in triplicate replicates using CFX-96 TouchTM Real-Time PCR Detection System (Bio-Rad, USA) and the expression of all the selected DEGs was evaluated using 2 - ΔΔCt method.
RNA extraction and RNA-seq quality assessment
At the beginning of the in vitro culture, PSCs were invaginated, and the structure of the calcareous corpuscles and the apical hook could be observed under the inverted microscope (Fig. 1A). After 10 days of incubation, the morphology of the PSCs changed greatly, the apical hook and the sucker were evaginated (Fig. 1B). On day 20, the micro-cysts appeared with a thin laminated layer, and the small hook of scolex moved toward the central position of these micro-cysts, moreover, the calcareous corpuscles of the micro-cysts were observed clearly (Fig. 1C). After 40 days the micro-cysts and laminated layer were observed clearly, the apical hooks and sucker were not completely degraded (Fig. 1D). On 80 days, the apical hook and sucker disappeared or degraded, the calcareous corpuscles decreased, and a considerable increment in the size of the cysts could be observed (Fig. 1E). Samples from three batches of PSCs specimens for five different culture periods (EgPSC_0d, EgPSC_10d, EgPSC_20d, EgPSC_40d, and EgPSC_80d) were collected. Total RNA extraction results showed that the RNA bands was slightly degraded, with no pigment contamination, obvious protein content, sugar, or other impurities.
Sequence assembly and annotations
Through the second-generation high throughput sequencing Illumina platform, the cDNA library forms each sample was constructed and sequenced, which average generated 51,300,044 clean reads representing a total of 7,597,846,677 (7.5 Gb) nucleotides of each sample (Table 2). The GC content is one of the important characteristics of the genome base sequence, which can reflect the structure, function and evolutionary information of the gene [25]. Interestingly, the average GC content of the E. granulosus transcriptional level (46.44%) was similar to both its genome (42.1%) and coding regions (49.3%) [26]. Early studies estimated the S. mekongi, B. malayi and S. mansoni had a GC content of 34.1%, 30.5% and 35.3%, respectively [27, 28], indicated that compared with other parasite species, the GC content of E.granulosus was higher.
After alignment and assembled clean reads, a total of 14,903 genes and 32,401 transcripts were obtained, including 3584 new genes and 21,082 new transcripts, which had not been reported in the previously published transcriptome of the parasite. Among 32,401 transcripts, 20,511 were annotated by GO (63.3%), 13,730 by KEGG (42.38%), 17,152 by COG (52.94%), 28,188 by NR (87%), 16,511 by Swiss-Prot (51.08%), 17,874 by Pfam (55.16%). In addition, Veen diagram analyzed of function annotation showed that 4,458 reference transcripts and 53 new transcripts were simultaneously annotated into all gene databases. Furthermore, some genes or transcripts could not be annotated and were defined as hypothetical protein, which suggested that the group of identified transcriptome dataset is sufficiently reliable for further characterization [29]. The length of the transcript was widely distributed, and the majority of the transcripts longer than 1,800 bp (34.65%) in length. In contrast, the transcripts length between 0 and 200 bp were the least (1.34%) (Fig. 2). The obtained transcriptome raw reads dataset has been submitted to the NCBI Short Read Archive (SRA) database under the accession number: SRP172517.
Veen diagram analysis and differential expression analysis between samples
Based on the expression matrix, inter-sample Venn diagram analysis was performed to obtain the co-expression and specific expression genes (Fig. 3A) or transcripts (Fig. 3B) between groups. Intriguingly, although a total of 14,903 genes and 32,401 transcripts were obtained by transcriptome sequencing at different developmental stages of PSCs in the encystation process, these genes and transcripts did not always present in every period of the cysts formation process. Conversely, the genes and transcripts were specifically expressed in different stages utilizing inter-sample Veen diagram analysis, especially at EgPSC_0d, the number of stage-specific genes and transcripts was the largest, 1079 and 1990, respectively. Secondly, the stage-specific genes and transcripts at the 20d stage were 277 and 704, respectively. While EgPSC_0d were the period when the original PSCs was just removed from the fresh sheep liver, and EgPSC_20d was the initial period of cyst formation, both of which were key periods of the encystation process. In the previous transcriptome sequencing of the E. granulosus indicated that the presence of stage-specific genes among the cyst wall, PSCs and pepsin/H+-activated PSCs [12], which indicated that the stage-specific genes exist not only in different stages of the E. granulosus development but also in different periods of development.
For the DEGs analysis of each sample, EgPSC_0d was used as the control group compared with other groups (EgPSC_0d vs 10d, EgPSC_0d vs 20d, EgPSC_0d vs 40d and EgPSC_0d vs 80d). The parameter set as P-value cutoff of < 0.05 and an FC cutoff of > 2 to identify the significant DEGs, including 1,991 up-regulated and 2,517 down-regulated DEGs, respectively. A set of gene which composed of 1,991 up-regulated DEGs was established to provide the functional classification compared with the COG database. The result showed that most of the genes (25%) were classified into the poorly characterized category, which described as function unknown. Secondly, 144 genes (7.2%) were classified as cellular processes and signaling, which described as intracellular trafficking, secretion, and vesicular transport (Additional file 1: Table S1). Given that high levels of expression often indicate a fundamental role, these represent interesting goals for further research. The finding can open the way to the further studies.
Based on Veen diagram analysis, the total of 1,991 gene were up-regulated and 2,517 were down-regulated DEGs, we found that 52 genes were consistently on the rise during the process of encystation (Fig. 4A), while 152 genes have continued to decline (Fig. 4B). Through cluster heat map analysis of 52 consistently up-regulated DEGs (Fig. 4C), showing that hypothetical protein (EGR_10197) and (EGR_05435) increased significantly at the late stage of encystation, and both of them were annotated as a hypothetical protein. Besides, gene1117 described as an integral component of the membrane and classified into cellular component term of the GO database. Moreover, 13 genes annotated to COG database among 52 DEGs and served many important biological functions, such as amino acid transport and metabolism, intracellular trafficking, secretion, and vesicular transport (Table 3). After analyzed the correlation coefficients between the 52 DEGs with FDR< 0.05 (Fig. 4D), a visualization network was obtained, which showed that cornifelin (EGR_03608), platelet glycoprotein (EGR_09887), potassium voltage-gated channel subfamily C member 3 (EGR_05864) and others are related to the expression of many other genes.
Another interesting found was that none of the genes remained up-regulated during the encystation process when we analyzed by Veen diagram analysis of the gene sets among EgPSC_0d vs 10d, EgPSC_10d vs 20d, EgPSC_20d vs 40d and EgPSC_40d vs 80d (Additional file2: Figure S1). Only ten genes exist at most of four gene sets simultaneously, and two genes of them continued up-regulated from EgPSC_10d to 80d, EGR_05443, and EGR_03870, respectively, and the other 8 genes were continuously up-regulated from EgPSC_0d to 10d and EgPSC_20d to 80d. In addition, of the 10 genes, except EGR_09887 and EGR_09469, which described as platelet glycoprotein and dynein light chain, respectively, the remain genes were described as hypothetical proteins (Table 4).
GO and KEGG enrichment analysis of DEGs
GO was a comprehensive database that aggregates all gene-related research results from around the world and it could standardize biological terms for genes and gene products from different databases, providing a uniform definition and description of gene and protein functions [30]. To obtain a better understanding of the enriched functions of 1,991 DEGs in the process of encystation of E. granulosus, GO term enrichment analyzed was performed. According to the GO classification of DEGs, there were 1094, 148 and 263 genes assigned to cellular component (CC), biological process (BP) and molecular function (MF) categories, respectively, of which 709, 0 and 122 genes were significantly DE (FDR value < 0.05) (Additional file 3: Table S2). The top three predominant terms for BP were endoplasmic reticulum unfolded protein response, O-glycan processing, and ion transport; for CC they were integral component of membrane, intrinsic component of membrane, and membrane part; and for MF they were phosphoric ester hydrolase activity, hydrolase activity, acting on ester bonds and beta-1,3-galactosyltransferase activity (Fig. 5).
KEGG is a large-scale database for systematic analysis of gene function, genomic information, and functional annotation. By analyzing the results using the information in this database, genes or transcripts can be classified according to the metabolic pathways or functions involved, including organismal systems (OS), metabolism (M), environmental information processing (EIP), human diseases (HD), genetic information processing (GIP) and cellular processes (CP) [25]. KEGG pathway enrichment analysis revealed that the 1,991 DEGs classified into 291 pathways (Additional file 4: Table S3) (Fig. 6). Analysis of the top 20 most significantly enriched pathways showed that the map 04962, which described as vasopressin-regulated water reabsorption metabolic pathway was the most significantly enriched pathways (FDR value < 0.05) (Fig. 7), followed by pantothenate and CoA biosynthesis (map00770) and Hippo signaling pathway (map04391).
Since the morphological changed of the PSCs were greatest in the first 10 days, we performed GO and KEGG enrichment chord analysis the top ten of GO term and KEGG pathway with FDR value < 0.05 on the DEGs between EgPSC_0d and EgPSC_10d. After analyzed, 43 and 61 specific genes were obtained, respectively (Additional file 5: Figure S2, Additional file 6: Figure S3).
Real-time quantitative PCR analysis
To validate the RNA-seq data and the analyses derived from them, 12 DEGs were selected to measure their relative expression level by QPCR, including six up- and six down-regulated DEGs. The inter-sample statistical analysis was performed using T-test. 0.01<P<0.05 was considered significant (*), P<0.01 (**) and P<0.001 (***) were considered extremely significant. The QPCR data were compared with the relative expression level of the gene in the RNA-seq TPM value to confirm the reliability of the transcriptome sequencing results (Fig. 8).
The parasite E. granulosus is a cestode tapeworm that acts as the causative agent of CE, one of the 17 neglected tropical diseases to be recently prioritized by the World Health Organization [15]. During its life cycle, hydatid cysts produce the pre-adult form, which has the ability to either differentiate into an adult worm (strobilation) in the terminal host (dog) or dedifferentiate into a secondary hydatid cyst in the intermediate host (human and livestock) [31]. Furthermore, the cysts may be developed in almost any internal organ or tissue by hematogenous dissemination, such as heart, bone and nervous system [32]. The early infection is asymptomatically, with the cyst volume increases, eventually causing physical damage result from mechanical compression of the surrounding tissues and organs [33]. The major sources of morbidity and mortality are pressure effects from cyst size, location in a sensitive organ, or cyst rupture caused by a spontaneous or external force (traumas or surgery) with subsequent anaphylaxis or dissemination of the secondary infection [34]. Thus, novel prevention and control strategies need to be developed based on basic and applied E. granulosus biology, especially genomic, transcriptomic and proteomic analysis. However, our understanding of key DEGs and the main metabolic pathways of PSCs in the encystation process is limited.
The previous study found that PSCs encystation generally appeared during the first week of incubation in vitro, and on day 20, some micro-cysts with a complete laminated layer were observed. Between day 38 and day 42, micro-cysts completely developed could be observed [35]. The times when we observed the above important nodes in our laboratory were 10d, 20d and 40d, respectively, and after 80 days of in vitro culture, the cysts are large enough. These phenomena was slightly different from the previous description, one possible reason for this is the different culture medium and genotype, not only in the time of bladder formation but also in the development process was slightly different [17].
Owing to the problem of isolation PSCs from the abdominal cavity of mouse is time-consuming and difficult to observe the specific encystation state [36]. Here, we provided a new insight to compare the gene expression profile of different developmental stages in the in vitro encystation process of PSCs using RNA-seq. Compared to previous hybrid-based microarrays and sanger sequence-based methods, RNA-Seq provided abundant genomic information on specimens [37]. In addition to quantifying gene expression, data obtained by RNA-Seq facilitate the identification of new genes, new transcripts, DEGs and classification of functions, but also helps to understand the different mechanisms involved in the growth and development of the specimen and important metabolic pathways [38]. Transcripts per million reads (TPM), the number of reads from a transcript per million reads, was used to measure the level of expression. Unlike the fragment per kilobase of exon model per million mapped reads (FPKM), TPM was first homogenizes the length of sequencing and then homogenizes the depth of the gene. The homogenization process of TPM made the total expression levels in different samples consistent, produced the comparison of gene expression levels more intuitive [39].
Here, we obtained a total of 32,401 transcripts and 14,903 genes, identified numbers stage-specific genes, and demonstrated that 1,991 and 2,517 genes were significantly up-regulated and down-regulated, respectively. Genes encoding proteins involved in several signaling pathways, such as putative G-protein coupled receptor (GPCR), tyrosine kinases, and serine/threonine protein kinase were predominantly upregulated during encystation process of PSCs. These signaling pathways play major roles in key functions like movement, development, and reproduction in parasites [40]. Our findings are paralled to the previous studies of S. mekongi [27] and S.japonicum [41]. A previous study classified more than 60 putative GPCRs of E. granulosus as potential anthelminthic drugs target [14]. In this study, we filtered out coding putative GPCRs genes with low expression and un-expression through transcript sequencing, narrowed down to six genes (EGR_01295, EGR_00873, EGR_07838, EGR_08773, EGR_00585 and EGR_06296), in particular, EGR_00873 was significant up-regulate during EgPSC_0d to EgPSC_20d. In addition, EGR_10296, and EGR_09273, which encodes the tyrosine kinase and serine/threonine protein kinase, respectively, did not increase during the first 10 days of PSCs in vitro culture but showed a significant up-regulate trend at the later stage. Members of these pathways characterized here might represent novel drug targets for anti-parasitic intervention and require further study.
The vast majority of invertebrate species have shown an immune response that can inactivate and eliminate penetrating parasites, especially involves the formation of reactive oxygen species (ROS) [42]. ROS is a key trigger of the inflammatory activation of macrophages in malaria, simultaneously, it can damage proteins, carbohydrates, and DNA, which is harmful to the survival of parasites [43]. In order to neutralize the host's immune response and oxidative damage caused by oxygen free radicals, protective antioxidant proteins are produced by the body of parasites (such as schistosomiasis) [44]. In the present study, we identified three major antioxidant proteins of PSCs, including cytochrome c oxidase, thioredoxin glutathione, and glutathione peroxidase. Cytochrome c oxidase (EGR_08027), an up-regulated DEGs, has the molecular function of binding to heme and transporting protein in the mitochondrial electron-transport chain according to the COG database annotation, consistent with previous observations that it can regulate mitochondrial oxidative metabolism of E. granulosus [45]. Previous reports suggested that cytochrome c oxidase dysfunctions is associated with increasing mitochondrial ROS production as well as stimulating inflammation and apoptosis [46, 47]. Hinted that in the face of external environmental pressure, cytochrome c oxidase of PSCs might play essential roles in the antioxidant defenses during the encystation process. Another well-characterized antioxidant defense is the thioredoxin glutathione system, which plays a critical role in maintaining the redox balance in the parasite, such as S. japonicum [48], E. granulosus [12], T. brucei [49], and S. mansoni [44]. In this study, we filtered out three relative proteins through differential expression analysis, including thioredoxin (EGR_6727), thioredoxin related transmembrane protein (EGR_01550) and thioredoxin domain containing protein (EGR_09201). Lastly, glutathione peroxidase (EGR_06748), which presented continuous high expression of PSCs during the encystation process, involved in clearance of cytotoxic and genotoxic compounds and protection against oxidative damage [50]. Studies have reported that glutathione peroxidase of E. granulosus displayed the ability to bind non-substrate molecules, particularly anthelmintic drugs [51], and it acts as an essential enzyme for the survival of the schistosomiasis in the redox environment has been actively explored as a potential drug target [52].
Here, we analyzed the key genes previously identified by genomic [26], proteomic [53] and transcriptomic [12] studies of E. granulosus, and found that some genes were expressed significant differences in the encystation process of PSCs. For example, Heat shock protein, which has important roles in posttranslational modification, protein turnover, and ATP binding, was reported that coding gene EGR_09650 was only expressed in adult worms, while EGR_10561 was highly expressed in the hydatid cyst membrane [26]. However, in this sequencing, we found that both of them are present in the PSCs. It is worth noting that EGR_10561 is the gene with the consistently highest expression among all genes coding heat shock proteins during the encystation process of PSCs. Cadherin, which has important roles in cell adhesion and recognition, overexpression of cadherin induced parasite aggregation in T. vaginalis [54].
Although more than forty genes were found encoding cadherin by sequencing, all the genes except EGR_06182 showed low transcription expression levels. Glycosylphosphatidylinositol (GPI)-anchored wall transfer protein, despite it is important and exclusive to E. granulosus, the expression level of the encoding gene (EGR_06221) is low during the PSCs encystation. One possible reason for this is the heterologous proteins is covalently anchored to the PSCs cell wall by fusing them with the anchoring domain of GPI-anchored cell wall transfer protein [55], but in this study, PSCs was cultured in vitro and there was no heterogeneous source protein. Furthermore, tetraspanins are transmembrane proteins previously described as potential vaccine candidates for other helminth infections and are also found in the membranes of the tegument and extracellular vesicles of O. viverrini [56]. In some organisms, their role is associated with virulence and pathogenesis [57]. A wide variety of molecules have been found to be released by E. granulosus, such as tetraspanin proteins, channel protein that transport lipids, and nucleic acids [58, 59]. Here, we identified two encoding tetraspanin genes (EGR_11042 and EGR_06311), showed a strikingly up-regulate trends during the encystation of PSCs, we speculated that the overexpression of genes related to membrane protein transport in PSCs might be a compensatory mechanism by these worms to adapt to the external environments to satisfy their own continuously encystation development processes.
The most highly enriched KEGG pathway involved in 1,991 up-regulated DEGs of PSCs during the encystation process was vasopressin-regulated water reabsorption metabolic pathways (map 04962), which include a total of 24 DEGs (Additional file 7: Table S4). Dynein light chain (DLC), ras-related protein, synaptobrevin-like protein and aquaporin 4 (AQP 4) was specifically upregulated in the encystation process of PSCs. The vasopressin-regulated water reabsorption metabolic pathways have been reported to play a critical role in regulating water, urea, and sodium transport to keep the water balance of body [60]. However, due to the special structure and life history of the parasite, there is no aquaporin 2, vasopressin and its corresponding V2 receptor (V2R) genes in PSCs. Previous reports indicated that DLC is involved in the process of protein, carbohydrate and other substances transport and retrograde vesicle transport [61]. Ras-related protein are involved in the process of vesicle transport after activation, parasites can secrete extracellular vesicles to achieve intercellular communication, and transfer biologically active molecules to the host cells in order to modulate host immune response [62]. Hence, vesicle transport is considered an important pathway for substances transport and information transmission between parasites and external environment [53]. In this study, we detected the presence of AQP4 in the cell membrane of PSCs, and eukaryotes usually permeate water and regulate water reabsorption by activating AQPs, such as S. japonicum [63] and S. mansoni [64]. Therefore, studying the role of vasopressin-regulated water reabsorption metabolic pathways in PSCs can provide clues useful for the development of new drug targets. However, these results were approached in the manner of bioinformatics analysis and require further studies or investigation.
In summary, through RNA-seq, we presented a transcriptomic analysis during the encystation process of PSCs. The study revealed numbers of new genes, new transcripts, DEGs, stage-specific genes and the key metabolic pathways. These data provide valuable information to understand the growth and development mechanisms of PSCs, which can be used as a base for further studies on new drugs and vaccine targets.
DEGs, differently expression genes; KEGG, Kyoto Encyclopedia of Genes and Genomes; NCBI, National Center for Biotechnology Information; PBS, phosphate buffer saline; PCR, polymerase chain reaction; Q-PCR, quantitative real-time PCR; RNA-seq, RNA sequencing.
Funding
This work was supported by the National Natural Science Foundation of China (grant number: 81672045).
Acknowledgments
We are grateful to the editor and anonymous reviewers for their helpful comments and constructive suggestions.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and material
RNA-Seq reads have been deposited in the National Center for Biotechnology Information (NCBI) Short Read Archive (SRA) database under the accession number: SRP172517.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JJF and BY conceived and designed the study. HYW, XNL and KL provided samples. JJF and QQT performed the experiments and analyzed the transcriptomic data. JJF drafted the manuscript. BL, PL, WQC and XL helped in study design, study implementation and manuscript revision. All authors read and approved the final manuscript.
Table 1
Primers used in quantitative PCR
Gene id |
Gene name |
Gene database prediction |
Forward Primer (FP) and Reverse Primer (RP) |
Product Size (bp) |
Up-regulated DEGs |
||||
Gene7722 |
EGR_03608
|
cornifelin |
FP: AGTCTGCTTCCATTGACGACCA RP: CCATCGCTGTCGCCTTCGGTA |
149 |
Gene11293 |
EGR_00384
|
Thioredoxin-related transmembrane protein |
FP: TGCAAATGTACGCTACTACCC
RP: ACGTAAAGTAGGCTCAAGCAA
|
88
|
Gene10136 |
EGR_01086
|
Vesicle transport through interaction with t-SNAREs 1A |
FP: AGATAGCAGCAACTACCCGTTT
RP: TTCACATCAGCAGGCATGTCC
|
91
|
Gene5023 |
EGR_06269
|
Minor histocompatibility H13 |
FP: TATTACCGTTTTCCGTCGAGA RP: GCGGCAAATATCTTAAACACC |
96
|
Gene1575 |
EGR_09743
|
Collagen, type ⅩⅤ, alpha |
FP: AAAGGACTCTACTGGACCTCG RP: GAAGTCCACTGATCGCACGTT |
84
|
Gene3068 |
EGR_08242 |
Signal transducing adapter molecule |
FP: TGATGCCCAAAATGCTACACC RP: CGAAATCGAATAGAGCACGAAC |
140 |
Down-regulated DEGs |
||||
Gene885 |
EGR_10426
|
Transcription elongation factor |
FP: TGATTGAGCCACTACCGAAGCA RP: AGTCCTCCAGGCATATTCCAC |
123
|
Gene2683 |
EGR_08627
|
Fatty acid-binding protein |
FP: AAACCCGTGCTTACCTTCG RP: CATCCGTGGTGTTCTCATCG |
145
|
Gene1583 |
EGR_09751
|
heat shock protein |
FP: TCCAGGCCGTATTACTGAGCA RP: ATGACTTTAGTGCCACCGT |
96
|
Gene2093 |
EGR_09229 |
LIM and SH3 domain protein |
FP: GCCAGTGTGACGTAGTCCTC RP: CGTTTGTGTGTTCTTCGCCAA |
107 |
Gene10528 |
EGR_00894 |
hypothetical protein |
FP: AGTCAATTCAAGTAATCGTGCTG RP: TTCTCGCTTGATTTTCGCCACT |
80 |
Gene7715 |
EGR_03601 |
BoIA-like protein 3 |
FP: TTCTTAATGTCTGCCGCCAT RP: AGTAGAGCCTCCTTATAGTCC |
104 |
Control primer |
||||
Gene2256 |
EGR_09068
|
Actin II |
FP: ATGGTTGGTATGGGACAAAAGG RP: TTCGTCACAATACCGTGCTC |
118 |
Table 2
Sequencing data statistics
Sample |
Raw reads |
Raw bases |
Clean reads |
Clean bases |
Error Rate (%) |
Q20 (%) |
Q30 (%) |
GC Content(%) |
EgPSC_0d_1 |
60,847,206 |
9,187,928,106 |
60,032,580 |
8,950,587,025 |
0.0274 |
97.19 |
91.78 |
47.07 |
EgPSC_0d_2 |
65,486,882 |
9,888,519,182 |
64,718,518 |
9,648,903,196 |
0.0267 |
97.47 |
92.42 |
47.43 |
EgPSC_0d_3 |
44,838,914 |
6,725,837,100 |
44,709,544 |
6,679,256,433 |
0.012 |
98.31 |
95.24 |
46.25 |
EgPSC_10d_1 |
50,879,460 |
7,682,798,460 |
49,912,092 |
7,433,785,290 |
0.0232 |
98.79 |
95.88 |
41.48 |
EgPSC_10d_2 |
45,451,476 |
6,863,172,876 |
44,642,728 |
6,648,774,193 |
0.0234 |
98.73 |
95.67 |
43.67 |
EgPSC_10d_3 |
51,627,418 |
7,795,740,118 |
50,541,444 |
7,440,873,654 |
0.0234 |
98.72 |
95.73 |
45.1 |
EgPSC_20d_1 |
62,302,076 |
9,407,613,476 |
46,103,180 |
6,618,060,319 |
0.0231 |
98.78 |
96.06 |
48.29 |
EgPSC_20d_2 |
56,237,888 |
8,491,921,088 |
45,097,156 |
6,377,338,828 |
0.0232 |
98.76 |
96.02 |
48.01 |
EgPSC_20d_3 |
54,920,334 |
8,292,970,434 |
52,009,700 |
7,464,895,752 |
0.0233 |
98.77 |
95.88 |
45.28 |
EgPSC_40d_1 |
59,219,662 |
8,942,168,962 |
58,640,210 |
8,770,998,130 |
0.0235 |
98.66 |
95.6 |
47.32 |
EgPSC_40d_2 |
52,049,106 |
7,859,415,006 |
51,576,654 |
7,738,872,659 |
0.0237 |
98.61 |
95.46 |
48.24 |
EgPSC_40d_3 |
54,113,540 |
8,171,144,540 |
53,665,904 |
8,042,017,580 |
0.0238 |
98.58 |
95.35 |
47.31 |
EgPSC_80d_1 |
49,360,298 |
7,453,404,998 |
48,784,352 |
7,308,163,088 |
0.0239 |
98.52 |
95.23 |
47.27 |
EgPSC_80d_2 |
49,876,008 |
7,531,277,208 |
49,296,316 |
7,387,535,278 |
0.024 |
98.49 |
95.13 |
46.91 |
EgPSC_80d_3 |
50,275,646 |
7,591,622,546 |
49,770,288 |
7,457,638,729 |
0.0241 |
98.45 |
95.03 |
46.99 |
Average |
53,832,394 |
8,125,702,273 |
51,300,044 |
7,597,846,677 |
0.0232 |
98.46 |
95.09 |
46.44 |
Table 3
COG classification statistics of the consistently up-regulated differential expressed genes
Category |
Functional description |
Gene id |
Gene name |
Gene description |
Information storage and processing |
RNA processing and modification |
gene9743 |
EGR_01647 |
Transcription elongation regulator |
|
|
gene5441 |
EGR_05864 |
Potassium voltage-gated channel subfamily C member 3 |
metabolism |
Amino acid transport and metabolism |
gene8350 |
EGR_02924 |
Ornithine aminotransferase |
Information storage and processing |
Translation, ribosomal structure and biogenesis |
gene2299 |
EGR_09013 |
Epsin-1 |
|
|
gene11020 |
EGR_00111 |
NEDD8-conjugating enzyme UBE2F |
poorly characterized |
Function unknown |
gene7950 |
EGR_03318 |
Synaptic vesicle membrane protein VAT-1-like protein |
|
|
gene10887 |
EGR_00659 |
WD repeat-containing protein |
|
|
gene1175 |
EGR_10139 |
5'-tyrosyl-DNA phosphodiesterase |
|
|
gene7701 |
EGR_03587 |
Josephin-2 |
|
|
gene7722 |
EGR_03608 |
Cornifelin |
|
|
gene246 |
EGR_11073 |
Glycosyltransferase-like protein LARGE2 |
|
|
gene1459 |
EGR_09846 |
1,5-anhydro-D-fructose reductase |
Cellular processes and signaling |
Intracellular trafficking, secretion, and vesicular transport |
gene1432 |
EGR_09887 |
Platelet glycoprotein |
Table 4
Statistics on the expression of 10 consistently up-regulated genes
Gene ID |
Gene Name |
Gene Description |
TPM of different stages |
||||
0d |
10d |
20d |
40d |
80d |
|||
gene7373 |
EGR_03870 |
hypothetical protein |
1178.71 |
230.68 |
527.33 |
2007.29 |
4039.22 |
gene5859 |
EGR_05443 |
hypothetical protein |
11.67 |
10 |
23.96 |
199.76 |
395.56 |
gene2035 |
EGR_09260 |
hypothetical protein |
0.14 |
2.24 |
0.85 |
4.85 |
10.39 |
gene5066 |
EGR_06312 |
hypothetical protein |
1.71 |
8.72 |
2.01 |
78.61 |
154.78 |
gene9360 |
EGR_01986 |
hypothetical protein |
44.24 |
100.54 |
20.87 |
55.48 |
251.26 |
gene1117 |
EGR_10197 |
hypothetical protein |
5.91 |
53.94 |
13.79 |
907.96 |
1823.58 |
gene9359 |
EGR_01985 |
hypothetical protein |
40.05 |
71.03 |
14.37 |
46.97 |
190.86 |
gene8053 |
EGR_03148 |
hypothetical protein |
63.06 |
122.69 |
38.63 |
115.20 |
275.06 |
gene1432 |
EGR_09887 |
Platelet glycoprotein |
0.56 |
3.95 |
5.30 |
20.32 |
48.29 |
gene1839 |
EGR_09469 |
Dynein light chain 2 |
501.76 |
1340.03 |
248.19 |
1400.56 |
3242.24 |