De novo transcriptome sequencing of Argulus coregoni and Argulus foliaceus
Quality Control Processing
All shipped samples passed the commercial companies' external initial QC (Bioanalyser), with RIN values exceeding 0.8. Transcriptome libraries were prepared by the companies using Illumina-based kits and platforms. One library was sequenced by Edinburgh Genomics (MiSeq platform; 150 base paired-end reads) and the other by Thetagen (HiSeq2500 platform; 100 base paired-end reads).
De novo assembly and annotation
The MiSeq transcriptome sequencing for A. coregoni generated 11,265,959 raw 150 base paired-end sequence reads (Table 1). Of these, 10,840,092 reads passed quality control and filtering. The high-quality reads were merged into a single dataset, assembled using Trinity v2.1.1 and searched with BLAST, giving a de novo transcriptome assembly of 40,954 transcripts (contigs). The maximum contig length was 9,791 bp, the mean contig length was 1,787 bp and the N50 value was 2,339 bp.
The A. foliaceus sequencing (HiSeq2500 platform) produced 88,255,979 raw 150 base paired-end sequence reads (Table 1), of which 84,256,934 passed quality control and filtering. The high-quality reads were merged into a single dataset, assembled with Trinity v2.1.1 and searched with BLAST, giving a total of 66,940 assembled contigs. The maximum contig length was 17,078 bp, the mean contig length was 1,842 bp and the N50 value was 2,573 bp.
The pre-existing transcriptome shotgun assemblies (TSA) for A. siamensis (Accession: PRJNA167720) and A. foliaceus (Accession: PRJNA293150) were also examined; their total number of reads, number of assembled contigs and N50 values are summarised in Table 1.
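For readers less familiar with the assembly metrics reported above, the following is a minimal sketch of how mean contig length and N50 are conventionally computed from a list of contig lengths. The lengths are toy values rather than data from this study, and the function names are illustrative only.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


def mean_length(lengths):
    """Return the arithmetic mean contig length."""
    return sum(lengths) / len(lengths)


# Toy contig lengths in bp (hypothetical, for illustration only).
toy_lengths = [100, 200, 300, 400, 500]
print("N50:", n50(toy_lengths))
print("Mean:", mean_length(toy_lengths))
```

Because N50 weights contigs by the bases they contain, it is usually larger than the mean when an assembly contains many short contigs, which is the pattern seen in the values reported here.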
Table 1 Summary of data generation and statistics for de novo transcriptome assembly of A. coregoni, A. foliaceus and A. siamensis. The first two columns comprise sequencing results for the current study.
| Data generation | Argulus coregoni | Argulus foliaceus | Argulus siamensis* | Argulus foliaceus* |
| --- | --- | --- | --- | --- |
| Raw reads | 11,265,959 | 88,255,979 | 77,759,443 | 52,725,850 |
| High quality reads | 10,840,092 | 84,256,934 | | |
| Total contigs | 73,164,334 | 123,272,467 | 50,396,610 | 16,894,535 |
| Number of contigs | 40,954 | 66,940 | 46,352 | 8,424 |
| Largest contig (bp) | 9,791 | 17,078 | 26,436 | 16,889 |
| Mean contig (bp) | 1,787 | 1,842 | 1,211 | 2,006 |
| N50 value (bp) | 2,339 | 2,573 | 2,302 | 1,499 |
| GC% | 41.3 | 40.74 | 38.29 | 42.05 |
| Illumina platform | MiSeq v2 | HiSeq2500 | HiSeq 2000 | HiSeq 2500 |
| Assembly software | Trinity v2.1.1 | Trinity v2.1.1 | Velvet/Oases | CLC-Genomics Workbench 7.5.1 |

* NCBI reference genomic resources: A. siamensis (Accession: PRJNA167720) and A. foliaceus (Accession: PRJNA293150).
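As a quick consistency check on Table 1, the proportion of raw reads retained after quality control can be recomputed from the reported counts. This is a minimal sketch using only the two current-study columns; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Read counts copied from Table 1 (current-study columns only).
table1 = {
    "A. coregoni": {"raw": 11_265_959, "high_quality": 10_840_092},
    "A. foliaceus": {"raw": 88_255_979, "high_quality": 84_256_934},
}


def retention(stats):
    """Fraction of raw reads that passed quality control and filtering."""
    return stats["high_quality"] / stats["raw"]


for species, stats in table1.items():
    print(f"{species}: {retention(stats):.1%} of raw reads retained")
```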
Gene ontology annotation
Gene ontology (GO) terms were retrieved for the annotated transcripts, with the longest sequence for each transcript selected. The largest number of GO terms was identified using BLASTp against the UniProt_Trembl (invertebrate) database. The GO analysis was applied both to the current datasets (A. foliaceus and A. coregoni) and to the pre-existing reference datasets from NCBI (A. siamensis and A. foliaceus). Table 2 summarises the GO distribution for the three Argulus species across the four transcriptome datasets. The GO terms were associated with transcripts for molecular function (MF), biological process (BP) and cellular component (CC). The GO category terms for A. coregoni were the same as those obtained for A. foliaceus. In the MF category, binding activity and catalytic activity were the most heavily represented. For BP, the most represented terms were “transport” and “metabolic process”, and for CC the most represented terms were “membrane” and “integral component of membrane”.
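The selection of the longest sequence can be illustrated with a short sketch. It assumes that "longest sequence for each transcript" means keeping one representative isoform per Trinity gene, which the text does not state explicitly, and it relies on Trinity's usual identifier pattern, in which isoforms of a gene differ only in a trailing `_i<number>`. The records below are hypothetical toy sequences.

```python
import re

# Trinity names isoforms like TRINITY_DN1_c0_g1_i2; dropping the trailing
# "_i<number>" gives the gene-level identifier.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def longest_per_gene(sequences):
    """Map each gene ID to the (isoform ID, sequence) pair with the longest sequence.

    `sequences` is a dict of isoform ID -> sequence string.
    """
    best = {}
    for isoform_id, seq in sequences.items():
        gene_id = ISOFORM_SUFFIX.sub("", isoform_id)
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (isoform_id, seq)
    return best


# Hypothetical toy records (not sequences from this study).
toy = {
    "TRINITY_DN1_c0_g1_i1": "ATGAAA",
    "TRINITY_DN1_c0_g1_i2": "ATGAAACCCGGG",
    "TRINITY_DN2_c0_g1_i1": "ATGTTT",
}
selected = longest_per_gene(toy)
for gene_id, (isoform_id, seq) in selected.items():
    print(gene_id, isoform_id, len(seq))
```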
Table 2 Summary of GO distribution for the three Argulus species.
| Species | A. foliaceus | A. coregoni | A. siamensis* | A. foliaceus* |
| --- | --- | --- | --- | --- |
| Total GO terms | 251,456 | 183,125 | 179,710 | 42,361 |
| Number of assigned transcripts | 2,076 | 1,904 | 1,931 | 1,596 |
| Number of cellular component terms | 48,731 (19%) | 36,103 (20%) | 28,251 (16%) | 8,263 (20%) |
| Number of molecular function terms | 125,676 (50%) | 91,041 (50%) | 93,791 (52%) | 20,952 (49%) |
| Number of biological process terms | 77,049 (31%) | 55,981 (30%) | 57,668 (32%) | 13,146 (31%) |

* NCBI reference genomic resources: A. siamensis (Accession: PRJNA167720) and A. foliaceus (Accession: PRJNA293150).
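The category shares in Table 2 can be recomputed from the raw counts. The sketch below checks that the three categories add up to the total number of GO terms in each dataset and prints each share to one decimal place. Values recomputed this way may differ by a point from the whole-number percentages printed in the table because of rounding. The dataset labels and function name are illustrative.

```python
# GO term counts copied from Table 2.
go_counts = {
    "A. foliaceus": {"total": 251_456, "CC": 48_731, "MF": 125_676, "BP": 77_049},
    "A. coregoni": {"total": 183_125, "CC": 36_103, "MF": 91_041, "BP": 55_981},
    "A. siamensis (NCBI)": {"total": 179_710, "CC": 28_251, "MF": 93_791, "BP": 57_668},
    "A. foliaceus (NCBI)": {"total": 42_361, "CC": 8_263, "MF": 20_952, "BP": 13_146},
}


def category_shares(counts):
    """Return each category's share of the total GO terms, in percent."""
    return {cat: 100 * counts[cat] / counts["total"] for cat in ("CC", "MF", "BP")}


for dataset, counts in go_counts.items():
    # The three categories should account for every GO term in the dataset.
    assert counts["CC"] + counts["MF"] + counts["BP"] == counts["total"]
    shares = category_shares(counts)
    print(dataset, {cat: round(pct, 1) for cat, pct in shares.items()})
```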
Overall, the GO distributions for the four transcriptomes of the three species (A. coregoni, A. foliaceus and A. siamensis) were highly similar (Table 2), as were the GO distribution charts of the first 10 clustered annotated proteins for the Argulus species (Figs. 2 and 3).
Identifying shared genes expressed in Argulus species
Venn diagrams (Fig. 4A and B) show the orthologous gene clusters shared among the three Argulus species. The transcriptome results for A. foliaceus from this study (135,679 transcripts, 57,928 unique CDS and 7,932 clusters) and for A. foliaceus from Accession PRJNA293150 (8,424 transcripts, 8,424 unique CDS and 6,567 clusters) [15] were combined to check for shared proteins/genes. Of the 7,955 clusters across both datasets, 6,522 were orthologous clusters and 6,425 were single-copy gene clusters (Fig. 4A). OrthoVenn analysis showed that the protein sequences from the three Argulus species formed 13,324 orthologous clusters, of which 6,674 were shared by all three species (Fig. 4B).
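The counts behind a Venn diagram of this kind can be derived from cluster membership alone. The following sketch assumes a mapping from each cluster to the set of species represented in it; the mapping, cluster names and function name are hypothetical and do not reproduce OrthoVenn's output format or its clustering step.

```python
from collections import Counter


def venn_counts(cluster_species):
    """Count clusters per exact species combination.

    `cluster_species` maps a cluster ID to the set of species with at least
    one protein in that cluster.
    """
    return Counter(frozenset(species) for species in cluster_species.values())


# Hypothetical cluster membership (toy data, not OrthoVenn output).
toy_clusters = {
    "cluster1": {"A. coregoni", "A. foliaceus", "A. siamensis"},
    "cluster2": {"A. coregoni", "A. foliaceus"},
    "cluster3": {"A. coregoni", "A. foliaceus", "A. siamensis"},
    "cluster4": {"A. siamensis"},
}
counts = venn_counts(toy_clusters)
all_three = frozenset({"A. coregoni", "A. foliaceus", "A. siamensis"})
print("Shared by all three species:", counts[all_three])
```

Each key of the resulting counter corresponds to one region of the Venn diagram, so the central region is simply the count for the set containing all three species.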
BLAST searches of specifically selected targets (trypsin, serpin, serine protease, cathepsin-L, aspartic protease, ferritin, cysteine protease, enolase, phospholipase, adenosine deaminase, apyrase, metalloprotease, thrombin inhibitor and venom serine protease) against the 6,674 shared clusters returned a number of hits (Table 3). For trypsin, 25 annotated clusters were identified and classified into six groups according to their functions. Serpin gave four clusters described as alaserpin (Swiss-Prot Hit database), all with the same GO annotation. Eleven clusters were found for serine protease and grouped into eight GO annotations. Only a single cluster each was identified for cathepsin (cathepsin-L), apyrase, aspartic protease (aspartic protease 6), cysteine protease, enolase and thrombin inhibitor. Two types of ferritin were found, one being yolk ferritin. Two clusters corresponded to phospholipase A2, an enzyme that hydrolyses phospholipids into fatty acids and other lipophilic substances; together they carried 11 GO annotation functions. There were 17 clusters of adenosine deaminase as determined using Swiss-Prot Hit, each with a different GO annotation function. Searching for matches to metalloprotease identified five clusters distributed among three GO annotation functions. Finally, searching for matches to venom serine protease gave six gene (protein) clusters with three different GO annotations.
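The study identified candidates by BLAST searches. As a much simpler stand-in, the sketch below tallies clusters per target by keyword matching on annotation text. It is not the method used here, and the cluster names and annotation strings are hypothetical.

```python
def count_target_hits(cluster_annotations, targets):
    """List the clusters whose annotation text mentions each target name.

    Matching is a case-insensitive substring test, so a broad target such as
    "serine protease" also matches "venom serine protease".
    """
    hits = {target: [] for target in targets}
    for cluster_id, description in cluster_annotations.items():
        lowered = description.lower()
        for target in targets:
            if target.lower() in lowered:
                hits[target].append(cluster_id)
    return hits


# Hypothetical annotation strings (toy data, not results from this study).
toy_annotations = {
    "cluster1": "Trypsin-1",
    "cluster2": "Venom serine protease 34",
    "cluster3": "Alaserpin",
    "cluster4": "Phospholipase A2",
    "cluster5": "Trypsin alpha",
}
targets = ["trypsin", "serine protease", "venom serine protease", "phospholipase"]
hits = count_target_hits(toy_annotations, targets)
for target, clusters in hits.items():
    print(target, len(clusters))
```

A cluster can be counted under more than one target with this approach, so overlapping categories such as serine protease and venom serine protease need to be reconciled by hand when tallies are compared.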
Table 3 Identification of targeted immunomodulator/infection candidates from Argulus spp.
| Protein class | Protein | # Proteins | Function(s) | References |
| --- | --- | --- | --- | --- |
| Serine protease | Trypsin | 25 | Digestion and anti-haemostatic | [30,31] |
| Protease inhibitor | Alaserpin | 4 | Anti-coagulant and anti-complement activation | [32,33] |
| | Serine protease inhibitor | 2 | Anti-coagulation, vasodilator, anti-inflammation | [10] |
| Protease | Aspartic protease | 1 | Haemoglobin proteolysis | [34] |
| | Cysteine protease | 1 | Anti-inflammatory, haemoglobin digestion | [35,36] |
| | Venom serine protease | 8 | Anti-coagulant, vasodilator, anti-inflammatory | [10,37] |
| | Cathepsin-L | 1 | Anti-coagulant | [38] |
| | Metalloprotease | 4 | Anti-haemostatic | [39] |
| Glycoprotein | Ferritin | 2 | Iron storage and transport - involved in homeostasis of iron during feeding | [40] |
| Metalloenzyme | Enolase | 1 | Degrades plasminogen, aids host penetration | [41] |
| Phospholipase | Phospholipase A2 | 2 | Hydrolyses phospholipids (deactivates platelet-activating factor) | [42] |
| Purine metabolism enzyme | Adenosine deaminase | 2 | Vasodilator, antiplatelet | [32,43] |
| Diphosphohydrolase | Apyrase | 1 | Anti-pain, anti-inflammatory, antihaemostatic, platelet aggregation inhibitor | [32,43,44] |
| Serine protease inhibitor (serpin) | Thrombin inhibitor | 1 | Anticoagulant | [32] |
Phylogenetic reconstruction
The findings of the phylogenetic analysis (Fig. 5) reflected those of Regier et al. [45] and confirmed the position of the sequenced Argulus species with respect to the superclass Oligostraca.