Cytochrome b Gene as a potential DNA Barcoding Marker in ecoraces of Tropical Tasar Silkworm Antheraea mylitta Drury

: Background: Tropical Tasar silkworm Antheraea mylitta Drury, from an economic prospect of view bears a very important role in development of rural community and tribal community of India, through silk production. Since the population of this species has spread over many states with varying environmental factors, the population has isolated into many pockets of population, specifically adapted to that niche, which are called “Ecoraces”. Even though phenotypically these ecoraces may look similar to some level but intra-specific genotypic variability between them affects the yield trait of their cocoons. The purpose of our study was to use Cyt-b gene of mitochondrial genome as a potential DNA barcoding marker for identification of these ecoraces, which is the first report for these ecoraces. Methods and Results: For the amplification of mitochondrial cytochrome b gene, 12s rRNA Specific Primer Cytb_mcb 398 and Cytb_mcb 869 were used. The Neighbour-Joining tree was generated using MEGAX with Kimura 2 Parameter molecular evolutionary model. Statistical support for the nodes and the internodes were calculated using bootstrap analysis with 1000 replicates. Maximum likelihood phylogeny analysis was carried out in MEGA X adjusted to 1000 bootstrap replicates under Tamura-3-Parameter. The strongness of the clade was assessed using bootstrap analysis. The parsimony tree was also created using the same MEGA X software. The genetic distance, gene flow and relation among the ecoraces was obtained. Conclusion:


INTRODUCTION:
The Tasar silk is obtained from the cocoons of Tasar silkworm, semi-domesticated wild type of silk producing variety, belonging to the genus Antheraea (Family:Saturniidae).To date, almost 35 species of Antheraea have been found to be producing wild silk and out of those species, only three are exploited for silk production in India, namely Antheraea mylitta Drury (commonly known as Tropical Tasar silkworm), Antheraea paphia Linn.(Commonly known as South India small Tasar silkworm), and Antheraea assamensis Helfer (Muga Silkworm).A. mylitta Drury and A. paphia Linn.are mainly responsible for Tasar silk production whereas A. assamensis Helfer responsible for the Muga silk (Golden silk) production.Antheraea mylitta Drury distribution is spread over Bihar, Jharkhand, West Bengal, Odisha, Andhra Pradesh, Chhatisgarh, Madhya Pradesh and Maharashtra [1].Since this species is subjected to a wide range of climatic and environmental conditions, the whole population has differentiated itself into many pockets of population, these pocket populations are termed as "Ecoraces".
These ecoraces exhibit both phenotypic and genotypic variation among themselves, owing to the polyphagus nature of the insect, the location of ecozones, etc.The variation in their genetic nature is an important factor for conservation of the biodiversity and gene pool of the animal.CSRTI, Ranchi has been conducting explorative surveys in 17 states of India namely, Kerala, Karnataka, West Bengal, Odisha, Jharkhand, Chhattisgarh, Andhra Pradesh, Himachal Pradesh, Nagaland, Assam, Meghalaya, Manipur, Madhya Pradesh, Maharashtra, Uttar Pradesh, Rajasthan and two of the Union Territories named Dadar Nagar Haveli and Delhi, for the estimation of number of ecoraces of A. mylitta Drury and they have found that till date a total 44 ecoraces have been found [2,3].
For a planet consisting of more than 10-15million species, the taxonomical identification is one of the most difficult tasks for new species identification and existing species classification.In many species, the phenotypic plasticity and genetic variability in characters and morphological characters being restricted to only a particular life stage of the animal, has led to many mis-identification of the species.To avoid such mistakes, for a long time, scientists are trying to implement genetic approaches for the species identification.Genomic approach to classify organisms to exploit the biodiversity among DNA sequences to identify the species [4,5].One of such approach is DNA Barcoding technique.
According to International Barcode of Life (IBOL) DNA Barcoding refers to identification of specimens or species, using short and standardised sequences.
The DNA Barcoding technique is somehow similar to the Universal Product Codes (UPC), which are used for every retail product that is sold online and in stores.The UPC employ 10 alternate numerical (0-9) at 11 positions to generate 100 billion unique identifiers.But if we will look at genomic barcodes, they have four alternative nucleotides at each position, but the genetic site available for the inspection is huge.The survey of just 20 of such nucleotide position creates, the possibility of 4 20 means more 109 billion barcodes, which will provide us more than sufficient number of codes required for the taxonomic classification of the species.The gene sequences those have been standardised by scientists for such genetic analysis are known as "Molecular Markers".
Mitochondrial DNA has been widely used as a marker in evolutionary biology and ecology studies.These mt-DNA are considered great for genetic marker because of their maternal inheritance, near-neutrality [6], technical ease of use, lack of recombination, and rapid evolutionary rate compared to nuclear DNA [7].This is often chose primarily to study evolutionary biology in phylogenetic studies.
The insect mitogenome is circular and double stranded, consisting of 15-18kb in length [8] and 37 genes including 13mitochondrial protein-coding genes (MPCGs), 2 ribosomal RNA genes, and 22 t-RNA genes, and non-coding CR genes which participate in in control of replication and transcription [8,9].The 13 protein coding genes (i.e ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, ATP6, ATP8 and Cytochrome b) in mitochondrial genome are better targets for the Barcoding purpose because the indels are rare since most lead to a shift in the reading frame [10].Even though COI gene has proven itself very useful for the Barcoding purpose, in many cases cytochrome b (cyt-b) gene has proven to be very useful as well.In our current work we have used cyt-b as our DNA Barcoding marker.
A. mylitta mainly resides in forests which are filled with its host plants such as Shorea robusta, Terminalia arjuna, Terminalia tomentosa etc.Many ecologists believe that the continuous anthropogenic activity and habitat fragmentation led to the isolation of the main population into many small population (now known as ecoraces) which originally may have had a single long stretch of population.The most interesting fact about these ecoraces is there is no specific boundary between these ecorace population and they don't obey the concept of Static boundary, for which the identification of the ecoraces have always been a difficult task among the ecologists.To overcome this problem, in our current work we have used Cyt-b as a DNA barcoding marker for five ecoraces namely Modal, Nalia, Jata-Daba, Sukinda and Boudh of Odisha.
This gene have been proven to be very useful for identification of vertebrate species and their conservation in Taiwan [11].Many of the studies have been done on vertebrate classes to estimate the mutation rate and phylogenetic relations between them using Cyt b gene [12][13][14].In insect class two suprageneric studies have been done on Parasitic wasp family [15] and among basal families of Beetles [16].Simons and Weller had conducted research to understand the evolution and utility of the Cytochrome b in insects in 2001, where they had included 4 lepidopteran species; Wasp moth or Hyalaceria gigantean (Arctiidae), Ghost moth or Hepialus dongyuensis (Hepialidae), and two butterfly species Aricia artaxerxes (Lycaenidae), and Euphydryas aurinia (Nymphalidae).In some of the works cyt b has proven more useful in identification of vertebrate species as compared to COI gene.

Collection of Specimen:
All the samples of the ecoraces have been collected from their respective ecozones (refer: Table 1.1).

DNA Isolation:
In our current study we have taken Cytochrome b or Cyt-b gene as our DNA molecular maker for analysing the Intra-specific and Inter-specific relations between these ecoraces of Antheraea mylitta Drury and other lepidopterans.For genetic analysis of the individual ecoraces of A. mylitta Drury, Genomic DNA from individual ecoraces were isolated using Quick-DNA Fungal/Bacterial Miniprep kit by Zymo Research, according to the Manufacturer's Protocol.The DNA was eluted with nuclease free water.The amplified DNA was separated by electrophoresis in 0.8% agarose gel run in 1x TAE buffer at 50V for 30 to 45 minutes till DNA fragments are migrated well.The gel was photographed on gel documentation system.The concentration of each DNA sample was measured by Nanodrop (Biotech instruments, USA).The purified DNA was then stored at -80˚C for further use.

PCR Amplification and sequencing:
For the amplification of mitochondrial cytochrome b gene, 12s rRNA Specific Primer

Sequence Editing and Sequence Data Analysis and phylogenetic studies:
The obtained Cytochrome b sequences were edited and validated using MegaX software.The assembly of the sequences were done by Clustal X2.To detect the similarity between the sequences, BLAST search was performed.After the proper alignment, the ends of the sequences were trimmed and submitted to GenBank, from where GenBank access code was obtained.Analysis of the obtained sequences like Nucleotide frequencies, Nucleotide pair sequences, pairwise distance, Transition/Transversion ratio, Overall Transition/Transversion Bias (R), and Nucleotide substitution per site was calculated using Maximum Composite Likelihood parameter in MEGA X programme.The Intraspecific nucleotide distance between the five ecoraces was also calculated by the same programme.
To understand the Intra-specific phylogenetic relationship between the ecoraces of A. mylitta Drury, and the Inter-Specific phylogenetic relationship between the ecoraces of A. mylitta Drury and other lepidopteran species, we constructed 6 phylogenetic trees based on the Mitochondrial Cyt b sequences obtained from the sequencing.These trees were generated using Neighbour-Joining Method, Maximum Likelihood criterion in MEGAX and Bayesian method.The Neighbour-Joining tree was generated using MEGAX with Kimura 2 Parameter molecular evolutionary model.Statistical support for the nodes and the internodes were calculated using bootstrap analysis with 1000 replicates.Maximum likelihood phylogeny analysis was carried out in MEGA X adjusted to 1000 bootstrap replicates under Tamura-3-Parameter.The strongness of the clade was assessed using bootstrap analysis.The parsimony tree was also created using the same MEGA X software.

RESULT:
After the PCR amplification, a ~450bp length amplicon of Cyt b gene was amplified.This Cyt-b was tested for an appropriate DNA Barcoding marker for A. mylitta Drury ecoraces identification.The obtained sequences of all the five ecoraces have been submitted to the GenBank.The following Id of the sequences are mentioned in Table 3 biases than expected based on evolutionary divergence between sequences and by chance alone.This analysis involved 5 nucleotide sequences.Codon positions included were 1st+2nd+3rd+Noncoding.There was a total of 477 positions in the final dataset.The difference in base composition bias per site between sequence pairs [17] is shown in table 3.4.This analysis involved 5 nucleotide sequences.Codon positions included were 1st+2nd+3rd+Noncoding.The maximum composite likelihood for Estimation of the Pattern of Nucleotide Substitution was calculated with complete deletion option using Tamura-Nei model in MEGA X software [18].
There were a total of 452 positions in the final dataset.The nucleotide frequencies are found to be 31.86%(A), 42.61% (T/U), 15.84% (C), and 9.69% (G).For simplicity, the sum of r values is made equal to 100.The rate of interchange between purine and pyrimidines or the translational substitute rate is more when compared with rate of interchange between purine and pyrimidine i.e overall transversional substitutes in Cyt b gene.The overall transition/transversion bias is R = 0.051%.The transition/transversion rate ratios have been found to be for purines k1=0.025 and for pyrimidines k2=0.173.This analysis included five nucleotide sequences and the codon positions included 1st+2nd+3rd+Noncoding.The evolutionary history was inferred using the Neighbour-Joining method [19].The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed [20].Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed.The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test with 1000 replicates are shown next to the branches.The evolutionary distances were computed using the Kimura 2-parameter method [21] and are in the units of the number of base substitutions per site.This analysis involved 5 nucleotide sequences.All positions containing gaps and missing data were eliminated with complete deletion option.

DISCUSSION:
The phylogenetic analysis of these five ecoraces showed that Nalia, Modal and Jata-Daba are near to each other.Boudh and Sukinda ecorace Cyt b sequence showed less variance in between their sequence.Molecular phylogenetic trees made by N-J method and Maximum -Likelihood method also showed similar kind of clustering among the ecorace sequences.In the current study we have used only the ecoraces of the same species hence there are no out-group clustering in the tree.In the N-J method tree Nalia and Modal ecorace, and Jata-Daba and Sukinda ecorace showed highest similarity between them, where-as the origin of Boudh is believed to be similar and of the same origin to the other two cluster.This origin and the gene flow can be explained by ecozones of the ecoraces and distance between them.The ecoraces that have ecozones nearer to each other, seems to have higher genetic flow between them and hence more genetic similarity, whereas the ecozones distant from each other tend to have less genetic outflow between them.If we look at the geographic availability of the ecoraces Modal and Nalia are available in the different regions of Similpal Biosphere Reserve, hence the chances of gene flow increase more as compared to Boudh ecorace ecozones which is in Sunabeda Forest Range of Koraput, a place farther away from Similpal.The only explanation that we have not been able to understand is the gene flow relation between Jata-Daba and Sukinda.Jata-Daba ecorace being native to Similapal nearby regions like Thakurmunda, but showing highest gene-flow with Sukinda which is native to Sukindagarh of Jajpur district is still something that we have not been able to understand.The two possible explanations for this dilemma can be either while distributing or in natural conditions, Jata-Daba and Sukinda ecoraces have somehow got mixed up with each other or, one of these two ecorace isn't truly an ecorace rather just another branch population of the main population.

CONCLUSION:
Our study is the first study to compare the relative genetic distance and relation among the ecoraces of A. mylitta Drury in Odisha.In many of the studies carried out by researchers have showed that Cyt b and COI gene shows similar kind of results in the identification process [22].Currently in our study we have used Cyt b gene to test it as a potential barcoding marker for A. mylitta Drury and use it ecoraces identification for this species.
Previously many Marker such as ISSR (Inter Simple Sequence Repeat) Markers [23], Repetitive Taq1 genomic marker [24], RAPD (Random Amplification of Polymorphic DNA) Markers [25], Microsattelites markers [26] and SSR (Simple Sequence Repeats) Markers [27] had been used for ecorace identification, but using mitochondrial gene Cyt b as ecorace identification is first attempt by us.What we have found in our study is similar to that of Simmons and Weller (2001) where they had found that A/T bias (A=41%, T=45%) is more in Insect Cyt b gene [28], and in our sample A and T on average have been found to be 31.86%and 42.61%.The phylogenetic relationship between the ecoraces have shown that Modal and Nalia show highest similarity which means the gene flow between the ecoarces are more since they belong to nearby regions in the Similapal Biosphere reserve.Whereas Boudh and Sukinda shared less genetic similarity between them as compared to Modal and Nalia, since the ecozones of these two ecoraces are farther away from each other.By understanding these ecoraces and their ecozones we might will be able to save the ecoraces which are on the verge of extinction, because of the overexploitation of certain hybrids or good yielding ecoraces.Not only that, we can also strengthen the ecorace line by interbreeding or inbreeding and save a lot of financial loss by the rearer community.This study also strengthens the belief of using Cyt b gene as a barcoding marker, which can be useful in identification of silk rearing lepidopterans and others.

Nalia
were used.The amplification was carried out in Veriti® 96 well thermal cycler, with 25μl containing 10pmol of each forward and reverse primers, 2.5mM of MgCl2, 200Μm of each of the four dNTPs (deoxy RiboNucleotide Triphosphates), 0.5U of Taq polymerase enzyme, 1x of PCR Buffer (Invitrogen, Life Technologies, Brazil) and 50-100ng of isolated genomic DNA.The template was denatured by heating at pre-denaturation of 95 °C for 5 min.This was followed by 39 cycles of denaturation 30 sec at 95 °C, 45 sec annealing and 1 min elongation at 72°C, with a final extension of 7 min at 72°C.The amplicons were resolved in 1.5% agarose gel using 0.5x tris acetate-EDTA (TAE) buffer.A single discrete PCR amplicon band of ~450bp was observed.The PCR amplicon was bead purified and further subjected to Sanger Sequencing.Bi-directional DNA sequencing reaction of PCR amplicon was carried out with cytb_mcb 398& cytb_mcb 869 primers using BigDye™ Terminator v3.1Cycle sequencing kit on Biosystem Analyser 3500Dx Genetic Analyzer.

Fig 3 . 1 :
Fig 3.1: Phylogenetic tree made with Boot strap consensus .1.The partial Cyt B gene obtained from the PCR has high A and T content.The average T% range up-to 42.2% with an average of 469.4bp gene length in all the ecoraces.The average A, T, C and G frequencies in the Cyt b gene sequences have been estimated to be 42.2, 16.1, 31.6 and 10.1 respectively.The detail of nucleotide frequency (in percentage) of ecoraces has been mentioned in Table3.2.The alignment of the obtained sequences was done by MEGA X software.The result reported that the consensus sequence between the ecoraces have been found to be more as compared to reserved sequences.Out of average 477bp, 450bp are conserved, 50bp are variable, and 11 are singleton sites.Likewise parsimony informative sites have been found to be 39.The codon usage analysis showed that the five ecoraces sequences have total average of 154 codon, and highest 19.6% of it codes for Leucine by UUA(L) 19.6(3.79)whereas no codon was found for cysteine.Disparity Index per site is shown for all sequence pairs in table 3.3.Values greater than 0 indicate the larger differences in base composition

Table 3 .
3: Estimation of Net Base Composition Bias Disparity between Sequences

Table 4 :
Estimation of Base Composition Bias Difference between Sequences

Table 5 :
Maximum Composite Likelihood Estimate of the Pattern of Nucleotide Substitution