Database search, sequence retrieval and structural analysis
In this study, the Cobra genes of A. thaliana and O. sativa were taken as references and searched by BLAST against the wheat genome. As a result, five sets of orthologs with maximum similarity and coverage were identified in wheat. All sets of genes contained the COBRA conserved domain, as shown in Table 1, which confirmed their orthology. Biophysical properties, including gene position, molecular weight, number of amino acids, isoelectric point, and number of glycosylation sites, were retrieved using different bioinformatics tools. The molecular weights of all the COBRA proteins were in the range of 50–75 kDa, and the number of residues ranged from 429 to 461. The isoelectric points of these proteins averaged about 8, as did the number of glycosylation sites.
Subcellular localization and Karyotyping
The COBRA proteins encoded by the wheat Cobra genes localized to the plasma membrane. A karyotype was constructed by mapping all the genes onto the wheat genome according to their respective positions. The predicted genes were located on chromosomes 2, 4, 5, 6 and 7, as shown in Fig. 1. Each gene had homeologs on sub-genomes A, B and D. The cytogenetic locations of the homeologs were not adjacent to one another; on chromosomes 2 and 4 in particular, they lay far apart.
Table 1: Sets of genes and their orthologs having the COBRA conserved domain, with similarity and E-value
| Query Seq. | Orthologs | Position | Similarity | Coverage | E-Value | CD-Domain |
|---|---|---|---|---|---|---|
| AtCOBL1 | TaCOBL1A | 2A: 133673912:133678910 | 88% | 90% | 0.01 | Present |
| | TaCOBL1B | 2B: 575934027:575937341 | 83% | 91% | 0.5 | Present |
| | TaCOBL1D | 2D: 491498233:491501692 | 91% | 97% | 0.2 | Present |
| AtCOBL2 | TaCOBL2A | 6A: 600402485:600407022 | 76% | 89% | 0.3 | Present |
| | TaCOBL2B | 6B: 690558034:690562573 | 82% | 90% | 0.1 | Present |
| | TaCOBL2D | 6D: 454229171:454234205 | 74% | 89% | 0.4 | Present |
| AtCOBL3 | TaCOBL3A | 4A: 540440598:540445318 | 81% | 87% | 0.01 | Present |
| | TaCOBL3B | 4B: 83131612:83136786 | 79% | 90% | 0.01 | Present |
| | TaCOBL3D | 4D: 56705992:56710599 | 85% | 97% | 0.4 | Present |
| AtCOBL4 | TaCOBL4A | 5A: 588374977:588379739 | 88% | 91% | 0.3 | Present |
| | TaCOBL4B | 5B: 574674518:574679254 | 87% | 88% | 0.2 | Present |
| | TaCOBL4D | 5D: 467600940:467604428 | 80% | 85% | 0.1 | Present |
| AtCOBL5 | TaCOBL5A | 7A: 89071769:89075439 | 89% | 90% | 0.5 | Present |
| | TaCOBL5B | 7B: 36506027:36509655 | 95% | 93% | 0.01 | Present |
| | TaCOBL5D | 7D: 87748276:87751869 | 91% | 90% | 0.01 | Present |
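The position strings in Table 1 encode the chromosome, the sub-genome, and the start and end coordinates of each gene. A minimal Python sketch for splitting them into fields is given here. The names `GenePosition` and `parse_position`, and the convention that the span is counted inclusively (end minus start plus one), are illustrative assumptions and not part of the original analysis.

```python
import re
from typing import NamedTuple


class GenePosition(NamedTuple):
    chromosome: int   # homeologous group, e.g. 2
    subgenome: str    # "A", "B" or "D"
    start: int
    end: int

    @property
    def span(self) -> int:
        # Inclusive interval length in base pairs (assumed convention).
        return self.end - self.start + 1


# Accept ":" as well as "-" or an en dash between the two coordinates.
_POSITION_RE = re.compile(r"^\s*(\d)([ABD]):\s*(\d+)\s*[:\-\u2013]\s*(\d+)\s*$")


def parse_position(text: str) -> GenePosition:
    """Parse a string such as '2A: 133673912:133678910'."""
    match = _POSITION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised position string: {text!r}")
    chrom, sub, start, end = match.groups()
    return GenePosition(int(chrom), sub, int(start), int(end))


pos = parse_position("2A: 133673912:133678910")
print(pos.chromosome, pos.subgenome, pos.span)
```

Parsing every row this way makes it straightforward to compare gene spans across homeologs or to sort the genes by chromosome before drawing a karyotype.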
Exon-Intron structure, Gene Site, phylogenetic tree and domain analysis
Gene structures were drawn using the online GSDS server. The number of introns and exons and the lengths of the upstream and downstream UTRs can be seen clearly in the structures shown in Fig. 2. Twelve genes shared a similar structural pattern, with the same number of exons and introns as well as comparable chromosomal locations, whereas the TaCobra 5 genes had a markedly different structure, with only two exons and one intron. They were also shorter than the other sets of genes. Gene structures were also retrieved from the Softberry database for closer examination and deeper analysis. In these structures, the transcription start site (TSS), the transcription stop site, all internal exons and the poly(A) tail can be seen clearly.
The evolutionary history was inferred using the Neighbor-Joining method. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The evolutionary distances were computed using the JTT matrix-based method and are in units of the number of amino acid substitutions per site. This analysis involved 82 amino acid sequences. All positions with less than 95% site coverage were eliminated, i.e., fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position (partial deletion option). There were a total of 173 positions in the final dataset. Evolutionary analyses were conducted in MEGA. The similarity index of the wheat COBRA proteins with neighboring COBRA proteins of Arabidopsis, rice and maize was in the range of 62–95%, indicating high similarity and a close ancestral relationship. The branching pattern of the clades in the phylogenetic tree suggests that they all originated from a common ancestor.
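The partial-deletion rule described above (keep only alignment columns with at least 95% site coverage) can be illustrated with a short Python sketch. The toy alignment, the function name `partial_deletion`, and the set of characters treated as missing are assumptions made for illustration; this is not MEGA's own implementation.

```python
from typing import List

# Characters counted as gaps, missing data or ambiguous residues (assumed set).
MISSING = set("-?X")


def partial_deletion(alignment: List[str], cutoff: float = 0.95) -> List[str]:
    """Drop every alignment column whose site coverage is below `cutoff`."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    kept_columns = []
    for col in range(length):
        # Site coverage = fraction of sequences with an informative residue.
        present = sum(1 for seq in alignment if seq[col].upper() not in MISSING)
        if present / n_seqs >= cutoff:
            kept_columns.append(col)

    return ["".join(seq[c] for c in kept_columns) for seq in alignment]


# Hypothetical four-sequence alignment; column 4 is a gap in half the rows.
toy = [
    "MKT-LV",
    "MKTALV",
    "MRT-LV",
    "MKTALI",
]
filtered = partial_deletion(toy, cutoff=0.95)
print(filtered)
```

In the toy alignment only the half-gapped column falls below the cutoff, so five of the six positions are retained. In the analysis reported here, the same kind of filter left 173 positions.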
According to Mack et al. (2001), computational biologists define conserved domains based on recurring sequence patterns or motifs. The conserved domain of the Cobra genes is COBRA (accession no. pfam04833, PSSM-ID: 368147), which consists of 167 amino acids. The residue sequence of the COBRA domain is:
YVAQVTIENYQPYRHIDNPGWNLGWEWAKKEFIWSMKGAQATDQGDCSYFKgLFSIPHCCEKRPTIVDLPPGAPYDTQVGNCCRGGVLLPPTQDPSKSVSAFQMQVGKAPPdNRTTLKPPQNFTIGAPGPGYTCGPPVRVSPTRFPDPDGRRTTQALATWQVTCNYS
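The stated domain length can be checked with a few lines of Python, using the sequence exactly as printed above. The lower-case letters are kept as in the source and are upper-cased only for validation; the variable names are illustrative.

```python
# COBRA domain sequence as printed in the text, split over lines for readability.
COBRA_DOMAIN = (
    "YVAQVTIENYQPYRHIDNPGWNLGWEWAKKEFIWSMKGAQATDQGDCSYFKgLFSIPHCC"
    "EKRPTIVDLPPGAPYDTQVGNCCRGGVLLPPTQDPSKSVSAFQMQVGKAPPdNRTTLKPP"
    "QNFTIGAPGPGYTCGPPVRVSPTRFPDPDGRRTTQALATWQVTCNYS"
)

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

domain_length = len(COBRA_DOMAIN)
# True only if every residue is one of the 20 standard amino acids.
only_standard = set(COBRA_DOMAIN.upper()) <= STANDARD_AMINO_ACIDS

print(domain_length, only_standard)
```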
Cis-acting elements analysis and detection of SNPs
First, the transcription start sites of all the genes were retrieved, and the region extending up to 2 kb upstream of each was taken as the putative promoter of the respective gene. TATA boxes and CAAT boxes were present in almost all the genes, as shown in Table 3. Five cis-acting elements responsive to drought stress were selected from the literature, as shown in Table 2. Among these, three elements (WRKY, ABRE and DRE) were repeatedly present in most of the genes, as shown in Table 3. The target DNA sequences were then searched against the genome of wheat (Triticum aestivum), and nineteen novel SNPs were detected. Of these, three were intronic, three were exonic, ten were in upstream regions and three were in downstream regions (Table 4).
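The promoter definition used here (up to 2 kb upstream of the transcription start site) can be expressed as a small coordinate calculation. The Python sketch below is illustrative only: the function name `promoter_window`, the 1-based inclusive coordinates, the strand handling, and the example TSS value are assumptions, since the text does not give these details.

```python
from typing import Tuple


def promoter_window(tss: int, strand: str, length: int = 2000) -> Tuple[int, int]:
    """Return (start, end) of the region `length` bp upstream of the TSS.

    Coordinates are 1-based and inclusive, the TSS itself is excluded,
    and the window is clipped at position 1 of the chromosome.
    """
    if tss < 1:
        raise ValueError("TSS must be a positive coordinate")
    if strand == "+":
        start, end = tss - length, tss - 1
    elif strand == "-":
        # On the reverse strand, "upstream" means higher genomic coordinates.
        start, end = tss + 1, tss + length
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), end


# Hypothetical TSS at position 100000.
print(promoter_window(100000, "+"))
print(promoter_window(100000, "-"))
```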
Table 2
| Cis-Element | Conserved Cis Motif | References |
|---|---|---|
| ABRE | ACGTG/ACGT | Hattori et al. 2002 |
| DRE | CCGAC/RYCGAC | Xue 2002 |
| MYC | CANNTG | Abe et al. 2003 |
| MYB | CAACNA/CAACNG/TAACNG | Abe et al. 1997 |
| WRKY | TGAC | Pan et al. 2009 |
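Searching a promoter for the motifs in Table 2 amounts to pattern matching with IUPAC ambiguity codes (R = A/G, Y = C/T, N = any base). The Python sketch below shows one way to do this with regular expressions. The promoter string is a made-up toy sequence, the function names are hypothetical, and only the given strand is scanned, which is an assumption; the tool actually used in the study is not named in this section.

```python
import re
from typing import Dict, List

# Motifs as listed in Table 2.
CIS_ELEMENTS: Dict[str, List[str]] = {
    "ABRE": ["ACGTG", "ACGT"],
    "DRE": ["CCGAC", "RYCGAC"],
    "MYC": ["CANNTG"],
    "MYB": ["CAACNA", "CAACNG", "TAACNG"],
    "WRKY": ["TGAC"],
}

# IUPAC codes that occur in the motifs above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> re.Pattern:
    # A lookahead is used so that overlapping occurrences are all reported.
    body = "".join(IUPAC[base] for base in motif)
    return re.compile(f"(?=({body}))")


def scan_promoter(sequence: str) -> Dict[str, List[int]]:
    """Return 0-based start positions of each element found in `sequence`."""
    sequence = sequence.upper()
    hits: Dict[str, List[int]] = {}
    for name, motifs in CIS_ELEMENTS.items():
        positions = set()
        for motif in motifs:
            regex = motif_to_regex(motif)
            positions.update(m.start() for m in regex.finditer(sequence))
        if positions:
            hits[name] = sorted(positions)
    return hits


toy_promoter = "TTGACGTGCCGACAA"  # hypothetical sequence, not from wheat
hits = scan_promoter(toy_promoter)
print(hits)
```

Because some elements have overlapping motif variants (for example CCGAC lies inside RYCGAC), a single site can be reported at two adjacent start positions. Merging such hits before counting is a design choice left to the user.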
Table 3
| Gene IDs | Cis Acting Elements | Nos. |
|---|---|---|
| TraesCS2A02G175100 | WRKY, ABRE | 2 |
| TraesCS2B02G406700 | ABRE, DRE, WRKY | 3 |
| TraesCS2D02G386400 | DRE, WRKY | 2 |
| TraesCS6A02G379800 | ABRE, DRE, WRKY | 3 |
| TraesCS6B02G418400 | ABRE, DRE, WRKY | 3 |
| TraesCS6D02G364400 | DRE, WRKY | 2 |
| TraesCS4A02G231200 | DRE, WRKY | 2 |
| TraesCS4B02G084800 | DRE, WRKY | 2 |
| TraesCS4D02G082700 | WRKY, ABRE | 2 |
| TraesCS5A02G392000 | ABRE | 1 |
| TraesCS5B02G396900 | WRKY | 1 |
| TraesCS5D02G401900 | ABRE, DRE, WRKY | 3 |
| TraesCS7A02G136600 | DRE | 1 |
| TraesCS7B02G037700 | WRKY | 1 |
| TraesCS7D02G136500 | DRE, WRKY | 1 |
Table 4
| Sr. No. | Chr. No. | Position | SNP (Ref. Sequence) | SNP (Query Sequence) | Type |
|---|---|---|---|---|---|
| 1 | chr2A | 133674024 | A | C | Downstream |
| 2 | chr2A | 133674066 | A | G | Downstream |
| 3 | chr2A | 133678611 | A | G | Upstream |
| 4 | chr2A | 133678612 | G | A | Upstream |
| 5 | chr2A | 133678724 | T | C | Upstream |
| 6 | chr2A | 133678798 | C | G | Upstream |
| 7 | chr2A | 133678816 | G | A | Upstream |
| 8 | chr2A | 133678859 | G | C | Upstream |
| 9 | chr2A | 133678872 | A | G | Upstream |
| 10 | chr4B | 83135305 | T | C | Exonic |
| 11 | chr4B | 83135319 | G | T | Exonic |
| 12 | chr4B | 83135516 | T | C | Intronic |
| 13 | chr4B | 83136481 | C | A | Downstream |
| 14 | chr5A | 588376735 | C | T | Exonic |
| 15 | chr5A | 588377080 | C | T | Intronic |
| 16 | chr6A | 600402608 | G | A | Upstream |
| 17 | chr6A | 600402740 | A | G | Upstream |
| 18 | chr6A | 600402753 | G | C | Upstream |
| 19 | chr6B | 690561305 | T | C | Intronic |
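The SNPs in Table 4 can be summarized by region type and by chromosome with a simple tally. The Python sketch below copies the rows of Table 4 into a list; the variable names are illustrative.

```python
from collections import Counter

# (chromosome, position, reference base, query base, type), copied from Table 4.
SNPS = [
    ("chr2A", 133674024, "A", "C", "Downstream"),
    ("chr2A", 133674066, "A", "G", "Downstream"),
    ("chr2A", 133678611, "A", "G", "Upstream"),
    ("chr2A", 133678612, "G", "A", "Upstream"),
    ("chr2A", 133678724, "T", "C", "Upstream"),
    ("chr2A", 133678798, "C", "G", "Upstream"),
    ("chr2A", 133678816, "G", "A", "Upstream"),
    ("chr2A", 133678859, "G", "C", "Upstream"),
    ("chr2A", 133678872, "A", "G", "Upstream"),
    ("chr4B", 83135305, "T", "C", "Exonic"),
    ("chr4B", 83135319, "G", "T", "Exonic"),
    ("chr4B", 83135516, "T", "C", "Intronic"),
    ("chr4B", 83136481, "C", "A", "Downstream"),
    ("chr5A", 588376735, "C", "T", "Exonic"),
    ("chr5A", 588377080, "C", "T", "Intronic"),
    ("chr6A", 600402608, "G", "A", "Upstream"),
    ("chr6A", 600402740, "A", "G", "Upstream"),
    ("chr6A", 600402753, "G", "C", "Upstream"),
    ("chr6B", 690561305, "T", "C", "Intronic"),
]

# Count SNPs per region type and per chromosome.
by_type = Counter(kind for *_, kind in SNPS)
by_chromosome = Counter(chrom for chrom, *_ in SNPS)

print(len(SNPS))
print(dict(by_type))
print(dict(by_chromosome))
```

For the rows listed in Table 4, this gives 10 upstream, 3 downstream, 3 exonic and 3 intronic SNPs, with most of them located on chr2A.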
Expression Profiling of TaCOBL genes
Expression profiling of a gene helps to understand its biological function. We therefore investigated the expression of TaCOBL genes in two wheat varieties, In-qalab-91 and Seher-06. The predicted TaCOBL genes were validated by PCR amplification with their respective designed primers. The expected product sizes were in the range of 3 to 4 kb, and all amplified PCR products fell within this range. Tissue-specific transcript abundance of all the genes was then assessed by qRT-PCR in root and shoot. Under drought treatment, all the studied genes had higher transcript levels than the control. All the genes showed the same expression pattern except TaCOBL 4, which showed a relatively higher transcript level than the other genes under drought stress. Taken together, these observations indicate that TaCOBL genes respond clearly to drought stress and are possible candidates for breeding drought-resistant wheat cultivars.
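This section does not state how relative transcript levels were derived from the qRT-PCR data. One widely used option is the 2^-ΔΔCt method, sketched below in Python purely as an illustration. The Ct values, the use of a single reference gene, and the function name `relative_expression` are hypothetical and do not come from this study.

```python
def relative_expression(
    ct_target_treated: float,
    ct_reference_treated: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target gene (treated vs control) by 2^-ddCt."""
    # Normalize the target gene to the reference gene in each condition.
    delta_treated = ct_target_treated - ct_reference_treated
    delta_control = ct_target_control - ct_reference_control
    # Compare the drought-treated sample with the control sample.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target amplifies three cycles earlier under drought.
fold = relative_expression(
    ct_target_treated=22.0,
    ct_reference_treated=18.0,
    ct_target_control=25.0,
    ct_reference_control=18.0,
)
print(fold)
```

A value above 1 would correspond to a higher transcript level under drought than in the control, which is the pattern reported for the TaCOBL genes.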