Enrichment of Intrinsically Disordered Residues in Ohnologs Facilitate Abiotic Stress Resilience in Brassica Rapa


 Background: Arabidopsis thaliana and Brassica rapa shared a common evolutionary clade but Brassica species experienced an extra whole genome triplication (WGT) event compared with the model plant A. thaliana. This extra round of WGT confers B. rapa more abiotic stress resistant. The study aims to unravel how the consequences of whole genome duplication steer the variation in stress adaptation competency between the two species.Result: Comparing the duplication status between abiotic stress resistant (ASR) genes in the two species, significant increase in the number of paralogs in ASR genes of B. rapa than A. thaliana was found. Investigation on the proteomic features suggests that the ohnologs pairs in both species are more enriched with intrinsically disordered residues (IDRs) than other duplicated pairs but IDRs only in B. rapa have showed a significant positive correlation with functional divergence between the duplicated pairs. The functional divergence helps to mediate more stress adaptation functions in the ohnologs of B. rapa than that of A. thaliana. Moreover, domain ontology analysis has revealed that the new domains with stress functions are significantly more enriched in the ohnologs of B. rapa. Interestingly, majority of these stress tolerant domains are found to be present in the intrinsically disordered regions of the proteins. Statistical analysis along with these observations make it reasonable to speculate that IDRs expedite stress adaption potentiality in B. rapa by enriching more stress related functions and accommodating stress specific domains in ohnologs of this plant species.Conclusion: With the occurrence of WGT in the Brassica species the stress resilience features of B. rapa as compared to A. thaliana is less studied. This study unveils the role of intrinsically disordered residues (IDRs) of ohnologs in optimizing more stress resilience in B. rapa than A. thaliana. Thus, it will open new avenues in understanding the mechanism of succeeding stress adaptation potentiality in B. rapa over evolutionary time.

The functional divergence helps to mediate more stress adaptation functions in the ohnologs of B. rapa than that of A. thaliana. Moreover, domain ontology analysis has revealed that the new domains with stress functions are signi cantly more enriched in the ohnologs of B. rapa. Interestingly, majority of these stress tolerant domains are found to be present in the intrinsically disordered regions of the proteins.
Statistical analysis along with these observations make it reasonable to speculate that IDRs expedite stress adaption potentiality in B. rapa by enriching more stress related functions and accommodating stress speci c domains in ohnologs of this plant species.
Conclusion: With the occurrence of WGT in the Brassica species the stress resilience features of B. rapa as compared to A. thaliana is less studied. This study unveils the role of intrinsically disordered residues (IDRs) of ohnologs in optimizing more stress resilience in B. rapa than A. thaliana. Thus, it will open new avenues in understanding the mechanism of succeeding stress adaptation potentiality in B. rapa over evolutionary time.

Background
Brassica rapa, member of family Cruciferae (Brassicaceae) is one of the most important economic crops throughout the world since it is a major source of edible mustard oilseed. The Brassicaceae family is one of the largest plant groups comprising about 49 tribes, 321 genera, and 3660 species [1] and each species contains rich morphotypes showing extreme traits. Paleoploidization events are recurrent among owering plants, and the duplicate genes originated via such events contribute signi cantly to plant evolution. There are different mechanisms responsible for gene duplication such as (a) Whole Genome Duplication (WGD) where duplication takes place via increase in ploidy level, (b) Tandem duplication, which occurs via duplication of a gene through unequal crossing over between similar alleles, (c) Transposed duplication which is mediated through transposons mediated replicative transposition.
(d)Dispersed duplication (DSD) which occurs through unpredictable and random patterns by mechanisms that remain unclear, generating two gene copies that are neither colinear nor they lie by side [2]. Among these all mechanisms, polyploidy or whole-genome duplication have imparted much in the morphological and physiological diversity in the course of evolution [3,4]. While most of the plant's species underwent whole genome duplication there are many species with triplication such as Brassicaceae family. The Brassicaceae genomes have experienced two rounds of ancient whole genome duplication (WGDs) followed by an additional genome triplication in the Brassicaspecies approximately 14-24 million years ago [5,6]. Thus, Brassica species endured an extra whole genome triplication (WGT) event compared with the model plant Arabidopsis thaliana. Whole genome sequencing of the Brassica species demonstrated that WGT plays an imperative role in the speciation and morphotype diversi cation of Brassica plants [7]. Prior studies have revealed that paleoploidization not only surges gene expression and cell proliferation but also build up different cellular potentiality like development in cell signalling, homeostasis, and majorly stress resistance capability [8,9]. It was evident in several reports that many crop species increase tolerance to water de cit, temperature, or salt stresses after polyploidization [10][11][12]. Polyploid plants like Brassica rapa L. and black locust (Robiniapseudoacacia L.) showed improved acclimatization to salt stress than their diploids counterparts [13][14][15][16]. Also, tetraploid Dioscorea zingiberensis exhibited higher heat resistance than their diploids [17]. Now, the molecular mechanism behind the stress tolerance by duplicated genes is one of the thrust areas of research. It is inevitable that adaptation to stressful conditions requires modi ed expression of certain genes which is obtained by genome sequence changes. The duplicated genes undergo many modi cation or alterations in themselves such as (A) changes in the protein coding region: alternative splicing [18], gain or loss of exon-intron [19], single nucleotide change [20]; (B) alterations in the regulatory regions like changes in cis elements [21], small RNA binding sites [22], and methylation patterns [23]. The other major dynamic aspect was reported by [24] that impose a new dimension in the consequence of gene duplication relating to protein structural change. They have hypothesized that gain or losses of intrinsically disordered regions in the proteins of duplicated genes are responsible for functional divergence in duplicated genes [24]. Lack of stable 3D structure of intrinsically disordered regions (IDRs) entail them extreme exibility which often enable more interactions with proteins thereby could play multiple biological functions [25,26]. In different plant species like Zea mays, Glycine max, P. trichocarpa it was noticed that IDRs increased in paralogs after WGD or WGT event contribute to the stability of functions such as the catalytic activity of proteins, metabolic and transport processes, and molecular binding [27]. Henceforth, we keen to investigate the interplay between WGD and IDRs in manifesting more abiotic stress tolerance in Brassica rapa compared to A. thaliana.

Results
1. Gene duplication pro les in abiotic stress responsive genes in thaliana and B. rapa.
Gene duplication or Paleoploidization is thought to contribute in the evolution of ultmorphological and ecological diversity. It is a phenomenon which allows the genes to gain novel features by neofunctionalization and sub functionalization [3,4]. It has been reported earlier that the abiotic and biotic stress responsive genes were increased after whole genome duplication (WGD) [28]. In order to reinvestigate the role of gene duplication in increased stress (abiotic) tolerance in Brassica rapa than A. thaliana, we have compared duplication status between abiotic stress resistance (ASR) genes of A. thaliana with their corresponding orthologs present in B. rapa. We have noticed that proportion duplicates in stress resistant genes in B. rapa is signi cantly (P value = 0.00001, Z proportionality test) higher (73.47%) than the stress resistance genes in A. thaliana (52.09%). Moreover, the average number of paralogs of ASR genes are also found to be signi cantly (P value=0.0001, Mann-Whitney-U test) higher in B. rapa (3.3) than A. thaliana (1.8). These observations clearly re ect the fact that Brassica species underwent ve rounds of WGD whereas A. thaliana experienced two rounds of WGD [6,29]. Thus, this result suggests that duplication may play possible roles in maintaining the stress resilience attribute.
Formerly, it has been reported that different modes of duplication in uence the functional role in a biased way [30]. Therefore, in our study mode of duplication of the stress genes was derived and were classi ed into one of the four categories being derived from WGD (Whole genome duplication), Tandem duplication (TD), Transposed duplication (TRD) and Dispersed Duplication (DSD). The number of duplicate gene pairs for each category in each taxon was determined. Percentage of genes underwent different modes of duplication in A. thaliana and B. rapa is delineated in Figure 1. From the above gure it could be clearly depicted that the percentage of stress genes experiencing WGD is signi cantly (P value=0.0001, Mann-Whitney-U test) higher in B. rapa (23.5%) than that of A. thaliana (12.9%). Thus, WGD could be considered as a driving force for the evolution of stress adaptive genes from A. thaliana to B. rapa as also described by Rizzon [28].

Enrichment of intrinsically disordered regions in proteins encoded by duplicated pairs of ASR genes in thaliana and B. rapa
Ohnologs, the duplicates derived by whole genome duplication were reported to contain more intrinsically disordered residues than small scale duplicates [31]. It was also reported in plant system that the proteins of dehydrin family being almost completely disordered could involve in the response to drought and other environmental stresses [32]. Thus, we have analysed intrinsically disordered protein pro le of ASR coded genes in the two species. Consistent with the previous study [31], we have found that the intrinsically disordered residues (IDRs) are signi cantly more enriched in WGD derived duplicated proteins than tandem, dispersed and transposed duplicates in both species ( Figure 2). Consequently, it is also evident from our result that in both plants, duplicated ASR proteins are more frequently contain IDRs than singleton ASR genes ( Figure 3). Because of this fact, average disordered residues are also signi cantly higher in duplicated ASR encoding proteins compared to singleton ASRs in A. thaliana (Average = 0.20) and B. rapa ( Average= 0.27). Thus, it could be manifested from the above facts that WGD is associated with the enhancement of IDRs in proteins. If so, it could be assumed that ohnologs of B. rapa encode more intrinsically disordered proteins than A. thaliana since former is underwent three extra round of WGD. Consistent with our conjecture, we too found that 40.5 % of ohnolog pairs in B. rapa contain at least one intrinsically disordered region but in A. thaliana it is only 29.9 %. The difference is signi cant at P value<0.0001 in Z-proportionality test. Moreover, the average percentages of intrinsically disordered residues (IDRs) are signi cantly (P = 0.001, Mann-Whitney U test) high in the ohnologs of B. rapa compared to A. thaliana ( Figure 4). Moreover, we also retrieved a signi cant positive correlation between WGD derived paralogs number and content of intrinsically disorder residues in B. rapa(Spearman's' ρ =0.1547, P=2e-06) but not in A. thaliana. Hence it would be interesting to investigate how the IDRs in proteins acquired through WGD during evolution stimulate the stress adaptation potentiality in B. rapa than A. thaliana.
3. Functional divergence in duplicated pairs of ASR genes originated through whole genome duplication in thaliana and B. rapa Genome duplication (polyploidy) is a conventional phenomenon of plant evolution. It was proposed earlier that functional diversi cation of the surviving duplicated genes is one of the prime attributes for the long-term evolution of polyploids [33]. It is widely accepted that enrichment of intrinsically disordered residues in proteins could able to impose them diverse functional potentiality because of their high exibility in nature and IDRs are also reported to create functional divergence between ohnologs after WGD [24]. We have measured the functional divergence based on the Gene Ontology terms between the WGD derived duplicates in the two selected species. We have found that paralogs of 67.30% genes in B. rapa turned into functionally diverged after whole genome duplication event whereas it is only 50% for A. thaliana. The difference in the proportion of gene in these two species is also signi cant at 0.99 con dence level. The average functional divergence between the paralogous pairs in B. rapa is also found to be signi cantly (Mann-Whitney U test, P = 0.009) higher than the paralogous pairs in A. thaliana (B. rapa = 0.90; A. thaliana-= 0.88). Next, we checked the correlation between IDRs and functional divergence between duplicated pairs in our datasets and received a strong positive correlation between them in B. rapa but not in A. thaliana (Spearman's' ρ = 0.1458, P=6.539e-05).These results suggest that the increased disordered residues in duplicated pairs of B. rapa may play an important role in creating new functions in ohnologs which in turn impose more stress resistance potentiality in B. rapa compared to A. thaliana.
To give a detailed insight into the functions which are gained after WGD, we have performed functional enrichment analysis of ASR genes and their corresponding paralogs in B. rapa and A. thaliana. We have noticed that three additional stress related functions (response to heat, response to abiotic stimulus, response to temperature stimulus) are more successfully enriched in the paralogs of B. rapa than their ancestral genes ( Figure 5) but in case of paralogs in A. thaliana no such signi cant stress related functional enrichment is observed ( Figure 5).  Figure 6). Next, we have analysed domain ontology by dcGO to compare functional enrichment of unique domains present in A. thaliana and B. rapa duplicated pairs. Figure 7 has clearly depicted that stress related functions are more enriched in the domain which are solely present in B. rapa than the domains in A. thaliana. The domains in B. rapa which are annotated with the stress related functions are enlisted in Table 1. In previous research, the distribution of disordered regions in GATA-type binding Domain is responsible for manifesting functional speci city [35]. Therefore, we also analysed the domain content in disordered regions of the protein. The results we found was consistent with the previous results where the percentage of stress speci c domain in disordered regions (9/14=64.3%) was signi cantly (P=0.001, Z proportionality test) higher than the domain present in the ordered domain (5/14=35.7%) (Table1). The enrichment of stress related domains in the disordered regions of duplicated pairs provides an important cue that genome triplication in B. rapa helps to encode disordered residues which might help in their stress resistance.

Statistical analysis of IDRs and stress resistance potentiality of rapa
Here, we have noticed that whole genome gene duplication triggers the enrichment of intrinsically disordered residues in proteins. These IDRs mediate functional divergence in the ohnologs as well as confer the sites for enriching new domains. The functional divergence in the duplicated pairs and the new domains originated during evolution of A. thaliana to B. rapa contribute a lot for escalating stress tolerance potentiality in B. rapa. Thus, we intend to explore whether these four factors (no. of paralogs, functional divergence, IDR content, domain content) in ohnologs acts mutually inclusive way or they are independent in B. rapa. For this, we have categorized stress related genes in two classes-mono stress (associated with one stress condition) and poly stress (associated with more than one stress condition). Then, we have performed linear regression analysis with stress and stress regulatory four factors. We found that only IDRs could independently control stress resistance potentiality in Brassica rapa (Spearman's' ρ = 0.185, P= 5.943e-03).

Discussion
Comparative genome analysis have shown that A. thaliana shared a common evolutionary clade and underwent whole genome duplication around 20 MYA whereas the Brassica lineage went through whole genome triplication (WGT) [6,[36][37][38][39][40]. Gene duplication is the key source of genetic novelty originated during genomic evolution. This genetic novelty drives structural changes within the DNA which may lead to changes in the protein structure as well as its function. Recent study proposed that with the occurrence of this WGT, successive genomic rearrangements, divergence and speciation in Brassica species confer B. rapa more abiotic stress resilient than A. thaliana [41]. It will therefore be very interesting to explore the attributes imposed by gene duplication to mediate stress tolerance in plants. Comparing ASR genes of A. thaliana and their corresponding orthologs present in B. rapa, we have noticed that most of the ASR genes in B. rapa (82.3%) take part in gene duplication events. Along with that numbers of paralogs are signi cantly higher in B. rapa than their ancestral species. Subsequently, it is also evident from our result that the number of paralogs originated via WGD in A. thaliana (12.9%) is nearly half to the paralogs in B. rapa (23.5%) which re ects the effects of extra round of genome duplication in B. rapa. This result is also echoed in a previous study on A. thaliana hypothesized that many of the identi ed duplicated gene pairs evolving by WGD mechanism might function in cell wall integrity pathways, growth and stress tolerance [42]. Unlike small-scale gene duplications, whole-genome duplications copy entire pathways or networks, thus play an important role in the evolution of novel traits and increased biological complexity [43][44][45]. Previous corroborations illustrate that genomic recombination and chromosomal rearrangements during gene duplication would create exible or ductile regions in proteins which are termed as intrinsically disordered proteins [27,46] There is profuse evidence that disordered residues confer exibility to proteomes [47][48][49]35 most importantly they contribute to organismic plasticity by facilitating complex gene regulatory network dynamics and protein multifunctionalities [50][51][52][53][54][55][56][57][58]. To further explore the role of WGD events in stress tolerance of plant we have analysed disordered content of ASR gene paralogs originated through different modes of duplication i.e. WGD, tandem, transposed and dispersed. We noticed that IDRs content is explicitly enriched in ohnologs of A.thalina and B. rapa than any other paralogs ( Figure 2). By comparing average IDRs in the ohnologs of B. rapa and A. thaliana, we have con rmed that WGT events in B. rapa increased IDRs in their proteomes thereby may contribute to more successive adaptation to stress. To a rm this hypothesis, we have measured functional divergence in the ohnologs of both species, where we found that B. rapa ohnologs undergo more functional divergence than A. thaliana. Interestingly, we have observed that 76.03% of ohnologs having at least one disordered region (residues >30) are become functionally diverged and only 0.03 % remain functionally redundant after WGD in B. rapa and we also retrieved signi cant positive correlation between IDRs and functional divergence (Spearman's' ρ = 0.1458, P= 6.539e-05). However, in case of A. thaliana paucity in IDRs in their proteome leads to the deviation of the above fact in them. Moreover, we have illustrated that the functional divergence in the duplicated pairs of B. rapa indeed help in the enrichment of stress responsive functions in their paralogs ( Figure 5). Our result supports previous evidence on a distinguished NAC family protein in A. thaliana that stated the differences in IDR region contributed to functional divergences of NAC92 and NAC59 which are involved in senescence, salt stress responses, and lateral root development [35,59]. Now, protein functions achieved via protein-protein interaction which is invariably mediated by protein domain. The structure-function continuum concept i.e. unique biological functions of proteins require unique 3D-structures dominated scienti c minds for more than a century. However, the growing understanding of the signi cant functional role of IDRs has challenged the year-old concept, proving indisputably that structure-less IDPs/IDPRs are functional, being able to engage in biological activities and perform impossible tricks that are highly unlikely for ordered proteins. Presence of IDRs in the binding domain of singlish interface hub proteins for pursuing diverse functionality had also been reported in few years back [60][61][62]. We have studied the domain content in ordered and disordered region of proteins in the ohnologs of B. rapa and A. thaliana. We found that the number of total domain content (present in ordered and disordered regions) per gene is signi cantly high in B. rapa than A. thaliana. Computing domain enrichment analysis, we evidenced that 181 new domains are gained in B. rapa which were entirely absent in their orthologs in A. thaliana ( Figure 6). Thus, we could infer that these domains are acquired after WGT events in B. rapa. Interestingly, analysing the ontology of domains present in the ordered and disordered region, we noticed that stress related functional domains are signi cantly more enriched in disordered regions. These results provide signi cant evidence on the origin of IDRs by evolution and their implications for stress tolerance. Now, to prove the stress responsive roles of IDRs statistically, we have performed partial and regression analysis in Brassica rapa by categorizing the stress genes according to number of stress factors into poly stress (involve in more than one stress response) and mono stress (restricted in one stress response Paralogous genes having at least 60% sequence identity are selected for the study. The coding and amino acid sequences of the stress genes and their paralogs are extracted from Ensembl. The abiotic stress genes were categorized into singleton genes without any or with only one paralog and duplicated genes with more than one paralogs.

Identi cation of Duplication Mode
Plant Duplicate Gene Database (PlantDGD, http://pdgd.njau.edu.cn:8080) was used to categorise genes into different modes of duplications as WGD, Tandem, Transposed and Dispersed. Duplicated gene pairs list were downloaded from the database and were matched with the abiotic stress genes and then catagorised.

Functional Divergence and Functional Enrichment Analysis in duplicated genes
The GO annotation for B. rapa and A. thaliana ohnologs were obtained from the ENSEMBL Biomart(http://plants.ensembl.org/biomart/martview). Among the three top GO categories: molecular function (MF), biological process (BP),and cellular component (CP), only the Biological process were considered for functional divergence analysis. Functional divergence was determined using Czekanowski-Dice distance formula.
Functional distance (i, j) = [Number of(Terms (i) Terms(j)) /[Number of(Terms(i) Terms(j)) Number of (Terms(i) Terms(j))] Here, i represents GO terms of individual plant genes, j represents GO terms of the paralogous genes, Δ corresponds to the symmetrical between the GO term sets of two genes, ∪ and ∩ represents the nonredundant and common GO terms, respectively. Functional enrichment analysis was done by Gene Ontology Annotation Database (ref). We have used corrected P value < 0.05 as the threshold to determine signi cant overrepresentation of certain GO terms.

Identi cation of Intrinsically Disordered regions
The prediction of intrinsic protein disorder was carried out by the IUPred (https://iupred.elte.hu/) algorithm using the option "long disordered regions". We choose IUPred algorithm for disorder prediction as IUPred didn't train on any speci c dataset and hence it offers an unbiased estimate of disorder score [63]. IUPred assigns disorder score based on the pairwise interaction energy score for each amino acid [64]. Residues having a disorder score above threshold value 0.5 were considered as disordered residues.

Prediction of stress related functional domain enrichment in the duplicated genes
Pfam domains analysis was performed using Pfam 32.0 database( https://pfam.xfam.org/)which uses the basics of HMMER (v 3.0). Then t obtained he Pfam ids were search with dcGOdatabase (http://supfam.org/SUPERFAMILY/cgi-bin/dcenrichment.cgi) for the enrichment analysis of domain centric gene ontology. To evaluate signi cant enrichment of certain GO terms in the identi ed domains, P values were calculated based on Poisson function. Finally, we used P value < 0.05 as the threshold to determine signi cant overrepresentation of certain function speci c domain.

Statistical tests
Online Z-test was carried out for proportionality test between two groups. All other statistical analyses and Heatmap were performed using R.

Declarations
Ethics approval and consent to participate Not applicable Consent for publication Not applicable Availability of data and materials Additional le 1: Dataset of abiotic stress resistance genes of A. thaliana and their corresponding orthologs playing similar functions in B. rapa.

Competing interests
All authors did not have any con ict of interest.

Funding
The research discussed in this paper is funded by Swami Vivekanand Merit-Cum-Means Scholarship conducted by Government of West Bengal.
Authors' contributions SP designed the work, SDL executed the whole work, DS helped in several analyses, SP and SDL wrote the manuscript, TCG helped in critical reading of manuscript.