Homoeolog gene expression analysis reveals novel expression biases in upland hybrid cotton under intraspecific hybridization

Hybridization is useful to enhance the yield potential of agronomic crops in the world. Cotton has genome doubling due to the allotetraploid process and hybridization in coordination with duplicated genome can produce more yield and adaptability. Therefore, the expression of homoeologous gene pairs between hybrids and inbred parents is vital to characterize the genetic source of heterosis in cotton. Investigation results of homoeolog gene pairs between two contrasting hybrids and their respective inbred parents identified 36853 homoeolog genes in hybrids. It was observed both high and low hybrids had similar trends in homoeolog gene expression patterns in each tissue under study. An average of 96% of homoeolog genes had no biased expression and their expressions were derived from the equal contribution of both parents. Besides, very few homoeolog genes (an average of 1%) showed no biased or novel expression in both hybrids. The functional analysis described secondary metabolic pathways had a majority of novel biased homoeolog genes in hybrids. These results contribute preliminary knowledge about how hybridization affects expression patterns of homoeolog gene pairs in upland cotton hybrids. Our study also highlights the functional genomics of metabolic genes to explore the genetic mechanism of heterosis in cotton.


Introduction
Hybridization or polyploidy is a vital process, which produces exceptional phenotypes through the interaction of two distinct genomes, and has a substantial influence on the plant genome (Dufresne and Hebert 1994;Yoo et al. 2013). The novel phenotypes in plants can be a consequence of many modifications not only at genomic and epigenetic levels but also in gene expression patterns (Doyle et al. 2008;Kashkush et al. 2003). Several examples in cotton , Brassica napus (Gaeta et al. 2009), and Spartina (Chelaifa et al. 2010) stated that hybridization (genome merger) modified more transcriptome than polyploidy. About 15% of flowering plants are polyploidy (Wood et al. 2009) and divided into categories of autopolyploid and allopolyploid. Autopolyploids evolved from a genome merger of the same or similar species but allopolyploids contain a genomic combination of two different species (Doyle et al. 2008). Allopolyploids have enhanced growth and wider adaptations relative to autopolyploids. Many important crops are allopolyploids together with upland cotton (Gossypium hirsutum). In allopolyploids, hybridization or polyploidy Kashif Shahzad, Xuexian Zhang, and Meng Zhang contributed equally to this work. is allied with various kinds of exchanges and interactions between homoeolog genes (duplicate genes) (Kovarik et al. 2008;Salmon et al. 2010;Szadkowski et al. n.d.). Further modifications are consisting of epigenetics (Doyle et al. 2008;Madlung et al. 2002), gene activation (Kashkush et al. 2003;Kraitshtein et al. 2010), and sequence removal (Anssour et al. 2009;Han et al. 2005). The result of different genomic interactions by hybridization or polyploidy produced several expression patterns of duplicate or genes with similar functions. For instance, these may be gene silencing, gene activation, novel expression, and altered parental contribution (Yoo et al. 2013;Li et al. 2014). However, the direction and magnitude of duplicate gene expression are varied with crop, tissue, and organ (Adams et al. 2003;Bottley et al. 2006;Buggs et al. 2010;Chaudhary et al. 2009).
Previous studies have identified four different expression forms of duplicated genes in plants, e.g., one duplicate gene copy may be lost its expression pattern as compared to other (De Smet et al. 2013), and duplicate gene copies may maintain original expression (McGrath et al. 2014), one duplicate gene copy may adopt new expression (Hughes et al. 2014;Liu and Adams 2010), and sometimes original expression is the equal contribution of duplicates (Kleinjan et al. 2008). Importantly, the duplicate gene pairs may show homoeolog expression bias in allopolyploid (Grover et al. 2012). Where one homoeolog displays preferential expression to the transcriptome as compared to the other. The phenomena of homoeolog expression have already been stated for allopolyploids like cotton Hovav et al. 2008;Samuel Yang et al. 2006), wheat (Bottley et al. 2006;Mochida et al. 2004), Arabidopsis (Chang et al. 2010;Wang et al. 2004), and brassica (Auger et al. 2009;Gaeta et al. 2007). The analysis results in cotton showed that hybridization caused more homoeolog expression changes and fewer changes in gene expressions (Yoo et al. 2013). In contrast, polyploidy formation is linked with more gene expression modifications and fewer changes in homoeolog expression. A study in cyprinid fish also reported hybridization brought more homoeolog expression changes than polyploidy (Ren et al. 2016). Homoeolog expression can be either balanced or unbalanced (Grover et al. 2012). When duplicate gene pairs display biased expression towards one parental genome is almost equal to the biased expression towards other parental genomes. It shows balanced homoeolog expression. On the other hand, an unbalanced homoeolog expression change states a situation where more duplicate gene pairs demonstrate expression twisted towards one parental genome. An unbalanced form of homoeolog expression is mostly reported in allopolyploids (Akhunova et al. 2010;Schnable and Freeling 2011;Wang et al. 2006).
Upland cotton is a prime fiber, an important source of edible oil, and cash crops in the world (Shahzad et al. 2019b). The demand for cotton fiber in the textile industry is increasing. While supply is reducing due to low area of cultivation and staggered yield potential of cultivars. Heterosis is and will continue to be a basic breeding tool to produce superior genotypes (Birchler et al. 2003). Many researchers reported meaningful heterosis in seed cotton yield and fiber quality traits in upland cotton (Ali et al. 2016;Li et al. 2019). Furthermore, cotton genotypes produced through the utilization of heterosis exhibit more stability and adaptability to changing the environment (Shahzad et al. 2019a). The genetic mechanism of heterosis is a perturbed and socalled genetic model of dominance (Xiao et al. 1995), over dominance (Shahzad et al. 2020b), and epistasis (Yu et al. 1997) does not reveal it. However, progress has already been achieved in the recent few years with advanced molecular research. For instance, high-throughput transcriptome sequencing is a useful technology to get accurate knowledge about global gene expressions (Shahzad et al. 2020a;Wang and Gerstein 2009). This also creates a possibility to discriminate the expression of homologous genes (Wu et al. 2018). To date, many previous studies in agronomic crops analyzed patterns of homoeolog expression in allotetraploids and their diploid progenitors, e.g., cotton (Yoo et al. 2013) and brassica (Li et al. 2020). However, information regarding how hybridization effect homoeolog expression changes in newly formed upland cotton hybrids is limited. Thus, we conducted a study to investigate the effect of hybridization on homoeolog gene expression modifications on a genome-wide scale. The direction and extent of homoeolog gene expressions between contrasting F 1 hybrids and their reciprocal parents were analyzed by RNA-seq of leaf, root, and flower buds. The result will provide new insights into how homoeolog gene expressions contribute to growth heterosis in upland cotton and enhance our genetic knowledge for hybridization process.

Plant material and RNA sequencing
The plant material used in this study included two seed cotton yield contrasting F 1 hybrids of upland cotton and their three reciprocal inbred parents named A, B, and D. The high yielding hybrid (H) was a cross of inbred A and B, whereas a low hybrid was a combination of cross A and D. The seedling was raised in the lab for all these materials, root and leaf samples were picked at true leaf stage, and flower buds were picked at squaring stage from the field in three biological repeats. Total RNA was isolated using the Sigma Spectrum Plant Total RNA kit (Sigma-Aldrich, USA) as described by the manufacturing company. After quality checks and preparations, paired-end sequencing (300 ± 50 bp) on a secondgeneration Illumina Hiseq 4000 was performed for hybrid H and L. These sequencing details can be seen at NCBI with accession number GSE144676 (Shahzad et al. 2020b) and GSE150052 (Shahzad et al. 2020a). The full-length Pacbio transcriptome sequencing (Unpublished) for parental A, B, and D was used for a better comparison of homoeolog gene expression patterns between parents and progeny.

Analysis of homoeolog expression changes and functional enrichment
To find homologous sequences between reciprocal parents of hybrid H and L, the full-length Pacbio sequences of the two parents were homoeologously aligned using the reciprocal BLAST (BLASTN) hit method with a screening index of e −20 (Ren et al. 2016). Two sequences were homologous, if each of them was the best hit of the other, and if the sequences were aligned over 300 bp. The total number of homoeolog gene pairs identified in parents A and B was 45216, while the parents A and D had 44920 homoeolog gene pairs. These homologous sequences were used as a reference for the already published second-generation Illumina sequencing H and L hybrid samples (Shahzad et al. 2020a, b). Each sample was subjected to mutation detection and specific SNPs. The hybrid offspring samples may be homozygous or heterozygous at the specific SNP sites of their respective parents. Later, we compared the SNPs of two parents to select the homoeologue-specific SNPs bins using custom python scripts (Li et al. 2020). Furthermore, genes with homoeologue-specific SNPs were compared on a case-by-case basis between the parents and then progeny to perform synchronous classification of homoeolog expression bias. The index to determine the significant difference was p-value <0.05. Compared with the diploid progenitors, the homoeolog gene pairs between both parents of hybrids were divided into 12 main categories with three subcategories, including the parental condition, no bias, and novel bias in progeny as already described (Yoo et al. 2013). Functional analyses including gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) were performed for all homoeolog expression genes that showed novel bias in different tissues of hybrids. GO terms and enriched pathways were retrieved from the cotton functional genomics database (https:// cotto nfgd. org/ analyz) with p-value < 0.05.

Homoeolog expression changes in different tissues of high hybrid
To observe the pattern of homoeolog expression in different tissues of high hybrid, a case-by-case gene comparison was performed for high hybrid (H) and its parents (maternal=A, paternal=B). The results are detailed in Table 1. Notably, a pattern of homoeolog expression was conserved in this hybrid for the greater number of genes in all tissues under investigation. Fewer homoeolog gene copies reflected a novel biased pattern. For example, 36853 total number of homoeolog gene pairs were identified in each tissue. We observed that 96% in the root, 97% in leaf, and 97% in flower buds had maintained parental expression patterns in hybrid. The total expression level of a homoeolog gene pair had no change or was equal to that of both parents. An average of 1% homoeolog gene pairs showed no bias in each hybrid tissue. According to the results of novel bias expression, 2.27% of homologous gene pairs had a novel bias in the root. Only 2.1% of homoeolog gene pairs in leaf exhibited novel bias expression. It was observed that 2.02% of homologous gene pairs had a novel bias in flower buds. The majority of novel biased homoeolog gene pairs exhibited maternal-like expression patterns in root and leaf ( Fig. 1a-b), wherein parental-like novel expressions were observed in flower buds  (Fig. 1c). Further analysis indicated that 1.2% homoeolog gene pairs in the root, 1.06% in leaf, and 1.1% in flower buds demonstrated overall maternal parent biased expression, whereas overall homoeolog gene pairs with paternal parent biased expression were 1.1%, 1.04%, and 0.95% in the root, leaf, and flower buds, respectively. The overall statistics of homoeolog expression changes in different tissues of high hybrid seem to show a highly balanced biased expression of homoeolog gene pairs in each tissue. These results suggested that homoeolog gene pairs that displayed fewer novel patterns of expression may be responsible for the high level of heterosis in upland cotton.

Homoeolog expression changes in different tissues of low hybrid
The results of homoeolog expression analysis in different tissues of low hybrid (L) and its reciprocal parents (A= maternal, D= paternal) are represented in Table 2. Interestingly, the homoeolog expression patterns in hybrid were in the same direction for the majority of homoeolog gene pairs as observed in parents. For instance, 97.3% in the root, 97.2% in the leaf, and 97.6% in flower buds out of all homoeolog gene pairs (36853) reflected equal parental expression patterns in this hybrid. These result predicted that expression According to the results of novel bias expression in hybrid, 1.75, 1.78, and 1.76% of homoeolog gene pairs adopted novel bias in the root, leaf, and flower buds, respectively. The majority of homoeolog gene pairs exhibited novel biased expression similar to maternal parents in root and leaf ( Fig. 2a-b). In contrast, flower buds had most of the homoeolog gene pairs that had shown novel biased expression with a higher paternal contribution in progeny (Fig. 2c). Overall, a group of maternal or paternal biased homoeolog gene pairs had very similar trends in each tissue. There was 0.83% maternal and 0.84% paternal biased homoeolog gene expression in the root, whereas the leaf had 0.98 and 0.81% maternal and paternal biased expression, respectively. Flower buds also showed a similar percentage of maternal and parental biased expression. Altogether, these results suggest that the majority of the homoeolog gene exhibited balanced or non-differential expression patterns in low hybrid. In addition, few homoeolog genes with novel expression profiles probably contribute to the low level of heterosis in upland cotton.

Distribution of novel expressed genes among different hybrids
This study further investigated the distribution of novel expressed genes among each tissue of hybrids and different hybrids. In the high hybrid, 374, 325, and 301 novel genes exhibited tissue-specific expression in root, leaf, and flower buds, respectively (Fig. 3). However, nonspecific or overlapped genes compared to tissue-specific expression did not account for the majority in these tissues. For example, 94 genes were overlapped between the comparison of root and leaf whereas 100 genes were overlapped between root and flower buds. The distribution results had shown that 298 novel genes displayed specific expression in root of low hybrid (Fig. 4). Similarly, the novel genes with specific expression were 324 in leaf and 308 in flower buds of low hybrid. There were 57 overlapped novel genes between root and leaf while flower bud and leaf had 76 overlapped novel genes. The novel gene distribution between high and low hybrids revealed the majority of novel genes were hybridspecific and less overlapped among both hybrids (Supplementary Fig. S1). For instance, only 122 genes were overlapped among hybrids, while 877 genes were specifically expressed in high hybrid and 808 in low hybrids. These results anticipated that novel genes with hybrid-specific expression most likely caused contrasting levels of heterosis in hybrids.

Functional analysis of homoeolog genes exhibiting novel expression changes in hybrids
As to understand the possible functions of homoeolog genes that had novel biased expression under hybridization in cotton, we performed GO and KEGG enrichment analysis in each hybrid. GO enrichment analysis (p-value < 0.05) for high hybrid showed a majority of novel biased expressed homoeolog genes had functional annotation related to catalytic activity, nucleus, metabolic process, transmembrane transport, binding of nucleotide, nucleic acid, protein, ATP, and RNA (Fig. 5a). Genes associated with the binding of protein and ATP had the highest portion of novel biased expressed homoeolog genes in this hybrid. Most enriched GO terms for novel biased expressed homoeolog genes in low hybrid were involved in functional processes related to the nucleus, catalytic activity, binding of ATP, protein, and nucleotide (Fig. 5b). Interestingly, genes that performed the binding function of protein and ATP also had the majority of novel biased expressed homoeolog genes in this hybrid. KEGG analysis revealed that the majority of novel biased expressed homoeolog genes in high hybrid were enriched in pathways such as biosynthesis of amino acids, mRNA surveillance pathway, glycolysis/gluconeogenesis, biosynthesis of antibiotics, metabolism of carbon, glutathione, cysteine/methionine, biosynthesis of secondary metabolites, and microbial metabolism in diverse environments (Fig. 6a). Biosynthesis of secondary metabolites had more significant gene enrichment as compared to other pathways. It was observed that a major portion of novel biased expressed homoeolog genes in low hybrid had enrichment in mRNA surveillance pathway, RNA degradation, oocyte meiosis, metabolism of pyruvate, carbon, sugars, and biosynthesis of secondary metabolites (Fig. 6b). In a low hybrid, biosynthesis of secondary metabolites also had a maximum number of novel biased expressed homoeolog genes as compared to other pathways. Remarkably, the biosynthesis of secondary metabolites had not only a higher percentage of novel biased expressed homoeolog genes but also was the overlapped pathway in both hybrids. Therefore, our functional analysis predicts that genes annotated in the biosynthesis

Discussion
Cotton is the backbone of the textile industry because of its unique fiber characteristics. However, the area of cotton cultivation, as well as production, is decreasing in recent decades all over the world. Hybridization has effectively resolved the problem of decreased crop productivity. For example, maize and rice cultivars developed with the exploitation of heterosis have more harvest potential than conservative genotypes (Schnable and Springer 2013). Cotton belongs to the genus Gossypium and upland cotton (G. hirsutum) is stated as new world cotton. The hybridization in upland cotton whether it is between species (interspecific) or within species (intraspecific) produced meaningful heterosis in yield-related traits (Li et al. 2019). It was observed that intraspecific crosses and improvement in fiber quality traits require more breeding efforts in cotton. How hybridization increased yield potential in upland cotton needs plenty of investigation at the genomic, transcriptomic, and epigenetic levels. Interestingly, the previous study has shown hybridization itself had sound effects on duplicate genes (homoeolog genes) in allopolyploids (Yoo et al. 2013;Ren et al. 2016). This study analyzed the effects of intraspecific hybridization on naturally occurring homoeolog genes in upland cotton through RNA-seq data of contrasting hybrids and full-length Pacbio sequences of their respective parents. In particular, gene expression data from three tissues (root, leaf, and flower buds) were used to explore changes in homoeolog expression and functional annotation of novel biased genes of hybrids.

Hybridization produced a parent legacy of homoeolog expressions in upland cotton
Hybridization is a vital phenomenon to produce genetic improvement in crops. It is a common method in which two genetically different genotypes are crossed to obtain vigor progeny in desired traits. More than two sets of chromosomes in allotetraploid upland cotton caused genetic, gene expression, and epigenetic changes. These effects may result in more Fig. 3 Distribution of novel biased homoeolog expressed genes in high hybrid. Here R, L, and FB stand for root, leaf, and flower buds, respectively vigorous growth or heterosis (Chen 2007). Previous studies in maize (Yao et al. 2013) and alfalfa (Bingham et al. 1994) suggested that allopolyploids exhibit more heterosis as compared to autopolyploids and tetraploids display increased heterosis as compared with diploids. It was found that intraspecific crossing in upland cotton produced progeny that often displays heterosis in early biomass growth and seed cotton yield (Shahzad et al. 2020a, b). However, there has been no study to analyze changes in homoeolog gene expression of inbred parents due to hybridization. The result of our study revealed that intraspecific hybridization in upland cotton produced highly balanced changes in the pattern of homoeolog expression. The pattern was conserved in hybrids for the greater number of genes. For example, both higher and lower hybrids had around 95% homoeolog gene pairs with similar expression pre-existing in parents in all tissues under investigation. Fewer homoeolog gene copies imitated novel homoeolog expression patterns in hybrids. Furthermore, fewest homoeolog gene pairs exhibited maternal or paternal overall biases expression. A previous study in allopolyploid cotton showed that most homoeolog gene pairs reflected parental expression conditions in hybrid and synthetic allopolyploids (Yoo et al. 2013). Similar results were reported in hybrid fish and its progenitors (Ren et al. 2016;Ren et al. 2017). An assessment of homoeolog expressions in Brassica napus described that majority of homoeolog gene pairs maintained parental expression patterns in synthetic hybrids and natural allotetraploids (Zhang et al. 2016). The effect of polyploidization on homoeolog expression in hexaploid wheat revealed equal homoeolog expression of parental lines is often conserved in progeny (Li et al. 2014). Altogether, these results indicated that the effect of hybridization or genome doubling on the expression of homoeolog gene pairs in derived hybrids is often inherited from diploid parents. Though the relationship between heterosis and homoeolog gene pairs is still not fully understood, these findings suggest that fewer changes at the transcriptomic level could be associated with heterosis in upland cotton.

Novel homoeolog gene expressions probably regulate heterosis in cotton through biosynthesis of secondary metabolites
As already described in the majority of the results, homoeolog gene pairs had a parental legacy in hybrids. Theoretically, these gene pairs do not contribute to the superior or inferior performance of cotton hybrids. Homoeolog gene pairs with a novel or differential expression in hybrids could regulate heterosis. Functional analysis of these novel homoeolog gene pairs revealed that protein and ATP binding genes through the secondary metabolism pathway play an essential role to increase growth potential in cotton hybrids. Metabolites are an essential component for adaptation, stress management, and regulation of different biological processes in plants (Kleessen et al. 2014;Miller et al. 2015). Previous research showed that modification of secondary metabolism is useful for growth heterosis in oilseed crops such as Brassica juncea (Bajpai et al. 2019). Furthermore, these researchers reported an additive mode of inheritance of metabolite components in hybrids. Previous metabolic investigation in hybrids of maize (Wang et al. 2014), rice (Ma et al. 2011), and Arabidopsis (Korn et al. 2010) provided preliminary perceptions and enhanced knowledge about the molecular mechanism of plant heterosis. There is plenty of emphasis on genetic, gene expression, and epigenetic-related studies to explore the biological basis of heterosis in cotton. Yet, the relationship between heterosis and metabolic changes in hybrids has been poorly known in cotton. Our results underpin the importance of genomic inheritance of secondary metabolites between hybrids and inbred parents in cotton.

Conclusion
In this study, global homoeolog gene expression patterns between intraspecific contrasting cotton hybrids and their reciprocal inbred parents were analyzed using RNA sequencing. The results indicated that hybridization caused no expression biases of homoeolog gene pairs in hybrids. Hybrids showed no biased expression preference towards any parents. Furthermore, functional annotation of novel expressed homoeolog gene pairs explained that changed secondary metabolic gene expression between parents and hybrids may regulate heterosis in upland cotton. Together, our results provide a general understanding of homoeolog gene expression patterns and may help to explore the genetic basis of heterosis in cotton.