Global insights into homoeolog gene expression in upland cotton under intraspecic hybridization

Background: Hybridization is useful to enhance yield potential of agronomic crops in the world. Cotton has genome doubling due to alloteraploidy process and hybridization process in coordinate with duplicated genome can produce more yield and adaptability. Therefore, expression of homoeolog gene pairs between hybrids and inbred parents are vital to characterize genetic source of heterosis in cotton. Results: Investigation results of homoeolog gene pairs between two contrasting hybrids and their respective inbred parents identied 36853 homoeolog genes in hybrids. It was observed both high and low hybrids had similar trends in homoeolog gene expression patterns in each tissue under study. An average of 96% of homoeolog genes had no biased expression and their expressions were derived from the equal contribution of both parents. Besides, very few homoeolog genes (An average of 1%) showed no biased or novel expression in both hybrids. The functional analysis described secondary metabolic pathways had a majority of novel biased homoeolog genes in hybrids. Conclusions: These results contribute preliminary knowledge about how hybridization affects expression patterns of homoeolog gene pairs in upland cotton hybrids. Our study also highlights the functional genomics of metabolic genes to explore the genetic mechanism of heterosis in cotton.


Abstract
Background: Hybridization is useful to enhance yield potential of agronomic crops in the world. Cotton has genome doubling due to alloteraploidy process and hybridization process in coordinate with duplicated genome can produce more yield and adaptability. Therefore, expression of homoeolog gene pairs between hybrids and inbred parents are vital to characterize genetic source of heterosis in cotton.
Results: Investigation results of homoeolog gene pairs between two contrasting hybrids and their respective inbred parents identi ed 36853 homoeolog genes in hybrids. It was observed both high and low hybrids had similar trends in homoeolog gene expression patterns in each tissue under study. An average of 96% of homoeolog genes had no biased expression and their expressions were derived from the equal contribution of both parents. Besides, very few homoeolog genes (An average of 1%) showed no biased or novel expression in both hybrids. The functional analysis described secondary metabolic pathways had a majority of novel biased homoeolog genes in hybrids.
Conclusions: These results contribute preliminary knowledge about how hybridization affects expression patterns of homoeolog gene pairs in upland cotton hybrids. Our study also highlights the functional genomics of metabolic genes to explore the genetic mechanism of heterosis in cotton.

Background
Hybridization or polyploidy is a vital process, produces exceptional phenotypes through the interaction of two distinct genomes, and has a substantial in uence on the plant genome (Dufresne et al. 1994;Yoo et al. 2013). The novel phenotypes in plants can be consequent of many modi cations not only at genomic and epigenetic levels but also in gene expression patterns (Doyle et al. 2008;Kashkush et al. 2003). Several examples in cotton ) Brassica napus (Gaeta et al. 2009), and Spartina (Chelaifa et al. 2010) stated that hybridization (genome merger) modi ed more transcriptome than polyploidy.
About 15% of owering plants are polyploidy (Wood et al. 2009) and divided into categories of autopolyploid and allopolyploid. Autopolyploid is evolved from a genome merger of the same or similar species but allopolyploids contain a genomic combination of two different species (Doyle et al. 2008).
Allopolyploids have enhanced growth and wider adaptations relative to autopolyploid. Many important crops are allopolyploids together with upland cotton (Gossypium hirsutum). In allopolyploids, hybridization or polyploidy is allied with various kinds of exchanges and interactions between homoeolog genes (duplicate genes) (Kovarik et al. 2008;Salmon et al. 2010;Szadkowski et al. 2010). Further modi cations are consisting of epigenetic (Doyle et al. 2008;Madlung et al. 2002), gene activation (Kashkush et al. 2003;Kraitshtein et al. 2010), and sequence removal (Anssour et al. 2009;Han et al. 2005). The result of different genomic interactions by hybridization or polyploidy produced several expression patterns of duplicate or genes with similar functions. For instance, these may be gene silencing, gene activation, novel expression, and altered parental contribution (Doyle et al. 2008;Li et al. 2014;Yoo et al. 2013). However, the direction and magnitude of duplicate gene expression are varied with crop, tissue, and organ (Adams et al. 2003;Bottley et al. 2006;Buggs et al. 2010;Chaudhary et al. 2009).
Previous studies have identi ed four different expression forms of duplicated genes in plants e.g., one duplicate gene copy may be lost its expression pattern as compared to other (De Smet et al. 2013), duplicate gene copies may maintain original expression (McGrath et al. 2014), one duplicate gene copy may adopt new expression (Hughes et al. 2014;Liu et al. 2010), and sometimes original expression is the equal contribution of duplicates (Kleinjan et al. 2008). Importantly, the duplicate gene pairs may show homoeolog expression bias in allopolyploid (Grover et al. 2012). Where, one homoeolog display preferential expression to the transcriptome as compared to the other. The phenomena of homoeolog expression have already been stated for allopolyploids like cotton Hovav et al. 2008;Samuel Yang et al. 2006), wheat (Bottley et al. 2006;Mochida et al. 2004), Arabidopsis (Chang et al. 2010;Wang et al. 2004), and brassica ( Auger et al. 2009;Gaeta et al. 2007). The analysis results in cotton showed that hybridization caused more homoeolog expression changes and fewer changes in gene expressions (Yoo et al. 2013). In contrast, polyploidy formation is linked with more gene expression modi cations and fewer changes in homoeolog expression. A study in cyprinid sh also reported hybridization brought more homoeolog expression changes than polyploidy (Ren et al. 2016). Homoeolog expression can be either balanced or unbalanced (Grover et al. 2012). When duplicate gene pairs display biased expression towards one parental genome is almost equal to the biased expression towards other parental genomes. It shows balanced homoeolog expression. On the other hand, an unbalanced homoeolog expression change states to a situation where more duplicate gene pairs demonstrate expression twisted toward one parental genome. An unbalanced form of homoeolog expression is mostly reported in allopolyploids (Akhunova et al. 2010;Schnable et al. 2011;Wang et al. 2006).
Upland cotton is a prime ber, an important source of edible oil, and cash crops in the world (Shahzad et al. 2019a). The demand for cotton ber in the textile industry is increasing. While supply is reducing due to low area of cultivation and staggered yield potential of cultivars. Heterosis is and will continue to be a basic breeding tool to produce superior genotypes (Birchler et al. 2003). Many researchers reported meaningful heterosis in seed cotton yield and ber quality traits in upland cotton (Ali et al. 2016;Li et al. 2019). Further, cotton genotypes produced through the utilization of heterosis exhibit more stability and adaptability to changing the environment (Shahzad et al. 2019b). The genetic mechanism of heterosis is a perturbed and so-called genetic model of dominance (Xiao et al. 1995), overdominance (Shahzad et al. 2020a), and epistasis (Yu et al. 1997) does not reveal it. However, progress has already been achieved in the recent few years with advanced molecular research. For instance, high-throughput transcriptome sequencing is a useful technology to get accurate knowledge about global gene expressions. This also creates a possibility to discriminate the expression of homologous genes (Shahzad et al. 2020b;Wang et al. 2009;Wu et al. 2018). To date, many previous studies in agronomic crops analyzed patterns of homoeolog expression in allotetraploids and their diploid progenitors e.g., cotton (Yoo et al. 2013) and brassica (Li et al. 2020). However, information regarding how hybridization effect homoeolog expression changes in newly formed upland cotton hybrids is limited. Thus, we conducted a study to investigate the effect of hybridization on homoeolog expression modi cations on a genome-wide scale. The direction and extent of homoeologue expressions between contrasting F 1 hybrids and their reciprocal parents were analyzed by RNA-seq of leaf, root, and ower buds. The result will provide new insights into how homoeologue expressions contribute to growth heterosis in upland cotton and enhance our genetic knowledge for hybridization process.

Plant material and RNA sequencing
The plant material used in this study included two seed cotton yield contrasting F 1 hybrids of upland cotton and their three reciprocal inbred parents named as A, B, and D. The high yielding hybrid (H) was cross of inbred A and B. whereas, a low hybrid was a combination of cross A and D. The seedling was raised in the lab for all these materials, root and leaf samples were picked at true leaf stage, and ower buds were picked at squaring stage from the eld in three biological repeats. Total RNA was isolated using the Sigma Spectrum Plant Total RNA kit (Sigma-Aldrich, USA) as described by the manufacturing company. After quality checks and preparations, paired-end sequencing (300 ± 50 bp) on a secondgeneration Illumina Hiseq 4000 was performed for hybrid H and L. These sequencing details can be seen at NCBI with accession number GSE144676 (Shahzad et al. 2020a) and GSE150052 (Shahzad et al. 2020b). The full-length Pacbio transcriptome sequencing (Unpublished) for parental A, B, and D were used for a better comparison of homoeolog expression patterns between parents and progeny.

Analysis of homoeolog expression changes and functional enrichment
To nd homologous sequences between reciprocal parents of hybrid H and L, the full-length Pacbio sequences of the two parents were homologously aligned using the reciprocal BLAST (BLASTN) hit method with screening index of e-20 (Ren et al. 2016). Two sequences were homologous, if each of them was the best hit of the other, and if the sequences were aligned over 300 bp. The total number of homologous gene pairs identi ed in parent A and B were 45216. While the parents A and D had 44920 homologous gene pairs. These homologous sequences were used as a reference for the already published second-generation Illumina sequencing H and L hybrid samples (Shahzad et al. 2020a;Shahzad et al. 2020b). Each sample was subjected to mutation detection and speci c SNPs. The hybrids offspring samples may be homozygous or heterozygous at the speci c SNP sites of their respective parents. Later, we compared the SNPs of two parents to select the homoeologue-speci c SNPs bins using custom python scripts (Li et al. 2020). Further, genes with homoeologue-speci c SNPs were compared on a case by case basis between the parents and then progeny to perform synchronous classi cation of homoeolog expression bias. The index to determine the signi cant difference was P-value <0.05. Compared with the diploid progenitors, the homologous gene pairs between both parents of hybrids were divided into 12 main categories with three subcategories, including the parental condition, no bias, and novel bias in progeny as already described (Yoo et al. 2013). Functional analyses including gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) was performed for all homoeolog expression genes that showed novel bias in different tissues of hybrids. GO terms and enriched pathways were retrieved from the cotton functional genomics database (https://cottonfgd.org/analyz) with P-value < 0.05.

Homoeolog expression changes in different tissues of high hybrid
To observe the pattern of homoeolog expression in different tissues of high hybrid, a case by case gene comparison was performed for high hybrid (H) and its parents (maternal=A, paternal=B). The results are detailed in Table 1. Notably, a pattern of homoeolog expression was conserved in this hybrid for the greater number of genes in all tissues under investigation. Fewer homoeolog gene copies re ected novel biased pattern. For example, 36853 total number homoeolog gene pairs were identi ed in each tissue. We observed that 96% in the root, 97% in leaf, and 97% in ower buds had maintained parental expression patterns in hybrid. The total expression level of a homologous gene pair had no change or equal to that of both parents. An average of 1% homoeolog gene pairs showed no bias in each hybrid tissue. Moreover, an average of 2.1% of genes re ected novel bias expression in each tissue of hybrid. Further analysis indicated that 1.2% homoeolog gene pairs in the root, 1.06% in leaf, and 1.1% in ower buds demonstrated overall maternal parent biased expression in hybrid. Whereas, overall homoeolog gene pairs with paternal parent biased expression in hybrid was 1.1%, 1.04%, and 0.95% in the root, leaf, and ower buds, respectively. The overall statistics of homoeolog expression changes in different tissues of high hybrid seem to show a highly balanced biased expression of homoeolog gene pairs in each tissue of high hybrid. These results suggested homoeolog gene pairs that displayed fewer novel patterns of homoeolog expression may be responsible for the high level of heterosis in upland cotton.

Homoeolog expression changes in different tissues of low hybrid
The results of homoeolog expression analysis in different tissues of low hybrid (L) and its reciprocal parents (A= maternal, B= paternal) are represented in Table 2. Interestingly, the homoeolog expression patterns in hybrid were in the same direction for the majority of homoeolog gene pairs as observed in parents. For instance, 97.3% in the root, 97.2% in leaf, and 97.6% in ower buds out of all homoeolog gene pairs (36853) re ected equal parental expression patterns in this hybrid. These result predicted that expression biases pre-existing in parents was simply conserved in these tissues of hybrid. The group of homoeolog gene pairs with bias expression in parents and reverted to no bias expression in hybrid had an average of 1% in root and leaf. On the other hand, only 0.60% of homologous gene pairs in ower buds had this reversion. According to the results of novel bias expression in hybrid, 1.75, 1.78, and 1.76% homologous gene pairs adopted novel bias in the root, leaf, and ower buds, respectively. Overall, a group of maternal or paternal biased homologous gene pairs had very similar trends in each tissue. There was 0.83% maternal and 0.84% paternal biased homologous gene expression in the root of hybrid, whereas leaf had 0.98 and 0.81% maternal and paternal biased expression, respectively. Flower buds also showed a similar percentage of maternal and parental biased expression. Altogether, these results suggest that the majority of the homoeolog gene exhibited balanced or non-differential homoeolog expression patterns in low hybrid. Further, homologous genes with few novel patterns of homoeolog expression probably contribute to the low level of heterosis in upland cotton.
Functional analysis of homoeolog genes exhibiting novel expression changes in hybrids As to understand possible functions of homoeolog genes that had novel biased expression under hybridization in cotton, we performed GO and KEGG enrichment analysis in each hybrid. GO enrichment analysis (p-value < 0.05) for high hybrid showed a majority of novel biased expressed homoeolog genes had functional annotation related to catalytic activity, nucleus, metabolic process, transmembrane transport, binding of nucleotide, nucleic acid, protein, ATP, and RNA (Fig. 1a). Genes associated with binding of protein and ATP had the highest portion of novel biased expressed homoeolog genes in this hybrid. Most enriched GO terms for novel biased expressed homoeolog genes in low hybrid were involved in functional processes related to the nucleus, catalytic activity, binding of ATP, protein, and nucleotide (Fig. 1b). Interestingly, genes that performed the binding function of protein and ATP also had the majority of novel biased expressed homoeolog genes in this hybrid.
KEGG analysis revealed that the majority of novel biased expressed homoeolog genes in high hybrid were enriched in pathways such as biosynthesis of amino acids, mRNA surveillance pathway, Glycolysis / Gluconeogenesis, biosynthesis of antibiotics, metabolism of carbon, glutathione, cysteine/ methionine, biosynthesis of secondary metabolites, and microbial metabolism in diverse environments (Fig. 2a).
Biosynthesis of secondary metabolites had more signi cant and gene enrichment as compared to other pathways. It was observed that a major portion of novel biased expressed homoeolog genes in low hybrid had enrichment in mRNA surveillance pathway, RNA degradation, Oocyte meiosis, metabolism of pyruvate, carbon, sugars, and biosynthesis of secondary metabolites (Fig. 2b). In a low hybrid, biosynthesis of secondary metabolites also had a maximum number of novel biased expressed homoeolog genes as compared to other pathways.

Discussion
Cotton is the backbone of the textile industry because of its unique ber characteristics. However, the area of cotton cultivation, as well as production, is decreasing in a recent decade all over the world.
Hybridization has effectively resolved the problem of decreased crop productivity. For example, maize and rice cultivars developed with exploitation of heterosis have more harvest potential than conservative genotypes (Schnable et al. 2013). Cotton belongs to the genus Gossypium and upland cotton (G. hirsutum) is stated as new world cotton. The hybridization in upland cotton whether it is between species (interspeci c) or within species (intraspeci c) produced meaningful heterosis in yield-related traits (Li et al. 2019). It was observed that intraspeci c crosses and improvement in ber quality traits require more breeding efforts in cotton. How hybridization increased yield potential in upland cotton needs plenty of investigation at the genomic, transcriptomic, and epigenetic levels. Interestingly, the previous study has shown hybridization itself had sound effects on duplicate genes (homoeolog genes) in allopolyploids (Ren et al. 2016;Yoo et al. 2013). This study analyzed the effects of intraspeci c hybridization on naturally occurring homoeolog genes in upland cotton through RNA-seq data of contrasting hybrids and full-length Pacbio sequences of their respective parents. In particular, gene expression data from three tissues (root, leaf, and ower buds) were used to explore changes in homoeolog expression and functional annotation of novel biased genes of hybrids.
Hybridization produced parent legacy of homoeolog expressions in upland cotton Hybridization is a vital phenomenon to produce genetic improvement in crops. It is a common method in which two genetically different genotypes are crossed to obtain vigor progeny in desired traits. More than two sets of chromosomes in allotetraploid upland cotton caused genetic, gene expression, and epigenetic changes. These effects may result in more vigorous growth or heterosis (Chen 2007). Previous studies in maize (Yao et al. 2013) and alfalfa (Bingham et al. 1994) suggested that allopolyploids exhibit more heterosis as compared to autopolyploid and tetraploid display increased heterosis as compared with diploids. It was found that intraspeci c crossing in upland cotton produced progeny that often displays heterosis in early biomass growth and seed cotton yield (Shahzad et al. 2020a(Shahzad et al. , 2019b. However, there has been no study to analyze changes in homoeolog gene expression of inbred parents due to hybridization. The result of our study revealed that intraspeci c hybridization in upland cotton produced highly balanced changes in the pattern of homoeolog expression. The pattern was conserved in hybrids for the greater number of genes. For example, both higher and lower hybrids had around 95% homoeolog gene pairs with similar expression pre-existing in parents in all tissues under investigation. Fewer homoeolog gene copies imitated novel homoeolog expression patterns in hybrids. Further, fewest homoeolog gene pairs exhibited maternal or paternal overall biases expression. A previous study in allopolyploid cotton showed that most homoeolog gene pairs re ected parental expression conditions in hybrid and synthetic allopolyploids (Yao et al. 2013). Similar results were reported in hybrid sh and its progenitors (Ren et al. 2016;Ren et al. 2017). Assessment of homoeolog expressions in Brassica napus described that majority of homoeolog gene pairs maintained parental expression patterns in synthetic hybrids and natural allotetraploids (Zhang et al. 2016). The effect of polyploidization on homoeolog expression in hexaploid wheat revealed equal homoeolog expression of parental lines is often conserved in progeny (Li et al. 2014). Altogether, these results indicated that the effect of hybridization or genome doubling on the expression of homoeolog gene pairs in derived hybrids is often inherited from diploid parents. Though the relationship between heterosis and homoeolog gene pairs is still not fully understood, these ndings suggest that fewer changes at the transcriptomic level could be associated with heterosis in upland cotton.
Novel homoeolog gene expressions may be regulated heterosis in cotton through secondary metabolism As already described in the majority of the result of homoeolog gene pairs had a parental legacy in hybrids. Theoretically, these gene pairs do not contribute to the superior or inferior performance of cotton hybrids. Homoeolog gene pairs with a novel or differential expression in hybrids could regulate heterosis. Functional analysis of these novel homoeolog gene pairs revealed that the binding function of protein and ATP through the secondary metabolism pathway plays the role to increase growth potential in offspring relative to their parents. Metabolites are an essential component for adaptation, stress management, and regulation of different biological processes in plants (Kleessen et al. 2014;Miller et al. 2015). Modi cation of secondary metabolism is useful to growth heterosis in oilseed crop Brassica juncea (Bajpai et al. 2019). Further, these researchers reported an additive mode of inheritance of metabolites in hybrids. Previous metabolic investigation in hybrids of maize (Wang et al. 2014), rice (Ma et al. 2011), andArabidopsis (Korn et al. 2010) provided preliminary perceptions and enhanced knowledge about the molecular mechanism of plant heterosis. There is plenty of emphasis on genetic, gene expression, and epigenetic related studies to explore the biological basis of heterosis in cotton. Yet, the relationship between heterosis and metabolic changes in hybrids has been poorly known in cotton. Our results underpin the importance of genomic inheritance of secondary metabolites between hybrids and inbred parents in cotton.

Conclusion
In this study, global homoeolog gene expression patterns between intraspeci c contrasting cotton hybrids and their reciprocal inbred parents were analyzed using RNA sequencing. The results indicated that hybridization caused no expression biases of homoeolog gene pairs in hybrids. Hybrids showed no bias expression preference towards any parents. Further, functional annotation of novel expressed homoeolog gene pairs explained changed secondary metabolic gene expression between parents and hybrids may regulate heterosis in upland cotton. Together, our results provide a general understanding of homoeolog gene expression patterns and may help to explore the genetic basis of heterosis in cotton. Here A= Maternal parent and B= Paternal parent  Most enriched GO terms for novel biased homoeolog expressed genes in each hybrid. a GO terms for high hybrid. Black numbers represent total genes for each term at signi cance of 0.05. b GO terms for low hybrid. Black numbers represent total genes for each term at signi cance of 0.05.

Figure 2
Most enriched Pathways for novel biased homoeolog expressed genes in each hybrid. a Pathways for high hybrid. b Pathways for low hybrid.