Identication and Characterization of TUBBY genes in developmental stages of Zea mays

In this paper, we studied the organization and function of TUBBY Transcription Factor gene family in maize. Initially, using comparative approach, we discovered the Arabidopsis thaliana orthologs in Zea mays. We found in total 13 genes, 12 of which are orthologs and a unique paralog that exhibits the highest activity in maize. We studied the role of TUBBY gene family across different developmental stages using existing expression data, and discovered the binding motifs present in the promoter region of the genes.


Introduction
Maize is one of the most important agricultural crops worldwide, while United States of America is the top maize producer. Maize prospers in a range of climatic zones and it ts the niche between rice and wheat in the phylogenetic tree (Kellogg 2001). It is used as livestock feed and for industrial production of starch and oil extraction. For human consumption, maize is an excellent source for carbohydrates, minerals, protein and iron as well as for vitamin B.
Transcription Factors (TFs) are trans-regulatory proteins controlling the expression of other genes by recognizing their promoter regions (Zheng, Jiao et al. 2016, Tian 2020. Speci c domains in the proteins recognize speci c DNA motifs that bind to, thereby inducing the transcription of a particular gene. TFs are clustered in different groups of genes depending on the motifs and the domains they contain (Boggon, Shan et al. 1999), as well as the developmental stages and environmental conditions they have been found to be associated with ( TUBBY genes were rst identi ed to be expressed in obese in mice (Kleyn, Fan et al. 1996), and they were characterized as TFs (Boggon, Shan et al. 1999). Nevertheless, they have been identi ed in many eukaryotic species ranging from animals to plants implying that they are involved in conserved functions (Wang, Xu et al. 2018). The TUBBY gene family was suggested to expand via segmental duplication and random translocation and insertion (Yang, Zhou et al. 2008). Members of the Tubby-Like Protein family (TLP) are characterized by a conserved C-terminal TUB domain, a domain hypothesized to bind to speci c membrane phosphoinositide phosphatases (PIPs), to form a stable but exible complex (Meng . In regards to their subcellular localization, TUBBY proteins are expressed mainly in plasma mebrane, cytoplasm and nucleus (Wang, Xu et al. 2018).
In plants, TUBBY genes were rst characterized systematically in Arabidopsis thaliana, where it was found that they are involved in ABA signaling pathway, in both biotic and abiotic stress responses. In the vast majority of plants, TUBBY genes contain an N-terminal F-box domain. Moreover, the ten aminoacid residues between the F-box domain and TUB domain is suggested to be evolutionarily conserved in plants. However, TUBBY genes in some plant species may not contain the F-box domain. For instance, in Arabidopsis thaliana, all AtTLP genes except AtTLP8 possess an F-box domain (Lai, Lee et al. 2004), and in Oryza sativa, OsTLP13 is the only gene lacking the F-box domain (Liu 2008). Furthermore, within the C terminal region of the TUB domain, two segmental patterns that are de ned as TUB1 and TUB2 motifs consisting of 14 and 16 amino acids, respectively (Lai, Lee et al. 2004).
In this paper, we used comparative genomics to identify the Zea mays TUBBY homologue genes using the Arabidopsis thaliana TUBBY genes. After manual annotation and support of further analysis, we ended up with twelve orthologs and a novel maize-speci c TUBBY gene. We used available expression data at different developmental stages and performed manual curation of the genes using long and short reads. We also looked for the motifs present in each of the ZmTLP and their alternative-splicing products. We identi ed cis-regulatory elements for each TUBBY gene, to improve our understanding of the processes each gene is involved. Finally, we used RT-PCR to con rm the expression of the genes in the different tissues. Our results offer insight into possible functions of a largely unknown transcription factor family.

Discovering the TUB Homologs
Initially, we have used as our reference the TUBBY annotated genes from Arabidopsis thaliana genome using The Arabidopsis Information Resource (www.arabidopsis.org/browse/genefamily/TUB.jsp) on www.arabidopsis.org Feb 5, 2019. Following the identi cation of the gene names associated to the TUBBY family, each of the genes were then searched in the Gramene website, a comparative resource site for plants(Tello-Ruiz, Naithani et al. 2018), under the Arabidopsis genome. After searching for a particular gene name, we were directed to the Ensembl database to view gene trees. The gene trees displayed the orthologs of the TUBBY gene family for various plant species. In order to nd the orthologs of Zea mays to Arabidopsis thaliana genome, we searched for Zea mays, current genome version (v4), under the tab Plant Compare. Once the orthologs were identi ed, we selected the compared regions to obtain the coordinates needed to locate the genes in the Zea mays genome. In addition to nding the orthologs, we interested in nding paralogs within Zea mays, if any, using the paralog tab. The overall work ow of our analysis can be found in Fig. 1.

Annotating the TUB genes
For the manual annotation of the of Arabidopsis thaliana TUBBY gene family orthologs in Zea mays, we used the Apollo website utility (Fig. 2). Apollo is an interactive browser, which grants one entry to modify and re ne the exact location and structure of genomic features (Lee, Helt et al. 2013). The chromosomal region of interest was located using the coordinates found from the Gramene database as described in the previous section. Before annotating the genes, we selected the following evidence tracks from the annotator panels: est2genome.iso, est2genome-c, Maker_ updated, and v_ model _mapped to_v4. These tracks provided us with various gene predictions, which accurately re ect the structure of a given gene. By utilizing the tracks, we were able to produce reasonable gene models. Afterwards, we selected the gene models that most resembled the structure of the gene, using the gene as an archetype to re ne the gene models. When necessary, editing functions were used to add exons and modify splice sites in transcript alignments to correspond the selected gene model to the given gene. We continued this process for each ortholog found. To strengthen the validation, we have mapped Illumina BAM les from different tissues within Apollo (Fig. 2).

Construction of phylogenetic trees
To identify the levels of conservation among plant species of the TUBBY genes, we used the Gramene tab resource on Homology. Then, we followed the link for the Ensembl Gene Tree view, under which we checked for the presence of each of the TUBBY genes in available genomes in Gramene. Using the Genomic regions of each of the TUBBY genes, we used the online clustalw (Larkin, Blackshields et al. 2007) algorithm in GenomeNet (http://www.genome.jp) to construct the alignment of all TUBBY genes in maize followed by the construction of a phylogenetic tree using the PhyML (Guindon, Lethiec et al. 2005) and boostrap 10,000.

Digital expression
We used the Gramene nomenclature to explore the digital expression of the gene family using the free version of Genevestigator platform (Hruz, Laule et al. 2008). Within this platform, we explored clustering for both gene and conditions or developmental stages, using the Euclidian clustering method.

Designing primers to validate gene structures
To validate the gene modules, primers were designed to amplify the whole length mRNA of the genes. RT-PCR were performed on maize cDNA from leaf, stem and root tissues. Ampli ed products were visualized on 1% electrophoresis gel made with TBE buffer. The work ow is graphically illustrated in Fig. 1.

Discovering the TUB Homologs
In Arabidopsis thaliana, there are 10 TUBBY (AtTLP1-7 and AtTLP9-11) genes. For each of the genes, we searched for Zea mays orthologs. Out of the 10 genes in Arabidopsis thaliana, we found ve to have homologs in Zea mays. Those are the AtTLP2 and AtTLP3 that have two homologs each in Zea mays, namely ZmTLP6 and ZmTLP9, and ZmTLP10 and ZmTLP11, respectively. AtTLP5 has ve homologs in Zea mays, namely, ZmTL2, ZmTLP5, ZmTLP7, ZmTLP8 and ZmTLP14. Also, AtTLP7 has three homologs ZmTLP3, ZmTLP4 and ZmTLP12 and nally AtTLP8 with one homolog, ZmTLP13. As far as ZmTLP10 concerns, in the process of manual curation, we did not nd enough evidence supporting the gene module and therefore it was eliminated. In total, we came up with thirteen unique Zea mays TUBBY TFs genes homologs to Arabidopsis thaliana named as ZmTLP. Three ZmTLP genes are located in chromosome four, two genes in each of chromosomes three and ve, and one gene in each of chromosomes six, eight, nine and ten (Table 1; Fig. 3). Moreover, we found one paralog gene -ZmTLP1-unique to maize that is located in chromosome 8. Table 1 Comparative analysis of TUBBY Transcription Factor Gene Family between A.thaliana and Z. mays. The rst column refers to the general TUBBY gene module. The second and third columns refer to the Zea mays gene, and the Arabidopsis thaliana gene module, respectively. In the next two columns the alternative-splicing events in Zea mays are presented: more speci cally the Before manual annotation refers to the alternative-spliced modules found for each locus initially, and the After manual annotation column refers to the modules we kept after performing the manual annotations. In the last three columns under the Z. mays Genomic location part, the chromosome number, start and end coordinates of each gene locus in Zea mays is shown We also searched for alternative-splicing events. We found in total 63 alternative-splicing events that per ZmTLP gene ranged from two to eight as shown in Table 1. Speci cally, the maximum of eight splicing events were found in ZmTLP4, and the least of two splicing events in ZmTLP13. After manual annotation, we ended up with 35 alternative-splicing events.

Annotating the TUBBY genes
Once we identi ed the Zea mays TUBBY transcription factors, we used a web version of Apollo to validate the annotations (Table 1; Fig. 1). In this process, we took into consideration expression data (both PacBio and Illumina) from different tissues. We validated 12 TUBBY genes (Table 1; Fig. 1) from which 35 alternative-spliced models were con rmed by expression data (Table 1). By going over with the manual annotation of the TUBBY genes, we deleted 44.4% of the automatically predicted alternative-spliced transcripts.

Phylogenetic analysis
We found that all ZmTLP genes have homologs in all vascular eukaryotic plants but not in non-vascular plants ( Supplementary Fig. 1 Fig. 1).
To nd the evolutionary relationship of the TLP genes in maize, we constructed the phylogenetic tree of the Zea mays genes (Fig. 4). We observed four sub-groups to be formed. One group consisted ZmTLP11 and 13, another group with ZmTLP3, 4 and 12, a third group with ZmTLP6 and 9, and nally, the last and largest subgroup contained ZmTLP2, 5, 7, 8, 14 and ZmTLP1. These results are in agreement with the ndings in Table 1.

Discovering motifs and domains
We were further interested in characterizing the annotated products/proteins (Fig. 5 and TUB2 motifs with the exception of ZmTLP6.5 that has only TUB2 motif. The ZmTLP3 locus has ve alternative spliced products and have only TUB1 motif. Interestingly, the maize-speci c gene ZmTLP1 has X alternative-spliced products and consists of both TUB1 and TUB2 motifs. Finally, ZmTLP13 has no TUB1 or TUB2 predicted motif. Further, we searched the maize TLP genes for cis-regulatory elements using the PlantPan 3.0 database and selecting all available plant genomes. In total, 55 elements in the 13 genes were identi ed (Supplementary table 1), out of which 22 cis-regulatory elements were found present in all genes. On our set of genes, we identi ed motifs for binding site of other transcription factors that are responsible for organ development. Those are motifs recognized by B3, SRS, AT-hook, Trihelix. We also found motifs that other regulatory transcription factors involved in both stress and development bind to, such as bHLH, bZIP, C2H2, HD-ZIP, ZF-HD transcription factors. Moreover, additional motifs for binding of transcription factors involved only in the stress responses such as WRKY, MADF, MADS and Homeodomain were identi ed. Finally, there are binding sites for transcription factors such as the C3H involved in embryo development, the MYB involved in organ identity, the NAC involved in leaf senescence, the SBP involved in ower development, and the VOZ involved in pollen maturation.

Digital Expression across developmental stages
The next step was to use available data to study gene expression of the different homologs and across different developmental stages. We clustered the results for both genes and developmental stages ( Fig. 6). Two major groups are formed regarding the developmental stages, the dough stage and fruit formation called Group 1, where we do not observe high expression of the genes. Group 2 is sub-divided into two subgroups SG1 subgroup with relative higher expression values, and the SG2 subgroup with even highest expression than SG1. The developmental stages in SG1 subgroup are the germination, stem elongation and the seedling stage, while the SG2 subgroup is the in orescence formation and anthesis. Speci cally, the group composed of ZmTLP2, 4, 5, 6, 7, and 9 displayed higher expression than the second group composed of ZmTLP3, 8, 11, 12 and 14. Finally, the ZmTLP1 has the highest expression and the ZmTLP13 has the lowest expression of all (Fig. 6).
In the process of the gene annotation, we took into consideration the tissues that genes were expressed.
We found all genes expressed in tassel, ear, and embryo tissues, while ZmTLP1 and ZmTLP13 are additionally expressed in root tissues. In endosperm tissue, the ZmTLP2, ZmTLP4, 5, and 6 as well the ZmTLP9, 11 and 14 are expressed. In pollen tissues, only ZmTLP1, 3 and 14 are expressed (example gene, TUBBY 11, is shown in Fig. 2b). To verify the expression of the genes we used cDNA from roots, leaves and stems. Here, we con rmed the expression of ZmTLP6.2, ZmTLP14.1, ZmTLP3.1, ZmTLP3.2 and ZmTLP11.1 in stem tissue (Fig. 7).

Discussion
In this paper, we described the 13 TUBBY transcription factor genes in maize genome (version 4). Twelve of them were orthologs to ve out of ten Arabidopsis thaliana TUBBY transcription factors (AtTLP2, AtTLP3, AtTLP5, AtTLP7 and AtTLP8), and one of them, ZmTLP1, was unique to Zea mays (Table 1, Fig. 3). We observed from simple to complex evolutionary relationships. For example AtTLP7 has a one to many relationships since, AtTLP7 is a homolog to three Zea mays genes, while AtTLP8 has a one to one relationship to the homolog ZmTLP13(Gabaldon and Koonin 2013).
TLP genes (Chen Yulong 2016, Xu, Xing et al. 2016) belong to a conservative transcription factor family found in many eukaryotic organisms in both animals and plants (Wang, Xu et al. 2018). When we explored in detail the phylogeny of the genes in plants, we noticed that they are predominately found in vascular plants ( Supplementary Fig. 1), with the exception of ZmTLP1 that is found in few vascular plants. Moreover, by performing motif prediction, we noticed that for some loci with alternative-spliced nal products, there were no characteristic TUB1 and TUB2 motifs present. This indicates that those proteins might be involved in different functions (Fig. 5).
Based on the digital expression analysis, several groups and sub-groups were formed at the developmental stages and at gene level. We found that the unique maize TLP, ZmTLP1, has the highest expression (Fig. 6) and it is expressed in most of the developmental stages, suggesting it plays a major regulatory role. The similarity analysis indicates ZmTLP1 gene does not have a homolog, but it is close to the pair of ZmTLP2 and 5 (Fig. 4). ZmTLP13, on the other hand, is the least expressed gene with the exception of seedling stage, indicating it might have a specialized role as a transcription factor to this particular stage. From our analysis, dough stage, fruit formation, anthesis with in orescence formation, and stem elongation with seedling stages might be regulated by the same set of transcription factors. The manner in which the genes are clustered is an indication that co-clustered genes might be involved in the same developmental stages. For example, the ZmTLP11 and ZmTLP14 have the highest expression in the germination stage and, therefore, play a major regulatory role in this developmental stage.
Our search for cis-regulatory elements (Supplementary table 1