Structural and functional analysis of CCT family genes in pigeonpea

Pigeonpea (Cajanus cajan L.) is a photoperiod-sensitive short-day plant. Understanding the flowering-related genes is critical to developing photoperiod-insensitive cultivars. The CCT family genes were identified using ‘CCT DOMAIN PROTEIN’ as a keyword and localized on the chromosomes using the BLAST search option available at the LIS database. The centromeric positions were identified through BLAST search using the centromeric repeat sequence of C. cajan as a query against the chromosome-wise FASTA files downloaded from the NCBI database. The CCT family genes were classified based on additional domains and/or CCT domains. The orthologous and phylogenetic relationships were inferred using the OrthoFinder and MEGA 10.1 software, respectively. The expression levels of the CCT family genes in photoperiod-sensitive and photoperiod-insensitive genotypes were compared using RNA-seq data and qRT-PCR analysis. We identified 33 CCT family genes in C. cajan distributed on ten chromosomes and nine genomic scaffolds. They were classified into CMF-type, COL-type, PRR-type, and GTCC-type. The CCT family genes of legumes exhibited an extensive orthologous relationship. Glycine max showed the maximum similarity of CCT family genes with C. cajan. The expression analysis of CCT family genes using photoperiod-insensitive (ICP20338) and photoperiod-sensitive (MAL3) genotypes of C. cajan demonstrated that CcCCT4 and CcCCT23 are the active CONSTANS in ICP20338. In contrast, only CcCCT23 is active in MAL3. The CCT family genes in C. cajan vary considerably in structure and domain types. They are maximally similar to soybean’s CCT family genes. The differential photoperiod response of pigeonpea genotypes, ICP20338 and MAL3, is possibly due to the difference in the number and types of active CONSTANS in them.


Introduction
Flowering is a critical event in a plant's life cycle [1]. It is regulated by many intrinsic and extrinsic factors. The intrinsic factors mainly comprise the plant's age, hormonal balance, and nutritional status, while the extrinsic factors mainly include temperature and photoperiod [2]. Photoperiod-mediated flowering regulation occurs through the modulation of the expression of a cascade of flowering-related genes [3]. The major flowering-related genes are CIRCADIAN CLOCK ASSOCIATED 1 (CCA1), PSEUDO-RESPONSE REGULATOR (PRR), FLAVIN-BINDING KELCH REPEAT F-BOX 1 (FKF1), LATE ELONGATED HYPOCOTYL (LHY), CYCLING DOF FACTORs (CDFs), TOC1, GIGANTEA (GI), FLOWERING LOCUS T (FT), and CONSTANS (CO) [4]. Among these, FT and CO are the primary flowering-inducing genes. These genes are the most studied and are well-characterized in Arabidopsis [5,6]. The FT gene product 'florigen' triggers the induction of flowering [7]. It is synthesized in the leaf companion cells and translocated to the shoot apex through the phloem [6,8,9]. At the shoot apex, it interacts with 14-3-3, a class of tetratricopeptide repeat (TPR) proteins and FD proteins, and forms a 'florigen activation complex (FAC),' which induces transcription of floral meristem identity genes such as APETALA1 (AP1) and LFY, leading to flower development [10]. Activation of FT transcription in response to favorable photoperiod conditions is mediated primarily through the CO transcription factors [11]. The CO transcription factors sense the favorable photoperiod condition through their interactions with the products of several photo-inducible genes like GI, FKF1, CCA1, LHY, and PRR [12][13][14][15][16][17]. The CO transcription factor gene is a member of the CCT (CONSTANS, CO LIKE; COL, and TIMING OF CAB; TOC1) gene family, whose protein products possess the characteristic CCT and B-box domains [18].
The CCT family genes include CONSTANS-like (COL), PRR, CCT motif family (CMF), and GATA/tifi containing CCT (GTCC) genes [19]. These genes regulate photoperiod-dependent flowering in various plant species [20,21]. Moreover, they play vital roles in different physiological processes, including stomatal opening, photomorphogenesis, abiotic stress response, and hormone metabolism, and in determining various morphological traits like shoot branching, sink size, plant height, and grain number [19].
Cajanus cajan (L.) Millsp., commonly known as pigeonpea or red gram, is a highly drought-tolerant pulse crop mostly cultivated in the world's tropical and subtropical parts [22]. Globally, it is grown on about 5.61 million hectares, producing nearly 4.42 million tons of seed grain. India is the largest producer, contributing around 74.8% (3.31 million tons) of the global pigeonpea production [23]. Still, it accounts for more than 95% of the total global import volume of pigeonpea to meet its domestic demand. We can improve the situation only by enhancing pigeonpea's productivity in India, which has hovered around 780 kg/ha for the last five decades (http://faostat.fao.org/). Sensitivity to biotic and abiotic stresses and strong photoperiod sensitivity are the critical constraints limiting the country's productivity of pigeonpea. Pigeonpea requires short-day (< 12 h) photoperiod conditions for induction of flowering [24]. Nevertheless, variation in photoperiod response exists among the pigeonpea germplasm and its wild relatives. Understanding the molecular mechanisms underlying photoperiod response in pigeonpea and identifying the trait determinants is critical for breeding photoperiod-insensitive cultivars, thereby enabling its cultivation across seasons with varying photoperiod regimes. Towards this, we report the identification, chromosomal localization, and structural and functional analysis of CCT family genes in pigeonpea. We also report an account of the ortholog groups, intergeneric/interspecific sharing, and phylogenetic relationships of CCT family genes of pigeonpea and other legumes.

Plant materials
Three biological replicates of photoperiod-insensitive (ICP20338) and photoperiod-sensitive (MAL3) genotypes [25] of C. cajan were grown under long-day (16 h light/8 h dark) and short-day (12 h light/12 h dark) photoperiod conditions at 30 °C and 60% relative humidity at the National Phytotron Facility, ICAR-IARI, New Delhi. We isolated the total RNA from the leaf tissues of the short-day (SD) and long-day (LD) grown plants. Since ICP20338 enters the reproductive phase 45-50 days after sowing (DAS), we harvested its leaf tissues at 30 and 50 DAS from both the SD and LD-grown plants. The photoperiod-sensitive genotype MAL3 enters the reproductive phase 115-120 DAS; hence, we harvested its leaf tissues at 50 and 120 DAS from the SD and LD-grown plants.

RNA isolation and cDNA synthesis
Total RNA was isolated from the leaf tissues using the Spectrum Plant Total RNA Kit (Sigma Aldrich, USA). We removed the residual DNA in the total RNA through DNase (Thermo Fisher Scientific, Waltham, USA) treatment and quantified the total RNA using a Nanodrop ND-1000 spectrophotometer (Thermo Fisher Scientific, USA). We assessed the quality and integrity of the isolated RNA through electrophoresis on a 1.2% denaturing agarose gel. The cDNA synthesis was performed using the RevertAid first-strand cDNA synthesis kit (Thermo Fisher Scientific, Waltham, USA) as per the manufacturer's instructions.

Genome-wide identification of CCT family genes
We identified the CCT family genes in the C. cajan genome available at the Legume Information System (LIS) database (https://legumeinfo.org/) using 'CCT DOMAIN PROTEIN' as a keyword. The genes were located on the chromosomes using the BLAST search option available at the LIS database. We named all the CCT family genes with a prefix Cc representing Cajanus cajan and a numeral suffix in ascending order denoting their chromosomal positions [26]. The chromosome-wise distribution of CcCCT family genes was depicted using Mapchart software 2.3 [27]. The protein sequences of CCT family genes of Vigna angularis, Vigna radiata, Vigna unguiculata, Glycine max, Medicago truncatula, and Cicer arietinum were downloaded from the LIS database. The protein sequences of CCT family genes of Arabidopsis thaliana were downloaded from the Uniprot protein database (https://www.uniprot.org/uniprot/).
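The naming convention described above can be sketched as follows. This is a minimal illustration, not the authors' script: the locus identifiers and coordinates are invented, and placing genes on unanchored scaffolds after all chromosomes is an assumption consistent with the numbering reported later (CcCCT25-CcCCT33 on scaffolds).

```python
def name_cct_genes(loci, prefix="CcCCT"):
    """Assign names in ascending order of chromosomal position.

    loci: list of (locus_id, chromosome, start) tuples. `chromosome` is an
    int for assembled chromosomes or None for unplaced scaffolds
    (hypothetical input layout).
    """
    def sort_key(locus):
        locus_id, chrom, start = locus
        # Scaffold genes (chrom is None) sort after every chromosome.
        return (chrom is None, chrom if chrom is not None else 0, start, locus_id)

    ordered = sorted(loci, key=sort_key)
    return {
        locus_id: f"{prefix}{rank}"
        for rank, (locus_id, _, _) in enumerate(ordered, start=1)
    }


# Hypothetical loci; identifiers and coordinates are illustrative only.
example_loci = [
    ("geneB", 3, 500_000),
    ("geneA", 1, 1_200_000),
    ("geneD", None, 10_000),
    ("geneC", 3, 90_000),
]
names = name_cct_genes(example_loci)
```

Here `names` maps the chromosome 1 locus to CcCCT1, the two chromosome 3 loci to CcCCT2 and CcCCT3 in order of start position, and the scaffold locus to CcCCT4.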

Identification of centromere positions of C. cajan chromosomes
We identified the centromere positions of the C. cajan chromosomes using the procedure given by Melters et al. [28]. The chromosome-wise FASTA files of C. cajan, downloaded from the NCBI database (accession: GCA_000340665.1), were used as a local database, while the centromeric repeat sequence of C. cajan (ACT TGC TAC ACC TGG GGA GAC TAA TAA CCA ACA CAG ATG CAC AAC ATA GCA TGT AAT TGG TTT ACT GTT CAT TGG TTC TCT CTA ATT CTT CAA CTG ACTT) identified by Melters et al. [28] was used as the query sequence. The BLAST hits showing more than 85% similarity were designated as the centromeric positions of the chromosomes.
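The hit-filtering step can be sketched as below. This is an illustrative sketch under stated assumptions: it assumes the BLAST results were saved in the standard 12-column tabular layout (subject ID in column 2, percent identity in column 3, subject start and end in columns 9 and 10), and summarizing the retained hits as one span per chromosome is our simplification; the source does not say how hits were summarized. The example hit lines are invented.

```python
from collections import defaultdict


def centromere_spans(blast_lines, min_identity=85.0):
    """Return {chromosome: (leftmost, rightmost)} over hits above the cutoff."""
    hits = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        chrom = fields[1]
        identity = float(fields[2])
        s_start, s_end = int(fields[8]), int(fields[9])
        # Keep only hits with MORE than 85% identity, as in the text.
        if identity > min_identity:
            # Minus-strand hits report start > end, so normalize the order.
            hits[chrom].append((min(s_start, s_end), max(s_start, s_end)))
    return {
        chrom: (min(s for s, _ in spans), max(e for _, e in spans))
        for chrom, spans in hits.items()
    }


# Invented example hits (tab-separated, 12 columns).
example_hits = [
    "cen\tchr1\t92.5\t100\t7\t0\t1\t100\t5000\t5099\t1e-30\t150",
    "cen\tchr1\t88.0\t100\t12\t0\t1\t100\t5300\t5201\t1e-20\t120",
    "cen\tchr1\t80.0\t100\t20\t0\t1\t100\t90000\t90099\t1e-5\t60",
    "cen\tchr2\t85.0\t100\t15\t0\t1\t100\t700\t799\t1e-15\t100",
]
spans = centromere_spans(example_hits)
```

In this toy input, the 80% hit and the hit at exactly 85% are discarded, so only chr1 receives a tentative span.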

Domain prediction, classification, and structural analysis of CcCCT family genes
We predicted conserved domains in the amino acid sequences of the CcCCT family genes using the 'Web CD-Search tool' available at https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi, and the exon-intron structures in the CcCCT family genes using the online software Gene Structure Display Server 2.0 (GSDS 2.0) (http://gsds.cbi.pku.edu.cn/). We classified the CcCCT family genes into COL, CMF, PRR, and GATA and tifi containing CCT (GTCC) types based on the presence of additional domains and/or CCT domain.
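The classification rule can be expressed as a small decision function. This is a sketch under stated assumptions rather than the authors' procedure: the domain labels are simplified placeholders for whatever the CD-Search output reports, the use of a receiver ("REC") domain to recognize PRR types is our assumption (the text does not name the PRR-defining domain), and the order in which the rules are checked is also assumed.

```python
def classify_cct_protein(domains):
    """Classify a protein from its set of predicted domain labels.

    Returns "COL", "PRR", "GTCC", or "CMF", or None when no CCT domain
    is present. Labels are hypothetical, case-insensitive placeholders.
    """
    found = {d.strip().lower() for d in domains}
    if "cct" not in found:
        return None  # not a CCT family member
    if "b-box" in found:
        return "COL"  # one or more B-box domains plus CCT
    if "rec" in found:
        return "PRR"  # assumed: receiver domain plus CCT
    if found & {"gata", "tifi", "tify"}:
        return "GTCC"  # GATA and tifi containing CCT
    return "CMF"  # CCT domain only


examples = {
    "cct_only": classify_cct_protein({"CCT"}),
    "with_bbox": classify_cct_protein({"CCT", "B-box"}),
    "with_rec": classify_cct_protein({"CCT", "REC"}),
    "with_gata": classify_cct_protein({"CCT", "GATA", "tifi"}),
    "no_cct": classify_cct_protein({"B-box"}),
}
```

Counting B-box copies per protein (one, two, or three) would need a list of domain hits rather than a set; the set form is kept here for brevity.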

Prediction of orthologous relationships, intergeneric/interspecies sharing, and phylogenetic analysis of CCT family genes in legume species
We compared the amino-acid sequences of the CCT family proteins of the seven common legumes, namely C. cajan, V. angularis, V. radiata, V. unguiculata, G. max, M. truncatula, and C. arietinum, and the non-legume model species A. thaliana through pair-wise BLASTP analysis, setting the minimum sequence similarity at 90% and the E-value cutoff at 1e-10. We presented the connectome using Circos software (version 0.69). The orthologous relationship between the CCT family genes of different species was inferred using the OrthoFinder software (version 2.2.6). The multiple sequence alignments of amino acid sequences of the CCT family genes of the seven common legumes and A. thaliana were performed using the ClustalW software. The phylogenetic analysis was performed using the MEGA 10.1 software [29], and the tree was constructed using the maximum likelihood method with 1000 bootstrap replicates.
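The similarity and E-value thresholds can be applied to pair-wise BLASTP hits as sketched below, yielding the number of cross-species links that a connectome plot would draw. The sketch is illustrative only: the "species|gene" identifier format, the species codes, and the hit values are hypothetical, and reciprocal hits are not merged, so a pair found in both search directions would be counted twice.

```python
from collections import Counter


def count_shared_links(hits, min_identity=90.0, max_evalue=1e-10):
    """Count cross-species hits passing both cutoffs.

    hits: iterable of (query_id, subject_id, percent_identity, evalue),
    with IDs written as "species|gene" (hypothetical format).
    """
    links = Counter()
    for query_id, subject_id, identity, evalue in hits:
        q_species = query_id.split("|")[0]
        s_species = subject_id.split("|")[0]
        if q_species == s_species:
            continue  # within-species hits are not cross-species links
        if identity >= min_identity and evalue <= max_evalue:
            # Sort so (A, B) and (B, A) share one key.
            links[tuple(sorted((q_species, s_species)))] += 1
    return links


# Invented example hits.
example_hits = [
    ("Cc|g1", "Gm|g7", 95.2, 1e-50),  # kept
    ("Cc|g1", "Mt|g3", 88.0, 1e-40),  # identity below 90%
    ("Cc|g2", "Gm|g9", 91.0, 1e-5),   # E-value above 1e-10
    ("Cc|g2", "Cc|g1", 99.0, 1e-80),  # same species
    ("Vu|g4", "Cc|g2", 93.0, 1e-30),  # kept
]
links = count_shared_links(example_hits)
```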

Expression analysis of CcCCT family genes using transcriptome data
The RNA-seq data generated from the leaf tissues collected 50 and 120 DAS from ICP20338 and MAL3, respectively, grown under SD and LD conditions were analyzed to determine the level of expression of the CcCCT family genes. First, we identified the transcripts of the CcCCT family genes corresponding to the CDS through BLAST-based sequence identification analysis. The transcript IDs were then used to extract the expression values (FPKM) of the CcCCT family genes from all four samples used in the study: ICP20338-SD, ICP20338-LD, MAL3-SD, and MAL3-LD.

Expression analysis of CcCCT family genes through qRT-PCR
We designed the qRT-PCR primers for the CcCCT family genes using the PrimerQuest™ Tool (https://eu.idtdna.com/Primerquest/Home/Index) of Integrated DNA Technologies (IDT). The primers were custom synthesized at the Bioengineering Services, India Private Ltd. The details of the qRT-PCR primers are given in Supplementary Table S1. The qRT-PCR assay was performed in a Light Cycler qRT-PCR (Roche) instrument using a Brilliant III Ultra-Fast SYBR Green qPCR master mix kit (Agilent Technologies, USA). Each qRT-PCR reaction comprised 0.5 µl of each primer, 2 µl of cDNA, 10 µl of SYBR green master mix, and 7.5 µl of DEPC-treated water. We used the tubulin gene-specific primers of C. cajan as an internal control. The qRT-PCR reaction conditions were set at 94 °C for 3 min, followed by 40 cycles of 94 °C for 30 s, 60 °C for 15 s, and 72 °C for 20 s. The melting curve analysis followed with 10 s at 95 °C and then 10 s each at 0.5 °C increments between 56 and 95 °C. The fold change in expression of CCT genes in the leaf tissues collected 50 and 120 DAS from ICP20338 and MAL3, respectively, was calculated by the ∆∆CT method, taking the expression values of CCT genes in the leaf tissues collected 30 and 50 DAS from ICP20338 and MAL3, respectively, as the reference.
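The fold-change calculation can be illustrated with the widely used 2^(-∆∆CT) form of the ∆∆CT method. This is a minimal sketch that assumes roughly 100% amplification efficiency for both the target and the tubulin control; the CT values in the example are invented and are not data from this study.

```python
def fold_change_ddct(ct_target_test, ct_control_test,
                     ct_target_ref, ct_control_ref):
    """Relative expression by the 2^(-ddCT) calculation.

    *_test: CT values at the later time point (e.g., 50 DAS for ICP20338).
    *_ref:  CT values at the earlier time point used as the reference
            (e.g., 30 DAS for ICP20338).
    The control gene is the internal control (tubulin in the text).
    """
    # Normalize the target gene to the internal control in each sample.
    delta_test = ct_target_test - ct_control_test
    delta_ref = ct_target_ref - ct_control_ref
    # Compare the later time point against the reference time point.
    ddct = delta_test - delta_ref
    return 2.0 ** (-ddct)


# Invented CT values: the target amplifies two cycles earlier at the later
# time point while the tubulin control is unchanged.
fold = fold_change_ddct(
    ct_target_test=24.0, ct_control_test=20.0,
    ct_target_ref=26.0, ct_control_ref=20.0,
)
```

With these illustrative numbers, ∆∆CT is -2, which corresponds to a four-fold increase relative to the reference time point.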

Genome-wide identification and chromosomal localization of CCT family genes of pigeonpea
We identified a total of 33 CCT family genes in the genome of C. cajan (Table 1). These genes were designated as CcCCT1 to CcCCT33 in ascending order based on their chromosomal positions. The 33 CcCCT family genes were dispersed on 10 C. cajan chromosomes and nine genomic scaffolds (Fig. 1). Chromosomes 3, 11, and 7 contained a relatively higher number of CcCCT family genes, with 6, 4, and 3 genes, respectively. Chromosomes 2, 4, 6, and 9 each contained two CcCCT family genes, while chromosomes 1, 8, and 10 each had a single CcCCT family gene. Nine CcCCT genes (CcCCT25-CcCCT33) could not be placed on any of the chromosomes or anchored scaffolds. The gene pairs CcCCT7 and CcCCT8, and CcCCT15 and CcCCT16, were located in close vicinity, as pairs of repeat genes, on chromosomes 3 and 7, respectively. The BLAST analysis using the conserved centromeric tandem repeats as the query against the C. cajan chromosome sequences revealed the tentative physical positions of centromeres in all 11 chromosomes of C. cajan (Supplementary Table S2). The chromosomal localization study of the CcCCT family genes indicated that many of these genes were localized in the centromeric regions of the chromosomes, which may explain the substantial structural and functional conservation of CCT family genes [30].

Domain prediction and structural analysis of CcCCT family genes of pigeonpea
Domain prediction of CcCCT family genes using the 'Web CD-Search tool' revealed that among the 33 CcCCT family genes, the protein products of 12 genes contained only the CCT domain, while those of the remaining 21 genes contained additional domains along with the CCT domain (Fig. 2). The study indicated that the C. cajan CCT family is composed of a broad group of genes. The protein products also displayed a vast diversity within the gene family members (Fig. 2). Among the 12 COL type CcCCT proteins, two proteins contained three B-box domains, four proteins contained two B-box domains, and six proteins contained a single B-box domain along with a CCT domain. An integrated comparative and phylogenetic analysis of the four subfamilies of CCT genes in Poaceae has revealed that the non-B-box domain-containing CMF types have evolved from the B-box domain-containing COL types through the successive loss of B-box domains. Evidence suggests that COL types continue to lose their B-box domains from three to two and two to one and then to none [31].
The structural analysis revealed a considerable variation in the length and structure of the CcCCT family genes (Fig. 2). Among the 33 CcCCT family genes, CcCCT33 was the smallest (~ 4 kb), while CcCCT11 was the largest (~ 16 kb). The COL type CcCCT family genes varied in length from 3 to 5 kb and contained either one or three introns. Similarly, the CMF type CcCCT family genes varied from 3 to 6 kb with 1-4 introns, while the GTCC type CcCCT family genes ranged from 5 to 7 kb with 5-9 introns. The PRR type CcCCT family genes were the longest, ranging from 6 to 16 kb, and contained 7-8 introns (Fig. 2). The structure of the CCT family genes observed in the study showed significant similarity with the structure of the CCT family genes of A. tauschii [30]. The molecular weight of the protein products of CcCCT family genes ranged from 19.4 kD (CcCCT28) to 84.8 kD (CcCCT11), with isoelectric points (pI) ranging from 4.37 (CcCCT18) to 9.77 (CcCCT16) (Table 1). The range of the molecular weights of the protein products of CcCCT family genes and their pIs were comparable to those observed in A. tauschii [30].

Orthology, intergeneric/interspecific sharing, and phylogenetic analysis of CCT family genes
Besides the 33 CCT family genes identified in C. cajan, the genome-wide survey of CCT family genes in other legumes [...] We may attribute the presence of a significantly higher number of CCT family genes in G. max to genome duplication events during its evolution from common ancestors [32]. The analysis of CCT family genes for predicting orthogroups revealed all three kinds of relationships, namely one-to-one, many-to-one, and many-to-many types, between the orthologous genes of the different legume species [33]. All 33 CCT family genes identified in C. cajan clustered into nine paralogous groups and nine singletons (Supplementary Table S3). Among the nine paralogous groups, five groups had only two paralogues, while the remaining four groups comprised 3-4 paralogues. All nine paralogous groups and the singletons had corresponding orthologs in the other six common legumes. However, only five paralogous groups and six singletons had corresponding orthologs in A. thaliana. The results indicated that orthologous genes across the different legume species and A. thaliana had a variable number of paralogs in other species. These results support the earlier reports indicating that the proportion of duplicated genes retained in a genome is highly variable. As a result, the number of paralogous genes in a paralogous group varies significantly among different species [34]. Non-adapted species tend to possess a higher number of duplicated genes, which provides an opportunity for neofunctionalization and thus adaptive evolution of the species [35][36][37]. A significant variation in the number of members in different paralogous groups may also be attributed to the fact that gene duplication allows subfunctionalization if the duplication occurs in a gene performing more than one function. In such a case, each member of the duplicated gene pair might adapt to serve one of the functions of the parental gene separately [38].
The Circos map revealed an extensive sharing of CCT family genes among the legumes. It indicated that C. cajan shared the highest proportion of CCT family genes with G. max, followed by V. unguiculata, V. radiata, V. angularis, C. arietinum, and M. truncatula (Supplementary Fig. S1). These results are, in general, congruent with the taxonomic relationships of these legume species [39]. The phylogenetic analysis based on the alignment of amino acid sequences of the 291 legume-based and 26 well-characterized A. thaliana CCT family genes showed that the CCT family genes could be grouped in seven different clusters designated as clusters A-G (Fig. 3). In the cluster-wise distribution of the protein products of the CCT family genes, all GTCC types were affiliated to cluster A. The CMF type CCT family genes were distributed in clusters B and C, while the COL type CCT family genes were distributed in clusters D, E, and F. All the PRR type CCT family genes were affiliated to cluster G. The C. cajan CCT family genes were dispersed in all the clusters of the phylogenetic tree. Cluster B contained the highest number of CcCCT family genes (9). Clusters C and D each contained 3 CcCCT family genes, clusters A and E each contained 4, and clusters F and G each contained 5. C. cajan and G. max CCT family genes showed maximum similarity and co-clustered in the phylogenetic tree.
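The cluster-wise counts reported above can be cross-checked against the gene-type totals with a short tally. The sketch uses only the counts and the cluster-to-type assignments stated in the text, reading the paired cluster counts as per-cluster values; that reading is an interpretation, supported by the fact that the tally then sums to the 33 CcCCT genes.

```python
# Number of CcCCT genes per phylogenetic cluster, as reported in the text.
cluster_counts = {"A": 4, "B": 9, "C": 3, "D": 3, "E": 4, "F": 5, "G": 5}

# Gene type affiliated to each cluster, as reported in the text.
cluster_type = {
    "A": "GTCC",
    "B": "CMF", "C": "CMF",
    "D": "COL", "E": "COL", "F": "COL",
    "G": "PRR",
}

# Sum the cluster counts within each gene type.
type_totals = {}
for cluster, count in cluster_counts.items():
    gene_type = cluster_type[cluster]
    type_totals[gene_type] = type_totals.get(gene_type, 0) + count

total_genes = sum(cluster_counts.values())
```

The tally gives 12 CMF, 12 COL, 5 PRR, and 4 GTCC genes, which agrees with the 12 CCT-only proteins and the 12 COL type proteins described in the domain analysis.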
As evident in other species, the long-term evolution and anthropogenic-driven selection pressures during domestication of the common legumes may have resulted in a significant inter and intra-species/generic variation in the

Expression analysis of CcCCT family genes in photoperiod sensitive and insensitive pigeonpea genotypes
In the photoperiod-sensitive plant species, COL, PRR, and CMF types of CCT family genes play a crucial role in photoperiod-dependent flowering by either modulating the expression of FT or activating photoperiod-independent pathways of flowering [19,21,26]. The CO protein controls flowering induction events in plants through interaction with environmental stimuli and the plant's circadian rhythm proteins. Under favorable conditions, it activates the expression of FT genes to induce flowering. Transcription and translation of FT occur in the plant's reproductive leaves, and the translated protein products are translocated to the plant's shoot apex through the phloem [40]. Hence, reproductive leaves are ideal tissues for studying expression profiles and identifying probable candidate gene(s) for photoperiod sensitivity. We analyzed the expression profile of all 33 CcCCT family genes using transcriptome data obtained from the leaf tissues collected 50 and 120 DAS from ICP20338 and MAL3, respectively, grown under SD and LD conditions. The results indicated a significantly higher expression of only three CcCCT family genes, namely CcCCT4, CcCCT23, and CcCCT32, in the leaf tissues collected 50 and 120 DAS from ICP20338 and MAL3, respectively (Fig. 4). We further analyzed the expression profile of these three genes using real-time PCR. CcCCT4 showed higher expression in the leaf tissues collected 50 DAS than in those collected 30 DAS from ICP20338 under both SD and LD conditions. However, its expression was not detected in MAL3.
Compared to the leaf tissues collected 30 DAS, CcCCT23 also exhibited higher expression in the leaf tissues collected 50 DAS in both SD and LD-grown ICP20338 and in the leaf tissues collected 120 DAS in the SD-grown MAL3. Its expression in LD-grown MAL3 was, however, non-detectable. The expression of CcCCT32 in the leaf tissues collected 30 and 50 DAS from ICP20338, and 50 and 120 DAS from MAL3, under SD and LD conditions did not show significant variation (Fig. 5). ICP20338 is a photoperiod-insensitive genotype and flowers under both SD and LD conditions. On the other hand, MAL3 is photoperiod-sensitive and flowers only under SD conditions. The expression of CcCCT23 in the SD-grown but not in the LD-grown MAL3 genotype strongly indicates that CcCCT23 is the candidate gene for the synthesis of CONSTANS protein in pigeonpea.
Moreover, the expression of CcCCT4 only in ICP20338 and not in MAL3 indicates the possibility of two active CONSTANS genes, namely CcCCT4 and CcCCT23 in ICP20338, making it photoperiod insensitive. On the other hand, only CcCCT23 as a functional CONSTANS gene makes MAL3 photoperiod sensitive.