The legume family (Leguminosae) is economically one of the most successful lineages among flowering plants 33. It exhibits a significantly higher species diversification rate over the last 60 million years compared to angiosperms as a whole 71. Within this family, winged bean distinguishes itself with its nutrient-rich components and effective symbiotic associations with a broad spectrum of rhizobia strains, making it suitable for low-input and self-resilient agricultural systems 72. Chloroplast genomes, with their conserved nature, are crucial resources for studying evolutionary dynamics and phylogenetic relationships across plant taxa 73, 74. The present study reports the sequencing and characterisation of the chloroplast genome of P. tetragonolobus, along with its comparative analysis with other legumes, including V. radiata, P. vulgaris, L. purpureus, C. tetragonoloba, G. max, C. cajan, M. truncatula, C. arietinum, and A. hypogaea, with A. thaliana serving as the outgroup.
Leguminosae is divided into three subfamilies: Caesalpinioideae, Mimosoideae, and Papilionoideae. Caesalpinioideae, a paraphyletic group, is the ancestral base for the monophyletic subfamilies Mimosoideae and Papilionoideae 75. Papilionoideae, the largest subfamily, comprises 13,800 species across 28 tribes in 478 genera 33. It is further divided into the Swartzioid and Aldinoid lineages and other genera within a larger monophyletic group marked by a 50 kb inversion in the chloroplast genome 24. The 50 kb inversion group includes three major clades: Genistoids, Dalbergioids, and the Old World clade. The Genistoid clade is characterised by the accumulation of quinolizidine alkaloids, while the Dalbergioid clade typically exhibits "aeschynomenoid" root nodule morphology 76. The Old World clade further segregates into the Indigoferoid/Millettioid and Hologalegina clades, with the latter splitting into the Robinioid clade and the Inverted Repeat-Lacking Clade (IRLC). Indigofereae is sister to the Millettioid group, which comprises the Phaseoloid and core Millettieae clades and allies 77, 78.
In recent years, there has been growing interest in the legume systematics community in combining expertise and data to capitalise on new approaches in genetics and bioinformatics. With the advancement of sequencing technologies, an increasing number of chloroplast genomes have been sequenced and used for phylogenetic analysis. We used the nucleotide sequences of all predicted chloroplast genes of P. tetragonolobus and the reported chloroplast genes of nine other legumes (V. radiata, P. vulgaris, L. purpureus, C. tetragonoloba, G. max, C. cajan, M. truncatula, C. arietinum, and A. hypogaea), with A. thaliana as the outgroup, to delineate the phylogenetic position of P. tetragonolobus relative to related genera. The multigene-based phylogenetic tree resolved the cladistic position of all the legumes considered in the study with robust bootstrap support. P. tetragonolobus clustered closely with C. cajan, G. max, L. purpureus, P. vulgaris, and V. radiata of the Millettioid group under Tribe Phaseoleae within the Phaseoloid clade. Moreover, all the other legume species considered in the study grouped with their respective clades, consistent with the current state of legume phylogeny 79, 80, 81, reinforcing the utility of chloroplast genome sequences in deep phylogenetic analysis.
The comparative analysis of the chloroplast genome sequences of P. tetragonolobus and other legumes revealed clade-wise general conservation in genome size, in the lengths of the IR, LSC, and SSC regions, and in their GC and gene contents. As expected, C. arietinum and M. truncatula, belonging to the Hologalegina/Inverted Repeat-Lacking Clade (IRLC), exhibited the smallest genome sizes and gene contents, attributable to the presence of only a single copy of the IR 82, 83, 84. A. hypogaea, belonging to the Dalbergioid clade, exhibited the largest genome size, whereas the genome sizes of the Indigoferoid/Millettioids, comprising P. tetragonolobus, V. radiata, P. vulgaris, L. purpureus, C. tetragonoloba, G. max, and C. cajan, were slightly smaller than that of the Dalbergioid representative, ranging from 151,294 bp to 152,530 bp and thus varying by only 1,236 bp. These results indicate a notable degree of genomic homogeneity among the leguminous species under investigation. Moreover, they highlight the strength of the cladistic approach to biological classification based on hypotheses of most recent common ancestry.
We observed a notable uniformity in GC content across the LSC, SSC, and IR regions among the legume species examined. Furthermore, the GC content of tRNAs and rRNAs was significantly higher than that of protein-coding genes. Notably, a proportionately higher number of GC-rich tRNAs and rRNAs in the IR regions contributed to their overall higher GC content compared to the LSC and SSC regions. Such GC-rich regions are thought to contribute to structural integrity and functional resilience across diverse taxa 85, 86, 87.
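The region-wise GC comparison described above can be illustrated with a minimal Python sketch. The function names, the toy sequence, and the region coordinates below are hypothetical placeholders chosen for illustration; they are not the annotation or the values obtained in this study.

```python
def gc_content(seq):
    """Return the GC percentage of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def regional_gc(genome, regions):
    """Compute GC content for named regions given 0-based, end-exclusive coordinates."""
    return {name: gc_content(genome[start:end])
            for name, (start, end) in regions.items()}


# Toy example: a made-up 24-nt "genome" split into three hypothetical regions.
toy_genome = "ATATATGCGCGCGCGCATATATAT"
toy_regions = {"LSC": (0, 6), "IR": (6, 16), "SSC": (16, 24)}
for name, value in regional_gc(toy_genome, toy_regions).items():
    print(f"{name}: {value:.1f}% GC")
```

In practice, the coordinates would come from the genome annotation, and the same function could be applied to individual tRNA, rRNA, and protein-coding features to compare their GC contents.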
The synteny analysis of whole chloroplast sequences from P. tetragonolobus, C. cajan, G. max, and L. purpureus, along with A. thaliana, provided insights into the structural variations within these genomes. Pairwise alignments revealed high synteny among the chloroplast genomes but also unveiled a notable signature of rearrangements and inversions, particularly within the IR and SSC regions. Notably, the SSC segment exhibited an inversion in P. tetragonolobus compared to C. cajan and G. max, indicating a structural deviation specific to this species. Complementing the pairwise analysis, global alignment using the shuffle-LAGAN algorithm highlighted a region of reduced exon conservation spanning from 24.5 to 46 kb within the chloroplast genomes. This region corresponded to the site of inversion observed in the pairwise synteny analysis, reinforcing the presence of rearrangements within this segment of the chloroplast genome. The observed rearrangements may be attributed to flip-flop intramolecular recombination in the plastome, a mechanism proposed by Ogihara et al. 88. While such events are rare, recent studies 89, 90 highlight their significance as evolutionary drivers, potentially conferring adaptive advantages. The identification of structural variations within chloroplast genomes, such as inversions and rearrangements, underscores the dynamic nature of plastome evolution. Understanding the mechanistic underpinnings and functional implications of these rearrangements offers insights into the evolutionary trajectories of plant species within the Leguminosae family.
The Ka/Ks ratio, which compares the rate of non-synonymous substitutions (Ka) with the rate of synonymous substitutions (Ks) between species, serves as an indicator of the selection pressure acting on genes 91, 92. In protein-coding genes, synonymous substitutions occur more frequently than non-synonymous substitutions 91, 93. In this study, the majority of genes had Ka/Ks values between 0 and 0.5, a strong signature that, in P. tetragonolobus, non-synonymous mutations are being removed from the population at a faster rate than synonymous mutations. Thus, these genes are under purifying selection and tend to maintain their required functions. The only gene observed to be under diversifying selection was rpl23. This gene has been reported to be deleted, duplicated, and to accumulate mutations not only in legumes but also in other plant species, including cereal crops 29, 94–100.
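The interpretation of Ka/Ks values given above can be sketched in Python as follows, assuming that Ka and Ks have already been estimated for each gene by a dedicated tool. The function name, the tolerance parameter, and the toy values are illustrative assumptions and do not represent the estimates obtained in this study.

```python
def classify_selection(ka, ks, tol=1e-9):
    """Return (Ka/Ks, label) for one gene.

    Ka/Ks < 1 is read as purifying selection, Ka/Ks > 1 as diversifying
    (positive) selection, and Ka/Ks equal to 1 (within tol) as neutral.
    The ratio is undefined when Ks is zero.
    """
    if ks <= 0:
        return None, "undefined"
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return ratio, "neutral"
    if ratio < 1.0:
        return ratio, "purifying"
    return ratio, "diversifying"


# Toy (Ka, Ks) pairs for hypothetical genes; values are invented for illustration.
toy_estimates = {"geneA": (0.25, 1.0), "geneB": (2.0, 1.0), "geneC": (0.5, 0.0)}
for gene, (ka, ks) in toy_estimates.items():
    ratio, label = classify_selection(ka, ks)
    print(gene, ratio, label)
```

Genes with Ks equal to zero are reported as undefined rather than being assigned an arbitrary ratio, since no synonymous change is available for comparison.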
Among the 64 codons directing protein synthesis, 61 encode standard amino acids, while 3 serve as translation stop signals. Most amino acids have multiple synonymous codons, except for tryptophan and methionine, which are typically encoded by one codon each 101. The degeneracy of the genetic code allows the same amino acid to be encoded by different codons 102, 103. However, codon usage varies among organisms, among genes, and even for the same gene in different species, resulting in codon usage bias 104, 105. Under codon usage bias, synonymous codons appear non-randomly, at different frequencies 106, 107. Codon bias impacts numerous cellular processes, such as mRNA stability, transcription, translation efficiency, protein expression, and co-translational folding 108–110. It influences chromatin structure and mRNA folding, thereby regulating transcription levels and translation efficiency by modulating the elongation rate of translation 108–111. Codon bias analysis aids in revealing horizontal gene transfer and evolutionary relationships between closely related organisms 112, 113. The RSCU value compares the observed frequency of a specific synonymous codon with the frequency expected in the absence of codon usage bias. A value of 1.0 suggests no bias, with equal usage of all synonymous codons for that amino acid. Values above 1.0 indicate positive bias, while those below 1.0 indicate negative bias. RSCU values exceeding 1.6 or falling below 0.6 indicate overrepresented and underrepresented codons, respectively 38, 114.
Excluding the termination codon, we found 61 codons for 20 amino acids in the winged bean chloroplast genome, with RSCU values ranging from 0.47 to 4.38. Notably, CGC, encoding alanine, exhibited the lowest RSCU, while TTT, encoding phenylalanine, showed the highest RSCU. Phenylalanine, lysine, and asparagine were the most abundant amino acids encoded by the winged bean genome. Among the various synonymous codons for amino acids such as arginine, asparagine, aspartic acid, glutamic acid, isoleucine, leucine, lysine, serine, and tyrosine in the winged bean chloroplast genome, codons ending with either A or U were overrepresented. This bias towards codons ending with A or U may be attributed to the higher AT content of chloroplast genomes, resulting from mutation and natural selection processes 38, 115.
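The RSCU definition and thresholds described above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and in-frame coding sequences; the function names and the toy sequence are hypothetical and are not the pipeline or data used in this study. For each amino acid, RSCU is computed as the observed count of a codon divided by the mean count of all synonymous codons for that amino acid.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def rscu(sequences):
    """Compute RSCU for the sense codons observed in in-frame coding sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            # Skip stop codons and codons containing ambiguous bases.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1

    # Group sense codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(family)  # count expected under equal usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


def classify_rscu(value, high=1.6, low=0.6):
    """Label a codon using the 1.6 / 0.6 thresholds quoted in the text."""
    if value > high:
        return "overrepresented"
    if value < low:
        return "underrepresented"
    return "intermediate"


# Toy coding sequence: ATG followed by three TTT codons and one TTC codon.
toy_cds = ["ATG" + "TTT" * 3 + "TTC"]
for codon, value in sorted(rscu(toy_cds).items()):
    print(codon, round(value, 2), classify_rscu(value))
```

Under this definition the RSCU of a codon cannot exceed the number of synonymous codons for its amino acid, which offers a simple sanity check on reported values.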
Understanding repeat sequences within genomes is critical for deciphering evolutionary patterns and genetic diversity 116, 117. In the present study, we identified 59 repeats, primarily concentrated in intergenic spacers, consistent with patterns in other legumes studied recently 82, 118, 119. A large proportion of repeats were also found within genes, namely ycf2, ndhA, ndhF, rps12, rpl22, pafI, and psaA, indicating potential functional implications. The prevalence of repeats in intergenic regions highlights their role in genomic rearrangements and evolutionary dynamics 120, 121. Leveraging these conserved patterns as genetic markers can enhance phylogenetic and population studies in legumes, providing insights into chloroplast genome evolution and plant adaptation 120–122.
We identified 84 perfect SSRs, two compound SSRs, and 15 VNTRs in the chloroplast genome of winged bean. Their distribution in the LSC, SSC, and IR regions was generally similar to that observed in other legumes 99, 82, 118, 123, 124, 125. Typically, a significant proportion of SSRs in genic regions consist of trinucleotide repeats, which help mitigate the detrimental effects of frame-shift mutations 126, 127. However, in our study, we found only mono- and dinucleotide repeats, predominantly concentrated in intergenic spacers and introns rather than exons. This distribution may help limit the detrimental frame-shift mutations that mono- and dinucleotide repeats would cause in genic regions 128–130. The SSR markers identified in the chloroplast genome of the winged bean are of significant advantage in evolutionary and taxonomic research due to their maternal inheritance and lower mutation rates 126, 127.
In conclusion, the study of the chloroplast genome of P. tetragonolobus and its comparative analysis with other legumes has provided valuable insights into the evolutionary dynamics and phylogenetic relationships within the legume family. The findings highlight the utility of chloroplast genome sequences in deep phylogenetic analysis and support the strength of the cladistic approach to biological classification based on hypotheses of most recent common ancestry. The observed genomic homogeneity among the leguminous species under investigation and the uniformity in GC content across different regions underscore the conserved nature of chloroplast genomes within this plant family. These results contribute to our understanding of legume systematics and emphasise the importance of combining expertise in genetics and bioinformatics to capitalise on new approaches for studying plant evolutionary biology.