Homology: Feature of Chloroplast Genomes
In this study, the chloroplast genome features and phylogenetic relationships of Buchanania latifolia were comprehensively analyzed. The chloroplast genome of B. latifolia showed a high degree of homology with other reported chloroplast genomes of Anacardiaceae species, including Dracontomelon delavayi, Lannea coromandelica, Pistacia nitida, Spondias paniculata, and Swintonia reticulata. The genome lengths of these species ranged from 159,485 to 162,509 bp, a maximum difference of 3,024 bp. The overall GC content is comparable to that of other species in the Anacardiaceae family, such as Rhus chinensis (37.79%), Pistacia weinmannifolia (37.84%), Toxicodendron vernicifluum (37.96%), and Cotinus species (37.9%-38.1%) (Wang et al., 2020, Zheng et al., 2018, Liu et al., 2023). The number and types of genes were also very similar, reflecting the highly conserved nature of chloroplast genomes. Such homology among closely related woody species is generally expected, given their long generation times and the correspondingly low number of substitutions accumulated over a given period. Substitutions arising in parent plants are transmitted to offspring only through reproduction, so long generation times limit how quickly they accumulate.
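The GC percentages compared above can be reproduced from any assembled sequence with a few lines of code. The following is a minimal sketch rather than the pipeline used in this study; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) enter the denominator;
    ambiguity codes such as N are skipped (an illustrative choice).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        return 0.0
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy sequence, not real genome data.
    print(gc_content("ATGCATGCNN"))
```

Applied to a whole chloroplast genome read from a FASTA file, the same function would yield a single percentage comparable to the values cited for other Anacardiaceae species.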
The study of codon preference aids in understanding the evolutionary processes of plant species and in optimizing the expression of exogenous genes in chloroplasts, and it enables the prediction of gene function and expression levels (Li et al., 2019b). Consistent with previous findings, B. latifolia shows a preference for A or U at the third position of its chloroplast codons, a common trait in plant chloroplast genomes (Zhou et al., 2008). In B. latifolia, 30 high-frequency codons were identified, 29 of which end with an A or U base, likely reflecting the combined influence of natural selection and mutation (Necşulea and Lobry, 2007). Previous studies have also noted that high-frequency codons in the chloroplast genomes of Anacardiaceae tend to use A or U as the third codon base (Liu et al., 2023, Xin et al., 2023, Wang et al., 2020).
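The A/U bias at the third codon position can be illustrated with a short calculation. This is a hedged sketch, not the codon-usage analysis performed in this study: the function names are hypothetical, the input is assumed to be an in-frame coding sequence, and the measure shown is simply the share of codons ending in A or U rather than a per-codon usage statistic.

```python
from collections import Counter


def third_base_counts(cds: str) -> Counter:
    """Count the third base of every complete codon in a coding sequence.

    The sequence is assumed to start in frame; T is reported as U.
    Trailing bases that do not form a full codon are ignored.
    """
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i + 2] for i in range(0, usable, 3))


def au_ending_fraction(cds: str) -> float:
    """Fraction of codons whose third base is A or U."""
    counts = third_base_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["A"] + counts["U"]) / total


if __name__ == "__main__":
    # Toy coding sequence: ATG GCT AAA GGC
    print(au_ending_fraction("ATGGCTAAAGGC"))
```

A fraction well above 0.5 across the protein-coding genes would be in line with the A/U preference described above.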
The most prevalent SSRs in the chloroplast genome of B. latifolia were mononucleotide repeats. As in other plants, the chloroplast SSRs consist predominantly of short poly-A or poly-T repeats, with mononucleotide repeats being the most common form (Tao et al., 2023, Vu et al., 2020, Djedid et al., 2021, Provan et al., 2001, Yang et al., 2020). Moreover, the majority of SSRs are located in the LSC and SSC regions, consistent with previous findings on chloroplast genomes (Alshegaihi, 2024, Liu et al., 2023, Wang et al., 2020). Among the dispersed repeats, forward repeats accounted for 57.14%, palindromic repeats for 39.68%, and reverse repeats for 3.17%; no complementary repeats were found. Forward and palindromic repeats were thus the most common repeat types, and most dispersed repeats were shorter than 40 bp, as reported in previous studies (Liang et al., 2020, Kirov et al., 2020, Tian et al., 2021, Yuan et al., 2005).
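Mononucleotide SSRs such as the poly-A and poly-T tracts described above can be located with a simple regular expression. The sketch below is illustrative only and is not the SSR detection tool used in this study; the function name and the default threshold of 10 repeat units are assumptions, and the threshold should be adjusted to match whichever criteria are being reproduced.

```python
import re


def find_mononucleotide_ssrs(seq: str, min_repeats: int = 10):
    """Return (start, base, length) for each run of a single base.

    A run is reported when the same base occurs at least `min_repeats`
    times in a row. Positions are 0-based. The threshold is an
    illustrative assumption, not a value taken from this study.
    """
    if min_repeats < 1:
        raise ValueError("min_repeats must be at least 1")
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        hits.append((match.start(), match.group(1), match.end() - match.start()))
    return hits


if __name__ == "__main__":
    # Toy sequence with one 12-bp poly-A tract and one 9-bp poly-T tract.
    toy = "GC" + "A" * 12 + "GCGC" + "T" * 9 + "G"
    print(find_mononucleotide_ssrs(toy))
```

Counting how many hits have base A or T versus G or C gives a quick check of the poly-A/poly-T predominance noted above.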
The IR region is thought to be the most conserved region of the chloroplast genome. Expansion and contraction of the IR region are pivotal factors in the length variation observed among plant chloroplast genomes and are typically categorized into two types (Yi et al., 2013, Zhang et al., 2013). In most species, such expansions and contractions manifest as minor shifts of the IR/SC boundary within a few fixed genes, which may lead to pseudogenization of certain genes. The boundaries observed in the chloroplast genomes of B. latifolia and the other Anacardiaceae species examined in this study agree with prior reports in angiosperms (Liu et al., 2023, Wang et al., 2020, Xin et al., 2023). Typically, the LSC/IRb boundary is situated on or near rps19, rpl2, or rpl22, while the IRb/SSC boundary is generally positioned on ycf1 or between ycf1 and ndhF. The SSC/IRa boundary typically lies on ycf1, and the IRa/LSC boundary usually falls on or near rps19, rpl2, rpl12, and trnH (Alshegaihi, 2024).
Intergenic spacers are more divergent than introns and protein-coding sequences (Meng et al., 2018). Pseudogenes, however, share the fate of intergenic spacers: lacking functional constraint, their sequences are less conserved. Pseudogenization has been common in the evolution of chloroplast genomes, with accD, ccsA, ycf1, rps19 and psbB reported as pseudogenes (Krawczyk et al., 2018, Li et al., 2021). The most divergent regions among the six Anacardiaceae species were the two pseudogenes accD and ycf1, an intron of the rps16 gene, and the intergenic spacer ndhF-rpl32. These genes or regions can be used to develop molecular markers for species identification and population studies (Magdy et al., 2019).
The pattern of synonymous (Ks) and non-synonymous (Ka) nucleotide substitutions is a well-recognized marker for assessing genome evolution, and the Ka/Ks ratio reflects the selection pressure on genes (Yang and Nielsen, 2000, Guo et al., 2017). Ka/Ks < 1, Ka/Ks = 1, and Ka/Ks > 1 indicate genes that have undergone purifying, neutral, and positive selection, respectively (Yang and Nielsen, 2000). Generally, synonymous mutations occur more frequently than nonsynonymous mutations within genes, so Ka/Ks values tend to fall below 1 (Makałowski and Boguski, 1998). Our results for B. latifolia and five other species from the Anacardiaceae family indicate that only the psbT gene in L. coromandelica and the rpl22, rpl32, rps16 and ycf2 genes in S. paniculata have Ka/Ks values > 1, suggesting strong positive selection acting on these genes. Ka/Ks values for all other detected genes are below 1, indicating widespread purifying selection on these chloroplast genomes.
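The interpretation rule for Ka/Ks stated above can be written down directly. The sketch below only classifies precomputed Ka and Ks values and does not estimate them from alignments; the function names, the tolerance used to call a ratio neutral, and the handling of Ks = 0 as undefined are assumptions made for illustration.

```python
def ka_ks_ratio(ka: float, ks: float):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Label a gene by its Ka/Ks ratio.

    < 1 -> purifying, = 1 (within `tol`) -> neutral, > 1 -> positive.
    `tol` is an illustrative numerical tolerance, not a biological cutoff.
    """
    ratio = ka_ks_ratio(ka, ks)
    if ratio is None:
        return "undefined"
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


if __name__ == "__main__":
    # Hypothetical Ka and Ks values, not results from this study.
    print(classify_selection(0.02, 0.10))
    print(classify_selection(0.30, 0.10))
```

In practice a ratio only slightly above 1 warrants a statistical test before positive selection is inferred, so a label of this kind is best read as a first screen.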