This study characterised the chloroplast genome sequence of S. macrophylla. The genome has a circular quadripartite structure in which two inverted repeats (IRs) separate the large single-copy (LSC) and small single-copy (SSC) regions. It is 150,778 bp long, comprising an LSC region of 83,681 bp, an SSC region of 19,813 bp, and a pair of IRs of 23,642 bp each. All Dipterocarpaceae species have similar genome sizes and similar LSC, SSC and IR lengths [17, 22–24]. S. macrophylla has 112 unique genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. All Shorea species share nearly the same gene arrangement and gene order. However, among the Shorea species, the atpF and rpl2 genes are absent in S. zeylanica and S. roxburghii, respectively, and the ycf1 gene is absent in S. macrophylla. Some genes are found only in certain species: for example, the rpl22 gene is found exclusively in S. macrophylla, whereas the ycf15 gene is found only in S. leprosula and S. roxburghii. Yu et al. (2021) compared the gene content of eleven Dipterocarpaceae chloroplast genomes and found that all Dipterocarpus species lack the rps16 gene and that the ycf15 gene is absent in only two species, V. xishuangbannaensis and S. henryana. In contrast, we included two more Shorea species, Shorea macrophylla and Shorea zeylanica, and restricted the comparison to the genus Shorea. Missing genes have also been reported in other studies [25, 26].
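The reported region lengths can be cross-checked against the total genome size, since a quadripartite plastome is the sum of the LSC, the SSC and two IR copies. The short Python sketch below does this with the figures quoted above; the function and variable names are illustrative and are not part of the original analysis.

```python
# Region lengths (bp) as reported in the text for S. macrophylla.
LSC = 83_681
SSC = 19_813
IR = 23_642  # length of ONE inverted repeat copy

# Gene counts as reported in the text.
GENES = {"protein_coding": 78, "tRNA": 30, "rRNA": 4}


def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular plastome with LSC, SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total_length = quadripartite_length(LSC, SSC, IR)
total_genes = sum(GENES.values())

print(total_length)  # 150778, matching the reported genome size
print(total_genes)   # 112, matching the reported number of unique genes
```

The check confirms that the 23,642 bp figure refers to each IR copy rather than to the pair combined.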
Expansion and contraction of the IR regions, which contribute to differences in chloroplast genome size, play a crucial role in plant evolution. We analysed the junctions between the single-copy and IR regions in S. macrophylla and closely related species. Although the junctions varied somewhat, the gene structures and orientations were the same in all species. The psbA, trnH and rps19 genes of S. macrophylla are embedded completely in the LSC, which is consistent across all the Shorea and Parashorea species. The ycf1 gene was located at the IRb/SSC boundary in all the species, but the length up to that boundary varied from 51 to 600 bp. This agrees with Dong et al. (2015), who reported that the ycf1 gene in the SSC region is highly variable. Beyond the genes discussed above, IR expansion and contraction were also evident from the different gene patterns that the species show at the boundaries. For example, at the IRa/SSC boundary, the ndhF gene is located completely in the SSC region in S. leprosula, S. pachyphylla, S. zeylanica and P. macrophylla, whereas part of the gene (43–63 bp) extends into the IR region in S. henryana, S. macrophylla, S. roxburghii and Parashorea chinensis. The same pattern was found for the rpl2 genes at the IRa/LSC and IRb/LSC boundaries. However, the gene patterns differ among these eight species, and no specific grouping can be made within them.
Codon usage analysis of S. macrophylla showed a bias that is conserved in the Dipterocarpaceae family. Across Dipterocarpaceae species, the preferred codons predominantly end in A/U bases. A/U bias is not limited to the Dipterocarpaceae family; it is found in many other plant families, including Arecaceae, Zingiberaceae, Asteraceae and Gesneriaceae [20, 21, 28, 29]. Natural selection, mutation and genetic drift are all thought to have contributed to this bias [30].
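Codon usage bias of this kind is commonly quantified with relative synonymous codon usage (RSCU), the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The text does not state which measure or software was used, so the following Python sketch is only an illustration; it assumes the standard genetic code for amino acid assignments, and the toy sequence is made up.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# amino acid assignments follow the standard code).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence.

    RSCU = observed count / (family total / number of synonymous codons).
    Stop codons and amino acids absent from the sequence are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        family_total = sum(counts[c] for c in codons)
        if family_total == 0:
            continue
        expected = family_total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy example: four alanine codons, three GCT and one GCA.
toy = rscu("GCTGCTGCTGCA")
print(toy["GCT"], toy["GCA"], toy["GCC"], toy["GCG"])  # 3.0 1.0 0.0 0.0
```

An RSCU above 1 marks a codon used more often than expected, so a genome-wide A/U bias would show up as values above 1 concentrated among codons ending in A or U.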
The short and long repeats identified here will benefit future genetic investigations. In total, 262 SSRs were discovered in S. macrophylla, with 187 SSRs (71.37%) in the LSC, 51 SSRs (19.47%) in the SSC and 24 SSRs (9.16%) in the IR. These proportions are fairly consistent across the other Shorea and Parashorea species, with the LSC always containing the most SSRs and the SSC and IR regions the fewest. In all the Shorea and Parashorea species analysed, mononucleotide repeats made up the highest percentage of SSRs, consistent with the chloroplast genomes of most plants; an exception is wild rice, which favours dinucleotide repeats over mononucleotide repeats [31]. In all species, SSRs composed of A/T are more common than those composed of G/C. Most of the long repeats in S. macrophylla are of the forward and palindromic types, as in other plant species such as Alpinia oxyphylla Miq., Oreocharis esquirolii and Hosta spp. (Gao et al., 2019; Gu et al., 2020; Lee et al., 2019). SSRs and long repeats can be used to develop lineage-specific markers for S. macrophylla and its related species, enabling studies of genetic diversity.
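As an illustration of how such SSR figures arise, the Python sketch below recomputes the regional percentages from the counts given above and shows a simplified regular-expression search for perfect SSRs. The minimum repeat thresholds, the function names and the toy sequence are assumptions chosen for the example; the text does not specify the detection tool or settings used in the study.

```python
import re

# SSR counts per region as reported in the text for S. macrophylla.
ssr_counts = {"LSC": 187, "SSC": 51, "IR": 24}
total_ssrs = sum(ssr_counts.values())
shares = {region: round(100 * n / total_ssrs, 2) for region, n in ssr_counts.items()}
print(total_ssrs, shares)  # 262 {'LSC': 71.37, 'SSC': 19.47, 'IR': 9.16}

# Illustrative minimum repeat counts per motif length (assumed values).
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in min_repeats.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. skip "AA" runs already counted as "A"
                hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


# Made-up sequence with one A(12) run and one (AT)6 run.
toy_seq = "GC" + "A" * 12 + "CGC" + "AT" * 6 + "GGC"
print(find_ssrs(toy_seq))  # [(2, 'A', 12), (17, 'AT', 6)]
```

This search handles only perfect repeats of one to three bases and does not merge compound SSRs, so it is a teaching aid rather than a substitute for a dedicated SSR tool.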
Studying nucleotide variation among the chloroplast genomes of closely related Dipterocarpaceae species is important for future species identification and taxonomy. In this study, the ycf2, psbI-trnG, ndhA-ndhI, psbZ-trnG, rps12-ndhB, rpl33-rpoA, rpoA-psbB and rpoC1 regions were identified as mutational hotspots across six Shorea species and two Parashorea species. Among these hotspot regions, ycf2, psbI-trnG and psbZ-trnG showed the highest degree of differentiation and can therefore be used to investigate adaptive evolution in these regions [33–36].
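Mutational hotspots of this kind are often located by computing nucleotide diversity (Pi, the mean proportion of differing sites over all sequence pairs) in sliding windows along a whole-genome alignment. The text does not describe the exact procedure, so the Python sketch below is only one plausible way to do it; the window and step sizes and the toy alignment are assumptions, and sequences are expected to be aligned, of equal length and in upper case.

```python
from itertools import combinations


def pairwise_diff(a, b):
    """Proportion of differing sites, ignoring columns with gaps or ambiguity codes."""
    valid = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not valid:
        return 0.0
    return sum(x != y for x, y in valid) / len(valid)


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differences (Pi) over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def sliding_pi(alignment, window=600, step=200):
    """Return (window midpoint, Pi) tuples along an alignment.

    window and step are illustrative defaults, not values taken from the study.
    """
    length = len(alignment[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Made-up three-sequence alignment; the second half is more variable.
toy_alignment = ["AAAA", "AAAT", "AATT"]
for midpoint, pi in sliding_pi(toy_alignment, window=2, step=2):
    print(midpoint, round(pi, 3))
```

Windows with the highest Pi values are the candidate hotspots, which can then be mapped back to genes or intergenic spacers through the genome annotation.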
Maximum likelihood phylogenetic analysis of all the protein-coding genes from the chloroplast genomes of 24 Dipterocarpaceae species produced clades that closely matched their taxonomic subgroups. The tree constructed in this study is strikingly similar to the one generated by Yu et al. (2021) from 22 full chloroplast genome sequences of Dipterocarpaceae, apart from the inclusion of S. macrophylla. The species not included in the phylogenetic study of Yu et al. (2021) are Neobalanocarpus heimii, Shorea zeylanica, Shorea pachyphylla, Dryobalanops aromatica and the species sequenced in this study. In our phylogenetic tree, Dryobalanops aromatica is placed furthest from the other Dipterocarpaceae species and closest to the outgroup, which is consistent with the phylogenetic analysis of Wang et al. (2021). According to our analysis, Shorea is not a monophyletic group, because some Shorea species form a clade with Parashorea species, as also reported by Gamage et al. (2006), Indrioko et al. (2006) and Yu et al. (2021). This is incompatible with the traditional taxonomy and calls for a new taxonomic treatment.