Plastome sequencing of South American Podocarpus species reveals low rearrangement rates despite ancient gondwanan disjunctions

Historical reconstructions within Podocarpaceae have provided valuable information to disentangle biogeographic scenarios that begun 65 Mya. However, early molecular phylogenies of Podocarpaceae failed to agree on the intergeneric relationships within the family. The aims of this study were to test whether plastome organization is stable within the genus Podocarpus, to estimate the selective regimes affecting plastome protein-coding genes, and to strengthen our understanding of the phylogenetic relationships and biogeographic history. We sequenced the plastomes of four South American species from Patagonia, southern Yungas, and Brazilian subtropical forests. We compared their plastomes to those published from Brazil, Africa, New Zealand, and Southeast Asia, along with representatives from other genera within Podocarpaceae as outgroups. The four newly sequenced plastomes ranged in size between 133,791 bp and 133,991 bp. Gene content and order among chloroplasts from South American, African and Asian Podocarpus were conserved and different from the plastome of P. totara, from New Zealand. Most genes showed substitution patterns consistent with a conservative selective regime. Phylogenies inferred from either complete sequences or protein coding regions were mostly congruent with previous studies, but showed earlier branching of P. salignus, P. totara and P. sellowii. Highly similar and conserved plastomes of African, South American and Asian species suggest that P. totara plastome should be revised and compared to other species from Oceanic distribution. Furthermore, given such structural conservation, we suggest plastome sequencing is not useful to test whether genomic order can be climatically or geologically structured.


Introduction
The genus Podocarpus (yellowwoods or mañíos) is the largest genus in the Podocarpaceae family, both in diversity and in distribution. It is divided into two monophyletic subgenera, Podocarpus and Foliolatus. The subgenus Podocarpus occurs in all continents of Gondwanan origin (i.e., South America, Africa, Oceania and Australia), while M. Paula Quiroga paulaquiroga@comahue-conicet.gob.ar 1 The chloroplast genome (or plastome) of terrestrial plants is approximately 120-220 kb in size and consists of two copies of the inverted repeats (IRs) that separate the small and large single copy regions, SSC and LSC respectively [10,11]. In gymnosperms, the plastome is highly variable in size and organization, and this variation is often used in phylogenetics to identify families, genera, and subgenera [12]. Yet, certain families remain poorly sampled such as only eight of the 187 species within Podocarpaceae have been sequenced so far [13]. Also, the loss of the large IR has been reported in several species, mainly conifers [14][15][16]. Furthermore, many rearrangements can be observed in these plastomes, some of which appear to have played an important role in their evolution [15][16][17]. A frequent change is the deletion of genes, as reported within the inverted repeat region in Cupressophyta (Araucariales and Cupressales) and Pinaceae [18]. In Cupressoidaea, Sudianto et al. [19] suggest that added complexity results from the existence of isomeric plastomes. As in other species of Podocarpaceae [16,17], the plastome of Retrophyllum piresii lacks one of the IRs. Similarly, the genus Lagostrobos, having the largest plastome described so far for the family (151.5 kb), lacks the IR region but hosts numerous intergenic spacer repeats. Wu and Chaw [20] concluded that in cupressophytes, mutation rates have a critical role in driving the evolution of plastomic size, while plastomic inversions evolve in a neutral manner. As repeats play an important role in plastomic rearrangements [8,21,22], plastome sequencing and de novo assembly can be considered an efficient tool to understand phylogenetic relationships at different taxonomic levels, as well as to investigate the structural and functional evolution of plants [23].
A comparison of plastomes of two Podocarpus species, P. lambertii and P. totara, found differences in internal gene order due to four large inversions of about 20,000 bp in length each [17]. Since P. lambertii occurs in Brazil, while P. totara occurs in New Zealand, and given the vicariant biogeographic history of the genus, it could be hypothesized that the large geographic distance between them correlates with genetic differences, reflected both as nucleotide substitutions and plastome rearrangements. Thus, geographic distance, evolutionary time and different adaptive traits could explain plastome structural differences found between these two congener species [24]. Alternatively, these differences could have resulted from changes in a single lineage leading to either species rather than reflecting a trend for the whole genus. By comparing plastomes from only two species, these two alternatives cannot be distinguished.
More than half of the species of Podocarpus occur in South America. They are distributed in four clades [2] that are geographically structured into southern, tropical, and subtropical distributions, the latter of which is sister to the African subtropical clade and thus a good system to test the above hypotheses. Therefore, we sequenced the plastomes of four additional species of South American Podocarpus: P. nubigenus and P. salignus, found along the Patagonian temperate forests, P. parlatorei found at the southern Yungas rainforest, and P. sellowii, mainly distributed in the Atlantic rainforest, but also present in other Brazilian biomes in subtropical forests. We compared their plastomes with those available for P. lambertii from Mata Atlantica (southern Brazil) [17], P. latifolius and P. milanjianus from Africa [25], P. totara from New Zealand [26], and P. neriifolius from Southeast Asia [27], along with representatives from other genera within Podocarpaceae as outgroups. The aim of this work was to analyze stability (or lack thereof) of plastome organization within the genus Podocarpus, estimate the selective regimes affecting plastome proteincoding genes, and strengthen our current understanding of the phylogenetic relationships and biogeographic history of Podocarpus.

Methods
Plant leaf tissue was collected from individuals of four Podocarpus species within their natural distributions. These South American species studied de novo here were previously analyzed by means of chloroplast and nuclear sequences that yielded three clades (following [2]): an austral clade (P. nubigenus and P. salignus), a subtropical clade (P. parlatorei), and a tropical clade (P. sellowii). Total DNA from samples of P. nubigenus, P. salignus and P. parlatorei was extracted from 2 g of fresh leaves, using a modified ATMAB protocol (adapted by Doyle and Doyle [28]). All samples are registered in BCRU herbarium. For P. sellowii, genomic DNA was extracted from leaves sampled from a voucher registered at the Brazilian National Management System for Genetic Heritage and Associated Traditional Knowledge (SisGen, accession code AF0D72D) using a modified CTAB (cetyltrimethylammonium bromide) protocol [28], adapted by Ferreira and Grattapaglia [29].

Sequencing
Library construction, sequencing and adapter trimming of genomic DNA from P. nubigenus, P. salignus and P. parlatorei was performed by BGI Genomics (https://www.bgi. com). One µg of high-molecular weight DNA was randomly sheared with a Covaris ultrasonicator. The fragmented DNAs were checked by gel electrophoresis, purified using the AxyPrep Mag PCR Clean Up Kit (Corning MAG-PCR-CL-50), end-repaired by incubation at 20 °C for 30 min in BGI's proprietary End Repair Mix, and re-purified using the same kit, followed by A-tailing. Illumina adaptors were ligated to A-tailed fragments, and incubated at 16

Assembly and annotation
Raw reads were quality trimmed using Trimmomatic 0.39 [30] with default settings (ILLUMINACLIP:TruSeq3-PE. fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36 SLIDINGWINDOW:4:15 MINLEN:36). Trimmed reads were then assembled into chloroplast genomes using the Assembly by Reduced Complexity (ARC) approach [31] (https://ibest.github.io/ARC/): reads were mapped to the Podocarpus latifolius plastome sequence [32] using bow-tie2 [33], and mapping reads were then assembled de novo using SPAdes [34]; ARC configuration files are available at https://doi.org/10.5281/zenodo.6991189. Contigs output from ARC were imported to Geneious 8.1.9 [35] and further assembled using this software's de novo assembler with default settings. Although the bowtie2 mapping reference was used only to retrieve cpDNA reads from the total gDNA library and not as a scaffold for assembly, we repeated the pipeline using the plastome sequence of P. totara [26] to ensure that genome structure was not biased by the choice of the initial reference seed. Due to the presence of inverted repeats, joining some of the contigs was ambiguous: contigs were spliced into alternative assemblies, and the alternatives were tested first based on the frequency of reads spanning the different possible junctions, and later using PCR validation with ad-hoc designed primers (see below). The genomes were annotated using Geneious: each assembly was aligned to existing annotated genomes from other Podocarpus species, for regions having over 75% similarity in sequence were automatically transferred to the new genomes, and annotations were manually curated through comparisons with annotated published plastomes from other species.

PCR verification of assembly ambiguities
To test the alternative assemblies, we designed five primers targeted to conserved 20 bp regions near the inverted repeats flanking the junction points: 90 F (TCGGGTGCT-TATTTGGTGCA); 659 F (CACTCCCCGTTGTTCCT-CAA); 1515 F (GCATTACGCCCAAAACGGTT); 955R (AATCCCTCCTGGCTCACAAC); 2767R (CGAAC-GAAATGCCAGTGACC). Using highly stringent PCR conditions with 2µL high quality DNA (~ 10ng), 2,5µL of 10X Reaction Buffer (Genbiotech), 0.25 mM of each dNTP, 1.75 µl of 50 Mm MgCl, 0.3 µM of each primer, 0.1 µl of 1000U DNA polymerase (Genbiotech) in a total volume of 20 µL. The PCR amplification conditions were: 5 min at 95 °C; 25 × (40seg at 95 °C; 40seg at 65 °C; 1 min at 72 °C) and 7 min at 72 °C. Using genomic DNA from each species as a template, we ran reactions for different combinations of these primers, and ran the products of the reactions on a 1% agarose gel. The presence of bands serves as a test for the amplification of fragments of specific sizes or lack of amplification as predicted for each alternative assembly.
of Podocarpaceae, and then repeated it only for the 9 species of Podocarpus. The ratio ω summarizes gene evolution rates, and can be an informative feature, because it can identify which genes are the most (or least) conserved and also identify genes that may have undergone periods of adaptive evolution. CODEML configuration and output files are available at https://doi.org/10.5281/zenodo.6991189.

Results
Reads used for plastome assembly are available from NCBI under BioProject PRJNA869366, BioSample accessions SAMN30306625, SAMN30306626, SAMN30306627, SAMN30306628. Input data, software parameters, intermediate files and final results from the analyses performed for this work are available at https://doi.org/10.5281/ zenodo.6991189.

Sequencing, chloroplast genome assembly and content
Genomic DNA (gDNA) libraries for Podocarpus nubigenus, P. salignus and P. parlatorei yielded over 8 million paired end reads for each species; for P. sellowii gDNA, 1.7 million paired end reads were obtained (Table 1). From those total gDNA reads, over 100 thousand reads for P. nubigenus and P. salignus, 160 thousand reads for P. parlatorei and 20 thousand reads for P. sellowii were matched to the Podocarpus latifolius plastome reference; these represent a mean coverage of 226x, 230x, 369x and 77x respectively ( Table 1). For each of the four species, ARC reconstructed two contigs flanked by inverted repeats; the correct joints were determined through PCR tests (see methods).
The circular plastomes of the four species ranged in size between 133,791 bp and 133,991 bp (Tables 2, Fig. 1), well within the range of the previously sequenced Podocarpus plastomes. The %GC ranged between 37.0 and 37.2% ( Table 2). The gene content was mostly similar ( Table 2): one of the two trnD-GUC genes in P. sellowii could have suffered the deletion of 46 out of 74 bases towards the 3' end due to 112 bases missing (although low quality base Sequence alignment, phylogenetic tree inference and plastome rearrangement reconstruction Whole genome alignment yielded a series of locally collinear blocks (LCBs), each consisting of a multiple sequence alignment (MSA) of a portion of the plastome with conserved synteny across all species. We concatenated MSAs for all LCBs, and used this dataset to infer phylogenetic relationships among all 17 species by maximum likelihood and Bayesian approaches. Maximum likelihood inference was performed by RAxML 7.2.8 [37] with a general time reversible model with gamma-distributed rates plus a proportion of invariant sites and a rapid hill-climbing algorithm using 1,000 bootstrap replicates. Bayesian inference was performed using MrBayes 3.2.6 [38], using general time reversible model with four categories of gamma-distributed rates plus a proportion of invariant sites; four heated chains (temp: 0.2) were run for 1,100,000 iterations; the first 100,000 iterations were discarded as burn in, and the chains were sampled every 200 iterations. The priors were set to unconstrained branch lengths with GammaDir(1, 0.1, 1, 1). Default values were used for all other parameters. Convergence was assessed by visually inspecting the trace plots in the viewer included in the MrBayes plugin for Geneious, and quantitatively through the values of the average standard deviation of split frequencies. Stricter tests were not considered since the inferred topology was consistent across several pilot runs and across methods. Since alignment of non-coding sequences is often ambiguous and could potentially bias inference, we also extracted all protein-coding genes from MSAs, concatenated them and ran the same analyses described above on this dataset. The results of the whole genome alignment analysis were mapped on the resulting phylogenetic trees. Given that the number and types of rearrangements found were relatively few and simple, the simplest hypothetical scenario for plastome structural evolution was deduced by eye rather than using a computer algorithm.

Substitution rate
We estimated a single ratio of nonsynonymous to synonymous substitution rates ω = dN/ dS for each protein-coding gene. This ratio measures the strength and mode of natural selection acting on protein genes, with ω > 1 indicating positive (adaptive or diversifying) selection, ω = 1 indicating neutral evolution, and ω < 1 indicating negative (purifying) selection [39]. We used CODEML from the PAML package [40], specifying NSites = 0 and model = 0 (default values were used for all other parameters). In order to detect a change in selective regime within the genus Podocarpus for any genes, we ran the analysis for all 17 species

Phylogenetic analysis
Runs of RAxML and MrBayes based on whole plastome alignment (i.e., the concatenation of aligned locally collinear blocks) yielded the same best tree topology (Fig. 2). This topology recovered the same relationships among Podocarpaceae genera found by a previous report [8].
Within the genus Podocarpus, the Southeast Asian species P. neriifolius of Foliolatus subgenus branches off first, the South American P. salignus branches off next, followed by New Zealand's P. totara and the South American P. sellowii and P. nubigenus. Two clades separate last, both comprising subtropical species: the African P. milanjianus and P. latifolius, and the South American P. lambertii and P. parlatorei. Both analyses yielded high support to most bipartitions (100 bootstrap support in RAxML and > 0.99 posterior probabilities in MrBayes); the only bipartition that had lower support. Was the one separating Podocarpus sellowii from P. nubigenus and the clade formed by P. milanjianus, P. latifolius, P. lambertii and P. parlatorei (73% bootstrap support in RAxML), and the bipartition separating P. milanjianus + P. latifolius from P. lambertii + P. parlatorei (93% bootstrap support in RAxML).
Running MrBayes on the aligned protein-coding regions resulted in a tree with the same topology found for whole plastome alignment and with similarly high posterior calls flanking them hint that this loss might be an assembly artifact).
We found that the four species were mostly syntenic, with no evidence of major rearrangements (Fig. 2, right  side). Average plastome-wide pairwise sequence identity was 96.8% (min: 96.3%; max 97.5%). They were also syntenic to the available plastomes of P. neriifolius, P. milanjianus, P. latifolius and P. lambertii within the Podocarpus genus, and with those of Nageia nagi, Afrocarpus gracilior and Dacrydium elatum. Across the Podocarpus genus, average plastome-wide pairwise sequence identity was 96.3% inversion of a large block comprising LCBs 3-7, followed by a translocation of LCB 4 to become located between LCB 6 and LCB 7 in P. totara.

Synonymous and non-synonymous substitution rates
We estimated a single rate, single site ω (model M0 in PAML, see Methods) for each protein coding gene across the 17 species of Podocarpaceae. The distribution of ω values shows that most genes have a ω ratio of less than 0.5 ( Fig. 3 A, upper half), and that in all cases dN/dS ratios are below 1 (Fig. 3B). This implies that non-synonymous mutations are being selected against, as expected from the overall conservative nature of chloroplast genomes. Notice, however, that some genes have much larger tree lengths, driven by faster dS rates (psbT), faster dN rates (matK, ycf1 and ycf2), or both (clpP).
We estimated the single rate, single site ω value again, but now using as input a subset comprising only the nine Podocarpus species, to detect any change in selective regime compared to the family background rates. Overall, rates are similar, although there might be a slight skew towards higher ω values within the genus (Fig. 3 A, lower half). However, we detected some genes showing noticeably more relaxed (i.e., ω closer to 1 and thus under less intense negative selection) dN/dS ratios within the genus, like psbK and rps7, while a few others showed stronger conservation, like psbM and psaM (Fig. 3 C). A single gene, rps2 (30 S ribosomal protein S2, chloroplastic), seems to have switched probabilities (all above 0.98). The same result was found for the RAxML analysis, except that the consensus tree from the 1000 bootstrapping replicates yielded an unresolved polytomy comprising P. sellowii, P. nubigenus and the clade formed by (P. milanjianus + P. latifolius)+(P. lambertii + P. parlatorei), the latter of which with a 69% support.

Reconstruction of plastome rearrangements
Comparing the plastome structure of all species to their phylogenetic relationships shows that the arrangement found in all Podocarpus species, except for P. totara, is conserved in 11 out of the 17 species analyzed (Fig. 2), differing from the arrangement found in the outgroup and monotypic South American species Saxegothaea conspicua by the inversion of a single collinear block present in the latter (Fig. 2, LCB  6). A reconstruction of the minimum events of translocation and inversion suggests that the plastome arrangement seen within most Podocarpus species is likely ancestral for Podocarpaceae. From this hypothetical ancestral arrangement, we reconstruct an inversion event affecting LCB6 in the lineage leading to Saxegothaea conspicua; an inversion and event affecting LCBs 9 and 10, followed by translocation of LCB 9 in the lineage leading to Pherosphera fitzgeraldii; a single inversion of a large block comprising LCBs 3-7 in the ancestral lineage including Dacrycarpus and Dacrydium, followed by an inversion of the same block in the Dacrydium elatum lineage that effectively reverted to the ancestral arrangement; a single inversion event affecting LCBs 9 to 2 in the Retrophyllum piresii lineage; and an was performed by skilled researchers, using gap closing and verification through PCR (Peter Lockhart, pers. comm.), it was done before the plastome of any other conspecific had been published. Alternatively, it is possible that, even if plastome rearrangements are not common within the genus, they are likely enough to have happened in at least the one lineage leading to P. totara. Therefore, it would be advisable to both resequence P. totara's plastome using longread approaches and to obtain plastid sequence information from other species with an Oceanic distribution to confirm if the observed rearrangements represent indeed an evolutionary event or are just an assembly artifact. If plastome rearrangements are confirmed, this could open Podocarpus as a potential system to study the underlying mechanisms of plastome evolution. For future whole genome plastome analysis we suggested using long-read methods of sequencing, thus avoiding assembly problems caused by inverted repeat readings.
We observed full synteny among plastomes of P. nubigenus, P. salignus, P. parlatorei, P. sellowii, P. lambertii, and P. latifolius/P. milanjianus, all of these being significantly different from that of P. totara, which occurs in New Zealand. A biogeographic connection has been suggested between subtropical Africa and South America, probably related to geographic and paleoclimatic belts at such latitudes [2]. Interestingly, the orderings are similar also with P. neriifolius, a species that belongs to the subgenus Foliolatus from being on average under negative selection across the family to being under positive selection within the genus.

Discussion
Previous work has revealed considerable diversity in chloroplast genome organization in gymnosperms in general [12], and within Podocarpaceae in particular [8]. Podocarpus is the most diverse genus within the family Podocarpaceae, and a comparison of the plastomes of the first two sequenced species, P. lambertii and P. totara, suggested that plastome instability might also be a general property at the intrageneric level [17]. After sequencing and analyzing the plastomes of four South American species, plus three recently sequenced species from Africa and Southeast Asia, we found that all seven species share the same organization with P. lambertii. The fact that no differences were found in the internal genome arrangement of the South American and African species reflects a shared biogeographic past within the Podocarpus lineage which in turn shows a highly conserved evolutionary history. The differential arrangement of P. totara within the lineage of the genus Podocarpus is striking, since even P. neriifolius, that belongs to the subgenus Foliolatus, has the same arrangement as other species within subgenus Podocarpus. One possibility would be a misassembly: although the P. totara plastome assembly and that will put forward new questions to be addressed. For this purpose, we suggest the use of long read sequencing methodologies, in order to avoid assembly problems in inverted repeat regions. and has an Asian distribution. Thus, plastome conservation has maintained synteny throughout the tectonic history of the group. Given this degree of conservation, plastome sequencing is not useful to test whether genomic order can be climatically or geologically structured.
Our analysis of family-and genus-wide substitution rates confirms the overall conservative evolutionary rates of plastomic protein-coding genes and agrees with previous estimates [8]. Interestingly, a single gene, rps2 seems to have changed under distinct selective regimes in the Podocarpus lineage. This gene, encoding the S2 subunit of the plastid ribosome, has also been reported to switch due to selection in other unrelated lineages, like Vicia (Fabaceae), Arabidopsis (Brassicaceae) or parasitic Scrophulariaceae [41][42][43]. Considering that, at least for Arabidopsis, allelic variants of this gene are linked to variance in disease resistance and susceptibility, it is tempting to speculate on the selective pressures that might have affected Podocarpus lineages throughout its biogeographic history. A more targeted study of rps2 sequence variability, both within-and between species, would be however needed before ascribing a differential role for this gene in Podocarpus evolution.
The phylogenetic trees including all plastid genes inferred using RAxML and MrBayes yield an arrangement quite consistent with the topology that was previously published for Podocarpus [2,44], despite the lower number of taxa included here. The Asian species P. neriifolius showed a sister group relationship to the rest of the Podocarpus clade, as expected given that it is part of the subgenus Foliolatus [2]. Similarly, the warm-temperate South American P. salignus showed a sister group relationship to a clade containing the austral P. totara from New Zealand, the South American tropical P. sellowii and austral P. nubigenus, and a subtropical clade containing South American P. lambertii and P. parlatorei, and its sister group the African P. milanjianus and P. latifolius. This subtropical clade containing taxa currently geographically disjunct was interpreted as an indication of a past Atlantic subtropical biogeographical corridor [2].
In this work we have shown that the genome-wide arrangement of chloroplast genomes in the genus Podocarpus is relatively conserved compared to other genera of the family. The evidence does not allow us to associate changes in genome arrangement as a result of vicariant events; however, the study of specific regions with higher mutation rates may promote a better understanding of the biogeographic history of this particular genus. Furthermore, given such structural conservation, we suggest plastome sequencing is not useful to test whether genomic order can be climatically or geologically structured. As for evolutionary aspects, regarding the differences observed with other groups of Coniferophytes, these are issues that remain to be studied