Full-length chloroplast genome of Dongxiang wild rice reveals small single-copy region switching

doi:10.21203/rs.3.rs-1498134/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


Abstract

Background: Plant chloroplast DNA (cpDNA) typically has a circular structure, including a large single-copy region (LSC), a small single-copy region (SSC) and two inverted repeats (IR1 and IR2). The organization of these four elementary regions, LSC-IR1-SSC-IR2, is highly conserved across all plant cpDNAs. Only a very few structural variations (SVs) occurring at the elementary-region level have been reported.

Results: In the present study, we assembled the full-length cpDNA of Dongxiang wild rice line 159 (DXWR159). Using long PacBio subreads, we discovered a large inversion of SSC and a large duplication of IR in DXWR159 cpDNAs. The large inversion of SSC results in a reverse orientation of SSC. According to the orientation of SSC, the structures of cpDNAs can be classified into the forward SSC (SSC-F) and reverse SSC (SSC-R) types. As the most significant finding, both SSC-R and SSC-F type cpDNAs were detected in several seedlings of DXWR159 with a ratio of 1.2. We named the frequent inversion of SSC "SSC switching".

Conclusions: The frequencies of short tandem repeats (STRs) and large SVs in plant cpDNAs need to be re-examined intensively. STRs may not be useful as molecular markers in phylogenetic studies, particularly at low taxonomic levels. We propose that: (1) SSC switching occurs ubiquitously in plant cpDNAs and is not a rarely occurring event; (2) SSC switching may be activated or inactivated by regulation at the molecular level; and (3) further investigation of the underlying mechanism may reveal novel and important functions associated with SSC switching.

Keywords: cpDNA, full-length genome, SSC, structural variation, plant evolution

Background

Chloroplast genomes (plastomes), also called chloroplast DNAs (cpDNAs), contain valuable markers for studying evolutionary relationships and population genetics of plants [1]. In contrast to mitochondrial and nuclear genomes, cpDNAs across spermatophytes (i.e. seed plants) exhibit a higher degree of conservation with respect to their gene content, structure and organization [2]. Most cpDNAs encode about 80 protein-coding genes that are primarily involved in photosynthesis and other biochemical processes, along with 30 tRNA and four rRNA genes [1]. A plant cpDNA typically has a circular structure and includes a large single-copy region (LSC), a small single-copy region (SSC) and two inverted repeats (denoted as IR1 and IR2) separating the LSC and SSC. As the organization of the four elementary regions is highly conserved across all plant cpDNAs, LSC-IR1-SSC-IR2 is used as a common structure for the assembly of chloroplast genomes. Although many structural variations (SVs), including duplications, rearrangements and losses, have been reported at the levels of genome and genes among some of the angiosperm lineages, including Asteraceae [3], Campanulaceae [4], Onagraceae [5], Fabaceae [6] and Geraniaceae [7], only a very few of these SVs occur at the elementary-region level. One example is an SV resulting in the loss of an IR in a clade of Papilionoideae [1], which is regarded as a rarely occurring event.

Cultivated rice (Oryza sativa L.), belonging to the grass family Poaceae (Gramineae), was domesticated from common wild rice (Oryza rufipogon Griff.) thousands of years ago. Dongxiang wild rice (DXWR) is a Chinese common wild rice (O. rufipogon) that was first discovered in Dongxiang county, Jiangxi province, China, in 1978. This location is the northernmost (28°14’N) of the regions where common wild rice populations have been discovered globally. In our previous study [8], we compared the DXWR genome with that of O. sativa ssp. japonica (the Nipponbare genome) and discovered some SVs associated with rice domestication. To determine more genomic features of wild rice precisely, we initiated a project to obtain the full-length nuclear, mitochondrial and chloroplast genomes of DXWR using PacBio DNA-seq [9]. In the present study, we aimed to perform a comparative genomic analysis using rice cpDNAs. Unexpectedly, we discovered forward and reverse SSCs in cpDNAs of several seedlings of DXWR line 159 (DXWR159), a phenomenon that we named SSC switching. Accordingly, we report this finding for two main purposes: (1) to provide a new understanding of the conservation and variation in cpDNAs, and (2) to initiate further investigation of the biological functions of SSC switching.

Results

Genome sequencing, assembly and annotation

For the de novo assembly of the full-length nuclear, mitochondrial and chloroplast genomes of Dongxiang wild rice line 159 (DXWR159), one 500 bp and one 10 kb DNA library were prepared using fresh leaves from a few (<4) seedlings of DXWR159 and sequenced on the Illumina and PacBio platforms, respectively. In the subsequent data analyses, 217,263 subreads extracted from the PacBio DNA-seq data were used to assemble the DXWR159 cpDNA, with a total length of 134,509 bp, at an extremely high depth of approximately 13,441×. Then, the two IRs, long poly(GC), low complexity, and other repeat regions (Supplementary file 1) were exactly determined by manual curation (Materials and Methods). In particular, 13,117 long (> 20 Kb) PacBio subreads were used to validate the structure of the DXWR159 cpDNA. However, a draft genome assembled from high-depth PacBio data may still contain two types of errors, in the low complexity and STR regions, respectively. Finally, 6,480,424 bp cleaned Illumina DNA-seq data were aligned to the DXWR159 cpDNA and only one error, in an STR region, was corrected. The DXWR159 cpDNA is a full-length genome (Figure 1), which is defined as having no gaps or ambiguous nucleotides [10].

In a previous study [1], 4 rRNA, 30 tRNA and 76 protein-coding genes (Table 1) were annotated in the Onobrychis gaubae cpDNA (GenBank: LC647182), with a length of 122,688 bp, and also in the O. viciifolia cpDNA (GenBank: MW007721), with a length of 121,932 bp. Four rRNA, 30 tRNA and 90 protein-coding genes or open reading frames (ORFs) have been annotated in the cpDNA of O. sativa ssp. japonica (Nipponbare; RefSeq: NC_001320). DXWR159 and Nipponbare have almost identical chloroplast rRNA, tRNA and protein-coding genes. Based on the previous annotations, 73 of the 76 protein-coding genes are common genes that are present in both rice and Onobrychis cpDNAs, while three protein-coding genes (accD, ycf1 and ycf2) and three genes (infA, rpl22 and rps16) are absent in rice and Onobrychis cpDNAs, respectively. Furthermore, 15 ORFs had been annotated in rice cpDNAs. By sequence alignment, we updated and corrected the annotations of rice and Onobrychis cpDNAs (Table 1). In particular: (1) the annotation of psbF was added to the Onobrychis cpDNAs, (2) rps16, ORF23, ORF28, ORF56, ORF72, ORF82, ORF85, ORF100, ORF137 and the third exon of rps12 in rice were removed, and (3) ORF44 in rice was renamed psaJ.

Finally, 4 rRNA, 30 tRNA, 75 protein-coding genes and 6 ORFs were annotated in the DXWR159 cpDNA (Supplementary file 2). The 75 protein-coding genes include the 73 common genes that are also present in Onobrychis cpDNAs, and two additional genes (infA and rpl22). Among the 30 tRNAs, six are multi-exon genes that contain two exons: tRNA^Lys(AAA), tRNA^Gly(GGA), tRNA^Leu(UUA), tRNA^Val(GUA), tRNA^Ile(AUC) and tRNA^Ala(GCA). Among the 75 protein-coding genes, the nine multi-exon genes are rpl2, rps12, ndhA, ndhB, petB, petD, atpF and clpP with two exons, and ycf3 with three exons. The biological functions of the 6 ORFs (ORF63, 70, 91, 106, 133, and 249) remain unknown.

Short tandem repeats between individuals

By searching the NCBI NT database with the DXWR159 cpDNA sequence using BLAST, we found that the best hit is the cpDNA (GenBank: CP056064) of Zhenshan97, a cultivar of O. sativa ssp. indica. The length of the DXWR159 cpDNA was determined to be 134,509 bp, which is very close to the Zhenshan97 and Nipponbare cpDNA (RefSeq: NC_001320) lengths of 134,501 bp and 134,525 bp, respectively. The GC contents of the DXWR159, Zhenshan97 and Nipponbare cpDNAs were also very close (approximately 39%). Furthermore, the LSC, IR1, SSC and IR2 of the DXWR159 cpDNA had lengths of 80,553, 20,805, 12,346 and 20,805 bp, and shared nt sequence identities of 99.93% (80505/80565), 99.99% (20804/20805), 99.99% (12346/12347) and 99.99% with those of the Zhenshan97 cpDNA, respectively. Likewise, the LSC, IR1, SSC and IR2 of the DXWR159 cpDNA shared nt sequence identities of 99.54% (80338/80709), 99.81% (20779/20817), 99.76% (12320/12350) and 99.81% with those of the Nipponbare cpDNA, respectively. These results are consistent with those from previous studies [2]: cpDNAs across spermatophytes exhibit a higher degree of conservation with respect to their gene content, structure and organization. Multiple sequence alignment of the three cpDNAs demonstrated that about 76% (1519/2000) of the variation sites were single nucleotide polymorphisms (SNPs), while the others were small insertions and deletions (InDels). Furthermore, almost all the InDels were attributed to the presence of STRs [11]. STRs, which are widely used by forensic geneticists and genealogy experts, are often referred to as simple sequence repeats (SSRs) by plant geneticists or microsatellites by oncologists. The minimum repeat unit length of an STR is 1 bp; this type of STR is predominantly referred to as a polynucleotide (e.g. polyAs and polyTs).
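The figures quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are ours, and the numbers are copied from the text rather than recomputed from the sequences.

```python
# Region lengths of the DXWR159 cpDNA as reported in the text (bp).
regions = {"LSC": 80_553, "IR1": 20_805, "SSC": 12_346, "IR2": 20_805}
total_length = sum(regions.values())

# (identical positions, alignment length) pairs quoted in the text.
identity_counts = {
    ("Zhenshan97", "LSC"): (80_505, 80_565),
    ("Zhenshan97", "IR1"): (20_804, 20_805),
    ("Zhenshan97", "SSC"): (12_346, 12_347),
    ("Nipponbare", "LSC"): (80_338, 80_709),
    ("Nipponbare", "IR1"): (20_779, 20_817),
    ("Nipponbare", "SSC"): (12_320, 12_350),
}


def percent_identity(matches: int, aligned: int) -> float:
    """Percentage of aligned positions that are identical."""
    return 100.0 * matches / aligned


identities = {key: percent_identity(*pair) for key, pair in identity_counts.items()}

# Share of variation sites that are SNPs (1519 of 2000).
snp_fraction = 1519 / 2000

if __name__ == "__main__":
    print("total length:", total_length)
    for (genome, region), value in identities.items():
        print(f"{genome} {region}: {value:.3f}%")
    print(f"SNP fraction: {snp_fraction:.1%}")
```

The four region lengths add up to the reported genome length, which is what a gap-free LSC-IR1-SSC-IR2 assembly should give.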

The above finding prompted us to investigate the STRs between different lines of DXWR. We first performed the detection of STRs between the cpDNAs of DXWR159 and DXWR line 44 (DXWR44). Aligning the NGS data of DXWR44 to the DXWR159 cpDNA demonstrated that the DXWR159 and DXWR44 cpDNAs were identical. Moreover, most of the detected STRs between the DXWR159 and DXWR44 cpDNAs were polyAs or polyTs. We then examined the STRs in cpDNAs among individuals of DXWR159 and DXWR44, respectively. Our analysis revealed that the DXWR159 and DXWR44 cpDNAs share at least nine polyA or polyT sites. They are 11291[T]_7-8, 31462[T]_7-8, 36488[T]_9-10, 49216[T]_10-15, 63462[A]_6-9, 80673[A]_2-3, 102303[A]_6-8, 107081[A]_6-7 and 111165[A]_6-8, where an STR is described by its genomic position in numbers, the repeat unit [in brackets], and its copy number as subscripts, as described in our previous study [11]. These results suggest that polyAs or polyTs occur frequently among the individuals of each rice line.
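The STR notation used above (position, repeat unit in brackets, copy number after an underscore) is regular enough to parse mechanically. The following is a minimal sketch; the `STRSite` type and `parse_str_site` function are hypothetical names, and reading "7-8" as a minimum and maximum observed copy number is our assumption.

```python
import re
from typing import NamedTuple


class STRSite(NamedTuple):
    position: int    # genomic position on the DXWR159 cpDNA
    unit: str        # repeat unit, e.g. "A" or "T"
    min_copies: int  # lower bound of the copy-number range (assumed meaning)
    max_copies: int  # upper bound of the copy-number range (assumed meaning)


# Matches labels such as "49216[T]_10-15".
_STR_PATTERN = re.compile(r"^(\d+)\[([ACGT]+)\]_(\d+)-(\d+)$")


def parse_str_site(label: str) -> STRSite:
    """Parse one STR label written as position[unit]_min-max."""
    match = _STR_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a valid STR label: {label!r}")
    position, unit, low, high = match.groups()
    return STRSite(int(position), unit, int(low), int(high))


# The nine shared polyA/polyT sites listed in the text.
labels = [
    "11291[T]_7-8", "31462[T]_7-8", "36488[T]_9-10", "49216[T]_10-15",
    "63462[A]_6-9", "80673[A]_2-3", "102303[A]_6-8", "107081[A]_6-7",
    "111165[A]_6-8",
]
sites = [parse_str_site(label) for label in labels]

if __name__ == "__main__":
    for site in sites:
        print(site)
```

Parsed this way, all nine sites have a 1-bp repeat unit (A or T), in line with the statement that they are polyAs or polyTs.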

Discovery of two large structural variations

The comparative genomics analysis revealed many SVs between rice and Onobrychis cpDNAs, and most of them were concentrated within about 70% of the LSCs (Figure 2A). Among SVs in other regions, two large deletions resulted in the loss of two genes (ycf1 and ycf2) in the rice IRs. As for the other two hypothetical chloroplast reading frames (ycf3 and ycf4), Nipponbare (rice) and Onobrychis gaubae share a comparatively high nucleotide (nt) sequence identity of 88% (447/507) for ycf3, while they share a very low identity (far less than 70%) for ycf4. The gene ycf3 is the only gene containing three exons in cpDNAs (Table 1); however, it is still highly conserved at the nt sequence level, indicating that ycf3 may have important biological functions. As a notable SV, a large inversion of SSC is present between rice and Onobrychis cpDNAs, although the SSCs are highly conserved with respect to their gene content, structure and organization.

Subsequently, we discovered a large inversion of SSC and a large duplication of IR in DXWR159 cpDNAs using the long PacBio subreads. In particular, the large inversion of SSC results in a reverse orientation of SSC. The Nipponbare cpDNA (RefSeq: NC_001320), the reference cpDNA of rice, has the common cpDNA structure LSC-IR1-SSC-IR2, which was defined as the forward SSC (SSC-F) type structure in the present study. Accordingly, the organization of the DXWR159 cpDNA with the reverse orientation of SSC was defined as the reverse SSC (SSC-R) type structure (denoted as LSC-IR1-ssc-IR2). The SSC-F and SSC-R structures were also designated as wild types and mutants of cpDNAs (Figure 2). As the most significant finding, both SSC-F and SSC-R type cpDNAs were detected in several seedlings of DXWR159. We then used long (> 20 Kb) PacBio subreads as junction reads spanning the LSC-IR1-SSC and LSC-IR1-ssc regions (Figure 2B) for validation (Materials and Methods). These junction reads (Supplementary file 2) demonstrated the presence of both SSC-R and SSC-F type cpDNAs with a ratio of 1.2 (99/82). This high ratio suggested that the inversion of SSC is not a rarely occurring event and may occur frequently during cell growth. We named the frequent inversion of SSC "SSC switching", as the SSC in cpDNA is analogous to the mating-type (MAT) region in a yeast genome [9]. The SSC, with a length of ~12.4 Kbp between two IRs, has a structure of IR-SSC-IR, while the MAT region, with a length of ~19 Kbp between two IRs, has a structure of IR-MAT-IR. During MAT switching, the two copies of the IR recombine, inverting the MAT region relative to the rest of the chromosome. Neither MAT nor SSC switching changes the sequences of genes, as the structure of IR-MAT-IR or IR-SSC-IR is symmetric. By MAT switching, yeasts initiate the transcription of some mating-related genes, determining the sexual identity of a cell. This process may be regulated by centromeric heterochromatin [9]. However, the underlying mechanism of SSC switching is unknown.
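The symmetry argument above can be illustrated with a toy model: because IR2 is the reverse complement of IR1, reverse-complementing the whole IR1-SSC-IR2 block leaves both IRs in place and only flips the SSC. The sketch below uses short made-up sequences (not real cpDNA) and hypothetical function names; it models the outcome of an inversion, not the recombination mechanism itself.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_cpdna(lsc: str, ir: str, ssc: str) -> str:
    """Linearized LSC-IR1-SSC-IR2, with IR2 as the reverse complement of IR1."""
    return lsc + ir + ssc + reverse_complement(ir)


def flip_ssc(genome: str, lsc_len: int, ir_len: int, ssc_len: int) -> str:
    """Invert the IR1-SSC-IR2 block, as an IR-mediated inversion would."""
    start = lsc_len
    end = lsc_len + 2 * ir_len + ssc_len
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


# Toy sequences chosen for illustration only.
lsc, ir, ssc = "CCCATGGA", "TTGACA", "GATTACAG"
ssc_f = build_cpdna(lsc, ir, ssc)
ssc_r = flip_ssc(ssc_f, len(lsc), len(ir), len(ssc))

# Junction-read counts reported in the text (SSC-R : SSC-F).
ratio = 99 / 82

if __name__ == "__main__":
    print("SSC-F:", ssc_f)
    print("SSC-R:", ssc_r)
    print(f"SSC-R/SSC-F ratio: {ratio:.2f}")
```

In this model the flipped genome is identical to one built directly with the reverse-complemented SSC, and flipping twice restores the original, which is why no gene sequence is altered by switching.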

Based on this novel finding, the structures of cpDNAs can be classified into the SSC-F and SSC-R types. Further analysis showed that rice cpDNAs are highly conserved at the junction site between IR1 and SSC (IR1-SSC) and at that between SSC and IR2 (SSC-IR2). As rice cpDNAs are linearized for their representation, starting at the first three nts “CCC” of LSC, almost all the IR1-SSC junction sites contain the highly conserved sequences TGGAAAAAATCG|GCAAATAGGAAA and TGGAAAAAATCG|CGGAAAACCGAA for the SSC-F and SSC-R types, respectively. This feature was then used to classify the structures of rice cpDNAs from the NCBI GenBank database. The results revealed that the structures of most rice cpDNAs belong to the SSC-F type, whereas very few (e.g. Zhenshan97) belong to the SSC-R type. Notable SSC-F type cpDNAs included the cpDNAs of O. sativa L. ssp. indica (GenBank: JN861110), O. nivara (KF359901), O. glaberrima (KF359903), O. barthii (KF359904), O. glumaepatula (KF359905), O. meridionalis (KF359906), O. punctata (MT726932) and O. brachyantha (MT726939). However, some of the SSC-F type cpDNAs are chimeras. For instance, O. nivara and O. punctata have IR1-SSC junction sites of the SSC-R type, but SSC bodies of the SSC-F type. This indicated that these SSCs are likely to contain sequence errors resulting from the assembly of hybrid data, and suggested that both SSC-F and SSC-R type cpDNAs were present in these species. Because many of these cpDNAs were assembled from short NGS or Sanger sequencing data and were not strictly de novo assembled, the orientation of large segments was determined according to the common structure LSC-IR1-SSC-IR2 without validation using junction reads (Figure 2B). According to the classical definition, de novo assembly refers to the process of reconstructing a genome without knowing its structure. Using PacBio DNA-seq, the Zhenshan97 cpDNA (GenBank: CP056064) was strictly de novo assembled, indicating the presence of SSC-R type cpDNA in the sequenced sample. The SSC-F type cpDNA of Zhenshan97 was also present, but its proportion was lower than that of the SSC-R type cpDNA. Therefore, the assembled genome only represents the dominant SSC-R type cpDNA of Zhenshan97.
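The junction-based classification described above can be sketched as a simple motif search. The two 24-nt junction sequences are taken from the text; the function name, the returned labels and the decision to search only the strand as given (i.e. a sequence linearized from the "CCC" start of LSC) are our assumptions, not the authors' actual procedure.

```python
# Conserved junction sequences from the text: the end of IR1 followed by
# the first 12 nt of the SSC in either orientation.
IR1_END = "TGGAAAAAATCG"
SSC_F_START = "GCAAATAGGAAA"
SSC_R_START = "CGGAAAACCGAA"


def classify_ssc_orientation(sequence: str) -> str:
    """Classify a linearized cpDNA by its IR1-SSC junction motif.

    Only the given strand is searched, because the reverse strand of an
    SSC-F genome carries the SSC-R motif at its SSC-IR2 junction.
    """
    seq = sequence.upper()
    has_f = (IR1_END + SSC_F_START) in seq
    has_r = (IR1_END + SSC_R_START) in seq
    if has_f and not has_r:
        return "SSC-F"
    if has_r and not has_f:
        return "SSC-R"
    if has_f and has_r:
        return "ambiguous"
    return "unknown"


if __name__ == "__main__":
    toy_forward = "CCC" + "ACGT" * 3 + IR1_END + SSC_F_START + "ACGT"
    toy_reverse = "CCC" + "ACGT" * 3 + IR1_END + SSC_R_START + "ACGT"
    print(classify_ssc_orientation(toy_forward))
    print(classify_ssc_orientation(toy_reverse))
```

A junction motif alone cannot reveal chimeric assemblies such as those noted for O. nivara and O. punctata; detecting those also requires checking the orientation of the SSC body.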

The other large SV reported in the present study is a large duplication of IR in cpDNAs of several DXWR159 seedlings. Compared to the wild-type cpDNA, the mutant acquired a 22,804-bp insertion and a 319-bp deletion. The 22,804-bp insert includes an entire IR and a segment of LSC (denoted as e1), while the 319-bp deleted segment is the 3' end of an IR; the IR lacking these 319 bp is denoted as IRa (Figure 2C). Six long (> 20 Kb) PacBio subreads (Supplementary file 2) were used as junction reads for manual curation (Materials and Methods). These junction reads span the IR-e1-IR regions (Figure 2C), validating this new finding. Although the underlying mechanism remains unknown, we propose a recombination model to explain these results. At the 5' end of the 319-bp deleted segment, we discovered high-score segment pairs (HSPs) that may be involved as recombination sites (Figure 2C). In this model, a recombination event between two cpDNA molecules results in an exchange between e1+IR2+SSC and the 319-bp segment plus SSC. Consequently, one of the two cpDNA molecules acquired a large duplication of IR, whereas the other lost an IR. Although the IR-lacking sequence was not detected in the present study, it has been previously reported in Onobrychis spp. [1]. Using this model, we also explained SSC switching (Figure 2B).

Conclusions

In the present study, we have assembled the full-length chloroplast genome of Dongxiang wild rice (DXWR), a Chinese common wild rice. The two main findings are the STRs among individuals of DXWR159 and two large SVs at the elementary-region level. PolyAs or polyTs may occur frequently among individuals of each rice line. In our previous study, we also discovered that copy number variation of STRs accounts for mtDNA diversity among and even within individuals of ticks, insects and humans. Consequently, STRs may not be useful as molecular markers in phylogenetic studies, particularly at low taxonomic levels. Moreover, the frequencies of STRs in plant cpDNAs or animal mtDNAs were underestimated in all previous studies and merit further investigation.

In cpDNAs of several DXWR159 seedlings, we have discovered a large inversion of SSC and a large duplication of IR. This suggests that the frequencies of large SVs, particularly SVs at the elementary-region level, were underestimated in all previous studies and need to be re-examined intensively. As the most significant finding, both SSC-R and SSC-F type cpDNAs were detected with a ratio of 1.2. We named the frequent inversion of SSC "SSC switching". Although the underlying mechanism remains unknown, we propose a recombination model to explain these results. We also propose that: (1) SSC switching occurs ubiquitously in plant cpDNAs and is not a rarely occurring event, (2) SSC switching may be activated or inactivated by regulation at the molecular level, and (3) further investigation of the underlying mechanism may reveal novel and important functions associated with SSC switching.

Materials and Methods

All specimens used in the present study were identified by Fantao Zhang. DXWR159 and DXWR44 are two rice lines isolated from the Dongtangshang and Anjiashan populations, respectively. DNA extraction and quality control were performed as described in our previous study [12]. A 500 bp DNA library of DXWR44 was constructed as described in our previous study [12] and sequenced on the Illumina HiSeq 2000 platform to produce 90-bp paired-end data. A 350 bp DNA library of DXWR159 was constructed and sequenced on the Illumina HiSeq X Ten platform to produce 150-bp paired-end data. A 10 Kb DNA library was constructed and sequenced on the PacBio Sequel platform, according to the manufacturer's instructions. The cleaning and quality control of PacBio data were performed with the software SMRTlink v5.0, while the cleaning and quality control of Illumina data were performed with the software Fastq_clean [13] v2.0.

PacBio data were used to assemble the genomes with the software CANU [14] v2.2. The alignment of Illumina and PacBio data was performed with the software BWA and Minimap2 [15] v2.23. Genome graphs (i.e. Figure 1) were plotted using the software Circos v0.66 [16]. Statistics and plotting were conducted using the software R v2.15.3 [17] with the package ggplot2 [18]. Using the software Tablet [19] v1.17, manual curation of the two IRs, long poly(GC), low complexity, and other repeat regions was performed. To validate large repeats (e.g. IR1-SSC-IR2), junction reads were required to satisfy two criteria: (1) at least five reads must be aligned to IR1-SSC-IR2 over more than 90% of their length, and (2) each read must cover 100% of SSC with two 200-bp overhangs that can be aligned to IR1 and IR2 at the 5' and 3' ends of SSC, respectively. All the junction reads were aligned to the cpDNAs and are provided in the SAM format (Supplementary file 3).
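The two junction-read criteria can be expressed as a small filter. This is a sketch under stated assumptions, not the authors' actual script: the `Alignment` record, the function names and the 0-based half-open coordinates are hypothetical, and the SSC coordinates are derived here from the reported LSC, IR1 and SSC lengths for a genome linearized as LSC-IR1-SSC-IR2.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Alignment:
    read_length: int     # full length of the read (bp)
    aligned_length: int  # number of read bases included in the alignment
    ref_start: int       # 0-based start on the reference (inclusive)
    ref_end: int         # 0-based end on the reference (exclusive)


def is_junction_read(aln: Alignment, ssc_start: int, ssc_end: int,
                     overhang: int = 200, min_fraction: float = 0.9) -> bool:
    """Criterion (2) plus the per-read part of criterion (1)."""
    aligned_enough = aln.aligned_length > min_fraction * aln.read_length
    spans_ssc = (aln.ref_start <= ssc_start - overhang
                 and aln.ref_end >= ssc_end + overhang)
    return aligned_enough and spans_ssc


def structure_supported(alignments: Iterable[Alignment], ssc_start: int,
                        ssc_end: int, min_reads: int = 5) -> bool:
    """Criterion (1): at least `min_reads` qualifying junction reads."""
    count = sum(is_junction_read(a, ssc_start, ssc_end) for a in alignments)
    return count >= min_reads


# SSC coordinates derived from the reported region lengths (assumption:
# the reference starts at position 0 of LSC).
SSC_START = 80_553 + 20_805
SSC_END = SSC_START + 12_346

if __name__ == "__main__":
    example = Alignment(read_length=25_000, aligned_length=24_000,
                        ref_start=95_000, ref_end=119_000)
    print(is_junction_read(example, SSC_START, SSC_END))
```

In practice the alignment fields would be filled from SAM records such as those in Supplementary file 3; how soft-clipped bases are counted towards the 90% threshold is a detail the text does not specify.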

Abbreviations

DXWR: Dongxiang wild rice, cpDNA: chloroplast DNA, SV: structural variation, NGS: next-generation sequencing, LSC: large single-copy region, SSC: small single-copy region, IR1: inverted repeat 1, IR2: inverted repeat 2, SSC-F: forward SSC, SSC-R: reverse SSC, InDel: small insertion and deletion, SNP: single nucleotide polymorphism, STR: short tandem repeat, SSR: simple sequence repeat, ORF: open reading frame.

Acknowledgments

We appreciate the help from Professor Jiankun Xie from Jiangxi Normal University and Professor Wenjun Bu from the College of Life Sciences, Nankai University. This manuscript was posted online as a preprint on March 20th, 2022 at Research Square with the DOI 10.21203/rs.3.rs-1470820/v1.

Authors’ Contributions

S.G. conceived this project and discovered SSC switching. S.G. and S.Y. supervised this study. J.L., F.Z., Q.W. and Y.Y. analyzed the data. R.C. performed the programming. F.Z. conducted experiments. S.G. drafted the main manuscript. S.G. and S.Y. revised the manuscript and prepared all the figures, tables and additional files. All authors have read and approved the manuscript.

Funding

This work was supported by the Yunnan Applied Basic Research - Yunnan Provincial Science and Technology Department - Kunming Medical University joint projects (Grant No. 202101AY070001-073), National Natural Science Foundation of China (31700787) to Guangyou Duan and National Natural Science Foundation of China (31900444) to Zhi Cheng. The funding bodies played no role in the study design, data collection, analysis, interpretation or manuscript writing.

Availability of data and materials

The NGS data are available in the NCBI SRA database with ID SRP070627. All the supporting data are included as additional files.

Ethics approval and consent to participate

All research on the rice lines detailed in this manuscript complies with the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on the Trade in Endangered Species of Wild Fauna and Flora. All lines of Dongxiang wild rice (DXWR) are preserved ex situ at Jiangxi Academy of Agricultural Sciences, Nanchang, China (http://www.jxaas.com/index.html), and the seeds of DXWR are freely available for scientific research. The data will be shared on the NCBI website under the accession number PRJNA312733.

Consent to publish

Not applicable.

Competing interests

The authors declare that they have no competing interests.

References

1. Antunes AM, Soares TN, Targueta CP, Novaes E, Telles M: The chloroplast genome sequence of Dipteryx alata Vog. (Fabaceae: Papilionoideae): genomic features and comparative analysis with other legume genomes. Brazilian Journal of Botany 2020(2).
2. Asaf S, Khan AL, Khan MA, Imran QM, Lee IJ: Comparative analysis of complete plastid genomes from wild soybean (Glycine soja) and nine other Glycine species. PLoS One 2017, 12(8):e0182281.
3. Kim KJ, Choi KS, Jansen RK: Two Chloroplast DNA Inversions Originated Simultaneously During the Early Evolution of the Sunflower Family (Asteraceae). Molecular Biology and Evolution 2005(9):1783-1792.
4. Haberle RC, Fourcade HM, Boore JL, Jansen RK: Extensive Rearrangements in the Chloroplast Genome of Trachelium caeruleum Are Associated with Repeats and tRNA Genes. J Mol Evol 2008, 66(4):350-361.
5. Greiner S, Wang X, Rauwolf U, Silber MV, Mayer K, Meurer J, Haberer G, Herrmann RG: The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution. Nucleic Acids Research 2008, 36(7):2366-2378.
6. Cai Z, Guisinger M, Kim HG, Ruck E, Blazier JC, Mcmurtry V, Kuehl JV, Boore J, Jansen RK: Extensive Reorganization of the Plastid Genome of Trifolium subterraneum (Fabaceae) Is Associated with Numerous Repeated Sequences and Novel DNA Insertions. J Mol Evol 2008, 67(6):696-704.
7. Jansen RK: Extreme reconfiguration of plastid genomes in the angiosperm family Geraniaceae: rearrangements, repeats, and codon usage. Molecular Biology and Evolution 2011, 28(1):583-600.
8. Zhang F, Xu T, Mao L, Yan S, Chen X, Wu Z, Chen R, Luo X, Xie J, Gao S: Genome-wide analysis of Dongxiang wild rice (Oryza rufipogon Griff.) to investigate lost/acquired genes during rice domestication. BMC Plant Biology 2016, 16(1):1-11.
9. Xu X, Ji H, Jin X, Cheng Z, Yao X, Liu Y, Zhao Q, Zhang T, Ruan J, Bu W et al: Using pan RNA-seq analysis to reveal the ubiquitous existence of 5' and 3' end small RNAs. Frontiers in Genetics 2019, 10:1-11.
10. Chang J, Bei J, Qi S, Wang H, Fan H, Yau TO, Bu W, Ruan J, Wei D, Gao S: Full-length Genome of a Ogataea polymorpha strain CBS4732 ura3Δ reveals large duplicated segments in subtelomeric regions. Frontiers in Microbiology 2022, 13:1-10.
11. Ze Chen, Xuan Y, Liang G, Xiaolong Yang, Zhijun Yu, Stephen C. Barker, Kelava S, Wenjun Bu, Liu J, Shan Gao: Precise annotation of tick mitochondrial genomes reveals multiple copy number variation of short tandem repeats and one transposon-like element. BMC Genomics 2020, 21(488):1-11.
12. Wang Y, Wang Z, Chen X, Zhang H, Guo F, Zhang K, Feng H, Gu W, Wu C, Ma L: The Complete Genome of Brucella Suis 019 Provides Insights on Cross-Species Infection. Genes 2016, 7(2):1-12.
13. Zhang M, Zhan F, Sun H, Gong X, Fei Z, Gao S: Fastq_clean: An optimized pipeline to clean the Illumina sequencing data with quality control. In: Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on: 2014. IEEE: 44-48.
14. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH: Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 2017.
15. Li H: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018.
16. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: an information aesthetic for comparative genomics. Genome Res 2009, 19(9):1639-1645.
17. Gao S, Ou J, Xiao K: R language and Bioconductor in bioinformatics applications (Chinese Edition). Tianjin: Tianjin Science and Technology Translation Publishing Ltd, 2014.
18. Wickham H: ggplot2: elegant graphics for data analysis: Springer Science & Business Media, 2009.
19. Milne I, Stephen G, Bayer M, Cock PJA, Pritchard L, Cardle L, Shaw PD, Marshall D: Using Tablet for visual exploration of second-generation sequencing data. Briefings in Bioinformatics 2013, 14(2):193-202.

Table 1. Genes predicted in cpDNAs

Category	Group	Genes
Self-replication$	Large subunit of ribosomal proteins (8)	rpl2*, rpl14, rpl16, rpl20, rpl23, rpl32, rpl33, rpl36
	Small subunit of ribosomal proteins (11)	rps2, rps3, rps4, rps7, rps8, rps11, rps12*, rps14, rps15, rps18, rps19
	DNA-dependent RNA polymerase (4)	rpoA, rpoB, rpoC1, rpoC2
	Ribosomal RNA genes (4)	rRNA 16S, 23S, 4.5S, 5S
	Transfer RNA genes (30)	30 tRNA genes (6 contain an intron)
Genes for photosynthesis$	Subunits of NADH-dehydrogenase (11)	ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
	Subunits of photosystem I (5)	psaA, psaB, psaC, psaI, psaJ
	Subunits of photosystem II (15)	psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
	Subunits of cytochrome b/f complex (6)	petA, petB, petD, petG, petL, petN
	Subunits of ATP synthase (6)	atpA, atpB, atpE, atpF*, atpH, atpI
	Subunit of rubisco (1)	rbcL
Others$	Maturase K	matK
	Envelope membrane protein	cemA
	Subunit of Acetyl-CoA-carboxylase	accD
	C-type cytochrome synthesis gene	ccsA
	Protease	clpP*
Unknown function$	Conserved hypothetical ORFs (4)	ycf1, ycf2, ycf4, ycf3**
Only present in rice	Two genes and six ORFs	infA, rpl22, ORF63, 70, 91, 106, 133, and 249

$76 protein-coding genes are present in Onobrychis. The number in parentheses after the group name is the number of genes in this group. The number of asterisks after a gene name indicates the number of introns contained in the gene. The duplicated genes in the two IRs were counted once. Three genes (accD, ycf1 and ycf2) are absent in rice cpDNAs, while two genes (infA and rpl22) are absent in Onobrychis cpDNAs. In addition, ORF63, 70, 91, 106, 133, and 249 were predicted in rice cpDNAs. The sequences of all these genes are provided in Supplementary file 1.
