CLas strain GDCZ was originally collected from a fruit of Citrus reticulata cv. Tankan showing HLB symptoms (small, asymmetrical fruit with uneven coloring) in a citrus orchard (23°40′22″N, 116°38′33″E, 25 m) located in Chaozhou City (Chaoshan area), Guangdong Province, China. Total DNA was extracted from fruit pith, chosen because of its high concentration of CLas [7], using an E.Z.N.A. High-Performance Plant DNA Extraction Kit (Omega Bio-Tek Co., China). The presence of CLas was confirmed by real-time quantitative PCR with primer set CLas4G/HLBr [8], which gave a cycle threshold (Ct) value of 21.73. Phage-typing PCR with phage-specific primer sets [9–10] showed that CLas strain GDCZ contained a Type 2 phage and a CLasMV1 phage.
Genome sequencing was performed through a commercial provider on a PacBio Sequel system with a 20-kb library insert size (Pacific Biosciences, Menlo Park, CA, U.S.A.) and an Illumina HiSeq X Ten platform with 150-bp paired-end output (Illumina Inc., San Diego, CA, U.S.A.). In total, 2,328,216 clean long-reads ranging in length from 5,374 to 111,541 bp (N50 = 6,426 bp) and 95,698,604 clean short-reads (150 bp) were generated from the GDCZ sample (Table 1, Data file 1) [11].
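The N50 values reported for reads and contigs follow the usual definition: the length L such that sequences of length L or longer contain at least half of all bases. A minimal sketch of that calculation is given here for reference; the function name and the toy lengths are illustrative assumptions and are not taken from the GDCZ data or from the tools used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the total is the N50.
        if running >= half_total:
            return length


# Toy lengths (hypothetical values, not from the GDCZ reads).
toy_lengths = [2, 3, 4, 5, 6, 10]
print(n50(toy_lengths))
```

In practice the lengths would be read from the FASTA/FASTQ records of the read set or assembly being summarized.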
All reads mapping to the Citrus maxima genome (MKYQ00000001.1), the C. reticulata genome (NIHA00000000.1), the C. sinensis genome (AJPS00000000.1), the C. sinensis mitochondrion (NC_037463.1) or the C. reticulata chloroplast (KU170678.1) were removed using Bowtie2 v2.4.1 (for short-reads) and BWA v0.7 (for long-reads) with default settings [12–13]. A total of 152,762 (6.56%) unmapped long-reads and 15,179,368 (15.86%) unmapped short-reads were retained for assembly (Data file 1) [11]. De novo assembly was performed with Canu v2.1.1 for long-reads (genomeSize = 1.2M, correctedErrorRate = 0.40) and CLC Genomics Workbench v20.0 for short-reads (minimum contig length = 500 bp) [14]. A total of 65 contigs (N50 = 13,580 bp) were generated from the long-reads and 39,316 contigs (N50 = 857 bp) from the short-reads (Data file 1) [11]. A BLAST search of the contigs against the strain A4 genome (CP010804.2) with BLAST+ v2.12.0 [15] identified a total of 85 CLas contigs (62 from long-reads and 23 from short-reads), which were used to generate a GDCZ scaffold sequence. This scaffold consisted of three segments separated by two 222-bp inverted-repeat gaps (Data file 1) [11]. Both 222-bp gaps were filled by mapping Illumina short-reads. These efforts generated the GDCZ whole-genome sequence of 1,230,507 bp with an average G+C content of 36.4%. Analysis of the coverage of reads mapped to the three known types of prophage genomes (Type 1: SC1, HQ377372.1; Type 2: SC2, HQ377373.1; Type 3: P-JXGC-3, KY661963.1) showed that CLas strain GDCZ harbored only a Type 2 prophage (93.38%, from position 1,193,056 to 1,230,507 bp) (Data file 2) [16]. In addition, a circular 8,869-bp contig generated from long-reads by Canu v2.1.1 was identical to the CLasMV1 phage genome (CP045566.1). Genome annotation revealed that CLas strain GDCZ contained 1,057 open reading frames and 53 RNA genes.
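The prophage screen described above depends on how much of each reference prophage is covered by mapped reads. One common way to summarize this is coverage breadth, the fraction of reference positions covered by at least one alignment. The sketch below shows that calculation under stated assumptions: the function name, the 1-based inclusive interval convention and the toy intervals are all hypothetical, and the source does not specify whether the reported percentage was computed in exactly this way.

```python
def covered_fraction(intervals, ref_length):
    """Fraction of a reference covered by at least one alignment.

    intervals: iterable of (start, end) pairs, 1-based and inclusive.
    ref_length: length of the reference sequence in bp.
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    covered = 0
    current_end = 0  # last reference position already counted
    for start, end in sorted(intervals):
        # Clip to the reference and skip positions counted earlier,
        # so overlapping alignments are not counted twice.
        start = max(start, 1, current_end + 1)
        end = min(end, ref_length)
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / ref_length


# Toy alignments on a 100-bp reference (hypothetical values).
toy_alignments = [(1, 50), (40, 80), (90, 100)]
print(f"{covered_fraction(toy_alignments, 100):.2%}")
```

Alignment intervals of this kind could be taken from the start and end coordinates of mapped reads in a BAM or BED file; per-base depth tables are an equally valid starting point.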
The average nucleotide identity (ANI) between the strain GDCZ genome and 10 CLas genomes originating from China was then calculated using FastANI v1.33 (fragment length = 1,000 bp) [17] (Data file 3) [18]. Clustering based on the ANI matrix produced three distinct branches (Data file 4) [19]. Notably, strain GDCZ clustered with two strains from Guangdong Province (strains A4 and PGD) and apart from CLas strains from other provinces (Jiangxi Province: strains JXGZ and JXGC; Yunnan Province: strains YNJS and PYN; Guangxi Province: strain gxpsy).
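Before clustering, pairwise ANI values have to be arranged into a symmetric matrix. FastANI writes one tab-separated line per comparison (query, reference, ANI, mapped fragments, total query fragments), and the two directions of a pair can differ slightly. The following sketch averages reciprocal values and looks up the closest relative of a genome. The function names, genome names and ANI values are hypothetical placeholders, not results from this study, and the clustering method used for Data file 4 is not reproduced here.

```python
import csv
import io


def ani_matrix(fastani_tsv):
    """Build a symmetric ANI lookup from FastANI-style tabular text.

    Each row holds: query, reference, ANI, mapped fragments, total
    fragments. Reciprocal comparisons (A vs B, B vs A) are averaged.
    """
    sums = {}
    counts = {}
    for row in csv.reader(io.StringIO(fastani_tsv), delimiter="\t"):
        if not row:
            continue
        query, reference, ani = row[0], row[1], float(row[2])
        key = tuple(sorted((query, reference)))
        sums[key] = sums.get(key, 0.0) + ani
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def closest(matrix, genome):
    """Return (other_genome, ani) for the highest ANI involving genome."""
    best = None
    for (a, b), ani in matrix.items():
        if a == b or genome not in (a, b):
            continue  # ignore self-comparisons and unrelated pairs
        other = b if a == genome else a
        if best is None or ani > best[1]:
            best = (other, ani)
    return best


# Hypothetical placeholder output, not values from Data file 3.
toy_output = "\n".join([
    "genome_A.fa\tgenome_B.fa\t99.98\t1190\t1200",
    "genome_B.fa\tgenome_A.fa\t99.96\t1188\t1200",
    "genome_A.fa\tgenome_C.fa\t99.90\t1180\t1200",
    "genome_B.fa\tgenome_C.fa\t99.91\t1181\t1200",
])
matrix = ani_matrix(toy_output)
print(closest(matrix, "genome_A.fa"))
```

A distance such as 100 minus ANI can then be derived from this lookup and passed to any standard hierarchical clustering routine.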
Please see Table 1 for links to Data files 1–4 and Data sets 1–3.
Table 1
Overview of data files/data sets.
| Label | Name of data file/data set | File types (file extension) | Data repository and identifier (DOI or accession number) |
| --- | --- | --- | --- |
| Data file 1 | A workflow of genome assembly for CLas strain GDCZ | Portable Document Format file (.pdf) | Figshare (https://doi.org/10.6084/m9.figshare.23614437.v1) [11] |
| Data file 2 | Prophage detection of CLas strain GDCZ | Portable Document Format file (.pdf) | Figshare (https://doi.org/10.6084/m9.figshare.23614437.v1) [16] |
| Data file 3 | ANI values for CLas strain GDCZ | Portable Document Format file (.pdf) | Figshare (https://doi.org/10.6084/m9.figshare.23614437.v1) [18] |
| Data file 4 | Cluster analyses | Portable Document Format file (.pdf) | Figshare (https://doi.org/10.6084/m9.figshare.23614437.v1) [19] |
| Data set 1 | Sequencing long-reads of CLas strain GDCZ | Fasta file (.fa) | NCBI (SRR23622213) [20] |
| Data set 2 | Sequencing short-reads of CLas strain GDCZ | Fasta file (.fa) | NCBI (SRR23622214) [21] |
| Data set 3 | Genome assembly of CLas strain GDCZ | Genbank file (.gb) | NCBI (CP118922.1) [22] |
Limitations
Assembly of PacBio long-reads from CLas DNA samples extracted from plant host tissue was insufficient to obtain a complete CLas strain GDCZ genome. Because CLas cannot currently be cultured in vitro, the high ratio of citrus DNA to CLas DNA in total DNA extracts made CLas genome sequencing more challenging. Further research on CLas DNA enrichment, e.g., removing citrus host DNA or enriching bacterial cells before DNA extraction, is therefore needed to increase the proportion of CLas DNA in total DNA from citrus host sources.