Viral species found in the 20 bulk samples
Between 40,388,602 and 81,134,080 reads (997,990,050 in total) were obtained from each of the 20 bulk samples g1 to g20 and were assembled into 50,132–101,309 contigs per sample. Following the BLAST searches in 2018, the nt sequences of some long contigs were confirmed by Sanger sequencing of fragments amplified by two-step RT‒PCR using primers designed from the contig sequences; the 5′- and 3′-termini of some contigs were determined by RACE, and the resulting sequences were deposited in the databases (Table S2). The deposited sequences shared 98–100% identity with the original contig sequences. Among them, grapevine Kizil Sapak virus (GKSV), grapevine virus F (GVF), and grapevine virus T (GVT) were identified for the first time in Japan (Table 1). They showed 79–89% nt sequence identity with other variants in the databases, and their translated products showed more than 85% amino acid (aa) sequence identity in the taxonomically relevant gene products of the family Betaflexiviridae, to which the viruses belong and for which 80% aa identity is the species demarcation threshold [18–20]. Short fragments amplified by one-step RT‒PCR to detect some viral species also underwent Sanger sequencing, and the sequences were deposited in the databases (Table S2). The nt sequences of the fragments of grapevine virus L (GVL), grapevine Syrah virus 1 (GSyV-1), and grapevine yellow speckle viroid 2 (GYSVd-2), none of which had previously been found in Japan, shared a maximum of 97–99% identity with those of variants of the targeted species, which supported their first identification in Japan (Table 1). The viral species (26 viruses and 5 viroids) into which the contigs were categorized by the BLAST searches against the databases as of March 2022 are shown in Table 1. Among the reads mapped to the contigs, the largest number (7,208,187) was categorized into grapevine leafroll-associated virus 3 (GLRaV-3), which was detected in 19 of the 20 bulk samples.
The smallest number of reads (4) and a single 139 nt contig, deposited in the databases under the accession number LC763778, were categorized into grapevine satellite virus (GV-Sat) [21]. Long contigs, defined as those whose nt length exceeded 60% (90% for viroids) of the reference sequence for each viral species, were detected for 23 species (Table 1).
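The long-contig criterion stated above can be sketched as a simple length ratio. This is a minimal illustration, assuming that "more than 60% (90% for viroids) of the reference sequence" means contig length divided by reference length; the function name and the reference lengths in the example calls are hypothetical and are not taken from the study.

```python
def is_long_contig(contig_len: int, reference_len: int, is_viroid: bool = False) -> bool:
    """Return True if the contig exceeds 60% (90% for viroids) of the reference length.

    Assumption: the criterion is a plain length ratio, not an alignment coverage.
    """
    if reference_len <= 0:
        raise ValueError("reference_len must be positive")
    min_fraction = 0.90 if is_viroid else 0.60
    return contig_len / reference_len > min_fraction


# The reference lengths below are hypothetical placeholders for illustration only.
short_virus_contig = is_long_contig(139, 1000)              # 13.9% of the reference
long_virus_contig = is_long_contig(7000, 10000)             # 70% of the reference
long_viroid_contig = is_long_contig(350, 370, is_viroid=True)  # about 94.6%
```

If the criterion is instead based on the aligned portion of each contig, the first argument would need to be the aligned length rather than the full contig length.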
Table 1
Reads and contigs categorized to the 26 viruses and 5 viroids initially found in the 20 bulk samples
Speciesᵃ | Reads | Contigs | Positivesᵇ | Long contigsᶜ | Identityᵈ | Reference sequence
(Virus)
GLRaV-3 | 7,208,187 | 521 | 19 | 19 | 98% | NC_004667
GVE | 1,862,547 | 407 | 18 | 14 | 97% | NC_011106
GLRaV-13 | 1,052,115 | 75 | 13 | 10 | 73% | NC_029783
GVL | 391,592 | 239 | 14 | 4 | 98% | NC_076884
GRSPaV | 382,198 | 693 | 20 | 26 | 91% | NC_001948
GVA | 237,303 | 466 | 15 | 30 | 58% | NC_003604
GLRaV-2 | 521,302 | 16 | 7 | 8 | 98% | NC_007448
GLRaV-1 | 356,536 | 51 | 11 | 10 | 91% | NC_016509
GVB | 109,670 | 145 | 10 | 7 | 95% | NC_003602
GFabV-RNA1 | 113,705 | 174 | 20 | 18 | 93% | NC_039073
GFabV-RNA2 | | | | 17 | 89% | NC_039072
GFLV-RNA1 | 139,005 | 32 | 3 | 3 | 82% | NC_003615
GFLV-RNA2 | | | | 4 | 85% | NC_003623
GLRaV-4 | 170,424 | 46 | 6 | 4 | 73% | NC_016416
GPoV-1 | 29,732 | 8 | 1 | 1 | 99% | LC507098
GGVA | 14,672 | 87 | 20 | 1 | 100% | NC_031340
GFkV | 18,633 | 357 | 14 | 0 | – | NC_003347
GVF | 12,879 | 31 | 4 | 2 | 91% | NC_018458
GLRaV-7 | 14,394 | 1 | 1 | 1 | 100% | NC_016436
GKSV | 5,535 | 23 | 1 | 2 | 99% | MN172165
GVT | 4,342 | 38 | 4 | 1 | 100% | MF095096
VCV-RNA1 | 1,197 | 11 | 4 | 2 | 96% | LC602838
VCV-RNA2 | | | | 2 | 90% | LC602839
GRVFV | 1,796 | 143 | 14 | 0 | – | NC_034205
GSyV-1 | 1,567 | 102 | 12 | 0 | – | NC_012484
GINV | 895 | 3 | 1 | 0 | – | NC_015220
GAMaV | 341 | 15 | 1 | 0 | – | NC_031692
GRGV | 12 | 4 | 2 | 0 | – | NC_030693
GV-Sat | 4 | 1 | 1 | 0 | – | NC_021480
(Viroid)
GYSVd-1 | 139,582 | 36 | 20 | 27 | 96% | NC_001920
HpSVd | 68,571 | 48 | 20 | 23 | 98% | NC_001351
GYSVd-2 | 16,077 | 14 | 13 | 11 | 98% | NC_003612
JGVd | 3,388 | 2 | 1 | 0 | – | LC500206
AGVd | 2,221 | 7 | 7 | 7 | 99% | NC_003553
a Abbreviations are as follows: grapevine leafroll-associated viruses 1 (GLRaV-1), 2 (GLRaV-2), 3 (GLRaV-3), 4 (GLRaV-4), 7 (GLRaV-7), and 13 (GLRaV-13); grapevine viruses A (GVA), B (GVB), E (GVE), F (GVF), L (GVL), and T (GVT); grapevine rupestris stem pitting-associated virus (GRSPaV); grapevine fabavirus (GFabV); grapevine fanleaf virus (GFLV); grapevine polerovirus 1 (GPoV-1); grapevine geminivirus A (GGVA); grapevine fleck virus (GFkV); grapevine Kizil Sapak virus (GKSV); Vitis cryptic virus (VCV); grapevine rupestris vein feathering virus (GRVFV); grapevine Syrah virus 1 (GSyV-1); grapevine berry inner necrosis virus (GINV); grapevine asteroid mosaic-associated virus (GAMaV); grapevine Red Globe virus (GRGV); grapevine satellite virus (GV-Sat); grapevine yellow speckle viroids 1 (GYSVd-1) and 2 (GYSVd-2); hop stunt viroid (HpSVd); Japanese grapevine viroid (JGVd); and Australian grapevine viroid (AGVd). The species found for the first time in Japan are underlined.
b Number of samples in which contigs categorized into the viral species were detected by the BLAST searches among the 20 bulk samples.
c Contigs sharing more than 60% (90% for viroids) of the reference sequence for each viral species.
d The lowest maximum nucleotide sequence identity of the long contigs with variants whose determined sequences exceed 60% (90% for viroids) of the reference sequence in the databases of March 2022.
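The "lowest maximum identity" reported in the Identity column of Table 1 can be read as a two-step reduction: for each long contig, take its best match among the database variants, and then report the lowest of those best matches for the species. The following is a minimal sketch of that reading; the function name, contig labels, and identity values are hypothetical and serve only to illustrate the calculation.

```python
def lowest_maximum_identity(identities_by_contig):
    """Return the lowest of the per-contig maximum identities (%), or None if empty.

    identities_by_contig maps a contig name to the list of its nt identities
    against database variants of one species.
    """
    best_per_contig = {
        contig: max(values)
        for contig, values in identities_by_contig.items()
        if values  # skip contigs without any comparison
    }
    if not best_per_contig:
        return None
    return min(best_per_contig.values())


# Hypothetical identities (%) for three long contigs of a single species.
example = {
    "contig_1": [91.0, 98.2, 95.5],  # best match 98.2
    "contig_2": [73.1, 70.4],        # best match 73.1
    "contig_3": [99.0],              # best match 99.0
}
species_identity = lowest_maximum_identity(example)
```

Under this reading, a single divergent contig is enough to lower the value reported for a species, which is consistent with the low values listed for a few species in Table 1.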
Some of the long contigs showed comparatively low nt sequence identity with the variants in the databases (Table 1). The long contigs whose nt sequences were not confirmed by Sanger sequencing and that shared less than 97% (98% for viroids) nt sequence identity with variants of each species in the databases are listed in Table S3, and their nt sequences are deposited in the databases under the accession numbers LC746701–LC746761. Among them, the contigs g13-C294 (LC746714) and g12-C21 (LC746719) shared a maximum of only 73% nt sequence identity with the grapevine leafroll-associated virus 4 (GLRaV-4) variant Car and the grapevine leafroll-associated virus 13 (GLRaV-13) variant a177 in the genus Ampelovirus, respectively (Table S3). g13-C294 and g12-C21 shared 87–91% and 82–98% aa sequence identity with GLRaV-4-Car and GLRaV-13-a177, respectively, in three taxonomically relevant gene products of ampeloviruses [22]. Although the divergences were not small, the identities remained above the species demarcation threshold of 75% [22]. One of the long contigs categorized into grapevine virus A (GVA) in the genus Vitivirus showed extremely low nt sequence identity with the GVA variants (Table 1): g12-C1434, with a length of 7,461 nt, shared a maximum of only 58% nt sequence identity with the GVA variant RSA-48-09 (Table S3). Phylogenetic analysis of the long contigs together with the GVA variants in the databases revealed at least five genomic phylogroups (I–V) (Fig. 1). Phylogroups I to IV were described by Alabi et al. [23]; phylogroup V is an additional distinct group. Although most of the long contigs fell within one of the phylogroups, g12-C1434 clearly formed a distinct branch apart from all of them (Fig. 1), which indicated its distinct genomic features. The molecular characterization described below showed that g12-C1434 was derived not from a GVA variant but from a novel vitivirus species.
Because vitivirus species are named serially from GVA to grapevine virus O (GVO) [20, 24], the novel vitivirus was provisionally named grapevine virus P (GVP).
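The comparison of the ampelovirus contigs against the 75% demarcation threshold can be sketched as follows. This is an illustration only: it assumes that a contig stays within the threshold when the aa identity of every taxonomically relevant gene product is at or above it, and the function name is hypothetical. Only the reported ranges (87–91% and 82–98%) are available here, so their end points stand in for the individual gene-product values.

```python
def within_demarcation(aa_identities, threshold):
    """Return True if every aa identity (%) is at or above the threshold.

    Assumption: all taxonomically relevant gene products must meet the
    threshold; the actual criterion may be applied per gene product.
    """
    if not aa_identities:
        raise ValueError("at least one identity value is required")
    return min(aa_identities) >= threshold


AMPELOVIRUS_THRESHOLD = 75.0  # % aa identity, as cited in the text [22]

# End points of the aa identity ranges reported in the text.
g13_c294_vs_glrav4_car = [87.0, 91.0]
g12_c21_vs_glrav13_a177 = [82.0, 98.0]

g13_ok = within_demarcation(g13_c294_vs_glrav4_car, AMPELOVIRUS_THRESHOLD)
g12_ok = within_demarcation(g12_c21_vs_glrav13_a177, AMPELOVIRUS_THRESHOLD)
```

Both contigs pass under this reading, which matches their assignment to GLRaV-4 and GLRaV-13 despite the low nt identities.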
Molecular characterization of GVP
Of the ten cultivars in the bulk sample g12, in which g12-C1434 was detected, one-step RT‒PCR using the primers C21f1 and C21r1 (Table 2) gave a positive result for only one, the ‘Nachubearmarie’ vine (Vitis labrusca L. × Vitis vinifera). Chiaki and Ito [10] detected JGVd in this cultivar. To demonstrate graft transmission of JGVd, we had maintained in our greenhouse an ‘LN33’ (‘Couderc 1613’ × Vitis berlandieri) vine grafted with scions of ‘Nachubearmarie’ [10]. RT‒PCR using the primers C21f1 and C21r1 gave negative results for an ungrafted ‘LN33’ clone but positive results for the grafted clone. The nt sequence of the amplified fragments matched that of g12-C1434 perfectly, which demonstrated graft transmissibility. One-step RT‒PCR to obtain long fragments and RACE-PCR were performed with primers designed from the contig sequence (Table 2). All obtained fragments underwent Sanger sequencing to confirm their correspondence to the contig sequence. The confirmed sequence was almost identical (99.9%) to the original contig sequence, and 3′-RACE revealed a poly(A) tail immediately after the sequence. Although the 5′-RACE trial was unsuccessful with the primer used, multiple reads were mapped to nt positions 1–20 of the contig, to which the primer 102Pf1 annealed to amplify long PCR fragments (Table 2), thus validating the nt sequence at this position. The determined sequence (LC746753) could represent the nearly complete genome of GVP, with a length of 7,461 nt excluding the poly(A) tail. The predicted structure of GVP included at least an 86 nt 5′ untranslated region (UTR), followed by five open reading frames (ORFs), a 103 nt 3′UTR, and the poly(A) tail (Fig. 2). The five deduced ORFs were ORF1 with 1,708 aa (194.4 kDa), ORF2 with 171 aa (18.6 kDa), ORF3 with 283 aa (31.3 kDa), ORF4 with 198 aa (21.7 kDa), and ORF5 with 110 aa (12.7 kDa).
The sizes of the genome, the two UTRs, and the five ORFs were within the ranges of those of vitiviruses [25]. InterProScan [26] from the InterPro project [27] identified a viral methyltransferase (PF01660), an alkylation B (AlkB)-like domain (IPR027450) of the 2OG-Fe(II) oxygenase superfamily (PF13532), a viral RNA helicase (PF01443), and an RdRp (PF00978) at aa 47–337, 622–741, 932–1,150, and 1,366–1,603, respectively, of the ORF1 protein (Fig. 2), which indicates that ORF1 could be the replication-associated protein (RAP) gene [20]. InterProScan also detected a viral movement protein (MP) (PF01107), a trichovirus coat protein (CP) (PF05892), and a viral nucleic acid binding protein (NABP) (PF05515) at aa 60–187 of ORF3, aa 9–198 of ORF4, and aa 8–66 of ORF5, respectively; no domains were found in ORF2. Accordingly, ORF2, ORF3, ORF4, and ORF5 were predicted to encode a hypothetical protein (HP) of unknown function, MP, CP, and NABP, respectively (Fig. 2).
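The domain annotation described above can be captured in a small data structure and checked for internal consistency. The sketch below is illustrative only: the ORF lengths and domain coordinates are those reported in the text, whereas the variable and function names are hypothetical and are not part of InterProScan or its output format.

```python
# ORF lengths (aa) and domain coordinates (aa) as reported in the text.
ORF_LENGTHS_AA = {"ORF1": 1708, "ORF2": 171, "ORF3": 283, "ORF4": 198, "ORF5": 110}

DOMAINS = [
    ("ORF1", "viral methyltransferase (PF01660)", 47, 337),
    ("ORF1", "AlkB-like domain (IPR027450)", 622, 741),
    ("ORF1", "viral RNA helicase (PF01443)", 932, 1150),
    ("ORF1", "RdRp (PF00978)", 1366, 1603),
    ("ORF3", "viral movement protein (PF01107)", 60, 187),
    ("ORF4", "trichovirus coat protein (PF05892)", 9, 198),
    ("ORF5", "viral nucleic acid binding protein (PF05515)", 8, 66),
]


def out_of_bounds(domains, orf_lengths):
    """Return domains whose coordinates do not fit inside their ORF."""
    return [d for d in domains if not (1 <= d[2] <= d[3] <= orf_lengths[d[0]])]


def overlapping_pairs(domains):
    """Return pairs of domains in the same ORF whose coordinates overlap."""
    ordered = sorted(domains, key=lambda d: (d[0], d[2]))
    return [
        (a[1], b[1])
        for a, b in zip(ordered, ordered[1:])
        if a[0] == b[0] and b[2] <= a[3]
    ]


def orfs_without_domains(domains, orf_lengths):
    """Return ORFs for which no domain was annotated."""
    annotated = {d[0] for d in domains}
    return sorted(orf for orf in orf_lengths if orf not in annotated)
```

With the reported values, every domain lies within its ORF, no two ORF1 domains overlap, and ORF2 is the only ORF without an annotated domain.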
Pairwise identities of the genomic regions of GVP with those of members of the genus Vitivirus were calculated. In each of the analyzed regions, GVP shared the highest sequence identity with GVA, GVD, GVF, GVK, GVH, or mint virus 2 (MV-2) in the GVA superclade [20] (Table 3): 59.4% with GVA for the genome; 57.5% nt and 52.5% aa with GVD for RAP; 68.3% nt with GVK and 72.7% aa with GVA for CP; 48.5% with GVK for the 5′UTR; 59.8% with MV-2 for the 3′UTR; 48.0% aa with GVA for HP; 59.8% aa with GVF for MP; and 69.6% aa with GVH for NABP. In the phylogenetic trees of vitiviruses based on the genome sequences or on the RAP and CP aa sequences, GVP was placed nearest to GVA or GVF (Fig. 3). GVP was clearly distinct from the other vitiviruses in all the trees and formed one clade with GVA, GVD, GVF, GVJ, and GVK in the GVA superclade (Fig. 3). The ORF5 encoding NABP in members of the GVA superclade starts immediately after the end of ORF4 encoding CP, except in GVF, which contains a short intergenic region of 36 nt between the ORFs [20]. A conserved peptide signature in the NABP of the GVE superclade, SPEETPEF(Y)Y, is observed in only three members of the GVA superclade: GVF, GVH, and MV-2 [20]. Like the members of the GVA superclade other than GVF, GVP has no intergenic region between ORFs 4 and 5 (Fig. 2), but, like GVF, GVH, and MV-2, it contains the conserved peptide signature in NABP. The pairwise identities among the RAP aa sequences of GVA, GVF, and GVP were calculated to generate a heatmap (Fig. S1). The matrix showed that each of the GVA phylogroups (Fig. 1) was distinct from the others and that the phylogroups fell into two major groups: one comprising phylogroups I, II, and IV and the other comprising phylogroups III and V. The identities within each of the major groups were generally over 80%, whereas those between the groups were only 70–77% (Fig. S1). Additionally, the identities among the GVF sequences were over 87%.
In contrast, the identities between GVA and GVF sequences were a maximum of only 52%, and those between GVP and any sequences of GVA and GVF were a maximum of only 53% (Fig. S1).
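A pairwise identity of the kind reported above can be computed from two aligned sequences as the percentage of matching columns. The sketch below is a minimal illustration and not the software used in this study; it assumes that the sequences are already aligned, that columns in which both sequences carry a gap are ignored, and that a gap opposite a residue counts as a mismatch. Other conventions for the denominator exist and give slightly different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are skipped; a gap opposite a
    residue is counted as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical): 5 matches over 6 compared columns.
toy_identity = percent_identity("ACGT-A", "ACGTTA")
```

Applying such a function to every pair of sequences yields the kind of matrix from which a heatmap can be drawn.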
Table 3 Sequence identities (%) of genomic regions of grapevine virus P with those of vitiviruses including possible members
Speciesᵃ | Accessions | Genome (nt) | 5′UTR (nt) | RAP (nt) | CP (nt) | 3′UTR (nt) | RAP (aa) | HP (aa) | MP (aa) | CP (aa) | NABP (aa)
(GVA supercladeᵇ)
GVA | NC_003604 | 59.4ᶜ | 38.8 | 53.5 | 67.8 | 58.5 | 52.1 | 48.0 | 56.5 | 72.7 | 68.4
GVB | NC_003602 | 49.9 | 31.1 | 53.1 | 59.8 | 27.4 | 48.8 | 30.5 | 35.1 | 60.2 | 33.5
GVD | MF774336 | 54.9 | 46.6 | 57.5 | 66.9 | 56.1 | 52.5 | 45.9 | 56.3 | 70.8 | 66.5
GVF | NC_018458 | 54.6 | 30.1 | 52.2 | 67.3 | 57.9 | 52.0 | 45.5 | 59.8 | 68.1 | 49.4
GVH | NC_040545 | 52.3 | 33.0 | 54.2 | 65.2 | 50.6 | 48.1 | 44.3 | 50.0 | 62.0 | 69.6
GVJ | NC_040564 | 55.7 | 41.7 | 56.9 | 67.8 | 52.4 | 52.0 | 45.9 | 55.7 | 70.8 | 69.0
GVK | NC_035202 | 54.7 | 48.5 | 56.9 | 68.3 | 56.1 | 52.1 | 47.6 | 57.1 | 69.9 | 65.8
GVM | MK492703 | 51.6 | 24.3 | 53.5 | 63.9 | NA | 47.9 | 44.3 | 51.9 | 62.5 | 67.7
AcVA | NC_043087 | 49.4 | NA | 53.0 | 59.9 | 31.7 | 48.9 | 14.2 | 40.5 | 60.6 | 49.4
AcVB | NC_016404 | 50.8 | 29.1 | 53.7 | 60.9 | 53.0 | 49.4 | 15.4 | 45.7 | 60.2 | 49.4
AVV | NC_034264 | 46.4 | 28.2 | 46.6 | 51.5 | 44.5 | 39.1 | NA | 37.5 | 45.8 | 40.5
HLV | MN314973 | 50.3 | 21.4 | 52.8 | 59.2 | NA | 48.9 | 28.0 | 31.0 | 58.8 | 47.5
MV-2 | NC_043088 | 29.6 | NA | 22.8 | 52.5 | 59.8 | 23.1 | 20.3 | 33.7 | 53.7 | 64.6
(GVE superclade)
GVE | NC_011106 | 41.0 | 27.2 | 43.4 | 50.1 | 36.0 | 33.1 | 24.8 | 39.1 | 44.9 | 44.3
GVG | MF405923 | 40.7 | 31.1 | 43.2 | 51.9 | 47.6 | 32.2 | 33.3 | 38.6 | 44.9 | 35.4
GVI | NC_037058 | 40.8 | 35.9 | 42.7 | 49.5 | 56.1 | 33.0 | 34.1 | 39.7 | 41.7 | 35.4
GVL | MH248020 | 40.7 | 32.0 | 43.5 | 49.2 | 46.3 | 33.7 | 17.1 | 38.6 | 45.4 | 28.5
GVN | MZ682355 | 39.8 | 30.1 | 42.2 | 49.5 | 45.7 | 31.6 | 32.5 | 40.8 | 45.8 | 40.5
GVO | MZ682356 | 41.1 | 28.2 | 43.5 | 44.7 | 37.2 | 32.7 | 24.4 | 38.6 | 37.5 | 42.4
ATLV | NC_034833 | 39.8 | 32.0 | 41.3 | 52.4 | 52.4 | 33.4 | 34.1 | 43.8 | 46.8 | 36.1
BVA | NC_040630 | 41.2 | 26.2 | 42.2 | 50.8 | 41.5 | 31.6 | 35.8 | 41.0 | 45.4 | 43.0
a Abbreviations are as follows: grapevine viruses A (GVA), B (GVB), D (GVD), E (GVE), F (GVF), G (GVG), H (GVH), I (GVI), J (GVJ), K (GVK), L (GVL), M (GVM), N (GVN), and O (GVO); Actinidia viruses A (AcVA) and B (AcVB); arracacha virus V (AVV); Agave tequilana leaf virus (ATLV); blackberry virus A (BVA); Heracleum latent virus (HLV); mint virus 2 (MV-2); untranslated region (UTR); replication-associated protein (RAP); coat protein (CP); hypothetical protein (HP); movement protein (MP); nucleic acid binding protein (NABP), and not analyzed (NA).
b GVA and GVE superclades are described by Maree et al. [20].
c The highest value is shown in bold for each of the regions.
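The highest value per region in Table 3 can be recovered programmatically by taking the maximum of each column. The sketch below uses a subset of the rows (six members of the GVA superclade) and three of the regions, with the values copied from Table 3; the dictionary layout and the function name are hypothetical, and a value of None would stand for a region that was not analyzed (NA).

```python
# Subset of Table 3: identities (%) of GVP with six members of the GVA superclade.
IDENTITY = {
    "GVA": {"genome_nt": 59.4, "CP_aa": 72.7, "NABP_aa": 68.4},
    "GVD": {"genome_nt": 54.9, "CP_aa": 70.8, "NABP_aa": 66.5},
    "GVF": {"genome_nt": 54.6, "CP_aa": 68.1, "NABP_aa": 49.4},
    "GVH": {"genome_nt": 52.3, "CP_aa": 62.0, "NABP_aa": 69.6},
    "GVJ": {"genome_nt": 55.7, "CP_aa": 70.8, "NABP_aa": 69.0},
    "GVK": {"genome_nt": 54.7, "CP_aa": 69.9, "NABP_aa": 65.8},
}


def best_match(table, region):
    """Return (species, identity) with the highest identity for one region.

    Entries that are missing or None (not analyzed) are ignored.
    """
    candidates = {
        species: values[region]
        for species, values in table.items()
        if values.get(region) is not None
    }
    if not candidates:
        raise ValueError(f"no values available for region {region!r}")
    species = max(candidates, key=candidates.get)
    return species, candidates[species]


best_genome = best_match(IDENTITY, "genome_nt")
best_cp = best_match(IDENTITY, "CP_aa")
best_nabp = best_match(IDENTITY, "NABP_aa")
```

For these three regions the closest species are GVA, GVA, and GVH, respectively, and none of the remaining rows of Table 3 exceeds these values.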